The Political Data Nerd: 2020

Monday, October 5, 2020

The DeSart & Holbrook 2020 Presidential Election Forecast

For the past 5 presidential elections, Tom Holbrook and I have been generating forecasts of the national popular vote and Electoral College vote using a model that we developed in 1999. Applying that same model to the data from 2020, we are projecting that Joe Biden will win a majority of the national two-party popular vote by a fairly wide margin of 54.4 to 45.6 over Donald Trump. It also suggests that Biden will win 358 Electoral Votes to Trump's 180.

Our model uses state- and national-level polling in the month of September, along with each state's electoral history, to generate a prediction for each state's outcome. In addition, we can use this data to calculate each candidate's probability of winning each state. We can then extrapolate those predictions up to the national level to project the popular vote and Electoral College vote outcomes a month in advance of the election.

Using this method, we have correctly predicted the winner of the national popular vote in each election from 2000 to 2016. Our track record for our Electoral College projections is a little more mixed. We incorrectly projected that Al Gore and Hillary Clinton would win in 2000 and 2016, respectively. On the other hand, we went 51 for 51 in predicting the winner of all 50 states and the District of Columbia in 2012.

In 2016, our model projected that Hillary Clinton would win 52.05% of the national two-party popular vote, an error of just 0.95 percentage points. Our Electoral College projection estimated that Clinton would defeat Donald Trump 326 to 212. We got five states wrong: Florida, Michigan, Ohio, Pennsylvania, and Wisconsin. It's worth noting that our model gave Hillary Clinton less than a 90% chance of winning each of those states. That matters because our model has never incorrectly predicted a state where it gave a candidate a greater than 90% chance of winning.

This is particularly relevant, because as you can see from the figure of Predicted Win Probabilities below, there are a number of states that the model predicts Joe Biden has a greater than 90% chance of winning. More importantly, the number of Electoral Votes associated with these states totals 279, nine more than a candidate needs in order to win the election. This suggests that even if Biden were to lose every other state, he would still win an Electoral College majority.

Of note is the location of the "tipping point" state. This is the state where, when all states are arranged in order of their win probabilities, either candidate would achieve an Electoral College majority. That state is Pennsylvania. The problem for Trump is that it is located well within the Biden column. The model suggests that Biden has a 93.5% chance of winning Pennsylvania. To win the election, Donald Trump is going to have to win seven states that the model suggests Joe Biden has a better than 50-50 chance of winning, two of which are over 90%. That's not an impossible task, but it just doesn't seem likely.
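For readers who want to see the tipping-point logic spelled out, here is a minimal sketch. The function name and the five-state toy map are hypothetical and are not the model's actual estimates; the only thing taken from the description above is the procedure of sorting states by win probability and accumulating Electoral Votes until a majority is reached.

```python
def tipping_point(states, majority=270):
    """Return (name, win_prob, running_total) for the state whose Electoral
    Votes first push the running total to a majority when states are sorted
    from most likely to least likely win for one candidate."""
    ordered = sorted(states, key=lambda s: s[1], reverse=True)
    running = 0
    for name, win_prob, electoral_votes in ordered:
        running += electoral_votes
        if running >= majority:
            return name, win_prob, running
    return None  # the candidate never reaches a majority


# Hypothetical toy map: (state, win probability, Electoral Votes).
toy_map = [
    ("A", 0.99, 120),
    ("B", 0.95, 100),
    ("C", 0.80, 60),
    ("D", 0.40, 130),
    ("E", 0.05, 128),
]

print(tipping_point(toy_map))  # ('C', 0.8, 280)
```

In this toy map, state C is the tipping point from either direction: the favored candidate needs it to pass 270, and so does the opponent counting up from the other end of the list.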

Simulated Election Outcomes

As I mentioned above, we are able to take each of the model's predicted state-level outcomes and extrapolate national-level outcomes from them. We project the national popular vote by taking each state's predicted outcome, weighting it by its contribution to the total national vote in the previous election, and summing the results. That's how we derived the national popular vote projection of 54.4% for Biden.
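The weighting step described above amounts to a weighted average. Here is a rough sketch with made-up numbers; the function name, the three states, and their vote totals are all hypothetical and only illustrate the arithmetic.

```python
def national_share(predicted_share, prior_votes):
    """Weight each state's predicted two-party share (in percent) by that
    state's share of the total vote cast in the previous election."""
    total = sum(prior_votes[state] for state in predicted_share)
    return sum(
        predicted_share[state] * prior_votes[state] / total
        for state in predicted_share
    )


# Hypothetical inputs: predicted share (%) and prior-election votes cast.
toy_share = {"A": 60.0, "B": 50.0, "C": 40.0}
toy_votes = {"A": 5_000_000, "B": 3_000_000, "C": 2_000_000}

print(national_share(toy_share, toy_votes))  # 60*0.5 + 50*0.3 + 40*0.2 = 53.0
```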

We derive the Electoral College vote by simply awarding a state's Electoral Votes on the basis of the model's point estimates. One thing new this year is that we are also incorporating projections for the Electoral Votes tied to the Maine and Nebraska Congressional Districts. Doing this yields the Electoral College map pictured below.

You can see that the model suggests a split result in Nebraska. While Trump is the clear favorite in the statewide vote, as well as in the First and Third Congressional Districts, the model indicates that Biden has a 74% chance of winning the Second Congressional District, just as he and Barack Obama did in 2008. On the other hand, the model suggests that Biden will win all four Electoral Votes in Maine, denying Trump the Electoral Vote associated with the Second District that he won in 2016.

When we factor in the uncertainty of the model, we can create confidence intervals around these projections. We run simulated elections while allowing each state's outcome to vary randomly around its point estimate, based on the standard error of the estimate, and aggregate the outcomes of 100,000 simulated elections. Doing this yields the frequency distributions displayed below.
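To make the simulation idea concrete, here is a minimal sketch of that kind of routine. It rests on several assumptions that are mine, not the model's: each state is drawn independently from a normal distribution, every state shares one standard error (`std_error`), and the three toy states are invented. The real model's error structure may well differ.

```python
import random


def simulate_elections(states, n_sims=2_000, std_error=2.0, seed=0):
    """states: list of (predicted share %, Electoral Votes, prior-election votes).
    Returns one Electoral Vote total and one national share per simulation."""
    rng = random.Random(seed)
    total_prior = sum(prior for _, _, prior in states)
    ev_totals, pop_shares = [], []
    for _ in range(n_sims):
        ev, pop = 0, 0.0
        for share, evs, prior in states:
            # Assumption: independent normal noise around the point estimate.
            draw = rng.gauss(share, std_error)
            if draw > 50.0:
                ev += evs
            pop += draw * prior / total_prior
        ev_totals.append(ev)
        pop_shares.append(pop)
    return ev_totals, pop_shares


# Hypothetical toy map with one safe state each way and one close state.
toy_states = [
    (70.0, 200, 10_000_000),
    (51.0, 100, 5_000_000),
    (30.0, 238, 8_000_000),
]
ev_totals, pop_shares = simulate_elections(toy_states)
win_rate = sum(ev >= 270 for ev in ev_totals) / len(ev_totals)
print(f"Electoral College win rate in the toy map: {win_rate:.3f}")
```

In the toy map everything hinges on the close state, so the win rate is simply how often that one draw lands above 50%.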

As you can see, Donald Trump does not win a majority of the national two-party popular vote in any of the 100,000 simulations. 95% of Biden's outcomes fall between 53.5% and 55.4%, so that serves as our confidence interval. When we expand the level of confidence out to 99%, the interval ranges from 53.1% to 55.8%.

The distribution of Electoral College outcomes reveals the only ray of hope for Trump, but even so, the ray is very dim. The most frequently occurring outcome is the one based on the point estimates of the model: Biden over Trump 358-180. However, given the uneven relationship between the popular vote and the Electoral College Vote, the average outcome is 338-200. 95% of the outcomes fall within a range where Biden wins between 291 and 384 Electoral Votes. 99% fall between 279 and 401.

In other words, there is a greater than 99% chance, based on this analysis, that Joe Biden will win an Electoral College majority and win the election. There is a small collection of outcomes where Donald Trump manages to win an Electoral College majority. In only 105 of the 100,000 simulations does Donald Trump win re-election, but it would entail a repeat of the 2016 election where he wins the Electoral College but fails to win the popular vote.
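The intervals, the most frequent outcome, and the average outcome quoted above are all simple summaries of a list of simulated results. Here is one way such summaries could be computed. The helper name and the toy numbers are hypothetical, and the percentile rule used here (nearest rank on the sorted list) is only one of several reasonable conventions.

```python
from statistics import mean, mode


def percentile_interval(outcomes, level=0.95):
    """Return the (lower, upper) bounds that contain roughly `level` of the
    simulated outcomes, using nearest-rank positions in the sorted list."""
    ordered = sorted(outcomes)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[round(tail * last)], ordered[round((1.0 - tail) * last)]


# A lopsided toy distribution: the most common value sits at the top end,
# so the average is pulled below it.
toy_outcomes = [10, 10, 10, 8, 6, 4, 1]
print(mode(toy_outcomes))  # 10
print(mean(toy_outcomes))  # 7

print(percentile_interval(range(1, 1001)))  # (26, 975)
```

The toy list shows why the most frequent outcome and the average outcome can differ: when the distribution has a long tail on one side, the average drifts toward that tail.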

But what about 2016?

It's legitimate to question this prediction given that we, and a lot of other forecasters, missed the mark with our model in 2016. But there are some key substantive differences between 2016 and 2020 that lead us to have a bit more confidence in this forecast despite the 2016 misfire.

First, the lead that Joe Biden has in the national polls is substantively different than the lead that Hillary Clinton had in 2016. Despite the widespread perception that "polls are broken" after what happened in 2016, national polls were not really as inaccurate as people think they were. In September 2016, Hillary Clinton had an average two-party share of just 51.9% in national polls. This is remarkably close to the actual result. She ultimately won 51.1% of the two-party popular vote.

In contrast, Joe Biden's average share of the national polls in September was 53.8%, considerably higher than that of Clinton in 2016. The polls have been remarkably stable over the past 12 months. Biden has consistently led Trump since October of last year. Over that time, Biden's monthly average two-party share in the polls has never dropped below 52.3%. Given that, it seems highly unlikely that this lead will simply evaporate in the campaign's final weeks.

Of course, as we learned in 2016, it's what happens in the states that really matters when it comes to deciding the Electoral College outcome. It was the state-level polling that had the biggest issues in 2016, missing the mark in key states that ultimately tipped the balance in favor of Donald Trump.

Here, again, the situation is different than it was in 2016. The figure below shows the comparison of average September poll results for each of the 50 states. In general, there has been an average shift towards Biden of a little over 2% across the states.

Over the past five elections, without exception, when a candidate has a statistically significant lead in a poll in a state (i.e., the lead is beyond the margin of error) for the month of September, they end up winning that state.
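As a rough illustration of what "beyond the margin of error" can mean, here is a sketch for a single poll expressed as a two-party share. The function, the 95% z-value, and the sample sizes are my assumptions for the example; how the September polls are actually averaged and tested is not spelled out here. In two-party terms, the lead is significant when the leader's share sits further from 50% than the margin of error on that share.

```python
import math


def lead_is_significant(two_party_share, sample_size, z=1.96):
    """two_party_share is a proportion (e.g. 0.55). Returns True when the
    share is further from 0.5 than its approximate 95% margin of error."""
    moe = z * math.sqrt(two_party_share * (1 - two_party_share) / sample_size)
    return abs(two_party_share - 0.5) > moe


# Hypothetical polls of 800 respondents each.
print(lead_is_significant(0.55, 800))  # True: 5 points vs. a margin near 3.4
print(lead_is_significant(0.52, 800))  # False: 2 points vs. a margin near 3.5
```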

The table below dives a little further into this comparison of the polling in 2016 with that of 2020. It shows how the September state polls compared to the eventual outcome. Generally speaking, you can see evidence of what I mentioned above: September polls in 2016 actually did a reasonably good job of telling us what was going to happen in November, even at the state-level. To be sure, there were polls in key states that ended up over-estimating Clinton's support, but her lead in those states was not statistically significant.

Simply put, we could not be confident that she was actually leading in those states given the margin of error, so it should not have really been a surprise that she did not win those states. Most important is the fact that the disposition of many of these states this year is different than it was in 2016. I have marked those states with an arrow showing how they've shifted.

All 12 states that have shifted since 2016 have moved away from Trump and towards Biden. Trump does not have statistically significant leads in two states that were statistical locks for him in 2016: Alaska and Texas. That doesn't mean he will lose those states, but it suggests that his position there, at least according to the polls, isn't as firm as it was four years ago.

Four states where Trump held statistically insignificant leads in 2016 have shifted towards Biden as well: Arizona, Georgia, Nevada, and Ohio. Biden holds slight leads in all four. Again, we can't say with any confidence that Biden will necessarily win those states simply based on these polls, but it is indicative of the general shift away from Trump compared to 2016.

Most relevant are the six states that have moved from being states where Clinton held insignificant leads in 2016 to states where, in 2020, Biden has a lead that is beyond the margin of error: Colorado, Maine, Michigan, Minnesota, New Hampshire, and Virginia. As I've stated above, in every election we've looked at going back to 2000, a candidate goes on to win a state where they hold a statistically significant lead in September.

The states where Biden holds statistically significant leads account for a total of 240 Electoral Votes, meaning he only needs to find 30 more in order to win a majority. A combination of just two or three of the eight states where he holds slight leads is all he needs to get him across the finish line. At this point it seems improbable, but not impossible, that Trump could win re-election. The map, and the context, look considerably more difficult for him than they did in 2016.

If 2016 taught us anything, however, it's that you shouldn't take anything for granted. When the results come in next month, I'll be there looking at the numbers, because that's what a nerd does.

Monday, July 13, 2020

Long-Range Election Forecast - July 2020

For the first time this election cycle, my Long-Range Election Forecast model projects an Electoral College margin of victory for Joe Biden that is beyond the 95% confidence interval. It suggests that Biden will win anywhere between 286 and 396 Electoral Votes. The most likely outcome has him winning around 334 Electoral Votes.

The model looks at the previous election outcome in each of the 50 states and projects an outcome for them based on three additional factors:

Monthly national head-to-head polling between the two major party candidates
Home State Advantage
Regime age (i.e. how long the incumbent party has held the White House)

I can then extrapolate the projected state-level results up to the national-level for both the popular vote (by taking a turnout-weighted sum of each state's contribution to the national-level total) and the Electoral College total (by awarding each state's Electoral College votes to the candidate that the model projects will win it).

It is also possible to calculate a win-probability for each state based on the model's projected outcomes. Not all states are equal. Some states are just more likely to be won by one party's candidate over another. Simply put, it's highly likely that a Democrat will win states like California, Massachusetts, and Illinois, and that a Republican will win states like Wyoming, North Dakota, and Alabama. Other states are more likely to be considered battleground states like Florida, Arizona and Ohio.

Using these win-probabilities, I can run simulated elections to see who will win and who will lose, while taking into account a certain amount of uncertainty in the state-level outcomes. I had the computer do that 100,000 times. I present the results of that analysis in the table below. In those 100,000 simulated elections, Joe Biden won the national popular vote in every single one of them, and the Electoral College in 99,503 of them. Donald Trump's 400+ victories all came as a result of an Electoral College misfire, just as his victory did in 2016.
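One simple way to turn win-probabilities into simulated elections is to flip a weighted coin for each state. The sketch below does exactly that, and it is only an illustration: the function and the toy map are hypothetical, and it treats states as independent of one another, which is a simplification of my own and not necessarily how the model handles uncertainty.

```python
import random


def simulate_from_probabilities(states, n_sims=10_000, majority=270, seed=0):
    """states: list of (win probability, Electoral Votes) for one candidate.
    Returns the fraction of simulations in which that candidate reaches a
    majority of Electoral Votes."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        # Assumption: each state is an independent weighted coin flip.
        ev = sum(evs for prob, evs in states if rng.random() < prob)
        if ev >= majority:
            wins += 1
    return wins / n_sims


# Hypothetical toy map: a sure win, a sure loss, and one true toss-up.
toy_probs = [(1.0, 260), (0.5, 20), (0.0, 258)]
toy_win_rate = simulate_from_probabilities(toy_probs)
print(f"Toy win rate: {toy_win_rate:.3f}")  # close to 0.5 by construction
```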

Beyond the election simulations, the key statistic for me is what can be called the "tipping point state." This is the state that, when you arrange them all based on the likelihood that they'll be won by one candidate over the other, will be the one that puts either candidate over the top and gives them the 270 votes needed in the Electoral College in order to win a majority.

The model suggests, as can be seen in the figure below, that Wisconsin is that state. Right now, the model suggests that Joe Biden has an 82.1% chance of winning Wisconsin. From a strictly statistical perspective, that's not high enough to be "certain." We statistics nerds like to see a likelihood above 90-95% before we have confidence. But it certainly is suggesting that it is more likely than not to happen.

For comparison's sake, at this point in 2016 this model was projecting a very close election, but gave Trump a 65.9% chance of winning the Electoral College, which of course he did. This was in spite of the fact that Hillary Clinton held substantial leads in the national head-to-head polling in June of 2016.

The context of 2020 is substantively different than that of 2016. You don't have a party attempting to hold on to the White House for a third consecutive term, which is notoriously difficult to do. Instead, we've got an embattled President running for re-election after winning a close election four years earlier, and doing so with a discrepant outcome between the national popular vote and Electoral College.

Of course something could happen between now and November that could alter the trajectory of the race. We saw that in 2016. Until that happens, however, this model suggests that it doesn't seem likely that Donald Trump will be successful in winning a second term.

But if and when that happens, I'll be there looking at the numbers, because that's what a nerd does.

Sunday, March 15, 2020

The Long Difficult Road Ahead for Sanders

If 2016 taught us anything it would be to never say "Never." There's always a chance that expectations could miss the mark, and the unexpected event could happen. But sometimes things seem so unlikely that it's difficult to not want to use that word. That's sort of the situation Bernie Sanders finds himself in right now. Based on my analysis of the outcomes of the primaries and caucuses so far, it really doesn't seem likely that he is going to become the Democratic nominee.

Primaries and caucuses are actually fairly predictable events once you have a frame of reference to go by. We're now almost halfway through the nomination season and so we've had ample opportunity to observe how the candidates have performed in a variety of states. If we know the characteristics of the states that were relevant in explaining the results, we can attempt to use those characteristics to help us project what will happen in the races yet to come.

For example, one very important characteristic that can explain how well Biden and Sanders performed in each of the two dozen states that have had their contests to date is their level of racial diversity, specifically the percentage of their population that is African-American. A state's percentage of black population has almost exactly the opposite relationship with how well Joe Biden performed in its contest as it does with how well Bernie Sanders fared. As the figure below shows, Biden has done substantially better in the more racially diverse states, while Sanders has done substantially worse.

If that pattern remains the same for the remaining contests, and there's no reason at this point to expect that it won't, that alone is bad news for Sanders.

To understand why this is a problem for Sanders, all you have to do is take a look at the chart above again and note the location of the intersection between the two lines. Once a state's black population reaches 10%, Sanders hasn't been able to win. His only victories have come when the state's black population has been eight percent or below.
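The "intersection between the two lines" is just the point where two fitted trend lines cross. Here is a small sketch of how such a crossing point could be computed. Every number in it is invented for illustration and is not the actual primary data, and it uses a single predictor with ordinary least squares, which is a simplification on my part.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    return slope, mean_y - slope * mean_x


def crossing_point(line_a, line_b):
    """x value where two (slope, intercept) lines meet."""
    (slope_a, intercept_a), (slope_b, intercept_b) = line_a, line_b
    return (intercept_b - intercept_a) / (slope_a - slope_b)


# Hypothetical data: percent black population and each candidate's vote share.
pct_black = [2, 6, 10, 20, 30]
candidate_up = [24, 32, 40, 60, 80]    # share rises with pct_black
candidate_down = [39, 37, 35, 30, 25]  # share falls with pct_black

cross = crossing_point(
    fit_line(pct_black, candidate_up), fit_line(pct_black, candidate_down)
)
print(round(cross, 6))  # 8.0 for this made-up data
```

To the left of the crossing point the second candidate's fitted share is higher; to the right of it the first candidate's is.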

Making matters worse for Sanders is that most of the remaining delegates left to claim are also in states where the African-American percentage of the population is 10% or above. There are 26 states left that have not yet held their delegate selection contests. 27 if you include the District of Columbia. All told, those states and DC represent 1949 delegates left to claim.

Of those 1949 remaining delegates, over two-thirds (1347) of them are in states where African-Americans make up more than 10% of the state's overall population. Unless Sanders can break through and raise his appeal among that segment of the population, it will be all but impossible for him to overtake Biden's growing delegate lead.

One place where Sanders has done quite well up to this point has been in caucus states, just as he had done in 2016. Caucuses, as opposed to primaries, tend to have lower levels of participation and that typically benefits candidates who have very dedicated followers. So far in 2020 Sanders has, on average, earned 9% more of the vote in caucus states than he did in primary states. But even here, he has a problem. Of the remaining states left to hold their contests, only one of them is a caucus state, Wyoming, and there just aren't many delegates at stake there.

He has also tended to do better in open primaries where participation is open to anyone regardless of their party registration. All other things being equal, when participation is limited to only those who are registered as Democrats as it is in a closed primary, Sanders averages 9% less of the vote compared to primaries where anyone is allowed to participate.

Here, once again, Sanders has a problem. Of the 26 remaining states yet to hold their delegate selection contests, only 5 are open primaries. The rest limit participation based on party registration, and he has fared less well in those.

Finally, he also did much better when there were more candidates in the running. He garnered more votes and delegates when he was up against a larger, more divided field. Now that the list of contenders has winnowed its way down to him and Biden (again, sorry Tulsi) it has meant he's had a tougher time growing his vote share as more and more Democratic voters have coalesced around the former Vice President.

What this all translates into is that the remaining contests present a rather daunting gauntlet for Sanders to run. Most of the contests that presented favorable conditions for him have already passed, leaving him not much to work with. Biden's growing lead is quickly becoming insurmountable. As the figure below suggests, the South Carolina primary on February 29th was a key turning point early in the race, and since then Biden has begun to amass a growing delegate lead over Sanders.

When putting together all of these important state characteristics into an explanation of the results of the contests so far, and then using that explanation to try and predict what will happen in the remaining contests, it does not present a very rosy picture for Bernie Sanders. The figure above shows the projection of how the race will turn out without some substantial shift in the dynamics of the campaign.

This figure shows how Biden's and Sanders' delegate totals are expected to grow throughout the remainder of the nomination season. The most notable takeaway from this is that the gap between Biden and Sanders never closes. It just keeps growing. This clearly suggests that Biden has the momentum and unless something significant happens, Sanders will not be able to stop him.

These projections also suggest that Biden will win a majority of the pledged delegates much earlier than Hillary Clinton did in 2016. Four years ago, Clinton was not able to lock up a majority of pledged delegates until the California primary took place on June 7. This, by the way, was the exact date my projections four years ago predicted using this same forecasting method. When all was said and done, the model missed Clinton's (and, consequently, Sanders') final delegate total by just 16 delegates.

This time around, this model suggests that Biden will reach the threshold of a majority of pledged delegates (1991) on May 2nd. After big projected wins in the delegate-rich states of New York and Pennsylvania on April 28, the model suggests that he will be just shy of the threshold. It predicts that he will win the Kansas Primary on May 2, and that will put him over the top to become the presumptive nominee well before the Convention in July.

Even accounting for the model's uncertainty doesn't really help Sanders. When I ran 100,000 simulations of the remaining contests using this model's estimates and factoring in its level of uncertainty, in none of those instances did Sanders' delegate count get over 1388, still well short of the 1991 he would need. On the other hand, running the same routine for Biden yielded majority outcomes in every single simulation. Simply put, unless something changes, this model suggests that there is a less than 1 in 100,000 chance that Sanders will be able to overcome Biden's lead in the remaining contests, almost assuring that the former Vice President will secure the nomination.

Of course, the dose of humility that 2016 fed to most of us election forecasters does give me some pause. There are several contests, and weeks, to go. There's always the possibility that something unforeseen could occur and fundamentally alter the trajectory of these projections. But the window of opportunity for something like that to happen is quickly closing for Sanders. Without something big happening in the next couple of weeks, it doesn't seem like there's much of a chance for Sanders to recover.

But if it does, I'll be there looking at the numbers, because that's what a nerd does.

Thursday, March 12, 2020

Long-Range 2020 Election Forecast - March Update

A little over five months ago, I posted a forecast of what would happen in the 2020 presidential election based on a long-range election forecast model that I developed back in 2015. That forecast suggested a strong possibility of Democratic victory in 2020, but it really depended upon who won the nomination.

That forecast indicated that Biden, Sanders, and Warren all stood the best chance of defeating Trump in November, but that Biden was best positioned. It was much less bullish on Harris and Buttigieg. Klobuchar and Bloomberg weren't even on the radar yet.

A lot has happened since November 2019, as often happens during the nomination phase of a Presidential Election cycle. The Democratic field has winnowed down to essentially two candidates, Joe Biden and Bernie Sanders. (Sorry, Tulsi.) So, there are really two questions to ask at this point:

Has anything changed since November?
Does it matter which candidate the Democrats nominate?

The answer to the first question is simple: Not much. The only thing that has really changed is that it's a lot easier for me to generate the forecasts because now there are only two candidates for which I need to generate them. Beyond that, the forecast hasn't shifted much since November. So the answer to the second question also seems to be "Not much."

Both Biden and Sanders are projected to win the popular vote with ease under this update. The only real shift is that the model is slightly more bullish on their chances at winning the Electoral College as well.

While the projected Electoral College vote totals are slightly higher for both Biden and Sanders than they were 5 months ago, the 95% confidence intervals still leave open the possibility for Trump to win reelection. Simply put, there's more than an insignificant chance that Donald Trump could win a second term. The interesting thing about that is that the model suggests that pretty much the only path to victory for Trump is through a repeat of 2016: an Electoral College misfire where the Democrat wins the popular vote but Trump wins at least 270 Electoral Votes and claims victory.

Using the model's point estimates for each of the state-level outcomes, and factoring in its level of prediction error, I ran a simulation of 100,000 elections for each potential pairing to get a range of possible outcomes. The breakdown of those 100,000 simulated elections is shown in the bottom half of the table above. In addition, the chart below presents how often each possible outcome appeared.

Out of the 100,000 simulated elections between Joe Biden and Donald Trump, Biden won both the popular vote and Electoral College vote 91.9% of the time. In none of those elections did the same happen for Trump. In every single one of the simulated elections where Trump defeated Biden in the Electoral College, he lost the popular vote.

The outcomes were similar in the matchup between Bernie Sanders and Donald Trump, although there was a slightly lower occurrence of an outright Sanders victory (90.4%) and a slightly higher occurrence of an Electoral College misfire benefiting Trump (8.4%). Regardless of which candidate the Democrats nominate, this model suggests that there is a very slight chance (less than 1%) of an Electoral College 269-269 tie, necessitating a vote by the U.S. House of Representatives to determine the winner.
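The categories in that breakdown (outright win, Electoral College misfire, tie) can be tallied with a small classifier over the simulated results. This is a sketch with hypothetical labels and four invented simulations; it assumes 538 total Electoral Votes and treats a popular vote share of exactly 50% as not winning.

```python
from collections import Counter


def classify(dem_pop_share, dem_ev, total_ev=538):
    """Label one simulated election from the Democrat's two-party share (%)
    and the Democrat's Electoral Vote total."""
    rep_ev = total_ev - dem_ev
    if dem_ev == rep_ev:
        return "electoral_tie"
    dem_wins_ec = dem_ev > rep_ev
    dem_wins_pop = dem_pop_share > 50.0
    if dem_wins_ec and dem_wins_pop:
        return "democrat_wins_both"
    if not dem_wins_ec and not dem_wins_pop:
        return "republican_wins_both"
    # The popular vote winner and Electoral College winner differ.
    return "misfire_for_republican" if dem_wins_pop else "misfire_for_democrat"


# Hypothetical simulated results: (Democratic share %, Democratic EVs).
toy_sims = [(52.0, 300), (51.0, 260), (51.5, 269), (49.0, 250)]
tally = Counter(classify(share, ev) for share, ev in toy_sims)
print(tally)
```

Dividing each count by the number of simulations gives percentages like the ones reported above.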

For comparison purposes, when I used this model in March of 2016 and generated a similar distribution of possible outcomes, the model projected an Electoral College victory for Donald Trump 66.3% of the time. Simply put, the model saw a Trump victory in 2016 as much more likely than a repeat performance in 2020.

We'll just have to wait another eight months to see how well this projection holds up. A lot can happen between now and then, and that could alter the context and trajectory of this race. I will be posting periodic updates to the forecast in that time.

So, stay tuned. In the meantime, I'll keep looking at the numbers, because that's what a nerd does.