The Political Data Nerd: 2019

A little over four years ago in October of 2015, I got up in front of a bunch of political scientists at a small conference in Iowa, the Iowa Conference on Presidential Politics, and suggested that there was a very strong possibility that Donald Trump could win the 2016 Presidential election. I showed the results of a new long-range presidential election forecast model that I had developed to generate a prediction a year in advance of the election.

It should perhaps come as no surprise that the prediction was met with a great deal of incredulity.

"Really? Donald Trump?! No way!"

I admit I was a bit skeptical myself. The brash, unpolished reality TV host with no experience? It did seem a bit incredible. But it's what the numbers showed, and for as long as I've been doing election forecasting I've always just gone with what the numbers showed. Right or wrong, the numbers are what they are and all I do is report what they say.

But still, it just didn't seem likely at that point. Of course, looking back now we can clearly see that the skepticism was misplaced.

So, what does this model say about the next election? We are now one year away from the 2020 election, so who does it say will win?

The answer is: It depends.

It depends, in large part, on who the Democrats nominate to run against Donald Trump. But even then, the answer is far from certain. The main challenge of attempting to forecast an election outcome this far in advance of the election is that we don't even really know who the nominees will be at this point. This necessitates a model that can generate a set of conditional forecasts that present different possible match-ups of candidates and can show how the parties' choice of a nominee can affect the outcome.

At this point, the Democrat that the model suggests has the clearest path to victory over Donald Trump next year is Joe Biden, but that victory is not necessarily certain. The model suggests that Bernie Sanders and Elizabeth Warren also have decent chances of defeating the president, but their victories are even less certain. For the remaining two candidates in the apparent "Top Five" of the Democratic field, Kamala Harris and Pete Buttigieg, the forecast is even murkier, with the model suggesting that it is more likely than not that they would meet the same fate as Hillary Clinton did in 2016: winning the popular vote, but failing to win in the Electoral College.

The forecasts are given in the table below:

As you can see, all of the top five Democratic candidates are projected by the model to win the national popular vote, but only Biden, Sanders, and Warren have projected wins that hold across the entire 95% confidence interval. The possibility that Buttigieg and Harris would lose the popular vote to Trump cannot be discounted. On the other hand, if we were to broaden the confidence interval to 99% confidence, Biden, Sanders and Warren all still have projections that are above the magic 50% threshold. Simply put, if one of those three ends up winning the nomination, the model suggests it is highly likely that he or she would win the national popular vote against Donald Trump.

Of course, we learned in 2016 that the national popular vote isn't what matters most in U.S. Presidential elections. Instead, it's the Electoral College vote that ultimately determines the winner. It's about putting together the right combination of states in order to get the required 270 Electoral College votes to win. As you can see, the model projects that the three candidates it suggests will win the popular vote will also win the Electoral College vote. For Buttigieg and Harris, the model suggests that a repeat of 2016, where one candidate wins the popular vote but loses the Electoral College, is likely should either one of them become the nominee. In fact, if you look in the bottom two rows of the table above, you can see that the model suggests that an Electoral College misfire favoring Trump is more likely than not for both of them.

However, even though Biden, Sanders and Warren have Electoral College projections that are above the required 270, the model is far less certain about their Electoral College victories than it is about their popular vote wins. For all three, there is a distinct possibility of an Electoral College misfire, just to varying degrees. Biden fares the best in this situation. The model projects a clear Electoral College majority for him, while leaving an 11.8% chance that he could fall victim to the same fate Hillary Clinton met in 2016.

Sanders and Warren fare less well in that regard. Again, while the model projects that both would win an Electoral College majority in 2020, the possibility that they would fail to do so cannot be discounted. For Sanders, the likelihood that he would win the popular vote but lose in the Electoral College is a little greater than one in four (27%). For Warren, it's a little greater than one in three (34.1%).

So, you might be asking, where do these numbers come from and, more importantly, can we trust them? Both very fair questions, and I'll try to answer them as best and as simply as I can.

Let's get nerdy.

The Model

These forecasts come from a very simple model, which I have presented at a few different conferences over the past four years, most recently at the American Political Science Association conference in Washington, DC this past summer. I talked a little bit about this model in an earlier post, but you can actually download the paper and read it yourself here.

It generates predicted outcomes for each of the 50 states and the District of Columbia based on four variables: the state's result in the previous election, the match-up between the various candidates 13 months in advance of the election, a specification of home state advantage based on where the candidates come from, and a regime age variable that captures the notion that it is more difficult for a party to hold on to the White House the longer it is there. It is this last variable that was key in generating a prediction favorable to Donald Trump in 2015.

There is a very strong correlation from one election to the next in how the parties fare in each of the fifty states, as shown by the figure below:

There is one weird outlying case there on the left side of the chart. That's Utah. Trump fared much less well in Utah compared to Romney's performance there in 2012. When you think about that, it's pretty easy to understand why. Romney is a member of the Church of Jesus Christ of Latter-day Saints, and his identification as a Mormon was a key topic of discussion during the campaign of 2012. And Utah just happens to have the highest concentration of LDS population in the country. Fast-forward to 2016, and there was widespread disenchantment in Utah about Trump as the Republican nominee. So much so that a local, Evan McMullin, declared his candidacy and managed to win 21% of the vote, significantly eating into the vote margin that a Republican candidate would likely win in Utah. If you take that one outlier out of the analysis, the relationship between 2012 and 2016 state-level outcomes becomes even stronger.

This strong correlation is present going back for the past seven elections and serves as a solid foundation for the rest of the model. The remaining variables are intended to capture elements about the current election that make it different from the previous one: who the candidates are and how they match up against each other, which states those candidates are from (since a home-state advantage matters in some states more than in others), and how long a party has occupied the White House.
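For readers who like to see it written down, here is one way a model with those four variables could be expressed. This is only a sketch: it assumes a simple additive linear form, and every symbol in it is a label chosen here for illustration rather than notation taken from the paper.

```latex
% Assumed additive linear form; all symbols are illustrative labels.
% \hat{V}_{s,t} : predicted vote share in state s for election t
% V_{s,t-1}     : the state's result in the previous election
% M_t           : candidate match-up measured 13 months before the election
% H_{s,t}       : home-state advantage indicator for state s
% A_t           : regime age (how long the party has held the White House)
\hat{V}_{s,t} = \beta_0 + \beta_1 V_{s,t-1} + \beta_2 M_t + \beta_3 H_{s,t} + \beta_4 A_t
```

Under that assumption, the strong election-to-election correlation described above would show up as a large coefficient on the previous result, with the other three terms shifting the prediction up or down for the current election.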

Using those four variables, I can generate a prediction of the result for each of the 50 states and the District of Columbia. From there I can extrapolate the national-level outcomes. For the national popular vote, I calculate the weighted sum of the predicted state outcomes based on each state's contribution to the overall total. For the Electoral College, it is based on whether the model predicts a candidate will win the state or not.
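As a rough sketch of that aggregation step (not the actual code behind the forecast), the Python below takes a set of predicted state vote shares and rolls them up into a national popular vote share and an Electoral College total. The three states, their shares, their vote weights and their electoral votes are made-up placeholders, and the function names are hypothetical. It also assumes every state is winner-take-all, which ignores the district-level allocation used by Maine and Nebraska.

```python
# Toy illustration only: the states, shares, weights and electoral votes
# below are made-up placeholders, not the model's inputs or outputs.

def national_popular_vote(pred_share, vote_weight):
    """Weighted average of predicted state shares (in percent).

    vote_weight holds each state's contribution to the national vote total.
    """
    total_weight = sum(vote_weight.values())
    weighted = sum(pred_share[s] * vote_weight[s] for s in pred_share)
    return weighted / total_weight


def electoral_votes(pred_share, ev, threshold=50.0):
    """Sum the electoral votes of states where the predicted share tops 50%.

    Assumes winner-take-all in every state (a simplification).
    """
    return sum(ev[s] for s in pred_share if pred_share[s] > threshold)


# Hypothetical three-state example.
pred_share = {"A": 55.0, "B": 48.0, "C": 51.0}   # predicted share, percent
vote_weight = {"A": 0.5, "B": 0.3, "C": 0.2}     # share of national vote
ev = {"A": 10, "B": 6, "C": 4}                   # electoral votes

popular = national_popular_vote(pred_share, vote_weight)
college = electoral_votes(pred_share, ev)
print(f"popular vote share: {popular:.1f}%, electoral votes: {college}")
```

In this toy case the candidate carries states A and C but not B, so the popular vote share and the electoral vote count are computed from the same state predictions in two different ways, which is exactly why the two can point in different directions.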

I incorporate the uncertainty into the model by constructing confidence intervals around the state-level predictions based on the model's standard error of the estimate. Simply put, this is roughly a measure of how much the model "mispredicts" the actual state-level outcomes, on average. Currently, based on the past seven elections' worth of data, that average misprediction is about 3.2%.
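To make "standard error of the estimate" a little more concrete, here is a minimal sketch using the textbook formula for a regression: the square root of the summed squared prediction errors divided by the degrees of freedom. The numbers are invented for illustration, and the function name and the choice of a single predictor in the example are assumptions, not details from the paper.

```python
import math


def standard_error_of_estimate(actual, predicted, n_predictors):
    """Textbook standard error of the estimate for a regression.

    sqrt(sum of squared residuals / (n - k - 1)), where n is the number
    of observations and k the number of predictors.
    """
    residuals = [a - p for a, p in zip(actual, predicted)]
    sse = sum(r * r for r in residuals)
    dof = len(actual) - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than parameters")
    return math.sqrt(sse / dof)


# Invented example values (percent vote share), not real election data.
actual = [52.0, 47.0, 55.0, 44.0, 50.0, 49.0]
predicted = [50.0, 48.0, 53.0, 46.0, 51.0, 50.0]

see = standard_error_of_estimate(actual, predicted, n_predictors=1)
print(f"standard error of the estimate: {see:.2f}")
```

The result is in the same units as the vote shares, so it can be read as a typical size for how far a state-level prediction lands from the actual outcome.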

Factoring that uncertainty into the prediction, I have the computer run a series of hypothetical elections, 100,000 of them to be precise, and I plot the outcomes. That's where the confidence intervals come from. So, to put the numbers in the table above into perspective, it's essentially saying that of the 100,000 hypothetical elections I had the computer run between Joe Biden and Donald Trump, 11.8% of them had outcomes where Joe Biden won the popular vote, but lost in the Electoral College.
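The hypothetical-election idea can be sketched in a few lines of Python. This is an illustration of the general technique, not the actual simulation: it assumes each state's error is an independent draw from a normal distribution with a standard deviation of 3.2 (the average misprediction mentioned above), it treats every state as winner-take-all, and the three states and all their numbers are made up. The function name and its parameters are hypothetical.

```python
import random


def simulate(pred_share, vote_weight, ev, sd=3.2, n_sims=100_000, seed=0):
    """Run n_sims hypothetical elections and tally three kinds of outcome.

    Assumes independent, normally distributed state-level errors, which
    is a simplifying assumption made for this sketch.
    """
    rng = random.Random(seed)
    majority = sum(ev.values()) // 2 + 1      # 270 when the total is 538
    total_weight = sum(vote_weight.values())
    counts = {"win_both": 0, "misfire": 0, "lose_popular": 0}

    for _ in range(n_sims):
        # Perturb each state's predicted share by a random error.
        draw = {s: rng.gauss(mu, sd) for s, mu in pred_share.items()}
        popular = sum(draw[s] * vote_weight[s] for s in draw) / total_weight
        college = sum(ev[s] for s in draw if draw[s] > 50.0)

        if popular <= 50.0:
            counts["lose_popular"] += 1       # regardless of Electoral College
        elif college >= majority:
            counts["win_both"] += 1
        else:
            counts["misfire"] += 1            # popular win, Electoral College loss

    return {k: v / n_sims for k, v in counts.items()}


# Hypothetical three-state example, not real forecast inputs.
pred_share = {"A": 55.0, "B": 48.0, "C": 51.0}
vote_weight = {"A": 0.5, "B": 0.3, "C": 0.2}
ev = {"A": 10, "B": 6, "C": 4}

result = simulate(pred_share, vote_weight, ev)
for outcome, fraction in result.items():
    print(f"{outcome}: {fraction:.1%}")
```

The "misfire" fraction here plays the same role as the 11.8% figure for Biden: it is simply the share of simulated elections in which the candidate wins the popular vote but falls short of an Electoral College majority.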

It's not shown in the table, but in three of those 100,000 hypothetical elections (0.003%) the President defeats Biden in both the popular vote AND the Electoral College. So, it's not impossible that Trump could defeat Biden outright, but the model suggests that it's highly unlikely.

Granted, we've got 12 long months to go until we get there and many things could happen between now and then. Ultimately, the main takeaway from this forecast is that the outcome is far from certain. So we'll have to do the only thing we can do... wait and see how this plays out.

And when it does, I'll be there, looking at the numbers, because that's what we nerds do.

The Political Data Nerd

Tuesday, November 5, 2019

Who Will Win in 2020? It Depends - The Long-Range State-Level Forecast