Thursday, November 10, 2016

On the (Dis)Comfort of Numbers

I love numbers. I guess that's obvious given the title I chose for this blog.

Politics is a subject laden with subjectivity. People have their biases, and when they engage with the world they view it through the lens of those biases.  Political discussions, especially online, often devolve into over-simplified presentations of the opposing point of view, dismissed as simply the product of blind bias.  That latter part may be true most of the time, but it isn't always.

But that's why, as a Political Scientist, I take comfort in numbers. It's why my Political Analysis course is my favorite course to teach. It's why I started this blog: to try and explain to people who don't work with numbers what they can tell us about the political world.

Numbers are solid. Numbers are certain. Numbers can tell you things you would not otherwise know. Numbers can provide a sense of confidence in a position, or course of action.

Unless, of course, the numbers are flawed.

And that certainly seems to be the case with the polls leading up to the election on Tuesday.

Yes, the polls got it wrong.

And as a consequence, election forecast models, like ours, that relied upon the polls got it wrong.

As it turns out, national polls carry very little weight in our model, but state polls are quite important. - TPDN
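
To make that concrete, here is a minimal, purely illustrative sketch of how state poll averages feed a state-by-state Electoral College tally. The states, margins, and everything else below are made up for demonstration; this is not our actual model.

```python
# Purely illustrative: hypothetical state poll margins (Democrat minus Republican,
# in points) paired with each state's electoral votes. Numbers are invented.
state_polls = {
    "Florida": (0.8, 29),
    "Pennsylvania": (2.1, 20),
    "Wisconsin": (3.5, 10),
    "Ohio": (-1.2, 18),
}

dem_ev = 0
rep_ev = 0
for state, (margin, electoral_votes) in state_polls.items():
    # Credit the state's electors to whichever candidate leads its poll average.
    if margin > 0:
        dem_ev += electoral_votes
    else:
        rep_ev += electoral_votes

print(f"Democratic electoral votes from these states: {dem_ev}")
print(f"Republican electoral votes from these states: {rep_ev}")
```

Which is why a systematic error in the state polls pushes the whole forecast in the same direction.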


I'll leave it up to those inside the polling industry to figure out how they got it wrong and why, and I have confidence that they will.  The allegations I saw hurled in the run-up to the election, that pollsters were biased and deliberately cooking the numbers to show Clinton with more support than she really had, are, simply put, ill-informed.

But it would also be wrong to say that those of us who work with numbers are unbiased.

We have biases, but they're not necessarily the ones that people accuse us of.

I've been forecasting elections for many years now. Ours was actually one of the first to do what many of the most well-known models do now: generate a state-by-state Electoral College prediction. We first used it to generate a forecast for the 2000 election.  Like this one, that was an election that most of us forecasters got wrong.

But that was a different time. Election forecasting was a purely academic exercise within a relatively small corner of Political Science. Even within the discipline it was viewed by many as not really "science." I had colleagues question whether the publication in which we first presented our forecast model should even be considered "research."

My response then, as it still is today, was that it most definitely is scientific research. It is an attempt to put the principles we've learned from decades of empirical Political Science research to work in an applied way. We're testing the explanations of voter behavior that we've gotten from that research to see if we can then predict what voters will do. Explanation and prediction are at the core of what science is all about.

So, getting the 2000 forecast wrong was a learning experience. Personally, it was humbling. I'd gone out on a limb: I'd publicly said what I thought was going to happen, and it didn't.

But it happened in a relatively small space. I announced our forecast at a small gathering of students and faculty at the small and relatively unknown regional state university where I worked at the time. It was a relatively low-risk move. Humbling, but not very embarrassing.

Tom Holbrook and I saw it as an opportunity to go back to the drawing board, as did all of the other forecasters who got 2000 wrong.  It offered up more data for us to make our models better, and we did.  We took that data, revised the model, and used it to generate an extremely accurate forecast in 2004, missing the national popular vote by less than half a percentage point.  Even so, there were issues: We got three states wrong. So we went back to the drawing board again to make it better.

We got the next two elections right. In 2008, we under-predicted Obama's level of support somewhat, and we got two states wrong: Indiana and Virginia.  That was a reasonable amount of error, we figured, but we took the data, updated the model, and moved forward.  In 2012, we were near perfect, getting all 50 states correct in the preliminary forecast based on September data, but getting Florida wrong in the final Election Eve forecast. (We felt somewhat vindicated by the fact that it took Florida a few days to finally decide.) We missed the popular vote by just over 1% in the September model, but by just under that with our final projection.

Even so, we made a minor adjustment (which, as a side note, wasn't the reason we got it wrong this time; in fact, we would probably have been off by even more had we not made it), and went into 2016 with a great deal of confidence. In our tests of how the model would have performed in previous elections, it was our best yet.

That confidence, of course, proved to be unfounded.

We didn't just get it wrong. We got it wrong by a lot.

We over-predicted Clinton's share of the popular vote by 2% and got five states wrong in our September forecast. We were even worse with our Election Eve forecast, getting six states wrong and missing the popular vote by an even wider margin. When the final analysis comes down, I won't be surprised if ours ends up being the worst performing.

We were right back to where we were in 2000.  But, of course, things are different now than they were back then.  Election forecasting is no longer a quaint cottage industry in Political Science. Thanks to folks like Nate Silver popularizing it, it's now a much more public "game" and we've seen an explosion in the number of models all trying to predict the outcome.

Compared to 538, the New York Times Upshot, The Daily Kos, The Princeton Election Consortium, and the Huffington Post, Tom and I are relatively unknown outside the little election forecasting community in Political Science. Even so, our miss was more public this time than in 2000. There's this thing called the internet that is much more ubiquitous now than it was in 2000, and I worked to try and get our model noticed. (In retrospect, maybe I should have been quieter.) But even so, we've not experienced anywhere near the level of vitriol and ridicule that others have seen. Natalie Jackson, the Political Science Ph.D. at the Huffington Post, has had to bear a good deal of it, and it just sickens me.

A Tweet to Natalie Jackson, forecaster at Huffington Post... WTH?

For the first time since 2000, I made a public presentation of our forecast on the day that it came out. It was to a somewhat larger group of students and faculty than the one I had in 2000, and the local press was there as well. I was more publicly on record with this prediction than with any previous election.

The numbers failed us, and the failure was more public this time.

Our model relies very heavily on polling data, and the polls were quite a bit off this time. As I stated before, I'll leave the polling post-mortem to the pollsters. I've done polling, but I don't consider myself a pollster by any stretch of the imagination. So I'll let the people who do it figure it out, because they know better than I do what went wrong and how to fix it.

Tom and I will adjust, like we always do. Science moves on, and failures are an opportunity to learn and improve. I've already got some ideas for how we can make the model better because, as it turns out, I think the polling error we saw might actually be predictable, for reasons I'll save for another post. I look forward to diving into the data and figuring it out.

But for me, right now, this failure is more personal. I had friends and colleagues who were, and are, incredibly worried about the outcome. They trusted me to tell them what was going to happen; to assure them that what was, to them, "unthinkable" wasn't going to happen.

My bias in this is, and always has been, in getting the forecast right.  It's not ideological. I don't push a forecast because it predicts what I want it to predict. I push it because I have confidence in it.  The numbers showed a Clinton win, and they showed it pretty convincingly.  The reports are now that even those inside the Trump campaign were preparing for a loss because their internal numbers showed them the same thing.

But our numbers were more than just numbers to a lot of people.  They provided certainty and comfort. All of a sudden, I became much more than a scientist working with numbers. I'd become a counselor, listening to people's fears and anxieties, and attempting to comfort them with the numbers I'd come to trust.

It was a role that I was not entirely comfortable with.  Not because I didn't have confidence in the numbers, but because... WHAT IF we were wrong? I was going to let a lot of people down. They were trusting me, but I knew I couldn't control the outcome. I could only tell them what the numbers said.

My discomfort reached a high point when Nate Silver's win probability started to diverge significantly from the rest of the pack a week or two before the election.  Perhaps unbeknownst to most outside the forecasting community and those who follow it closely, a rigorous debate ensued about methodological issues, and what the win probability represents, and what we should actually infer from it.
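
For readers outside that community, the crux of the debate is easier to see with a toy example. Here is a purely illustrative sketch, with made-up margins and electoral vote counts that belong to no one's published model, of how one assumption in particular, namely how much polling error is shared across states, moves a win probability around:

```python
import random

# Purely illustrative: how the assumed amount of *shared* polling error changes
# a win probability. The state margins (Dem minus Rep, in points) and electoral
# vote counts below are made up for demonstration.
swing_states = {
    "FL": (0.8, 29), "PA": (2.1, 20), "WI": (3.5, 10),
    "MI": (3.0, 16), "NC": (-0.5, 15), "OH": (-1.2, 18),
}
SAFE_DEM_EV = 217   # hypothetical electoral votes assumed safe for the Democrat
EV_TO_WIN = 270

def win_probability(shared_error_sd, state_error_sd, sims=20_000):
    wins = 0
    for _ in range(sims):
        shared_miss = random.gauss(0, shared_error_sd)   # one error hitting every state
        dem_ev = SAFE_DEM_EV
        for margin, ev in swing_states.values():
            simulated_margin = margin + shared_miss + random.gauss(0, state_error_sd)
            if simulated_margin > 0:
                dem_ev += ev
        if dem_ev >= EV_TO_WIN:
            wins += 1
    return wins / sims

# Mostly independent state errors: the misses wash out and the probability is high.
print(win_probability(shared_error_sd=0.5, state_error_sd=3.0))
# A large shared error: every state can miss in the same direction, so much more doubt.
print(win_probability(shared_error_sd=3.0, state_error_sd=3.0))
```

As I understood it, much of Silver's argument came down to assumptions like that one: how big the shared miss could be, and how heavy its tails.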

For the most part, I stayed out of the fray.  I had our numbers and I had confidence in them. And I really couldn't deny the methodological soundness of some of the arguments that Silver was making. But by that point, I was set in my role as Election Therapist, and I knew that if I openly acknowledged the growing uncertainty, it was going to upset a lot of the people who had been relying upon me for comfort. I took on their anxiety for them.

That was really stupid.

I'd let biases get in the way of scientific objectivity. Still, though, the bias wasn't ideological. It wasn't because I wanted Clinton to win and, therefore, wanted to push that narrative. A good part of the bias was simply because I knew friends and family were going to be upset if she didn't. They were worried, and I didn't want them to worry more.

But beyond that, and enabling it, was the bigger bias: the bias toward the numbers and the confidence I had in them. I had a sound, empirical reason to believe our numbers were right: they'd been right before, and they were similar to the numbers many others had as well. I even privately consulted with other forecasters, asking how they felt about it.  They shared my concern that the projections might be a bit too bold. But there, in the end, was validation.

I took comfort in the herd.

Of course, now we know that comfort was ill-placed, and hindsight is 20/20.  But now that the dust has settled, I can see that there were signs I missed simply because I took comfort in the numbers of the herd.

You see, Tom's and my forecast isn't the only one I do. A year ago, I developed another one on my own. It's partly based on the one we use, but it was an attempt on my part to push the envelope as far as the forecast lead-time was concerned.  One of the drawbacks to our model is that it has a comparatively short lead-time: because of the limitations on the availability of the data for the variables we use, ours comes out just one month before the election.

To the consumers of popular election forecasting that's probably not that big of a deal, but to those within the academic forecasting community it is.  Most models in the political science and forecasting literature come out two or three months in advance, others even earlier.  So, in that regard, ours comes pretty late.  What I was really interested in was testing how far I could push it, so I started putting a model together in the Fall of 2015 and called it my "Long-Range" model.  It was pretty simple, but it was unique in that it did something no one else did: generate a state-by-state Electoral College and Popular Vote forecast a year before the election, months before we even knew who the nominees were. If you're interested, you can look at it here: State Electoral Histories, Regime Age, and Long-Range Presidential Election Forecasts: Predicting the 2016 Presidential Election

I presented the first prediction from it, which I updated monthly after that, at a small conference at the end of October: The Iowa Conference of Presidential Politics.  The conference was so small that my presentation was made to an "audience" of five people, all sitting around a study table in the Library Reference Room at Dordt College in Sioux Center, Iowa.  The audience was made up mostly of other academics who were also there to present papers, none of them forecasting papers.

That first prediction showed a strong likelihood that the Republican candidate would win. Given that the nominations hadn't happened yet, I had to generate a matchup matrix of possible candidates.  Here is that first matrix:

[Matchup matrix of possible general-election pairings and their win probabilities]

You can see that it indicated potential trouble for the Democrats, no matter who they nominated. When you look at the matchup between the two eventual nominees, it showed a very close race, but one that Donald Trump had a 75% chance of winning.

The comments I got from the discussant and the rest of the audience were incredulous:  "Donald Trump?  Really? How could that be? He doesn't even have any experience," they said.  Similarly, I had colleagues who scoffed at the finding that Ben Carson had the best chance of winning.  Mind you, this was not projecting who would win the nomination, just who would win the general election if they did win the nomination. I was equally incredulous, but like many, I still figured it would likely be Bush or Rubio in the end.

Nonetheless, I continued to generate my monthly forecasts, whittling down the matrix as candidates dropped out.  The final forecast that I presented at the American Political Science Association Conference in September gave Trump a 93.8% probability of winning.  It, too, was met with smirks.

Why? Because at that point the consensus had pretty much built up among forecasters that Clinton was going to win. Even those who had models that predicted a Trump victory hedged a bit, some offering explanations of why they were probably going to be wrong.  

I followed the comfort of the herd.

I had a strong methodological reason to do so. There is strong research showing that the herd is usually right.  If my model was wildly out of step with the herd, which it certainly was, then I was probably the one who was wrong, I figured.
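
That research is easy to appreciate with a toy example. Here is a purely illustrative sketch, with made-up numbers, of why averaging several independent, noisy forecasts usually beats relying on any single one:

```python
import random
import statistics

# Purely illustrative: averaging several noisy, *independent* forecasts tends
# to land closer to the truth than a typical single forecast. Numbers are made up.
random.seed(1)
TRUE_MARGIN = 2.0            # hypothetical true national margin, in points
N_FORECASTERS, N_TRIALS = 10, 5_000

single_errors, herd_errors = [], []
for _ in range(N_TRIALS):
    forecasts = [TRUE_MARGIN + random.gauss(0, 2.5) for _ in range(N_FORECASTERS)]
    single_errors.append(abs(forecasts[0] - TRUE_MARGIN))              # one forecaster
    herd_errors.append(abs(statistics.mean(forecasts) - TRUE_MARGIN))  # herd average

print(f"Typical error, single forecast: {statistics.mean(single_errors):.2f} points")
print(f"Typical error, herd average:    {statistics.mean(herd_errors):.2f} points")
```

The catch is the word independent: when every forecast leans on the same flawed polls, the herd misses together.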

So when the end of September came, I set aside the Long-Range Forecast and focused on the tried and true one that I had been using with Tom Holbrook for years, and on October 3 I announced to the University community and to the local press that Hillary Clinton had a 90% probability of winning the election.  I was comfortable with that.  I was well within the herd, and my Democratic friends were relieved that I'd stopped saying that Trump was going to win.  Some of my Republican friends weren't happy, but they didn't really challenge it, because I think many of them believed he wouldn't win either.  I think many had resigned themselves to the idea that Clinton would win, and outcome expectation polls, which have been shown to be highly predictive of the outcome in the past, confirmed that belief. Furthermore, let's face it, I live in Utah. This isn't really what one would consider "Trump Country." Most of them didn't want Trump to be President either.

But of course, the numbers, the expectations, and the herd proved to be wrong. The forecasters who got it right, notably Helmut Norpoth and Allan Lichtman, preserved their decades-long streaks of getting it right with their models. I'd been asked about them in the weeks leading up to the election. I dismissed them as outliers.  Their streaks were about to be broken, I said, because the data clearly pointed in another direction. If I'd had more faith in myself, and in the numbers from my Long-Range model, I would be writing a very different post here.

But I take this now, as should all who rely on data, as a simple but important reminder:

Sometimes the data, and the herd, can be wrong.  

Nate Silver was right. He got his forecast wrong like most of us did, but he was absolutely right to point out that there was a lot more uncertainty in the numbers than many of us were saying. It's something that those of us who work with data need to remember: the result is only as good as the data that goes into it. It is incumbent upon each of us who do this to hold onto a healthy respect for that fact as we go back to the drawing board, figure out what went wrong, fix it, and try again.

Just like we always do, because that's how science works. And leave the therapy work to therapists.

2 comments:

  1. Norpoth and Lichtmann didn't really get it right, strictly speaking. They both forecast Trump winning the popular vote, which he probably did not.

    1. The margin is close enough that once you put a confidence interval around his estimate he'll be "correct." And once 2016 data is incorporated into his model, I suspect in-sample projections may also show a "correct" result.
