I'll be running a live model of the final outcome in each statewide race, based on partial vote counts as they come in. Doing so for this year's election could be very interesting given the high volume of mail-in voting during the pandemic. I am hoping that on election night the models can give us better insight than the raw vote counts into what the reported results say about the final outcome. I'm publishing the model results both via the interactive web app below and on my Twitch stream.
Each model looks at the votes being reported alongside the political geography of each state to provide its best estimate of the final outcome. The model is similar in nature to the New York Times Election Needle, but my approach is a bit different. Instead of having a single model with a confidence interval around the final outcome, I take the spaghetti-model approach, similar to how a hurricane forecast works. Four models will be running, each using separate assumptions and looking at the data in a different way, and no one model is fundamentally more correct than another. I strongly caution against reading too much into what the models are guessing until different parts of the state are reporting. Each model has its strengths and weaknesses, but eventually they should converge and stabilize as more votes come in. When there are sufficient samples across the different political geographies of a state and the models converge, we can become more confident of the final outcome.
Live forecasts of the 2020 final election outcome, with commentary, begin at 18:00 EST on November 3, 2020, on Twitch. Until then, enjoy the pre-election forecasts!
In the model, light represents 'likely', moderate represents 'high probability', and dark represents 'called'. All times EST.
The election analysis uses the following terminology:
In the color scheme, light represents 'likely', moderate represents 'high probability', and dark represents 'called'.
For reference, this map shows the pre-election ratings in the Electoral College.
In terms of methodology, this model essentially guesses the final results of the election based on the geographic distribution of the results reported so far and the historical partisan lean of each region. Because it relies on historical partisan lean, it only works for elections with exactly two major candidates: one Democrat and one Republican. The model is fundamentally biased toward historical results, so expect a margin of error of about a point or so. The margin of error is typically higher when:
An example of a substantially different turnout pattern that throws the model off is the Texas Senate election in 2018, when Beto O'Rourke's candidacy caused a surge in Democratic turnout, while Texas also reported a lot of early votes that favored O'Rourke.
These models are currently being run for statewide elections in all states with the exception of Alaska. The following map shows how accurately we expect the models to perform:
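To make the methodology above concrete, here is a minimal Python sketch of one way to project a final margin from partial returns and historical partisan lean. It is not the actual model code. It assumes a simple uniform-swing approach: measure how far the reporting counties have moved from their historical margin, then apply that shift to the statewide historical margin. The `County` fields, the `margin` and `project_margin` functions, and the vote numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class County:
    name: str
    hist_dem: int          # Democratic votes in a past reference election
    hist_rep: int          # Republican votes in the same past election
    reported_dem: int = 0  # Democratic votes reported so far tonight
    reported_rep: int = 0  # Republican votes reported so far tonight


def margin(dem: int, rep: int) -> float:
    """Two-party margin (Democratic share minus Republican share)."""
    total = dem + rep
    return (dem - rep) / total if total else 0.0


def project_margin(counties: list[County]) -> float:
    """Project the final statewide margin (assumed uniform-swing method)."""
    hist_margin = margin(sum(c.hist_dem for c in counties),
                         sum(c.hist_rep for c in counties))
    reporting = [c for c in counties if c.reported_dem + c.reported_rep > 0]
    weight = sum(c.hist_dem + c.hist_rep for c in reporting)
    if weight == 0:
        # Nothing usable has been reported: fall back to history.
        return hist_margin
    # Swing = how far reporting counties moved from their own history,
    # weighted by each county's historical two-party turnout.
    swing = sum(
        (c.hist_dem + c.hist_rep)
        * (margin(c.reported_dem, c.reported_rep)
           - margin(c.hist_dem, c.hist_rep))
        for c in reporting
    ) / weight
    # Apply the same swing to the whole state, including silent counties.
    return hist_margin + swing


if __name__ == "__main__":
    # Made-up numbers: only the Democratic-leaning urban county has reported.
    counties = [
        County("Urban", hist_dem=600, hist_rep=400,
               reported_dem=130, reported_rep=70),
        County("Rural", hist_dem=300, hist_rep=700),
    ]
    print(f"raw margin so far: {margin(130, 70):+.2f}")
    print(f"projected margin:  {project_margin(counties):+.2f}")
```

In this made-up case the raw count shows a 30-point Democratic lead because only the urban county has reported, while the projection comes out roughly tied once the historically Republican rural county is accounted for. The sketch also shows the weakness described above: it leans entirely on historical lean and turnout, so an unusual turnout pattern will throw it off.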
This year, I'm running four models:
These models all strike a balance between variance and bias, according to the bias-variance tradeoff that governs statistical modeling. They are designed so that when one model suffers from high variance but has low bias, another suffers from high bias but has low variance, so looking at all of the models together mitigates the problems present in any one model alone. Both type A and B models are guaranteed to converge once all votes have been counted, but type 1 and 2 models are not. In fact, only type 2 models are guaranteed to converge to the final outcome. Additionally, type 2 models are seeded with an initial bias to stabilize the model during initial reports. That bias is derived from pre-election polls, deliberately underestimating the expected winner, which means that most battleground races are seeded with a bias of 0.
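As an illustration of the seeding idea, the Python sketch below shows one way a poll-derived seed could stabilize early estimates and then fade out as votes arrive. This is an assumption for illustration only, not the actual model: the `discount` value, the linear blending rule, and the names `seed_bias` and `blended_estimate` are all hypothetical.

```python
def seed_bias(poll_margin: float, discount: float = 0.05) -> float:
    """Shrink a pre-election poll margin toward zero.

    The hypothetical `discount` understates the expected winner, so any
    race polling within `discount` of a tie gets a seed of 0.
    """
    magnitude = max(abs(poll_margin) - discount, 0.0)
    return magnitude if poll_margin >= 0 else -magnitude


def blended_estimate(model_margin: float, seed: float,
                     fraction_reported: float) -> float:
    """Blend the seed with the live estimate (assumed linear weighting).

    With nothing reported the result is the seed (high bias, low
    variance); with everything reported it is the live estimate alone.
    """
    w = max(0.0, min(1.0, fraction_reported))  # clamp to [0, 1]
    return (1.0 - w) * seed + w * model_margin


if __name__ == "__main__":
    battleground_seed = seed_bias(0.03)  # polls within 5 points -> seed of 0
    safe_seed = seed_bias(0.15)          # clear favorite -> smaller nonzero seed
    for reported in (0.0, 0.1, 0.5, 1.0):
        estimate = blended_estimate(0.20, safe_seed, reported)
        print(f"{reported:4.0%} reported: {estimate:+.3f}")
```

The seed deliberately trades some bias for lower variance while only a sliver of the vote has reported, and it stops mattering once the live data dominates.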
Addendum about mail-in voting. Of course, this year is 2020, and the coronavirus pandemic is upending everything, including how accurate the models will be on election night. A large number of absentee ballots are expected this year, and polls suggest that absentee ballots will tend to favor Democratic candidates. Hence, if in-person votes are reported first and mail-in votes are reported later, Republicans could hold an early lead in the reported results. This is not something that we can easily detect and correct for with a model, because we do not know which of the reported votes are mail-in and which are in-person, and we do not have historical data to estimate how much of the mail-in vote will be Democratic. My models are good at breaking down the results geographically, but not by mail-in or early voting. Something similar happened in Arizona in 2018, when my models were unable to detect a surge in late mail-in ballots for the Democrats based on geography alone. However, given the urban-rural split we are seeing between Democrats and Republicans, we may be able to forecast the final outcome a bit better than the raw vote count does, but it will not be foolproof. Hence, I expect the models to perform a bit more poorly than they would during a typical election cycle. That being said, I do expect the models to be a better indication of the final outcome than the raw vote count.