2020 Election Results Analysis

I'll be running a live model of the final outcome in each statewide race based on partial vote counts as they come in. This year's election could be especially interesting to model, given the high volume of mail-in voting during the pandemic. I am hoping that, on election night, the models can tell us more about what the reported results imply for the final outcome than the raw vote counts can. I'm publishing the model results both via the interactive web app below and on my Twitch stream.

Each model looks at the votes being reported alongside the political geography of each state to provide its best estimate of the final outcome. The idea is similar to the New York Times Election Needle, but my approach is a bit different. Instead of having a single model with a confidence interval around the final outcome, I use a spaghetti model approach, similar to how hurricane forecasts work. Four models will be running, each using separate assumptions and looking at the data in a different way, and no one model is fundamentally more correct than another. I strongly caution against reading too much into what the models are guessing until different parts of the state are reporting. Each model has its strengths and weaknesses, but eventually they should converge and stabilize as more votes come in. When there are sufficient samples across the different political geographies of a state and the models converge, we can become more confident of the final outcome.

Reported Results Interactive

2020 President Results

2020 Senate Results

2020 Governors Results

2020 House Results

Results Live Stream (November 3 at 6:00 PM EST)

Live forecasts of the final 2020 election outcome, with commentary, begin at 18:00 EST on November 3, 2020, on Twitch. Until then, enjoy the pre-election forecasts!

In the model, light represents 'likely', moderate represents 'high probability', and dark represents 'called'. All times EST.

The election analysis uses the following terminology:

Likely means that based on exit polls, pre-election polling data, and initial reports, it is very likely that a candidate will win the race.
High probability means that, based on our model, a candidate is almost certain to win the election: the candidate will win unless there is an extremely unlikely shift in the reported results, such as absentee and provisional ballots changing the outcome.
Called means that a major news organization, such as the Associated Press, has called the election for a candidate.

For reference, this map shows the pre-election ratings in the Electoral College.

Live Model Methodology

This model essentially guesses the final results of the election from the geographic distribution of the results reported so far and the historical partisan lean of each region. Because it relies on historical partisan lean, it only works for elections with exactly two major candidates: one Democrat and one Republican. The model is fundamentally biased toward historical results, so expect a margin of error of about a point. The margin of error is typically higher when:

The pattern of voter turnout is substantially different from historical trends.
A realigning election occurs.
The race is a special election or an off-year election, when voter turnout tends to be abnormal.
There is a lot of early vote reported that is not representative of the whole.
There are still many mail-in, absentee, or provisional ballots not counted.

An example of a substantially different turnout pattern that causes the model to be off is the Texas Senate election in 2018, when Beto O'Rourke's candidacy caused a surge in Democratic turnout, while Texas also reported a lot of early vote that favored O'Rourke.
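To make the basic idea concrete, here is a minimal sketch of one way to project a statewide margin from partial results and historical partisan lean. It is not the actual model code: it assumes a simple uniform-swing technique, and the `County` class, its field names, and the `projected_margin` function are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class County:
    """Hypothetical record for one county (field names are assumptions)."""
    name: str
    hist_dem: int   # Democratic votes in a past baseline election
    hist_rep: int   # Republican votes in the same baseline election
    dem: int = 0    # Democratic votes reported so far tonight
    rep: int = 0    # Republican votes reported so far tonight


def margin(dem: int, rep: int) -> float:
    """Democratic minus Republican share of the two-party vote."""
    return (dem - rep) / (dem + rep)


def projected_margin(counties: list[County]) -> float:
    """Apply the swing seen in reporting counties to the statewide baseline."""
    reporting = [c for c in counties if c.dem + c.rep > 0]
    if not reporting:
        raise ValueError("no votes reported yet")

    # Average swing versus the baseline, weighted by each county's
    # historical vote total so large counties count for more.
    weighted_swing = 0.0
    weight = 0
    for c in reporting:
        hist_total = c.hist_dem + c.hist_rep
        swing = margin(c.dem, c.rep) - margin(c.hist_dem, c.hist_rep)
        weighted_swing += swing * hist_total
        weight += hist_total
    avg_swing = weighted_swing / weight

    # Assumption: counties that have not reported swing the same way.
    state_hist = margin(
        sum(c.hist_dem for c in counties),
        sum(c.hist_rep for c in counties),
    )
    return state_hist + avg_swing


counties = [
    County("Urban", hist_dem=60_000, hist_rep=40_000),            # not reporting yet
    County("Rural", hist_dem=30_000, hist_rep=70_000, dem=3_500, rep=6_500),
]
print(f"raw count margin: {margin(3_500, 6_500):+.2f}")
print(f"projected margin: {projected_margin(counties):+.2f}")
```

In this made-up example the raw count shows the Republican far ahead because only the rural county has reported, while the projection is roughly a tie because that county is running well ahead of its historical lean for the Democrat. The sketch also shows why the failure cases listed above matter: if turnout patterns or the early vote break from history, the uniform-swing assumption no longer holds.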

These models are currently being run for statewide elections in all states except Alaska. The following map shows how accurately we expect the models to perform:

States in green are the states where we expect the models to perform up to expectations. However, the models can be susceptible to outstanding absentee or provisional ballots that are not reported until the very end.
States in light green are the states where we expect the models to perform somewhat well or mostly well. These states tend to be:
- Small states with very few jurisdictions, which makes them hard to analyze, such as Delaware.
- States with a lot of early vote, which tends to throw off early reports, such as Florida, Georgia, Nevada, North Carolina, and Texas.
- States with a high percentage of votes concentrated in very few jurisdictions, such as Illinois and New York.
- States with substantial vote by mail, where votes can take time to arrive, such as Colorado, Oregon, and Washington.
States in yellow are the states where we expect the models to be the most susceptible to inaccuracy. These states have substantial vote by mail and take days to weeks to receive and count votes, and the late-reporting votes tend to sway election results decisively.
The state in red is Alaska, for which there is no model.

This year, I'm running four models, one for each combination of two design choices:

The type 1 models look at what votes have been counted.
The type 2 models look at what votes remain outstanding.
The type A models analyze what precincts are reporting.
The type B models analyze what votes are reporting.
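One way to see why looking at precincts and looking at votes can give different answers is to compare two estimates of how much of a county's vote has been counted. The sketch below is only an illustration under my own assumptions, not the models' actual code: the function names and the idea of an `expected_votes` turnout estimate are hypothetical.

```python
def fraction_counted_by_precincts(precincts_reporting: int, precincts_total: int) -> float:
    """Precinct-based view: share of precincts that have reported."""
    return precincts_reporting / precincts_total


def fraction_counted_by_votes(votes_reported: int, expected_votes: int) -> float:
    """Vote-based view: share of an expected turnout (an assumed input)."""
    return min(1.0, votes_reported / expected_votes)


def estimated_outstanding(votes_reported: int, fraction_counted: float) -> float:
    """Votes still to come if `fraction_counted` of the vote is in."""
    if fraction_counted <= 0:
        raise ValueError("nothing counted yet, cannot extrapolate")
    return votes_reported / fraction_counted - votes_reported


# Made-up county: a large early-vote batch arrives while few precincts report.
votes_reported = 40_000
by_precincts = fraction_counted_by_precincts(10, 100)
by_votes = fraction_counted_by_votes(votes_reported, expected_votes=100_000)

print(round(estimated_outstanding(votes_reported, by_precincts)))
print(round(estimated_outstanding(votes_reported, by_votes)))
```

With these made-up numbers the precinct-based view implies several times more outstanding votes than the vote-based view, because the early vote arrives as a large batch that is not tied to the share of precincts reporting. Neither view is right in general, which is part of the reason to run both.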

These models all strike a balance between variance and bias, following the bias-variance tradeoff that governs statistical modeling. They are designed so that when one model suffers from high variance but has low bias, another suffers from high bias but has low variance, so looking at all of the models together mitigates the problems present in any one model alone. The choice between type A and type B does not affect whether a model converges once all votes have been counted, but the choice between type 1 and type 2 does: only type 2 models are guaranteed to converge to the final outcome. Additionally, type 2 models are seeded with an initial bias to stabilize the model during initial reports. That bias is derived from pre-election polls but underestimates the expected winner, which means that most battleground races are seeded with a bias of 0.
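The convergence property and the seeding can be illustrated with a small sketch of an estimate built from the counted votes plus a guess about the outstanding ones. This is an assumption-laden illustration rather than the real type 2 model: the function name, the `model_margin` input, and the `seed_strength` blending parameter are all hypothetical.

```python
def estimate_with_outstanding(
    dem_counted: int,
    rep_counted: int,
    outstanding: int,
    model_margin: float,
    seed_margin: float = 0.0,
    seed_strength: int = 50_000,
) -> float:
    """Projected final margin (Dem minus Rep share of the two-party vote).

    model_margin:  the model's guess for the margin among outstanding votes
    seed_margin:   pre-election seed (0.0 for a typical battleground race)
    seed_strength: hypothetical knob; the seed counts as this many votes
    """
    counted = dem_counted + rep_counted
    total = counted + outstanding
    if total == 0:
        raise ValueError("no votes counted or expected")

    # The seed dominates when few votes are in and fades as counting proceeds.
    seed_weight = seed_strength / (seed_strength + counted)
    outstanding_margin = seed_weight * seed_margin + (1 - seed_weight) * model_margin

    # Counted votes enter exactly; only the outstanding part is a guess.
    net_counted = dem_counted - rep_counted
    return (net_counted + outstanding * outstanding_margin) / total


# Before any votes are counted, the estimate is just the seed.
print(estimate_with_outstanding(0, 0, 1_000_000, model_margin=0.3))

# With nothing outstanding, the estimate is the actual result,
# whatever the model guessed about the outstanding vote.
print(estimate_with_outstanding(520_000, 480_000, 0, model_margin=-0.5))
```

Because the counted votes enter the estimate exactly and the guess applies only to the outstanding votes, the guess has no influence once nothing is outstanding. That is the sense in which an estimate built this way must converge to the final outcome.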

Addendum about mail-in voting. Of course, this year is 2020, and the coronavirus pandemic is upending everything, including how accurate the models will be on election night. A large number of absentee ballots are expected this year, and polls suggest that absentee ballots will tend to favor Democratic candidates. Hence, if in-person votes are reported first and mail-in votes are reported later, Republicans could hold an early lead in the reported results. This is not something that a model can easily detect and correct for, because we do not know which of the reported votes are mail-in and which are in-person, and we do not have historical data to estimate how Democratic the mail-in vote will be. My models are good at breaking down the results geographically, but not by mail-in or early voting. Something similar happened in Arizona in 2018, when my models were unable to detect a surge in late mail-in ballots for the Democrats based on geography alone. However, given the urban-rural split we are seeing between Democrats and Republicans, we may be able to forecast the final outcome a bit better than the raw vote count does, but it will not be foolproof. Hence, I expect the models to perform a bit more poorly than they would during a typical election cycle. That being said, I do expect the models to be a better indication of the final outcome than the raw vote count.