Friday, February 6, 2015

Updated NBA Win Probability Calculator

The odds of winning a game when down by 6
with 18 seconds left are approximately 250 to 1.
Last month, I rolled out a new version of my NBA Win Probability Graphs and Box Scores (new link | old link). In addition to adding some new features, such as the option of displaying real time along the horizontal axis, the underlying win probability model was rebuilt as well. The dataset was updated and expanded, model parameters were further optimized, and handling of late game situations was improved, particularly in the final seconds.

Until now, that new model was only used to generate the graphs. The interactive Win Probability Calculator was still using the old model. The calculator tool has now been updated with the new, improved model. I have also removed the "Beta" tag that had been there since its inception.

But how do I know the model is improved, and not just new? One way to assess a probability model's accuracy is by measuring log-likelihood. Likelihood, in this context, is the probability the model assigned to the outcome that actually occurred. For example, if the model says that the win probability for a team is 15%, and the team actually goes on to win the game, the likelihood is 0.15. If the team loses instead, the likelihood is 0.85. We can do this calculation for all game situations in which the model estimates a win probability. The total likelihood is just the product of all of those individual likelihoods. As a mathematical convenience, one often takes the natural logarithm of that product, which turns it into a sum of individual log-likelihoods.
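As a rough illustration of the calculation described above, here is a minimal Python sketch. The function name and the sample data are made up for this example; they are not taken from the actual model or dataset.

```python
import math

def log_likelihood(predictions):
    """Sum of log-likelihoods over (win_probability, team_won) pairs."""
    total = 0.0
    for p, won in predictions:
        # Likelihood is the probability the model gave to what actually happened.
        likelihood = p if won else 1.0 - p
        total += math.log(likelihood)
    return total

# Hypothetical snapshots: (model's win probability for a team, did that team win?)
sample = [(0.15, True), (0.15, False), (0.95, True)]
print(log_likelihood(sample))
```

Summing the logs gives the same number as taking the log of the product, but avoids the underflow you would hit when multiplying many small probabilities together.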

A higher likelihood signifies a better model. A model that routinely assigns 1% win probability to teams that ultimately win will generate low likelihood scores. A model that routinely assigns 95% win probability to teams that actually win will have high likelihood scores. Log-likelihood rewards confident, accurate predictions, while penalizing overconfident, inaccurate ones.

To apply this test, I took all games from the current NBA season (through January 7, 2015) and had each model assign a likelihood score to each play. The old model was developed from 2004-2011 season data. The new model is based on games from 2000-2012. The current (2014) season data is out of sample for both, so this should be a fair, unbiased test. Log-likelihood has its own version of an R-squared measure (I'm using the McFadden version). The old model had an R-squared of 0.281 against the 2014 season data. The new model showed modest improvement, with an R-squared of 0.285.
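For reference, McFadden's R-squared compares the model's log-likelihood to that of a "null" model: one minus the ratio of the two. The sketch below assumes a null model that assigns the same constant probability to every snapshot (50% by default). That choice of null, along with the function names, is my assumption for illustration; the post does not spell out which null was used.

```python
import math

def log_likelihood(probs, outcomes):
    """Sum of log-likelihoods for predicted win probabilities and actual results."""
    return sum(math.log(p if won else 1.0 - p) for p, won in zip(probs, outcomes))

def mcfadden_r2(probs, outcomes, null_p=0.5):
    """McFadden pseudo R-squared: 1 - LL(model) / LL(null).

    null_p is the constant win probability used by the null model
    (an assumption made for this sketch).
    """
    ll_model = log_likelihood(probs, outcomes)
    ll_null = log_likelihood([null_p] * len(outcomes), outcomes)
    return 1.0 - ll_model / ll_null

# Hypothetical predictions and results, for illustration only.
probs = [0.9, 0.1, 0.7, 0.4]
outcomes = [True, False, True, False]
print(mcfadden_r2(probs, outcomes))
```

A model no better than the null scores 0, and a model that is always certain and always right approaches 1.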

The graph below shows average log likelihood improvement from the new model by minute of game time.

As you can see, there was improvement across the board, with just a few steps back. I think (hope) the dips shown for minutes 41 and 47 are just noise.


  • Data: Play by play data for nearly all NBA games from the 2000 to 2012 seasons. I merged that data with point spreads from
  • Model inputs: The win probability is a function of game time, point differential, possession, and the Vegas point spread. I use the Vegas point spread to control for differences in team strength that could bias the results. The calculator tool returns probability assuming the two teams are evenly matched (a 0 point spread). While not available in the calculator tool, the point spread adjusted probabilities are available as an option in the new win probability graphs.
  • Modeling approach: Locally weighted logistic regression (with the assistance of R's locfit package). It is an extension of the more common LOESS methodology to logistic regression. Logistic regression is more appropriate for modeling probabilities. The smoothing window was calibrated via cross validation. The optimal smoothing window shrank as time remaining in the game approached zero. For the final few seconds of game time, I abandoned regression entirely and built a simple decision tree to calculate the probabilities.
  • Non-possession states: In basketball, some game situations don't qualify as "pure" possession states: for example, after a missed shot but before the rebound, or when a team has two free throws to shoot after a personal foul. To calculate those probabilities, rather than building separate regression models, I derive them from the base "pure" possession model by applying some simple assumptions. For example, on average, missed shots are rebounded by the defense 69.5% of the time. So, the defending team's win probability after a missed shot is just 0.695 times its win probability with possession plus 0.305 times its win probability when its opponent has possession. See below for the parameters used to derive win probabilities for these interstitial states:
    • Probability of:
      • Defensive rebound (off of field goal): 69.5%
      • Offensive rebound (off of field goal): 30.5%
      • Defensive rebound (off of free throw): 86.6%
      • Offensive rebound (off of free throw): 13.4%
      • Made free throw: 75.6%
      • Missed free throw: 24.4%
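To make the non-possession logic concrete, here is a minimal Python sketch that blends "pure" possession probabilities using the parameters listed above. Everything except those percentages is an assumption for illustration: `toy_base_wp` is a made-up stand-in for the real possession model at a fixed game time, and the way the free throw states are chained together (a made final free throw hands the ball to the opponent) is my reading of the approach, not a confirmed detail of the calculator.

```python
import math

P_DEF_REB_FG = 0.695   # defensive rebound off a missed field goal
P_DEF_REB_FT = 0.866   # defensive rebound off a missed free throw
P_MADE_FT = 0.756      # made free throw

def toy_base_wp(margin, has_ball):
    """Hypothetical stand-in for the pure-possession model at a fixed game time."""
    edge = margin + (1.0 if has_ball else 0.0)  # assumption: possession worth ~1 point
    return 1.0 / (1.0 + math.exp(-0.3 * edge))

def wp_after_missed_fg(margin, base_wp=toy_base_wp):
    """Shooting team's win probability after its missed shot, before the rebound."""
    return (P_DEF_REB_FG * base_wp(margin, False)
            + (1.0 - P_DEF_REB_FG) * base_wp(margin, True))

def wp_last_free_throw(margin, base_wp=toy_base_wp):
    """Shooting team's win probability with one free throw left."""
    made = base_wp(margin + 1, False)  # assumption: opponent inbounds after a make
    missed = (P_DEF_REB_FT * base_wp(margin, False)
              + (1.0 - P_DEF_REB_FT) * base_wp(margin, True))
    return P_MADE_FT * made + (1.0 - P_MADE_FT) * missed

def wp_two_free_throws(margin, base_wp=toy_base_wp):
    """Shooting team's win probability with two free throws left."""
    return (P_MADE_FT * wp_last_free_throw(margin + 1, base_wp)
            + (1.0 - P_MADE_FT) * wp_last_free_throw(margin, base_wp))

print(wp_after_missed_fg(-2), wp_two_free_throws(-2))
```

Passing the possession model in as a function keeps the blending logic separate from whatever produces the underlying probabilities, so the toy curve could be swapped for a real one.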


  1. Hey Michael, cool stuff. I did something similar for soccer, thought you'd be interested:

    1. Hi Ford. I'm a fan of your work. I actually built my own soccer calculator last year for the World Cup. I used your model as a reality check on mine.
      Blog post
