On NFL Sundays, I find my attention split pretty evenly between DirecTV's Red Zone channel and the Advanced NFL Stats win probability graphs. If it were up to me, the win probability graphs would be a permanently scrolling sidebar on Red Zone, and Fox and CBS would incorporate them into the HUDs they use for broadcast, although I imagine networks aren't interested in making it plain to their viewers when a game is effectively over.
In the event I have been tricked somehow into going outdoors on Sunday, I find the mobile version the best way to get a status update on the day's games. Not only does it tell you what the current scores are, you also get a very data-rich snapshot of how each game has progressed.
For other sports, there's the Fangraphs live scoreboard for baseball fans, with play-by-play win probabilities updated in real time. And Ken Pomeroy (of kenpom.com) produces win probability graphs for men's college basketball (I'm not sure if they are updated in real time, though, as they are behind his paywall). But as far as I know, there are no corresponding resources for the NBA. Brian Burke took a stab at it several years back, but it was a one-off for the 2009 playoffs only.
NBA Win Probability
While still a work in progress in many ways, I've got enough of the pieces in place to start sharing results. Using play-by-play data going back to 2004, I built an in-game win probability model for the NBA that is a function of the following:
- Score Difference
- Time Remaining
The approach I took was a locally weighted logistic regression, with R's locfit package doing most of the heavy lifting. It's effectively a modification of the LOESS technique, but using logistic regression instead of standard least squares regression (logistic regression is more appropriate when you're modeling probabilities). Here's a sample of the model output. The graph below shows win probability for a team with possession, with one line representing being up by 3 points and the other being down by 3 points. The lines represent the model, and the scatter points represent actual win percentages (bucketed in 15-second increments of game time).
As I mentioned above, this is still a work in progress, evidenced by the lack of smoothness in the model lines, but overall, this looks like a reasonable fit.
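To make the idea of a locally weighted logistic regression concrete, here is a minimal Python sketch. It is not the locfit algorithm and not my actual model: it assumes a fixed bandwidth, a tricube kernel, and a local linear fit on just score margin and minutes remaining, and it runs on synthetic data generated inside the block. The names `tricube`, `local_logistic`, and the data-generating formula are all made up for illustration.

```python
import numpy as np

def tricube(u):
    """Tricube kernel: weight 1 at distance 0, falling to 0 at distance >= 1."""
    u = np.clip(np.abs(u), 0.0, 1.0)
    return (1.0 - u ** 3) ** 3

def local_logistic(x0, X, y, bandwidth, n_iter=25, ridge=1e-6):
    """Estimate P(y = 1) at x0 from a kernel-weighted logistic fit centered on x0."""
    x0 = np.asarray(x0, dtype=float)
    bandwidth = np.asarray(bandwidth, dtype=float)
    # Distance from x0 after scaling each input by its bandwidth.
    dist = np.sqrt((((X - x0) / bandwidth) ** 2).sum(axis=1))
    w = tricube(dist)
    keep = w > 0
    # Center the inputs on x0 so the intercept is the fitted log-odds at x0.
    Z = np.column_stack([np.ones(keep.sum()), X[keep] - x0])
    yk, wk = y[keep], w[keep]
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):  # Newton-Raphson with a tiny ridge penalty
        eta = np.clip(Z @ beta, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = Z.T @ (wk * (yk - p)) - ridge * beta
        hess = (Z * (wk * p * (1.0 - p))[:, None]).T @ Z + ridge * np.eye(Z.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return 1.0 / (1.0 + np.exp(-beta[0]))

# Synthetic game states (NOT real NBA data): leads matter more as time runs out.
rng = np.random.default_rng(0)
n = 10_000
margin = rng.normal(0.0, 8.0, n).round()
minutes_left = rng.uniform(0.0, 48.0, n)
p_true = 1.0 / (1.0 + np.exp(-0.9 * margin / np.sqrt(minutes_left + 1.0)))
won = (rng.random(n) < p_true).astype(float)

X = np.column_stack([margin, minutes_left])
bandwidth = [10.0, 12.0]  # assumed: 10 points of margin, 12 minutes of game time
p_up3 = local_logistic([3.0, 24.0], X, won, bandwidth)
p_down3 = local_logistic([-3.0, 24.0], X, won, bandwidth)
print(f"up 3 at the half: {p_up3:.3f}, down 3 at the half: {p_down3:.3f}")
```

Each prediction refits a small logistic regression using only nearby game states, weighted by how close they are to the point being evaluated. That is what lets the curve bend freely with time remaining instead of following a single global formula.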
But wait, there's more
I was able to throw one additional variable into the mix: the Vegas point spread. This allows me to build probabilities that properly reflect differences in team strength. In other words, a heavy favorite won't start the game at a 0.50 win probability. I believe the kenpom win probability graphs also factor in team strength (but both Advanced NFL Stats and Fangraphs use a more generic model that assumes an initial 0.50 win probability, regardless of the matchup).
The graph below shows how win probability differs as a function of point spread. Both lines represent win probability when up by 3, with possession. But one line is for a 5-point favorite and the other is for a 5-point underdog.
As one would expect, the probabilities rapidly converge as the end of the game approaches. It's interesting to note that a 5-point underdog that finds itself ahead by 3 points at the half is still more likely than not to lose (win probability = 45%).
Putting it all together
With a model that seems to be generating reasonable results, I can now use it to visualize any NBA game. As an example, see below for an interactive visualization of Game 1 of the Eastern Conference Finals, between the Heat and the Pacers. There are sliders at the bottom of the chart that allow you to zoom in on any section of the game. Owing to the alternating possession format of NBA games, the win probabilities have a lot of "jiggle" to them, so a zoom feature is practically a necessity.
Eastern Conference Finals - Game 1 - Indiana Pacers 102, Miami Heat 103
Game 1 of the Heat-Pacers series was one of the best of the playoffs so far, with Paul George's desperation three at the end of regulation sending the game to overtime (worth +33% in win probability). In overtime, it was Paul George again, hitting three clutch free throws to give the Pacers a 1-point lead with 2 seconds left (worth +65% in win probability). And finally, it was LeBron James with a layup at the buzzer to give the Heat the victory (worth +73% in win probability).
Note that the game probability starts out at 79% (the Heat were 8-point favorites). The moneyline for the game was +400 for the Pacers and -450 for the Heat, which implies an 81% win probability, in line with my model's regression estimate.
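For anyone who wants to check the moneyline arithmetic, here is a small Python sketch of the standard conversion from American odds to implied probability. The function names are made up for illustration, and removing the bookmaker's margin by simple normalization is just one common convention, not necessarily how the 81% figure above was derived.

```python
def implied_probability(moneyline):
    """Raw implied win probability for an American moneyline (vig included)."""
    if moneyline < 0:
        # Favorite: risk |moneyline| to win 100.
        return -moneyline / (-moneyline + 100.0)
    # Underdog: risk 100 to win moneyline.
    return 100.0 / (moneyline + 100.0)

def no_vig_probabilities(favorite_line, underdog_line):
    """Rescale both raw probabilities so they sum to 1 (one simple way to strip the vig)."""
    fav = implied_probability(favorite_line)
    dog = implied_probability(underdog_line)
    total = fav + dog
    return fav / total, dog / total

heat_raw = implied_probability(-450)    # 450 / 550
pacers_raw = implied_probability(400)   # 100 / 500
heat_fair, pacers_fair = no_vig_probabilities(-450, 400)
print(f"Heat raw: {heat_raw:.3f}, Pacers raw: {pacers_raw:.3f}")
print(f"Heat no-vig: {heat_fair:.3f}, Pacers no-vig: {pacers_fair:.3f}")
```

The raw Heat number comes out just under 82%, and the Pacers line alone implies the Heat are at 80%, so a figure of roughly 81% sits between the two.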
You can mouse over the graphs to get the situational details, as well as the win probability change of that play (in parentheses following the win probability). There is also a scoring margin graph below the win probability graph, for a more conventional view of the game.
I plan on publishing the visualization above for each game of the Heat-Spurs series, with some added metrics. For example, I can borrow the Advanced NFL Stats concepts of Excitement Index and Comeback Factor. I can also highlight what the "top plays" were in the game, as defined by their impact on win probability. Beyond that, I hope to compile a database for public use where you can pull this visualization for any NBA game, going back several seasons.
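As a rough sketch of how those added metrics could fall out of a win probability series, the Python below assumes the definitions as I understand them: Excitement Index as the sum of absolute play-to-play changes in win probability, and Comeback Factor as the inverse of the winner's lowest win probability. The function names, the play labels, and the probabilities are all made up for illustration and are not taken from any real game.

```python
def win_probability_added(wp_series):
    """Change in win probability produced by each play."""
    return [after - before for before, after in zip(wp_series, wp_series[1:])]

def excitement_index(wp_series):
    """Total absolute movement of the win probability line over the game."""
    return sum(abs(delta) for delta in win_probability_added(wp_series))

def comeback_factor(winner_wp_series):
    """Inverse of the lowest win probability the eventual winner faced."""
    return 1.0 / min(winner_wp_series)

def top_plays(plays, wp_series, n=3):
    """The n plays with the largest absolute swing in win probability."""
    deltas = win_probability_added(wp_series)
    ranked = sorted(zip(plays, deltas), key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:n]

# Hypothetical series for the eventual winner: one value before each play, plus the final 1.0.
wp = [0.79, 0.60, 0.85, 0.30, 0.95, 1.00]
plays = ["play 1", "play 2", "play 3", "play 4", "play 5"]

print("Excitement Index:", round(excitement_index(wp), 2))
print("Comeback Factor:", round(comeback_factor(wp), 2))
for label, delta in top_plays(plays, wp, n=2):
    print(f"{label}: {delta:+.0%}")
```

Ranking by absolute swing means a play that hurts the eventual winner can still show up as a top play, which seems right for a recap of the game's biggest moments.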
There are about 20 other things I would like to do with this tool (including modeling in-game cover probability) that I may or may not be able to get to. Sharing these in real time during the game would be great as well, but I don't even know where to start to make that happen. As always, comments or suggestions are welcome. And check back for recaps of each game of the Finals.