## Saturday, December 28, 2013

### NBA Win Probability Graphs and Box Scores

[Figure: Box score from Wilt Chamberlain's 100-point game]

I'm excited to announce a new feature for this site: NBA Win Probability Graphs and Box Scores for the 2013-2014 season. I first published these graphs for each game of the 2013 NBA Finals (relive the drama of Game Six...in chart form!). In November, I rolled out an NBA Win Probability Calculator tool that generates win probability as a function of game state (quarter, time remaining, margin, and possession). And over the past few months, I have worked on refining and enhancing my methodology, as well as building an easily accessible database to store win probabilities for all NBA games.

### Model Enhancements

But in order to fully roll out this latest feature, I needed a model that could calculate win probability at a finer level of game state: for example, after a missed shot but before the rebound, or when a player has been fouled and awarded two free throws. With this level of detail, I can properly apportion Win Probability Added (WPA) at the player level (the key feature of my box score).

For example, when a player misses a shot, it can either be rebounded by the offense (retain possession) or the defense (lose possession), with each outcome associated with a distinct win probability. Missed field goals are rebounded by the offense 36.0% of the time, and by the defense 64.0% of the time. So, the win probability of a missed field goal is a weighted average of the two possible outcomes (retain possession, lose possession), using a 36/64 weighting.

Missed free throws are modeled similarly, but with different weights (19/81, since it is harder for teams to get an offensive rebound off a missed free throw). Note that, in general, missing the first of two free throws is more detrimental to a player's Win Probability Added than missing the second. A miss on the first has none of the upside that comes with a miss on the second, where there's a 19% chance of getting an offensive rebound and continuing the possession.
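
As a rough sketch of the weighted average described above: the offensive-rebound rates (36% and 19%) come from the text, while the function name and the two example win probabilities are made up purely for illustration and are not output from the actual model.

```python
# Offensive-rebound rates quoted in the text above.
OFF_REB_RATE = {"field_goal": 0.36, "free_throw": 0.19}


def missed_shot_win_prob(wp_if_retained, wp_if_lost, shot_type):
    """Win probability for the shooting team after a miss, before the rebound.

    wp_if_retained: win probability if the offense gets the rebound.
    wp_if_lost:     win probability if the defense gets the rebound.
    """
    p_off = OFF_REB_RATE[shot_type]
    return p_off * wp_if_retained + (1.0 - p_off) * wp_if_lost


# Hypothetical win probabilities for the two outcomes (illustrative only).
wp_keep, wp_lose = 0.55, 0.45
wp_missed_fg = missed_shot_win_prob(wp_keep, wp_lose, "field_goal")
wp_missed_ft = missed_shot_win_prob(wp_keep, wp_lose, "free_throw")
print(f"missed FG: {wp_missed_fg:.3f}, missed last FT: {wp_missed_ft:.3f}")
```

With the same two outcomes, the missed free throw lands closer to the "lose possession" value, simply because the defense collects that rebound more often. The free throw weighting only applies to a live miss (the last free throw of a trip); a miss on the first of two has no rebound branch at all.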

### Win Probability Box Scores

To illustrate the various components of this new feature, I will use the December 20th game between the Brooklyn Nets and the Philadelphia 76ers. Here is a link to the Win Probability Box Score. This game featured two clutch shots in overtime, including Evan Turner's game winning nothing-but-rim-backboard-rim-net runner at the buzzer.

The top left of the box score is just the final score (home team on the bottom). At the top right are two numbers, labeled "Excitement" and "Comeback". I have borrowed these concepts wholesale from the Advanced NFL Stats Win Probability Graphs. They are defined as follows:
• Excitement - Measures how far the win probability graph "travels" over the course of a game. The Nets-76ers game featured a lot of back and forth swings, leading to a high Excitement Index of 12.6.
• Comeback - For the winning team, this is the odds against winning at the team's lowest win probability. With 6 seconds to go in overtime, Alan Anderson of the Nets blocked Evan Turner's shot, with the Nets leading the 76ers by one point. At that point in the game, the 76ers' win probability was just 17.7%, or 4.6 to 1 against.
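
The two definitions above can be sketched in a few lines. This assumes that "distance travelled" means the sum of absolute changes in win probability from one play to the next, and that Comeback is quoted as odds against the eventual winner at its low point; the function names and the shortened series are hypothetical, with only the 17.7% low taken from the example game.

```python
def excitement_index(win_probs):
    """Total distance travelled by the win probability line (assumed definition)."""
    return sum(abs(b - a) for a, b in zip(win_probs, win_probs[1:]))


def comeback_odds(winner_win_probs):
    """Odds against the eventual winner at its lowest win probability."""
    low = min(winner_win_probs)
    if low <= 0.0:
        return float("inf")  # guard against a model that outputs exactly zero
    return (1.0 - low) / low


# Hypothetical, heavily shortened series for the eventual winner.
series = [0.50, 0.62, 0.40, 0.177, 1.00]
print(round(excitement_index(series), 3))
print(round(comeback_odds(series), 1))
```

A real game has hundreds of plays, so the summed swings grow far larger than in this five-point toy series.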

Just below the final score and Excitement/Comeback numbers are two sections, labeled MVP and LVP. As one would expect, MVP stands for "Most Valuable Player" and denotes the player with the highest total Win Probability Added for the game (more on how Win Probability Added is calculated below). In our example, Paul Pierce was the MVP (in a losing effort), having added an impressive 78% in Win Probability on some very efficient shooting (9 field goal attempts, with a 100% eFG%). Nearly half of Pierce's WPA came on a single shot: a three-pointer with 16 seconds left in overtime that put the Nets up by one point.

LVP stands for "Least Valuable Player" and denotes the player with the lowest Win Probability Added for the game (I considered naming this "The Cone of Shame"). In our example, the Nets' Deron Williams is the recipient of this dubious honor. Deron had a so-so night shooting the ball (42% eFG% on 12 shots), but it was his turnovers that made him LVP-worthy. Williams forfeited 6 of the Nets' possessions, costing them 31% in Win Probability. Note that MVP Paul Pierce also had six turnovers, but these only counted for a loss of 16% in Win Probability, meaning that Deron's turnovers came at more crucial stages of the game.

To calculate Win Probability Added at a player level, I am following the approach I outlined in this post from the summer, with the following plays being counted in each player's WPA ledger:

• Missed field goals
• Getting to the free throw line
• Missed free throws
• Turnovers

This is by no means intended to be a complete measure of a player's contribution to a team's win probability. For one, it is wholly an offensive metric, with no attempt to quantify defensive contributions to win probability. It also ignores assists and rebounds (both offensive and defensive). I can quantify the WPA impact of these types of plays in a separate calculation, but I don't feel it is appropriate to try to incorporate them into a PER-style, all-encompassing metric.

Rebound WPA is problematic in that it will always be positive because it is easy to credit the rebounder, but more difficult to debit players when they don't grab the rebound (it amounts to a form of censoring). And I'm ignoring Assist WPA because I don't feel like delving into the chicken-egg quagmire of determining how you apportion WPA credit between the assist-ee and the assist-or.

With that being said, the box score located below each Win Probability graph shows the various components of the WPA metric I've devised, splitting WPA into its shooting and turnover components (shWPA and toWPA, respectively). I also show WPA due solely to free throw shooting, with the intent of highlighting players with crucial free throw makes or misses.
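
To make the bookkeeping concrete, here is a minimal sketch of a per-player WPA ledger. Everything in it is an assumption made for illustration: the play rows, the player names, the play-type labels, and the choice to file every non-turnover play under shWPA. It is not the actual calculation behind the box scores.

```python
from collections import defaultdict

# Hypothetical play-by-play rows: (player, play_type, wp_before, wp_after),
# with win probability measured from the perspective of the player's team.
plays = [
    ("Player A", "made_fg", 0.40, 0.52),
    ("Player A", "turnover", 0.55, 0.50),
    ("Player B", "missed_fg", 0.50, 0.46),
    ("Player B", "turnover", 0.48, 0.40),
    ("Player B", "made_ft", 0.40, 0.43),
]


def build_ledger(plays):
    """Sum each player's win probability changes into shWPA and toWPA."""
    ledger = defaultdict(lambda: {"shWPA": 0.0, "toWPA": 0.0})
    for player, play_type, wp_before, wp_after in plays:
        component = "toWPA" if play_type == "turnover" else "shWPA"
        ledger[player][component] += wp_after - wp_before
    # Total WPA is the sum of the two components.
    return {
        player: {**parts, "WPA": parts["shWPA"] + parts["toWPA"]}
        for player, parts in ledger.items()
    }


ledger = build_ledger(plays)
mvp = max(ledger, key=lambda p: ledger[p]["WPA"])  # highest total WPA
lvp = min(ledger, key=lambda p: ledger[p]["WPA"])  # lowest total WPA
print(mvp, lvp)
```

Because each play is credited with its own change in win probability, two players with the same number of turnovers can end up with very different toWPA totals, depending on when those turnovers happened.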

### Next Steps

I will do my best to keep this updated with each day's games in a timely manner, but I'm already a couple of days behind (I have all games from the 2013-2014 season through Christmas). I'm working on a way to auto-update, but in the meantime, updates may be sporadic. I also plan on working backwards through the NBA seasons, with the ultimate (possibly unrealistic) goal of making available win probability graphs for all seasons in which play-by-play data is available (I believe this goes back to the 1996-97 season).

I intend to add other flavors of WPA, as complements to (but not components of) the main WPA metric (e.g. rebound WPA, assist WPA, steal WPA, etc.). A key feature I hope to add as soon as possible is a player summary tool that will rank players by their total WPA contributions (similar to the Advanced NFL Stats player pages), as well as show player game-by-game WPA performance.

WPA is also the ideal metric for quantifying "clutch" performance, and "clutch WPA" is a stat I intend to add once I work out the details of the calculation.

Once again, here is the link for the Win Probability Graphs: Happy Clicking.

1. Do you think it would be better if your WPA model incorporated the spread? For example, if the Miami Heat play the Utah Jazz in Miami, a theoretical spread would be 10+ points, but the WPA model treats the game as starting out as an even matchup.

1. I think you can make an argument for both (my model has the capability of factoring in the point spread - it's how I displayed the 2013 NBA Finals graphs). The main reason I'm using the 50/50 versions is so I can properly account for Win Probability Added at a player level. I may add the point spread adjusted version at a later date.

2. Why do individual WPAs not sum to 0.5 for the winning team and -0.5 for the losing team?

1. It's due to which plays I am counting at the player level. If teams had equivalent rebound WPA, then the difference between the winning team and losing team WPA should be close to 0.50. But if a team accumulated significant win probability by winning the rebound battle, that currently does not show up in my box score.

The losing team's WPA would sum to -0.50 only if I had a way of allotting defense at the player level. There are also plays which cannot be attributed easily to a specific player (e.g. shot clock violations, not getting a play in at the end of a quarter).

3. How important is the shot clock in the late-game WPs? Last night JR Smith took a wide open 3 with 21 seconds left in a tie game and a full shot clock, and everyone agrees that it was a horrible shot. This shows up as a 60.5% WP situation, which actually makes the shot extremely defensible given plausible estimates for his uncontested, assisted 24 footer %. But how much is the 60.5% affected by the fact that the sample includes teams with shot clocks < 21? I assume not much, since the WP for a team with 30 seconds left in a tie game only falls to 59.6%. Thanks!

1. Late reply, but here goes: My model only looks at the beginning of each possession, so in the vast majority of cases, it is with a full shot clock. The exceptions would be things like: missed shots that didn't touch the rim, certain foul situations, etc.

So, the 60.5% estimate is based on a sample which is most likely dominated by situations in which the shot clock was off. Hope that makes sense.

5. I love it. Check out my blog at http://databallgoodman.blogspot.com/

6. you should use the 'kitchen sink' wpa metric- that one is the truest reflection but doesn't seem to be incorporated into your live version.