Monday, February 16, 2015

How NBA Games Are Won

What's more important in basketball: rebounding or getting to the foul line? Field goal percentage or forcing turnovers? These questions aren't new, but for this post I will use my win probability model to provide a new perspective on what matters most when it comes to winning basketball games.

Dean Oliver, pioneer of basketball statistical analysis, identified what he termed the "four factors" of basketball success in his influential book Basketball on Paper. Those four factors are:
  • Shooting
  • Free Throws
  • Rebounding
  • Turnovers
Nearly everything that is important to the game of basketball can be attributed to one of those four factors. But are all four factors created equal? Or is one "more equal" than the others? Oliver himself tackled this question, using his futuristically titled RoboScout program. Here is how Oliver assessed the relative importance of the four factors:
  1. Shooting: 40%
  2. Turnovers: 25%
  3. Rebounding: 20%
  4. Free Throws: 15%
To be honest, I'm not sure what kind of Skynet-style algorithm RoboScout employed to arrive at these relative weights, but let's take them at face value for now.

That shooting is the most important of the four factors is fairly non-controversial. However, there is lukewarm disagreement within the basketball statistics community as to whether Oliver's weights are correct.

Basketball analyst and blogger EvanZ took a stab at this question in a two-part post in 2010. By modeling point differential as a function of the four factors, he could assess which factor was most responsible for variation in scoring margins. You can refer to Evan's excellent writeup for the details, but here is the end result, laid alongside Oliver's original estimates:

factor       DeanO  EvanZ
Free Throws  15%    10%

While Dean and Evan agree on the order of importance, Evan's analysis ranks shooting significantly more important than Oliver's analysis would indicate, with field goals being more than twice as important as the next most important factor, turnovers.

So what can a win probability model add to this debate? For one, it allows for a more direct measure of what leads to wins in the NBA. Every play that takes place in an NBA game can be assessed according to its impact on win probability. And, nearly every one of those plays can be attributed to one of the four factors. An offensive rebound will increase a team's win probability in most situations, and that increase can be attributed to the "rebounding" factor. A game winning buzzer beater will surely impact win probability, and that impact would, of course, be attributed to the "shooting" factor.

As I have modeled it, every team starts a game with an equal share of win probability: 50%. Winning a game amounts to accumulating an additional net 50% in win probability, bringing the total to 100% (and dropping your opponent's total to 0%). Assessing the relative importance of the four factors can then be boiled down to measuring how much of that needed 50% in win probability gain comes from each of the four factors. 
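As a rough illustration of this bookkeeping (not the model behind this post), the attribution can be sketched in a few lines of Python. The play log, the factor labels, and the helper name `attribute_wpa` are all hypothetical, and the win probability changes are invented so that they net out to the 50% a winning team must gain:

```python
from collections import defaultdict

# Hypothetical play log for the eventual winner:
# (factor, change in the winner's win probability).
# These numbers are invented for illustration, not real game data.
plays = [
    ("shooting", 0.12),
    ("turnovers", -0.03),
    ("rebounding", 0.04),
    ("shooting", 0.25),
    ("free_throws", 0.05),
    ("turnovers", 0.07),
]

def attribute_wpa(plays):
    """Sum win probability added (WPA) per factor."""
    totals = defaultdict(float)
    for factor, delta in plays:
        totals[factor] += delta
    return dict(totals)

totals = attribute_wpa(plays)

# A winner goes from 0.50 to 1.00, so the per-factor totals net to 0.50.
net_gain = sum(totals.values())

for factor, gain in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{factor:12s} {gain:+.2f} ({gain / net_gain:.0%} of total)")
```

Individual plays can be negative for the winner (a turnover they commit, for example); only the net across the whole game has to reach +50%.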

Using all games from the 2011-2014 NBA seasons, here is the average win probability gain for winning teams, broken down by the four factors:
  • Shooting: +34.6% (69% of total)
  • Turnovers: +5.8% (12% of total)
  • Rebounds: +4.5% (9% of total)
  • Free Throws: +3.9% (8% of total)
  • Other: +1.3% (3% of total)
What does "Other" represent? It's any play not easily attributable to one of the four factors, but that still has impact on win probability. Examples include technical fouls, jump balls, and failing to get a shot off at the end of a quarter. It may be coincidental, but EvanZ found that only 4% of point differential was unexplained by the four factors, which is very close to my 3% attributable to "Other" above.

How does this result stack up against Oliver and EvanZ's estimates? See below (note: to keep things consistent, I am renormalizing the totals to sum to 100%, excluding "Other"):

factor       DeanO  EvanZ  winProb
Free Throws  15%    10%    8%
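The renormalization is plain arithmetic. A minimal sketch, using the average win probability gains listed earlier (the helper name `shares` is just for illustration):

```python
# Average win probability added (percentage points) for winning teams,
# taken from the breakdown listed earlier in the post.
wpa = {
    "Shooting": 34.6,
    "Turnovers": 5.8,
    "Rebounds": 4.5,
    "Free Throws": 3.9,
    "Other": 1.3,
}

def shares(values):
    """Return each entry as a whole-number percentage of the total."""
    total = sum(values.values())
    return {name: round(100 * v / total) for name, v in values.items()}

# Shares including "Other", as in the bulleted list.
with_other = shares(wpa)

# Shares renormalized over the four factors only, as in the comparison table.
four_only = shares({k: v for k, v in wpa.items() if k != "Other"})

print(with_other)
print(four_only)
```

Dropping "Other" moves shooting from 69% to 71% and leaves the remaining three factors unchanged after rounding.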

In order of importance, I am in agreement with Oliver and EvanZ, but that's where the similarities largely end. In terms of win probability impact, shooting is nearly six times more important than the next most important factor, turnovers. This is significantly at odds with the other two estimates. Before I propose some theories as to what is driving this difference, I'm going to drill down on these results a bit more.

The nice thing about the win probability approach is that it is merely an aggregation of very granular data, rather than a top-down regression model. So, I can attribute win probability added at a variety of detailed levels. The table below breaks down average win probability added (for the victorious team) by quarter:

         Win Probability Added                Percent of Total
quarter  total   FG      FT     RB     TO     FG    FT    RB    TO
Q1       5.6%    4.1%    0.5%   0.5%   0.6%   72%   8%    9%    11%
Q2       7.6%    5.8%    0.6%   0.4%   0.8%   77%   8%    6%    10%
Q3       9.9%    7.1%    0.7%   1.0%   1.0%   72%   7%    10%   10%
Q4       21.7%   15.3%   1.8%   2.3%   2.3%   71%   8%    11%   11%
OT       3.1%    2.2%    0.3%   0.3%   0.3%   71%   9%    9%    10%

The first thing to notice is that win probability added is not evenly distributed among the four quarters of regulation play. Each quarter is more important than the one that precedes it, with the fourth quarter more than twice as important as the third.

Within each quarter, the win probability contribution of each of the four factors is relatively consistent (see the "Percent of Total" columns). Prior to pulling the data, I would have expected Shooting (FG) to have a larger impact in the fourth quarter, particularly the final minute. Here is how the fourth quarter breaks down:

             Win Probability Added (4th Qtr)      Percent of Total
minute       total   FG     FT     RB     TO      FG    FT    RB    TO
first 10     12.2%   9.0%   0.7%   1.4%   1.1%    74%   6%    11%   9%
penultimate  2.6%    1.7%   0.2%   0.4%   0.3%    66%   7%    16%   11%
final        6.9%    4.6%   0.9%   0.5%   0.9%    67%   13%   7%    13%

Nearly a third of the 21.7% in win probability added in the fourth quarter comes in the final minute of play. And in that final minute, free throws become more important in determining the outcome of the game, making up 13% of the total, compared to an average of 8% over the entire game.
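Those shares can be checked directly from the fourth-quarter table. A small sketch (the variable names are only for illustration; the per-minute comparison at the end is simple arithmetic on the table and not a separate result from the model):

```python
# Fourth-quarter win probability added (percentage points), from the table above.
# "minutes" is the length of each slice of the 12-minute quarter.
q4 = {
    "first 10":    {"minutes": 10, "total": 12.2, "FT": 0.7},
    "penultimate": {"minutes": 1,  "total": 2.6,  "FT": 0.2},
    "final":       {"minutes": 1,  "total": 6.9,  "FT": 0.9},
}

q4_total = sum(row["total"] for row in q4.values())

# Share of fourth-quarter win probability added that comes in the final minute.
final_share = q4["final"]["total"] / q4_total

# Free throws as a share of win probability added within the final minute.
ft_share_final = q4["final"]["FT"] / q4["final"]["total"]

# Win probability added per minute of play in each slice.
per_minute = {name: row["total"] / row["minutes"] for name, row in q4.items()}

print(f"final minute share of Q4: {final_share:.0%}")
print(f"free throw share in final minute: {ft_share_final:.0%}")
print(f"final minute vs. average of first ten: "
      f"{per_minute['final'] / per_minute['first 10']:.1f}x")
```

On a per-minute basis, the final minute carries several times the win probability swing of an average minute earlier in the quarter.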

So why are my numbers so different from previous estimates of the four factors' importance? EvanZ's analysis focused on point differential, and what drives differences in scoring margin may not always align with what wins games. For example, let's say a game has entered its "garbage time" phase. With victory (or defeat) assured, teams are more likely to pull their starters and play backups. Perhaps starters are more likely to make contributions via field goal shooting, to the exclusion of the other three factors.

So, when the bench players are in during garbage time, their contributions may be more evenly spread among the four factors, which does affect scoring margin. It can turn a 25 point blowout into a 10 point victory, the impact of which would show up in EvanZ's analysis. But win probability added largely ignores these garbage time contributions.

Or maybe the difference is due to matters more mundane and technical. Assessing "relative importance" is not exactly a well defined measure. I find no fault in EvanZ's approach to the problem, but it is difficult to say whether his approach leads to numbers that are directly comparable to mine. Also, it is hard to make sense of what these numbers actually mean. That shooting is important is hardly a penetrating insight, and all three analyses agree on that point. 

But what does it mean to say that shooting has a relative importance of 71%, instead of 54%? What could an NBA GM do with that information? And how would it change things if that value were 75% instead? At this point, I can't really say, but I think it is worthwhile to explore further. Alternate theories welcome in the comments below or on Twitter.


  1. Good stuff as usual!

    Is it possible that the discrepancy has to do with non-linear scaling of variance with sample size?

    1. Very possible. Similar theory at APBR:

    2. I tried to respond at the APBR forums, but their registration system doesn't work right.

      " ... If I created a fifth factor called `Winning the Coin Flip`... I'm trying to see if there are ways to account for the random variation within my win probability framework. ..."

      The formula here is - roughly speaking - variance * significance = "importance". The difference between your approach and EvanZ's is that you're using different notions of variance and significance. If you want something that corresponds to the difference in teams, then using the per-team variance, rather than the per-event variance might be more appropriate.

      That doesn't directly address luck factors. 'Winning the coin flip' could still show as significant due to sampling issues. However, if we assume the data for team performance in each of the 'four factors' is representative of team quality, it should be pretty good.
