Saturday, April 25, 2015

The Improbability of the Warriors' Comeback

Hindsight has a way of making the improbable seem inevitable. Of course the Warriors erased a 20-point deficit in the fourth quarter (never mind that they were only the third playoff team to do so). Of course Steph Curry hit that game-tying three from the corner to force overtime (never mind that he had missed from the wing just three seconds earlier).

But a 20-point comeback is anything but inevitable, and we tend to forget the games in which a blowout stays a blowout because, well, those games are forgettable. So what do the numbers say?

My own win probability model put the Warriors' chances as low as 0.2%. That low point came after a miss by the Warriors' Shaun Livingston with 6:24 left in the game and Golden State down by 17. Livingston rebounded his own miss for a putback slam dunk, tripling his team's chances to 0.6%.

But that win probability estimate assumes evenly matched teams. The Warriors, however, are anything but an even match for the Pelicans. At home, they were 12.5-point favorites over the Pelicans in the first two games of their first-round series. With Game 3 in New Orleans, the Warriors were still favored, but not overwhelmingly so, at just 5 points. If we use the win probability model calibrated to pre-game odds, the Warriors' comeback becomes slightly less improbable, with a low point of 0.4%.

How does this compare to other estimates of the Warriors' chances? I am aware of two others:
  • Gambletron 2000 - A site that aggregates in-game betting data. Think of it as a stock ticker for each NBA game (along with many, many other sports).
  • numberFire - The popular fantasy/analytics site which has recently begun publishing in-game probabilities for both football and basketball.
According to Gambletron, the Warriors' chances sank as low as 2.4% with six or seven minutes left in the fourth quarter. That is six times my low-point estimate of 0.4%.

numberFire's estimate is even further from mine, with a Warriors win probability of 6% right around the seven-minute mark of the fourth quarter. While neither model is going to be objectively "right", this is clearly a significant difference.

So let's look at the raw data. I built my win probability model from play-by-play data spanning the 2000-2012 seasons. With over a thousand games per season, this makes for a fairly robust dataset. Here is how often teams came back from large deficits midway through the fourth quarter:

All games, minutes to go:
               seven minutes        six minutes        five minutes
Trailing by  games  won    pct   games  won    pct   games  won    pct
         20    594    0   0.0%     613    0   0.0%     587    0   0.0%
         19    703    0   0.0%     724    0   0.0%     708    0   0.0%
         18    764    3   0.4%     760    1   0.1%     754    1   0.1%
         17    839    6   0.7%     887    1   0.1%     884    2   0.2%
         16    914    2   0.2%     960    3   0.3%     921    1   0.1%
         15   1059   16   1.5%    1004    7   0.7%    1036    2   0.2%
         14   1138   19   1.7%    1129   14   1.2%    1194    4   0.3%

The Warriors were down 17 with six minutes to go in their game. Over the course of 13 NBA seasons, there were 887 games in which a team trailed by that many with that much time remaining. Only one of those teams went on to win, a raw frequency of just 0.1%. Eyeballing these numbers, my model's estimate of about 0.5% seems more in line with the actual data than either numberFire's or Gambletron's.
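The post doesn't spell out how the play-by-play data was reduced to these counts. As a minimal sketch, assume each game has already been boiled down to hypothetical `Snapshot` records (one per trailing team per whole-minute mark, carrying the deficit, that team's pre-game spread, and whether it won). `Snapshot` and `comeback_table` are illustrative names, not part of any published dataset or of the model itself; this only shows how the games/won/pct cells could be tallied.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record: one team's state at a whole-minute mark of a game."""
    game_id: str
    minutes_left: int   # whole minutes left in regulation (6 for the 6:00 mark)
    deficit: int        # points this team trails by (positive means trailing)
    favored_by: float   # pre-game point spread for this team (positive = favorite)
    won: bool           # did this team go on to win?


def comeback_table(
    snapshots: Iterable[Snapshot],
    deficits: Sequence[int],
    minutes: Sequence[int],
    favored_range: Optional[Tuple[float, float]] = None,
) -> Dict[Tuple[int, int], Tuple[int, int, float]]:
    """Return {(deficit, minutes_left): (games, wins, pct)}.

    If favored_range=(lo, hi) is given, only teams favored by lo..hi points
    (inclusive) are counted.
    """
    games = defaultdict(set)  # (deficit, minutes) -> set of game ids
    wins = defaultdict(set)   # (deficit, minutes) -> ids of games the team won
    for s in snapshots:
        if favored_range is not None:
            lo, hi = favored_range
            if not lo <= s.favored_by <= hi:
                continue
        key = (s.deficit, s.minutes_left)
        games[key].add(s.game_id)  # a set, so each game counts once per cell
        if s.won:
            wins[key].add(s.game_id)

    table = {}
    for d in deficits:
        for m in minutes:
            n = len(games[(d, m)])
            w = len(wins[(d, m)])
            # NaN marks an empty cell rather than a misleading 0.0%
            table[(d, m)] = (n, w, 100.0 * w / n if n else float("nan"))
    return table
```

The optional `favored_range` argument restricts the tally to teams favored by a given number of points, the same kind of slicing used for the favorites-only tables below. Counting game ids in a set means duplicate snapshots at the same minute mark can't inflate a cell.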

But this dataset includes all games, underdogs and favorites alike. What if we restrict the view to favorites? The Warriors were 5-point favorites at New Orleans. The table below looks only at outcomes for trailing teams that were favored by 2.5 to 7 points:

2.5 to 7 point favorites, minutes to go:
               seven minutes        six minutes        five minutes
Trailing by  games  won    pct   games  won    pct   games  won    pct
         20     55    0   0.0%      65    0   0.0%      68    0   0.0%
         19     74    0   0.0%      78    0   0.0%      91    0   0.0%
         18     83    0   0.0%      96    0   0.0%      83    0   0.0%
         17    117    1   0.9%     121    0   0.0%      99    0   0.0%
         16    116    0   0.0%     107    0   0.0%     102    0   0.0%
         15    152    1   0.7%     132    1   0.8%     125    0   0.0%
         14    161    4   2.5%     133    1   0.8%     142    0   0.0%

And here is the data for 7.5 to 12 point favorites:

7.5 to 12 point favorites, minutes to go:
               seven minutes        six minutes        five minutes
Trailing by  games  won    pct   games  won    pct   games  won    pct
         20     13    0   0.0%      12    0   0.0%      15    0   0.0%
         19     17    0   0.0%      16    0   0.0%      16    0   0.0%
         18     16    0   0.0%      19    0   0.0%      26    0   0.0%
         17     28    1   3.6%      24    1   4.2%      21    0   0.0%
         16     28    0   0.0%      35    0   0.0%      30    0   0.0%
         15     27    1   3.7%      34    1   2.9%      31    0   0.0%
         14     43    2   4.7%      35    1   2.9%      40    1   2.5%

As you can see, the data gets fairly sparse and noisy once we start slicing and dicing. The art of building a win probability model is drawing smooth, rational lines through a messy cloud of data points. Feel free to draw your own conclusions, but the data above gives me confidence in my model's estimates, as well as a proper appreciation of what Steph Curry and the Warriors pulled off Thursday night in New Orleans.
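One way to put a number on "sparse and noisy" is a binomial confidence interval for each cell. The sketch below uses the Wilson score interval, a standard interval for proportions; treating every game in a cell as an independent trial with a common win probability is an assumption, and nothing here implies the win probability model itself uses this interval.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for the win rate wins/games.

    z=1.96 gives an approximate 95% interval. Assumes the games in a cell
    are independent trials with a common win probability.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    z2 = z * z
    denom = 1 + z2 / games
    center = (p + z2 / (2 * games)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / games + z2 / (4 * games * games))
    return max(0.0, center - half), min(1.0, center + half)


# Cells taken from the tables above: (label, wins, games)
for label, wins, games in [
    ("all games, down 17, 6 min left", 1, 887),
    ("2.5-7 pt favorites, down 17, 6 min left", 0, 121),
    ("7.5-12 pt favorites, down 17, 6 min left", 1, 24),
]:
    lo, hi = wilson_interval(wins, games)
    print(f"{label}: {wins}/{games} -> {100 * lo:.2f}% to {100 * hi:.2f}%")
```

For the one win in 887 games the interval runs from roughly 0.02% to 0.64%, while the single win in 24 games among 7.5- to 12-point favorites is consistent with anything from under 1% to about 20%. That spread is why smoothing across neighboring cells matters.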

4 comments:

  1. > ... But that win probability estimate assumes teams that are evenly matched. ...

    I was under the impression that your win probability model takes the Vegas spread as an input: (http://www.inpredictable.com/2015/02/updated-nba-win-probability-calculator.html)
    "... The win probability is a function of game time, point differential, possession, and the Vegas point spread. ..."

    > ... How does this compare to other estimates of the Warriors' chances? ...

    There's always the classic: Assume the score differential is a 1-dimensional random walk with a per-game standard deviation of 12.3 points. Then with 0.13 of a game left, the standard deviation over the rest of the game will be about 4.5 points. That makes the comeback a four-sigma event with a probability around .0001.

    I also wonder how much market factors like the bid-ask spread and the granularity of contract sizes distort the numbers that Gambletron2000 produces in extreme situations like this.
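The random-walk estimate in the first comment is easy to reproduce. The sketch below assumes, as the commenter does, that the remaining score margin is normally distributed with variance proportional to the time left (12.3 points per game, so about 12.3 × √0.13 ≈ 4.4 points over the last 0.13 of a game). The optional `spread` drift term, which adds the pre-game spread scaled by the fraction of the game remaining, is an extra assumption for comparison with the spread-adjusted estimate in the post; it is not part of the comment or of the post's regression model.

```python
import math


def random_walk_win_prob(lead: float, frac_left: float,
                         sd_per_game: float = 12.3,
                         spread: float = 0.0) -> float:
    """Probability that a team finishes ahead under a normal random walk.

    lead      -- current margin for the team (negative when trailing)
    frac_left -- fraction of regulation remaining (6 of 48 minutes = 0.125)
    spread    -- assumed pre-game expected margin for the team; its share
                 of the remaining time is added as drift (an assumption)
    Ties and overtime are ignored.
    """
    sd_left = sd_per_game * math.sqrt(frac_left)  # 12.3 * sqrt(0.13) is about 4.4
    mean_final = lead + spread * frac_left         # expected final margin
    # P(final margin > 0) = Phi(mean_final / sd_left), with Phi computed via erfc
    return 0.5 * math.erfc(-mean_final / (sd_left * math.sqrt(2)))


even = random_walk_win_prob(lead=-17, frac_left=0.13)
favored = random_walk_win_prob(lead=-17, frac_left=0.13, spread=5.0)
print(f"even match: {even:.6f}, 5-point favorite: {favored:.6f}")
```

Either way the result is on the order of 1 in 10,000, roughly an order of magnitude below the 1-in-887 raw frequency in the first table of the post.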

    Replies
    1. > ...I was under the impression that your win probability model takes the Vegas spread as an input...
      It does, but the default graph uses zero as the point spread. So it explicitly assumes evenly matched teams.

      I agree with you that the Gambletron numbers get harder to interpret as you get closer to a 1.0 probability. You rarely get fair odds on extreme long-shot bets.

  2. Hi Mike,

    Here are my brief thoughts on win probability models.

    1- They are really hard to do, and I appreciate the effort. They add much to the context of the game.

    2- As you describe here, you are using past results to predict the future. As a result, you are reporting estimated probabilities, not true probabilities. Like any estimates, they come with error. For example, I don't think your approaches can accurately distinguish between a 1 in 500 chance of a Warriors comeback and a 1 in 50 chance. That uncertainty is important, albeit hard to report. Even saying "The Warriors had about a 1 in 200 chance of a comeback" is far preferable to "The Warriors had a 1 in 200 chance of a comeback."

    3- Not sure which of your approaches incorporate pre-game knowledge, but any probability estimate that doesn't is missing context.

    -Mike

    Replies
    1. Thanks Michael, appreciate the feedback. I agree that my model doesn't represent "true" probabilities, but I doubt any model could, given that we're trying to predict human behavior.

      I view the model as saying "Here is how past NBA teams have fared when in a similar situation". It is up to the modeler to define "similar situations", and how granular you go (and how you smooth out the noise in the historical results).

      As you call out, that is difficult to communicate. I could preface every statement with "according to my modeling" or "based on historical results", but that gets tiresome after a while. Using "about", as you suggest, may be a good compromise, but these caveats will be ignored by most.

      To your last question, the underlying model takes in pre-game knowledge. Point spread is an independent variable in the regression. For the 50/50 graphs, I deliberately set the point spread input to 0 to arrive at an unbiased view of win probability. So it deliberately strips away pre-game knowledge, which is a bad thing if you're using the model to bet, but a good thing if you want to measure things like win probability added.
