Friday, November 23, 2012

In-Game Cover Probability: A Start

This is the first post in what I hope will be a series on the topic of in-game cover probability.  I view the idea as an extension of the Win Probability graphs at Advanced NFL Stats.  For those of you not familiar with the win probability graphs, the general idea is to estimate a team's chances of winning a game given down, distance to go, field position, time remaining, and point differential.  As I understand it, the probabilities are not derived from a Madden-style simulation, but are instead based on actual game outcomes, with some LOWESS smoothing to fill in the gaps where there isn't sufficient data.

What I hope to explore here is whether a similar model can be built to determine a team's probability of covering the spread.  As an example, take the week 7 matchup between Chicago and Detroit.  With about 40 seconds to go in the game, Detroit trailed by 13, but had the ball on the Chicago 12.  The ANS probability model gave Detroit a 1% chance of winning the game at this point, so not a lot of drama, unless you had bet on the spread.  Detroit was a 6.5 point underdog.  A touchdown by Detroit would put the margin at 6 points and Detroit would cover the spread.

This is in fact what happened.  Detroit scored a touchdown, but failed to recover the onside kick, and Chicago won the game.  A cover probability graph would have looked far more "interesting" than the win probability graph at that point in the game.  I imagine this would be a tool that would be of interest to those who have bet money on an NFL game.
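
The arithmetic behind the Detroit example can be written down in a few lines.  This is only a sketch: the function name covers and the sign convention (spread is positive for an underdog, negative for a favorite) are my own choices for illustration.

```python
def covers(final_margin, spread):
    """Return True if the team covers the spread.

    final_margin: team's points minus opponent's points.
    spread: points the team is getting (+6.5 for a 6.5-point underdog,
            -6.5 for a 6.5-point favorite). Convention assumed here.
    """
    return final_margin + spread > 0


# Detroit as a 6.5-point underdog against Chicago
print(covers(-13, 6.5))  # trailing by 13: not covering
print(covers(-6, 6.5))   # after a touchdown and extra point: covering
```

With the game itself all but decided, a single score flips the cover outcome, which is why the cover graph would move so much more than the win graph.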


To build a cover probability model, I think the ANS WP model would need the following two (non-trivial) enhancements:

  1. Scoring Margin Probability Distribution: Instead of a binary win-loss probability, one would need a way to project the probability of each scoring margin (e.g. 30% probability of the team winning by 7, 10% of the team winning by 8, etc.).
  2. Matchup Dependent Probabilities: The ANS WP model, by design, starts each team with a 50% win probability and the subsequent probability changes are based solely on the in-game situation.  For a proper estimation of cover probability however, accounting for matchup is a necessity.  Otherwise, a 7 point underdog would start the game at a >50% cover probability.
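
To make the two points above concrete, here is a rough sketch of how a margin distribution rolls up into a cover probability.  The toy distribution is made up purely for illustration (it is not real NFL data), and the function names are hypothetical.

```python
def cover_probability(margin_pmf, spread):
    """Probability that final margin + spread is positive.

    margin_pmf: dict mapping final margin (team minus opponent) to probability.
    spread: points the team is getting (+7 for a 7-point underdog).
    """
    return sum(p for m, p in margin_pmf.items() if m + spread > 0)


def push_probability(margin_pmf, spread):
    """Probability that the final margin lands exactly on the spread."""
    return sum(p for m, p in margin_pmf.items() if m + spread == 0)


# Hypothetical symmetric distribution, i.e. a 50% win probability at kickoff.
toy_pmf = {-10: 0.15, -7: 0.15, -3: 0.20, 3: 0.20, 7: 0.15, 10: 0.15}

win_prob = sum(p for m, p in toy_pmf.items() if m > 0)
underdog_cover = cover_probability(toy_pmf, 7)
underdog_push = push_probability(toy_pmf, 7)

print(win_prob, underdog_cover, underdog_push)
```

Because the toy distribution is symmetric, the 7 point underdog wins half the time but covers well over half the time, which is exactly the problem with a model that ignores the matchup.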

With regard to the first needed enhancement, I initially thought that I could model things on a binary basis by just replacing win/loss with cover/not cover.  I now think that approach would not work given that teams strategize and call plays to win the game, not cover the spread (if they did the latter, the NFL would have a big problem on their hands).  Modeling margin distributions will be a challenge for sure, but I think it could provide some interesting insights into in-game dynamics.

On the topic of matchup-dependent probabilities, Neil Paine at Football Perspective took a look at this in a recent post.  The approach he took was to model scoring margin as a normally distributed variable with a mean based on the Vegas line.  This generates a scoring distribution that you can then roll up into a win probability.  In a nifty trick, he then scaled down the mean and standard deviation to project out the remaining incremental score distribution at each quarter.  This can be added to the actual margin at the quarter to get a projected final margin distribution and thus a projected win probability.
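
A minimal sketch of that general idea follows.  It is my reading of the approach, not Neil's actual code.  The standard deviation of 13.5 points is an assumed value, and so is the way the remaining distribution is scaled (mean in proportion to time remaining, standard deviation by its square root).

```python
from math import erf, sqrt

ASSUMED_MARGIN_SD = 13.5  # assumed SD of final margin around the line


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal(mean, sd)."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def pregame_win_prob(line, sd=ASSUMED_MARGIN_SD):
    """line: expected margin of victory (7 for a 7-point favorite)."""
    return 1.0 - normal_cdf(0.0, line, sd)


def in_game_win_prob(line, current_margin, frac_remaining, sd=ASSUMED_MARGIN_SD):
    """Win probability given the current margin and fraction of game left."""
    if frac_remaining <= 0:
        # Game over: the result is known (a tie is treated as a coin flip).
        return 1.0 if current_margin > 0 else (0.5 if current_margin == 0 else 0.0)
    remaining_mean = line * frac_remaining        # assumed linear scaling
    remaining_sd = sd * sqrt(frac_remaining)      # assumed sqrt-of-time scaling
    projected_mean = current_margin + remaining_mean
    return 1.0 - normal_cdf(0.0, projected_mean, remaining_sd)


# A 7-point favorite at kickoff, then trailing by 3 at halftime.
print(pregame_win_prob(7))
print(in_game_win_prob(7, -3, 0.5))
```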

Tom Baldwin at Advanced NFL Stats Community also took a look at combining the prior estimate of win probability with the live win probability graphs, finding a clever way to boil things down to a simple equation.

While these approaches work well enough for projecting win probability, I don't think they will be sufficient for projecting margin distributions and cover probabilities.  NFL scoring margins are not normally distributed, because scoring tends to come in chunks of threes and sevens (and sometimes sixes, eights, and twos).

Projected Margin Distribution - At Kickoff

Consider the following a baby-step (fetus-step?) towards the ultimate goal of building an in-game cover probability model.

My initial baby step is to create an expected margin distribution (at kickoff) for each possible Vegas point spread.  The mathematical technique I intend to use for generating margin probability distributions is known as ordered logistic regression.  This technique is well suited to situations where your outcome variable is ordered, but not necessarily linear.  It is a straightforward, non-kludgey way to generate probability distributions that reflect actual NFL scoring margins.
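
For readers unfamiliar with the technique, here is a bare-bones sketch of how an ordered logit turns a single predictor (the point spread) into a probability for each ordered outcome.  The four buckets, the cutpoints, and the coefficient below are invented for illustration; they are not fitted values from my model.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(x, beta, cutpoints):
    """Probabilities for each ordered category.

    Uses P(Y <= k) = logistic(cutpoints[k] - beta * x), with cutpoints in
    ascending order. Returns len(cutpoints) + 1 probabilities.
    """
    cumulative = [logistic(c - beta * x) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs


# Hypothetical buckets: lose by 4+, lose by 1-3, win by 1-3, win by 4+
example_cutpoints = [-1.0, 0.0, 1.0]   # made-up values
example_beta = 0.15                    # made-up coefficient on the spread

pick_em = ordered_logit_probs(0, example_beta, example_cutpoints)
favored_by_7 = ordered_logit_probs(7, example_beta, example_cutpoints)
print(pick_em)
print(favored_by_7)
```

The useful property is that the cutpoints do not have to be evenly spaced.  In a full model with one category per margin, that is what would let common margins like 3 and 7 carry more probability than their neighbors.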

See below for an example of the projected margin distribution for a team favored by 7 points (note that I am combining margins of less than -28 points or more than 28 points into single buckets on the graph):

It is interesting to note that, even for a seven point favorite, a three point margin is more likely than a seven point margin (the median is 7 but the mode is 3).  As a reality check, here are the actual margin distributions for all seven point favorites going back to 1999 (218 games):

As expected, the actuals are a bit noisier, but they seem consistent with the smoothed, theoretical distribution above.  In a separate post, I'll share a tool (see here) that will generate the theoretical distributions for any line you choose (3.5 point favorite, 7 point underdog, etc.).

Next Steps

Now that I have a theoretical margin distribution for any starting point, I need to figure out how to let that distribution evolve as the particulars of the game play out (margin, time remaining, field position, etc.).  I plan on breaking things up into the following (progressively harder) steps:
  • Quarter by Quarter - Along the lines of Neil's analysis, project out margin distribution and cover probability by quarter, as a function of the point spread and the point differential at the end of each quarter.
  • Minute by Minute - Next, I plan on extending things down to each minute (or possibly every 15 seconds).  This is where things get challenging, as smoothing will become difficult.  With a minute to go, there is a huge difference between a three point differential and a four point differential.
  • Down, Distance, and Field Position - The final step will be to create margin distributions and cover probabilities for any possible game situation, basically the same variables used in the Advanced NFL Stats win probability model.  I hope to have this ready sometime before Andrew Luck announces his retirement.
  • Bonus Points - Once I have the methodology worked out, I think I could extend this to in game over/under probabilities as well.  For example, what is the probability of the over, if the over/under was set at 42 points and 28 total points had been scored by the end of the third quarter?
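
As a placeholder for the over/under question in the last bullet, here is a very rough first pass that uses a normal approximation rather than the margin-distribution methodology I actually want to build.  Everything here is an assumption: that the expected total equals the over/under, that expected scoring is spread evenly over the game, and especially the standard deviation of 13 points for total score, which is a made-up illustrative value.

```python
from math import erf, sqrt


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal(mean, sd)."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def over_probability(total_line, points_so_far, frac_remaining, total_sd):
    """Rough probability that the final total exceeds the over/under.

    total_sd is an assumed SD of the full-game total (hypothetical input).
    """
    if frac_remaining <= 0:
        return 1.0 if points_so_far > total_line else 0.0
    expected_remaining = total_line * frac_remaining   # assumed even scoring pace
    sd_remaining = total_sd * sqrt(frac_remaining)     # assumed sqrt-of-time scaling
    points_needed = total_line - points_so_far
    return 1.0 - normal_cdf(points_needed, expected_remaining, sd_remaining)


# Over/under of 42, with 28 total points scored through three quarters.
p_over = over_probability(42, 28, 0.25, total_sd=13.0)
print(p_over)
```

The same objection applies here as with margins: total points also come in chunks, so a smooth normal curve is only a stand-in until the real distributions are worked out.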

1 comment:

  1. Here is another take on matchup-dependent win probability, courtesy of Andrew Foland at Advanced NFL Stats Community.