Saturday, October 12, 2013

The Turnover Index (possibly overfit edition)

[Image: Andrew Luck fumbles at 2009 Big Game]

The Turnover Index makes a belated return this season, with some additional tweaks and enhancements. The Turnover Index is a simple betting strategy based on the theory that the betting market overvalues defensive turnovers when evaluating future team performance. I laid out the evidence in this post: NFL Turnover Differential and the Point Spread. Using data from the 1998-2011 seasons, I found that teams that had generated at least 10 fewer defensive turnovers (season-to-date) than their opponents covered the spread 58.7% of the time.

In a series of weekly posts, I tracked the performance of this betting strategy (bet on the team with at least 10 fewer defensive turnovers) throughout the 2012 NFL season. Initial results were promising, but regressed somewhat near the end of the season. At season's end, the strategy had gone 18-16-1 against the spread, for a whopping 1% return on investment.

In this post, I will lay out a somewhat revised version of the turnover index which will allow a more sophisticated betting strategy based on the Kelly criterion for bankroll management.

Turnovers per game

In my recap post from last year, I speculated that the betting strategy may do better if one focused on turnover differential per game, rather than total turnovers. Back-testing this new strategy seemed to show better returns, so for 2013, the new Turnover Index criterion will be based on turnovers per game, rather than total turnovers.

I then built a very simple logistic regression with turnover differential per game as the sole independent variable. Fitting it to data from the past ten seasons (2003-2012) generated a coefficient of -0.11341. So, the probability of team A covering the spread against team B is as follows:
  • Atodiff = Team A defensive turnovers per game
  • Btodiff = Team B defensive turnovers per game
probability of A covering = 1 / (1 + exp(-0.11341 * (Btodiff - Atodiff)))

So, if team A had 0.5 turnovers per game and team B had 2.5 turnovers per game, team A's probability of covering the spread would be 55.6%.
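As a quick check on the formula above, here is a minimal Python sketch. The function name `cover_probability` and its argument names are made up for illustration; only the coefficient and the formula come from the post.

```python
import math

COEF = -0.11341  # coefficient quoted in the post (fit on 2003-2012)

def cover_probability(a_per_game, b_per_game, coef=COEF):
    """Probability that team A covers, given each team's defensive
    turnovers per game (hypothetical helper, not the author's code)."""
    return 1.0 / (1.0 + math.exp(coef * (b_per_game - a_per_game)))

# The example from the text: A at 0.5 per game, B at 2.5 per game.
print(f"{cover_probability(0.5, 2.5):.1%}")  # 55.6%
```

Two teams with identical turnover rates come out at exactly 50%, which is what you would expect from a model with no intercept.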

The one problem with this strategy is that it is now overconfident in the early weeks of the season, when you can get high probabilities of covering with a differential of just 3 or 4 turnovers. So, I need to add an additional betting rule (danger! danger!): Only bet if there is at least an 8-turnover gap between the teams.

Fortune's Formula

Now that I have a probability estimate of covering the spread, I can use what is known as the Kelly criterion to adjust bet size. The Kelly criterion is a betting/investment strategy that tells you what fraction of your bankroll you should invest in a particular bet in order to maximize your long-term return. As of October 2013, the Wikipedia page on the Kelly criterion is a bit of a mess, but provides a decent enough overview. For a more in-depth (and entertaining) take, I highly recommend William Poundstone's 2006 book Fortune's Formula: The Untold Story of the Scientific Betting System that Beat the Casino and Wall Street.

If you have a single bet to make, the recommended fraction of bankroll one should bet is:
  • (p * (b+1) - 1)/b
where p is the estimated "true" probability of the bet succeeding and b is the net odds offered on the bet (profit per dollar staked). For this feature, I am assuming that all point spread bets are at the standard price of $110 staked to win $100, which is equivalent to odds of 10 to 11 (b ≈ 0.91). So, if you thought that a particular team had a 55% chance of covering the spread, the Kelly criterion recommends that you bet 5.5% of your bankroll on that team.
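The single-bet formula is easy to sanity-check in a few lines of Python. This is only a sketch; `kelly_fraction` is an illustrative name, and the default odds assume the 10-to-11 pricing described above.

```python
def kelly_fraction(p, b=10 / 11):
    """Single-bet Kelly fraction: (p * (b + 1) - 1) / b.

    p is the estimated probability of winning, b the net odds
    (profit per dollar staked). Hypothetical helper for illustration.
    """
    return (p * (b + 1) - 1) / b

print(f"{kelly_fraction(0.55):.1%}")  # 5.5%
```

At these odds the fraction is zero when p = 11/21 (about 52.4%), the usual break-even rate against the spread; anything below that gives a negative fraction, meaning no bet.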

Multiple Independent Bets

The "single bet" formula above is fairly straightforward to derive with a little bit of calculus. Things get a bit hairier, though, if you have multiple simultaneous bets to make whose outcomes are uncorrelated, which is exactly the situation we're in on NFL Sundays. There is no closed-form solution in that case, and one must turn to numerical methods. With multiple independent bets, the resulting fractions all get scaled down from their single-bet versions so that you don't end up with nonsensical results like betting 200% of your bankroll when you have ten independent bets with Kelly fractions of 20%.
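The post does not say which numerical method was used, so the following is only one possible approach, sketched under stated assumptions: enumerate every win/loss combination, then maximize the expected log of the bankroll with projected gradient ascent and step halving. The function name, step size, and iteration count are all arbitrary choices for illustration, and enumeration is only practical for a handful of bets since the number of outcomes doubles with each one.

```python
import itertools
import math

def simultaneous_kelly(probs, b=10 / 11, iters=500):
    """Bet fractions maximizing expected log bankroll over independent
    bets placed at the same time (a sketch, not the author's solver)."""
    n = len(probs)
    # Every win/loss combination, with its probability and per-bet returns.
    outcomes = []
    for wins in itertools.product([True, False], repeat=n):
        prob = math.prod(p if w else 1.0 - p for p, w in zip(probs, wins))
        returns = [b if w else -1.0 for w in wins]
        outcomes.append((prob, returns))

    def wealth(fracs, returns):
        return 1.0 + sum(f * r for f, r in zip(fracs, returns))

    def growth(fracs):
        total = 0.0
        for prob, returns in outcomes:
            w = wealth(fracs, returns)
            if w <= 0.0:
                return -math.inf  # ruin in some outcome: reject this point
            total += prob * math.log(w)
        return total

    fracs = [0.0] * n
    best = growth(fracs)
    step = 0.5
    for _ in range(iters):
        grad = [0.0] * n
        for prob, returns in outcomes:
            w = wealth(fracs, returns)
            for i in range(n):
                grad[i] += prob * returns[i] / w
        # Move uphill, never betting a negative amount.
        trial = [max(0.0, f + step * g) for f, g in zip(fracs, grad)]
        value = growth(trial)
        if value > best:
            fracs, best = trial, value
        else:
            step /= 2.0  # overshot: try a smaller step next time
    return fracs

print([round(f, 3) for f in simultaneous_kelly([0.6, 0.6])])
```

With a single bet this reduces to the closed-form Kelly fraction. With two 60% bets, each fraction comes out somewhat below the 16% that the single-bet formula would give for either one alone.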

Back Testing

So, armed with our logistic regression model and the Kelly criterion, how would our bankroll have fared in prior seasons? The chart below displays cumulative bankroll growth from seasons 1999-2012. Here is a summary of the betting strategy:
  • Probability of covering based on the difference in per game defensive turnovers
  • Fit the logistic regression model on the prior 10 seasons of data (e.g. season 2005 bets based on data from 1995-2004)
  • Bets placed during weeks 4-16 of each season
  • Only bet if there is at least an eight-turnover differential between the teams
  • Use the Kelly criterion to determine what fraction of your bankroll to bet on each game

If we had started with a bankroll of $1000 in 1999, by season's end in 2012, that bankroll would have grown to $5600, for an annualized return of 13%. Promising for sure, but it's easy to find profitable betting strategies by mining the past. Carrying that forward into the future is much more difficult (see last year's Turnover Index results or home underdogs).
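The annualized figure can be reproduced with a one-line compounding calculation. The sketch below assumes 14 compounding periods, one per season from 1999 through 2012; `annualized_return` is an illustrative name.

```python
def annualized_return(start, end, periods):
    """Constant per-period growth rate that turns start into end."""
    return (end / start) ** (1 / periods) - 1

# $1000 growing to $5600 over the 14 seasons from 1999 through 2012.
print(f"{annualized_return(1000, 5600, 14):.1%}")  # 13.1%
```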

2013 Season

There were no bets to make in week 5 according to the strategy. There were, however, two recommended bets for week 4: Texans over Seahawks (12.7% of the bankroll) and Steelers over Vikings (14.6% of the bankroll). Both teams failed to cover (the Texans missed by half a point). So, for any of you foolish enough to bet real, American dollars on what I post here, my procrastination saved you 27% of your bankroll.

For tracking purposes, I will include those first two losses in the overall performance for 2013. So let's see if we can dig ourselves out of this initial hole.

I will probably create a separate post for the week 6 bets for completeness, but in case I don't get to it, here are the recommended bets (or bet, rather) for week 6 of the NFL season:

  • Chargers (+1.5) over the Colts
    • Chargers defensive turnovers: 2 (0.4 per game)
    • Colts defensive turnovers: 10 (2 per game)
    • Probability of covering: 54.5%
    • Fraction of bankroll to bet: 4.5%
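For anyone who wants to check the week 6 numbers, here is a rough sketch that strings the rules together for a single matchup: the week 4-16 window, the 8-turnover gap, the logistic model, and the single-bet Kelly formula. The function `recommend_bet` and its parameters are hypothetical, and it assumes both teams have played the same number of games, which holds for this matchup (5 each).

```python
import math

COEF = -0.11341  # coefficient quoted in the post
ODDS = 10 / 11   # $110 staked to win $100

def recommend_bet(week, a_turnovers, b_turnovers, games, min_gap=8):
    """Apply the post's rules to a bet on team A, the team with fewer
    defensive turnovers. Returns (probability, fraction) or None."""
    if not 4 <= week <= 16:
        return None  # outside the betting window
    if b_turnovers - a_turnovers < min_gap:
        return None  # total turnover gap too small
    per_game_gap = (b_turnovers - a_turnovers) / games
    p = 1.0 / (1.0 + math.exp(COEF * per_game_gap))
    fraction = (p * (ODDS + 1) - 1) / ODDS
    return p, fraction

# Week 6: Chargers (2 turnovers) against the Colts (10), 5 games each.
p, fraction = recommend_bet(6, 2, 10, 5)
print(f"{p:.1%} to cover, bet {fraction:.1%} of bankroll")
```

Since there is only one recommended bet this week, the single-bet fraction applies directly and no scaling for simultaneous bets is needed.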


  1. How well did the predicted edge match up to the realized edge? E.g., if you bucket into 50%-52%, 52%-54%, 54%-56%, etc. for predicted edge, how do the ATS results for each group look?

  2. It's not the tightest of correlations:

    Here are the results (rounded to nearest 0.01, and 0.57 and above grouped into one bucket):
    predicted | games | actual
    0.52 | 55 games | 52.7%
    0.53 | 150 games | 48.0%
    0.54 | 115 games | 55.7%
    0.55 | 77 games | 57.1%
    0.56 | 69 games | 60.9%
    >0.56 | 99 games | 52.5%

  3. Another way to deal with multiple simultaneous bets - as long as there aren't too many of them - is to use parlays.
    Let's say you have bet A and bet B, which call for fractions a and b of your bankroll respectively, and you're getting 10-to-11 odds on both. Then you can bet (1-b)*a of your bankroll on bet A, (1-a)*b of your bankroll on bet B, and a*b of your bankroll on the 2-way parlay.
    With standard payouts, a two-way payout is slightly worse than a sequential bet payout, but IIRC the lower payout can be offset by the ability to wager more efficiently.

    This method can be generalized to any number of simultaneous bets, although the parlay wagers quickly become numerous and small.