Methodology

Team Ranking Methodology

Imagine a sports league with only three teams: Team Rock, Team Paper, and Team Scissors. Paper plays Rock tonight and Paper is favored by 5 points. Rock plays Scissors tomorrow night, and in that game, Rock is favored by 3 points. Paper is not due to play Scissors, so we don't know how the betting market evaluates the relative strengths of those two teams. However, since the market thinks Paper is better than Rock by 5 points and Rock is better than Scissors by 3 points, we could guess that Paper would be favored over Scissors by 8 points (= 5 + 3).

So, if you were to make a rudimentary power ranking from those point spreads, you could come up with the following:

1 - Paper: 5
2 - Rock: 0
3 - Scissors: -3

That is basically what I am doing with the Betting Market Power Rankings, only with messier data. I look back over the season's games and use standard linear regression techniques to come up with a set of rankings that are most consistent with the point spreads for those games. The market's evaluation of each team changes over time, and I'm always looking for what the market is thinking as of today, so I weight recent games more heavily. Here are the weights I use:

NFL

Weight = 1 / (weeks ago + 0.4), so current week gets a weight of 2.5, the prior week a weight of 0.71, etc.

NCAA Football

Weight = 1 / (weeks ago + 0.2), so current week gets a weight of 5, prior week a weight of 0.83, etc.

NBA

Weight = 1 / (games ago + 1.5), so the most recent game gets a weight of 2/3, the prior game a weight of 2/5, etc.

MLB

Weight = 1 / (days ago + 1), so today's game gets a weight of 1, yesterday's game a weight of 0.5, etc.

NCAA Basketball

Weight = 1 / (games ago + 0.5), so the most recent game gets a weight of 2, the prior game a weight of 2/3, etc.
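
The five weight formulas above share one form, so they can be written as a single function. This is a minimal sketch: the constants are the ones quoted in the text, while the names RECENCY_CONSTANT and recency_weight are made up for illustration.

```python
# Recency constants quoted in the text. The dictionary and function
# names are illustrative only, not taken from the original rankings code.
RECENCY_CONSTANT = {
    "NFL": 0.4,              # time unit: weeks
    "NCAA Football": 0.2,    # time unit: weeks
    "NBA": 1.5,              # time unit: games
    "MLB": 1.0,              # time unit: days
    "NCAA Basketball": 0.5,  # time unit: games
}


def recency_weight(sport, time_elapsed):
    """Weight = 1 / (time elapsed + constant), in the sport's own time unit."""
    return 1.0 / (time_elapsed + RECENCY_CONSTANT[sport])


# Print the weight of the most recent period and the one before it.
for sport in RECENCY_CONSTANT:
    newest = recency_weight(sport, 0)
    previous = recency_weight(sport, 1)
    print(f"{sport}: {newest:.2f}, {previous:.2f}")
```

A smaller constant makes the weights fall off faster: with 0.2 (NCAA Football) the prior week counts about one sixth as much as the current week, while with 1.5 (NBA) the prior game still counts 60% as much as the most recent one.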

The form of the weight function I am using, 1 / (time elapsed + constant), is equivalent to assuming that the market's evaluation of team strength "jiggles" randomly up or down week to week (or day to day) according to a normal distribution. The same type of assumption, a random walk, is used quite often in modeling stock price movements (e.g. the Black-Scholes model).
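
One way to see where this form can come from, under simplifying assumptions that are not spelled out in the text: suppose the market's rating moves by an independent normal step with variance \sigma^2 each period, and a spread s_t observed t periods ago measures the rating at that time with noise variance \tau^2. Relative to today's rating r_0, the old spread then carries both sources of error, and weighting each game by the inverse of its variance gives:

```latex
\[
\operatorname{Var}(s_t - r_0) = \tau^2 + t\,\sigma^2,
\qquad
w_t \;\propto\; \frac{1}{\tau^2 + t\,\sigma^2}
\;\propto\; \frac{1}{t + c},
\qquad
c = \frac{\tau^2}{\sigma^2}.
\]
```

Read this way, the constant is the ratio of measurement noise to per-period drift, so a small constant corresponds to a market whose opinion moves quickly relative to the noise in any one spread. The symbols \sigma, \tau and c are introduced here for illustration only, and the sketch ignores the correlation between games that share the same stretch of the random walk.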

The value of the constant chosen for each sport was derived by looking at past seasons' data and choosing a value that minimized the prediction error of future point spreads.
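
The recency-weighted regression described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code behind the actual rankings: the function name fit_ratings, the coordinate-descent solver (chosen only to avoid extra libraries), and the three-week-old spread in the sample data are all hypothetical. It ignores home-field advantage and anything else the real regression may include.

```python
def fit_ratings(games, constant, sweeps=200):
    """Recency-weighted least-squares ratings (illustrative sketch).

    games: list of (favorite, underdog, spread, weeks_ago) tuples.
    Minimizes sum of weight * (rating[fav] - rating[dog] - spread)**2,
    with weight = 1 / (weeks_ago + constant), by re-solving for one
    team at a time while holding the others fixed (coordinate descent).
    """
    teams = sorted({team for fav, dog, _, _ in games for team in (fav, dog)})
    ratings = {team: 0.0 for team in teams}
    for _ in range(sweeps):
        for team in teams:
            num = den = 0.0
            for fav, dog, spread, weeks_ago in games:
                w = 1.0 / (weeks_ago + constant)
                if team == fav:
                    # The spread says this team is `spread` better than the underdog.
                    num += w * (ratings[dog] + spread)
                    den += w
                elif team == dog:
                    # The spread says this team is `spread` worse than the favorite.
                    num += w * (ratings[fav] - spread)
                    den += w
            ratings[team] = num / den
        # Ratings are only defined up to a constant shift; center them on zero.
        mean = sum(ratings.values()) / len(teams)
        ratings = {team: r - mean for team, r in ratings.items()}
    return ratings


games = [
    ("Paper", "Rock", 5.0, 0),     # this week's spread, from the example
    ("Rock", "Scissors", 3.0, 0),  # this week's spread, from the example
    ("Paper", "Rock", 3.0, 3),     # hypothetical older spread, made up here
]
ratings = fit_ratings(games, constant=0.4)  # 0.4 is the NFL constant
for team, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {rating:+.2f}")
```

With these inputs the fitted gap between Paper and Rock lands between the two quoted spreads but much closer to 5 than to 3, because this week's game carries a weight of 2.5 while the three-week-old game carries a weight of only about 0.29.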

How the Betting Market Reacts to Game Results (Gamblers are Bayesians)

Although the approach above generates a set of rankings, it ignores some potentially useful information that could be used to better match the coming week’s point spreads. For example, in week 9 of the 2011 NFL season, New England was favored by 9.5 points over the New York Giants. However, the Giants ended up winning that game by 4. So, the outcome of the game deviated from the market’s expectation by 13.5 points. One would expect the market to factor that result into future estimates of both New England’s and New York’s strength. I assumed that the betting market would recalibrate itself according to the following formula:

revised “best estimate” spread = original spread + (credibility coefficient) x (deviation from expected)

I then determined the value of that credibility coefficient by trial-and-error optimization. For the NFL, I found that a coefficient of 15% generated the most accurate prediction of the coming week’s spreads. In other words, the betting market appears to treat the outcome of each game with 15% credibility when revising its estimates of each team’s strength. So, in the New England/New York example above, if those two teams had been scheduled to play each other at New England again, the spread would have been revised down from 9.5 points to about 7.5 points (= 9.5 + 0.15 * (-9.5 - 4) = 7.475).
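
The revision formula is simple enough to check by hand, but a small sketch makes the sign convention explicit. The function name revised_spread is hypothetical, and the code assumes both the spread and the final margin are measured from the favorite's point of view, so an upset loss is a negative margin.

```python
def revised_spread(original_spread, actual_margin, credibility):
    """Revised spread = original spread + credibility * deviation from expected.

    Both original_spread and actual_margin are from the favorite's point of
    view (assumption for this sketch): a favorite that loses by 4 has an
    actual_margin of -4.
    """
    deviation = actual_margin - original_spread
    return original_spread + credibility * deviation


# New England favored by 9.5, lost by 4, NFL credibility coefficient of 15%.
new_spread = revised_spread(original_spread=9.5, actual_margin=-4.0, credibility=0.15)
print(round(new_spread, 3))
```

A credibility of 0 would leave the spread untouched at 9.5, and a credibility of 1 would replace it entirely with the last result (-4), so the 15% figure means the market moves only a small part of the way toward what just happened.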

Here are the credibility coefficients I use for each sport, where once again the value was chosen by looking at past seasons and picking the value that minimized the prediction error of future point spreads: