Methodology

Team Ranking Methodology


Imagine a sports league with only three teams: Team Rock, Team Paper, and Team Scissors. Paper plays Rock tonight, and Paper is favored by 5 points. Rock plays Scissors tomorrow night, and in that game Rock is favored by 3 points. Paper is not due to play Scissors, so we don't know how the betting market evaluates the relative strengths of those two teams. However, since the market thinks Paper is better than Rock by 5 points and Rock is better than Scissors by 3 points, we could guess that Paper would be favored over Scissors by 8 points (= 5 + 3).

So, if you were to make a rudimentary power ranking from those point spreads, you could come up with the following:
  • 1 - Paper: 5
  • 2 - Rock: 0
  • 3 - Scissors: -3 
That is basically what I am doing with the Betting Market Power Rankings, only with messier data. I look back over the season's games and use standard linear regression techniques to come up with a set of rankings that is most consistent with the point spreads for those games. The market's evaluation of those teams changes over time, and I'm always looking for what the market is thinking as of today, so I weight recent games more heavily. Here are the weights I use:
  • NFL
    • Weight = 1 / (weeks ago + 0.4), so current week gets a weight of 2.5, the prior week a weight of 0.71, etc.
  • NCAA Football
    • Weight = 1 / (weeks ago + 0.2), so current week gets a weight of 5, prior week a weight of 0.83, etc.
  • NBA
    • Weight = 1 / (games ago + 1.5), so the most recent game gets a weight of 2/3, the prior game a weight of 2/5, etc.
  • MLB
    • Weight = 1 / (days ago + 1), so today's game gets a weight of 1, yesterday's game a weight of 0.5, etc.
  • NCAA Basketball
    • Weight = 1 / (games ago + 0.5), so the most recent game gets a weight of 2, the prior game a weight of 2/3, etc.
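
As a rough illustration, the weights listed above can be reproduced with a few lines of Python. The constants are copied from the list; the names `RECENCY_CONSTANT` and `recency_weight` are made up for this sketch and are not taken from the actual ranking code.

```python
# Recency constants copied from the list above. The dictionary and function
# names are illustrative assumptions, not the author's actual code.
RECENCY_CONSTANT = {
    "NFL": 0.4,              # elapsed time measured in weeks
    "NCAA Football": 0.2,    # weeks
    "NBA": 1.5,              # games
    "MLB": 1.0,              # days
    "NCAA Basketball": 0.5,  # games
}


def recency_weight(sport, elapsed):
    """Return 1 / (elapsed + constant) for a game `elapsed` periods ago."""
    if elapsed < 0:
        raise ValueError("elapsed must be non-negative")
    return 1.0 / (elapsed + RECENCY_CONSTANT[sport])


# Print the weights for the current period and the two before it.
for sport in RECENCY_CONSTANT:
    print(sport, [round(recency_weight(sport, t), 2) for t in range(3)])
```
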
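The form of the weight function I am using (weight = 1 / (time elapsed + constant)) is equivalent to assuming that the market's evaluation of team strength "jiggles" randomly up or down week to week (or day to day) according to a normal distribution. Under that assumption, the uncertainty (variance) in how well an old point spread reflects today's team strength grows linearly with the time elapsed, and weighting each game by the inverse of that variance gives the 1 / (time elapsed + constant) form. The same type of assumption, a random walk, is used quite often in modeling stock price movements (e.g. the Black-Scholes model).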

The value of the constant chosen for each sport was derived by looking at past seasons' data and choosing a value that minimized the prediction error of future point spreads.
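
To make the regression step concrete, here is a minimal sketch of a weighted least-squares fit on the Rock/Paper/Scissors example. It assumes each spread is modeled as (favorite's rating - underdog's rating), ignores home-field advantage, and pins one reference team at 0 because spreads only determine ratings up to a constant. The function name `power_ratings` and the tuple layout are hypothetical, and the actual rankings may be computed differently.

```python
def power_ratings(games, reference):
    """Weighted least-squares ratings from point spreads.

    games: list of (favorite, underdog, spread, weight) tuples.
    Returns {team: rating} with `reference` pinned at 0.
    """
    teams = sorted({t for g in games for t in g[:2]})
    free = [t for t in teams if t != reference]
    idx = {t: i for i, t in enumerate(free)}
    n = len(free)

    # Build the augmented normal-equation matrix [A | b].
    m = [[0.0] * (n + 1) for _ in range(n)]
    for fav, dog, spread, w in games:
        signed = ((fav, 1.0), (dog, -1.0))
        for team, sign in signed:
            if team == reference:
                continue
            i = idx[team]
            m[i][n] += sign * w * spread
            for other, osign in signed:
                if other != reference:
                    m[i][idx[other]] += sign * osign * w

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("teams are not connected by enough games")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]

    ratings = {reference: 0.0}
    for t in free:
        ratings[t] = m[idx[t]][n] / m[idx[t]][idx[t]]
    return ratings


# The three-team example: Paper -5 over Rock, Rock -3 over Scissors.
games = [("Paper", "Rock", 5.0, 1.0), ("Rock", "Scissors", 3.0, 1.0)]
print(power_ratings(games, reference="Rock"))
```

With only these two games the spreads are perfectly consistent, so the weights have no effect on the result. They start to matter once the season's spreads disagree with each other, at which point higher-weight (more recent) games pull the ratings harder.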


How the Betting Market Reacts to Game Results (Gamblers are Bayesians)

Although the approach above generates a set of rankings, it ignores some potentially useful information that could be used to better match the coming week’s point spreads. For example, in week 9 of the 2011 NFL season, New England was favored by 9.5 points over the New York Giants. However, the Giants ended up winning that game by 4. So, the outcome of the game deviated from the market’s expectation by 13.5 points. One would expect the market to factor that result into future estimates of both New England’s and New York’s strength. I assumed that the betting market would recalibrate itself according to the following formula:

revised “best estimate” spread = original spread + (credibility coefficient) x (deviation from expected)

I then determined what that credibility coefficient was by trial-and-error optimization. For the NFL, I found that a coefficient of 15% generated the most accurate prediction of the coming week’s spreads. In other words, the betting market appears to treat the outcome of each game with 15% credibility when revising its estimates of each team’s strength. So, in the New England/New York example above, if those two teams had been scheduled to play each other at New England again, the spread would have been revised down from 9.5 points to about 7.5 points (= 9.5 + 0.15 * (-9.5 - 4) = 7.475).

Here are the credibility coefficients I use for each sport, where once again the value was chosen by looking at past seasons and picking the value that minimized the prediction error of future point spreads:

  • NFL - 15%
  • NCAA Football - 20%
  • NBA - 20%
  • MLB - 5%
  • NCAA Basketball - 15%
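
The revision formula and the coefficients above can be sketched in Python as follows. The coefficients come from the list; `CREDIBILITY` and `revised_spread` are names invented for this illustration, and the sign convention (spread and margin both measured from the favorite's side) is an assumption chosen to match the New England example.

```python
# Credibility coefficients copied from the list above. The names below are
# illustrative assumptions, not the author's actual code.
CREDIBILITY = {
    "NFL": 0.15,
    "NCAA Football": 0.20,
    "NBA": 0.20,
    "MLB": 0.05,
    "NCAA Basketball": 0.15,
}


def revised_spread(original_spread, actual_margin, credibility):
    """Revise a spread after a game result.

    Both arguments are from the favorite's point of view: a positive spread
    means favored by that many points, and a negative margin means the
    favorite lost by that many points.
    """
    deviation = actual_margin - original_spread
    return original_spread + credibility * deviation


# New England favored by 9.5, lost by 4: deviation = -4 - 9.5 = -13.5.
new_spread = revised_spread(9.5, -4.0, CREDIBILITY["NFL"])
print(round(new_spread, 3))  # 7.475
```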

11 comments:

  1. Have you tried applying your analysis to horse racing?

    1. Funny you should ask. I've got an approach mapped out in my head that would work for horse racing.

      What's holding me back is data access. Killersports.com (and their related websites) have been a fantastic resource for the NFL, NCAA, and NBA. But there just doesn't appear to be anything like that for horse racing. Data, in general, is very locked down. I think it's hurting the sport and the interest it could attract.

      Let me know if you're aware of any open data sources for past final odds information on horse racing. I know there's Equibase, Brisnet, and the Daily Racing Form, but they all charge for their data.

    2. I would definitely be interested in knowing how you would go about modelling horse racing. I was thinking about using a chess rating system...but it would have to be multivariate in the sense of horse*jockey*trainer combinations (as jockeys and trainers frequently change). How to incorporate all these factors...I don't know.

      As for data, horseplayerinteractive.com keeps all horse races and results on archive. For each race, you can also see the morning line odds and their progression to final odds closer to post time. However, the data isn't available in a readily downloadable format. So the only way to make use of it is to build a dataset manually...which will take quite some time.

      I am considering putting together a data set but I want to know that I have some kind of model for it before I do.

  2. btw...you have to be a member of horseplayerinteractive.com to view the information...it's easy to make an account

  3. I like your approach, have you considered posting college football figures?

    1. As luck would have it, I have. I've got most of it put together, I just need to find time to post it. Probably this weekend.

  4. There have been discussions elsewhere, but I figured I'd ask for confirmation here: do you use a flat 2.5 point home field advantage for NFL teams?

  5. I like your analysis. My only question is about your assumption that the "sports market" is essentially a random walk. Do you have any data to back this up? I would think the random walk model cannot be applied to sports because of inherent biases prevalent in a majority of bettors (hometown bias, short-term memory) that are not apparent in stocks, where economically rational, institutional investors crowd out irrational decisions.

    1. I don't have any hard data. I can say that the predictive accuracy of the rankings improved under the random walk assumption, compared to the prior, more arbitrary weights I had been using.

      With regard to bettor biases, I think they are not as significant as some may believe (i.e. the betting market is fairly efficient). Was there a particular bias you had in mind?

  6. This may have been suggested already, but shouldn't you use a betting wager as the starting point for your NBA win probability instead of each team having a 50/50 chance?
