Sunday, October 6, 2013

Early Season Power Rankings


When it comes to power rankings, week 5 makes for interesting times. We have just enough data (four games) to start questioning our preconceived notions of who the strongest and weakest teams are. Are the Chiefs as good as their undefeated record suggests? Are the Steelers and Giants really as bad as their 0-4 marks? Once you get past the midpoint of the season, most power rankings, for better or worse, tend to coalesce around win-loss record. But with just four games in the books, even the pundits try to be good little Bayesians when it comes to evaluating small sample sizes.

We tend to forget how far off the mark our early season evaluations can be, so for this post I decided to look at how various NFL power rankings evolved over past seasons and, most importantly, how well those rankings correlated with future wins.

A Cross Section of Power Rankings

I compiled results for five different power rankings, with the goal of getting a broad cross section of the different types of rankings that are out there.
  • Subjective, journalist/pundit rankings - These tend to be the most popular, and you can find them on most of the major sports websites: ESPN, CBS Sports, NFL.com, etc. For this analysis, I am using ESPN's official NFL power rankings.
  • Quantitative, proprietary rankings - Football Outsiders' DVOA rankings get a lot of attention and are based on a proprietary evaluation of play-by-play team performance.
  • Quantitative "open source" rankings - A main competitor to DVOA is the set of Team Efficiency rankings published by Brian Burke of Advanced NFL Stats. In contrast to Football Outsiders, Brian is much more upfront about how he measures team strength.
  • Quantitative "simple" rankings - One could get really simple and just rank teams by win-loss record or average margin of victory, but for this analysis I will use the Simple Ranking System, versions of which you can find at Football Perspective or Pro Football Reference. The ranking is based on average margin of victory (or loss), with an adjustment for strength of schedule.
  • Betting market rankings - You can find these here at inpredictable, the Linemakers at Sporting News, or the handicapper rankings at ESPN. Predictably, the analysis below uses my rankings.
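The idea behind the Simple Ranking System can be sketched in a few lines of Python. Each team's rating is its average margin of victory plus the average rating of the opponents it has faced, and the sketch solves this by repeated substitution. Several things here are assumptions for illustration only:

- The input format, a list of `(team_a, team_b, points_a, points_b)` tuples, is hypothetical.
- The function name `simple_ranking_system` is hypothetical.
- The damped update is a choice made for this sketch.
- It assumes every team has played the same number of games, which is true after week four.

The versions at Pro Football Reference and Football Perspective may differ in their details.

```python
from collections import defaultdict


def simple_ranking_system(games, tol=1e-9, max_iter=10_000):
    """Rate teams so that rating = average margin + average opponent rating.

    `games` is a hypothetical list of (team_a, team_b, points_a, points_b).
    Assumes every team has played the same number of games (as after week
    four in the NFL), so ratings can be re-centered to average zero.
    """
    margins = defaultdict(list)    # team -> point margin in each game
    opponents = defaultdict(list)  # team -> opponents faced
    for a, b, pts_a, pts_b in games:
        margins[a].append(pts_a - pts_b)
        margins[b].append(pts_b - pts_a)
        opponents[a].append(b)
        opponents[b].append(a)

    mov = {t: sum(m) / len(m) for t, m in margins.items()}
    ratings = dict(mov)  # start from the raw average margin of victory
    for _ in range(max_iter):
        # Target: own margin plus strength of schedule (avg opponent rating).
        target = {
            t: mov[t] + sum(ratings[o] for o in opponents[t]) / len(opponents[t])
            for t in ratings
        }
        # Damped update (half old, half new) avoids oscillating on some schedules.
        new = {t: 0.5 * ratings[t] + 0.5 * target[t] for t in ratings}
        mean = sum(new.values()) / len(new)
        new = {t: r - mean for t, r in new.items()}  # keep ratings centered on 0
        converged = max(abs(new[t] - ratings[t]) for t in ratings) < tol
        ratings = new
        if converged:
            break
    return ratings
```

In a round robin, where every team plays every other team once, the result is simply 3/4 of each team's raw margin in a four-team league. The ordering therefore matches margin of victory. The strength-of-schedule adjustment only changes the order once teams have faced different opponents, as they have after four NFL weeks.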

Evaluation

The challenge in evaluating different rankings is that most are just that: a ranking. As far as I know, only Advanced NFL Stats uses its rankings to publish specific game predictions. So, in order to evaluate the various ranking systems consistently, I am going to use Spearman's rank correlation coefficient, which measures the similarity of two different ordinal rankings. A value of 100% means complete agreement from #1 to #32, and disagreements drag the score down. If a model ranks a team #25 and that team goes on to rank #3 in future wins, that will reflect poorly in the model's score.
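When two rankings have no ties (each system ranks teams #1 through #32 exactly once), Spearman's coefficient has a closed form:

ρ = 1 - 6Σd²/(n(n² - 1))

Here d is the difference between a team's two ranks and n is the number of teams. Below is a minimal Python version of that textbook formula. The function name and the dict-of-ranks input are illustrative assumptions, not the code used for this post.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for two tie-free rankings of the same teams.

    rank_a, rank_b: dicts mapping team -> rank (1 = best), e.g. 1..32.
    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which is valid only
    when neither ranking contains ties.
    """
    if rank_a.keys() != rank_b.keys():
        raise ValueError("both rankings must cover the same teams")
    n = len(rank_a)
    d_squared = sum((rank_a[t] - rank_b[t]) ** 2 for t in rank_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

Take the #25 example above. Suppose a model's ranking matches the outcome except that its #25 team finishes #3, and teams #3 through #24 each slip one spot. Then Σd² = 22² + 22 × 1² = 506, and ρ = 1 - 6 × 506 / (32 × 1023) ≈ 0.91, or about 91%.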

For the prior four NFL seasons (2009-2012), I compiled the following:
  • Week 5 Rankings: rankings after completion of week four of the season
  • Week 17 Rankings: rankings after completion of week sixteen of the season (I'm excluding week seventeen as many teams have nothing to play for at that point)
  • Future Win-Loss Record Ranking: I ranked each team's win-loss record for weeks five through sixteen combined. I can then see how that ranking of future wins correlates with the Week 5 ranking.
The tables below summarize the Spearman coefficient of the Week 5 rankings when compared to both the Week 17 rankings (testing for consistency) and the Future Win-Loss ranking (testing for predictive accuracy). This is an imperfect measure of predictive accuracy, but it is one of the few options available when a system publishes only an ordinal ranking, so keep that in mind when evaluating the results below.
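Win-loss records produce many ties, since several teams can finish weeks five through sixteen with the same record. That means the future-wins ranking cannot use the tie-free formula directly. The post does not say how teams with identical records were ranked.

A common convention, and the one `scipy.stats.spearmanr` applies, is to give tied teams the average of the positions they occupy. You then take the Pearson correlation of the two rank vectors. Below is a minimal pure-Python sketch under that assumption. The function names and the `team -> wins` input are hypothetical.

```python
from math import sqrt


def rank_with_ties(wins):
    """Rank teams by wins (most wins = rank 1); tied teams share the average rank.

    wins: hypothetical dict team -> wins over weeks 5-16.
    """
    ordered = sorted(wins, key=wins.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # extend j over the run of teams tied with ordered[i]
        while j + 1 < len(ordered) and wins[ordered[j + 1]] == wins[ordered[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for team in ordered[i : j + 1]:
            ranks[team] = shared
        i = j + 1
    return ranks


def spearman_with_ties(rank_a, rank_b):
    """Spearman's rho as the Pearson correlation of two rank dicts (handles ties)."""
    teams = list(rank_a)
    xs = [rank_a[t] for t in teams]
    ys = [rank_b[t] for t in teams]
    n = len(teams)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

Consider four hypothetical teams with 10, 8, 8 and 3 wins. The two 8-win teams each receive rank 2.5, and the ranks still sum to 1 + 2 + 3 + 4.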

Results

The first table below tests the correlation between each ranking system's Week 5 ranking and its Week 17 ranking. A higher score isn't necessarily better in this context (there's consistency, and then there's foolish consistency).

Week 5 to Week 17 Correlation
Ranking  Average  2009  2010  2011  2012
ESPN     50%      67%   29%   65%   41%
DVOA     58%      69%   44%   64%   54%
ANS      65%      79%   67%   69%   44%
Market   55%      66%   49%   60%   46%
SRS      51%      62%   39%   58%   46%

The Advanced NFL Stats efficiency ranking seems to have the most consistency throughout the season. That is not too surprising, as the metrics used for that ranking were deliberately chosen for in-season stability (and for their correlation with wins, of course).

The next table shows how the Week 5 rankings correlated to future wins, using the same Spearman coefficient methodology. A higher value means a better correlation to future win-loss record.


Week 5 Ranking Correlation to Future Wins
Ranking  Average  2009  2010  2011  2012
ESPN     47%      51%   55%   43%   39%
DVOA     49%      47%   46%   41%   65%
ANS      37%      51%   15%   49%   32%
Market   55%      67%   45%   52%   56%
SRS      49%      47%   38%   56%   54%
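
The "average" column is the mean of the four season values. Recomputing it from the rounded yearly figures reproduces the ordering discussed below, with the market ranking first and ANS last. DVOA, however, comes out at 49.75% rather than the 49% shown, presumably because the published averages were taken over unrounded coefficients. Here is a quick Python check, with the values copied from the table:

```python
# Week 5 ranking vs. future wins: Spearman coefficients for 2009-2012 (from the table).
future_wins = {
    "ESPN": [0.51, 0.55, 0.43, 0.39],
    "DVOA": [0.47, 0.46, 0.41, 0.65],
    "ANS": [0.51, 0.15, 0.49, 0.32],
    "Market": [0.67, 0.45, 0.52, 0.56],
    "SRS": [0.47, 0.38, 0.56, 0.54],
}

# Mean over the four seasons for each ranking system.
averages = {name: sum(vals) / len(vals) for name, vals in future_wins.items()}

# Print systems from best to worst average correlation.
for name in sorted(averages, key=averages.get, reverse=True):
    print(f"{name:<7}{averages[name]:.2%}")
```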

The results here surprised me. I had never been a fan of the DVOA rankings, for several reasons: it's a black box, it's unitless, and it's not clear to me what, exactly, it is attempting to measure (despite a rather lengthy methodology description). Setting all that aside, DVOA seems to do a decent job of predicting future wins within a season, at least according to this metric.

Just as surprising was the performance of the Advanced NFL Stats model. 2010 was a particularly rough year for that model, and a big driver of its low 15% score was the Falcons' strong season. The Falcons finished 13-3 with the NFC's #1 seed, but rarely, if ever, cracked the top 20 in the ANS model. Although the model whiffed in the regular season, one could argue that ANS got the last laugh in the postseason: the #3-ranked (but 10-6) Packers demolished the Falcons in Atlanta en route to a Super Bowl victory.

The model with the best performance over the four-year period was the market-based ranking published here, which is further evidence that the NFL betting market is fairly efficient. And despite its simplicity, the Simple Ranking System held its own against the more sophisticated algorithms of Football Outsiders and Advanced NFL Stats.

The ESPN rankings fared better than I expected, but still behind DVOA, SRS, and the market-based rankings.

For anyone who is interested, I saved the raw data for this analysis in a Google Docs spreadsheet. The market ranking is prefixed with "VR", for Vegas Rank, and "wrnk" is the ranking of each team's win-loss record for weeks five through sixteen. As I said, these results surprised me, so any feedback or double-checking would be appreciated.

I'll do a follow-up post later this season on how these models performed in 2013.

5 comments:

  1. Do these change much if you take a different beginning point like week 8? I know specifically the ANS model regresses team stats heavily to the mean until week 8.
    Just curious if it fares better after week 8.

    http://www.advancednflstats.com/2007/10/instability-compensation.html

  2. I looked at week 8 as well (which I may share in a follow-up), and while the results converged a bit, the overall positioning was roughly the same.

    1. Looking fwd. to the week 8 update. Thanks.

  3. There's obviously a sample size issue, but looking at how much the ANS rankings are changing this week compared to the Vegas based ones you have here, (161 vs 44 team rank changes) it's a little surprising that ANS is more self-predictive.

  4. Thanks for posting nice content. Here you may also gain information concerning research paper writing.
