Saturday, October 13, 2018

Betting Market Rankings for the NHL

In what is surely the second most exciting development of the NHL season, I have added hockey to my suite of betting market team rankings. For those unfamiliar, the basic idea of these rankings is to reverse-engineer an implied team ranking from the game-by-game point spreads, moneylines, and over/unders. See my post at the Advanced NFL Stats Community site for a basic overview of the concept. With this latest addition, I now have daily market rankings for the NFL, College Football, NBA, WNBA, College Basketball, MLB, and the NHL.

The nice thing about market-derived rankings is that you can get a reasonable ranking from a relatively small sample. We are just a week into the season, and the rankings already pass a sniff test: the top 5 of Tampa Bay, Nashville, Winnipeg, Toronto, and Pittsburgh in my rankings are also the top 5 favorites to win the Stanley Cup.

Another nice feature of the rankings is that they react more quickly than traditional stat-based rankings. If a key injury or roster change materially affects team strength, the rankings tend to recalibrate within a week or so. Because hockey outcomes are quite random (similar to baseball), any stat-based ranking is going to be like a large ship, slowly steering toward the right heading.

Here is what the ranking table looks like for the top 10 teams:

And here is an explanation of the fields:

  • LstWk: The team's ranking a week ago, with the day-by-day progress shown in the sparkline.
  • GAA: Goals Above Average. The expected margin of victory (or defeat) against a league-average team.
  • oGAA: Offensive Goals Above Average. Using the totals, team strength can be decomposed into offense and defense. This is the component of team strength attributable to the team's ability to score goals.
  • dGAA: Defensive Goals Above Average. The defensive component of GAA: the part of team strength attributable to the team's ability to prevent the opponent from scoring goals.
  • GWP: Generic Win Probability. Expected win probability against a league-average team on neutral ice. The spread here is similar to Major League Baseball, with the best team at just a 60% win probability. Contrast that with the NFL, where the best team in the league, the Los Angeles Rams, would have a 78% chance of beating an average team on a neutral field.
  • Points: Standings in the NHL are determined via a points system: 2 points for a win (in regulation or OT), 1 point for an OT loss, 0 points for a regulation loss.
  • Projections: See next section


As I have done with other sports, I can use these rankings to simulate the remainder of the season and project out playoff seeding. Games are simulated via the following formula:
home team win probability = 1 / (1 + exp(-(0.28 + home GAA - visitor GAA) / 1.318))
Here, 0.28 is the home-ice advantage, in goals. Both that value and the 1.318 scale factor were derived from actual game results and optimized for prediction accuracy.
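As a rough illustration, the simulation formula can be written as a small Python function. The function and constant names are hypothetical; only the 0.28 and 1.318 values come from this post. Dropping home ice and setting the opponent to a league-average team (GAA of 0) gives a neutral-ice version, which is presumably how GAA relates to the GWP column; treat that second function as an assumption, not the exact calculation behind the table.

```python
import math

HOME_ICE_GOALS = 0.28  # home-ice advantage in goals (value from the post)
SCALE = 1.318          # goals per unit of log odds (value from the post)


def home_win_prob(home_gaa: float, visitor_gaa: float) -> float:
    """Home team's win probability from each team's Goals Above Average."""
    margin = HOME_ICE_GOALS + home_gaa - visitor_gaa  # expected home goal margin
    return 1.0 / (1.0 + math.exp(-margin / SCALE))


def generic_win_prob(gaa: float) -> float:
    """Assumed GWP: win probability vs. a league-average team (GAA 0) on neutral ice."""
    return 1.0 / (1.0 + math.exp(-gaa / SCALE))


# Two league-average teams: home ice alone is worth roughly a 55% win probability.
print(round(home_win_prob(0.0, 0.0), 3))
# A 60% GWP corresponds to a GAA of 1.318 * ln(1.5), roughly 0.53 goals.
print(round(generic_win_prob(SCALE * math.log(1.5)), 3))
```

Under this formula, the 60% ceiling on GWP mentioned earlier corresponds to a team only about half a goal better than average.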

Once the full season is simulated, seeding and tiebreaker rules are applied, and the process is repeated 10,000 times. The outcomes are summarized for each team in terms of projected points, probability of making the playoffs, and those odd-looking bar charts, which summarize each team's playoff seed probabilities. Here is the chart for the Florida Panthers:

Going left to right, the red bars are the non-playoff seeds and the eight blue bars on the right are the playoff seeds. Unlike the NFL and NBA, NHL playoff seeds don't follow a straightforward ranking system, so here is an explanation:

  • Seed 1 (furthest right): Best record in the conference.
  • Seed 2: The division winner that is not Seed 1.
  • Seeds 3&4: The 2nd-ranked teams from each division, with Seed 3 given to the higher-ranked of the two.
  • Seeds 5&6: The 3rd-ranked teams from each division, with Seed 5 given to the higher-ranked of the two.
  • Seeds 7&8: The two wild card teams, with Seed 7 given to the higher-ranked of the two.
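The seeding rules above can be sketched in Python for a single conference. This is a hypothetical helper, not the code behind the rankings: it assumes exactly two divisions with at least three teams each, uses points as the only ranking criterion, and breaks ties by input order instead of the NHL's actual tiebreakers. The team names and point totals in the example are made up.

```python
from typing import Dict, List, Tuple

Standing = Tuple[str, int]  # (team, points)


def assign_seeds(divisions: Dict[str, List[Standing]]) -> Dict[int, str]:
    """Map seeds 1-8 to teams for one two-division conference (hypothetical helper)."""
    if len(divisions) != 2:
        raise ValueError("expected exactly two divisions")
    # Rank each division by points. sorted() is stable, so ties keep input order,
    # a stand-in for the NHL's real tiebreakers, which are not modeled here.
    first, second = (sorted(teams, key=lambda s: -s[1]) for teams in divisions.values())

    def higher_first(a: Standing, b: Standing) -> List[Standing]:
        return [a, b] if a[1] >= b[1] else [b, a]

    order = (
        higher_first(first[0], second[0])    # seeds 1-2: division winners
        + higher_first(first[1], second[1])  # seeds 3-4: 2nd place in each division
        + higher_first(first[2], second[2])  # seeds 5-6: 3rd place in each division
    )
    # Seeds 7-8: the two wild cards, i.e. the best remaining teams conference-wide.
    rest = sorted(first[3:] + second[3:], key=lambda s: -s[1])
    order += rest[:2]
    return {seed: team for seed, (team, _) in enumerate(order, start=1)}


# Hypothetical standings (made-up teams and point totals).
seeds = assign_seeds({
    "Div1": [("A1", 110), ("A2", 100), ("A3", 95), ("A4", 94), ("A5", 80)],
    "Div2": [("M1", 105), ("M2", 102), ("M3", 90), ("M4", 93), ("M5", 85)],
})
print(seeds)
```

Seed 1 falls out as the better of the two division winners, since the team with the best record in the conference always leads its own division. In a full simulation, a helper like this would run once per simulated season, and counting how often each team lands on each seed across the 10,000 runs gives the seed probabilities plotted in the bar charts.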


The methodology is very similar in structure to my other team rankings. At its core, it's a weighted linear regression, with separate dummy variables for each team's offense and defense. One challenge I had to overcome was that hockey betting markets, unlike those for the NFL and NBA, don't use point (or goal) spreads. Instead they use moneylines, which in effect give you implied odds of winning rather than an implied margin of victory.

Fortunately, these odds can be converted into implied goal margins. To convert moneylines to goal margins, take the odds for both the favored and underdog team and convert to probabilities. Because of the vig, these will sum up to something greater than 1. Scale down the probabilities so that they do sum to one, and then convert those to log odds. Expected goal margin is then just:
expected goal margin = 1.318 x ln(p / (1 - p)), where p is the favorite's vig-free win probability
The 1.318 factor was derived via linear regression over multiple seasons of NHL games. For example, for today's Bruins-Red Wings game, the Bruins moneyline is -260 and the Red Wings are +220. After removing the vig, this works out to implied probabilities of 69.8% for the Bruins and 30.2% for the Red Wings. The log odds are ln(0.698 / 0.302) ≈ 0.838, and multiplying by 1.318 gives an expected goal margin of 1.10 goals. In other words, the Bruins could be said to be "favored" by 1.10 goals in this game.
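Here is a minimal Python sketch of that conversion, assuming standard American moneylines and proportional vig removal as described above. The function names are hypothetical. The sketch takes the log odds from the favorite's side, ln(p / (1 - p)), so the margin comes out positive for the favorite.

```python
import math

SCALE = 1.318  # goals per unit of log odds (value from the post)


def moneyline_to_prob(line: int) -> float:
    """Raw implied probability of an American moneyline (still includes the vig)."""
    if line < 0:
        return -line / (-line + 100)  # e.g. -260 -> 260 / 360
    return 100 / (line + 100)         # e.g. +220 -> 100 / 320


def implied_goal_margin(fav_line: int, dog_line: int) -> float:
    """Expected goal margin for the favorite, after scaling out the vig."""
    p_fav = moneyline_to_prob(fav_line)
    p_dog = moneyline_to_prob(dog_line)
    total = p_fav + p_dog               # greater than 1 because of the vig
    p_fav, p_dog = p_fav / total, p_dog / total
    log_odds = math.log(p_fav / p_dog)  # ln(p / (1 - p)) for the favorite
    return SCALE * log_odds


# Bruins (-260) vs. Red Wings (+220): roughly 1.10 goals.
print(round(implied_goal_margin(-260, 220), 2))
```

Note that proportional scaling leaves the ratio p_fav / p_dog unchanged, so the log odds are the same whether or not the vig is removed first; the normalization matters for the reported percentages, not for the goal margin.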

With implied goal margins, I now have an additive quantity (margins add and subtract across teams), which can be plugged in as the dependent variable in the weighted regression. The weighting is as follows:
weight = 1 / (0.5 + games elapsed)
So, the most recent game (0 games elapsed) gets a weight of 2, the game before it a weight of 2/3, and so on. If you are doing recency weights for any type of modeling, I highly recommend using this form and optimizing that 0.5 parameter in the denominator. I found significant gains in predictive accuracy when switching to this weighting approach. It also has a solid theoretical foundation, as it can be derived from assuming that team strength follows a random walk as the season progresses.
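A minimal sketch of the recency weight in Python. The function name and the `offset` parameter are illustrative; the sketch assumes games elapsed is counted so that the most recent game is 0, which is what gives it a weight of 2.

```python
def recency_weight(games_elapsed: int, offset: float = 0.5) -> float:
    """Weight 1 / (offset + games elapsed); games_elapsed is 0 for the most recent game."""
    if games_elapsed < 0:
        raise ValueError("games_elapsed must be non-negative")
    return 1.0 / (offset + games_elapsed)


# Most recent game first: 2, 2/3, 0.4, ...
print([round(recency_weight(n), 3) for n in range(4)])
```

Exposing the offset as a parameter makes it easy to tune, which is the optimization recommended above; a larger offset flattens the weights so that older games count nearly as much as recent ones.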

Note that the weights above are for the betting market inputs (moneylines and totals). I also factor actual results into the model, with a different weight:
actual results weight = 1 / (25 + games elapsed)
So, for the most recent game, the actual result only gets a weight of 0.04, which is much smaller than the 2.0 weight given to the betting market information for that same game. The 25 parameter was chosen to optimize prediction accuracy, and it reflects the highly random nature of hockey games: the outcome of a single game is given very little credibility.
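To make the structure concrete, here is one way such a weighted regression could be set up in Python with NumPy. This is a sketch under my own assumptions, not the exact model behind the rankings. Each game contributes one row per team, with expected goals as the response and columns for an intercept, a home-ice flag, an offense dummy for the scoring team, and a defense dummy for the opponent. For market rows, expected goals are assumed to be backed out from the total and the implied goal margin (home = (total + margin) / 2, away = (total - margin) / 2), treating the over/under as the expected total. The helper name `fit_ratings` and its row format are hypothetical.

```python
import numpy as np


def fit_ratings(games, teams, market_offset=0.5, actual_offset=25.0):
    """Weighted least-squares sketch of offensive/defensive ratings (hypothetical helper).

    games: iterable of (home, away, home_goals, away_goals, games_elapsed, is_market).
    Market rows carry goals implied from the total and margin; actual rows carry scores.
    Returns {team: (oGAA, dGAA, GAA)}.
    """
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    rows, y, w = [], [], []
    for home, away, hg, ag, elapsed, is_market in games:
        # Market inputs and actual results get different recency offsets.
        offset = market_offset if is_market else actual_offset
        weight = 1.0 / (offset + elapsed)
        for scorer, opp, goals, at_home in ((home, away, hg, 1.0), (away, home, ag, 0.0)):
            # Columns: intercept, home-ice flag, n offense dummies, n defense dummies.
            x = np.zeros(2 + 2 * n)
            x[0] = 1.0
            x[1] = at_home
            x[2 + idx[scorer]] = 1.0    # scorer's offense adds goals
            x[2 + n + idx[opp]] = -1.0  # opponent's defense takes goals away
            rows.append(x)
            y.append(goals)
            w.append(weight)
    sw = np.sqrt(np.array(w))
    X = np.array(rows) * sw[:, None]    # weighted least squares via sqrt(weights)
    beta, *_ = np.linalg.lstsq(X, np.array(y) * sw, rcond=None)
    off, dfn = beta[2:2 + n], beta[2 + n:]
    # Ratings are only identified up to a constant, so center each at zero.
    off, dfn = off - off.mean(), dfn - dfn.mean()
    return {t: (off[i], dfn[i], off[i] + dfn[i]) for t, i in idx.items()}
```

Because offense and defense dummies are each collinear with the intercept, `np.linalg.lstsq` returns a minimum-norm solution, and centering both sets of ratings afterward pins them down. A team's GAA is then its oGAA plus dGAA, and a home-versus-visitor margin works out to the home-ice coefficient plus the difference in GAA, matching the form of the simulation formula.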

These rankings (and the playoff predictions) will update daily. I welcome any feedback on the approach, or a note if anything here looks off. I know very little about the sport of hockey, so this was largely a pure modeling exercise for me.