Sunday, July 17, 2016

Betting Market Rankings for the WNBA

The WNB-Ays
I have added the WNBA to my suite of betting market rankings, alongside those for the NBA, NFL, MLB, College Football, and College Basketball. The purpose of these rankings is to reverse-engineer an implied power ranking from the Vegas point spreads, essentially distilling the combined wisdom of the market.

Here are the rankings as of July 17:

GPF stands for "Generic Points Favored". It is the number of points you would expect a team to be favored by against a league-average opponent on a neutral court. By combining the betting over/under with the point spread, I can decompose GPF into offensive and defensive components, oGPF and dGPF (note: offense and defense are measured in points scored and points allowed per game, rather than per possession; there is no way to derive implied per-possession metrics from the betting data). GOU stands for "Generic Over/Under": the betting over/under you would expect for that team's games against a league-average opponent.
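The post does not spell out how the over/under and spread are combined, but one plausible first step is to split each game's total into implied scores for the two teams: the scores must add up to the total and differ by the spread. The sketch below assumes that approach; the function name and the total of 165 are hypothetical.

```python
def implied_scores(total: float, favored_by: float) -> tuple[float, float]:
    """Split a betting total into implied scores for the favorite and the underdog.

    Hypothetical helper: favored_by is the spread magnitude (8 for an
    8-point favorite). The two implied scores sum to the total and
    differ by the spread.
    """
    favorite = (total + favored_by) / 2
    underdog = (total - favored_by) / 2
    return favorite, underdog


# Hypothetical game: an 8-point favorite with an over/under of 165.
fav, dog = implied_scores(165.0, 8.0)
print(fav, dog)  # 86.5 78.5
```

Implied points scored and allowed per team, rather than the spread alone, are what would let a regression separate offense from defense.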

These rankings largely track win-loss records, although there are some differences. The defending champion Minnesota Lynx are at the top of the rankings, despite being several games behind the Los Angeles Sparks in the overall standings. The Atlanta Dream hold a better record than the Phoenix Mercury, but would be 3-point underdogs against them on a neutral court.

Mathematically, these rankings work very similarly to the NBA rankings. Home court advantage is worth 3.25 points to the point spread, consistent with the NBA. Unlike my NBA rankings, I do not make any adjustment for teams playing with no days of rest (I have not had time to do it properly).

The model itself is a fairly straightforward weighted linear regression of the form:
  • Team A - Team B = point spread
Each team is assigned a dummy variable, and the point spread is adjusted for home court advantage. For example, the Los Angeles Sparks are 8-point favorites on the road today against the Atlanta Dream. If home court advantage is worth 3.25 points, then the point spread implies that the market thinks the Sparks are 11.25 points "better" than the Dream.
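The home-court adjustment and dummy-variable coding can be sketched as follows. This is a minimal illustration, assuming spreads are expressed as points the home team is favored by (negative when the road team is favored) and that each row puts +1 on the home team and -1 on the away team; the helper names are hypothetical.

```python
HOME_COURT_ADVANTAGE = 3.25  # points, as used in the post


def neutral_margin(home_favored_by: float, hca: float = HOME_COURT_ADVANTAGE) -> float:
    """Implied neutral-court margin (home minus away) from a point spread.

    home_favored_by is positive when the home team is favored and
    negative when the road team is favored.
    """
    return home_favored_by - hca


def design_row(home: str, away: str, teams: list[str]) -> list[float]:
    """Dummy-variable row: +1 for the home team, -1 for the away team, 0 otherwise."""
    row = [0.0] * len(teams)
    row[teams.index(home)] = 1.0
    row[teams.index(away)] = -1.0
    return row


teams = ["Atlanta", "Los Angeles"]
# Sparks are 8-point road favorites in Atlanta, so the home team is "favored" by -8.
y = neutral_margin(-8.0)
x = design_row("Atlanta", "Los Angeles", teams)
print(x, y)  # [1.0, -1.0] -11.25  (Dream minus Sparks, i.e. Sparks 11.25 better)
```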

Because I want to know what the market is thinking now, I weight recent games more heavily. In my original formulation of these market rankings for the NFL, I used a simple 3-2-1 weighting for the most recent three weeks of games. Eventually, though, I hit upon a more theoretically sound approach that also produced better accuracy in predicting future lines. The weights used in these rankings have the following form:
  • weight = 1 / (elapsed games + 0.25)
For example, for a team's most recent game the number of elapsed games is zero, so the weight is 4 (= 1/(0 + 0.25)). For the game immediately before that, the number of elapsed games is 1, so the weight is 0.8 (= 1/(1 + 0.25)). And so on.
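The weight schedule above is easy to tabulate. This sketch just evaluates 1 / (elapsed games + 0.25) for the last five games; the function name is hypothetical.

```python
def spread_weight(elapsed_games: int, c: float = 0.25) -> float:
    """Weight on a game's point spread: 1 / (elapsed games + c)."""
    return 1.0 / (elapsed_games + c)


# Most recent game (0 elapsed) -> 4.0; the game before it (1 elapsed) -> 0.8.
weights = [spread_weight(k) for k in range(5)]
print([round(w, 3) for w in weights])  # [4.0, 0.8, 0.444, 0.308, 0.235]
```

The decay is steep at first and then flattens: the most recent spread counts five times as much as the one before it, but games three and four back are weighted nearly the same.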

The form of the weight function, 1 / (elapsed games + constant), wasn't arrived at arbitrarily. It turns out you can derive the weight function by assuming the market's evaluation of team strength follows a random walk process. Under that assumption, the error term for a more distant game is larger than that for a more recent one. Using that modified error term and then deriving the linear regression equations via maximum likelihood results in a weighted regression with a weight term of the form above.
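One way to see where the 1 / (k + c) form comes from is sketched below for a single team's rating. It assumes Gaussian errors and treats each game's error as independent (a simplification, since spreads from different games share the same random-walk steps). The symbols are my own labels, not the post's: θ_t is the team's current market rating, η a per-game random-walk step, ε the noise in a single spread, and s_k the spread observed k games ago.

```latex
\begin{aligned}
\theta_t &= \theta_{t-1} + \eta_t, && \eta_t \sim \mathcal{N}(0, \sigma_\eta^2) \\
s_k &= \theta_{t-k} + \varepsilon_k, && \varepsilon_k \sim \mathcal{N}(0, \sigma_\varepsilon^2) \\
s_k &= \theta_t - \sum_{j=0}^{k-1} \eta_{t-j} + \varepsilon_k
  \quad\Rightarrow\quad \operatorname{Var}(s_k - \theta_t) = k\,\sigma_\eta^2 + \sigma_\varepsilon^2 \\
w_k &\propto \frac{1}{k\,\sigma_\eta^2 + \sigma_\varepsilon^2}
  = \frac{1/\sigma_\eta^2}{k + c}, \qquad c = \frac{\sigma_\varepsilon^2}{\sigma_\eta^2}
\end{aligned}
```

Under this reading, the constant c is the ratio of single-observation noise to per-game drift. A fitted c = 0.25 would mean one spread's noise variance is about a quarter of one game's drift variance, while a larger c flattens the weights because each individual observation is noisier relative to how far the team's true strength can wander.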

Random walk processes are sometimes referred to as a "drunkard's walk", and we can extend that analogy to this model. Imagine you are trying to figure out where your bar-hopping, inebriated friend is from a series of drunken texts they have sent you. The text from five minutes ago is a more reliable indicator of their location than the one you got an hour ago. But neither is completely reliable because, well, they're drunk. Today's point spreads are like that most recent text, and are given the most weight in the model.

I also include actual game results in the rankings because I found that doing so improves accuracy in predicting the Vegas point spread. However, a single game's result is far noisier than the market point spread, so game results are weighted significantly less. The weight function has the same form, but with a larger constant in the denominator:

  • results weight = 1 / (elapsed games + 3.5)
The 3.5 term in the denominator means that the most recent game's result receives less than 10% of the weight given to its point spread (1/3.5 ≈ 0.29, versus 4 for the spread). Both constants, 0.25 and 3.5, were chosen because they optimized the accuracy of predicting future point spreads.
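Putting the pieces together, a minimal weighted least-squares sketch might look like the following. It is not the exact model used here. It assumes each game contributes one spread row (weight 1/(k + 0.25)) and, if it has been played, one result row (weight 1/(k + 3.5)), with home court advantage subtracted from both. It also assumes a single elapsed-games count per game, which is a simplification since the two teams' counts can differ. The dictionary keys and team names are hypothetical.

```python
import numpy as np

HCA = 3.25  # home court advantage in points, as used in the post


def spread_weight(elapsed_games: int) -> float:
    return 1.0 / (elapsed_games + 0.25)


def result_weight(elapsed_games: int) -> float:
    return 1.0 / (elapsed_games + 3.5)


def fit_ratings(games: list[dict], teams: list[str]) -> dict[str, float]:
    """Fit neutral-court ratings from spreads and results.

    Each game dict (hypothetical format) has: home, away, spread (points the
    home team is favored by), result (home minus away score, or None if not
    yet played), and elapsed (games since it was played; 0 = most recent).
    """
    index = {team: i for i, team in enumerate(teams)}
    rows, targets, weights = [], [], []
    for g in games:
        row = np.zeros(len(teams))
        row[index[g["home"]]] = 1.0
        row[index[g["away"]]] = -1.0
        # Market observation: spread adjusted for home court, heavily weighted.
        rows.append(row)
        targets.append(g["spread"] - HCA)
        weights.append(spread_weight(g["elapsed"]))
        # Result observation, if the game has been played: weighted far less.
        if g["result"] is not None:
            rows.append(row)
            targets.append(g["result"] - HCA)
            weights.append(result_weight(g["elapsed"]))
    X = np.array(rows)
    y = np.array(targets)
    sqrt_w = np.sqrt(np.array(weights))
    # Weighted least squares = ordinary least squares on rows scaled by sqrt(weight).
    # Only rating differences are identified, so X is rank-deficient; lstsq returns
    # the minimum-norm solution, whose ratings sum to zero (a league-average team = 0).
    beta, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return {team: float(b) for team, b in zip(teams, beta)}


# Hypothetical three-team example; today's game has a spread but no result yet.
teams = ["Team A", "Team B", "Team C"]
games = [
    {"home": "Team A", "away": "Team B", "spread": 6.25, "result": None, "elapsed": 0},
    {"home": "Team B", "away": "Team C", "spread": 6.25, "result": 10.0, "elapsed": 1},
    {"home": "Team C", "away": "Team A", "spread": -2.75, "result": -4.0, "elapsed": 2},
]
print(fit_ratings(games, teams))
```

Scaling rows by the square root of the weight is the standard way to turn a weighted regression into an ordinary one. Letting the minimum-norm solution pin the ratings to sum to zero is one convenient way to express them relative to an average team.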

I'm glossing over some of the technical details in the modeling here, but if there is interest, I can lay this out a bit more formally in a future post. As with my other market rankings, these update daily with the latest game results and point spreads.


  1. Which betting line do you use? Opening or closing?

    1. Closing, unless it's the day of. I take the median line from sbrodds.

  2. Did you calculate HCA separately for WNBA or just assume it was the same as NBA? Based on Massey ratings, it seems the average HCA is less than 3.25 this year, but has bounced around quite a bit over the years.

    1. To add to that, NBA regular-season HCA is probably closer to 2.5 points now. Over the past few seasons, the final HCA values on Sagarin have been 3.14, 2.37, 2.49, and 3.37. However, those numbers include playoff games, which had an average of 4.5 points of HCA from 2003 to 2013 according to the article below. So the regular-season-only value is likely lower than the value shown on Sagarin's site.

    2. It is based on the average point spread given by Vegas to WNBA home teams over the past four seasons. The raw number is around 3.20, excluding playoffs. This year is running a bit lower at 2.71, but it is hard to tell whether that is just volatility.