NBA Home Court Advantage and Rest

UPDATE (4/7/2012): As mentioned in this post, there was an error in the original analysis. After the correction, the back-to-back penalty changed from 0.75 points to 1.25 points.

The concept of rest, and its impact on the game, seems to come up more often in the NBA than in other sports. In baseball, it's key when looking at pitchers, but doesn't seem to be a factor at the team level. In the NFL, with teams getting one bye week per season, there is evidence that the extra week's rest gives a team a slight advantage coming off the bye. But this affects, at most, only 32 games of the 256-game regular season schedule.

The NBA, on the other hand, features matchups on a daily basis where one team has a rest advantage over the other. If rest is defined as days since most recent game, half of the games played in the NBA from 2003-2010 were games in which one team had more rest than the other. The purpose of this post is to determine to what extent rest asymmetries are factored into the betting lines of each NBA game. I plan on using the results in future NBA team rankings.

Results

I've saved the detailed description of the methodology for the end of the post. Here are the takeaways:

A team playing a game with no days rest is penalized by ~~0.75~~ 1.25 points in the point spread.

"True" home court advantage is closer to 3.25 points than 3.5 points. The average home court advantage shows up as 3.5 points because home teams are less likely to be playing a game on no rest.

The second result is consistent with a 2007 paper on NBA home court advantage and rest, which found that 0.3 points of home court advantage could be explained by rest differences.

I couldn't find any evidence of a point spread adjustment for teams playing 2 games in 3 days or 3 games in 4 days (beyond the existing 0 days rest adjustment). That's not to say it doesn't exist, but I couldn't derive it from the data.

The Data

I used data from the 2003-2010 NBA seasons. I first created a "residual line" by running my existing methodology to predict the point spread and then subtracting that prediction from the actual line. I figured this would allow me to filter out the noise in the point spread due to varying team strength, leaving a "purer" signal to test against rest differences. Here are the results (the residual column is how much higher or lower the point spread was in favor of the home team):

home team	road team	games	residual	line	margin
>0 days rest	>0 days rest	4,795	-0.3	3.1	2.8
0 days rest	>0 days rest	485	-1.0	2.1	0.9
>0 days rest	0 days rest	1,887	0.5	4.4	4.8
0 days rest	0 days rest	681	-0.1	3.1	3.3

As you can see above, games in which only the road team is on 0 days rest are about 4 times as common as games in which only the home team is (1,887 games vs. 485 games). I added the average line and average margin for reference, but it's the residual column that I am using.
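As a quick check on the game counts, here is a small Python sketch that tallies how often each side plays on 0 days rest. The counts are copied from the table above; the dictionary layout and variable names are just illustrative choices.

```python
# Game counts from the table above, keyed by
# (home team on 0 days rest?, road team on 0 days rest?).
games = {
    (False, False): 4795,
    (True, False): 485,
    (False, True): 1887,
    (True, True): 681,
}

total = sum(games.values())

# Games where each side is on 0 days rest, regardless of the opponent.
home_zero = games[(True, False)] + games[(True, True)]
road_zero = games[(False, True)] + games[(True, True)]

# Ratio of "only the road team is tired" to "only the home team is tired".
only_road_vs_only_home = games[(False, True)] / games[(True, False)]

print(f"total games: {total}")
print(f"only road on 0 days rest vs. only home: {only_road_vs_only_home:.1f}x")
print(f"home team on 0 days rest: {home_zero / total:.1%} of games")
print(f"road team on 0 days rest: {road_zero / total:.1%} of games")
```

Counting the games where both teams are on 0 days rest narrows the gap somewhat, but the road team is still the tired one far more often.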

I prefer simple rules to complex ones, so I am simplifying the above results to the following (which is identical to the first takeaway in the Results section above). Basically, I now have four adjustments for home court advantage, based on rest, instead of the single 3.5-point adjustment I was using before.

New Home Court Advantage Adjustments:

home team	road team	HCA
0 days rest	0 days rest	3.25
0 days rest	>0 days rest	2.00
>0 days rest	0 days rest	4.50
>0 days rest	>0 days rest	3.25
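The four adjustments reduce to one rule: start from 3.25 points, subtract 1.25 if the home team is on 0 days rest, and add 1.25 if the road team is. Here is a minimal Python sketch of that rule. The function and constant names are hypothetical, and the game counts used for the weighted average are taken from the first table in this section.

```python
BASE_HCA = 3.25              # home court advantage with rest taken out
BACK_TO_BACK_PENALTY = 1.25  # point-spread penalty for playing on 0 days rest


def home_court_advantage(home_rest_days, road_rest_days):
    """Home court advantage in points, adjusted for 0-days-rest games."""
    hca = BASE_HCA
    if home_rest_days == 0:
        hca -= BACK_TO_BACK_PENALTY  # tired home team gives points back
    if road_rest_days == 0:
        hca += BACK_TO_BACK_PENALTY  # tired road team gives points to the home side
    return hca


# Game counts from the residual table, keyed by (home rest, road rest),
# where 1 stands in for ">0 days rest".
games = {(1, 1): 4795, (0, 1): 485, (1, 0): 1887, (0, 0): 681}

weighted = sum(
    home_court_advantage(home, road) * count
    for (home, road), count in games.items()
) / sum(games.values())

print(round(weighted, 2))  # 3.47
```

Weighting the four values by how often each situation occurred gives about 3.47 points, which lines up with the 3.5-point average home court advantage.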

Detailed Description of the Methodology

As mentioned above, to determine the impact of rest on the betting line, I first created a "residual line". The residual line is effectively the amount of point spread left over after subtracting the prediction derived from my existing ranking methodology (which is a function of team strength and home court advantage). I then used decision trees, specifically the CART algorithm, to determine which rest variables could predict that residual line.

The advantage of the CART algorithm is that I could throw multiple rest-related variables at it, and then let the algorithm pick the predictive ones. Another advantage is its non-linearity. Decision trees are well suited for teasing out non-linear boundaries in the data (for example, it may be unlikely that a team would perform three times better with three days of rest than with one).
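To make "let the algorithm pick the predictive ones" concrete, here is a toy Python sketch of the core step in a CART-style regression tree: try every variable and every cutoff, and keep the split that leaves the smallest squared error. This is not the rpart code used for the analysis, and the rows are made up purely for illustration.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, features, target):
    """Return (feature, threshold, sse) for the split 'feature <= threshold'
    that minimizes the combined squared error of the two sides."""
    best = None
    for feature in features:
        levels = sorted({row[feature] for row in rows})
        # Skip the largest level: splitting there leaves the right side empty.
        for threshold in levels[:-1]:
            left = [row[target] for row in rows if row[feature] <= threshold]
            right = [row[target] for row in rows if row[feature] > threshold]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (feature, threshold, total)
    return best


# Made-up rows: the residual depends only on whether rest1 is 0.
rows = [
    {"rest1": 0, "rest2": 1, "residual": -1.25},
    {"rest1": 0, "rest2": 2, "residual": -1.25},
    {"rest1": 0, "rest2": 3, "residual": -1.25},
    {"rest1": 1, "rest2": 0, "residual": 0.0},
    {"rest1": 1, "rest2": 2, "residual": 0.0},
    {"rest1": 2, "rest2": 1, "residual": 0.0},
    {"rest1": 3, "rest2": 3, "residual": 0.0},
    {"rest1": 2, "rest2": 0, "residual": 0.0},
]

feature, threshold, error = best_split(rows, ["rest1", "rest2"], "residual")
print(feature, threshold, error)
```

On these rows the search settles on rest1 with a cutoff between 0 and 1 day, without being told in advance which variable matters or where the boundary sits. A full tree repeats the same search inside each side of the split.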

I fed the following variables into the CART algorithm:

rest1 - days of rest for current game
rest2 - days of rest for game prior to the current game
rest3 - days of rest for game two games prior to the current game
tot2 - rest1 + rest2 (this allowed me to mark teams playing two games in three days - tot2 would be 1)
tot3 - rest1 + rest2 + rest3 (this allowed me to mark teams playing three games in four days)
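For reference, the variables above could be derived from a team's schedule roughly as follows. This Python sketch assumes "days of rest" means the calendar gap to the previous game minus one, so that a back-to-back counts as 0 days rest. The function name and the sample schedule are hypothetical.

```python
from datetime import date


def rest_variables(game_dates):
    """game_dates: one team's game dates in chronological order.
    Returns one dict per game; values are None where history is too short."""
    # Days of rest before each game (assumption: gap in days minus one).
    rest = [None]  # the first game has no prior game
    for prev, cur in zip(game_dates, game_dates[1:]):
        rest.append((cur - prev).days - 1)

    out = []
    for i, game_date in enumerate(game_dates):
        rest1 = rest[i]
        rest2 = rest[i - 1] if i >= 1 else None
        rest3 = rest[i - 2] if i >= 2 else None
        tot2 = rest1 + rest2 if rest1 is not None and rest2 is not None else None
        tot3 = tot2 + rest3 if tot2 is not None and rest3 is not None else None
        out.append({"date": game_date, "rest1": rest1, "rest2": rest2,
                    "rest3": rest3, "tot2": tot2, "tot3": tot3})
    return out


# Hypothetical schedule: games on Jan 1, 2, 4, 5 and 8.
schedule = [date(2010, 1, 1), date(2010, 1, 2), date(2010, 1, 4),
            date(2010, 1, 5), date(2010, 1, 8)]

variables = rest_variables(schedule)
for row in variables:
    print(row)
```

The earliest games in a schedule have no usable history, so this sketch leaves their values as None rather than guessing.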

The decision tree was "pruned" using a cost-complexity parameter that was optimized using cross-validation (I used the R package "rpart" for this).

Aside from "rest1", none of the other variables proved to be predictive. Put another way, when I "trained" the model on a subset of the data and let it drill down to rest differences beyond rest1, it actually did a worse job of predicting the lines for the holdout data set (this holdout check is what cross-validation provides).
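The holdout idea can be sketched in a few lines of Python. This is not rpart's cost-complexity pruning. It is a simplified stand-in that compares two hand-picked groupings with k-fold cross-validation, using made-up, noise-free rows, so it only shows the mechanics.

```python
def group_means(rows, key):
    """Average residual for each group defined by key(row)."""
    sums = {}
    for row in rows:
        group = key(row)
        total, count = sums.get(group, (0.0, 0))
        sums[group] = (total + row["residual"], count + 1)
    return {group: total / count for group, (total, count) in sums.items()}


def cv_mse(rows, key, folds=4):
    """k-fold cross-validation: fit group means on the training folds,
    then measure squared error on the held-out fold."""
    squared_errors = []
    for fold in range(folds):
        train = [r for i, r in enumerate(rows) if i % folds != fold]
        held_out = [r for i, r in enumerate(rows) if i % folds == fold]
        means = group_means(train, key)
        # Groups never seen in training fall back to the overall training mean.
        fallback = sum(r["residual"] for r in train) / len(train)
        for r in held_out:
            prediction = means.get(key(r), fallback)
            squared_errors.append((r["residual"] - prediction) ** 2)
    return sum(squared_errors) / len(squared_errors)


# Made-up rows: the residual depends only on whether rest1 is 0.
rows = [
    {"rest1": 0, "rest2": 1, "residual": -1.25},
    {"rest1": 1, "rest2": 1, "residual": 0.0},
    {"rest1": 0, "rest2": 2, "residual": -1.25},
    {"rest1": 2, "rest2": 1, "residual": 0.0},
    {"rest1": 0, "rest2": 3, "residual": -1.25},
    {"rest1": 1, "rest2": 2, "residual": 0.0},
    {"rest1": 3, "rest2": 0, "residual": 0.0},
    {"rest1": 2, "rest2": 3, "residual": 0.0},
]

coarse = cv_mse(rows, key=lambda r: r["rest1"] == 0)
fine = cv_mse(rows, key=lambda r: (r["rest1"], r["rest2"]))
print(f"rest1 only:    {coarse:.3f}")
print(f"rest1 x rest2: {fine:.3f}")
```

In this toy data the finer grouping does worse on held-out games because each rest1/rest2 combination is too sparse to learn from, which is the same kind of signal that tells a pruning procedure to stop splitting.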

As with most NBA analysis on this site, my data source was killersports.com.