Introducing roboCap: An automated horse race handicapper


For last year's Breeders' Cup, I unveiled my first attempt at betting market rankings for thoroughbred horses. I publish similar rankings for a variety of sports, including the NBA, the WNBA, the NFL, College Football and Basketball, Major League Baseball, and, most recently, the NHL.

After quite a long hiatus, I have started republishing these rankings here, and they will update weekly going forward (after the Sunday races are run). In addition, I have created a new tool, roboCap, that uses these rankings to create projected closing odds for any chosen field of horses. The rest of this post gets into the technical details of how this all works, but if you just want to play around with it, here's a quick guide:

Enter the track take at the top. This is the percentage the track takes off the top of the win pool. You can find take percentages for most tracks here. This is needed to accurately project the odds. The higher the take percentage, the lower the projected odds (lower odds = higher probabilities). The Churchill Downs track take is 17.5%.
Enter the horse names in the first column (autocomplete should help you out). To be in the tool, the horse must have run at least one race in the last 180 days.
Click "Handicap!" and the tool will return projected closing odds, along with true win probability and GLA, or Generic Length Advantage. Generic Length Advantage is the source of the projections and represents expected margin of victory (in lengths) against an average thoroughbred in a one mile race.
If a horse isn't in the database, or you disagree with the ranking, check the "Override" box and input your own GLA and click "Handicap!" to recalculate the odds.

So, is this tool any good? More detail is shared below, but here's what I have found:

By one measure, this tool does a better job than the morning line at predicting the closing odds of horse races. Across some 23,000 races over the past year, this model has a better (lower) Kullback-Leibler divergence score than the morning line.
However, I have found that when compared against the top tier tracks (tracks with an average win/place/show pool >$75k), the morning line does better than my simple model.
And when measured against actual results, this model holds up fairly well. It has a better log-likelihood score for winners than the morning line on average, even for the top tier tracks.
But before you use this tool to start making bets, it is important to call out that this tool is not as accurate as the closing odds when picking winners. And that applies across all tracks.

Given the simplicity of the model, I am truly surprised it performs as well as it does. It takes as inputs just closing odds and finishing margin for each horse. When making predictions for races, it assumes that everything about a horse's ability to win is reflected in a single number, Generic Length Advantage. Here is what roboCap does not consider when making predictions: track, surface, distance, jockey, weight, rest, recency, pace, medication, or any of the myriad other factors that are considered in traditional handicapping.

Building a betting market ranking for thoroughbreds

There is a ton of information encoded in the betting market information. My rankings employ a pretty simple weighted linear regression to harness that information and then reverse engineer an implied ranking. My initial attempt at this for the NFL has a pretty good overview, even though some of the math has since changed.

Creating these types of rankings is fairly straightforward for binary, point spread based sports. But for horse racing, I had to overcome a few difficulties:

Data access: As a sport, horse racing is simultaneously flush with data and stubbornly unwilling to share said data freely. However, I was able to find workarounds for this, and have a fairly robust database of all races run in the US and Canada, including, most importantly, closing odds and finishing position and margin for each horse.
Converting odds into lengths: Horse race betting is based on odds, not margin of victory. Odds and probabilities don't add and subtract nicely like point spreads do, so you can't easily plug them into a linear regression. But I have a formula (shared below) that can convert the odds of a horse into an implied average "margin of victory", in lengths.
Horse races are not binary contests: Unlike most sports, a horse race is a competition between three or more entrants. Mathematically, this is a bit more challenging to model. I overcame this by including a dummy variable for each race, which, in effect, represents the average strength of all horses in that specific race.

Converting odds to margin of victory

As mentioned above, to get the math to play nice, I first have to take the closing odds for a race and then convert that into an implied margin of victory. The approach is as follows:

Convert the odds into probabilities and remove the track take so that the probabilities sum to 1. Tracks take a percentage of the pool off the top, so the published odds imply a higher probability of victory than the "true" odds. To use this year's Kentucky Derby as an example, Justify's closing odds were 2.90. 2.90 odds implies a 25.6% win probability. To get the true win probability, multiply by 1 minus the track take (=1 - 0.175) to get 21.1%.
Multiply the true win probability by the number of horses, take the natural logarithm, and multiply by 4.222. There were 20 horses in the Derby this year, so: ln(20 * 0.211) * 4.222 = 6.08.
6.08 is the expected average margin of victory (in lengths) for a horse with a 21.1% win probability in a field of 20 for a one mile race. The 4.222 factor from the previous step was derived from a linear regression over 200,000 races run over the past 4 years.
The Derby, however, is a 1 1/4 mile race. So, the projected average margin of victory for Justify was 1.25 * 6.08 = 7.60 lengths.
This now also allows us to create an objective measure of how a horse performed against the market's expectations. Justify's actual average margin of victory in the Derby was 14.5 lengths. So, Justify outperformed the market's expectations by 6.9 lengths.
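The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the rankings: the function names and the `distance_miles` parameter are my own labels, and the only inputs taken from the post are the 4.222 factor, the 17.5% take, and Justify's 2.90 closing odds in a 20-horse field.

```python
import math

# Regression-derived factor quoted in the post (lengths per unit of log probability).
LENGTHS_FACTOR = 4.222


def true_win_probability(odds, track_take):
    """Turn published odds into a win probability with the track take removed."""
    implied = 1.0 / (1.0 + odds)          # 2.90 odds -> about 25.6%
    return implied * (1.0 - track_take)   # about 21.1% at a 17.5% take


def implied_margin(odds, track_take, field_size, distance_miles=1.0):
    """Expected average margin of victory, in lengths, implied by the odds."""
    p = true_win_probability(odds, track_take)
    one_mile_margin = math.log(field_size * p) * LENGTHS_FACTOR
    return one_mile_margin * distance_miles  # scale up for longer races


# Justify in the 2018 Kentucky Derby: 2.90 odds, 17.5% take, 20 horses.
justify_mile = implied_margin(2.90, 0.175, 20)          # roughly 6.1 lengths
justify_derby = implied_margin(2.90, 0.175, 20, 1.25)   # roughly 7.6 lengths
print(round(justify_mile, 2), round(justify_derby, 2))
```

The code carries the unrounded probability through the logarithm, so its output can differ from the 6.08 and 7.60 above in the second decimal place, because the worked example rounds the probability to 21.1% first.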

Building the regression model

Using the steps above to convert odds into expected margin of victory, the linear regression is built using dummy variables for each horse and race. Each horse/race combination is one row in the base dataset used for the regression. Here is what Justify's row would look like for the Derby:

[Justify] + [Kentucky Derby] = 6.08

As I do with other sports, I weight recent races more heavily than older races in an effort to get at what the market thinks now. The recency weight is:

weight = 1 / (races elapsed + 0.25)

For a horse's most recent race, the weight would be 4, the race prior gets a weight of 4/5, the race prior to that 4/9, and so on. The 0.25 factor in the denominator was calibrated to maximize accuracy of odds predictions.
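As a quick check of the weights, here is the same formula in Python. The function and parameter names are illustrative; "races elapsed" is taken to be 0 for a horse's most recent race, 1 for the one before it, and so on.

```python
def recency_weight(races_elapsed, offset=0.25):
    """Weight for a race that is `races_elapsed` starts back (0 = most recent)."""
    return 1.0 / (races_elapsed + offset)


# Weights for the three most recent races: 4, 4/5, and 4/9.
weights = [recency_weight(k) for k in range(3)]
print(weights)
```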

In the previous section I showed how one could compare expected and actual margin of victory for a given horse in a race. Naturally, one would expect the market to react to a horse's performance against expectations in a race and adjust future odds accordingly. For example, Justify's market ranking should have gone up after his performance in the Derby. I found that the market treats a horse's performance in a single race with 25% credibility.

So, instead of the 6.08 used in the regression model above, the target becomes a credibility blend of 6.08 at 75% and the actual margin of victory (14.5) at 25%. However, to avoid actual margin of victory being skewed by horses that simply give up in a race, I cap all horses' lengths behind at 20, and then calculate average margin. Mendelssohn finished 53 lengths off the lead in the Derby, but Justify shouldn't get credit for most of that. After capping at 20, Justify's adjusted margin of victory is 12.0 lengths. Giving this a 25% weight, here is what the regression model ultimately uses for Justify in the Derby:

[Justify] + [Kentucky Derby] = 7.56

This adjustment for performance against expectations gave roboCap a significant boost in predictive power.
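The blend is simple arithmetic, sketched below in Python. The blend function reproduces the 7.56 figure from the numbers given above. The capped-average helper is an assumption on my part about how "average margin" could be computed (a horse's mean lead over every other horse, after capping lengths behind at 20); the three-horse field is made up purely to show the cap at work.

```python
CREDIBILITY = 0.25          # weight given to the actual result
MAX_LENGTHS_BEHIND = 20.0   # cap on any horse's lengths behind the winner


def capped_average_margin(lengths_behind, horse):
    """ASSUMED definition: mean lead of `horse` over each rival, after capping."""
    capped = {name: min(lb, MAX_LENGTHS_BEHIND) for name, lb in lengths_behind.items()}
    rivals = [lb for name, lb in capped.items() if name != horse]
    return sum(rivals) / len(rivals) - capped[horse]


def regression_target(market_margin, actual_margin, credibility=CREDIBILITY):
    """Credibility blend of the market-implied margin and the actual margin."""
    return (1.0 - credibility) * market_margin + credibility * actual_margin


# Justify in the Derby: 6.08 implied by the market, 12.0 actual after capping.
justify_target = regression_target(6.08, 12.0)
print(round(justify_target, 2))  # 0.75 * 6.08 + 0.25 * 12.0 = 7.56

# Made-up field: the straggler's 53 lengths behind count as only 20.
toy_field = {"A": 0.0, "B": 2.5, "C": 53.0}
toy_margin = capped_average_margin(toy_field, "A")  # (2.5 + 20) / 2 = 11.25
```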

Now it's time to run the regression. The model looks back over all races run over the past 180 days. With a dummy variable for each horse and a separate dummy variable for each race, this works out to roughly 50,000 independent variables. Even with sparse techniques, this takes about an hour to run on my iMac. But what emerges is a surprisingly accurate model that can be used to predict odds and outcomes for horses across all classes, from $10k claiming races to the multi-million dollar stakes races run yesterday at Churchill.
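To make the shape of the regression concrete, here is a toy Python sketch. It is not the sparse solver used for the real rankings: it fits the same kind of horse-plus-race dummy model by alternating weighted averages (block coordinate descent on the weighted least-squares objective), which is only practical for tiny examples. The function name, the row layout, and the four made-up rows are all assumptions for illustration.

```python
from collections import defaultdict


def fit_horse_race_effects(rows, n_iter=200):
    """Fit target ~ [horse] + [race] by weighted least squares.

    rows: iterable of (horse, race, target_lengths, weight) tuples.
    """
    horse_effect = {h: 0.0 for h, _, _, _ in rows}
    race_effect = {r: 0.0 for _, r, _, _ in rows}
    for _ in range(n_iter):
        # Update each horse: weighted mean of (target - race effect).
        num, den = defaultdict(float), defaultdict(float)
        for h, r, y, w in rows:
            num[h] += w * (y - race_effect[r])
            den[h] += w
        for h in horse_effect:
            horse_effect[h] = num[h] / den[h]
        # Update each race: weighted mean of (target - horse effect).
        num, den = defaultdict(float), defaultdict(float)
        for h, r, y, w in rows:
            num[r] += w * (y - horse_effect[h])
            den[r] += w
        for r in race_effect:
            race_effect[r] = num[r] / den[r]
    return horse_effect, race_effect


# Made-up data: horse B links two races, so A, B, and C become comparable.
rows = [
    ("A", "R1", 3.0, 1.0),
    ("B", "R1", 1.0, 1.0),
    ("B", "R2", -1.0, 4.0),   # more recent race, heavier weight
    ("C", "R2", -2.0, 4.0),
]
horse_effect, race_effect = fit_horse_race_effects(rows)
print(horse_effect["A"] - horse_effect["B"])  # A rates about 2 lengths above B
```

Only the differences between horses are pinned down by a model like this, since a constant can be moved from every horse coefficient to every race coefficient without changing the fit. That does not affect projected win probabilities, which depend only on each horse's share of the field.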

To calculate odds for a future race, the math is very simple, and in effect just reverses the steps we took above. The roboCap tool does the math for you, but it could easily be done by hand. For each horse, you take their Generic Length Advantage (GLA). GLA is simply the regression coefficient that the linear model outputs for the horse. Divide GLA by 4.222 and then exponentiate. Sum these numbers for all horses in the race. A horse's win probability is just their share of that sum.
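Here is a minimal Python sketch of that by-hand calculation. The win probability step follows the description above directly. The conversion back to projected odds is my own inversion of the earlier track take step (true probability = implied probability times 1 minus the take), and the three-horse field with its GLA values is hypothetical.

```python
import math

LENGTHS_FACTOR = 4.222  # same factor used to convert odds into lengths


def win_probabilities(gla_by_horse):
    """Each horse's share of the summed exp(GLA / 4.222) values."""
    strengths = {h: math.exp(g / LENGTHS_FACTOR) for h, g in gla_by_horse.items()}
    total = sum(strengths.values())
    return {h: s / total for h, s in strengths.items()}


def projected_odds(true_probability, track_take):
    """Invert true_probability = (1 - take) / (1 + odds) to recover the odds."""
    return (1.0 - track_take) / true_probability - 1.0


# Hypothetical three-horse field with made-up GLA values.
field = {"Horse A": 6.0, "Horse B": 2.0, "Horse C": -1.0}
probs = win_probabilities(field)
odds = {h: projected_odds(p, 0.175) for h, p in probs.items()}
for h in field:
    print(h, round(probs[h], 3), round(odds[h], 2))
```

Because the take appears in the numerator, a higher take pushes every projected price down, which matches the note in the quick guide.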

How accurate is roboCap?

Fortunately, I have a nice benchmark to compare against: the morning line. The morning line represents the projected odds for a horse in a given race, according to the track's official handicapper. For each race, we have the morning line odds, roboCap's odds, and then the actual final odds. If we view this as a prediction contest for a probability distribution, we can use Kullback-Leibler divergence to see which set of odds, the morning line or roboCap's, is closer to the actual odds.
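For readers who want to see the scoring itself, here is a small Python sketch of Kullback-Leibler divergence for a single race. I am assuming the divergence is measured from the closing-odds distribution to each forecast, and that all three sets of odds have already been converted to probabilities that sum to 1. The probabilities below are made up.

```python
import math


def kl_divergence(actual, predicted):
    """D(actual || predicted) for two probability lists over the same horses."""
    return sum(p * math.log(p / q) for p, q in zip(actual, predicted) if p > 0)


# Made-up three-horse race: closing-odds probabilities and two forecasts.
closing = [0.50, 0.30, 0.20]
forecast_close = [0.45, 0.33, 0.22]   # near the closing odds
forecast_far = [0.30, 0.35, 0.35]     # further from the closing odds

score_close = kl_divergence(closing, forecast_close)
score_far = kl_divergence(closing, forecast_far)
print(score_close < score_far)  # the nearer forecast gets the lower score
```

A forecast identical to the closing odds scores exactly zero, and the score grows as the forecast drifts away from them.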

Here are the results, split by "top" tracks and all other. A top track is defined as one with an average win/place/show pool of at least $75k. This includes the usual suspects like Churchill, Santa Anita, and the NYRA tracks, as well as tracks like Golden Gate Fields and Arlington Park. The results below span races over the past 18 months. To be counted, my model had to have a prediction for every horse in the race (i.e. any races with horses that hadn't run in the past 180 days are excluded).

Kullback-Leibler divergence (lower is better)
Tracks	races	morning line	roboCap
top tracks	7,575	0.088	0.109
all other	15,910	0.124	0.104
total	23,485	0.113	0.105

Overall, roboCap is better at forecasting odds than the morning line, with a 0.008 lower divergence score in total. It does appear that the top tracks employ better handicappers, beating the lower tier track handicappers by 0.036 and beating roboCap by 0.021.

As another example, I tweeted out odds predictions for five Breeders' Cup races yesterday:

The Classic: roboCap with a better divergence score than the morning line
The Dirt Mile: the morning line with a better divergence score than roboCap
The Twinspires Sprint: roboCap with a better divergence score than the morning line
The Longines Distaff: the morning line with a better divergence score than roboCap
The Chilukki Stakes: the morning line with a better divergence score than roboCap

Overall, roboCap went 2-3 against the Churchill Downs morning line, which I would consider a respectable performance.

I will reiterate that I am surprised how well roboCap works, given its simplicity. I think the lesson learned here is that what matters most when handicapping a horse is the horse's raw ability. Other factors such as distance, surface, weight, jockey, recency, pace, and rest may be important, but their importance is on the margins (but the margin is often where you can find your edge). I suppose this is not a groundbreaking insight, on par with my discovery that shooting is important to the game of basketball.

We can also evaluate roboCap on what we really care about: the ability to pick winners. For this test, we can compare against both the morning line and the closing odds. The test we will use is log-likelihood. For each race, we take the win probability of the winning horse according to the closing odds, morning line, and roboCap. The higher the probability for the winning horse, the better the score for the model. For example, for the 2018 Kentucky Derby, the morning line had an implied win probability of 20.6% for Justify (3-1 odds), while the closing line (2.90 odds) had an implied win probability of 21.1%. In this case, the closing odds had a slightly better likelihood score than the morning line.

For mathematical convenience, we take the natural logarithm of each probability to get log-likelihood. Here are the results. A higher score is better, but note that the log-likelihood numbers below are negative.

average log-likelihood of winning horse (higher is better)
Tracks	races	closing odds	morning line	roboCap
top tracks	7,575	-1.646	-1.764	-1.760
all other	15,910	-1.598	-1.753	-1.696
total	23,485	-1.613	-1.757	-1.716
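The log-likelihood score described above is easy to reproduce. Below is a minimal Python sketch using the two Justify probabilities quoted earlier (21.1% from the closing odds, 20.6% from the morning line). The function name is illustrative, and a real evaluation would average over every race rather than a single one.

```python
import math


def average_log_likelihood(winner_probabilities):
    """Mean natural log of the probability each forecast gave the actual winner."""
    logs = [math.log(p) for p in winner_probabilities]
    return sum(logs) / len(logs)


# One-race example: the 2018 Kentucky Derby, won by Justify.
closing_score = average_log_likelihood([0.211])
morning_line_score = average_log_likelihood([0.206])
print(closing_score > morning_line_score)  # closing odds score slightly higher

# An average score can be read as a geometric-mean winner probability,
# e.g. the closing-odds total of -1.613 corresponds to roughly 0.20.
typical_winner_probability = math.exp(-1.613)
```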

roboCap has a better log-likelihood score than the morning line overall, and it also edges out the morning line at the top tier tracks (just barely: -1.760 versus -1.764). I have noticed that handicappers for the morning line seem reluctant to project significantly heavy favorites or extreme long shots, and try to "herd" the odds a bit, which could be driving a less than optimal log-likelihood score.

But the market is hard to beat. Closing odds are significantly better than roboCap or the morning line in predicting winners. A 0.10-0.15 gap in log-likelihood represents a meaningful difference in predictive accuracy.

As mentioned above, roboCap and the rankings tables will update weekly, after the Sunday races are run.
