Saturday, May 5, 2018

Never Bet a Horse Named Joe: Update

Several years back, I tested a theory that horses with popular boys' or girls' names were overbet in parimutuel markets. My hunch was that the betting public is more likely to bet on a horse if that horse's name contained their own name (or that of their wife, son, daughter, etc.). I arrived at that hunch by extrapolating from a sample size of one - me.

What I found in that original analysis, using a limited dataset of races run in California, was weak evidence for my theory, though it fell short of statistical significance. However, I now have a much more robust dataset, consisting of nearly all races run in North America over the past four years.

With this new dataset of some 200,000 races, I ran a logistic regression on the probability of a horse winning a race using the following variables:
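
To make the setup concrete, here is a minimal sketch of what such a regression looks like in Python using statsmodels. The data is simulated, and the two predictors shown (the win probability implied by the tote odds and a popular-name flag) are hypothetical stand-ins for illustration, not the actual variable list:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5000  # synthetic horse starts, one row per horse per race

# Hypothetical predictors (illustrative only)
implied_prob = rng.uniform(0.02, 0.60, n)      # win prob implied by tote odds
has_popular_name = rng.integers(0, 2, n)       # 1 if name contains a popular first name

# Simulate outcomes where popular-name horses win slightly less than odds imply
true_prob = np.clip(implied_prob - 0.02 * has_popular_name, 0.01, 0.99)
won = rng.binomial(1, true_prob)

races = pd.DataFrame({"won": won,
                      "implied_prob": implied_prob,
                      "has_popular_name": has_popular_name})

# Logistic regression of winning on market odds plus the name flag.
# A negative, significant coefficient on has_popular_name would mean
# popular-name horses win less often than their odds imply - i.e. overbet.
model = smf.logit("won ~ implied_prob + has_popular_name", data=races).fit()
print(model.summary())
```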

Saturday, January 27, 2018

Odds Predictions for the Pegasus Cup

The 2nd running of the $16 million Pegasus World Cup takes place later today at Gulfstream Park. As I did with the Breeders' Cup, I will use my newly developed betting market rankings for horse racing to make odds predictions for this race. First, here are updated rankings, broken down into the following categories:
  • Dirt - Horses that run primarily dirt races of a mile or more
  • Turf - Horses that run primarily on the turf
  • Sprint - Horses that run races less than a mile
These rankings are based on odds data and results for all North American thoroughbred races run over the past 120 days. Using techniques similar to those for my other betting market rankings, I use multivariate regression to build connections between the races and derive what the market "thinks" are the best horses.
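
The exact model isn't spelled out here, so the following is only a sketch of the general idea: treat each horse's market-implied log odds in each race as a horse strength plus a race effect, and let horses that appear in multiple races tie the system together. The race data and probabilities below are made up for illustration, and the resulting ratings come out on a log-odds scale rather than in lengths:

```python
import numpy as np

# (race_id, horse, market-implied win probability) -- hypothetical data
starts = [
    (0, "Gun Runner", 0.55), (0, "West Coast", 0.25), (0, "Gunnevera", 0.20),
    (1, "Arrogate",   0.60), (1, "West Coast", 0.30), (1, "Gunnevera", 0.10),
]

horses = sorted({h for _, h, _ in starts})
race_ids = sorted({r for r, _, _ in starts})
h_idx = {h: i for i, h in enumerate(horses)}
r_idx = {r: i for i, r in enumerate(race_ids)}

# Design matrix: one dummy per horse, one dummy per race
X = np.zeros((len(starts), len(horses) + len(race_ids)))
y = np.zeros(len(starts))
for row, (r, h, p) in enumerate(starts):
    X[row, h_idx[h]] = 1.0
    X[row, len(horses) + r_idx[r]] = 1.0
    y[row] = np.log(p / (1 - p))  # market-implied log odds

# Least-squares solve; horses running in multiple races connect the system
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
for h in sorted(horses, key=lambda h: -coef[h_idx[h]]):
    print(f"{h:12s} {coef[h_idx[h]]:+.2f}")
```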

GLA stands for "Generic Lengths Advantage" and is the expected margin of victory (in lengths) over an average North American thoroughbred. So, Gun Runner would be expected to beat Arrogate by about 1.4 lengths in a mile race.
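
As a toy illustration of how the ratings translate into a predicted margin: the expected margin between two horses is just the difference in their GLA values. The numbers below are made up so that their difference matches the 1.4-length example, and this assumes GLA is quoted at a one-mile trip:

```python
# Hypothetical GLA values chosen to match the example in the text
gla = {"Gun Runner": 7.5, "Arrogate": 6.1}
margin = gla["Gun Runner"] - gla["Arrogate"]
print(f"Expected margin: {margin:.1f} lengths")  # ~1.4 lengths
```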

Sunday, January 14, 2018

Judging Win Probability Models

February 11, 2018 update: The Brier score chart at the bottom of this post had an incorrect value for the ESPN "Start of Game" score. The corrected numbers (with updates through 2/10/18) can be found in this tweet. With the update, my comments regarding the ESPN model being too reactive no longer apply.

Win probability models tend to get the most attention when they are "wrong". The Atlanta Falcons famously had a 99.7% chance to win Super Bowl LI, according to ESPN, while holding a 28-3 lead in the third quarter with the Patriots facing fourth down. Google search interest in "win probability" reached a five-year high in the week following the Patriots' improbable comeback.

Some point to the Falcons' 99.7% chance, and other improbable results, as evidence of the uselessness of win probability models. But a 99.7% prediction is not certainty; it should be incorrect 3 out of every 1,000 times. And it's not like we can replay last year's Super Bowl 1,000 times (unless you live inside the head of a Falcons fan).
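
To put a number on that intuition, here is a quick simulation: replay a million games in which the favorite truly has a 99.7% chance, and count the upsets.

```python
import numpy as np

rng = np.random.default_rng(1)
wins = rng.random(1_000_000) < 0.997   # one million replays at 99.7%
print(f"Favorite lost {(~wins).sum()} times per million "
      f"({(~wins).mean():.4%})")       # about 0.3%, i.e. 3 in 1,000
```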

So, in what sense can a probability model ever be wrong? As long as you don't predict complete certainty (0% or 100%), you can hand-wave away any outcome, as I did above with the Falcons collapse. Or take another high-profile win probability "failure": the November 2016 presidential election. On the morning of the election, Nate Silver's FiveThirtyEight gave Hillary Clinton a 71% chance of winning the presidency and Donald Trump a 29% chance.
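
One standard answer is to score the forecasts themselves. The Brier score mentioned in the update above is just the mean squared error between the forecast probabilities and the 0/1 outcomes, with lower being better. A minimal version in Python, pairing the two forecasts from this post into a two-event sample purely for illustration:

```python
def brier(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# ESPN's late-game Falcons forecast and FiveThirtyEight's Clinton forecast,
# scored against what actually happened (both favorites lost).
print(brier([0.997, 0.71], [0, 0]))  # ~0.749: confident misses are heavily penalized
```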