Sunday, November 4, 2018

Introducing roboCap: An automated horse race handicapper

For last year's Breeders' Cup, I unveiled my first attempt at betting market rankings for thoroughbred horses. I publish similar rankings for a variety of sports, including the NBA, the WNBA, the NFL, College Football and Basketball, Major League Baseball, and, most recently, the NHL.

After quite a long hiatus, I have started republishing these rankings here, and they will update weekly going forward (after the Sunday races are run). In addition, I have created a new tool, roboCap, that uses these rankings to create projected closing odds for any chosen field of horses. The rest of this post gets into the technical details of how this all works, but if you just want to play around with it, here's a quick guide:
  • Enter the track take at the top. This is the percentage the track takes off the top of the win pool. You can find take percentages for most tracks here. This is needed to accurately project the odds. The higher the take percentage, the lower the projected odds (lower odds = higher probabilities). The Churchill Downs track take is 17.5%.
  • Enter the horse names in the first column (autocomplete should help you out). To be in the tool, the horse must have run at least one race in the last 180 days.
  • Click "Handicap!" and the tool will return projected closing odds, along with true win probability and GLA, or Generic Length Advantage. Generic Length Advantage is the source of the projections and represents expected margin of victory (in lengths) against an average thoroughbred in a one mile race.
  • If a horse isn't in the database, or you disagree with the ranking, check the "Override" box and input your own GLA and click "Handicap!" to recalculate the odds.
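
To make the role of the track take concrete, here is a minimal Python sketch of how true win probabilities could be turned into projected parimutuel odds. It is not the roboCap code: the function name `projected_odds` is hypothetical, and it assumes the public bets each horse in proportion to its true win probability and ignores breakage (rounding of payouts).

```python
def projected_odds(win_probs, take):
    """Convert win probabilities into projected parimutuel odds (odds-to-1).

    Assumption: each horse's share of the win pool equals its win
    probability. Breakage (payout rounding) is ignored.
    """
    total = sum(win_probs)
    odds = []
    for p in win_probs:
        share = p / total  # fraction of the win pool bet on this horse
        # The track keeps `take`; the remainder is split among winning tickets.
        odds.append((1.0 - take) / share - 1.0)
    return odds


# Made-up four-horse field, using the 17.5% Churchill Downs take.
field = [0.40, 0.30, 0.20, 0.10]
odds = projected_odds(field, take=0.175)
for p, o in zip(field, odds):
    print(f"win prob {p:.0%} -> projected odds {o:.2f}-1")
```

With a take of zero, the 40% horse would project to 1.5-1; with the 17.5% take it drops to about 1.06-1, which is why a higher take means lower projected odds.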

Saturday, October 13, 2018

Betting Market Rankings for the NHL

In what is surely the second most exciting development of the NHL season, I have added hockey to my suite of betting market team rankings. For those unfamiliar, the basic idea of these rankings is to reverse engineer an implied team ranking from the game by game point spreads, moneylines, and over/unders. See my post at the Advanced NFL Stats Community site for a basic overview of the concept. With this latest addition, I now have daily market rankings for the NFL, College Football, NBA, WNBA, College Basketball, MLB, and the NHL.
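
As a rough illustration of the reverse-engineering idea, the Python sketch below fits team ratings to point spreads by least squares. It is only a sketch under stated assumptions: the function name `implied_ratings`, the single shared home advantage term, and the toy spreads are all hypothetical, and the actual rankings also draw on moneylines and over/unders, which this example ignores.

```python
import numpy as np


def implied_ratings(games, teams):
    """Fit ratings so that rating[home] - rating[away] + home_adv ~= spread.

    games: list of (home, away, spread), where spread is the number of
    points the home team is favored by (negative if it is the underdog).
    """
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    A = np.zeros((len(games) + 1, n + 1))
    b = np.zeros(len(games) + 1)
    for row, (home, away, spread) in enumerate(games):
        A[row, idx[home]] = 1.0
        A[row, idx[away]] = -1.0
        A[row, n] = 1.0  # shared home advantage term
        b[row] = spread
    A[-1, :n] = 1.0  # extra equation: ratings sum to zero (average team = 0)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return dict(zip(teams, x[:n])), x[n]


# Made-up spreads for three teams.
toy_games = [("A", "B", 5.0), ("B", "C", 5.0), ("C", "A", -4.0), ("B", "A", -1.0)]
ratings, home_adv = implied_ratings(toy_games, ["A", "B", "C"])
print(ratings, home_adv)
```

Because every team is linked to every other through shared opponents, even a handful of games pins down a full ranking, which is why a small sample can be enough.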

The nice thing about market derived rankings is that you can get a reasonable ranking with a relatively small sample size. We are just a week into the season, and the rankings already pass a sniff test. The top 5 of Tampa Bay, Nashville, Winnipeg, Toronto, and Pittsburgh in my rankings are also the top 5 favored teams to win the Stanley Cup.

Saturday, September 29, 2018

The NBA's new shot clock rule and its effect on pace

Earlier this month, the NBA formally approved a change to its shot clock rules. Now, following an offensive rebound, the shot clock will reset to just 14 seconds, instead of the usual 24.

Over at Nylon Calculus, Daniel Massop argues that the effect on pace will be minimal, given that only 6 percent of offensive rebound possessions lasted more than 14 seconds. For a deeper dive, check out Blake Murphy's piece at Uproxx, which uses, among other data points, stats from the NBA's G-League, which went to the 14 second rule two seasons back.

As it turns out, the WNBA was also an early adopter of this rule change, having switched to 14 seconds for the 2016 season. That rule's impact on pace can provide clues to what will happen in the NBA this season.

The chart below shows average seconds per possession in the WNBA for every season.

Saturday, May 5, 2018

Never Bet a Horse Named Joe: Update

Several years back, I tested a theory that horses with popular boys' or girls' names were overbet in parimutuel markets. My hunch was that the betting public is more likely to bet on a horse if that horse's name contains their own name (or that of their wife, son, daughter, etc.). I arrived at that hunch by extrapolating from a sample size of one - me.

What I found in that original analysis, using a limited dataset of races run in California, was weak evidence for my theory that fell short of statistical significance. However, I now have a much more robust dataset, consisting of nearly all races run in North America over the past four years.

With this new dataset of some 200,000 races, I ran a logistic regression on the probability of a horse winning a race using the following variables:

Saturday, January 27, 2018

Odds Predictions for the Pegasus Cup

The 2nd running of the $16 million Pegasus World Cup takes place later today at Gulfstream Park. As I did with the Breeders' Cup, I will use my newly developed betting market rankings for horse racing to make odds predictions for this race. First, here are updated rankings, broken down by the following categories:
  • Dirt - Horses that run primarily dirt races of a mile or more
  • Turf - Horses that run primarily on the turf
  • Sprint - Horses that run races less than a mile
These rankings are based on odds data and results for all North American thoroughbred races run over the past 120 days. Using techniques similar to those behind my other betting market rankings, I use multivariate regression to build connections between the races and derive what the market "thinks" are the best horses.

GLA stands for "Generic Lengths Advantage" and is the expected margin of victory (in lengths) over an average North American thoroughbred. So, Gun Runner would be expected to beat Arrogate by about 1.4 lengths in a mile race.

Sunday, January 14, 2018

Judging Win Probability Models

February 11, 2018 update: The Brier score chart at the bottom of this post had an incorrect value for the ESPN "Start of Game" score. The corrected numbers (with updates through 2/10/18) can be found in this tweet. With the update, my comments regarding the ESPN model being too reactive no longer apply.

Win probability models tend to get the most attention when they are "wrong". The Atlanta Falcons famously had a 99.7% chance to win Super Bowl LI according to ESPN, holding a 28-3 lead in the third quarter, and the Patriots facing fourth down. Google search interest in "win probability" reached a five year high in the week following the Patriots' improbable comeback.

Some point to the Falcons' 99.7% chances, and other improbable results, as evidence of the uselessness of win probability models. But a 99.7% prediction is not certainty; it should be wrong 3 out of every 1,000 times. And it's not like we can replay last year's Super Bowl 1,000 times (unless you live inside the head of a Falcons fan).

So, in what sense can a probability model ever be wrong? As long as you don't predict complete certainty (0% or 100%), you can hand wave away any outcome, as I did above with the Falcons collapse. Or take another high profile win probability "failure": the November 2016 Presidential Election. On the morning of the election, Nate Silver's FiveThirtyEight gave Hillary Clinton a 71% chance of winning the presidency and Donald Trump a 29% chance.
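
One standard way to judge a probability model over many predictions, rather than a single game, is the Brier score mentioned in the update note above: the mean squared difference between each forecast and what actually happened (1 for a win, 0 for a loss), where lower is better. The Python sketch below uses the textbook definition; the forecasts and outcomes are made up for illustration and are not taken from any of the models discussed here.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Made-up example: a confident model that is right three times out of four,
# compared with a model that always says 50%.
outcomes = [1, 1, 1, 0]
confident = brier_score([0.9, 0.9, 0.9, 0.9], outcomes)
coin_flip = brier_score([0.5, 0.5, 0.5, 0.5], outcomes)
print(f"confident model: {confident:.2f}, coin flip: {coin_flip:.2f}")
```

A single miss on a 90% forecast is costly, but across the four games the confident model still scores better (0.21) than always guessing 50% (0.25).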

Saturday, November 4, 2017

Betting Market Rankings for Horse Racing

In honor of today's Breeders' Cup races, here is my first attempt at creating a betting market ranking for thoroughbred horse racing.

My first real foray into sports analytics was a post to Brian Burke's Advanced NFL Stats Community page on how to derive an implied betting market ranking for the NFL from weekly point spreads. I have since refined that initial approach and extended it to additional sports: the NBA, Major League Baseball, College Football, College Basketball, and the WNBA.

The basic idea is to take the market odds and point spreads for each game and use them to reverse engineer an implied ranking. Horse racing uses a parimutuel system, in which prices are not set by bookies/sharps but are instead a pure reflection of the money bet by the wagering public. So, a betting market ranking derived from these odds would be a true distillation of the "wisdom of crowds".

But in order to extend my method to horse racing, I had to overcome the following challenges:
  1. Access to data
  2. Converting parimutuel odds to a parameter that "adds" like point spreads do
  3. Creating a method that works for contests with more than two participants
I've since solved the data access issue. On the second issue, I had to solve a similar challenge to develop my rankings for Major League Baseball. Betting markets in baseball use odds (the "money line") rather than point/run spreads, so I had to create a reverse Pythagorean theorem of sorts for baseball that translated win expectancy into run differential.
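
For a sense of what a "reverse Pythagorean" conversion could look like, here is a minimal Python sketch. It inverts the classic Pythagorean expectation, win% = RS^x / (RS^x + RA^x), to recover a run differential from a win probability. The function name `implied_run_differential`, the exponent of 2, and the use of an expected run total (such as the over/under) are assumptions for illustration; the exact formula behind the MLB rankings is not given in this post.

```python
def implied_run_differential(win_prob, total_runs, exponent=2.0):
    """Invert the Pythagorean expectation to get expected run differential.

    Assumptions: win_prob is strictly between 0 and 1, total_runs is the
    expected combined runs for both teams, and exponent=2 is the classic
    Pythagorean exponent.
    """
    # Solve win_prob = r**x / (r**x + 1) for r = runs_scored / runs_allowed.
    ratio = (win_prob / (1.0 - win_prob)) ** (1.0 / exponent)
    runs_scored = total_runs * ratio / (1.0 + ratio)
    runs_allowed = total_runs - runs_scored
    return runs_scored - runs_allowed


# A team given an 80% chance in a game with 9 expected total runs.
diff = implied_run_differential(0.80, total_runs=9.0)
print(f"implied run differential: {diff:+.2f}")
```

In this example the implied score is 6 runs to 3, and plugging that back into the forward formula gives 36 / (36 + 9) = 80%, so the round trip is consistent.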