Saturday, January 27, 2018

Odds Predictions for the Pegasus Cup

The 2nd running of the $16 million Pegasus World Cup takes place later today at Gulfstream Park. As I did with the Breeders' Cup, I will use my newly developed betting market rankings for horse racing to make odds predictions for this race. First, here are updated rankings, broken down by the following categories:
  • Dirt - Horses that run primarily dirt races of a mile or more
  • Turf - Horses that run primarily on the turf
  • Sprint - Horses that run races less than a mile
These rankings are based on odds data and results for all North American thoroughbred races run over the past 120 days. Using techniques similar to those behind my other betting market rankings, I use multivariate regression to build connections between the races and derive what the market "thinks" are the best horses.
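As a rough illustration of the regression idea (not the actual model behind these rankings), the sketch below fits one rating per horse so that rating differences match a set of market-implied margins as closely as possible in the least-squares sense. The function name `fit_ratings`, the input format, and the toy margins are all assumptions made for the example.

```python
from collections import defaultdict

def fit_ratings(implied_margins, sweeps=500):
    """Least-squares ratings from (horse_a, horse_b, margin) rows.

    Each row says the market expects horse_a to beat horse_b by `margin`
    lengths. This is a hypothetical input format, not the author's data layout.
    """
    # For every horse, store (opponent, margin from this horse's point of view).
    links = defaultdict(list)
    for a, b, margin in implied_margins:
        links[a].append((b, margin))
        links[b].append((a, -margin))

    ratings = {horse: 0.0 for horse in links}
    # Gauss-Seidel sweeps on the least-squares normal equations: each rating
    # becomes the average of (opponent rating + implied margin over opponent).
    for _ in range(sweeps):
        for horse, rows in links.items():
            ratings[horse] = sum(ratings[opp] + m for opp, m in rows) / len(rows)

    # Ratings are only defined up to a constant, so center them on zero
    # (zero then plays the role of an "average" horse).
    mean = sum(ratings.values()) / len(ratings)
    return {horse: r - mean for horse, r in ratings.items()}

# Toy example: three horses connected through three implied matchups.
toy_margins = [("A", "B", 2.0), ("B", "C", 1.0), ("A", "C", 3.0)]
ratings = fit_ratings(toy_margins)
```

Because the ratings are centered on zero, the gap between any two of them reads as an expected margin between those two horses, which is the same way the GLA column is meant to be read.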

GLA stands for "Generic Lengths Advantage" and is the expected margin of victory (in lengths) over an average North American thoroughbred. So, Gun Runner would be expected to beat Arrogate by about 1.4 lengths in a mile race.

Sunday, January 14, 2018

Judging Win Probability Models

February 11, 2018 update: The Brier score chart at the bottom of this post had an incorrect value for the ESPN "Start of Game" score. The corrected numbers (with updates through 2/10/18) can be found in this tweet. With the update, my comments regarding the ESPN model being too reactive no longer apply.

Win probability models tend to get the most attention when they are "wrong". According to ESPN, the Atlanta Falcons famously had a 99.7% chance to win Super Bowl LI while holding a 28-3 lead in the third quarter, with the Patriots facing fourth down. Google search interest in "win probability" reached a five-year high in the week following the Patriots' improbable comeback.

Some point to the Falcons' 99.7% chances, and other improbable results, as evidence of the uselessness of win probability models. But a 99.7% prediction is not certainty, and it should be incorrect 3 out of every 1,000 times. Of course, we can't replay last year's Super Bowl 1,000 times (unless you live inside the head of a Falcons fan).

So, in what sense can a probability model ever be wrong? As long as you don't predict complete certainty (0% or 100%), you can hand-wave away any outcome, as I did above with the Falcons collapse. Or take another high-profile win probability "failure": the November 2016 Presidential Election. On the morning of the election, Nate Silver's FiveThirtyEight gave Hillary Clinton a 71% chance of winning the presidency and Donald Trump a 29% chance.
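A single outcome can't settle the question, but a large batch of predictions can be graded. The Brier score mentioned in the update note at the top of this post is one standard way to do that: it is the mean squared difference between each predicted probability and the 0/1 result. The sketch below uses made-up forecasts purely for illustration.

```python
def brier_score(predictions, outcomes):
    """Mean squared error between forecast probabilities and 0/1 results."""
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must be the same length")
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)

# Made-up forecasts for five contests, and whether the predicted side won (1) or lost (0).
forecasts = [0.997, 0.71, 0.60, 0.55, 0.90]
results = [0, 0, 1, 1, 1]

model_score = brier_score(forecasts, results)

# Baseline: a "model" that always says 50% scores 0.25 no matter what happens.
coin_flip_score = brier_score([0.5] * len(results), results)
```

Lower is better: a perfect forecaster scores 0, and a model is only adding information if it beats the 0.25 earned by always guessing 50%.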

Saturday, November 4, 2017

Betting Market Rankings for Horse Racing

In honor of today's Breeders' Cup races, here is my first attempt at creating a betting market ranking for thoroughbred horse racing.

My first real foray into sports analytics was a post to Brian Burke's Advanced NFL Stats Community page on how to derive an implied betting market ranking for the NFL from weekly point spreads. I have since refined that initial approach and extended it to additional sports: the NBA, Major League Baseball, College Football, College Basketball, and the WNBA.

The basic idea is to take the market odds and point spreads for each game and use them to reverse engineer an implied ranking. Horse racing uses a parimutuel system, in which the odds are not set by bookies/sharps but are instead a pure reflection of the money bet by the wagering public. So, a betting market ranking derived from these odds would be a true distillation of the "wisdom of crowds".
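To make the starting point concrete, here is a minimal sketch of turning parimutuel win odds into implied win probabilities. It assumes odds quoted as profit per unit staked (so 4-1 is `4.0`) and removes the track's cut of the pool by simple proportional rescaling. Both the rescaling choice and the example field are assumptions for illustration, not necessarily what these rankings use.

```python
def implied_win_probabilities(odds_to_one):
    """Convert "X-to-1" win odds into probabilities that sum to 1."""
    # Raw implied probability for odds of X-1 is 1 / (X + 1). Because the
    # track keeps a share of the pool, these raw values sum to more than 1.
    raw = {horse: 1.0 / (odds + 1.0) for horse, odds in odds_to_one.items()}
    total = sum(raw.values())
    # Rescale proportionally so the field's probabilities sum to exactly 1.
    return {horse: p / total for horse, p in raw.items()}

# Hypothetical four-horse field (even money, 3-1, 4-1, 9-1).
field = {"Horse A": 1.0, "Horse B": 3.0, "Horse C": 4.0, "Horse D": 9.0}
probs = implied_win_probabilities(field)
```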

But in order to extend my method to horse racing, I had to overcome the following challenges:
  1. Access to data
  2. Converting parimutuel odds to a parameter that "adds" like point spreads do
  3. Creating a method that works for contests with more than two participants
I've since solved the data access issue. On the second issue, I had to solve a similar challenge to develop my rankings for Major League Baseball. Betting markets in baseball use odds (the "money line") rather than point/run spreads, so I had to create a reverse Pythagorean theorem of sorts for baseball that translated win expectancy into run differential.
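The exact formula isn't given here, but one way to picture a "reverse Pythagorean" step is to invert the standard Pythagorean expectation, win% = RS^k / (RS^k + RA^k), to recover a runs ratio from a win probability. The sketch below does that; the exponent of 2 and the 9 total runs per game are illustrative assumptions, and the function name is hypothetical.

```python
def implied_run_differential(win_prob, total_runs=9.0, exponent=2.0):
    """Expected run margin per game implied by a win probability.

    Inverts win% = RS^k / (RS^k + RA^k), then splits an assumed
    total of `total_runs` per game between the two teams.
    """
    if not 0.0 < win_prob < 1.0:
        raise ValueError("win_prob must be strictly between 0 and 1")
    # RS / RA = (p / (1 - p)) ** (1 / k)
    ratio = (win_prob / (1.0 - win_prob)) ** (1.0 / exponent)
    # With RS + RA = total_runs, RS - RA = total_runs * (ratio - 1) / (ratio + 1).
    return total_runs * (ratio - 1.0) / (ratio + 1.0)

even_game = implied_run_differential(0.5)       # a coin-flip game
heavy_favorite = implied_run_differential(0.8)  # an 80% favorite
```

The appeal of a quantity like this is that it behaves like a point spread: a team's margin over one opponent and that opponent's margin over a third can be added, which raw win probabilities cannot.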

Monday, May 29, 2017

Free Throw Deep Dives: Accuracy Scatter Plots

In an effort to fill the void during this year's Super Bowl-like interlude before the NBA Finals, I have added one more tool to my series of free throw deep dives: Accuracy Scatter Plots.

This tool is essentially an interactive version of the charts I originally published two years ago in my first post analyzing SportVU motion tracking data. For those of you familiar with MLB's PitchF/x system, which can track each ball's placement relative to the strike zone, this is a similar view, but for free throw shooting. By analyzing the raw SportVU data on ball motion and applying a simple physics model, I can chart where a player's free throw shots land relative to the center of the hoop.

Here is a sample PitchF/x chart:
And here is a sample "ShotF/x" chart for Kevin Durant:

The blue dots show made free throws and the red dots are misses. My code does its best to make sense of the SportVU data, but some anomalies remain (e.g. the blue dots that fall well outside the hoop).
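The physics model itself isn't spelled out in this post. As one plausible reading, the sketch below treats the ball as a drag-free projectile and finds where it crosses rim height on the way down, relative to the center of the hoop. The coordinate system, the release values, and the function name are all assumptions made for the example; units are feet and seconds.

```python
import math

G_FT_PER_S2 = 32.174   # gravitational acceleration, ft/s^2
RIM_HEIGHT_FT = 10.0   # regulation rim height

def landing_offset(release_pos, release_vel, hoop_xy=(0.0, 0.0)):
    """(x, y) offset from hoop center where the ball descends through rim height.

    release_pos = (x, y, z) and release_vel = (vx, vy, vz) at the moment of
    release. Returns None if the ball never gets up to rim height.
    """
    x0, y0, z0 = release_pos
    vx, vy, vz = release_vel
    # Solve z0 + vz*t - 0.5*g*t^2 = rim height for t.
    disc = vz ** 2 - 2.0 * G_FT_PER_S2 * (RIM_HEIGHT_FT - z0)
    if disc < 0:
        return None
    # The larger root is the crossing on the way down.
    t = (vz + math.sqrt(disc)) / G_FT_PER_S2
    return (x0 + vx * t - hoop_xy[0], y0 + vy * t - hoop_xy[1])

# Hypothetical shot: released 13.5 ft in front of the hoop at a height of 7 ft.
offset = landing_offset((-13.5, 0.0, 7.0), (12.6, 0.0, 20.0))
```

Plotting one such offset per shot, colored by make or miss, gives a scatter of the kind described above.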

Saturday, May 6, 2017

Free Throw Deep Dives: The Windup and the Release

Part three of an infrequent series. Click here to go straight to the interactive tool.

In previous free throw deep dives, I used SportVU ball tracking data to examine how launch angle and release spot affect free throw accuracy. In this post, we back things up a bit, one second to be precise, and dive into the specific mechanics of each player's free throw shot.

For this free throw analysis I focused on the motion of the ball (in all three dimensions) for the second prior to the ball being released. One second, while somewhat arbitrary, was chosen so that I'm capturing the natural shooting motion of the player after any pre-shot routine has been completed (e.g. one dribble, two dribbles, Klay Thompson's weird arm tap thing, etc.).

The ball tracking data is messy, and shooting motion will vary from shot to shot, so I built a simple LOESS model for each player, with the goal of teasing out a player's typical shooting motion in all three dimensions. LOESS models are nice because they don't force you to shoehorn your data into a pre-determined type of curve (e.g. polynomial, exponential, etc.).
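For readers unfamiliar with LOESS, here is a bare-bones version of the idea: at each point, fit a straight line to the nearby data, weighting closer points more heavily, and take that line's value as the smoothed estimate. This is a simplified sketch (tricube weights, local linear fits, no robustness iterations) with made-up ball heights, not the code behind the charts.

```python
def loess(xs, ys, frac=0.5):
    """Smooth ys against xs using locally weighted linear fits."""
    n = len(xs)
    k = max(2, int(round(frac * n)))  # number of neighbors in each local fit
    smoothed = []
    for x0 in xs:
        # Bandwidth = distance from x0 to its k-th nearest neighbor.
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12
        # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
        weights = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        # Weighted least-squares line through the neighborhood.
        sw = sum(weights)
        mx = sum(w * x for w, x in zip(weights, xs)) / sw
        my = sum(w * y for w, y in zip(weights, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed

# Made-up ball heights (feet) sampled over the second before release.
times = [i / 10 for i in range(11)]
heights = [4.1, 3.9, 3.8, 4.0, 4.3, 4.9, 5.4, 6.2, 6.9, 7.6, 8.1]
typical_height = loess(times, heights, frac=0.5)
```

The `frac` parameter controls how much of the data each local fit sees: smaller values follow the raw data more closely, larger values give a smoother curve.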

Here are the results for Kevin Durant's typical shooting form:

Friday, April 14, 2017

Playoff Seed Probability Motion Charts

As I have done the last few seasons, here are motion charts that show how each team's playoff seed probabilities have evolved and shifted over the 2016-17 season. The probabilities are calculated using my NBA Vegas rankings, which were updated daily.

Saturday, January 21, 2017

Free Throw Deep Dives: Picking Your Spot

Note: Similar to my recent "deflategate" post, the following utilizes SportVU data on player and ball position. Sadly, this data was walled off from the public nearly a year ago, meaning what analysis I can do has a limited shelf life. This version of the post had been ready for some time now, but I had intended to expand its scope. However, given the data is nearly a year old, I thought it was best to publish what I had, even if I still consider it incomplete.

Part two of an infrequent series:

The purpose of these posts is to assess what makes for good free throw shooting. The NBA's SportVU system tracks the position of the ball in all three dimensions. I have taken that raw, often messy data and organized it using some freshman level physics. From that simple model, I have created a whole host of new descriptive statistics on player shooting mechanics.

In my first deep dive, I examined how vertical release angle (i.e. high arc, low arc) correlates with free throw success. As it turns out, there is little correlation between the arc of a player's typical shot and their accuracy. For every "high arc" sharpshooter like Stephen Curry, you have equally successful "low arc" shooters like Kyle Korver, or spectacularly unsuccessful high arc shooters like Andre Drummond. I did find an (unsurprising) correlation between consistency in release angle and free throw percentage.

In this post, we will shift focus from the vertical axis to the horizontal. Where do players typically "spot up" from the free throw line, and how important is it to pick a consistent spot?

Release Spot

We'll start with where players tend to release their free throw shot. For all the analysis below, I am using SportVU data going back to the 2013-14 NBA season and ending, sadly, on January 23, 2016 - the date the NBA removed detailed player tracking data from public access. Also, I am excluding all games played at the Warriors' Oracle Arena. For reasons unknown, the Oracle SportVU data is very messy and its inclusion was skewing player statistics, particularly those related to consistency.

The chart below shows the average release spot for some 326 NBA players (a player needed to have at least 100 free throws in order to be included).

Now that we are oriented, we will zoom in on the rectangular box: