Monday, September 7, 2015

NFL Market Rankings and the Inequities of Playoff Seeding

The Top Five
With the regular season just a few days away, my market-based NFL rankings are now up and running. These update daily and are designed to reflect the betting market's ranking of all 32 NFL teams. The rankings themselves use the betting point spreads and over/unders from each game. A simple weighted linear regression is used to reverse engineer a ranking from these pairwise numbers.
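As a rough illustration of the idea (not the actual code behind the rankings), here is a minimal sketch of a weighted least squares fit that recovers team ratings from point spreads. The function names, the sign convention (positive spread means the home team is favored), the home-field term, the game weights, and the four-team data are all assumptions made for the example.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ratings(games, teams):
    """games: list of (home, away, home_spread, weight) tuples.

    Model (an assumption): home_spread = rating[home] - rating[away] + hfa.
    Returns (ratings dict, home-field advantage).
    """
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams) + 1  # last unknown is the home-field advantage
    rows, targets, weights = [], [], []
    for home, away, spread, w in games:
        row = [0.0] * n
        row[idx[home]] = 1.0
        row[idx[away]] = -1.0
        row[-1] = 1.0
        rows.append(row)
        targets.append(spread)
        weights.append(w)
    # Extra row pins the league average rating to zero.
    rows.append([1.0] * len(teams) + [0.0])
    targets.append(0.0)
    weights.append(1.0)
    # Weighted normal equations: (A^T W A) x = A^T W b
    ata = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
            for j in range(n)] for i in range(n)]
    atb = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets))
           for i in range(n)]
    sol = solve_linear(ata, atb)
    return dict(zip(teams, sol[:-1])), sol[-1]


# Hypothetical spreads built from ratings A=3, B=1, C=-1, D=-3 and hfa=2.5.
teams = ["A", "B", "C", "D"]
games = [
    ("A", "B", 4.5, 1.0),
    ("B", "C", 4.5, 1.0),
    ("C", "D", 4.5, 1.0),
    ("D", "A", -3.5, 1.0),
    ("A", "C", 6.5, 0.5),  # e.g. an older line given less weight
]
ratings, hfa = fit_ratings(games, teams)
```

Because the example spreads are exactly consistent with one set of ratings, the fit recovers them; real lines are noisier, which is where the weights matter.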

The key metric here is Generic Points Favored, or GPF, and it is what you would expect the team to be favored by against a league average opponent on a neutral field. By using the point spreads in conjunction with the over/under, GPF can be further decomposed into its offensive and defensive components (oGPF and dGPF).
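To see why the over/under makes the offensive and defensive split possible, note that a spread and a total together pin down an implied score for each team. The sketch below shows only that arithmetic; the function name and the sign convention (positive spread means the home team is favored) are assumptions for illustration.

```python
def implied_scores(home_spread, total):
    """Split a game total into implied home and away points.

    home_spread: points the home team is favored by (negative if underdog).
    total: the over/under for combined points.
    """
    home_points = (total + home_spread) / 2
    away_points = (total - home_spread) / 2
    return home_points, away_points


# Home team favored by 3.5 with an over/under of 47.5.
print(implied_scores(3.5, 47.5))  # (25.5, 22.0)
```

Implied points scored and allowed can then be fit separately, in the same spirit as the overall ranking, to give an offensive and a defensive component.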

For now, there is just a week's worth of point spreads and totals to work with, which is insufficient to create a proper regression analysis amongst the 32 NFL teams. To get enough connections between the teams, you need at least three weeks of data. To fill in the gaps in the meantime, I have used the full season point spreads as published by Cantor Gaming.
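The connection problem can be made concrete by treating each game as an edge between two teams: until every team is linked to every other through some chain of games, ratings in separate groups cannot be compared. Here is a small sketch using a union-find structure to count those groups. The function name and the four-team schedule are hypothetical.

```python
def count_components(teams, games):
    """Count groups of teams linked, directly or indirectly, by games."""
    parent = {t: t for t in teams}

    def find(t):
        # Follow parent links to the group's representative,
        # shortening the path along the way.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for home, away in games:
        parent[find(home)] = find(away)
    return len({find(t) for t in teams})


teams = ["A", "B", "C", "D"]
week1 = [("A", "B"), ("C", "D")]
week2 = [("B", "C"), ("D", "A")]
print(count_components(teams, week1))          # 2
print(count_components(teams, week1 + week2))  # 1
```

A single connected group is the minimum requirement for the fit to be solvable; it does not guarantee that the estimates are stable, which is a separate reason to want more weeks of data.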

This will be my fourth year publishing these rankings for the NFL. When it comes to predicting future wins, they have outperformed a variety of other ranking systems. I am curious to see how they stack up this year against ESPN's revamped Football Power Index.

Saturday, August 22, 2015

Kind of a Drag

Just a heads up that this post is pretty heavy on math and physics: differential equations, integrals, drag coefficients, air density at varying altitudes, etc. You know, in case you're into that kind of thing.

Most physics problems, especially the undergraduate variety, require simplifying assumptions in order to make them workable - frictionless surfaces, perfect vacuums, spherical cows. This past May, I employed some simple physics to analyze the shooting trajectories of NBA players. The raw location data came from the SportVU location tracking system, and is somewhat noisy. To tease signal from that noise, I assumed the ball's path followed a simple trajectory, the kind which physics majors cut their teeth on as freshmen. I then used linear regression to pick a path that best matched the raw data.
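For the drag-free case, the vertical path is z(t) = z0 + vz*t - g*t^2/2. With g known, adding g*t^2/2 to each observed height leaves something linear in t, so ordinary least squares can recover the release height and vertical velocity. The sketch below illustrates that trick under stated assumptions: the function name, the units (feet and seconds), the 25 samples per second, and the synthetic data are all made up for the example.

```python
G_FT = 32.174  # gravitational acceleration in ft/s^2


def fit_vertical_path(times, heights, g=G_FT):
    """Least squares fit of z(t) = z0 + vz*t - 0.5*g*t^2 with g fixed.

    Returns (z0, vz): height and vertical velocity at t = 0.
    """
    # Remove the known gravity term so the remainder is linear in t.
    y = [z + 0.5 * g * t * t for t, z in zip(times, heights)]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times, y))
    vz = sxy / sxx
    z0 = y_mean - vz * t_mean
    return z0, vz


# Synthetic, noise-free shot: released at 7 ft with 20 ft/s upward velocity.
times = [i * 0.04 for i in range(25)]
heights = [7.0 + 20.0 * t - 0.5 * G_FT * t * t for t in times]
z0, vz = fit_vertical_path(times, heights)
```

The horizontal coordinates can be handled the same way, as straight-line fits in t.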

One key simplifying assumption made was to ignore the impact of drag on the flight of the ball. As the ball moves through the air, it is pushing that air out of the way, and the air pushes back, slowly degrading the ball's velocity. As it turns out, this effect, while small, was not negligible, and its omission was creating persistent bias in the modeling of free throws and field goals.
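To get a feel for the size of the effect, here is a minimal numerical sketch using the standard quadratic drag law, where the drag acceleration is k*|v|*v and k = 0.5*rho*Cd*A/m. The ball mass, radius, and drag coefficient below are approximate assumed values, the function names are hypothetical, and the integrator is a simple fixed-step scheme rather than anything from the original analysis.

```python
import math

RHO_SEA_LEVEL = 1.225  # air density in kg/m^3 (standard sea level value)
BALL_MASS = 0.62       # kg, approximate (assumption)
BALL_RADIUS = 0.119    # m, approximate (assumption)
DRAG_COEFF = 0.5       # dimensionless, rough value for a sphere (assumption)


def drag_constant(rho=RHO_SEA_LEVEL, cd=DRAG_COEFF,
                  radius=BALL_RADIUS, mass=BALL_MASS):
    """Return k in a_drag = -k * |v| * v, in units of 1/m."""
    area = math.pi * radius ** 2
    return 0.5 * rho * cd * area / mass


def horizontal_range(vx, vz, k, g=9.81, dt=1e-4):
    """Step the ball forward until it falls back to launch height.

    Returns the horizontal distance covered, in meters.
    """
    x = z = 0.0
    while True:
        speed = math.hypot(vx, vz)
        vx += -k * speed * vx * dt
        vz += (-g - k * speed * vz) * dt
        x += vx * dt
        z += vz * dt
        if z <= 0.0 and vz < 0.0:
            return x


no_drag = horizontal_range(5.0, 5.0, k=0.0)
with_drag = horizontal_range(5.0, 5.0, k=drag_constant())
print(no_drag, with_drag)
```

With these assumed values the ball lands noticeably short of the drag-free distance, which is the kind of persistent bias a drag-free fit would absorb into its other parameters.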

In this post, I'll outline my attempts to incorporate drag into my model of a basketball's trajectory, and then test that model's predictions against the raw SportVU data (science!). As a bonus project of sorts, I will also examine whether drag effects are noticeably different for thin air arenas such as those of the Denver Nuggets and Utah Jazz.
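Since drag scales with air density, a quick estimate of density at altitude shows how large the thin-air effect could be. The sketch below uses the International Standard Atmosphere formula for the lower atmosphere; the function name and the approximate city elevations are assumptions, and actual conditions inside an arena will differ from the standard atmosphere.

```python
T0 = 288.15      # sea level temperature in K (standard atmosphere)
LAPSE = 0.0065   # temperature lapse rate in K/m
RHO0 = 1.225     # sea level air density in kg/m^3
G = 9.80665      # m/s^2
M_AIR = 0.0289644  # molar mass of dry air in kg/mol
R_GAS = 8.31446    # universal gas constant in J/(mol K)


def air_density(altitude_m):
    """Standard-atmosphere air density (kg/m^3), valid in the troposphere."""
    temperature = T0 - LAPSE * altitude_m
    exponent = G * M_AIR / (R_GAS * LAPSE) - 1.0
    return RHO0 * (temperature / T0) ** exponent


# Approximate city elevations in meters (assumed values).
DENVER_M = 1609
SALT_LAKE_CITY_M = 1288

for name, altitude in [("Denver", DENVER_M),
                       ("Salt Lake City", SALT_LAKE_CITY_M)]:
    ratio = air_density(altitude) / RHO0
    print(f"{name}: {ratio:.3f} of sea level density")
```

By this estimate, air density in Denver is roughly 15 percent below sea level, so the drag force at a given speed would be lower by about the same fraction.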

Thursday, August 20, 2015

Vegas Never Doubted Clayton Kershaw

[Image: San Diego's Petco Park, the most "pitcher friendly" park in the league according to the market]
After a long hiatus, betting market rankings for major league baseball are now available and will update daily for what's left of the season. Similar to my rankings for the NFL, NBA, College Football, and College Basketball, these attempt to reverse engineer an implied power ranking from the betting lines and totals for each game. I now have all five sports on a common mathematical framework and intend to share the technical underpinnings of the methodology in a future post.

The Los Angeles Dodgers currently sit atop the market-based rankings, despite having just the sixth best record in the league.

Ranking Starting Pitchers

In addition to the standard team rankings, I can also derive a ranking of starting pitchers. Each MLB team is more like 5 to 6 distinct teams, depending upon who takes the mound to start. According to the market, the best pitcher in the league is, and has been, the Dodgers' Clayton Kershaw. Teams facing Kershaw are expected to score 1.27 fewer runs on average than they would against a league average starter. And despite Kershaw's early season bout with mediocrity, the market didn't blink. Kershaw has remained the top ranked pitcher throughout the season according to my Vegas rankings (check the sparkline next to each starter in the ranking table for a snapshot of their season progression).

Sunday, June 14, 2015

How Much Actual Time is Left in the Game?

The Persistence of NBA Game Time
Close games in the NBA can be somewhat of a mixed blessing. As the drama increases, the pace tends to drag, as teams call timeouts to draw up set plays, players are fouled to stop the clock, and officials await decisions from the NBA's centralized review office in New Jersey. On average, the final minute of an NBA game takes over 5 minutes of real time to complete, and that number gets much larger if the game is close.

Here are some cheat sheets I put together that give you a sense as to how much real time is left in an NBA game, as a function of scoring margin and game time remaining. I had hoped to build a real time view of this into my live win probability graphs, but that will likely have to wait for the offseason.

Monday, June 8, 2015

New article at FiveThirtyEight: A Win Probability Guide to US vs. Australia

I have a new article up at FiveThirtyEight: A Win Probability Guide to US vs. Australia. Continuing my rather unhealthy obsession with in-game/in-match win probability, I took last year's work on men's soccer probability and applied it to the women's game. The US is a heavy favorite against Australia, and will be for all three of its so-called "Group of Death" matches. Win probabilities for mismatches evolve very differently than for evenly matched teams, and the post is a guide to how the US outlook evolves in the event they don't establish an early lead.

Data for women's sports is not exactly abundant, and soccer was no exception. The data for the model was culled from recent seasons of top league play. It took some effort, as well as some tedious cleaning, to compile the data, and in the end, my final dataset consisted of just 950 matches (compared to the 3,000+ matches that all but fell in my lap for men's soccer). So, the model built here may be somewhat more prone to noise (or overly smoothed) than models built from a more robust dataset.

Thursday, June 4, 2015

Profiling the Warriors' Free Throw Shooters

Last week's piece on the analysis of shooting arcs (which I pretentiously named "ShArc") received a lot of positive feedback, which was nice. As I indicated at the end of the post, there are about a hundred different directions I can take this, and any meaningful next steps will probably take place in the offseason.

But since it is the eve of the NBA Finals, I thought I'd share some more data, as well as share an approach for creating a simple visualization of each player's free throw accuracy.

The chart below represents some 30,000 free throws taken in the NBA this past season, with each circle representing a shot and where that shot crossed the hoop's threshold. Scatter plots can be a great visualization tool, but when you've got thousands upon thousands of data points, the human eye can only discern so much. To make sense of this chaotic scatter, I calculated a boundary ellipse around the data, defined as the one that encircles 75 percent of the data points within the smallest area (h/t to Rasmus Baath of R-Bloggers for the idea and code).
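For readers who want to experiment, here is a simplified stand-in for that calculation. It is not the minimum-area method referenced above: it takes the ellipse shape from the sample covariance and then scales it until 75 percent of the points fall inside, which is easier to compute and typically close for roughly bell-shaped scatter. The function names and the synthetic shot data are assumptions.

```python
import math
import random


def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance of a 2-D point from center."""
    sxx, sxy, syy = cov
    dx, dy = point[0] - center[0], point[1] - center[1]
    det = sxx * syy - sxy * sxy
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def coverage_ellipse(points, coverage=0.75):
    """Covariance-shaped ellipse scaled to contain `coverage` of the points.

    Returns (center, cov, threshold); a point p is inside the ellipse when
    mahalanobis_sq(p, center, cov) <= threshold.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    center, cov = (mx, my), (sxx, sxy, syy)
    d2 = sorted(mahalanobis_sq(p, center, cov) for p in points)
    threshold = d2[math.ceil(coverage * n) - 1]
    return center, cov, threshold


# Synthetic "shot crossing" locations, in feet from the center of the hoop.
rng = random.Random(0)
points = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.5)) for _ in range(1000)]
center, cov, threshold = coverage_ellipse(points)
inside = sum(mahalanobis_sq(p, center, cov) <= threshold for p in points)
print(inside / len(points))
```

A true minimum-area ellipse can be tighter when the scatter has outliers or a skewed shape, since the covariance estimate here is pulled toward those points.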

Tuesday, May 26, 2015

Introducing ShArc: Shot Arc Analysis

This post has been in the works for several months now, and is the result of a deep dive into the NBA's SportVU motion tracking data. It's a project I picked up when I had time, and put down when I got stuck (which was often). What follows is rough and unpolished, but for me, its creation was a lot of fun. It allowed me to take my first and most abiding academic obsession, physics, and combine it with my more recent obsessions of data mining and sports analytics. In short, I nerd-sniped myself.

The Mixed Blessing of Big Data

The NBA is in the midst of a data explosion that at times feels paralysis-inducing. We're spoiled for choice, with literally hundreds of thousands of data points, per game, begging to be dissected and analyzed. And yet, progress has been made. Kirk Goldsberry of Grantland, an early pioneer of shot location data, last year introduced a new SportVU-based stat called EPV, or Expected Point Value. EPV evaluates a team's expected points on a real time basis, as the possession evolves, accounting for shot clock, ball location, and the position of all ten players on the court. A new "microeconomics" for the NBA, as Goldsberry and his coauthors described it in their Sloan Analytics paper.

The NBA itself, in addition to funding and supporting the addition of the cameras, has also developed a whole host of new stats from this "big data" they helped create: Catch and Shoot, Defense at the Rim, Pull Up Shooting, among many, many others. There is also the work of the smart people at Nylon Calculus, making sense of the SportVU data with simple metrics built from common sense understanding of the game of basketball (as opposed to inscrutable mathematical black boxes).

In this post, I begin my own foray into the NBA's big data world, but with a different focus. While it's common knowledge that SportVU data provides location in two dimensions for every player on the court, what may not be widely appreciated is that the ball itself is tracked in all three dimensions. When developing EPV with his students, Kirk Goldsberry code named the work "XY Hoops". Consider the work below an "XYZ Hoops" project of sorts.