Sunday, June 14, 2015

How Much Actual Time is Left in the Game?

The Persistence of NBA Game Time
Close games in the NBA can be something of a mixed blessing. As the drama increases, the pace tends to drag: teams call timeouts to draw up set plays, players are fouled to stop the clock, and officials await decisions from the NBA's centralized review office in New Jersey. On average, the final minute of an NBA game takes over 5 minutes of real time to complete, and that number gets much larger if the game is close.

Here are some cheat sheets I put together that show roughly how much real time is left in an NBA game, as a function of scoring margin and game time remaining. I had hoped to build a real-time view of this into my live win probability graphs, but that will likely have to wait for the offseason.
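One way to build a cheat sheet like this is to bucket every game situation by scoring margin and game clock, then average the wall-clock time that actually elapsed until the final buzzer. The sketch below is a minimal illustration that assumes hypothetical play-by-play records carrying both the game clock and a real-world timestamp; the field names, the function name, and the bucket sizes (3 points, 60 seconds) are illustrative assumptions, not the ones behind the charts.

```python
from collections import defaultdict
from statistics import mean

def real_time_table(events, margin_bin=3, clock_bin=60):
    """Average real seconds left, keyed by (|margin| bucket, game-clock bucket).

    `events` is a list of hypothetical play-by-play dicts with keys:
      game_id, margin (home minus away), game_secs_left, wall_secs
    where wall_secs is a real-world timestamp in seconds.
    """
    # The real-world moment each game ended is taken as its latest timestamp.
    end_time = {}
    for e in events:
        g = e["game_id"]
        end_time[g] = max(end_time.get(g, e["wall_secs"]), e["wall_secs"])

    # Bucket each event and record how much real time remained after it.
    buckets = defaultdict(list)
    for e in events:
        key = (abs(e["margin"]) // margin_bin, e["game_secs_left"] // clock_bin)
        buckets[key].append(end_time[e["game_id"]] - e["wall_secs"])
    return {key: mean(vals) for key, vals in buckets.items()}

# Toy data (made up): a close game and a blowout, both in the final minute.
events = [
    {"game_id": 1, "margin": 2, "game_secs_left": 45, "wall_secs": 1000},
    {"game_id": 1, "margin": -1, "game_secs_left": 10, "wall_secs": 1200},
    {"game_id": 1, "margin": 1, "game_secs_left": 0, "wall_secs": 1330},
    {"game_id": 2, "margin": 15, "game_secs_left": 50, "wall_secs": 500},
    {"game_id": 2, "margin": 14, "game_secs_left": 0, "wall_secs": 560},
]
table = real_time_table(events)
print(table[(0, 0)])  # margin of 0-2 points, final 60 seconds of game time
```

With real data, each key such as (0, 0) (a 0 to 2 point game in its final minute) would average over thousands of situations rather than three.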

Monday, June 8, 2015

New article at FiveThirtyEight: A Win Probability Guide to US vs. Australia

I have a new article up at FiveThirtyEight: A Win Probability Guide to US vs. Australia. Continuing my rather unhealthy obsession with in-game/in-match win probability, I took last year's work on men's soccer win probability and applied it to the women's game. The US is a heavy favorite against Australia, and will be for all three of its so-called "Group of Death" matches. Win probabilities for mismatches evolve very differently than for evenly matched teams, and the post is a guide to how the US outlook evolves if the team doesn't establish an early lead.

Data for women's sports is not exactly abundant, and soccer was no exception. The data for the model was culled from recent seasons of top-league play. Compiling it took some effort, as well as some tedious cleaning, and in the end my final dataset consisted of just 950 matches (compared to the 3,000+ matches that all but fell into my lap for men's soccer). So the model built here may be somewhat more prone to noise (or overly smoothed) than models built from a more robust dataset.
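A rough way to see why the smaller sample is noisier: the standard error of an estimated probability shrinks with the square root of the number of observations. The back-of-the-envelope sketch below compares 950 matches with 3,000 (a lower bound for the men's dataset) for an outcome with an assumed 50% rate. It treats matches as independent trials and ignores the many correlated game states within each match, so it is an illustration of the scaling, not a measure of the model's actual error.

```python
import math

def std_error(p, n):
    """Standard error of a proportion p estimated from n independent trials."""
    return math.sqrt(p * (1 - p) / n)

womens = std_error(0.5, 950)
mens = std_error(0.5, 3000)
print(f"{womens:.4f} vs {mens:.4f} (ratio {womens / mens:.2f})")
# prints: 0.0162 vs 0.0091 (ratio 1.78)
```

In other words, under these assumptions the women's estimates carry roughly 1.8 times the sampling noise of the men's.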

Thursday, June 4, 2015

Profiling the Warriors' Free Throw Shooters

Last week's piece on the analysis of shooting arcs (which I pretentiously named "ShArc") received a lot of positive feedback, which was nice. As I indicated at the end of the post, there are about a hundred different directions I can take this, and any meaningful next steps will probably take place in the offseason.

But since it is the eve of the NBA Finals, I thought I'd share some more data, as well as share an approach for creating a simple visualization of each player's free throw accuracy.

The chart below represents some 30,000 free throws taken in the NBA this past season, with each circle representing a shot and the point where that shot crossed the hoop's threshold. Scatter plots can be a great visualization tool, but when you've got thousands upon thousands of data points, the human eye can only discern so much. To make sense of this chaotic scatter, I calculated a boundary ellipse around the data, defined as the smallest-area ellipse that encloses 75 percent of the data points (h/t to Rasmus Baath of R-Bloggers for the idea and code).
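The idea can be sketched in Python with the classic random-subset heuristic for a minimum volume ellipsoid: repeatedly fit an ellipse to three random points, inflate it just enough to contain 75 percent of the shots, and keep the smallest one found. This is an approximation rather than the exact optimum, and it is not the code credited above; the function name, the number of trials, and the toy data are illustrative assumptions.

```python
import math
import numpy as np

def min_area_ellipse(points, coverage=0.75, n_trials=2000, seed=0):
    """Approximate the smallest-area ellipse containing `coverage` of the points.

    Returns (center, shape, area), where the ellipse is
    {x : (x - center)^T inv(shape) (x - center) <= 1}.
    """
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    k = math.ceil(coverage * n)  # number of points the ellipse must contain
    best = None
    for _ in range(n_trials):
        subset = pts[rng.choice(n, size=3, replace=False)]
        center = subset.mean(axis=0)
        cov = np.cov(subset, rowvar=False)
        det = np.linalg.det(cov)
        if det <= 1e-12:  # (nearly) collinear sample: no usable ellipse
            continue
        diff = pts - center
        # Squared Mahalanobis distance of every point from this trial's center.
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        r2 = np.sort(d2)[k - 1]  # smallest inflation that covers k points
        area = math.pi * r2 * math.sqrt(det)
        if best is None or area < best[0]:
            best = (area, center, cov * r2)
    area, center, shape = best
    return center, shape, area

# Toy stand-in for free throw crossing points (made up, not SportVU data).
pts = np.random.default_rng(1).normal(size=(500, 2)) * np.array([1.0, 2.0])
center, shape, area = min_area_ellipse(pts)
```

More trials tighten the approximation at the cost of run time; for 30,000 shots one would likely subsample or vectorize the trials.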

Tuesday, May 26, 2015

Introducing ShArc: Shot Arc Analysis

This post has been in the works for several months now, and is the result of a deep dive into the NBA's SportVU motion tracking data. It's a project I picked up when I had time, and put down when I got stuck (which was often). What follows is rough and unpolished, but for me, its creation was a lot of fun. It allowed me to take my first and most abiding academic obsession, physics, and combine it with my more recent obsessions of data mining and sports analytics. In short, I nerd-sniped myself.

The Mixed Blessing of Big Data

The NBA is in the midst of a data explosion that at times feels paralysis-inducing. We're spoiled for choice, with literally hundreds of thousands of data points per game begging to be dissected and analyzed. And yet, progress has been made. Kirk Goldsberry of Grantland, an early pioneer of shot location data, last year introduced a new SportVU-based stat called EPV, or Expected Possession Value. EPV evaluates a team's expected points on a real-time basis as the possession evolves, accounting for the shot clock, ball location, and the positions of all ten players on the court: a new "microeconomics" for the NBA, as Goldsberry and his coauthors described it in their Sloan Analytics paper.

The NBA itself, in addition to funding and supporting the addition of the cameras, has also developed a whole host of new stats from this "big data" they helped create: Catch and Shoot, Defense at the Rim, Pull Up Shooting, among many, many others. There is also the work of the smart people at Nylon Calculus, making sense of the SportVU data with simple metrics built from common sense understanding of the game of basketball (as opposed to inscrutable mathematical black boxes).

In this post, I begin my own foray into the NBA's big data world, but with a different focus. While it's common knowledge that SportVU data provides location in two dimensions for every player on the court, what may not be widely appreciated is that the ball itself is tracked in all three dimensions. When developing EPV with his students, Kirk Goldsberry code named the work "XY Hoops". Consider the work below an "XYZ Hoops" project of sorts.
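As an illustration of what the third dimension makes possible: between release and the rim, a shot is (ignoring drag and spin) a projectile, so its height is a quadratic function of time. The sketch below fits that parabola to tracked ball positions and estimates the angle at which the ball descends through the 10-foot rim height. The input format (times in seconds, horizontal distance toward the hoop and height in feet), the function name, and the synthetic shot are all assumptions for illustration, not SportVU's actual schema or my ShArc code.

```python
import math
import numpy as np

RIM_HEIGHT_FT = 10.0  # regulation rim height

def entry_angle(t, s, z, rim_height=RIM_HEIGHT_FT):
    """Estimate the angle (degrees below horizontal) at which a shot descends through rim height.

    t: sample times (s); s: horizontal distance travelled toward the hoop (ft);
    z: ball height (ft). Assumes free flight with no drag or spin.
    """
    a, b, c = np.polyfit(t, z, 2)  # z(t) = a*t^2 + b*t + c, where a is about -g/2
    v_s = np.polyfit(t, s, 1)[0]   # horizontal speed (ft/s), assumed constant
    if a >= 0:
        raise ValueError("height samples do not describe a falling ball")
    disc = b * b - 4 * a * (c - rim_height)
    if disc < 0:
        raise ValueError("trajectory never reaches rim height")
    # At the later (descending) root of z(t) = rim_height, the vertical
    # velocity 2*a*t + b simplifies to -sqrt(disc).
    v_z = -math.sqrt(disc)
    return math.degrees(math.atan2(-v_z, v_s))

# Synthetic shot sampled at an assumed 25 Hz: released at 7 ft with
# 20 ft/s upward and 14 ft/s horizontal speed.
g = 32.17  # ft/s^2
t = np.arange(0, 1.05, 0.04)
z = 7 + 20 * t - 0.5 * g * t**2
s = 14 * t
print(round(entry_angle(t, s, z), 1))
```

For this made-up shot the ball enters at a little under 46 degrees. With real tracking data the samples are noisy, which is one reason to fit the whole arc rather than differentiate neighboring points.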

Friday, May 15, 2015

Heavy Favorites Don't Surrender Big Leads (Usually)

The Houston Rockets pulled off the second-most improbable comeback of this year's playoffs last night. Down by 19 with 2 minutes left in the third, the Rockets finished the game on a ridiculous 49-18 run to force a game seven in their conference semifinals series with the Clippers.

From 2000 to 2012, there were 624 games in which a team trailed by 19 with 2 minutes left in the third. In just 12 of those games (1.9%) did the trailing team go on to win. But that figure covers all games, including those in which a heavily favored team fights back from a steep deficit.

The Rockets were 8.5-point underdogs against the Clippers, and heavy underdogs rarely pull off what Houston did last night. Here is the raw data from the 2000-2012 NBA seasons (the raw data behind my win probability model).

Two minutes left in the third quarter:

                      All games              7.5- to 12-point underdogs
Trailing by     Games    Won     Pct         Games    Won     Pct
21                500      5    1.0%           178      0    0.0%
20                599     10    1.7%           203      1    0.5%
19                624     12    1.9%           193      0    0.0%
18                749     17    2.3%           212      3    1.4%
17                843     26    3.1%           242      2    0.8%

Out of 193 games, not a single underdog of 7.5 to 12 points came back from a 19-point deficit. These raw numbers are fairly consistent with the two win probability graphs for this game (by design):

  • The "50/50" version, which ignores team strength differences. The Rockets' low point was 2.3% in that version.
  • The "pre game" version which factors in the Vegas point spread, with a low point of 0.7% for the Rockets.
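Zero comebacks in 193 games doesn't mean the true chance is zero, only that it is small. One standard way to put a range on raw rates like those in the table is the Wilson score interval; the sketch below applies it at 95% confidence (z = 1.96) to two rows of the table. This interval is an illustration added here, not part of the win probability model.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win rate (95% confidence by default)."""
    p = wins / games
    denom = 1 + z**2 / games
    center = (p + z**2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

all_games = wilson_interval(12, 624)   # all games, trailing by 19
underdogs = wilson_interval(0, 193)    # 7.5 to 12 point underdogs, trailing by 19
print(all_games, underdogs)
```

For the underdog row, the upper bound works out to about 1.95%, so the 0-for-193 record is consistent with a small but nonzero comeback chance.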

Thursday, May 14, 2015

The Clutch Shooting of Paul Pierce

Paul Pierce currently leads all NBA players in win probability added for the 2015 playoffs. Win probability added, as I've defined it, measures the impact a player's shots, free throw attempts, and turnovers have on his team's chances of victory. It deliberately gives more weight to "clutch" shots that occur during crucial game situations, such as Pierce's go-ahead (temporarily so) three-point shot last night against the Hawks.

That shot, which put the Wizards up 1 with 8 seconds left, increased their win probability by 50 percentage points, from 18% to 68%. Had Pierce missed, the Wizards' win probability would have dropped to 8%, a total potential swing of 60 percentage points riding on Pierce's shot. As I did earlier this week with Derrick Rose, I can measure Pierce's shooting performance during similar high-pressure, clutch situations.
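The arithmetic behind those figures is simple: win probability added is the change in win probability the shot produces, and the swing is the gap between the make and miss outcomes. A minimal sketch, using the numbers from Pierce's shot in percentage points (the function name is illustrative):

```python
def shot_stakes(wp_before, wp_if_make, wp_if_miss):
    """Return (WPA on a make, WPA on a miss, total swing), in percentage points."""
    return wp_if_make - wp_before, wp_if_miss - wp_before, wp_if_make - wp_if_miss

print(shot_stakes(18, 68, 8))  # (50, -10, 60)
```

Because a make and a miss move the needle by different amounts, the swing (60) is larger than either the gain (50) or the loss (10) alone.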

Saturday, May 9, 2015

The Clutch Shooting of Derrick Rose

The bank was open late for Derrick Rose last night. The Bulls took a two-games-to-one lead over the Cavaliers, thanks to his buzzer-beating three-pointer. "I don't mean to sound cocky," Rose said, "but that's a shot you want to take if you are a player in my position."

Clutch shooting isn't easy, and it gets progressively harder as the stakes increase. For Rose's bank shot last night, roughly 50% of the game's win probability hung in the balance (make: win, miss: overtime). Last night's shot fell (with a little help from the United Center glass), but how has Rose fared in similar clutch situations, and how does that compare to league averages? The table below summarizes Derrick Rose's three-point field goal percentage as a function of win probability "swing" (i.e., the stakes).
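A table like this can be built by grouping shots by their win probability swing and computing a make rate within each group. The sketch assumes a hypothetical list of (swing, made) pairs with swing in percentage points; the function name and bucket edges are illustrative assumptions, not the ones used in the table.

```python
from bisect import bisect_right

def fg_pct_by_swing(shots, edges=(5, 10, 20, 40)):
    """Group (swing, made) shots into swing buckets; return {label: (makes, attempts, pct)}."""
    labels = (
        [f"<{edges[0]}"]
        + [f"{lo}-{hi}" for lo, hi in zip(edges, edges[1:])]
        + [f"{edges[-1]}+"]
    )
    tallies = {label: [0, 0] for label in labels}
    for swing, made in shots:
        label = labels[bisect_right(edges, swing)]  # e.g. swing 12 falls in "10-20"
        tallies[label][0] += int(made)
        tallies[label][1] += 1
    # pct is None for buckets with no attempts.
    return {label: (m, a, m / a if a else None) for label, (m, a) in tallies.items()}

# Made-up shots: (swing in percentage points, made?)
shots = [(2, True), (3, False), (12, True), (12, False), (50, True)]
table = fg_pct_by_swing(shots)
```

Each bucket includes its lower edge and excludes its upper edge, so a 10-point swing lands in "10-20". Comparing a player's bucket percentages with league-wide ones built the same way gives the clutch comparison described above.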