Saturday, October 10, 2015

PitchF/x for the NBA

One of sports analytics' first "big data" moments came with Major League Baseball's introduction of PitchF/x in 2006. With the help of cameras installed in each stadium, each pitch is now tracked at an obsessive level of detail. Where once we had just "ball" or "strike", pitches can now be classified according to pitch type, velocity, release point, and movement.

Strike zone position is also tracked for every pitch and can be converted into nifty-looking scatter plots, such as those found on Fangraphs. The chart below is the scatter plot for Max Scherzer's recent no-hitter, his second of the season.

Earlier this year, I rolled out my attempt at PitchF/x-style analysis for the NBA. "ShotF/x" would have been a snappy name, but probably a trademark violation. So, I called it ShArc instead, short for Shot Arc Analysis. In that initial post, I used a simple physics model to examine the finer details of player free throw shooting: shot angle, release height, peak height, and accuracy. The raw data came from SportVU, the NBA's version of the PitchF/x system. Because the data was noisy, I applied some basic physics to tease out the most likely trajectory from the scatter of data points.
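To give a flavor of what "applying basic physics to noisy data" can look like, here is a minimal sketch. It is not the actual ShArc code. It assumes a drag-free trajectory, metric units, and tracking samples of time, horizontal distance, and height; the function names `fit_line` and `fit_shot` are made up for illustration. Because gravity is known, the fit reduces to two straight-line regressions.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed units: meters, seconds)

def fit_line(ts, ys):
    """Ordinary least-squares fit of y = a + b*t; returns (a, b)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    b = sxy / sxx
    return y_mean - b * t_mean, b

def fit_shot(ts, xs, zs):
    """Fit a drag-free arc to noisy samples (hypothetical helper).

    ts: times since release, xs: horizontal distance, zs: ball height.
    """
    # Horizontal motion is a straight line in time: x = x0 + vx*t.
    x0, vx = fit_line(ts, xs)
    # With g known, z + 0.5*g*t^2 = z0 + vz*t is also linear in t.
    z0, vz = fit_line(ts, [z + 0.5 * G * t * t for t, z in zip(ts, zs)])
    return {
        "release_height": z0,
        "launch_angle_deg": math.degrees(math.atan2(vz, vx)),
        # Peak formula assumes the ball is released moving upward (vz > 0).
        "peak_height": z0 + vz * vz / (2 * G),
    }
```

Fitting the whole arc at once means no single noisy sample decides the release height or angle; every point contributes to the estimate.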

Over the summer, I refined the underlying physics model to account for the impact of air resistance, allowing for better plotting of shot accuracy (i.e. how "on target" the shot was relative to the center of the hoop). The new model has now been extended to include field goal shooting.
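For readers curious how air resistance changes the picture, the sketch below steps a shot forward in time with a quadratic drag term and reports where the ball drops through rim height. It is an illustration under stated assumptions, not the model used here: the motion is restricted to a 2D vertical plane, the drag constant `drag_k` (roughly 0.022 per meter, from an assumed drag coefficient of about 0.5 and typical ball size and mass) is a ballpark guess, and `hoop_plane_crossing` is a hypothetical name.

```python
import math

G = 9.81             # gravitational acceleration, m/s^2
HOOP_HEIGHT = 3.048  # 10 ft rim height, in meters

def hoop_plane_crossing(x0, z0, vx, vz, drag_k=0.022, dt=0.001):
    """Horizontal position where the ball descends through rim height.

    Uses quadratic drag (deceleration = drag_k * speed * velocity).
    drag_k is an assumed ballpark value. Returns None if the ball
    never comes down through the hoop plane.
    """
    x, z = x0, z0
    while z >= 0.0:
        speed = math.hypot(vx, vz)
        ax = -drag_k * speed * vx
        az = -G - drag_k * speed * vz
        # Update velocity first, then position (semi-implicit Euler step).
        nvx, nvz = vx + ax * dt, vz + az * dt
        nx, nz = x + nvx * dt, z + nvz * dt
        if nvz < 0 and z >= HOOP_HEIGHT > nz:
            # Linear interpolation within the step for the crossing point.
            frac = (z - HOOP_HEIGHT) / (z - nz)
            return x + frac * (nx - x)
        x, z, vx, vz = nx, nz, nvx, nvz
    return None
```

Setting `drag_k=0.0` recovers the plain parabola, so comparing the two crossing points for the same release gives a feel for how far short a drag-free model would expect the ball to land.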

Here is a scatter plot of all field goal shots from the 2014-15 season. To orient you to the graph, imagine you are looking down on the hoop (the red circle) from the rafters of the arena. Each grey circle represents where a shot crossed the plane of the hoop.

To organize the data, there are two ovals included on the graph. The larger dark grey one encircles 75% of all shots. The blue oval covers 75% of all made shots. If nothing else, the two ovals seem to validate the accuracy of the trajectory model. Shot location was modeled without knowledge of hoop location, so it is good to see that the shots seem nicely centered around the hoop.
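One common way to draw an oval like this is a covariance ellipse. The sketch below is only an assumption about how such an oval could be built, not a description of how the charts here were made. It treats the crossing points as roughly bivariate normal, in which case the ellipse scale for a given coverage comes from the chi-square distribution with two degrees of freedom, which has the closed form -2 ln(1 - coverage). The name `coverage_ellipse` is hypothetical.

```python
import math

def coverage_ellipse(points, coverage=0.75):
    """Ellipse expected to contain `coverage` of roughly normal 2D points.

    points: list of (x, y) crossing locations (needs at least 2 points).
    Returns (center, semi_major, semi_minor, angle_rad), where angle_rad
    is the major-axis orientation measured from the x axis.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix entries.
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Eigenvalues of the symmetric 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    lam_major = half_trace + spread
    lam_minor = max(half_trace - spread, 0.0)
    # Chi-square quantile with 2 degrees of freedom (normality assumed).
    scale = -2.0 * math.log(1.0 - coverage)
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), math.sqrt(scale * lam_major), math.sqrt(scale * lam_minor), angle
```

Running this once on all shots and once on made shots only would give two ovals comparable to the grey and blue ones described above, with the caveat that real shot scatter may not be normal, so the true coverage can drift from 75%.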

The above chart is for all shots. Here are all shots from the paint (note - for this chart and the ones that follow, I am using shot location categories from

Shots from the paint are taken from a closer distance, so it may be a surprise that the shot distribution is more dispersed, rather than less. Close range shots do tend to be better defended, which could lead to wider variation. In addition, the margin for error is greater for close range shots. The ball's approach velocity is much lower, allowing the ball to carom off the rim or backboard before falling through.

Once you get into mid-range shots and beyond, you're largely in swish-or-miss territory, as you can see from the mid-range chart below. The "75% of makes" circle is more tightly centered within the hoop.

Three point shots "above the break" (i.e. not from the corners) show a pattern similar to that of mid-range shots.

The corner three charts are interesting. Instead of being centered on the hoop, corner threes are "too strong" on average.

This shouldn't be too surprising as corner threes are from a shorter distance than your standard three pointer. This is also consistent with work from Grantland's Kirk Goldsberry, who found that corner threes are more likely to be rebounded on the weak side (i.e. the opposite side from which they were shot).

So, what's the significance of all this? I'm not quite sure yet, to be honest. I feel I'm still in foundation-building mode when it comes to shot analysis, and it's not like there is a well-established body of research on this topic from which to draw (although working with a largely blank slate is part of the fun).

The next step will be a deep dive into the shooting tendencies of specific players, for which the league average results above should make useful benchmarks. First on the list to analyze is the NBA's best pure shooter, and reigning MVP, Stephen Curry. The only drawback will be the venue where Steph plays half his games: Oakland's Oracle Arena. For whatever reason, data from Oracle Arena is by far the messiest of the 29 NBA arenas, with just 31% of shots being clean enough to model (compared to a league average of 63%). The venue with the best data? The Pepsi Center in Denver, with a 76% hit rate when it comes to shot trajectories.
