NBA Player Shooting Motions: A Data Dump

Over 5 years ago, I published my first research on NBA motion tracking data. Where most analysis and research at that time (and still today) focused on the locations of the 10 players in two dimensions, I carved out my own niche by focusing instead on the motion of the ball itself in all three dimensions. You can find the highlights of my research on the main menu of the site, under the "Shot Tracking" heading.

One of the more rewarding and fun projects was diving deep into each player's specific shooting motion - their "windup", so to speak. And while I've struggled to translate that work into meaningful and useful basketball analysis, it did allow me to create nifty animated visuals of player shooting motions, like these:

These graphics received a lot of engagement and feedback on Twitter, with lots of interesting ideas as to where to take this next - ideas I just don't have time for these days. 

So, I decided to share some of the data here to let others take a crack at this. This is not the raw SportVU data. That data doesn't belong to me and thus is not mine to share. But what I am sharing does have detailed metrics, and detailed path data, on nearly 200 NBA three point shooters.

The dataset has two main CSV files:

  • player_metrics: A player-level dataset with a variety of metrics characterizing each player's three point shot motion
  • path_detail: This file contains detailed trajectory data (in all three dimensions) for each player. This is the "typical" shooting motion, derived from all of that player's available three point shots (minimum of 200).
January 30 Update: I have added separate sets of files with shot trajectories trained on the following subsets of shots:
  • Open shots: nearest defender is 4 or more feet away at time of shot
  • Defended shots: nearest defender within 4 feet at time of shot
  • Made shots
  • Missed shots

Here's the data dictionary for the player_metrics file:
  • pid - official NBA player id number
  • fnm - player first name
  • lnm - player last name
  • hght - player height (inches)
  • n - number of three point shots in the dataset
  • bx, by, bz - x, y, and z coordinates of the beginning of the player's shot motion (in feet). x and y are relative to the point of release (roughly). x coordinate is left/right from the perspective of the player, y coordinate measures distance towards/away from the basket
  • rt - release time; time (in seconds) from the beginning of the shot motion to point of release
  • rx, ry, rz - x, y, and z coordinates of the point of release (in feet). x coordinate is left/right from the perspective of the player, y coordinate measures distance towards/away from the basket.
  • rv - release velocity; speed of the ball (feet/second) at point of release
  • rvx, rvy, rvz - x, y, and z components of release velocity
  • mnv - minimum velocity of the ball during the player's shooting motion (feet/second)
  • mnvt - time (in seconds) when minimum velocity is reached (e.g. 0.307 means that minimum velocity is reached 0.307 seconds after the shooting motion begins)
  • mnvx, mnvy, mnvz - x, y, and z components of the minimum velocity of the ball (mnv)
  • mxv - maximum velocity of the ball during the player's shooting motion (feet/second)
  • mxvt - time (in seconds) when maximum velocity is reached (e.g. 0.115 means that maximum velocity is reached 0.115 seconds after the shooting motion begins)
  • mxvx, mxvy, mxvz - x, y, and z components of the maximum velocity of the ball (mxv)
  • ta1t - time (in seconds) when the ball first switches direction from going towards the hoop to away from the hoop. This typically happens early in the shooting motion as the player brings the ball up from around waist level.
  • ta1x, ta1y, ta1z - x, y, and z coordinates at the point when the ball first switches direction from going towards the hoop to away from the hoop.
  • ta2t, ta2x, ta2y, ta2z - same as the “ta1” fields defined above, but specifies the time and position coordinates of the second time the ball switches from going towards the hoop to away from the hoop. Field shows “NA” if there is no second point. Note: I included this for completeness, but it appears that this point does not exist for any of the shooters I analyzed.
  • at1t - time (in seconds) when the ball first switches direction from going away from the hoop to towards the hoop. This typically happens as the player has brought the ball back above their head and starts the process of launching the ball towards the basket.
  • at1x, at1y, at1z - x, y, and z coordinates at the point when the ball first switches direction from going away from the hoop to towards the hoop.
  • at2t, at2x, at2y, at2z - same as the “at1” fields defined above, but specifies the time and position coordinates of the second time the ball switches from going away from the hoop to towards the hoop. Field shows “NA” if there is no second point. Note: I included this for completeness, but it appears that this point does not exist for any of the shooters I analyzed.
  • lr1t - time (in seconds) when the ball first switches direction from left to right (from the perspective of the shooter). Most players have a shooting motion that is not strictly straight up and down, but involves some lateral movement.
  • lr1x, lr1y, lr1z - x, y, and z coordinates at the point when the ball first switches direction from moving left to moving right
  • lr2t, lr2x, lr2y, lr2z - same as the “lr1” fields defined above, but for the second time the ball switches direction from left to right
  • rl1t - time (in seconds) when the ball first switches direction from right to left (from the perspective of the shooter). Most players have a shooting motion that is not strictly straight up and down, but involves some lateral movement.
  • rl1x, rl1y, rl1z - x, y, and z coordinates at the point when the ball first switches direction from moving right to moving left
  • rl2t, rl2x, rl2y, rl2z - same as the “rl1” fields defined above, but for the second time the ball switches direction from right to left
  • pl - path length of the player's shooting motion (in feet): the length of the path the ball traces from the beginning of the shooting motion to release.
  • spl - length (in feet) of the “straight line” path between the beginning of the shooting motion and the point of release
  • plr - ratio of the path length (pl) to the “straight line” path length (spl). A measure of how direct or convoluted a player’s shooting motion is (i.e. how directly they get the ball from point A to point B). A perfectly straight path gives a ratio of 1, and more roundabout motions give larger values.
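To make the relationship between pl, spl, and plr concrete, here is a minimal Python sketch that computes all three from a list of (x, y, z) ball positions. The positions must be ordered in time and run from the start of the shooting motion to release. The function name path_metrics is hypothetical. Approximating the path by straight segments between consecutive samples is an assumption, and the published values may have been computed differently.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in feet


def path_metrics(points: Sequence[Point]) -> Tuple[float, float, float]:
    """Return (pl, spl, plr) for a ball path sampled from motion start to release.

    Hypothetical helper: the path is approximated by straight segments
    between consecutive samples.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    # pl: sum of the straight-line distances between consecutive samples
    pl = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    # spl: direct distance from the first sample (motion start) to the last (release)
    spl = math.dist(points[0], points[-1])
    if spl == 0:
        raise ValueError("start and release coincide, so plr is undefined")
    # plr: 1.0 for a perfectly straight path, larger for a more roundabout one
    return pl, spl, pl / spl


# A path that rises 3 feet, then moves 4 feet along y.
pl, spl, plr = path_metrics([(0.0, 0.0, 0.0), (0.0, 0.0, 3.0), (0.0, 4.0, 3.0)])
```

In that example the ball travels 3 + 4 = 7 feet, but the start and end points are only 5 feet apart, so plr is 1.4.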
And here's the data dictionary for the path_detail file:
  • pid - official NBA player id number
  • fnm - player first name
  • lnm - player last name
  • hght - player height (inches)
  • t - time (in seconds) elapsed from the beginning of the shooting motion
  • ddst - modeled shot distance (25 feet). The path was trained on three point shots between 23 and 27 feet away from the basket. The loess model used shot distance as an independent variable (to account for differences in shot motion due to shot distance). The path here is for a predicted distance of 25 feet.
  • cx, cy, cz - x, y, and z coordinates of the ball (in feet). x and y are relative to the point of release (roughly). x coordinate is left/right from the perspective of the player, y coordinate measures distance towards/away from the basket.
  • cvx, cvy, cvz - x, y, and z components of the ball's velocity (in feet per second)
  • cv - speed of the ball (in feet per second)
  • cax, cay, caz - x, y, and z components of the ball's acceleration (in feet per second squared)
  • ca - magnitude of the ball's acceleration (in feet per second squared)
  • rt - release time; time (in seconds) from the beginning of the shot motion to point of release
  • dx, dy, dz - change in x, y, and z position (in feet), relative to prior row
  • d - magnitude of the change in position (in feet), relative to prior row (i.e. the distance between consecutive rows)
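As an illustration of how the two files relate, the sketch below recomputes a few player_metrics-style values from one player's path_detail rows. It covers mnv, mnvt, mxv, mxvt, and an approximation of pl, using only Python's standard library. It assumes the following:

- the rows belong to a single pid and are sorted by t;
- missing values are written as "NA".

The function name summarize_motion and the sample numbers are made up for illustration. Because the path is sampled at discrete times, the results will only approximate the published metrics.

```python
import csv
import io
from typing import Dict, Iterable, List


def summarize_motion(rows: Iterable[Dict[str, str]]) -> Dict[str, float]:
    """Speed extrema and approximate path length for one player's path, up to release."""
    kept: List[Dict[str, float]] = []
    for row in rows:
        t, rt = float(row["t"]), float(row["rt"])
        if t < 0:
            continue  # defensive: only keep rows from the start of the motion onward
        if t > rt:
            break  # rows are assumed sorted by t, so the rest are post-release
        d = row["d"]
        kept.append({
            "t": t,
            "cv": float(row["cv"]),
            "d": 0.0 if d in ("", "NA") else float(d),
        })
    if not kept:
        raise ValueError("no rows between the start of the motion and release")
    slowest = min(kept, key=lambda r: r["cv"])
    fastest = max(kept, key=lambda r: r["cv"])
    return {
        "mnv": slowest["cv"], "mnvt": slowest["t"],
        "mxv": fastest["cv"], "mxvt": fastest["t"],
        # d is the step from the prior row, so skip the first kept row's step
        # (it would reach back before the window) and sum the rest.
        "pl_approx": sum(r["d"] for r in kept[1:]),
    }


# Made-up rows (not real player data) with just the columns this sketch reads.
SAMPLE = """t,cv,rt,d
0.0,10,0.2,NA
0.1,5,0.2,0.5
0.2,20,0.2,1.5
0.3,25,0.2,2.0
"""
summary = summarize_motion(csv.DictReader(io.StringIO(SAMPLE)))
```

With the real file, you would first group rows by pid (for example with csv.DictReader over the whole file) and pass each player's rows in time order.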
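The direction-change fields in player_metrics (ta1t, at1t, lr1t, rl1t and their second-occurrence counterparts) can be approximated from path_detail by looking for sign changes in a velocity component. The sketch below finds the n-th sign change in a chosen direction and linearly interpolates the crossing time between samples. The sign conventions are assumptions, not something documented above:

- it treats positive cvy as movement toward the basket;
- you would need to check which sign of cvx corresponds to "right" from the shooter's perspective.

The function name nth_switch is hypothetical.

```python
from typing import Optional, Sequence


def nth_switch(ts: Sequence[float], vs: Sequence[float],
               positive_to_negative: bool = True,
               occurrence: int = 1) -> Optional[float]:
    """Time of the n-th sign change of a velocity component in one direction.

    Returns None when there is no such change (the dataset reports "NA").
    """
    count = 0
    for i in range(1, len(vs)):
        v0, v1 = vs[i - 1], vs[i]
        if positive_to_negative:
            crossed = v0 > 0 >= v1
        else:
            crossed = v0 < 0 <= v1
        if crossed:
            count += 1
            if count == occurrence:
                t0, t1 = ts[i - 1], ts[i]
                # Linear interpolation: where the line through (t0, v0), (t1, v1) hits zero.
                return t0 + (t1 - t0) * v0 / (v0 - v1)
    return None


# Illustrative values only. Under the assumption that positive cvy means
# "toward the basket", a toward-to-away switch (ta) is a positive-to-negative change.
t = [0.0, 0.1, 0.2, 0.3, 0.4]
cvy = [2.0, 1.0, -1.0, -2.0, 3.0]
ta1t = nth_switch(t, cvy, positive_to_negative=True)   # about 0.15
at1t = nth_switch(t, cvy, positive_to_negative=False)  # about 0.34
ta2t = nth_switch(t, cvy, positive_to_negative=True, occurrence=2)  # None
```

In practice you would first drop rows with t greater than rt, so that only the pre-release motion is searched.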
And a few more technical details:
  • The data is a bit stale at this point, as it is derived from raw tracking data from 2013-2016 (the NBA took the data down publicly on January 23, 2016, a date firmly etched in my mind)
  • The shot trajectories were modeled using standard LOESS techniques (the locfit package in R, specifically). The x, y, and z trajectories were each modeled separately as a function of time, with a "span" of 0.05
  • The trickiest part is figuring out when the shooting motion begins. There is no perfect answer here, but the definition I am using is: "When the horizontal velocity of the ball towards the basket reaches a maximum". The idea is that this is the point when the player has decided to start bringing the ball back towards their body.
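For readers without R, here is a rough Python stand-in for the smoothing step. It is a from-scratch local linear LOESS with tricube weights, where span is the fraction of points used in each local fit. This is not a port of locfit, whose defaults and options may differ. It also smooths a single coordinate against time only, whereas the model described above used shot distance as an additional predictor. The function name loess_at is hypothetical. To mirror the approach, you would run it separately for x, y, and z against t.

```python
import math
from typing import Sequence


def loess_at(xs: Sequence[float], ys: Sequence[float], x0: float,
             span: float = 0.05) -> float:
    """Local linear LOESS estimate of y at x0 using tricube weights."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    # Neighbourhood: the k nearest points, where k is span * n (at least 2).
    k = min(n, max(2, math.ceil(span * n)))
    h = sorted(abs(x - x0) for x in xs)[k - 1]
    near = [(x, y) for x, y in zip(xs, ys) if abs(x - x0) <= h]
    if h == 0:
        # All k nearest points sit exactly at x0: just average them.
        return sum(y for _, y in near) / len(near)
    # Weighted least-squares line in centred coordinates u = x - x0,
    # so the fitted intercept is the estimate at x0.
    s = su = sy = suu = suy = 0.0
    for x, y in near:
        u = x - x0
        w = (1.0 - (abs(u) / h) ** 3) ** 3  # tricube weight, zero at the edge
        s += w
        su += w * u
        sy += w * y
        suu += w * u * u
        suy += w * u * y
    if s == 0.0:
        # Every neighbour sits exactly on the edge: fall back to a plain mean.
        return sum(y for _, y in near) / len(near)
    denom = s * suu - su * su
    if denom <= 1e-12 * s * suu:
        return sy / s  # no spread in u: fall back to the weighted mean
    return (sy * suu - su * suy) / denom
```

To smooth a whole trajectory, you would evaluate it on a time grid, for example `[loess_at(times, zs, ti) for ti in grid]`, where times, zs, and grid are hypothetical lists of raw sample times, raw z positions, and output times. Note that a span of 0.05 only makes sense with many samples, since each local fit uses about 5% of them.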
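The start-of-motion rule can also be sketched, although the rule as stated leaves open which maximum to use: the ball is also moving quickly toward the basket at release. The sketch below is one hypothetical reading, not necessarily the exact procedure used here. Starting from the release frame, it:

1. steps back through the final push toward the basket;
2. steps back through the stretch where the ball is brought back toward the body;
3. returns the frame of peak toward-basket velocity in the toward-basket stretch just before that.

It assumes one shot's frames in time order, and that a positive horizontal velocity component means movement toward the basket. motion_start_index is a hypothetical name.

```python
from typing import Optional, Sequence


def motion_start_index(vy: Sequence[float], release_idx: int) -> Optional[int]:
    """Hypothetical reading of the start rule for one shot's frames.

    vy: horizontal velocity toward the basket per frame (positive = toward, assumed).
    release_idx: index of the release frame.
    Returns the index of the start of the motion, or None if none is found.
    """
    i = release_idx
    # 1) Step back through the final push toward the basket that ends at release.
    while i >= 0 and vy[i] > 0:
        i -= 1
    # 2) Step back through the stretch where the ball is not moving toward the
    #    basket (the player bringing it back toward their body).
    while i >= 0 and vy[i] <= 0:
        i -= 1
    if i < 0:
        return None  # no earlier toward-basket movement in the window
    # 3) In the toward-basket stretch before that, find the peak velocity.
    best = i
    while i >= 0 and vy[i] > 0:
        if vy[i] > vy[best]:
            best = i
        i -= 1
    return best


# Illustrative frame velocities (ft/s), not real data; release is the last frame.
vy = [1.0, 3.0, 5.0, 4.0, 2.0, -1.0, -3.0, -2.0, 1.0, 6.0, 9.0]
start = motion_start_index(vy, release_idx=len(vy) - 1)  # 2 (where vy peaks at 5.0)
```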

Next Steps

I plan on sharing some code examples (likely as a gist) in the near future for how this data can be used and analyzed. And if there is enough interest, I may also share more detailed data (e.g. splits between open/defended, specific shot data, etc.).

My hope is that by sharing this data, we can gain a better understanding of what it is specifically that makes a good three point shooter.
