Tuesday, September 6, 2016

20 Years of WNBA Win Probability Graphs

The NBA season is still some two months away, but the WNBA season is in full swing. Regular season play has now resumed following the Olympics shutdown, and the playoffs are just a few weeks away.

One of my offseason goals this year was to extend many of the tools and analysis I had developed for the men's game to the WNBA. I started small last month with the addition of the WNBA to my suite of Vegas team rankings. Today, I have a much more substantial update to announce:

The following tools and stats are available for the entire 20 year history of the WNBA:


Building a Win Probability Model for the WNBA

The nice thing about this project is that I didn't have to start from scratch. WNBA data is structured very similarly to NBA data, so in a lot of ways, I could just point my existing code and methodology to a new dataset. The WNBA win probability model was built using the same approach as the NBA win probability model. It is based on play by play data for over 2,000 WNBA games, going back to the 2007 season.

Win probability is calculated as a function of game time, scoring margin, game state (e.g. has possession, shooting two free throws, after missed shot, etc.), and the pre-game Vegas point spread. I used R's locfit package to build a locally weighted logistic regression model, and then optimized the smoothing parameters via cross validation.
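
To make the idea of a locally weighted logistic regression a bit more concrete, here is a minimal Python sketch in one dimension (scoring margin only). It is not the locfit model described above. The toy data, the tricube kernel, the bandwidth of 6 points, and the function names are all assumptions made for illustration, and the real model also conditions on game time, game state, and the point spread.

```python
import math


def tricube(u):
    """Tricube kernel: weight falls to zero once |u| reaches 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def sigmoid(z):
    # Clamp the input to avoid overflow in math.exp for extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def local_win_prob(x0, margins, wins, bandwidth, iters=25):
    """Fit a kernel-weighted logistic regression centered at margin x0.

    Games with margins near x0 get the most weight. Returns the fitted
    win probability at x0.
    """
    weights = [tricube((x - x0) / bandwidth) for x in margins]
    b0, b1 = 0.0, 0.0  # intercept and slope in (x - x0)
    for _ in range(iters):
        # Accumulate the weighted gradient and Hessian for a Newton step.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in zip(margins, wins, weights):
            if w == 0.0:
                continue
            d = x - x0
            p = sigmoid(b0 + b1 * d)
            g0 += w * (y - p)
            g1 += w * (y - p) * d
            v = w * p * (1 - p)
            h00 += v
            h01 += v * d
            h11 += v * d * d
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return sigmoid(b0)


# Toy data (made up, not WNBA data): 10 games at each margin from -10 to +10.
margins, wins = [], []
for m in range(-10, 11):
    n_wins = round(10 * sigmoid(0.3 * m))
    for i in range(10):
        margins.append(m)
        wins.append(1 if i < n_wins else 0)

for m in (-5, 0, 5):
    print(m, round(local_win_prob(m, margins, wins, bandwidth=6.0), 3))
```

The bandwidth plays the role of the smoothing parameter: a wider window gives smoother but less responsive probabilities, which is the trade-off that cross validation is meant to settle.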

Here are some examples showing how the WNBA model compares to the NBA model:
One pattern that emerges is that WNBA leads seem slightly "safer" for the same score differential and game clock. However, this is to be expected because of differences in game length. The NBA plays 12 minute quarters, while the WNBA's quarters are 10 minutes. So, with 5:00 on the clock in the 2nd quarter, there is more game time left (and thus more possessions left) in an NBA contest, as compared to the WNBA. And the more time remaining, the less safe any given lead is. The differential shrinks (but does not disappear) for the 4th quarter, when game time remaining is equivalent.
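
The game length arithmetic behind that explanation can be sketched in a few lines of Python. The function name is my own for this illustration, and it assumes a regulation game with no overtime.

```python
def minutes_remaining(quarter, clock_minutes, quarter_length):
    """Regulation minutes left, given the quarter and the game clock."""
    full_quarters_left = 4 - quarter
    return full_quarters_left * quarter_length + clock_minutes


# 5:00 on the clock in the 2nd quarter
nba_q2 = minutes_remaining(2, 5.0, 12)
wnba_q2 = minutes_remaining(2, 5.0, 10)
print(nba_q2, wnba_q2)

# 5:00 on the clock in the 4th quarter: no full quarters left in either league
nba_q4 = minutes_remaining(4, 5.0, 12)
wnba_q4 = minutes_remaining(4, 5.0, 10)
print(nba_q4, wnba_q4)
```

In the 2nd quarter case the NBA game has 29 minutes left against 25 for the WNBA, while in the 4th quarter both have exactly 5.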

Also note that the WNBA game is played at a very similar pace to the NBA game. Average possession length for WNBA games this season is 15.2 seconds, and has been in the 15.0-15.5 seconds range since 2006. Prior to 2006, the WNBA used a 30 second shot clock, and average possession length was about 1.7 seconds longer. NBA possession length was 14.9 seconds this past season, but has varied from 15.1 to 15.6 seconds per possession over the past 12 years.
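
As a rough illustration of what those possession lengths imply, here is a small Python sketch that converts average possession length into possessions per regulation game. It assumes no overtime and that every second of game time belongs to some possession, which is a simplification.

```python
def possessions_per_game(quarter_minutes, avg_possession_seconds):
    """Approximate combined possessions (both teams) in a regulation game."""
    game_seconds = 4 * quarter_minutes * 60
    return game_seconds / avg_possession_seconds


wnba_total = possessions_per_game(10, 15.2)  # WNBA, this season
nba_total = possessions_per_game(12, 14.9)   # NBA, this past season

print(round(wnba_total, 1), round(nba_total, 1))
```

Possession length is nearly identical, so the gap in total possessions comes almost entirely from the extra eight minutes of game time in the NBA.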

The WNBA probabilities in the charts above are a bit "bumpier" than I would like. I would expect that if I had additional data to feed into the regression, those bumps would smooth themselves out as they did with the NBA. There is far more data available for men's sports than for women's. The NBA model was built from ~16,000 games, whereas the WNBA model is relying on just over 2,000 games.

Graphs and Box Scores

The win probability graphs and box scores will update daily for the remainder of the regular season and playoffs. This should look mostly familiar to those who know my NBA win probability graphs. The first tab has the game's graph, vital statistics (excitement index, comeback factor, game MVP and LVP) as well as a win probability added box score for each player. The sparkline next to each player's name shows how their win probability "balance" evolved minute by minute over the course of the game.

The "fourFactors" tab breaks down team win probability added by the four factors of basketball: shooting, free throws, rebounds, and turnovers. Here is how the four factors broke down for the final game of the 2015 season, when the Minnesota Lynx defeated the Indiana Fever for the 2015 WNBA title:


The sum of these four components should net out to 50% for the winning team (give or take a few points, due to plays that cannot be assigned to one of the four factors). Each team starts out with 50% win probability and the winning team must effectively "withdraw" 50% from their opponent's balance. This chart shows where those withdrawals come from.

The Lynx won this game with a relatively balanced attack, holding a win probability advantage in all four factors. A typical victory tends to be more heavily weighted to field goal shooting. Here are the average win probability added advantages for the winning team in a WNBA game:
  • Field Goals: +34.1%
  • Free Throws: +4.2%
  • Rebounds: +4.8%
  • Turnovers: +5.7%
  • Other: +1.2%
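
As a quick check on the averages above, this short Python snippet confirms that the components add up to the 50% the winning team needs, and shows each component's share of that total. The dictionary and variable names are just for this illustration.

```python
# Average win probability added advantage for the winning team, in
# percentage points (values from the list above).
avg_wpa_advantage = {
    "Field Goals": 34.1,
    "Free Throws": 4.2,
    "Rebounds": 4.8,
    "Turnovers": 5.7,
    "Other": 1.2,
}

total = sum(avg_wpa_advantage.values())
print("Total:", round(total, 1))

# Share of the winning margin explained by each component.
shares = {name: value / total for name, value in avg_wpa_advantage.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Field goal shooting accounts for roughly two thirds of the total on average.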

The "pace" tab summarizes the game's pace of play. With complete play by play data, we no longer have to rely on indirect measures of pace that use box score stats. We can simply count each possession as it happens, allowing for a much more detailed look at pace, with breakdowns by each team's offense as well as by quarter.
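
Here is a minimal Python sketch of counting possessions directly from play by play. The event format (a list of quarter and team-with-the-ball pairs) is a hypothetical simplification, not the actual data feed. It treats consecutive events by the same team in the same quarter as one possession, so an offensive rebound does not start a new one.

```python
from collections import Counter


def count_possessions(events):
    """Count possessions by (team, quarter).

    `events` is a chronological list of (quarter, team_with_ball) tuples.
    A new possession starts whenever the team with the ball changes or a
    new quarter begins.
    """
    counts = Counter()
    previous = None
    for quarter, team in events:
        if (quarter, team) != previous:
            counts[(team, quarter)] += 1
            previous = (quarter, team)
    return counts


# Made-up events for illustration only.
events = [
    (1, "MIN"), (1, "MIN"),  # missed shot, offensive rebound: one possession
    (1, "IND"),
    (1, "MIN"),
    (2, "MIN"),              # new quarter, new possession
    (2, "IND"),
]

possessions = count_possessions(events)
for (team, quarter), n in sorted(possessions.items()):
    print(team, "Q%d" % quarter, n)
```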

The final tab, "clutch", is somewhat experimental, and currently not available for my NBA charts. The goal here is to build a clutch shooting box score for each game, by breaking down shots into four categories according to how important they were in the context of the game. Those categories are garbage time, normal basketball, clutch, and clutch "squared". See here for additional background on the classifications. For the Lynx-Fever finals game mentioned above, the game was a blowout and effectively over by the 4th quarter. The "clutch" tab reflects this, as no shots were classified as clutch. In fact, about 35% of shots were classified as "garbage time" shots.

Top Games Finder

With nearly 4,000 games over the course of 20 seasons, it helps to have a tool to sort things out. The Top Games Finder allows you to filter for particularly exceptional games. The most exciting game in the WNBA's history, as measured by the sum of its win probability swings, was a triple overtime contest in 2008 between the Liberty and Fever. 

Games can also be sorted by comeback factor, which is based on the winning team's odds at their lowest point. The 2nd biggest comeback of all time occurred just a couple months ago, when the Connecticut Sun overcame an 11 point deficit in the final two minutes to force overtime and defeat the Minnesota Lynx.
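
Both measures can be sketched from a game's win probability series. In the Python below, the series is made up, the function names are my own, and converting the lowest point into "odds against" is one plausible reading of the comeback factor rather than an exact formula.

```python
def excitement_index(win_probs):
    """Sum of the absolute win probability swings over the game."""
    return sum(abs(b - a) for a, b in zip(win_probs, win_probs[1:]))


def lowest_point(winner_win_probs):
    """Lowest win probability the eventual winner had at any point."""
    return min(winner_win_probs)


def odds_against(p):
    """Odds against an event with probability p (assumed conversion)."""
    return (1 - p) / p


# Made-up win probability series for the eventual winner.
winner_wp = [0.50, 0.30, 0.10, 0.40, 0.80, 1.00]

excitement = excitement_index(winner_wp)
low = lowest_point(winner_wp)
print(round(excitement, 2), low, round(odds_against(low), 1))
```

A game that swings back and forth piles up excitement even if the final margin is wide, while the comeback measure depends only on the single worst moment for the winner.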

Filters are also available for the Tension Index, most impressive game performance (MVP), and most catastrophic game performance (LVP).

Player Win Probability Added

I define player win probability added as the sum of the win probability contributions due to a player's field goals (made or missed), free throws (made or missed), and turnovers. It is an imperfect and limited definition, but it's clean, and roughly corresponds to how we calculate usage. I also have what I call "kitchen sink" win probability added, which simply sums the win probability contributions of any and all player-attributable stats (basically adding rebounds, assists, steals, and blocks). You can find all of these stats on the Player Win Probability Added page.
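
The two definitions differ only in which plays get summed, which the following Python sketch tries to show. The play format (player, stat type, win probability change), the stat labels, and the numbers are hypothetical and not taken from the actual data.

```python
from collections import defaultdict

# Stat types counted under each definition (labels are hypothetical).
BASIC_STATS = {"field_goal", "free_throw", "turnover"}
KITCHEN_SINK_STATS = BASIC_STATS | {"rebound", "assist", "steal", "block"}


def player_wpa(plays, stat_types):
    """Sum win probability changes per player over the given stat types."""
    totals = defaultdict(float)
    for player, stat, wp_change in plays:
        if stat in stat_types:
            totals[player] += wp_change
    return dict(totals)


# Made-up plays: (player, stat type, change in team win probability).
plays = [
    ("A", "field_goal", 0.05),
    ("A", "turnover", -0.03),
    ("A", "rebound", 0.01),
    ("B", "free_throw", 0.02),
    ("B", "field_goal", -0.04),  # a missed shot costs win probability
    ("B", "steal", 0.03),
]

basic = player_wpa(plays, BASIC_STATS)
kitchen_sink = player_wpa(plays, KITCHEN_SINK_STATS)
print(basic)
print(kitchen_sink)
```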

By selecting "All Seasons" under the "Season" dropdown, you can see who the all time WNBA greats are when it comes to win probability added.

Phoenix's Diana Taurasi holds a commanding lead in career win probability added, with a total of +41.08. Her closest competitor is Lauren Jackson at +35.45. If we sort by kitchen sink win probability added instead, Indiana's Tamika Catchings is the all time leader at +169.89.

Chicago's Elena Delle Donne is currently on pace to have the greatest single season in win probability added. She has been on a tear this year, with five games in which she amassed more than 50% win probability added on her own.

Just a note on the win probabilities: These were built using data solely from the post-2006 24 second shot clock era. I am not adjusting the model at all when I apply it to pre-2006 games, when the 30 second shot clock was in effect. This will probably make the probabilities a bit "under-confident", but I don't think the impact is significant.
