Sunday, June 29, 2014

Top Match Finder for Tennis

Win probability graphs are up and running for Wimbledon matches. With 127 matches played in each major tournament, it can be difficult to track down particularly noteworthy matches. So, I've added a top match finder, which returns the top matches according to either Excitement Index or Comeback Factor (with filters for date and tournament). This is similar to the Top Games Finder I added for the NBA earlier this year. Here is the link for Tennis: Top Match Finder.

Thursday, June 26, 2014

Live USA-Germany Win Probabilities

As always, I don't guarantee these live updates will work, but if all goes well, check the following link at noon ET for a live-updating win probability graph: USA-Germany Live Win Probabilities.

These use my recently developed soccer win probability calculator. Note that my starting probabilities will differ somewhat from the betting odds, as the market has factored an anomalously high probability of a draw into its odds.

The graph currently shows yesterday's Honduras-Switzerland match, but should start updating once USA-Germany gets underway.

Tuesday, June 24, 2014

The Betting Odds of a US-Germany Tie

Collusion theories abound in advance of this Thursday's US-Germany World Cup match. A tie in that match would send both the US and Germany to the Round of Sixteen, and eliminate their other two Group G competitors, Ghana and Portugal. US coach Jurgen Klinsmann has already gone on record that he will not pre-arrange a tie with his former team. But it's not as if he'd admit to it if it were true.

And even if there were no formal collusion, both teams have an incentive to adopt a low-risk style of play that could increase the likelihood of a draw. Nobody can say for sure how each team will play this Thursday, and there are plenty of examples across all sports of teams "going for the win" even when the outcome is immaterial or even counter-productive to the team's long term objectives.

The pundits and fans can speculate, but the bookies have to pick a number and back it with money. Is there evidence from the sports books that the market expects an abnormally high draw probability?

Saturday, June 21, 2014

In-Match Soccer Probability

Futbol!
In what is becoming somewhat of an obsession, I've added a new sport to my suite of win probability tools: In-Match Soccer Probability.

The tool right now is fairly bare bones, but I hope to add some additional features (and de-uglify it) as the World Cup progresses. As it stands, the model provides win, loss, and draw probabilities as a function of the following: game time (in minutes), goal differential, and pre-match odds. I'm not the first to build an in-match model for soccer. The soccer analytics site Soccer Statistically has an interactive model which displays probabilities as a function of game time, goal differential, and home/away.

Home field advantage doesn't really apply to World Cup games (except for maybe the host country). So, the model I have is a bit more flexible, allowing you to input pre-match probabilities in a variety of formats. You can use the betting odds from your favorite bookie (Odds Portal is a handy reference), or you can input the probabilities directly from sites like FiveThirtyEight and numberFire.
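As a rough illustration of what "inputting betting odds" involves, here is a minimal sketch of turning a bookmaker's three-way line into pre-match probabilities. The function names and the example odds are made up for illustration, and proportional normalization is just one common way to strip out the bookmaker's margin; the tool itself may handle this differently.

```python
def american_to_decimal(american):
    """Convert American (moneyline) odds to decimal odds."""
    if american > 0:
        return 1.0 + american / 100.0
    return 1.0 + 100.0 / abs(american)


def implied_probabilities(decimal_odds):
    """Convert decimal odds for (win, draw, loss) into probabilities summing to 1.

    The raw inverse odds add up to more than 1 because of the bookmaker's
    margin, so each one is divided by the total (an assumed, simple method).
    """
    raw = [1.0 / odds for odds in decimal_odds]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical three-way line: team A win, draw, team B win.
probs = implied_probabilities([2.50, 3.20, 3.00])
print([round(p, 3) for p in probs])
```

Probabilities taken directly from a forecasting site need no conversion, since they already sum to one.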

The Data

I built the model from play-by-play data from the past two seasons of five of the major European leagues (English Premier, Bundesliga, Serie A, Eredivisie, and La Liga). This worked out to about 3,000 matches. The model itself is a modified version of LOESS, where instead of building local linear regressions, I'm building local ordered logistic regressions (ordered logistic regression was necessary because soccer outcomes are trinary, not binary).
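To make the two ingredients a little more concrete, here is a sketch of (1) the tricube weight that LOESS typically uses to emphasize nearby observations and (2) how an ordered logistic model turns a single linear predictor into loss, draw, and win probabilities. This is not the fitted model: the cutpoints, the bandwidth, and the predictor values below are placeholders chosen purely for illustration, and the fitting step (a weighted maximum likelihood fit around each query point) is omitted.

```python
import math


def tricube(distance, bandwidth):
    """LOESS-style tricube weight: 1 at distance 0, falling to 0 at the bandwidth."""
    u = abs(distance) / bandwidth
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ordered_logit_probs(eta, cut_low, cut_high):
    """Return (loss, draw, win) probabilities for linear predictor eta.

    Requires cut_low < cut_high. A larger eta shifts probability toward a win.
    """
    p_loss = sigmoid(cut_low - eta)
    p_loss_or_draw = sigmoid(cut_high - eta)
    return p_loss, p_loss_or_draw - p_loss, 1.0 - p_loss_or_draw


# Placeholder values: observations 5 and 20 minutes away from the query time,
# with a hypothetical 15-minute bandwidth.
print(tricube(5, 15), tricube(20, 15))

# Placeholder cutpoints; eta would come from goal differential and pre-match odds.
print(ordered_logit_probs(eta=0.8, cut_low=-0.5, cut_high=0.7))
```

In a local fit, each historical observation would enter the likelihood with its tricube weight, so matches at a similar game time dominate the estimate.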


Sunday, June 15, 2014

Live Win Probabilities for Game 5

The win probability graph will be updating live for tonight's (potentially final) game between the Heat and the Spurs. Here is the link: Live NBA Win Probability Graph.

The graph may not start updating until midway through the first quarter. Until then, Game 4's graph will be displayed. Just click the refresh button to get the most up to date version of the graph.

Monday, June 9, 2014

Never Bet a Horse Named Joe

In this post, I will attempt to determine whether horses with popular first names (e.g. Michael, Mary, etc.) are overbet by the public.

With the Belmont Stakes over and another Triple Crown bid thwarted (by "cowards", no less), the public will go back to largely ignoring the sport of horse racing. So, this might not be the most ideally timed post, but here goes.

More so than in probably any other sport, gambling is a fundamental part of horse racing. In most cases, odds and payouts for horse racing bets are determined by a parimutuel system. Under parimutuel betting, the odds are set directly by the public, with no need for bookmakers or "sharps" to set payouts. As a result, horse racing odds are a pure reflection of the public's preferences (to the extent they're willing to vote with their wallets).
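For readers unfamiliar with parimutuel betting, here is a small sketch of how a win pool translates into payouts. The horse names, the pool amounts, and the 15% takeout are all invented for illustration (actual takeout varies by track and bet type), and the rounding that tracks apply to payouts is ignored.

```python
def parimutuel_payouts(win_pool, takeout=0.15):
    """Payout per $1 bet on each horse, including the returned stake.

    win_pool maps horse name -> total dollars bet on that horse to win.
    The track keeps `takeout` of the pool; the rest is split among the
    winning tickets in proportion to the amount bet.
    """
    total = sum(win_pool.values())
    net = total * (1.0 - takeout)
    return {horse: net / amount for horse, amount in win_pool.items()}


# Hypothetical pool: the more money on a horse, the lower its payout.
pool = {"Mike's Dream": 500.0, "Longshot": 100.0, "Favorite": 1400.0}
payouts = parimutuel_payouts(pool)
for horse, payout in payouts.items():
    print(f"{horse}: ${payout:.2f} per $1")
```

If bettors pile onto a horse because of its name, its share of the pool rises and its payout falls, regardless of its actual chance of winning.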

Despite my statistical inclinations, when I bet the horse races, I tend not to put much thought into it. I'll often play number combinations that appeal to me (my birthday, my daughter's birthday, wedding anniversary, pi, etc.). I'll also play horses based solely on names. If there is a horse with "Michael" or "Mike" in its name, I'll almost always place a bet on it. I know it's not a "smart" bet, but it's a Pascal's wager of sorts for me. If I don't place the bet and the horse wins, I'll be kicking myself. The bet is insurance against this post hoc regret.

Friday, June 6, 2014

Guest Post at Deadspin: Why do NBA Playoff Games Take So Long?

I have a guest post up at Regressing (Deadspin's stat geek subsection): Why Do NBA Playoff Games Take So Long?

This builds upon my previous work on the length of NBA games, focusing now on what adds to game length in the NBA in general, as well as what drives the increased game time in the playoffs. I would love to take the credit for the amazing visuals in that post, but those are the work of Reuben Fischer-Baum.

Thursday, June 5, 2014

Game Six Revisited

With the Heat and the Spurs due to face off again in the NBA Finals, replays from last year's dramatic Game Six are sure to feature heavily in coverage of this rematch, particularly this shot:


In this post, I will use my win probability model to take a more detailed look at the ups and downs of one of the greatest NBA Finals games of all time.

A couple weeks ago, I added the 2012-13 season to my win probability graphs and box scores. Here is the game six graph (link):


The Excitement Index of 10.1 quantifies how much the win probability graph travelled over the course of the game (more movement = more excitement). The Comeback Factor of 199 means that, at their lowest point, the Heat's chances of winning the game were 199 to 1. MVP designates the player with the highest Win Probability Added (WPA) for the game. LVP (Least Valuable Player) designates the player with the lowest total WPA. Hero and Goat do the same for my clutch WPA stat. Note the symmetry between MVP Ray Allen's +41.3% WPA and LVP Manu Ginobili's -41.3%.
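As a sketch of how these two numbers can be read off a win probability series: the code below takes "how much the graph travelled" to mean the sum of absolute changes in win probability, and the Comeback Factor to be the odds against the eventual winner at their lowest point. Both function names are mine, the sample series is made up, and any scaling or normalization applied to the published Excitement Index is not reproduced here.

```python
def excitement_index(win_probs):
    """Total distance travelled by the win probability line (assumed definition)."""
    return sum(abs(b - a) for a, b in zip(win_probs, win_probs[1:]))


def comeback_factor(winner_probs):
    """Odds against the eventual winner at their lowest win probability."""
    low = min(winner_probs)
    return (1.0 - low) / low


# Made-up series of the eventual winner's win probability over a game.
series = [0.50, 0.70, 0.40, 0.005, 0.45, 1.00]
print(round(excitement_index(series), 3))
print(round(comeback_factor(series)))
```

Under this reading, a Comeback Factor of 199 corresponds to a low point of 0.5% win probability, since 0.995 / 0.005 = 199.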

For now, let's start with the end of regulation and work backwards. Here is a breakdown of that game-tying three:

Sunday, June 1, 2014

French Open Graphs - Now with Market Odds

My win probability graphs for the French Open now feature a toggle which allows you to view probabilities that are calibrated to the betting market odds. For the original "50/50" versions, I assumed that each player has a 62.5% chance of winning a point when serving (and a corresponding 37.5% chance on return). The "Market" version adjusts the serve and return probabilities so that the starting win probability matches the win probability as implied by the pre-game betting line. The assumed serve and return probabilities are featured at the top of the graph, next to the score.
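To give a feel for what a 62.5% point-win probability on serve implies, here is a sketch of the standard calculation for the server's chance of holding a single game. It assumes every point is independent with the same probability and uses regular scoring with deuce; the function name is mine, and the full match model (sets, tiebreaks, and the market calibration step) is not reproduced here.

```python
def hold_probability(p):
    """Probability the server wins a game, given point-win probability p.

    Sums the ways to win to love, 15, or 30, plus reaching deuce (3-3)
    and then winning two points in a row before the returner does.
    """
    q = 1.0 - p
    win_before_deuce = p ** 4 * (1.0 + 4.0 * q + 10.0 * q ** 2)
    reach_deuce = 20.0 * p ** 3 * q ** 3
    win_from_deuce = p ** 2 / (p ** 2 + q ** 2)
    return win_before_deuce + reach_deuce * win_from_deuce


hold = hold_probability(0.625)
print(round(hold, 3))  # roughly 0.785
```

A "Market" calibration along the lines described above could then search over the serve probabilities until the resulting match win probability matches the betting line, though that search is an assumption on my part about how one might do it.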

What results is a more accurate version of the real drama of each match. For example, here is the graph for the round 1 match between Serena Williams and Alizé Lim. Serena went into the match with a win probability of 98.4%, and the line barely budged from that number as Serena won in straight sets 6-2, 6-2.


And here is the graph from Ernests Gulbis' upset over Roger Federer. Gulbis began the match with a 36% win probability, and saw his chances fall to 7% midway through the second set, before battling back to win in five sets.