Saturday, July 12, 2014

On the Probability of Scoring a Goal

In this post, I will describe my attempts to model the probability of a goal being scored in soccer. After correcting for team imbalances, I find that a trailing team has a higher probability of scoring in most situations. This result has potential implications for strategy and whether teams should be adopting a more aggressive style of play.

The Model

Using the same dataset I used for my win probability model (~3,000 matches from five of the top European leagues), I employed LOESS smoothing to build a model that predicts the probability of a goal being scored within the next minute of game time. The model is a function of the following:
  • game time
  • goal difference
  • team strength
I derive the team strength from the pre-match betting odds, and convert it into an expected number of goals scored per game. Including team strength as a parameter is crucial for this type of analysis, because the model is also a function of goal differential, and the raw historical results carry heavy selection bias. Favorites are over-represented in game situations in which a team has a positive goal differential. As a result, the raw goal probability is higher for teams that have a lead (favorites tend to score more). But having a lead does not, in and of itself, cause a team to score more goals. This is correlation, not causation.

In fact, once we control for the bias in the results, the exact opposite conclusion emerges: For most of the game, a team trailing by one goal is more likely to score than the same team would be when leading by a goal or tied. See below for the (smoothed) goal probabilities as a function of game time. The probabilities reflect a team that would be expected to score 1.4 goals per game, on average.
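For a rough sense of scale, here is a back-of-the-envelope baseline for that 1.4 goals-per-game team. It assumes goals arrive at a constant rate over a 90-minute match (a Poisson simplification used only for illustration; the LOESS model above does not assume a constant rate). The function name and the 90-minute default are hypothetical.

```python
import math

def per_minute_goal_probability(expected_goals, match_minutes=90):
    """Probability of at least one goal in a single minute.

    Assumes a constant-rate Poisson process, which is an illustrative
    simplification and not the smoothed model described in the post.
    """
    rate_per_minute = expected_goals / match_minutes
    return 1.0 - math.exp(-rate_per_minute)

baseline = per_minute_goal_probability(1.4)
print(f"{baseline:.4f}")  # roughly 1.5 percent per minute
```

Under that assumption, the smoothed curves can be read as deviations above or below a flat line at about 1.5 percent per minute.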

Friday, July 4, 2014

Tennis matches and luck

Tennis, like most sports, is largely a matter of scoring more points than your opponent. But the game-set-match scoring system used in tennis differentiates it from other contests. In basketball, scoring more points than your opponent defines victory. In tennis, scoring more points tends to lead to victory, but it's not a guarantee. It also matters when you score your points, and whether those points help you win sets.

In a recent post for FiveThirtyEight, Carl Bialik covered this topic, referring to matches in which a player wins despite winning fewer points as "lottery matches". Using data from Tennis Abstract, he found that 7.5 percent of men's matches ended in this way.
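To make the definition concrete, here is a minimal sketch that flags a lottery match from per-set point totals. The function name, the data layout, and the point counts are all made up for illustration; they are not taken from Tennis Abstract.

```python
def is_lottery_match(sets):
    """Return True if the match winner won fewer total points than the loser.

    sets: list of (points_a, points_b, set_winner) tuples, where set_winner
    is "A" or "B". The set winner is given explicitly because a set can
    itself be won with fewer points.
    """
    sets_a = sum(1 for _, _, winner in sets if winner == "A")
    sets_b = len(sets) - sets_a
    points_a = sum(pa for pa, _, _ in sets)
    points_b = sum(pb for _, pb, _ in sets)
    if sets_a > sets_b:
        return points_a < points_b
    return points_b < points_a

# Hypothetical three-set match: A edges two tight sets, B dominates one.
example = [(40, 38, "A"), (12, 24, "B"), (41, 39, "A")]
print(is_lottery_match(example))  # A wins the match with 93 points to B's 101
```

The example shows the usual pattern: the winner takes the close sets by a point or two and gives up a large margin in the set it loses.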

For this post, I will take a closer look at these lottery matches and use them to define a "luck" measure for tennis, which will be added to my tennis win probability graphs.

Sunday, June 29, 2014

Top Match Finder for Tennis

Win probability graphs are up and running for Wimbledon matches. With 127 matches played in each major tournament, it can be difficult to track down particularly noteworthy matches. So, I've added a top match finder, which returns the top matches according to either Excitement Index or Comeback Factor (with filters for date and tournament). This is similar to the Top Games Finder I added for the NBA earlier this year. Here is the link for Tennis: Top Match Finder.

Thursday, June 26, 2014

Live USA-Germany Win Probabilities

As always, I don't guarantee these live updates will work, but if all goes well, check the following link at noon ET for a live-updating win probability graph: USA-Germany Live Win Probabilities.

These use my recently developed soccer win probability calculator. Note that my starting probabilities will differ somewhat from the betting odds, as the market has factored an anomalously high probability of a draw into its odds.

The graph currently shows yesterday's Honduras-Switzerland match, but should start updating once USA-Germany gets underway.

Tuesday, June 24, 2014

The Betting Odds of a US-Germany Tie

Collusion theories abound in advance of this Thursday's US-Germany World Cup match. A tie in that match would send both the US and Germany to the Round of Sixteen, and eliminate their other two Group G competitors, Ghana and Portugal. US coach Jurgen Klinsmann has already gone on record that he will not pre-arrange a tie with his former team. But it's not as if he'd admit to it if it were true.

And even if there were no formal collusion, both teams have an incentive to adopt a low-risk style of play that could increase the likelihood of a draw. Nobody can say for sure how each team will play this Thursday, and there are plenty of examples across all sports of teams "going for the win" even when the outcome is immaterial or even counter-productive to the team's long term objectives.

The pundits and fans can speculate, but the bookies have to pick a number and back it with money. Is there evidence from the sports books that the market expects an abnormally high draw probability?
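One common way to read a draw probability out of a three-way betting market is to invert the decimal odds and rescale so the probabilities sum to one. The sketch below assumes proportional normalization to strip the bookmaker's margin (other methods exist), and the odds shown are hypothetical, not actual quotes for this match.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds for mutually exclusive outcomes into probabilities.

    Uses proportional normalization: each raw probability (1 / odds) is
    divided by the sum of all raw probabilities, which removes the margin.
    """
    raw = {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}
    overround = sum(raw.values())  # exceeds 1 when the book has a margin
    return {outcome: p / overround for outcome, p in raw.items()}

# Hypothetical three-way odds for illustration only.
odds = {"team_a": 1.9, "draw": 3.6, "team_b": 4.2}
for outcome, prob in implied_probabilities(odds).items():
    print(f"{outcome}: {prob:.3f}")
```

Comparing the draw probability obtained this way against what similar matchups usually produce is one way to judge whether it is abnormally high.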

Saturday, June 21, 2014

In-Match Soccer Probability

Futbol!
In what is becoming something of an obsession, I've added a new sport to my suite of win probability tools: In-Match Soccer Probability.

The tool right now is fairly bare bones, but I hope to add some additional features (and de-uglify it) as the World Cup progresses. As it stands, the model provides win, loss, and draw probabilities as a function of the following: game time (in minutes), goal differential, and pre-match odds. I'm not the first to build an in-match model for soccer. The soccer analytics site Soccer Statistically has an interactive model which displays probabilities as a function of game time, goal differential, and home/away.

Home field advantage doesn't really apply to World Cup games (except for maybe the host country). So, the model I have is a bit more flexible, allowing you to input pre-match probabilities in a variety of formats. You can use the betting odds from your favorite bookie (Odds Portal is a handy reference), or you can input the probabilities directly from sites like FiveThirtyEight and numberFire.
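For readers converting odds by hand, the standard relationships between the common quoting formats look like the sketch below. This is not the tool's actual code; the function names are hypothetical, and the result is a raw implied probability that still includes the bookmaker's margin.

```python
from fractions import Fraction

def american_to_decimal(american):
    """Convert American (moneyline) odds such as +150 or -200 to decimal odds."""
    if american == 0:
        raise ValueError("American odds cannot be zero")
    if american > 0:
        return 1.0 + american / 100.0
    return 1.0 + 100.0 / abs(american)

def fractional_to_decimal(fractional):
    """Convert fractional odds given as a string such as '5/2' to decimal odds."""
    return 1.0 + float(Fraction(fractional))

def decimal_to_probability(decimal_odds):
    """Raw implied probability; the bookmaker's margin is not removed."""
    return 1.0 / decimal_odds

print(american_to_decimal(150))     # 2.5
print(american_to_decimal(-200))    # 1.5
print(fractional_to_decimal("5/2")) # 3.5
print(decimal_to_probability(2.5))  # 0.4
```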

The Data

I built the model from play-by-play data from the past two seasons of five of the major European leagues (English Premier, Bundesliga, Serie A, Eredivisie, and La Liga). This worked out to about 3,000 matches. The model itself is a modified version of LOESS, where instead of building local linear regressions, I'm building local ordered logistic regressions (ordered logistic regression was necessary because soccer has three possible outcomes, win, draw, or loss, rather than two).
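As a rough illustration of the ordered logistic piece, the sketch below turns a single linear predictor and two cutpoints into loss, draw, and win probabilities using the standard ordered logit form. The local (LOESS-style) weighting and the fitting step are left out, and the function names, parameter names, and example values are hypothetical rather than fitted coefficients from the model.

```python
import math

def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def ordered_logit_probs(eta, cut_loss_draw, cut_draw_win):
    """Loss/draw/win probabilities from a standard ordered logit.

    eta: linear predictor (higher values favor a win).
    cut_loss_draw, cut_draw_win: ordered cutpoints separating the outcomes.
    """
    if not cut_loss_draw < cut_draw_win:
        raise ValueError("cutpoints must be strictly increasing")
    p_loss = sigmoid(cut_loss_draw - eta)          # P(outcome <= loss)
    p_loss_or_draw = sigmoid(cut_draw_win - eta)   # P(outcome <= draw)
    return {
        "loss": p_loss,
        "draw": p_loss_or_draw - p_loss,
        "win": 1.0 - p_loss_or_draw,
    }

# Hypothetical values: an evenly matched game with symmetric cutpoints.
print(ordered_logit_probs(eta=0.0, cut_loss_draw=-1.0, cut_draw_win=1.0))
```

Because the three outcomes share one predictor, raising eta shifts probability away from a loss and toward a win, with the draw probability squeezed in between. That ordering is what makes the ordered form a natural fit for win, draw, and loss.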


Sunday, June 15, 2014

Live Win Probabilities for Game 5

The win probability graph will be updating live for tonight's (potentially final) game between the Heat and the Spurs. Here is the link: Live NBA Win Probability Graph.

The graph may not start updating until midway through the first quarter. Until then, Game 4's graph will be displayed. Just click the refresh button to get the most up-to-date version of the graph.