Saturday, July 12, 2014

On the Probability of Scoring a Goal

In this post, I will describe my attempts to model the probability of a goal being scored in soccer. After correcting for team imbalances, I find that a trailing team has a higher probability of scoring in most situations. This result has potential implications for strategy and whether teams should be adopting a more aggressive style of play.

The Model

Using the same dataset I used for my win probability model (~3,000 matches from five of the top European leagues), I employed LOESS smoothing to build a model that predicts the probability of a goal being scored within the next minute of game time. The model is a function of the following:
  • game time
  • goal difference
  • team strength
I derive team strength from the pre-match betting odds and convert it into expected goals scored per game. Including team strength as a parameter is crucial for this type of analysis because the model is also a function of goal difference, and the raw historical results carry heavy selection bias: favorites are over-represented in game situations in which a team has a positive goal difference. As a result, the raw goal probability is higher for teams that have a lead (favorites tend to score more). But having a lead does not, in and of itself, lead to a higher probability of scoring more goals. This is correlation, not causation.
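The selection bias described above can be reproduced with a small simulation. This is only a sketch under made-up assumptions, not the model from this post: every match pits a "favorite" (2.0 expected goals per game) against an "underdog" (1.0), and each team's per-minute scoring probability is fixed, so the score has no effect on scoring at all. The function names `simulate` and `rate` and all of the numbers are hypothetical.

```python
import random
from collections import defaultdict

def simulate(n_matches=2000, fav_xg=2.0, dog_xg=1.0, seed=0):
    """Simulate matches in which scoring is independent of the current score."""
    rng = random.Random(seed)
    # Assumed constant per-minute scoring probability for each team.
    p = {"favorite": fav_xg / 90, "underdog": dog_xg / 90}
    minutes = defaultdict(int)  # (team, state) -> minutes spent in that state
    goals = defaultdict(int)    # (team, state) -> goals scored in that state
    for _ in range(n_matches):
        score = {"favorite": 0, "underdog": 0}
        for _minute in range(90):
            diff = score["favorite"] - score["underdog"]
            scored = {team: rng.random() < p[team] for team in p}
            for team, sign in (("favorite", 1), ("underdog", -1)):
                d = sign * diff
                state = "leading" if d > 0 else "trailing" if d < 0 else "tied"
                minutes[(team, state)] += 1
                goals[(team, state)] += scored[team]
            for team in p:
                score[team] += scored[team]
    return minutes, goals

def rate(minutes, goals, state, team=None):
    """Goals per minute in a game state, pooled over teams unless one is given."""
    teams = [team] if team else ["favorite", "underdog"]
    total_minutes = sum(minutes[(t, state)] for t in teams)
    total_goals = sum(goals[(t, state)] for t in teams)
    return total_goals / total_minutes

if __name__ == "__main__":
    minutes, goals = simulate()
    for state in ("leading", "tied", "trailing"):
        print(state,
              "raw:", round(rate(minutes, goals, state), 4),
              "favorite only:", round(rate(minutes, goals, state, "favorite"), 4))
```

In the pooled ("raw") numbers, leading teams appear to score more often than trailing teams, even though the simulation gives the score no influence on scoring. Within a single strength level the gap should shrink to noise, which is the reason for including team strength in the model.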

In fact, once we control for the bias in the results, the exact opposite conclusion emerges: for most of the game, a team trailing by one goal is more likely to score than it is when leading by a goal or tied. See below for the (smoothed) goal probabilities as a function of game time. The probabilities reflect a team that would be expected to score 1.4 goals per game, on average.
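For readers unfamiliar with LOESS, the smoothing idea can be sketched in one dimension. This is not the model from this post, which also depends on goal difference and team strength; it is a bare-bones local linear regression with tricube weights, written in plain Python. The function name `loess_at`, the `frac` parameter, and the synthetic input series are all assumptions for illustration.

```python
import random

def loess_at(xs, ys, x0, frac=0.3):
    """Local linear fit at x0 using tricube weights over the nearest frac of points."""
    n = len(xs)
    k = max(2, int(round(frac * n)))
    # Bandwidth: distance from x0 to its k-th nearest neighbor.
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
    w = [(1 - min(abs(x - x0) / h, 1) ** 3) ** 3 for x in xs]
    sw = sum(w)
    if sw == 0:
        raise ValueError("frac is too small for these points")
    # Weighted least-squares line through the neighborhood of x0.
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

if __name__ == "__main__":
    rng = random.Random(1)
    game_minutes = list(range(1, 91))
    # Made-up noisy per-minute goal rates, not real data.
    raw = [0.012 + 0.00005 * m + rng.gauss(0, 0.003) for m in game_minutes]
    smoothed = [loess_at(game_minutes, raw, m) for m in game_minutes]
    print([round(s, 4) for s in smoothed[:5]])
```

A larger `frac` averages over more neighboring minutes and gives a flatter curve; a smaller one follows the raw rates more closely.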

Friday, July 4, 2014

Tennis matches and luck

Tennis, like most sports, is largely a matter of scoring more points than your opponent. But the game-set-match scoring system used in tennis differentiates it from other contests. In basketball, scoring more points than your opponent defines victory. In tennis, scoring more points tends to lead to victory, but it's not a guarantee. It also matters when you score your points, and whether those points help you win sets.
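To make this concrete, here is a small sketch of a match in which the winner scores fewer points than the loser. It uses a deliberately simplified scoring model (each game is just a pair of point totals, with no deuce or tiebreak rules), and the function names and the example match are hypothetical.

```python
def winner(a, b):
    """Return "A" or "B" depending on which count is larger."""
    return "A" if a > b else "B"

def summarize(match):
    """match is a list of sets; each set is a list of (points_a, points_b) games."""
    sets_won = {"A": 0, "B": 0}
    points = {"A": 0, "B": 0}
    for games in match:
        games_won = {"A": 0, "B": 0}
        for pa, pb in games:
            points["A"] += pa
            points["B"] += pb
            games_won[winner(pa, pb)] += 1
        sets_won[winner(games_won["A"], games_won["B"])] += 1
    match_winner = winner(sets_won["A"], sets_won["B"])
    loser = "B" if match_winner == "A" else "A"
    # True when the match winner scored fewer total points than the loser.
    fewer_points = points[match_winner] < points[loser]
    return match_winner, points, fewer_points

# A wins six close games 4-2 and loses four games 0-4, so each set ends 6-4.
close_set = [(4, 2)] * 6 + [(0, 4)] * 4
match = [close_set, close_set]

if __name__ == "__main__":
    print(summarize(match))
```

Player A wins both sets 6-4 and therefore the match, yet scores 48 points to B's 56, because A's wins are narrow and B's are lopsided.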

In a recent post for FiveThirtyEight, Carl Bialik covered this topic, referring to a match in which a player wins despite winning fewer points as a "lottery match". Using data from Tennis Abstract, he found that 7.5 percent of men's matches ended in this way.

For this post, I will take a closer look at these lottery matches and use them to define a "luck" measure for tennis, which will be added to my tennis win probability graphs.