The Model

Using the same dataset I used for my win probability model (~3,000 matches from five of the top European leagues), I used LOESS smoothing to build a model that predicts the probability of a team scoring within the next minute of game time. The model is a function of the following:
- game time
- goal difference
- team strength
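The smoothing settings aren't spelled out here. To show roughly what LOESS does along one of these axes, here is a minimal pure-Python sketch of local linear LOESS with tricube weights, applied over game time only, as if for a single goal-difference/strength slice. The span of 0.3, the helper names (`loess_at`, `smoothed_goal_curve`) and the simulated scoring data are all hypothetical. The model described above also smooths across goal difference and team strength, which this sketch does not attempt.

```python
import math
import random


def tricube(u):
    """Tricube kernel used by classic LOESS: (1 - |u|^3)^3 on |u| < 1, else 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_at(x0, xs, ys, ws=None, span=0.3):
    """Local linear LOESS estimate of y at x0.

    xs, ys: observations (here: game minute and observed per-minute goal rate).
    ws: optional prior weights, e.g. team-minutes observed at each minute.
    span: fraction of the points that fall inside each local window.
    """
    if ws is None:
        ws = [1.0] * len(xs)
    k = min(len(xs), max(3, math.ceil(span * len(xs))))
    # Bandwidth = distance to the k-th nearest point, nudged up slightly so
    # the k-th point itself keeps a small positive weight.
    h = sorted(abs(x - x0) for x in xs)[k - 1] * 1.0001 + 1e-9
    w = [p * tricube((x - x0) / h) for p, x in zip(ws, xs)]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    # Weighted least-squares line, evaluated at x0.
    return ybar + slope * (x0 - xbar)


def smoothed_goal_curve(goal_minutes_per_team_match, n_minutes=90, span=0.3):
    """Empirical P(goal in minute t) for t = 1..n_minutes, LOESS-smoothed over t."""
    n = len(goal_minutes_per_team_match)
    hits = [0] * n_minutes
    for minutes in goal_minutes_per_team_match:
        for m in set(minutes):  # count "scored at least once in minute m"
            hits[m - 1] += 1
    xs = list(range(1, n_minutes + 1))
    rates = [c / n for c in hits]
    return [loess_at(x, xs, rates, span=span) for x in xs]


# Hypothetical data: a team whose per-minute scoring chance rises linearly
# from about 1.1% to 2.0% (roughly 1.4 goals per game), sampled 2,000 times.
rng = random.Random(0)
true_p = [0.011 + 0.0001 * t for t in range(1, 91)]
samples = [[t for t, p in enumerate(true_p, start=1) if rng.random() < p]
           for _ in range(2000)]
curve = smoothed_goal_curve(samples)
print(f"minute 45: smoothed {curve[44]:.4f} vs true {true_p[44]:.4f}")
```

Fitting a local line rather than a local mean keeps the curve from being pulled flat near minute 1 and minute 90, where the window can only extend in one direction.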
I derive team strength from the pre-match betting odds and convert it into the number of goals the team is expected to score per game. Including team strength as a parameter is crucial for this type of analysis because the model is also a function of goal difference, and the raw historical results carry heavy selection bias: favorites are over-represented in game situations where a team has a positive goal difference. As a result, the raw goal probability is higher for teams that have a lead, simply because favorites tend to score more. But holding a lead does not, in and of itself, make a team more likely to score again. The raw relationship is correlation, not causation.
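The exact odds-to-expected-goals conversion isn't given above. One common approach, sketched here under stated assumptions, is to turn decimal 1X2 odds into probabilities (removing the bookmaker margin proportionally, itself an assumption), then grid-search for the pair of independent Poisson scoring rates whose home-win/draw/away-win probabilities best match them. The function names, the 0.2 to 3.5 grid, and the example odds are hypothetical.

```python
import math


def poisson_pmf(lam, max_goals):
    """P(X = k) for k = 0..max_goals when X ~ Poisson(lam)."""
    return [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(max_goals + 1)]


def _outcomes(ph, pa):
    """(home win, draw, away win) for independent scores with pmfs ph and pa."""
    cdf_h, cdf_a, acc_h, acc_a = [], [], 0.0, 0.0
    for x, y in zip(ph, pa):
        acc_h += x
        acc_a += y
        cdf_h.append(acc_h)
        cdf_a.append(acc_a)
    home = sum(ph[i] * cdf_a[i - 1] for i in range(1, len(ph)))  # H > A
    away = sum(pa[j] * cdf_h[j - 1] for j in range(1, len(pa)))  # A > H
    draw = sum(x * y for x, y in zip(ph, pa))
    return home, draw, away


def outcome_probs(lam_home, lam_away, max_goals=15):
    """1X2 probabilities implied by two Poisson scoring rates."""
    return _outcomes(poisson_pmf(lam_home, max_goals), poisson_pmf(lam_away, max_goals))


def implied_probs(odds_home, odds_draw, odds_away):
    """Decimal odds -> probabilities, removing the bookmaker margin proportionally."""
    raw = [1.0 / odds_home, 1.0 / odds_draw, 1.0 / odds_away]
    total = sum(raw)  # exceeds 1 when the odds include an overround
    return [r / total for r in raw]


def expected_goals_from_odds(odds_home, odds_draw, odds_away,
                             lo=0.2, hi=3.5, step=0.05, max_goals=15):
    """Grid-search the Poisson rates whose 1X2 probabilities best match the odds."""
    target = implied_probs(odds_home, odds_draw, odds_away)
    grid = [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]
    pmfs = [poisson_pmf(lam, max_goals) for lam in grid]
    best_err, best = float("inf"), None
    for lam_h, ph in zip(grid, pmfs):
        for lam_a, pa in zip(grid, pmfs):
            err = sum((p - t) ** 2 for p, t in zip(_outcomes(ph, pa), target))
            if err < best_err:
                best_err, best = err, (lam_h, lam_a)
    return best


# Hypothetical odds for a home favorite (decimal: home / draw / away).
lam_home, lam_away = expected_goals_from_odds(1.80, 3.60, 4.50)
print(f"expected goals: home {lam_home:.2f}, away {lam_away:.2f}")
```

A grid search is crude but transparent; a finer grid or a numerical optimizer would give smoother strength estimates.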
In fact, once we control for this bias, the opposite conclusion emerges: for most of the game, a team is more likely to score when trailing by one goal than when tied or leading by one. See below for the (smoothed) goal probabilities as a function of game time. The probabilities reflect a team that would be expected to score 1.4 goals per game, on average.
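Why controlling for strength matters can be seen in a toy simulation where, by construction, the score has no effect on scoring at all. Every hypothetical match pits a strong team (2.0 goals per game) against a weak one (0.9), and each team scores in a minute with a fixed probability. Pooled across teams, "leading by 1" still looks like it produces more goals than "trailing by 1", purely because strong teams occupy the leading state more often; within each strength group the gap disappears. This is only an illustration of the confounding, not a reproduction of the real-data finding that trailing teams score more, and the names and rates are made up.

```python
import random
from collections import defaultdict


def simulate(n_matches=5000, strong=2.0, weak=0.9, minutes=90, seed=1):
    """Toy league: every match is a strong team vs a weak team.

    Each team scores in a given minute with probability rate / minutes,
    regardless of the score, so a lead has no causal effect on scoring here.
    Returns counts[(team, goal_diff)] = [team_minutes, goals], where goal_diff
    is that team's lead at the start of the minute.
    """
    rng = random.Random(seed)
    rates = {"strong": strong, "weak": weak}
    counts = defaultdict(lambda: [0, 0])
    for _ in range(n_matches):
        goals = {"strong": 0, "weak": 0}
        for _ in range(minutes):
            diff = goals["strong"] - goals["weak"]
            scored = {team: rng.random() < rate / minutes for team, rate in rates.items()}
            for team, own_diff in (("strong", diff), ("weak", -diff)):
                cell = counts[(team, own_diff)]
                cell[0] += 1
                cell[1] += scored[team]
            for team in goals:
                goals[team] += scored[team]
    return counts


def goal_rate(counts, diff, team=None):
    """Per-minute goal rate at a given lead, pooled or within one strength group."""
    cells = [v for (t, d), v in counts.items() if d == diff and team in (None, t)]
    return sum(c[1] for c in cells) / sum(c[0] for c in cells)


counts = simulate()
print("raw, leading by 1: ", round(goal_rate(counts, 1), 4))
print("raw, trailing by 1:", round(goal_rate(counts, -1), 4))
for team in ("strong", "weak"):
    print(f"{team}: leading {goal_rate(counts, 1, team):.4f}, "
          f"trailing {goal_rate(counts, -1, team):.4f}")
```

Stratifying by a two-level strength label is the simplest possible control; with real data, strength is continuous, which is why it enters the model as a smoothing variable instead.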