SciSports Data Analysis

Expected Goals Model 2.0

05/08/16 | SciSports

Goals in football matches are rather unpredictable events, and the final result does not always reflect which team performed best. The outcome of last month's European Championship, in which Portugal took home the coveted prize, illustrates this: few, apart from the majority of Portuguese fans, had predicted it. To get a better reflection of the real performance, each created chance can be evaluated by calculating the probability of the shot being converted into a goal with a so-called expected goals (xG) model. Although the name suggests that an expected number of goals is calculated for future matches, these models actually evaluate chances created in past matches.
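As a minimal sketch of how such an evaluation works, the snippet below adds up the scoring probabilities of the chances one team created in a past match. It assumes the usual convention that a match xG total is the sum of the per-shot probabilities; the probabilities and the function name `match_xg` are made up for illustration and are not SciSports data or code.

```python
# Hypothetical scoring probabilities for the chances one team created
# in a single past match (illustrative values, not SciSports data).
shot_probabilities = [0.05, 0.12, 0.30, 0.76, 0.03]


def match_xg(probabilities):
    """Sum per-shot scoring probabilities into a match xG total.

    Assumption: match xG is the plain sum of shot probabilities.
    """
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p!r}")
    return sum(probabilities)


xg = match_xg(shot_probabilities)
print(f"xG created in this match: {xg:.2f}")
```

Comparing this total with the goals actually scored shows whether a team converted more or fewer chances than their quality suggested.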

In March this year, SciSports introduced an alternative model to calculate expected goals. This new model focused on incorporating the defensive strength of the team and the offensive strength of the individual player into the expected goals calculations. This rather limited model also included the distance to the goal, the angle between the shot position and both goal posts, and a distinction between open play, free kick, corner and penalty chances.
Over the past months, the aim was to improve this model by adding relevant characteristics/parameters and to simplify the defensive/offensive strength model. This article gives a short description of the result.

Parameters included in the model
The input variables in our new xG-model are divided into situation specific parameters (each being either true or false) and assisting characteristics (for instance angle or distance to goal). These are briefly described below.

Situation specific parameters
Situation specific parameters are divided into the following categories. For every chance that is created, exactly one of the types within each category is true (the types are self-explanatory):

  • Match situation (open play / corner / direct free kick / indirect free kick / penalty)
  • Rebound (from keeper / from woodwork / none)
  • Attempt type (header / shot from dribble / shot from pass)
  • Game state (≥ 2 goals up / 1 goal up / draw / 1 goal down / ≥ 2 goals down)

The game state (match score) at the moment of the opportunity is of great importance. In general, a team that is in the lead tends to play more compactly and waits for the actions of its opponent, which makes it harder for the opponent to create an opportunity. On the other hand, the trailing team has to take more risks, which opens up larger spaces and gives the leading team better opportunities to score. This was well illustrated in the Euro 2016 quarter final between Germany and France.
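The true/false structure of these parameters can be sketched as a one-hot encoding, in which exactly one flag per category is set. The category options are taken from the list above; the identifier names, the helper functions `game_state` and `encode_chance`, and the dictionary layout are assumptions made for this illustration.

```python
# Options copied from the four categories above; identifier names are illustrative.
CATEGORIES = {
    "match_situation": ["open_play", "corner", "direct_free_kick",
                        "indirect_free_kick", "penalty"],
    "rebound": ["from_keeper", "from_woodwork", "none"],
    "attempt_type": ["header", "shot_from_dribble", "shot_from_pass"],
    "game_state": ["2+_up", "1_up", "draw", "1_down", "2+_down"],
}


def game_state(goals_for, goals_against):
    """Map the score at the moment of the chance to a game state label."""
    diff = goals_for - goals_against
    if diff >= 2:
        return "2+_up"
    if diff == 1:
        return "1_up"
    if diff == 0:
        return "draw"
    if diff == -1:
        return "1_down"
    return "2+_down"


def encode_chance(chance):
    """Return true/false flags with exactly one true flag per category."""
    flags = {}
    for category, options in CATEGORIES.items():
        value = chance[category]
        if value not in options:
            raise ValueError(f"unknown {category}: {value!r}")
        for option in options:
            flags[f"{category}={option}"] = (option == value)
    return flags


# A header from a corner, taken while the attacking team trails 0-1.
chance = {
    "match_situation": "corner",
    "rebound": "none",
    "attempt_type": "header",
    "game_state": game_state(goals_for=0, goals_against=1),
}
flags = encode_chance(chance)
print([name for name, is_true in flags.items() if is_true])
```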

For every situation specific parameter, a subset of the following assisting characteristics is taken into account.

  • The distances to the middle of the goal, the goal line and the back line are taken into account. Intuitively, shooting from a greater distance is harder, as the angle between the shot location and the goal is smaller and the opposition has more time to react.
  • The fourth and fifth characteristics are the angle between the goal posts as seen from the shot (or header) location, and the same angle as seen from the location of the previous action. The chance of success is higher when the player shoots from a position with a wider angle.
  • For shots from a dribble, we have added the dribbling distance to the algorithm, as it influences the chance of converting an opportunity. Another important aspect is the transition velocity in the longitudinal direction of the pitch, which indicates whether the attack uses fast transition phases or slow build-up play.
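The geometric characteristics above can be sketched as follows. The coordinate system is an assumption made for this example: positions are in metres, the origin is the middle of the goal, x runs along the goal line and y is the distance from the goal line. The goal width of 7.32 m comes from the Laws of the Game. The function names and the definition of transition velocity as longitudinal progress divided by elapsed time are illustrative, not the SciSports implementation.

```python
import math

GOAL_WIDTH = 7.32  # metres, distance between the posts (Laws of the Game)


def distance_to_goal_centre(x, y):
    """Straight-line distance from (x, y) to the middle of the goal."""
    return math.hypot(x, y)


def goal_angle(x, y):
    """Angle in radians between the two goal posts as seen from (x, y)."""
    to_left = (-GOAL_WIDTH / 2 - x, -y)   # vector to the left post
    to_right = (GOAL_WIDTH / 2 - x, -y)   # vector to the right post
    cross = to_left[0] * to_right[1] - to_left[1] * to_right[0]
    dot = to_left[0] * to_right[0] + to_left[1] * to_right[1]
    return abs(math.atan2(cross, dot))


def dribble_distance(start, end):
    """Distance covered between the start and end point of a dribble."""
    return math.hypot(end[0] - start[0], end[1] - start[1])


def longitudinal_velocity(start_y, end_y, seconds):
    """Progress towards the goal line in metres per second (assumed definition)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (start_y - end_y) / seconds


# Central shot from the penalty spot versus a central shot from 25 metres.
for y in (11.0, 25.0):
    print(f"{distance_to_goal_centre(0.0, y):.1f} m, "
          f"{math.degrees(goal_angle(0.0, y)):.1f} degrees")
```

The angle shrinks as the distance grows or as the shot location moves towards the side of the pitch, which matches the intuition given in the list.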

Not included
Parameters not included in the calculations are defensive errors and big chances, because these variables are rather subjective. In the future, we will determine which parameters can improve the predictive power of our xG-model.

Offensive and defensive improvements
To create a more robust model that takes into account the defensive strength of the team and the offensive strength of the individual player, the previously introduced player/defence specific xG maps are reduced to a subset of chance specific factors.
Initially, the probabilities of goal attempts are divided into ten categories: penalty conversion, direct free kick conversion, and four categories each for shots and headers. The shot and header categories are small, mediocre, good and big chances. Interpolation is used for all chances with probabilities between two categories. For the big chances category, the following table shows the highest rated players:

The teams with the best rated defence against small shooting and headed chances are listed in the following table. A lower number indicates fewer goals against. Small chances are assumed to be the easiest to block, so a low number for them indicates a well-structured defence:

The large number of English Championship teams in ‘preventing conversion of small headed chances’ and the number of Serie A teams in ‘preventing conversion of small shooting chances’ are probably due to playing style, although this has not yet been verified.
It is plausible that the teams with a solid defensive structure (Atletico Madrid, Juventus, PSG and Bayern München) tend to prevent the opposing team from making a goal attempt at all, rather than reducing the chance of scoring from an attempt.
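The interpolation between chance categories mentioned earlier in this section can be sketched as a linear interpolation between category anchor points. Everything numeric here is an assumption: the anchor probabilities for small, mediocre, good and big chances, the example player factors and the function name `interpolated_factor` are hypothetical, and the article does not state which interpolation scheme is actually used.

```python
from bisect import bisect_right

# Hypothetical anchor probabilities for the four shot categories
# (small, mediocre, good, big). The real category boundaries are not published.
ANCHORS = [0.05, 0.15, 0.30, 0.50]


def interpolated_factor(probability, factors, anchors=ANCHORS):
    """Linearly interpolate a per-category factor for a chance probability.

    Probabilities below the first or above the last anchor take the factor
    of the nearest category.
    """
    if len(factors) != len(anchors):
        raise ValueError("one factor per anchor is required")
    if probability <= anchors[0]:
        return factors[0]
    if probability >= anchors[-1]:
        return factors[-1]
    upper = bisect_right(anchors, probability)  # first anchor above the probability
    lower = upper - 1
    weight = (probability - anchors[lower]) / (anchors[upper] - anchors[lower])
    return factors[lower] + weight * (factors[upper] - factors[lower])


# Hypothetical shooting factors of one player for the four shot categories.
player_factors = [0.90, 1.00, 1.10, 1.30]

# A chance halfway between "small" and "mediocre" gets a factor halfway
# between the two category factors.
print(round(interpolated_factor(0.10, player_factors), 3))
```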

The model is optimised on a dataset of 250,000+ goal attempts for the seven competitions listed in the figure below. The accuracy of this model is calculated by combining all teams for the 2015/2016 season. This results in a root mean square error (RMSE) of 0.1366 for goals scored and 0.1567 for goals conceded (penalties and own goals excluded). The model is optimised using both the RMSE values and the Brier score per match. Note that the model is optimised on the total dataset, which also contains the 65,000 goal attempts used to calculate the RMSE values, so these errors are not measured on unseen data. The model will therefore be tested on the data for the upcoming season and we will give an update on its performance in January 2017. The scatter plot for goals vs expected goals per game is presented below.
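For readers unfamiliar with the two error measures named above, the sketch below computes both from their standard definitions. The aggregation is an assumption: RMSE is taken over goals per game versus xG per game for each team, and the Brier score over the shots of a single match. All numbers are invented for the example and the function names are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def brier_score(probabilities, outcomes):
    """Mean squared difference between shot probabilities and outcomes (1 = goal)."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared) / len(squared)


# Invented season averages for three teams: goals per game and xG per game.
goals_per_game = [1.50, 1.00, 2.10]
xg_per_game = [1.20, 1.40, 2.10]
print(f"RMSE: {rmse(goals_per_game, xg_per_game):.4f}")

# Invented shots from one match: model probability and whether it was a goal.
shot_probabilities = [0.10, 0.80, 0.05, 0.30]
shot_outcomes = [0, 1, 0, 0]
print(f"Brier score: {brier_score(shot_probabilities, shot_outcomes):.4f}")
```

For both measures a lower value is better, and a value of 0 would mean the model matched every outcome exactly.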

Final remark
We conclude with a final remark on Portugal winning the European Championship. Although they managed to win only one out of seven matches within 90 minutes, their defending was very structured and solid. Portugal were the only team to concede less than 1 xG in every match (even against Hungary) and averaged an xG against of only 0.56 per match. Combining this with a lethal counter attack resulted in Portugal winning Euro 2016!

Note: Competitions included in the model: Barclays Premier League, Süper Lig, Italian Serie A, Bundesliga, La Liga, Ligue 1, Championship and Eredivisie.