Thursday, 9 June 2016

Quantifying Passing Subsequences: A Way to Identify Teams and Players’ Playing Style?

‘Passes’ make up the bulk of the recordable events in a football match, usually numbering in the hundreds, whereas ‘goals’ and ‘shots’ rarely surpass 7 and 30 respectively. We football fans have become accustomed to seeing references to so-called “passing statistics”, such as Xabi Alonso completing a record number of passes with Bayern Munich or Sergio Busquets having 99% passing accuracy in a match. When Arsenal recently signed Granit Xhaka, the media’s coverage was dominated by figures showing he was amongst the top 5 in the Bundesliga’s “Completed Passes” tables. That’s all very well; but with the richness of information available, are we truly limited to such (no offense) obvious conclusions and interpretations?

The recorded information on passes is now greatly detailed: not only the “passer” and “recipient”, but also the time at which the pass occurred, its start and end coordinates, the type of pass (long, short, aerial, through ball, etc.) and more. Surely there is more information to be uncovered.

Gyarmati, Kwak and Rodriguez (2014) have a creative take on the problem. They organise the pass data into what they call 3-pass-long motifs: the distinct patterns that a sequence of 3 consecutive passes can follow, regardless of which players are involved. In total, there are 5 motifs:

  1.   ABCD
  2.   ABCB
  3.   ABCA
  4.   ABAC
  5.   ABAB

This may seem a bit confusing at the moment, but if you bear with my explanation it’s actually pretty simple. In an England match, a sequence of 3 passes of the type ABCB would be, for example, Kane passing to Barkley (A→B), then Barkley passing to Wilshere (B→C) and then Wilshere passing back to Barkley (C→B). If in that same match Vardy passes to Rooney (A→B), Rooney passes to Walker (B→C) and then Walker passes back to Rooney (C→B), this would be another instance of ABCB even though different players were involved, since Gyarmati, Kwak and Rodriguez’s methodology doesn’t distinguish between the identities of the players, just between the possible motifs. In their investigation, a sequence of six passes would tally four motifs (one for each run of three consecutive passes), as the figure illustrates:

Take a moment to make sure you understand exactly how the information is being processed. In the end, the authors are left with a tally for each match of how many times each of the 5 motifs occurred.
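If you prefer to see it in code, here is a minimal Python sketch of that tallying step. It is my own illustration and not the authors’ implementation: the function names `motif_of` and `count_motifs` are made up, and so is the example possession. It assumes a possession is given as the list of players who held the ball, in order, during one unbroken run of passes.

```python
from collections import Counter

def motif_of(players):
    """Relabel four consecutive ball-holders as A, B, C, D by order of first appearance."""
    labels = {}
    letters = []
    for player in players:
        if player not in labels:
            labels[player] = "ABCD"[len(labels)]
        letters.append(labels[player])
    return "".join(letters)

def count_motifs(possession):
    """Slide a window of 3 passes (4 ball-holders) along the possession and tally the motifs."""
    counts = Counter()
    for i in range(len(possession) - 3):
        counts[motif_of(possession[i:i + 4])] += 1
    return counts

print(motif_of(["Kane", "Barkley", "Wilshere", "Barkley"]))  # ABCB

# A made-up possession of six passes (seven ball-holders).
possession = ["Kane", "Barkley", "Wilshere", "Barkley", "Kane", "Rooney", "Kane"]
print(count_motifs(possession))
```

The six-pass possession produces four motifs in total, one per sliding window, which matches the counting rule described above.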

The authors’ reasoning is that by understanding the motifs’ distribution for different teams, inherent information about a team’s playing style will become apparent. It seems like a reasonable intuition if we consider, for example, that ABCD is a direct build-up sequence involving 4 different players, while ABAB most likely reveals a patient build-up where 2 players pass the ball back and forth, in the style we usually attribute to Barcelona or Bayern Munich.

They then applied their ideas to passing data from the 2012-13 Spanish La Liga season. The following figures (taken directly from their paper) show the z-score for each motif for each team in that season:

Indeed, Barcelona seem to make extensive use of “patient” sequences like ABAC, ABCB and ABAB when compared to other teams; and significantly less use of the “direct” motif ABCD.
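For readers unfamiliar with z-scores, here is a rough Python sketch of the idea. I have not reproduced the authors’ exact procedure, so this assumes the plainest reading: for each motif, subtract the league average count and divide by the standard deviation across teams. The paper may well use a different baseline. The team names and counts below are invented purely for illustration.

```python
from statistics import mean, pstdev

MOTIFS = ["ABAB", "ABAC", "ABCA", "ABCB", "ABCD"]

def z_scores(team_counts):
    """Standardise each motif's count across teams (assumed method, see the text above)."""
    result = {team: {} for team in team_counts}
    for motif in MOTIFS:
        values = [team_counts[team][motif] for team in team_counts]
        mu, sigma = mean(values), pstdev(values)
        for team in team_counts:
            # Guard against a motif that every team uses equally often.
            result[team][motif] = (team_counts[team][motif] - mu) / sigma if sigma else 0.0
    return result

# Invented motif counts for three hypothetical teams.
team_counts = {
    "Team X": {"ABAB": 30, "ABAC": 60, "ABCA": 40, "ABCB": 55, "ABCD": 100},
    "Team Y": {"ABAB": 10, "ABAC": 40, "ABCA": 35, "ABCB": 35, "ABCD": 140},
    "Team Z": {"ABAB": 20, "ABAC": 50, "ABCA": 30, "ABCB": 45, "ABCD": 120},
}

z = z_scores(team_counts)
for team, scores in z.items():
    print(team, {motif: round(value, 2) for motif, value in scores.items()})
```

A positive z-score means a team uses a motif more than the average team, a negative one means less. In this made-up data, Team X plays the Barcelona-like role: above average on ABAB and below average on ABCD.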

Another interesting way to use this data is through cluster analysis or clustering. This is a technique designed to discover natural groupings inside data. As a toy example, if you feed the following data points to a clustering algorithm, it will tell you: “There are 4 different groups, and these are the points in each of the groups.”

NOTE: Remember that when data is in 2 dimensions we can visualise it and spot these groupings by eye, but the true value of these methods comes when analysing data in more than 3 dimensions.
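To make the toy example concrete, here is a small k-means sketch in plain Python. K-means is just one common clustering algorithm and I am not claiming it is the one the authors used; the points are made up, and the starting centres are chosen with a simple farthest-point rule so that the result is deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Tiny k-means: returns a cluster label (0..k-1) for each point."""
    # Farthest-point initialisation: start from the first point, then keep
    # adding the point that is farthest from all centres chosen so far.
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centres[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

# Four made-up blobs of 2-D points.
points = [
    (0, 0), (0, 1), (1, 0),
    (10, 0), (10, 1), (11, 0),
    (0, 10), (1, 10), (0, 11),
    (10, 10), (11, 10), (10, 11),
]
labels = kmeans(points, k=4)
print(labels)
```

Nothing in the function depends on the points being 2-dimensional: the same code accepts tuples of any length, which is what matters once each team becomes a vector of motif scores.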

Viewing each team as a vector in 5 dimensions, where each entry corresponds to the z-score of one of the five motifs, the authors performed cluster analysis and their result yielded 4 natural groups of teams:

NOTE: The final league standings are in parentheses for context.

Lopez and Sanchez (2015) build on the previous passing motifs approach in their article “Who can replace Xavi?”, and turn their attention to players specifically by looking at which roles each player fulfils within the team’s passing sequences. There are now 15 possible roles for a player:

He can either be the “A” in an ABAB sequence (Xavi→Iniesta→Xavi→Iniesta), or he can be the “B” (Iniesta→Xavi→Iniesta→Xavi). Similarly, he can be the “A” in an ABAC sequence (Xavi→Busquets→Xavi→Messi), or he can be the “B” in an ABAC sequence (Busquets→Xavi→Busquets→Messi), etc. In total, across the five motifs we already discussed, there are 15 roles a player can fill: 2 in ABAB, 3 each in ABAC, ABCA and ABCB, and 4 in ABCD.
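Here is a minimal Python sketch of how these roles could be tallied per player. It is my own illustration rather than the authors’ code: the names `ROLES`, `player_roles` and `feature_vector` are made up, the example possession is invented, and the vector holds raw counts, whereas the paper may normalise them differently.

```python
from collections import Counter, defaultdict

MOTIFS = ["ABAB", "ABAC", "ABCA", "ABCB", "ABCD"]
# Every (motif, letter) pair is one role, e.g. ("ABAC", "B").
ROLES = [(motif, letter) for motif in MOTIFS for letter in sorted(set(motif))]
print(len(ROLES))  # 15

def player_roles(possession):
    """For each 3-pass window, record which letter each involved player took."""
    roles = defaultdict(Counter)
    for i in range(len(possession) - 3):
        window = possession[i:i + 4]
        letters = {}
        for player in window:
            if player not in letters:
                letters[player] = "ABCD"[len(letters)]
        motif = "".join(letters[player] for player in window)
        for player, letter in letters.items():
            roles[player][(motif, letter)] += 1
    return roles

def feature_vector(roles, player):
    """Turn a player's role tally into a 15-entry vector, in the order given by ROLES."""
    return [roles[player][role] for role in ROLES]

# Invented possession: four passes, so two overlapping 3-pass windows.
possession = ["Xavi", "Busquets", "Xavi", "Messi", "Iniesta"]
roles = player_roles(possession)
print(feature_vector(roles, "Xavi"))
```

In this possession Xavi is the “A” of an ABAC window and the “B” of an ABCD window, so his vector has exactly two non-zero entries.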

This interpretation allows each player to be viewed as a vector in a 15-dimensional space, and the geometry of this space (the distances between players) allows plenty of questions to be answered. A cluster analysis can again be performed, and in this way you can see which players can fulfil similar roles within a team’s passing combinations. Think of the applications this has for recruitment!
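As a rough sketch of what “distance between players” can mean, the Python snippet below ranks players by Euclidean distance to a target player’s vector. Euclidean distance is my assumption here, the function name `closest_players` is made up, and the players and numbers are invented. The vectors are cut down to three entries to keep the example readable; the same function works unchanged on 15 entries.

```python
import math

def closest_players(target, vectors, n=20):
    """Return the n players whose vectors lie nearest to the target's vector."""
    distances = [
        (math.dist(vectors[target], vector), name)
        for name, vector in vectors.items()
        if name != target
    ]
    return [name for _, name in sorted(distances)[:n]]

# Invented, shortened feature vectors for hypothetical players.
vectors = {
    "Target": (0.90, 0.80, 0.10),
    "Player P": (0.85, 0.75, 0.15),
    "Player Q": (0.20, 0.30, 0.90),
    "Player R": (0.70, 0.90, 0.20),
}

print(closest_players("Target", vectors, n=2))
```

The ranking is only as meaningful as the vectors behind it, so in practice the scaling of each of the 15 entries matters as much as the choice of distance.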

In answering their question regarding who can replace Xavi within Barcelona’s passing combinations, Lopez and Sanchez draw up a list of the 20 players geometrically closest to Xavi’s 15-dimensional passing motif feature vector (using data from the previous 3 and 5 seasons of La Liga and the Premier League respectively):

Image taken directly from Lopez and Sanchez (2015)

I really enjoyed the two articles I presented here, though not because I believe their methodology to be the “be all and end all” of investigating playing style or player recruitment, even though they do provide some valuable insight. The real reason I like them is that they are a perfect example of how applying math to football problems is not about the brute computing force of computers and algorithms, as some might think. Rather, it requires skill, creativity and even good old-fashioned “football” sense to quantify and aggregate raw data in useful and manageable ways, and then to apply methodologies whose outcomes can be tangibly interpreted in the football context.

I think there is more to come from these recent approaches. Stay tuned.

  1.   López Peña, J. and Sánchez Navarro, R., 2015. Who can replace Xavi? A passing motif analysis of football players. arXiv preprint arXiv:1506.07768.
  2.   Gyarmati, L., Kwak, H. and Rodriguez, P., 2014. Searching for a unique style in soccer. arXiv preprint arXiv:1409.0308.


  1. Brilliant article. Could you do a graph plotting trophies vs passing motifs? A graph which would show which passing motifs win you more trophies and bring success.

    1. You could definitely have a look at that; it’s in my plans to try and link all this back to success, so keep a look out!