
Thursday 21 July 2016

Quantifying Passing Subsequences: The Mysterious Case of Leicester City Part 2

In the previous entry we left off with the clustering dendrogram for the 2015-2016 Premier League season, in which each team is represented as a 5-dimensional vector giving the proportion of its passing sequences that fall into each of the five motifs.

It came as a surprise that Leicester were singled out by the method as the team with the most distinctive passing style in the league. Then again, Leicester were eventually crowned champions, so surely there is something qualitative to be found. The problem is untangling the true causal relationship behind what is being discovered. Saying “Leicester were champions because they earned the highest number of points” is a bit moronic. Something like “Leicester were champions because they scored the third-highest number of goals and conceded the second-fewest” or “Leicester were champions because they were able to name an unchanged starting XI more often than any other team” can provide a bit more insight, but ultimately, when using data from the same season it can be difficult to establish the direction of causality; i.e., did X happen because Leicester had the potential to be champions, or did Leicester have the potential to be champions because of X?

The essential question that the whole football world wants answered, then, is whether Leicester’s championship run could have been predicted BEFORE the campaign kicked off. Surely the sports trading community would be interested.
To dig deeper, we went back to the data from the 2014-2015 season, when Leicester were almost certainly doomed to relegation but miraculously went on an incredible winning run in the final stretch that took them from 7 points adrift of safety in April to a comfortable finish 6 points above the relegation zone. No team in Premier League history had ever stayed up having fewer than 20 points after 29 fixtures (Leicester had 19).

Could anyone have predicted Leicester’s exploits back then? Should we have known?

This was the resulting dendrogram for the data from the 2014-2015 season using the same methodology from the previous entry:



There are several important things to say about the results. First of all, forgetting about Leicester for a minute, it’s very satisfying to see that many of the pairings from the 2015-2016 season are preserved. Arsenal-Manchester City, Tottenham-Chelsea and Crystal Palace-Sunderland are all examples of pairings that appear in both seasons. Other general trends also hold, such as Liverpool, Southampton and Swansea being similar, and Leicester, Arsenal and Manchester City forming the leftmost group in both cases together with either Watford or Aston Villa. This is important because, if the method were pairing teams at random, the probability of getting such similar groups in both seasons would be extremely low. This means that the method is identifying something (which I will call passing style) that is consistent within teams across consecutive seasons. The same ‘satisfying consistency’ can also be seen in the data for the 2013-14 and 2012-13 seasons, for which I also replicated the method.
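One way to put a number on “the probability of this happening if the method was randomly pairing teams” is a Mantel-style permutation test on cophenetic distances (the height at which each pair of teams first ends up in the same group). This is a technique I am swapping in for illustration, not something used to produce the dendrograms above. The sketch assumes the two seasons’ feature matrices `X1` and `X2` contain the same teams in the same row order (so promoted and relegated sides would have to be dropped) and assumes complete linkage; the function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform


def cophenetic_matrix(X, method="complete"):
    """Square matrix of the heights at which each pair of teams first joins the same group."""
    Z = linkage(X, method=method, metric="euclidean")
    return squareform(cophenet(Z))  # cophenet(Z) alone returns the condensed distances


def season_agreement(X1, X2, n_perm=999, seed=0):
    """Correlation between two seasons' trees and a permutation p-value.

    Assumes row i of X1 and row i of X2 are the same team.
    """
    C1, C2 = cophenetic_matrix(X1), cophenetic_matrix(X2)
    iu = np.triu_indices_from(C1, k=1)  # each pair of teams counted once
    observed = np.corrcoef(C1[iu], C2[iu])[0, 1]

    rng = np.random.default_rng(seed)
    n = C2.shape[0]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        shuffled = C2[np.ix_(p, p)]  # same tree shape, team labels shuffled
        if np.corrcoef(C1[iu], shuffled[iu])[0, 1] >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value would say that shuffling team names almost never reproduces the agreement seen between the two seasons, which is the intuition behind the paragraph above.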

Let’s return now to Leicester. Just as in the 2015-16 season, Leicester is the team that joins a subgroup highest up the clustering tree, meaning its passing style has the weakest bond to any other group of teams, i.e. it is the most distinctive. There is a very important caveat, so we don’t get carried away: “being distinctive” is in no way equivalent to “being successful”. In fact, the second most “distinctive” team is Burnley, who were relegated at the end of 2014-15. Both Leicester and Burnley completed relatively few motifs in total, but this does not necessarily explain their distinctiveness, since both QPR and Crystal Palace completed fewer motifs than them and still have relatively “strong” bonds with other teams. Also, a truly fascinating feature of Leicester’s results is that in both seasons their passing style forms a subgroup with Arsenal’s and Manchester City’s, arguably the “passing powerhouses” of the Premier League.

To answer the question posed earlier about whether we should have known about Leicester, I would cautiously say “No”. I very much doubt any concrete methodology would have pointed to Leicester as the eventual winners. However, keeping in mind that “being distinctive” is not synonymous with “being successful” (poor relegated Burnley), the truth is that with this data, before the start of the 2015-16 season, I could have said to pundits: ‘Hey, keep an eye on Leicester, there’s something interesting going on there (they are distinctive and close to Arsenal and Manchester City)’. Moreover, I would also predict that if Leicester keep their players over the summer, the “style” that has made them distinctive in both the 2014-15 and 2015-16 seasons will still be there and could once again lead them to success. I wouldn’t go as far as saying they’ll win it again, but I think they’ll be in contention. Then again, I could be completely wrong and Leicester’s fortunes could fall off a cliff in the upcoming season; but I know better than to think that would mean everything I’ve said here is wrong, and I hope readers do as well. (If Leicester do well again, I would also be cautious about the omnipotence of my methods; statistics and probability are about being better informed about the chaotic randomness of the world, not about fortune telling…)

I’ll keep trying to see what else this methodology has to offer. I suspect some sort of “tree/dendrogram” method could be used to quantify how much success (higher league finishes) accumulates in which areas of the tree, and what a team’s position in the dendrogram says about its final league position. Also, as I mentioned a couple of entries back when I first introduced this methodology, the really interesting application could be extending the method to assess how well prospective signings would fit within a team’s passing style. I hope to have a go at this too. Finally, additional variables could be integrated into the methodology to further distinguish passing sequences. For example, a completely vertical instance of ABCD is very different from an ABCD made up of horizontal square passes. Integrating this is also something I’m working on.

Keep an eye on the blog to see how it all unfolds.

Friday 8 July 2016

Quantifying Passing Subsequences: The Mysterious Case of Leicester City


This entry follows up on the previous entry's idea of quantifying teams' and players' passing styles through three-pass motifs (if you haven't read it, I recommend you do so before reading this one).

Now, I decided to try to replicate the results shown previously for the Spanish La Liga using last season's Premier League data. In my application, I quantify the raw passing data by counting the number of times each motif occurs for each team in each match. The table below shows the total number of times each team performed each of the five motifs throughout the whole season.


As we can see, Arsenal and Manchester City are either 1st or 2nd in every motif category. However, since teams like Arsenal and Manchester City complete the most passes in a season, it is to be expected that they also complete the most motifs in each category.
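The counting step described above is simple enough to sketch. The format of the raw passing data isn't given here, so this rests on my own assumptions: each possession is a list of the players who received the ball in order (so each consecutive pair is a completed pass), and motifs are counted over overlapping windows of three passes. With four slots and the first two always labelled A and B, there are exactly five possible patterns, which I take to be the five motifs. The function names are hypothetical.

```python
MOTIFS = ("ABAB", "ABAC", "ABCA", "ABCB", "ABCD")


def canonical_motif(window):
    """Relabel four consecutive ball-receivers as letters in order of first appearance."""
    labels = {}
    for player in window:
        if player not in labels:
            labels[player] = "ABCD"[len(labels)]
    return "".join(labels[player] for player in window)


def count_motifs(possessions):
    """Count each three-pass motif over overlapping windows in every possession."""
    counts = {m: 0 for m in MOTIFS}
    for seq in possessions:
        # A possession of k receivers holds k - 3 windows of three passes
        for i in range(len(seq) - 3):
            motif = canonical_motif(seq[i:i + 4])
            if motif in counts:  # ignores windows where the same player appears twice in a row
                counts[motif] += 1
    return counts
```

For example, the possession `[7, 4, 7, 4, 9]` (shirt numbers) contains one ABAB (7-4-7-4) and one ABAC (4-7-4-9). Whether windows should overlap like this is a design choice; non-overlapping windows would give smaller totals but, roughly, similar proportions.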

A different way of looking at this data, then, is to analyse the relative frequency of each motif as a percentage of the total number of motifs completed by a team during a match. That is to say, regardless of how many motifs a team completed in total, we want to look at what percentage of them were ABAB, what percentage were ABAC, etc.
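As a minimal sketch of this normalisation, assuming one match's motif counts are stored as a dictionary keyed by motif name (my own choice of representation; `motif_percentages` is a hypothetical name), each count is simply divided by that match's total:

```python
MOTIFS = ("ABAB", "ABAC", "ABCA", "ABCB", "ABCD")


def motif_percentages(counts):
    """Share of each motif (in %) among all motifs a team completed in one match."""
    total = sum(counts.get(m, 0) for m in MOTIFS)
    if total == 0:
        # No completed motifs: report zeros rather than dividing by zero
        return {m: 0.0 for m in MOTIFS}
    return {m: 100.0 * counts.get(m, 0) / total for m in MOTIFS}


print(motif_percentages({"ABAB": 20, "ABAC": 20, "ABCA": 8, "ABCB": 12, "ABCD": 20}))
```

With these made-up counts (80 motifs in total) the shares come out as 25%, 25%, 10%, 15% and 25%, so a side completing 80 motifs and one completing 400 in the same proportions would look identical.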

The following boxplots show the distribution of the relative frequency of each motif during each match for each team of the 2015-16 Premier League season:







Now, isn't that interesting?! Leicester now emerge as a team with a distinctive playing style. If you return for a moment to the previous entry, you can see that both Barcelona and Leicester “win” in the ABAB and ABAC categories and noticeably “lose” in the ABCD category (this similarity isn’t there in the other two categories). I'm obviously not claiming that Leicester and Barcelona have a similar style; I'm sure I would lose all credibility with football fans and might as well just close the blog. The main difference is that Barcelona don't only win the relative frequency battle but also the overall total, usually completing many more passes than their opponents. Leicester have a much more modest return in overall motifs completed (i.e. many fewer passes completed). In fact, they complete the second-lowest number of motifs overall (only WBA complete fewer, and Leicester are only marginally below Sunderland), but for the motifs they do complete, there seems to be something there, in the sense that they tend to perform a proportionally distinctive choice of passing sequences/motifs.

In fact, forget about the whole Barcelona thing for a moment. Even without ever having seen those results for La Liga, the methodology points towards Leicester as the team with the unique style in the Premier League. The following figure shows the clustering dendrogram for the data, viewing each team as a vector in R^5 whose i-th feature is the percentage of the team's motifs in a match that were motif i, averaged over all of its matches in the season:



NOTE: It’s not easy to explain briefly what a clustering dendrogram is or means, so please refer to any of the good sources on mathematics widely available (Wikipedia is pretty good), but basically it represents how teams are sequentially grouped according to their similarity (distance in R^5). For example, we can see that Leicester, Watford, Arsenal and Manchester City form a “group”, but within that group there are two subgroups: Leicester on its own and the other three; similarly, Arsenal is more tightly grouped with Manchester City than with Watford. The higher up the tree a grouping is made, the “weaker” its bond. With that in mind, Leicester is the team with the “weakest bond” to any other subgroup of teams.
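The two steps above (building the R^5 vectors and reading off the “weakest bond”) can be sketched together. This assumes the per-match percentages are already grouped by team, and it uses SciPy's `linkage` with Euclidean distance and complete linkage; the linkage method is my assumption since it isn't stated here, and a different choice (single, average, Ward) can change the tree. For each team it records the height at which that team first joins any other team or group, and the team with the largest such height is the most distinctive. All function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def team_matrix(per_match_shares):
    """One R^5 row per team: the mean of its per-match motif percentages."""
    teams = sorted(per_match_shares)
    X = np.array([np.mean(np.asarray(per_match_shares[t], dtype=float), axis=0)
                  for t in teams])
    return teams, X


def join_heights(X, method="complete"):
    """Height at which each team (row of X) first joins any other team or group."""
    Z = linkage(X, method=method, metric="euclidean")
    n = X.shape[0]
    heights = np.full(n, np.nan)
    # Each row of Z merges two clusters at a given height; ids below n are single teams
    for left, right, height, _size in Z:
        for idx in (int(left), int(right)):
            if idx < n:
                heights[idx] = height
    return heights


def most_distinctive(per_match_shares, method="complete"):
    """Team with the largest join height, i.e. the 'weakest bond' to everyone else."""
    teams, X = team_matrix(per_match_shares)
    heights = join_heights(X, method)
    return teams[int(np.argmax(heights))], dict(zip(teams, heights))
```

Note that averaging per-match percentages, as described above, is not the same as pooling a team's motifs over the whole season and taking percentages once. To draw the tree itself, `scipy.cluster.hierarchy.dendrogram(Z, labels=teams)` can be used (it needs matplotlib unless called with `no_plot=True`).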

Honestly, I don't know any more than you what this means (yet); but it's very interesting that something pointing towards Leicester came up when I wasn't even looking for it. This was a simple probing methodology which pinpointed Leicester on its own, without me asking: “Is Leicester distinctive?” or “What sets Leicester apart?”.

It would be important to check whether there is any sort of linear relationship between the frequency of each motif and the total number of motifs completed; if that were the case, Leicester’s low number of motifs could explain why it is being set apart, but I don’t think this would tell the whole story, as WBA and Sunderland would then also be singled out in some way.
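One simple way to run that check, sketched under my own assumptions: for each motif separately, take every team-match as a data point, fit a least-squares line between the total number of motifs completed and that motif's percentage, and compute Pearson's r. A slope clearly different from zero (or a large |r|) would hint that motif shares depend on passing volume. `share_vs_total` is a hypothetical name.

```python
import numpy as np


def share_vs_total(totals, shares):
    """Least-squares line and Pearson r between total motifs and one motif's share (%)."""
    x = np.asarray(totals, dtype=float)
    y = np.asarray(shares, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 fit returns [slope, intercept]
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r
```

Pearson's r only picks up linear dependence, so it is worth looking at a scatter plot of each motif as well before concluding there is no relationship.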

In the coming weeks I’ll try to get to the bottom of this…