Tuesday, June 10, 2014

Six Degrees of Rooney: How Rooney is connected to (almost) everyone at the World Cup

This is a version of the Kevin Bacon game, but one that uses players at the World Cup. The challenge is to connect Wayne Rooney to another player at the World Cup in as few steps as possible.

You can connect two players only if they've spent at least half a season together as part of the first team for a club. You may well find your own route from Rooney to another player, but it won't be shorter than the "path" given by the visualization below. Play around with the "viz" a little to get a better idea of how it works.

There are a few caveats to keep in mind though. One is that you can only use club affiliations, not national team ones. So if you want to connect Rooney to Luis Suarez of Uruguay, you can't go "Rooney plays for England with Daniel Sturridge, who plays for Liverpool with Luis Suarez". That's cheating!

Also, you can't use players who aren't at the World Cup. So if you're trying to connect Rooney to Marco Verratti of Paris Saint-Germain, you can't use his clubmate Zlatan Ibrahimovic along the way, as he isn't at the tournament. (This is a visualization I've made especially for the World Cup after all.)

Another thing to keep in mind is that I only take into account teams that players have actually played for, not the team that holds their registration or who they're contracted to or whatever the proper legalese is. So Romelu Lukaku of Belgium is listed as playing for West Brom and Everton for the last two seasons respectively, even though he may have technically been a Chelsea player then.

Also, you can't use clubs that players are transferring to after the World Cup. So Ciro Immobile of Italy may be going to play for Borussia Dortmund next season, but you can't use future affiliations, just clubs that players have actually played for.

You also can't use stints with youth teams, Under-18 teams or B-teams along the way. Only seasons together as part of the first-team squad count. Also, a lot of young players, especially in the Premier League, go on one-month loans to teams in lower divisions. I haven't included those either.

I know players have been dropping like flies in the build-up to the tournament, so I'll remove missing players and update the data after the first round of matches is done, since countries won't be able to modify their squads after that.

(Just as a point of explanation: if you look at the table above, you'll see expressions like 2011-H1 or 2003-H2. H1 means the first half of the calendar year, i.e. Jan-Jun, which is the second half of most European seasons. Similarly, H2 stands for the second half of the year, i.e. Jul-Dec.)
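To make the half-season labels and the teammate rule concrete, here's a minimal Python sketch. It isn't the code behind the viz: the record layout, the function names and the placeholder players and clubs are all assumptions made up for illustration. It labels a date as H1 or H2, then links two players whenever they appear at the same club in the same half-season.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations


def half_label(d):
    """Return a label like '2011-H1' (Jan-Jun) or '2011-H2' (Jul-Dec)."""
    return f"{d.year}-H{1 if d.month <= 6 else 2}"


def teammate_edges(stints):
    """Link players who share a club in the same half-season.

    `stints` is an iterable of (player, club, half_label) tuples,
    a hypothetical layout chosen for this example.
    """
    squads = defaultdict(set)
    for player, club, half in stints:
        squads[(club, half)].add(player)
    edges = set()
    for members in squads.values():
        # Every pair inside one squad becomes one undirected edge.
        for a, b in combinations(sorted(members), 2):
            edges.add((a, b))
    return edges


# Placeholder data, not taken from the real dataset.
stints = [
    ("Player A", "Club X", "2011-H1"),
    ("Player B", "Club X", "2011-H1"),
    ("Player C", "Club X", "2011-H2"),
    ("Player B", "Club Y", "2011-H2"),
    ("Player C", "Club Y", "2011-H2"),
]
edges = teammate_edges(stints)
```

Here Player A and Player C both played for Club X, but in different halves, so they aren't linked directly. They're only connected through Player B.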

Who Rooney isn't connected to

Note that I've used "almost" in the headline of this post. That's because 36 of the other 735 players at the World Cup can't be connected to Rooney, and chances are no one else can be connected to them either.

I haven't really looked at why those 36 players are unconnected. It could be their age, or it could be that they play for really small clubs in their own country and so have never shared a club with the stars of their national team, which is what would set them on their way to being part of Rooney's network. I don't know. The thing to keep in mind is that I've only used the 736 players (32 teams x 23-man squads) at the World Cup, so there may well be players who aren't at the tournament who could connect these 36 players to Rooney in some way.

This is the breakdown of the 36 unconnected players by country:

If you look at the graph above, the fact that there are so many African teams with unconnected players isn't a surprise, but the fact that players from the USA and Australia are in there is noteworthy. Is that an indicator of how players are opting to stay in the MLS or A-League more instead of moving to play in Europe? Again, don't know, but if you want the list of the 36 unconnected players, it's in the dataset linked to further down the page.

And for those keeping score, of all the 699 players that are connected to Rooney, the player furthest out (Azubuike Egwuekwe of Nigeria) is, indeed, six degrees of separation away from Rooney!
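Counting degrees of separation like this comes down to a breadth-first search out from Rooney. The sketch below runs one on a toy network. The graph, the single-letter player names and the function name are placeholders, not the real data. Anyone the search never reaches is "unconnected", and the furthest player is the one with the largest step count.

```python
from collections import deque


def degrees_from(graph, source):
    """Breadth-first search: steps from `source` to each reachable player."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist


# Toy network with placeholder names; "E" has no club link to anyone.
graph = {
    "Rooney": ["A", "B"],
    "A": ["Rooney", "C"],
    "B": ["Rooney"],
    "C": ["A", "D"],
    "D": ["C"],
    "E": [],
}
dist = degrees_from(graph, "Rooney")
unconnected = sorted(set(graph) - set(dist))  # players the search never reached
furthest = max(dist, key=dist.get)            # reachable player with the most steps
```

In this toy example "D" is three steps out and "E" is unconnected. On the real data the same bookkeeping would give 699 reachable players and 36 unreachable ones, which with Rooney himself adds up to 736.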

What's the other table in the viz for?
Well, there were a number of things I could have done with the data but, sticking with the overall theme of "connectedness", what I chose to do is make it easier to find out if players from any two national teams have ever played together in the past.

Sometimes in international matches you see players going into 50/50 balls a little harder against some opponents than others, and you can't really figure out why. There's a good chance that those players have either played with each other in the past for the same club or for rival clubs in the same league. This viz will help you find out if that's the case.

Apart from that, it just lets you answer that very basic football-geek question of whether players from the two teams on the pitch ever played with each other in the past.
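The lookup behind that question can be sketched in a few lines of Python. It assumes you already have a set of teammate pairs and a player-to-country mapping using FIFA codes. Both structures, the function name and the placeholder players are hypothetical.

```python
def shared_history(edges, country_of, team_a, team_b):
    """Return (team_a player, team_b player) pairs who were once clubmates."""
    pairs = set()
    for p, q in edges:
        if country_of[p] == team_a and country_of[q] == team_b:
            pairs.add((p, q))
        elif country_of[p] == team_b and country_of[q] == team_a:
            pairs.add((q, p))  # flip so the team_a player always comes first
    return sorted(pairs)


# Placeholder players and club links, not the real dataset.
edges = {("P1", "P2"), ("P2", "P3"), ("P3", "P4")}
country_of = {"P1": "SUI", "P2": "CRC", "P3": "SUI", "P4": "BIH"}

sui_crc = shared_history(edges, country_of, "SUI", "CRC")
crc_bih = shared_history(edges, country_of, "CRC", "BIH")
```

With this toy data, two Swiss players have shared a club with the Costa Rican one, while Costa Rica and Bosnia have no shared history at all.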

Note that I've used FIFA country codes here and most of them are pretty clear, but be wary of confusing codes like BIH for Bosnia, SUI for Switzerland, CIV for Ivory Coast and CRC for Costa Rica.

How this was done
First things first, here's a link to the dataset and the Pajek .net file.

I used transfermarkt.com, kicker.de, soccerway.com & footballdatabase.eu to compile the raw data. (Any mistakes in the data are my own and not those of the websites, yada, yada.) I then coded it and created a Pajek-format .net file, which I processed using igraph and R to find the shortest paths from Rooney to every other player, selecting the most recent ones for the visualization. Kind of went down a rabbit hole on this one with all the computer science-y and "social network analysis" stuff I had to wade through, but I'm glad to have come out the other side alive!
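For anyone who'd like to poke at a .net file without R, here's a rough standard-library Python stand-in for the igraph step. It's a sketch under assumptions, not the actual pipeline. It only understands the `*Vertices` and `*Edges` sections of a Pajek file, it expects quoted vertex labels, and the sample file contents and function names are invented for the example.

```python
import re
from collections import deque

# A tiny made-up Pajek file: numbered, quoted vertices, then undirected edges.
SAMPLE_NET = """\
*Vertices 4
1 "Rooney"
2 "Player A"
3 "Player B"
4 "Player C"
*Edges
1 2
2 3
1 4
"""


def read_pajek(text):
    """Parse the *Vertices and *Edges sections of a simple Pajek .net file."""
    labels, graph, section = {}, {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("*"):
            section = line.split()[0].lower()
            continue
        if section == "*vertices":
            m = re.match(r'(\d+)\s+"([^"]*)"', line)
            if m is None:
                continue  # skip lines this sketch does not understand
            labels[int(m.group(1))] = m.group(2)
            graph[m.group(2)] = set()
        elif section == "*edges":
            u, v = (labels[int(x)] for x in line.split()[:2])
            graph[u].add(v)
            graph[v].add(u)
    return graph


def shortest_path(graph, source, target):
    """Return one shortest path as a list of names, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for neighbour in sorted(graph[current]):
            if neighbour not in parent:
                parent[neighbour] = current
                queue.append(neighbour)
    return None


graph = read_pajek(SAMPLE_NET)
path = shortest_path(graph, "Rooney", "Player B")
```

Breadth-first search is enough here because every link counts as one step. Picking the most recent of several equally short paths would need the half-season labels as well, which this sketch leaves out.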

As always, suggestions, criticism, bouquets and brickbats all welcome in the comments section below!

Ultimate Poseur

2014-06-10 UPDATE : Philippa @Philby1976 has done some great work visualizing the dataset given above, you can see her work created with BIME here: http://html5.bimeapp.com/EFEx/WorldCupHistory

2014-06-11 UPDATE: Serves me right for not double-checking before I uploaded my viz. It turns out records for nine players were missing from the data I originally uploaded to Tableau. I've corrected that, and all 736 players should be listed now. Because of those missing records, my original figure for unconnected players was 30 instead of the 36 it is now. All this happened because one stage, between processing the raw data in R and formatting it for Tableau, wasn't automated and involved significant editing by hand, which increased the scope for human error. Looks like I'll have to get better at things like programming to avoid slip-ups like this! Don't worry, the "viz" is still good. In fact, it's even better now, so use it confidently!

2014-06-12 UPDATE: Had to correct an issue resulting not from missing data, but from too much data! I had entered multiple records for players like Nabil Bentaleb (ALG) and Remy Cabella (FRA) because there were multiple "shortest paths" to them from Rooney that were all relatively recent. Because I couldn't choose between them, I thought it would be better to just keep them all. But for some reason that confused Tableau, and if you entered Bentaleb's or Cabella's name, Tableau wouldn't give an answer. So I've removed the duplicate records, keeping one per player, and everything's working properly now! On another note, a player from Costa Rica has been withdrawn, so I'll update the data to reflect that change after the first round of matches is over.
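The "multiple shortest paths" situation is easy to reproduce on a toy network. The Python sketch below lists every equally short path to a player and then keeps just one, so there's a single record per player. The graph, the names and the tie-break (alphabetically first) are assumptions for the example. The real choice was based on how recent the connections were, which isn't modelled here.

```python
from collections import deque


def all_shortest_paths(graph, source, target):
    """Return every shortest path from source to target, sorted."""
    dist = {source: 0}
    parents = {source: []}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                parents[neighbour] = [current]
                queue.append(neighbour)
            elif dist[neighbour] == dist[current] + 1:
                # A second, equally short way of reaching this player.
                parents[neighbour].append(current)
    if target not in dist:
        return []

    def build(node):
        if node == source:
            return [[source]]
        return [p + [node] for parent in parents[node] for p in build(parent)]

    return sorted(build(target))


# Toy diamond: two equally short routes from Rooney to "Target".
graph = {
    "Rooney": ["A", "B"],
    "A": ["Rooney", "Target"],
    "B": ["Rooney", "Target"],
    "Target": ["A", "B"],
}
paths = all_shortest_paths(graph, "Rooney", "Target")
kept = paths[0]  # hypothetical tie-break: keep one record per player
```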