Elo, who’s the best?

Rédaction Blog

4 years ago

Learning Innovation

 

Do you know who this is and how this person succeeded?

Mark Zuckerberg

This is Mark Zuckerberg, co-founder and CEO of Facebook. The David Fincher movie The Social Network depicts his rise to fame, from being a Harvard student to becoming the CEO of the most-used social network in the world. Zuckerberg is described as someone showing a lack of empathy for people and girls particularly. Before Facebook, one of his first ideas was to code Facemash, a ranking system of Harvard girls, of whom he managed to get all pictures.

Facemash©The Social Network

In the movie, he asked about the algorithm in order to create the “ranking”. His business partner Eduardo Saverin writes a formula, on a window, in order to successfully “rank” Harvard girls. This is the formula that will be used behind Facemash.

The Social Network

©The Social Network

Do you know what it means? This is the Elo rating system, a method for calculating the relative skill levels of players in zero-sum games such as chess. It is named after its creator Arpad Elo, a Hungarian-American physics professor. We will see in this article how this system is used, how the formula works and how it could be used in what we’re doing here at Coorpacademy. 

TL;DR

ELO is a classification system which tends to order a group over time. The more time passes, the more the group is correctly ordered. The principle is to predict the result of an encounter between two players and to carry out this encounter. If the prediction does not correspond to reality, we adjust the ranks of the two parties. And we repeat the operation.

How does it work?

Let’s take the existing ranking of a chess game. All players already have a score which represents an estimate of their ability, of their level. This is the rating floor. In chess, it’s 1200. 

Formule Classement Elo

©The Social Network

On the following image, we display two people, who have never played against each other. One has a score of 1,200, the other of 1,600. By using the formula we can see in The Social Network movie, we can calculate the probability of someone winning against someone else.  

By applying the above formula, we calculated a probability of winning.

In this example, we calculated that the person with 1,200 points has a 10% chance of winning, while the other person (with 1,600 points) has a 90% chance of winning.

Next, we compare what was predicted and what really happened during the clash between the two players. For example, in the first scheme, the purple player won when there was actually 10% chance of winning. He then gains 27 points, while the other loses 27. The score of the two players adjusts.

In the end, when two people compete several times, if each person wins half of the games, they will have exactly the same score, it will even out. If one player is way stronger, the score will grow but will stagnate after a while (we will see why later in the article).

In this case, the one who had little chance of winning wins, so wins a lot of points (27).

27 points

In this second case, if that purple person loses, it complies with something that was planned (with 90% chances). The purple person will then only lose 3 points. The chances of winning influence the final outcome and the number of points won.

3 points

In the case of Facemash, Zuckerberg’s idea, in the movie, was to rank Harvard girls in terms of “hotness”. If a girl considered as “hot” battles against someone considered way less “hot”, and wins, it was already “predicted” by the formula and her score won’t change much. She won’t evolve much in the ranking. On the other hand, if a girl considered as “very pretty” competes and loses against a girl considered as “way less pretty”, the rankings will evolve a lot. This would have been considered, by the ranking technique, as an anomaly – the algorithm will then try to rebalance forces and try to “fix the anomaly.”

The formula written on the window in the movie tells us how to calculate the Expected, the chances of winning or losing.

In a chess game, this is used as such. R being the actual score, and E the Expected. 

Classement Elo aux échecs

This method shows that by competing with people on the same level as yours, or a bit better – so there’s less “probability” for you to win – you will evolve way more in the ranking than if you only compete against weaker opponents. This is showed by the number of points you earn – or lose – after each games.

There is also the K in the scheme, which is the maximum you can earn or lose in terms of points, or score. In the example above and the competition between the purple and green people, K is 30. 27 points earned or lost after a game is almost the maximum, and this reflects the chance of winning pre-calculated (10 % for the purple person, who eventually wins).  In chess, and in a lot of other various systems, K varies. A player first games will use a high K of 40 to rapidly position the person in the ranking. Then, K goes down to 20 to limit fluctuations.

This is the theory. When you apply this Elo rating technique on a large population, and you make them compete for a long time (they all start at the same level, with the same number of points), you will end up with a natural repartition of the population in a bell-shaped curve (or the curve of a normal function). People that have the same level will naturally regroup. People that have a very strong level can’t evolve infinitely, will have less potential opponents, and it becomes hard to go away from the bell in the middle.

Cloche du milieu

Everyone will be drawn by the middle bell. Even the people extremely strong or weak won’t go too far from it. It allows us to divide the population by level, like in the schema below:

Clusters de population par niveaux

In chess for example, only 13 players have been above 2800. Today, the highest Elo rating of chess History belongs to the World Champion from Norway Magnus Carlsen with 2889 points, only 2 points above the legendary Garry Kasparov.

It’s not possible to win an infinite number of points, also because, when you’re the best chess player in the world, the chance of winning are very close to 100%. Players are drawn by the middle of the curve, which creates clusters of level in the population.

By cutting the curve, you can regroup people with approximately the same level and make them compete.

A lot of games are using this technique, because it is simple and automatically creates leagues of players with similar levels. We find the Elo rating technique in a lot of video games, such as Counter Strike, Age of Empires or League of Legends, such as in the example below:

Ligues FPSPossible use cases at Coorpacademy?

We could possibly use this rating technique in our “Battle” system, when two players compete on a particular course and must answer correctly and rapidly a series of questions in order to win. We had a few learners feedback saying that they’re afraid the other players are way too strong. It’s a common reaction in multiplayers games, where the question “Will I be crushed?” can induce frustration.

At Coorpacademy, a Elo-type system could reduce this frustration and reassure learners.

Putting a matchmaking system with this method would allow learners with skills in management, or digital, to compete with similar-level learners in the same skill area. This being said, we just launched a Massive Battle feature, which allows learners to challenge a large population of learners simultaneously… We will tell you more about it very soon!

To go further.

Cantor’s Paradise

Microsoft’s TrueSkill (an improved Elo rating technique developed by Microsoft for Halo and team multiplayers game, not only in one-to-one).

GDC: Ranking Systems 


Arthur has been Software Architect at Coorpacademy for 5 years. Every month, he presents on Monday mornings the Tekacademy, gathering Coorpacademy’s tech team around digital and technical innovations. This article had been presented to the team in February 2020.

Recommended for you

Comment on this article

Vous

Voir l'étude de cas