Evaluating the Elo algorithm - Can we predict wins?

How can we tell that the Elo algorithm is doing a good job of evaluating players? We can compare to other metrics that the hockey analytics community uses, but they don't generally capture two-way play in the same way that our Elo does. 

To test our algorithm, we chose an unbiased measure of success to compare it with: winning games! A team filled with players with high Elo ratings should beat a team filled with players with low Elo ratings, right? To test whether this is true, we constructed a team-Elo from the individual player Elo ratings, weighting each player by time on ice while playing for that team. The following plots show season wins versus team-Elo values for the last four seasons.
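As a rough illustration of the time-on-ice weighting described above, here is a minimal sketch. The function name `team_elo`, the `(elo, toi_minutes)` input format and the roster numbers are all made up for this example; the exact weighting used for the plots may differ in detail.

```python
def team_elo(players):
    """Time-on-ice-weighted mean of player Elo ratings.

    `players` is a list of (elo, toi_minutes) pairs for one team.
    This input format is an assumption made for the sketch.
    """
    total_toi = sum(toi for _, toi in players)
    if total_toi <= 0:
        raise ValueError("team has no recorded ice time")
    return sum(elo * toi for elo, toi in players) / total_toi


# Invented roster, for illustration only: (player Elo, minutes played for this team)
roster = [(1.56, 1600.0), (1.52, 1200.0), (1.48, 400.0)]
print(round(team_elo(roster), 4))
```

Players who log more minutes pull the team rating towards their own, so a high-rated player who barely plays has little effect on the team-Elo.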

The plots above show a clear relationship between wins and Elo ratings. Interestingly, the average team-Elo rating for playoff teams over the last four seasons is 1.526, while the average rating for non-playoff teams is 1.506. This is a telling result! Here are some tidbits:

1) The worst team to make the playoffs was Florida in 2012, with a season team-Elo of 1.502

2) The best team to miss out on the playoffs was the 2014 Toronto Maple Leafs. As we will touch on in future posts, though, this team did not fare as well in Corsi-Elo (a metric we will introduce later: an Elo rating for shots).

3) With the exception of Chicago in 2013, Stanley Cup winners (Boston in 2011 and LA in 2012 and 2014) have not been among the elite teams in terms of Elo ratings. However, if we look at team-Elo for the month heading into the playoffs, all three of these Cup-winning teams were in elite form.

This all raises the question: how accurately can we predict the outcome of a given game? The answer is that it depends on the team-Elo rating difference between the home and away teams, as seen in the figure below. The grey areas in the figure quantify the uncertainty in the data. Notice how the uncertainty is larger when team-Elo differences are large, i.e. when the data is scarce.


Here is what we can glean from the above plot:

1) When the teams are even, the home team wins around 55% of the time.

2) Regardless of the difference in ratings, the home team never wins less than 40% of the time.

3) When the home team's rating exceeds the away team's by 0.06, the home team wins 66% of the time.

4) Home-ice advantage is worth about 0.012 team-Elo: the game becomes a coin flip when the away team has a 0.012 team-Elo advantage.
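For readers who want to build a curve like this from their own data, here is a minimal sketch. It groups games by team-Elo difference (home minus away) and reports the home win rate in each bin, with a Wilson score interval as the uncertainty band. The bin width, the choice of Wilson interval and the sample games are assumptions for illustration, not a description of how the grey areas in the figure were computed.

```python
import math
from collections import defaultdict


def binned_home_win_rate(games, bin_width=0.02, z=1.96):
    """Return one row per bin: (bin_centre, win_rate, low, high, n_games).

    `games` is a list of (rating_diff, home_won) pairs, where rating_diff
    is home team-Elo minus away team-Elo. low/high are the bounds of a
    Wilson score interval (z=1.96 gives roughly 95% coverage).
    """
    bins = defaultdict(list)
    for diff, home_won in games:
        bins[math.floor(diff / bin_width)].append(1 if home_won else 0)

    rows = []
    for index in sorted(bins):
        outcomes = bins[index]
        n = len(outcomes)
        p = sum(outcomes) / n
        denom = 1 + z * z / n
        centre = (p + z * z / (2 * n)) / denom
        half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
        low, high = max(0.0, centre - half), min(1.0, centre + half)
        rows.append(((index + 0.5) * bin_width, p, low, high, n))
    return rows


# Invented games, for illustration only
games = [(0.005, True), (0.012, False), (0.018, True),
         (0.001, True), (0.051, True), (-0.03, False)]
for row in binned_home_win_rate(games):
    print(row)
```

Bins with only a handful of games get a wide interval, which is the same effect as the grey areas widening at large rating differences.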

All up, we can correctly pick the winner of any given game around 60% of the time. That might not seem so great. However, when you consider that the theoretical maximum for hockey is around 62% (see this famous analysis), this is quite a good result for a single metric!
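One simple way to measure this kind of accuracy is sketched below. The decision rule (pick the home team whenever its rating plus the 0.012 home-ice offset quoted above is ahead) and the sample games are assumptions for illustration; the post does not spell out the exact rule behind the 60% figure.

```python
HOME_ICE_OFFSET = 0.012  # team-Elo value of home ice, taken from the list above


def pick_home(rating_diff, home_ice=HOME_ICE_OFFSET):
    """Pick the home team if its rating plus the home-ice offset is ahead.

    rating_diff is home team-Elo minus away team-Elo.
    """
    return rating_diff + home_ice > 0


def accuracy(games, home_ice=HOME_ICE_OFFSET):
    """Fraction of games in which the pick matched the actual winner."""
    correct = sum(pick_home(diff, home_ice) == home_won
                  for diff, home_won in games)
    return correct / len(games)


# Invented games, for illustration only: (rating_diff, home_won)
sample = [(0.03, True), (-0.005, True), (-0.02, False),
          (0.01, False), (-0.04, True)]
print(accuracy(sample))
```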

OK, so the Elo ratings work, but how good can they get? One parameter that we have freedom to tweak is the Elo k-factor. The k-factor controls the rate at which player Elo ratings can change. Too low a value and players will take too long to reach their true rating; too high a value and player ratings will be too noisy, giving too much weight to recent games. The analysis for this will be the subject of a future post, as I don't want to get too technical for now.
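To give a feel for what the k-factor does, here is a minimal sketch of the textbook chess-style Elo update. Note the assumptions: it uses the conventional 400-point logistic scale and ratings around 1500, not the player-Elo scale or update rule used in this post, and the sequence of results is invented.

```python
def expected_score(rating, opponent, scale=400.0):
    """Standard Elo expected score (between 0 and 1) against an opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / scale))


def update(rating, opponent, score, k):
    """Move the rating towards the result; k sets the size of the step."""
    return rating + k * (score - expected_score(rating, opponent))


def rating_after(results, k, start=1500.0, opponent=1500.0):
    """Apply a sequence of results (1 = win, 0 = loss) against a fixed opponent."""
    rating = start
    for score in results:
        rating = update(rating, opponent, score, k)
    return rating


# Invented results, for illustration only
results = [1, 0, 1, 1, 0, 1]
for k in (8, 40):
    print(k, round(rating_after(results, k), 1))
```

With the small k the rating drifts slowly away from its starting value, while with the large k each result swings it much further, which is the trade-off between slow convergence and noise described above.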