During the French Open, I was reviewing serve outcomes and stumbled into a cautionary tale about the dangers of interpreting serve results without adjustment for opponent skill. It was a stark reminder of why skill adjustment is so fundamental to good analysis in sport. So I wanted to expand on that topic here.
To motivate the discussion, we can start by asking: why is skill adjustment essential for evaluating performance in tennis?
Imagine you wanted to determine which players have had the strongest serve performance in 2023. If you simply looked at serve points won across each player's matches, you wouldn't get an apples-to-apples comparison. The nature of single-elimination tennis means that each player not only faces a unique draw at each event but also accumulates more match results the further they advance. So the sample of opponents available for any given player is a non-random mix that is strongly tied to that player's overall skill.
Take Novak Djokovic's draw at the 2023 French Open. By the end of the event, Djokovic had played 7 matches, and four of his opponents were seeded players, three of them among the top 12 seeds. You can imagine how this pattern compounds from event to event, so that the match samples of top players end up with the largest share of other top players among their opponents.
Differences in opponent mix are further complicated when we rely on data sources like the Match Charting Project (MCP), where match coverage is incomplete and driven by volunteer choice and video availability. Because matches involving top players or the later rounds of events are more likely to be recorded and posted by fans, they tend to be overrepresented in the MCP (though this bias shrinks with each more obscure match that is charted).
Let's consider a specific player example. Figure 1 shows all MCP matches for Daniil Medvedev in the 2022 and 2023 seasons at the time of writing. The plot shows the spread of the opponents' return skill ratings at the time each match was played. The center of the distribution is a rating of 1800, but Medvedev sometimes faced opponents rated below 1600 or as high as 2000.
Figure 1. Return Ratings of Daniil Medvedev’s Opponents for MCP Matches in the 2022-23 Seasons
The cluster near a rating of 2000 is especially interesting and is what we would expect for a player who has reached the final rounds of multiple events. Those matches include four against Novak Djokovic, two against Rafael Nadal and the 2023 Indian Wells final against Carlos Alcaraz. It wouldn't be fair to judge Medvedev's serve performance in these matches the same way we would judge a match against Christopher Eubanks, for example. If we treated all matches equally and ignored the strength of the opponents faced, the Medvedev example shows that we would likely systematically underestimate the performance of the better players.
So how do we adjust performance analysis for opponent skill? In statistics, adjustment usually means determining the mathematical relationship between the confounding variable and the performance outcome of interest, and then removing the confounder's contribution when summarizing performance.
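As a sketch of that recipe for serve performance, using symbols introduced here purely for illustration and assuming the log odds are linear in the rating difference (as the next paragraph suggests): for match m, let s_m be the server's serve rating and r_m the opponent's return rating at match time, and let w_m and n_m be the serve points won and played. The first line models the relationship with the confounder, the second is the fitted baseline, and the third removes that baseline from the observed serve win rate.

```latex
\begin{aligned}
\log\frac{p_m}{1 - p_m} &= \beta_0 + \beta_1 \,(s_m - r_m) \\
\hat{p}_m &= \frac{1}{1 + \exp\!\left[-\left(\hat\beta_0 + \hat\beta_1 \,(s_m - r_m)\right)\right]} \\
\text{adjusted}_m &= \frac{w_m}{n_m} - \hat{p}_m
\end{aligned}
```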
For instance, if we want to adjust for player skill when assessing a server's performance, we could start by fitting a model that predicts the server's chance of winning a point given the server's serve rating and the returner's return rating. I've done that with 1800 charted ATP matches from 2022-23 using a generalized linear model (GLM). A GLM is one of the simpler models we could choose, but that is not a bad choice for a first approach where interpretability matters. It is also reasonable to expect the log odds of winning a serve point to change linearly with the difference between the server and return ratings.
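To make the mechanics concrete, here is a minimal sketch of fitting a binomial GLM with a logit link by Newton-Raphson, assuming match-level counts of serve points won and played. It is not the model behind Figure 2: it collapses the two ratings into their difference (the simplification the paragraph above considers reasonable), and the function name `fit_serve_glm`, the divide-by-100 scaling of the rating gap and the toy data are all hypothetical choices for illustration. The toy data are generated from known coefficients so we can check that the fit recovers them; they are not MCP results.

```python
import math

def logistic(z):
    """Inverse logit: maps log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_serve_glm(matches, scale=100.0, tol=1e-10, max_iter=50):
    """Fit logit(p) = b0 + b1 * (serve_rating - return_rating) / scale.

    `matches` is a list of (serve_rating, return_rating, points_won,
    points_played) tuples, i.e. binomial counts per match (hypothetical format).
    For a logit-link binomial GLM, Newton-Raphson is the same as IRLS.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # score vector (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0  # Fisher information matrix entries
        for serve, ret, won, played in matches:
            x = (serve - ret) / scale
            p = logistic(b0 + b1 * x)
            resid = won - played * p
            w = played * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: beta += I^{-1} g, using the closed-form 2x2 inverse.
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

# Toy data generated from known coefficients (NOT real MCP data), to check the fit.
true_b0, true_b1 = 0.4, 0.3
toy = []
for diff in range(-300, 301, 50):
    p = logistic(true_b0 + true_b1 * diff / 100.0)
    toy.append((1800 + diff, 1800, round(10000 * p), 10000))

b0, b1 = fit_serve_glm(toy)
print(f"b0={b0:.3f}, b1={b1:.3f}")  # should land close to 0.4 and 0.3
```

In practice a library routine, such as statsmodels' `GLM` with a `Binomial` family, would fit the same model and also report standard errors; the hand-rolled loop only makes the fitting step visible.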
The results of the prediction model are shown in the heatmap in Figure 2. Most of the predicted probabilities fall between 50% and 60%. We can also see that the serve win probability is driven primarily by the difference between the server and return ratings.
Figure 2. Prediction surface for serve win probability based on a GLM of serve and return skill ratings.
A prediction model like the one summarized in Figure 2 can be used to get a baseline expectation for a given match, which puts the observed serve result into better context. It lets us say that a strong serve performance isn't necessarily one with a high serve percentage but one where the server exceeded expectations.
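A minimal sketch of that observed-versus-expected comparison, assuming a logit model in the rating gap like the one described above. The coefficients `B0` and `B1`, the ratings and the point counts are hypothetical numbers chosen for illustration, not values from the fitted model. The example gives the same raw serve rate (65%) against a weaker and a stronger returner; the z-score is an added assumption that expresses the gap in binomial standard errors, so that the number of points played is taken into account.

```python
import math

def expected_serve_win(serve_rating, return_rating, b0, b1, scale=100.0):
    """Baseline probability of winning a serve point under a logit model in the rating gap."""
    log_odds = b0 + b1 * (serve_rating - return_rating) / scale
    return 1.0 / (1.0 + math.exp(-log_odds))

def serve_over_expected(points_won, points_played, serve_rating, return_rating, b0, b1):
    """Compare an observed serve-point win rate with the model's baseline for that match.

    Returns (observed, expected, observed - expected, z), where z is the gap
    measured in binomial standard errors for the number of points played.
    """
    observed = points_won / points_played
    expected = expected_serve_win(serve_rating, return_rating, b0, b1)
    sd = math.sqrt(points_played * expected * (1.0 - expected))
    z = (points_won - points_played * expected) / sd
    return observed, expected, observed - expected, z

# Hypothetical coefficients and ratings, for illustration only (not fitted to MCP data).
B0, B1 = 0.4, 0.3
# The same raw serve rate (52 of 80 points, 65%) against a weaker and a stronger returner.
vs_weaker = serve_over_expected(52, 80, 1900, 1700, B0, B1)
vs_stronger = serve_over_expected(52, 80, 1900, 2050, B0, B1)

for label, (obs, exp_, gap, z) in [("vs weaker returner", vs_weaker),
                                   ("vs stronger returner", vs_stronger)]:
    print(f"{label}: observed {obs:.1%}, expected {exp_:.1%}, gap {gap:+.1%}, z {z:+.2f}")
```

With these made-up inputs, the identical 65% lands below expectation against the weaker returner and well above it against the stronger one, which is the distinction a raw serve percentage hides.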
This is highlighted in Figure 3, where we contrast Medvedev's observed serve performances with their predicted probabilities. A group of matches in the middle of Medvedev's range of raw serve percentages stands out as well above expectation because they came against opponents like Nadal, Djokovic and Jannik Sinner. On the other hand, in the 2022 final in 's-Hertogenbosch against Tim van Rijthoven, Medvedev won just 49% of his service points, well below what was expected.
Figure 3. Observed versus skill-adjusted serve win percent for Daniil Medvedev’s match sample.
We could (and should) improve our serve prediction model with other features, like surface and surface-specific player ratings. But even this simple model demonstrates the value of adjusting for player skill. It also shows how this could be an invaluable tool for coaches, helping them set reasonable expectations for what counts as a good performance given the opponent faced. I often wonder what coaches do in the absence of model-adjusted stats; whatever it is, players are probably getting a raw deal.
It is worth noting that Elo-based skill ratings already build in skill adjustment, which is one of the main reasons the method is so powerful. Using player ratings as features in a regression model is a way to enrich the rating's own prediction model and take into account other contextual factors (like, say, the court index or the opponent's height) that could be important for getting more precise predictions. We hope the increasing availability of player skill ratings in tennis means we will see more skill-adjusted stats used in the sport in the future.
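To show where the built-in adjustment comes from, here is a sketch of a generic Elo update. It is an assumption that the serve and return ratings used in this post follow exactly this form: K = 32 and the base-10, 400-point curve are conventional chess-style defaults chosen here for illustration, and the ratings are made up. The key point is that each update compares the actual result with an expectation that already depends on the opponent's rating, the same observed-minus-expected logic as the regression adjustment.

```python
def elo_expected(rating_a, rating_b):
    """Expected score of A against B on the conventional Elo curve (base 10, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a, rating_b, score_a, k=32.0):
    """A's new rating after scoring score_a (1 = win, 0 = loss) against B.

    The change is K times (actual - expected), so each result is judged
    against an expectation that already accounts for the opponent's rating.
    """
    return rating_a + k * (score_a - elo_expected(rating_a, rating_b))

# Same result (a win) for a 1800-rated player against two different opponents.
gain_vs_stronger = elo_update(1800, 2000, 1.0) - 1800
gain_vs_weaker = elo_update(1800, 1600, 1.0) - 1800
print(f"win over a 2000: {gain_vs_stronger:+.1f}; win over a 1600: {gain_vs_weaker:+.1f}")
```

The win over the stronger opponent earns far more than the win over the weaker one, even though the raw outcome is identical.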