Real Time Analytics

Statistical Modeling Of ATP Singles Matches (Part III)

Fed won 18. (Credit: Martin Richard, USA Today Sports)

Could a statistical model have predicted the 2016 ATP World Tour Finals?

In this ongoing series (Parts I and II) on statistical modeling of ATP Singles matches, I’ve outlined the mathematical underpinnings of basic predictive models for tennis and put forward two simple models to predict match outcomes. It’s now time to put these to the test. 

Using data from the fall of 2016, I will run the O’Malley model and the Klassen et al. model to see how well they predict the ATP World Tour Finals. First, here are the actual results from the World Tour Finals round robin (Wikipedia):

[Two images of the round-robin results]

O’Malley Predictions 

Remember that the O’Malley model uses a player’s percentages of service and return points won to yield a probability that the player will win a given match. This model does not account for opponents’ abilities, momentum, court surface, etc. Also note that Monfils dropped out during the tournament and was replaced by Goffin.
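As a reminder of how a point-level percentage turns into a win probability, here is a minimal sketch of the game-level building block only. It assumes every point on serve is won independently with the same probability `p`. The function name is hypothetical, and this is the standard closed-form result for a single service game, not a reproduction of the full match-level model from Part I.

```python
def game_win_probability(p: float) -> float:
    """Probability that the server wins a game, given point-win probability p.

    Assumption: points are independent and identically distributed.
    """
    q = 1.0 - p
    # Win before deuce: the server wins the game to love, to 15, or to 30.
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach deuce (three points each, 20 possible orders), then win two
    # points in a row before the returner does: p^2 / (p^2 + q^2).
    via_deuce = 20 * p**3 * q**3 * (p**2 / (p**2 + q**2))
    return before_deuce + via_deuce


if __name__ == "__main__":
    for p in (0.50, 0.60, 0.65):
        print(f"p = {p:.2f} -> P(hold serve) = {game_win_probability(p):.3f}")
```

Under these assumptions a small edge per point compounds quickly: a server who wins 60% of points holds serve roughly 74% of the time.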

Player            P(win, 3-set match)   O’Malley predicted RR   Actual RR
Andy Murray       62.6%                 3-0                     3-0
Novak Djokovic    70.7%                 3-0                     3-0
Stan Wawrinka     58.8%                 0-3                     1-2
Milos Raonic      60.8%                 2-1                     2-1
Kei Nishikori     58.9%                 1-2                     1-2
Gaël Monfils      59.6%                 1-1                     0-2
Marin Cilic       59.1%                 2-1                     1-2
Dominic Thiem     57.8%                 0-3                     1-2
David Goffin      56.1%                 0-1                     0-1

In the round robin, the simple O’Malley model correctly predicted 10 of 12 matches (83%). Its two misses were Wawrinka’s win over Cilic and Thiem’s win over Monfils.

O’Malley also correctly predicts the outcomes of the semifinal matches: Djokovic defeats Nishikori, and Murray defeats Raonic. The model is now 12 of 14. Unfortunately, in the final, the model predicts that Djokovic beats Murray. Not unreasonable, but also incorrect. Had the model been able to account for momentum within the tournament or over the preceding weeks, it might have seen what many tennis fans saw qualitatively: Murray was on a tear and had momentum on his side.

Regardless, the model finished 12 of 15, correctly predicting 80% of these matches.
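The tally can be checked with a short script. This is a sketch under two assumptions: the pick in each match is simply the player with the higher probability in the O’Malley table, and the list of who beat whom is taken from the tournament’s published results rather than from the text of this article. The names `prob`, `results`, and `tally` are illustrative.

```python
# Match-win probabilities from the O'Malley table (3-set match).
prob = {
    "Murray": 0.626, "Djokovic": 0.707, "Wawrinka": 0.588,
    "Raonic": 0.608, "Nishikori": 0.589, "Monfils": 0.596,
    "Cilic": 0.591, "Thiem": 0.578, "Goffin": 0.561,
}

# (winner, loser, stage), from the tournament's published results.
results = [
    ("Murray", "Cilic", "round robin"),
    ("Nishikori", "Wawrinka", "round robin"),
    ("Murray", "Nishikori", "round robin"),
    ("Wawrinka", "Cilic", "round robin"),
    ("Murray", "Wawrinka", "round robin"),
    ("Cilic", "Nishikori", "round robin"),
    ("Djokovic", "Thiem", "round robin"),
    ("Raonic", "Monfils", "round robin"),
    ("Djokovic", "Raonic", "round robin"),
    ("Thiem", "Monfils", "round robin"),
    ("Djokovic", "Goffin", "round robin"),
    ("Raonic", "Thiem", "round robin"),
    ("Djokovic", "Nishikori", "semifinal"),
    ("Murray", "Raonic", "semifinal"),
    ("Murray", "Djokovic", "final"),
]


def tally(results, prob):
    """Count correct picks per stage when the higher-probability player is picked."""
    correct, played = {}, {}
    for winner, loser, stage in results:
        played[stage] = played.get(stage, 0) + 1
        # The pick is right when the actual winner had the higher probability.
        correct[stage] = correct.get(stage, 0) + (prob[winner] > prob[loser])
    return correct, played


correct, played = tally(results, prob)
for stage in played:
    print(f"{stage}: {correct[stage]}/{played[stage]}")
total_correct, total_played = sum(correct.values()), sum(played.values())
print(f"overall: {total_correct}/{total_played} = {total_correct / total_played:.1%}")
```

Picking the higher-probability player reproduces the figures above: 10 of 12 in the round robin, both semifinals, a miss in the final, and 12 of 15 overall.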

Klassen et al. Predictions

The Klassen model is more advanced, as it takes into account the surface of the court and the serving/returning abilities of both players. Let’s see if this additional detail is reflected in the accuracy of the predictions.

Player            Klassen predicted RR   Actual RR
Andy Murray       3-0                    3-0
Novak Djokovic    3-0                    3-0
Stan Wawrinka     1-2                    1-2
Milos Raonic      2-1                    2-1
Kei Nishikori     1-2                    1-2
Gaël Monfils      1-1                    0-2
Marin Cilic       1-2                    1-2
Dominic Thiem     0-3                    1-2
David Goffin      0-1                    0-1

Indeed, the Klassen model is slightly better in the round robin. It correctly predicted 11 of 12 matches (it missed on Thiem beating Monfils). Its extra correct prediction comes from Wawrinka vs. Cilic, where it gave a very slight edge to Wawrinka. 

In the final rounds, this model performs the same as O’Malley. By the end of the tournament, the Klassen model had correctly predicted 13 of 15 matches, or 86.7%.

To understand why these two models differ slightly, consult my previous articles, which discuss the fundamentals and mathematical background. Note that this tournament is a very small sample (15 matches) and featured few upsets. The one big upset of the tournament, Murray’s victory in the final, was missed by both models. Will these models hold up over a larger sample? Find out in the next installment.
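To put a number on how little 15 matches can show, one option is to attach a confidence interval to each model’s hit rate. The sketch below uses the Wilson score interval at a 95% level. That choice, and the function name `wilson_interval`, are assumptions of this example rather than part of either model.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    p_hat = correct / total
    denom = 1 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)
    ) / denom
    return center - half_width, center + half_width


# Correct predictions out of 15 matches, as reported in this article.
for name, correct_picks in [("O'Malley", 12), ("Klassen et al.", 13)]:
    low, high = wilson_interval(correct_picks, 15)
    print(f"{name}: {correct_picks}/15, interval {low:.1%} to {high:.1%}")
```

With only 15 matches, each interval is more than 30 percentage points wide and the two overlap heavily, so a one-match gap between the models is not evidence that one is better.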

Edited by Joe Sparacio, Emily Greitzer.
