Football Matches Outcomes Prediction based on Gradient Boosting Algorithms and Football Rating System

Open Access
Conference Proceedings
Authors: Muhammad Nazim Razali, Aida Mustapha, Salama A Mostafa Alabdullah, Saraswathy Shamini Gunasekaran

Abstract: Predicting association football results is an active research topic because the sport attracts global interest. Predictions may target the match outcome (win, draw, or loss) or the number of goals scored by the home and away teams. This paper proposes match outcome prediction models based on rating systems and gradient boosting algorithms. The models use pi-rating and Elo rating as features generated from limited raw data, and these features are used to evaluate four gradient boosting algorithms: Gradient Boosting Machine (GBM), XGBoost (XGB), Light Gradient Boosting Machine (LGBM), and CatBoost (CB). The football dataset has 216,743 instances for learning and 206 instances for testing. It consists of 18 football league seasons from 2001/2002 to 2017/2018 across 35 countries. The proposed models output probabilities for win, draw, and loss. The results are compared across models with different rating systems and boosting algorithms, as well as against past literature that uses a similar dataset. Accuracy and Ranked Probability Score (RPS) are the benchmark criteria. The pi-rating with CB achieves the lowest RPS, 0.1925, and the highest accuracy, 55.82%.
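
Because RPS is one of the two benchmark criteria, a minimal sketch of how it can be computed for a single match may help. It assumes the standard definition for ordered outcomes (home win, draw, away win), in which cumulative predicted probabilities are compared with the cumulative observed outcome; the abstract does not give the formula, and the function name `ranked_probability_score` is hypothetical rather than taken from the paper.

```python
from itertools import accumulate


def ranked_probability_score(probs, outcome_index):
    """RPS for one match (lower is better).

    probs: predicted probabilities in the fixed order
           (home win, draw, away win); they must sum to 1.
    outcome_index: position of the observed outcome in that order.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if not 0 <= outcome_index < len(probs):
        raise ValueError("outcome_index out of range")

    # One-hot vector for the observed outcome.
    observed = [1.0 if i == outcome_index else 0.0 for i in range(len(probs))]

    # Cumulative distributions over the ordered outcomes.
    cum_pred = list(accumulate(probs))
    cum_obs = list(accumulate(observed))

    # The last cumulative term is always 1 - 1 = 0, so it is skipped.
    squared_gaps = sum(
        (p - o) ** 2 for p, o in zip(cum_pred[:-1], cum_obs[:-1])
    )
    return squared_gaps / (len(probs) - 1)
```

For a forecast of (0.5, 0.3, 0.2), the score is about 0.145 if the home team wins and about 0.445 if the away team wins, so a forecast is penalised more the further its probability mass lies from the observed outcome. A reported figure such as 0.1925 would then be the mean of this per-match score over the test matches, assuming the paper averages in the usual way.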

Keywords: Football prediction, rating system, gradient boosting machine, xgboost, catboost, light gradient boosting machine

DOI: 10.54941/ahfe1002524
