Analyzing Chilean Primera Division Football using XGBoost Algorithm

2023-05-26 09:08:10

1. Introduction

The Chilean Primera Division is one of the most popular football leagues in South America. It is a professional league consisting of 18 teams. The league has a rich tradition, and its unique rules, intense rivalries, and passionate fans attract a growing number of local and international viewers. In this article, we analyze the performance of the teams in the Chilean Primera Division using the XGBoost algorithm.

2. XGBoost Algorithm

XGBoost (eXtreme Gradient Boosting) is a machine learning algorithm that can be used for both regression and classification problems. It is an implementation of gradient boosting over decision trees and is known for its speed and accuracy. XGBoost has been widely used for tasks such as financial forecasting and image classification, and it is now also applied to sports data analysis.
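To make the idea of gradient boosting over decision trees concrete, here is a minimal toy sketch in plain Python. It is not XGBoost itself: it assumes a single numeric feature, squared-error loss, and one-split trees (stumps), and it leaves out the regularization and second-order terms that XGBoost adds. The function names and the toy data are made up for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split tree: pick the threshold that minimizes squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


# Toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]

model = boost(xs, ys)
mean_y = sum(ys) / len(ys)
baseline_mse = sum((y - mean_y) ** 2 for y in ys) / len(ys)
boosted_mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Each round adds a small correction fitted to what the previous rounds got wrong, which is why the boosted error ends up below the error of predicting the mean.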

3. Data Collection

We collected data on the Chilean Primera Division for three seasons (2018, 2019, and 2020). The dataset contains team names, fixtures, match results, and goals scored and conceded, among other fields.

4. Feature Engineering

We engineered additional columns to help train the model. The features we used were Home Team, Away Team, Home Goals, Away Goals, and Goal Difference. We then one-hot encoded the categorical features, Home Team and Away Team, using the pandas get_dummies() function.
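The encoding step might look like the following sketch. The column names, the placeholder team names, and the definition of goal difference as home goals minus away goals are assumptions for illustration; the article does not specify them.

```python
import pandas as pd

# Toy fixtures with placeholder team names (not real results).
matches = pd.DataFrame({
    "home_team": ["Team A", "Team B", "Team C"],
    "away_team": ["Team B", "Team C", "Team A"],
    "home_goals": [2, 1, 0],
    "away_goals": [0, 1, 1],
})

# Assumption: goal difference is taken from the home side's point of view.
matches["goal_difference"] = matches["home_goals"] - matches["away_goals"]

# One-hot encode the two categorical columns; numeric columns pass through unchanged.
encoded = pd.get_dummies(matches, columns=["home_team", "away_team"], dtype=int)
```

Each team becomes one 0/1 column per role, for example home_team_Team A and away_team_Team A, so the model can learn separate home and away effects.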

5. Model Training

We split the data into training and test sets at a 70:30 ratio. We trained the XGBoost model on the training set and measured its accuracy on the test set. We tuned the model's hyperparameters to optimize performance and used cross-validation to assess robustness.

6. Model Evaluation

We evaluated the model using classification metrics: accuracy, precision, and recall. We also used a confusion matrix to visualize the model's performance. The model achieved an accuracy of 63.23%, a precision of 62.24%, and a recall of 57.20%.
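These metrics can be computed with scikit-learn as in the sketch below. The labels are hand-made toy values, not the study's predictions, and macro averaging across the three assumed classes (H = home win, D = draw, A = away win) is an assumption, since the article does not say how precision and recall were averaged.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

labels = ["H", "D", "A"]  # assumed classes: home win, draw, away win

# Toy ground truth and predictions, invented for illustration.
y_true = ["H", "H", "H", "H", "D", "D", "A", "A", "A", "A"]
y_pred = ["H", "H", "H", "D", "D", "A", "A", "A", "H", "A"]

# Rows are true classes, columns are predicted classes, both in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)

accuracy = accuracy_score(y_true, y_pred)  # 7 correct out of 10
precision = precision_score(y_true, y_pred, labels=labels, average="macro")
recall = recall_score(y_true, y_pred, labels=labels, average="macro")
```

The diagonal of the confusion matrix counts correct predictions per class, while off-diagonal cells show which outcomes the model confuses with one another.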

7. Analysis

Our analysis showed that Universidad Catolica was the best-performing team over the three seasons, followed by Universidad de Chile and Colo-Colo. Huachipato, Everton de Vina del Mar, and Antofagasta were the worst-performing teams in the league. We also found that home teams tend to score more goals than away teams, and that most teams score fewer goals in the second half of a match.
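Comparisons of this kind can be produced with a few pandas aggregations. The sketch below is one possible approach, not the method used in the study: it assumes team performance is ranked by league points (3 for a win, 1 for a draw), and it uses placeholder teams and invented scores.

```python
import pandas as pd

# Toy fixtures with placeholder team names (not real results).
matches = pd.DataFrame({
    "home_team": ["Team A", "Team B", "Team C", "Team A"],
    "away_team": ["Team B", "Team C", "Team A", "Team C"],
    "home_goals": [2, 1, 0, 3],
    "away_goals": [0, 1, 1, 1],
})

# Average goals per match for home and away sides.
avg_home_goals = matches["home_goals"].mean()
avg_away_goals = matches["away_goals"].mean()

# Assumed ranking rule: 3 points for a win, 1 for a draw, 0 for a loss.
draw = matches["home_goals"].eq(matches["away_goals"])
home_points = matches["home_goals"].gt(matches["away_goals"]) * 3 + draw * 1
away_points = matches["away_goals"].gt(matches["home_goals"]) * 3 + draw * 1

# Combine home and away points into one table, best team first.
points = (
    pd.concat([
        pd.Series(home_points.values, index=matches["home_team"]),
        pd.Series(away_points.values, index=matches["away_team"]),
    ])
    .groupby(level=0)
    .sum()
    .sort_values(ascending=False)
)
```

The same pattern extends to half-by-half comparisons if the dataset records first-half and second-half goals in separate columns.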

8. Conclusion

In conclusion, our study showed that the XGBoost algorithm can be useful for analyzing sports data such as football. Our model predicted the performance of teams in the Chilean Primera Division with an accuracy of 63.23% on the test set. We hope these findings will help stakeholders in the football industry make better decisions to improve their teams' performance.
