A Machine Learning Estimation of the Occupancy of Padel Facilities in Sweden : An application of Random Forest algorithm on a padel booking dataset

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Padel is one of the fastest growing sports in Sweden. Its popularity rose significantly during the Covid-19 pandemic in 2020, as many other types of sport facilities closed and people had more flexible work schedules due to remote work. This paper analyses the monthly occupancy of indoor padel facilities in Sweden between January 2018 and April 2022. It aims to answer to what degree a machine learning algorithm can predict the occupancy of a given padel facility and which key features have the largest impact on the occupancy. These findings make it possible to estimate the revenue of a given padel facility and can therefore be used to identify which types of padel facilities have the best chance of succeeding from an economic perspective. This article reviews the literature on different machine learning methods, in this case applied to booking systems and occupancy estimation. The reviewed literature also presents the most common evaluation metrics used for comparing different machine learning models. This study analyses the relationship between the occupancy level of a given padel facility and 12 input features related to that facility, using a random forest regression model. The resulting model achieved an R² score of 49% and a mean absolute error of 11%. The input features, ranked by their impact on the model’s estimation (with the mean of all absolute SHAP values in parentheses), are: Year (7.71), Month (5.23), Average Income in municipality (4.13), Driving Time from municipality Centre (2.35), Population of municipality (1.97), Padel Slots in municipality (1.27), Padel Slots in facility (1.27), Average Court Price (1.12), Tennis Slots in municipality (0.73), Badminton Slots in municipality (0.55), Squash Slots in municipality (0.44) and Golf Slots in municipality (0.26). Padel facilities had the highest average occupancy in 2020.
The Covid-19 pandemic is likely a significant contributor to this, due to the shutdown of offices and many types of training venues. Therefore, Year has the largest impact on the model’s estimation. Occupancy of indoor facilities follows a seasonal trend: it tends to be highest in December and January and lowest in June and July. This trend can partly be explained by a larger demand for indoor sport activities during winter and by increased competition from outdoor padel facilities and other activities during summer. Because of this, Month has the second largest impact on the model’s estimation.
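
The three quantities reported above (the R² score, the mean absolute error, and the mean of absolute SHAP values per feature) can be illustrated with a minimal sketch. It assumes the conventional definitions of these metrics and is not the thesis's own implementation. All numbers below are invented toy values, and the function names are hypothetical.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and estimated values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by the mean absolute SHAP value over all samples.

    shap_rows holds one list per sample, with one SHAP value per feature.
    """
    scores = {
        name: mean(abs(row[j]) for row in shap_rows)
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (assumption): monthly occupancy in percent, observed vs. estimated.
y_true = [40.0, 55.0, 70.0, 35.0]
y_pred = [45.0, 50.0, 60.0, 45.0]

# Toy data (assumption): SHAP values for three of the twelve features.
feature_names = ["Year", "Month", "Average Income in municipality"]
shap_rows = [
    [8.0, -5.0, 4.0],
    [-6.0, 6.0, -3.0],
    [9.0, -4.0, 5.0],
    [-7.0, 5.0, -4.0],
]

ranking = rank_by_mean_abs_shap(shap_rows, feature_names)
```

Under these definitions, an R² of 49% would mean that the model accounts for roughly half of the variance in occupancy, and a mean absolute error of 11% would mean that estimates deviate from observed occupancy by 11 percentage points on average, assuming occupancy is expressed in percent.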
