Forecasting Football Corner Odds: Statistical Modelling, Betting Strategies and Assessing Market Efficiency

University essay from Lunds universitet/Matematisk statistik

Abstract: Statistical modelling can form part of a betting strategy in which the value of a bet is assessed by comparing model predictions with market odds. This thesis presents several models based on statistical learning methods for predicting the total number of corners in a football match. Generalised linear regression and decision tree models were developed, and their profitability was examined using historical odds data. The models were trained and tested on recent seasons of the English Premier League. To further test their predictive strength, the models were also tested on the German Bundesliga. Since the number of corners in a football match is count data that exhibits overdispersion, negative binomial regression was used to model the number of corners numerically. This approach was accompanied by logistic regression as well as numerical-based and classification-based random forest models. For the classification models, the number of corners can be treated as a class variable, with the classes defined as above or below a certain number of corners, often referred to as the betting line in the over-under odds market. The explanatory variables used to develop the models were match-by-match statistics from the Premier League, processed into averages over windows of different lengths and supplemented by variables representing team capabilities and by self-created variables representing current form and motivation. Backward stepwise selection and the elastic net were used to select the variables to include in the generalised linear regression models. The combinations of model approaches and methods resulted in fifteen possible models, which were assessed using statistical evaluation measures. Level stakes and the Kelly criterion were applied as betting strategies to the best-performing model for each method. Furthermore, the over-under betting market for corners was examined in order to identify potential asymmetries in the offered odds, which would imply an inefficient market.
The results indicated that the best-performing models from each method were all profitable when tested on new data from the Premier League, despite having a low degree of explanatory power. In contrast, explanatory power and profitability decreased significantly when the Premier League-based models were tested on the Bundesliga without retraining, and the majority of the models became unprofitable. The analysis of the over-under market suggested that the under odds offered for Premier League matches were generally undervalued, while this undervaluation was not statistically significant for the Bundesliga.
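
The value assessment described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the mean-dispersion parameterisation of the negative binomial (variance = mu + alpha * mu^2), and all numbers in the usage lines (predicted mean, dispersion, betting line, decimal odds) are assumptions chosen purely for illustration. The sketch converts a predicted corner count distribution into the probability of going over a betting line, then computes the Kelly stake fraction for given decimal odds.

```python
import math


def nb_pmf(k, mu, alpha):
    """Negative binomial pmf with mean mu and dispersion alpha.

    Assumed parameterisation: variance = mu + alpha * mu**2.
    """
    r = 1.0 / alpha          # "size" parameter
    p = r / (r + mu)         # success probability
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def prob_over(line, mu, alpha):
    """P(total corners > line), e.g. line = 10.5 means 11 or more corners."""
    k_max = math.floor(line)
    return 1.0 - sum(nb_pmf(k, mu, alpha) for k in range(k_max + 1))


def kelly_fraction(prob, decimal_odds):
    """Kelly fraction of bankroll for a bet won with probability prob.

    Returns 0.0 when the bet has no positive expected value.
    """
    b = decimal_odds - 1.0   # net winnings per unit staked
    f = (prob * decimal_odds - 1.0) / b
    return max(f, 0.0)


# Illustrative values only (not taken from the thesis).
p_over = prob_over(line=10.5, mu=10.8, alpha=0.05)
stake = kelly_fraction(p_over, decimal_odds=2.05)
print(f"P(over 10.5) = {p_over:.3f}, Kelly fraction = {stake:.3f}")
```

Under level stakes, the same comparison would trigger a fixed unit bet whenever `prob * decimal_odds` exceeds 1, whereas the Kelly fraction scales the stake with the size of the perceived edge.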
