Predicting hotel cancellations using machine learning
Abstract: Room cancellations are a major challenge for the hotel industry, since the number of guests affects the whole operational setup. The purpose of the thesis is to predict hotel cancellations using machine learning and to analyse which factors have the most influence. Broadly speaking, machine learning can be summarized as an interdisciplinary science of using computers to solve a given problem by finding patterns in, and learning from, existing data. Machine learning draws on theory from, among other fields, probability, statistics, optimization, algorithms and computer science. Predicting cancellations is a binary classification problem, as the two possible outcomes are cancellation and non-cancellation. Classification in statistics is the process of determining which class a given input belongs to, in other words predicting a qualitative outcome variable. Data was provided by a hotel in the Gothenburg area, and the machine learning algorithms used in the thesis were Random Forest, XGBoost and Logit. Random Forest and XGBoost are tree-based models, which create decision trees in order to make predictions; in a classification problem these are referred to as classification trees. The aim of a classification tree is to determine a qualitative outcome variable by making step-wise binary splits, where the different outcomes are denoted as classes. The logit model, or logistic regression, is a form of binary regression and is used as a reference model in this thesis. Our main findings indicate that Random Forest is the best-performing model on the hotel data, with an accuracy close to 80%. Lead time, a numeric variable that represents the number of days between the date the reservation was made and the day of arrival, was the most influential variable in the Random Forest model. Adding weather data marginally improved the accuracy of predicting hotel cancellations for all models.
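To make the idea of a step-wise binary split concrete, the following is a minimal sketch of how a classification tree might choose a single split on lead time. It assumes Gini impurity as the split criterion, which is a common default but is not stated in the abstract; the toy data and the names `gini` and `best_split` are made up for illustration and do not come from the thesis.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 = not cancelled, 1 = cancelled)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Find the threshold t such that splitting on `value <= t` gives the
    lowest size-weighted Gini impurity. Returns (threshold, impurity);
    threshold is None if no split improves on the unsplit data."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold separates identical values
        threshold = (lo + hi) / 2  # midpoint between neighbouring values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best

# Hypothetical toy data: lead time in days and whether the booking was cancelled.
lead_time = [2, 5, 9, 14, 30, 45, 60, 90]
cancelled = [0, 0, 0, 0, 1, 0, 1, 1]

threshold, impurity = best_split(lead_time, cancelled)
print(f"split at lead time <= {threshold} days, weighted Gini = {impurity:.4f}")
```

A full tree would apply the same search recursively to each side of the split and across all variables; Random Forest and XGBoost then combine many such trees, in different ways, into one prediction.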