Predicting Airbnb Prices in European Cities Using Machine Learning

University essay from Blekinge Tekniska Högskola (Blekinge Institute of Technology), Faculty of Computing (Fakulteten för datavetenskaper)

Abstract:

Background: Machine learning is a field of computer science that focuses on building models that learn patterns and relationships in data and use them to make predictions. In this thesis, we use machine learning to predict Airbnb prices in several European cities to help hosts set reasonable prices for their properties. Several supervised machine learning algorithms are compared to determine which model gives the highest accuracy, so that hosts can set profitable prices for their properties.

Objectives: The main goal of this thesis is to use machine learning algorithms to help hosts set reasonable rental prices for their properties, keeping them affordable for renters across Europe while achieving maximum occupancy.

Methods: The Airbnb dataset for European cities was obtained from Kaggle and pre-processed using one-hot encoding, label encoding, standardization (StandardScaler) and principal component analysis (PCA). The dataset was divided into three parts for training, validation and testing. Feature selection was then performed to identify the features that contribute most to the price, which also reduced the dimensionality of the dataset. Supervised machine learning algorithms were used for training, and the models were evaluated after tuning their hyperparameters with k-fold cross-validation, which gives reliable performance estimates.

Results: The feature importance scores (feature_importances_) indicate that room capacity, room type (shared or not) and the country appear among the top five attributes for all three algorithms, although their scores vary between algorithms. Day, cleanliness rating and attraction index (attr_index) are other attributes that appear among the top five. Among the chosen learning algorithms, the random forest regressor gave the best regression model, with an R² score of 0.70. The second best was the gradient boosting regressor, with an R² score of 0.32, while SVM gave the lowest score, 0.06.

Conclusions: The random forest regressor was the best of the chosen algorithms for predicting Airbnb prices. Compared with the other chosen models, it gives more accurate prices and can help hosts set reasonable rental prices for renters across Europe. Contrary to our expectations, SVM performed worst on this dataset.

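To interpret the reported R² scores, recall the definition R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true prices from their mean. A model that always predicts the mean price scores exactly 0, so a score of 0.06 means the model does only slightly better than that baseline, and on held-out data the score can even be negative. The small sketch below computes R² by hand in plain Python; the function name r_squared and the prices are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    # Squared errors of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared errors of a baseline that always predicts the mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


prices = [100.0, 150.0, 200.0, 250.0]  # made-up nightly prices
close_fit = [110.0, 140.0, 210.0, 240.0]
mean_only = [175.0] * 4
reversed_order = [250.0, 200.0, 150.0, 100.0]

print(round(r_squared(prices, close_fit), 3))  # 0.968
print(r_squared(prices, mean_only))            # 0.0
print(r_squared(prices, reversed_order))       # -3.0
```

The close fit misses every price by 10, so SS_res = 400 against SS_tot = 12500; the reversed predictions are four times worse than the mean baseline, giving a negative score.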