Time Series Analysis and Binary Classification in a Car-Sharing Service: Application of data-driven methods for analysing trends, seasonality, residuals and prediction of user demand

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Researchers have estimated a 20-percentage-point increase in the share of the world's population residing in urban areas between 2011 and 2050. Denser cities bring both opportunities and challenges, and two of the challenges concern sustainability and mobility. With advances in technology, smart mobility and car-sharing have emerged as part of the solution. Research estimates that car-sharing reduces toxic emissions and car ownership, thus decreasing the need for private cars to some extent. Despite being a possible solution to future mobility challenges in urban areas, car-sharing providers suffer from profitability issues. To keep assisting society in the transition to sustainable mobility alternatives, providers need to become profitable. Two central challenges on the way to profitability are user segmentation and demand forecasting. This study focuses on the latter, and its aim is to understand the demand for different car types and the individual demand of car-sharing users. The research was quantitative: time series analysis and binary classification were selected to answer the research questions. It was concluded that the time series of bookings per car type per week contain trend, seasonal and residual patterns, although the patterns were not extensive. Subsequently, a random forest was trained on a data set built with moving-average feature engineering and consisting of the weekly bookings of users with at least 33 journeys during an observation period of 66 weeks (N = 1335705). To approximate individual demand, the final model predicted which users are likely to use the service in the upcoming week. The random forest achieved an accuracy of .89 (both classes), a precision of .91 (positive class), a recall of .73 (positive class) and an F1-score of .82 (positive class). We therefore concluded that a machine learning model can predict weekly individual demand fairly well. Future research involves further feature engineering and mapping the predictions to business actions.
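
As an illustration only, the sketch below shows one way a moving-average feature and a binary "books next week" label could be derived from a single user's weekly booking counts, together with the positive-class metrics named in the abstract. It is not the essay's implementation: the function names, the window length and the input list are hypothetical, and no random forest is trained here.

```python
from typing import List, Tuple


def trailing_mean(values: List[int], end: int, window: int) -> float:
    """Mean of up to `window` values ending at index `end` (inclusive)."""
    start = max(0, end - window + 1)
    chunk = values[start:end + 1]
    return sum(chunk) / len(chunk)


def build_examples(weekly_bookings: List[int], window: int = 4) -> List[Tuple[float, int]]:
    """Pair the moving average up to week t with a label for week t + 1.

    The label is 1 if the user made at least one booking in week t + 1.
    The window length is an assumption, not a value from the essay.
    """
    examples = []
    for t in range(window - 1, len(weekly_bookings) - 1):
        feature = trailing_mean(weekly_bookings, t, window)
        label = 1 if weekly_bookings[t + 1] > 0 else 0
        examples.append((feature, label))
    return examples


def precision_recall_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical booking counts for one user over eight weeks.
    bookings = [1, 0, 2, 0, 0, 0, 1, 1]
    for feature, label in build_examples(bookings, window=2):
        print(f"moving average={feature:.2f} -> books next week={label}")
```

In a setup like the one the abstract describes, pairs of this kind, computed per user and per week, would form the training rows for a classifier, and the metric function would be applied to its predictions on held-out weeks.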
