A comparison of supervised and semi-supervised learning for classification of truck stop locations

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Mengdi Xue; [2021]


Abstract: GPS-based data has been an important source for researchers and commercial fleet companies studying and building transportation models. Because GPS collects information passively, with little human participation, it makes it possible to gather large amounts of data. Recent research shows that GPS data can be used with machine learning algorithms to solve various transportation-related problems. One of these problems is understanding the purpose of a stop location, which can be used to improve daily transport. Because collecting labeled data is expensive, this thesis investigates whether applying semi-supervised learning methods to the identification and classification of stop locations can improve the results. Data on individual stops, containing location, duration, and vehicle information, is clustered to extract the training features and create the cluster data. The cluster data is labeled using manually labeled polygon bounding boxes on the map, giving four classes: loading, unloading, workshop, and other. Deep neural networks with virtual adversarial training (VAT) as the regularization method are applied to the cluster data to train the supervised and semi-supervised learning models. Supervised learning uses only the labeled data, while semi-supervised learning uses all labeled and unlabeled data. The resulting accuracies for supervised and semi-supervised learning are 90.16% and 89.21%, respectively, with the unbalanced training set, and 88.36% and 87.31% with the balanced training set. The p-value, calculated from the distribution of accuracy over multiple runs, is 0.18 (18%), meaning that the difference is not statistically significant.
In conclusion, for the real-world application studied in this thesis, we found no statistically significant difference between the supervised and semi-supervised approaches, and labeled data remains vital.
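
The abstract does not say which statistical test produced the p-value. As a rough illustration only, the sketch below shows one way a p-value could be obtained from per-run accuracies: a two-sided permutation test on the difference of mean accuracy. The function name `permutation_p_value`, its parameters, and the accuracy lists are all hypothetical and are not taken from the thesis.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference of mean accuracy.

    Assumption: this is an illustrative stand-in, not the test used in
    the thesis, which the abstract does not name.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Randomly reassign runs to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical per-run accuracies, made up for this example
# (NOT the results reported in the thesis).
supervised = [0.90, 0.91, 0.89, 0.90, 0.92]
semi_supervised = [0.89, 0.90, 0.88, 0.91, 0.89]

p = permutation_p_value(supervised, semi_supervised)
print(f"p = {p:.3f}")
```

A p-value above a chosen threshold (0.05 is a common convention) would be read, as in the abstract, as no statistically significant difference between the two sets of runs.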
