Understanding Traffic Cruising Causation: Via Parking Data Enhancement

University essay from Blekinge Tekniska Högskola/Institutionen för datavetenskap

Abstract: Background. Some computer scientists have recently argued that the computer science community could gain more performance by focusing on data preparation rather than exclusively comparing modeling techniques. To test how useful this shift in focus is, this paper selects a particular data extraction technique and examines its effect on model performance. Objectives. Five recent (2016-2020) studies on modeling parking congestion used a rationalized rather than a measured approach to feature extraction; their main focus was selecting the modeling technique with the best performance. This study instead picks a feature common to all of them and attempts to improve it, then compares the improved feature's performance with that of the feature in the state it had in the related studies. Rather than using several modeling techniques, weights are applied to the selected features and varied. The study is restricted to time series parking data, as the opportunity appeared in that sector. The reusability of the data is also gauged. Methods. An experimental case study is designed in three parts. The first tests the importance of weighted sum configurations relative to drivers' expectations. The second analyzes how much of the real data can be recycled, and whether spatial or temporal comparisons are better for synthesizing parking data. The third compares the performance of the best configuration against the default configuration using the k-means clustering algorithm and dynamic time warping distance. Results. The experiments show performance improvements at all levels, with improvement increasing as sample sizes grow: up to 9% average improvement per category and 6.2% for the entire city. The popularity of a parking lot turned out to be as important as its occupancy rate (50% importance each), while volatility was obstructive. A few months were recyclable, and a few small parking lots could replace each other's datasets. Temporal aspects turned out to be better than spatial aspects for parking data simulations. Conclusions. The results support the data scientists' belief that improving the quality and quantity of data is more important than creating more, new types of models. The score can be used as a better metric for parking congestion rates, for both drivers and managers. It can be employed in the public sphere provided that higher-quality, richer data are available.
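
To make the two building blocks named in the abstract more concrete, the following is a minimal sketch, not the essay's actual implementation. It shows a weighted sum over three parking features and a plain dynamic time warping (DTW) distance between two time series. The function names, the assumption that features are normalized to [0, 1], the absolute-difference local cost, and the choice of a zero weight for volatility (one possible reading of "volatility was obstructive") are all illustrative assumptions; only the 50%/50% split between occupancy and popularity comes from the abstract.

```python
from typing import Sequence, Tuple


def weighted_score(
    occupancy: float,
    popularity: float,
    volatility: float,
    weights: Tuple[float, float, float] = (0.5, 0.5, 0.0),
) -> float:
    """Weighted sum of three features, each assumed to lie in [0, 1].

    The default weights are an assumption: 50% occupancy, 50% popularity,
    and volatility left out (weight 0).
    """
    w_occ, w_pop, w_vol = weights
    return w_occ * occupancy + w_pop * popularity + w_vol * volatility


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched again (b index stays)
                cost[i][j - 1],      # b[j-1] is matched again (a index stays)
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


if __name__ == "__main__":
    # Hypothetical hourly scores for two parking lots (made-up numbers).
    lot_a = [0.0, 1.0, 2.0]
    lot_b = [0.0, 1.0, 1.0, 2.0]
    print(weighted_score(1.0, 0.5, 0.9))
    print(dtw_distance(lot_a, lot_b))
```

A distance of this kind could serve as the dissimilarity measure when clustering score time series with k-means, since DTW tolerates series that follow the same pattern at slightly shifted times; how the essay configures the clustering is not stated in the abstract.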
