Clustering of Driver Data based on Driving Patterns

University essay from Blekinge Tekniska Högskola/Institutionen för datavetenskap

Abstract: Data analysis methods are important to analyze the ever-growing enormous quantity of the high dimensional data. Cluster analysis separates or partitions the data into disjoint groups such that data in the same group are similar while data between groups are dissimilar. The focus of this thesis study is to identify natural groups or clusters of drivers using the data which is based on driving style. In finding such a group of drivers, evaluation of the combinations of dimensionality reduction and clustering algorithms is done. The dimensionality reduction algorithms used in this thesis are Principal Component Analysis (PCA) and t-distributed stochastic neighbour embedding (t-SNE). The clustering algorithms such as K-means Clustering and Hierarchical Clustering are selected after performing Literature Review. In this thesis, the evaluation of PCA with K-means, PCA with Hierarchical Clustering, t-SNE with K-means and t-SNE with Hierarchical Clustering is done. The evaluation was done on the Volvo Cars’ drivers dataset based on their driving styles. The dataset is normalized first and Markov Chain of driving styles is calculated. This Markov Chain dataset is of very high dimensions and hence dimensionality reduction algorithms are applied to reduce the dimensions. The reduced dimensions dataset is used as an input to selected clustering algorithms. The combinations of algorithms are evaluated using performance metrics like Silhouette Coefficient, Calinski-Harabasz Index and DaviesBouldin Index. Based on experiment and analysis, the combination of t-SNE and K-means algorithms is found to be the best in comparison to other combinations of algorithms in terms of all performance metrics and is chosen to cluster the drivers based on their driving styles.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)