Clustering of Unevenly Spaced Mixed Data Time Series

University essay from KTH/Matematisk statistik

Abstract: This thesis explores the feasibility of clustering mixed data and unevenly spaced time series for customer segmentation. The proposed method implements the Gower dissimilarity as the local distance function in dynamic time warping to calculate dissimilarities between mixed data time series. The time series are then clustered with k−medoids and the clusters are evaluated with the silhouette score and t−SNE. The study further investigates the use of a time warping regularisation parameter. It is derived that implementing time as a feature has the same effect as penalising time warping, andtherefore time is implemented as a feature where the feature weight is equivalent to a regularisation parameter. The results show that the proposed method successfully identifies clusters in customer transaction data provided by Nordea. Furthermore, the results show a decrease in the silhouette score with an increase in the regularisation parameter, suggesting that the time at which a transaction occurred might not be of relevance to the given dataset. However, due to the method’s high computational complexity, it is limited to relatively small datasets and therefore a need exists for a more scalable and efficient clustering technique.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)