Cluster analysis on sparse customer data on purchase of insurance products

University essay from KTH/Matematisk statistik

Author: Michel Alexander Postigo Smura; [2019]

Keywords: ;

Abstract: This thesis work aims at performing a cluster analysis on customer data of insurance products. Three different clustering algorithms are investigated. These are K-means (center-based clustering), Two-Level clustering (SOM and Hierarchical clustering) and HDBSCAN (density-based clustering). The input to the algorithms is a high-dimensional and sparse data set. It contains information about the customers previous purchases, how many of a product they have bought and how much they have paid. The data set is partitioned in four different subsets done with domain knowledge and also preprocessed by normalizing respectively scaling before running the three different cluster algorithms on it. A parameter search is performed for each of the cluster algorithms and the best clustering is compared with the other results. The best is measured by the highest average silhouette index. The results indicates that all of the three algorithms performs approximately equally good, with single exceptions. However, it can be stated that the algorithm showing best general results is K-means on scaled data sets. The different preprocessings and partitions of the data impacts the results in different ways and this shows that it is important to preprocess the input data in several ways when performing a cluster analysis.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)