Czekanowski’s Clustering : Development of Visualization Possibilities of the RMaCzek Package

University essay from Linköpings universitet/Statistik och maskininlärning

Abstract: Clustering analysis is one of the most essential data mining tasks and has been widely discussed and employed since its invention. Czekanowski’s diagram, which has been used for over a century as a visualization tool for exploring cluster distributions, is continually being improved. RMaCzek is an R package that implements Czekanowski’s diagram; with it, users can plot a symmetric or asymmetric Czekanowski’s diagram. However, the user still has to judge the clustering result manually from the diagram, which inevitably introduces subjective bias and increases the user’s workload. To keep the advantages of Czekanowski’s diagram and exploit its potential, this thesis proposes Czekanowski’s clustering, a new clustering algorithm based on Czekanowski’s diagram that labels the clustering results directly and marks them on the diagram. Czekanowski’s clustering supports two methods, exact Czekanowski’s clustering and fuzzy Czekanowski’s clustering, so that users can choose a method according to the characteristics of the object of analysis. The thesis also describes how to use the upgraded RMaCzek R package, including how to perform Czekanowski’s clustering, how to display the clustering outcomes in Czekanowski’s diagram, and the improvements to the plotting function. In addition, the performance of the new clustering algorithm is evaluated by comparing it with five other commonly used clustering algorithms, and experiments are used to determine the impact of various algorithm parameters on clustering performance.
