k-means Clustering as a Mixed Integer Programming Problem

University essay from KTH/Skolan för teknikvetenskap (SCI)

Author: Hanna Werner; [2022]

Keywords: ;

Abstract: In this thesis k-means clustering is modelled as a mixed integer programming problem and then solved with a technique based on the concept of branch-and-bound. This approach is favourable when the solutions require high accuracy, but due to the heavy computations needed to solve large problems it is not regularly used. The aim of this thesis is to gain an understanding of how the computational complexity for solving these problems are dependant on the following three parameters:  I: the dimension of the dataII: the number of data pointsIII: the number of clusters. The mixed integer programming problem was modelled in Python and solved using the mathematical programming solver Gurobi. The results of the computational tests indicates that the most limiting parameter is the number of data points, which shows an exponential complexity. Furthermore, the results suggests that the computational complexity has a polynomial dependency on the number of clusters. This result is debatable and a test performed with a larger number of cluster could possibly give a result suggesting exponential complexity. Similarly to the number of clusters, the complexity is polynomial in regard to the dimensionality of the data. However, it seems that the number of clusters has a larger impact on the solver than that of the dimensionality.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)