Finding co-workers with similar competencies through data clustering

University essay from Linköpings universitet/Artificiell intelligens och integrerade datorsystem

Abstract: In this thesis, data clustering techniques are applied to a competence database from the company Combitech. The goal of the clustering is to connect co-workers with similar competencies and competence areas in order to enable more skill sharing. This is accomplished by implementing and evaluating three clustering algorithms: k-modes, DBSCAN, and ROCK. The algorithms are fine-tuned using three internal validity indices: the Dunn, Silhouette, and Davies-Bouldin scores. Finally, a form about the clusterings produced by the three algorithms is sent to the co-workers on whom the clustering is based, in order to obtain external validation by calculating the clustering accuracy. The results from the internal validity indices show that ROCK and DBSCAN create the most separated and dense clusters. The results from the form show that ROCK is the most accurate of the three algorithms, with an accuracy of 94%, followed by k-modes at 58% and DBSCAN at 40%. However, the visualization of the clusters shows that both ROCK and DBSCAN create one very large cluster, which is not desirable. This was not the case for k-modes, whose clusters are more evenly sized while still being fairly well separated. In general, the results show that it is possible to use data clustering techniques to connect people with similar competencies, and that the predicted clusters agree fairly well with the gold-standard data from the co-workers. However, the results depend strongly on the choice of algorithm and parameter values, which therefore have to be chosen carefully.
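
As an illustration of how an internal validity index such as the Silhouette score can rank candidate clusterings of categorical competence data, the following is a minimal sketch. The toy profiles, the choice of a normalized Hamming distance, and the function names are assumptions made for this example; they are not the thesis's data or implementation.

```python
from statistics import mean

def hamming(a, b):
    """Fraction of attributes on which two equal-length profiles differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def silhouette(points, labels):
    """Mean Silhouette coefficient over all points, using Hamming distance.

    Points in singleton clusters get a score of 0 (a common convention).
    """
    n = len(points)
    clusters = set(labels)
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = mean(hamming(points[i], points[j]) for j in own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(hamming(points[i], points[j])
                 for j in range(n) if labels[j] == c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)

# Hypothetical binary competence profiles (1 = has the competence).
# Columns (assumed): Python, Java, SQL, CAD.
profiles = [
    (1, 1, 0, 0),
    (1, 1, 1, 0),
    (0, 0, 1, 1),
    (0, 0, 0, 1),
]

good_score = silhouette(profiles, [0, 0, 1, 1])  # similar profiles grouped
bad_score = silhouette(profiles, [0, 1, 0, 1])   # dissimilar profiles grouped
print(good_score, bad_score)
```

A higher score indicates denser, better separated clusters, so a parameter setting that yields the first labeling would be preferred over one that yields the second. As the abstract notes, such an index alone does not reveal whether one cluster has absorbed most of the data, so cluster sizes should be inspected as well.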
