Evaluation of Text-Independent and Closed-Set Speaker Identification Systems

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Berk Gedik; [2018]


Abstract: Speaker recognition is the task of identifying the speaker of a given speech recording, and it has a wide range of applications. In this thesis, machine learning models such as the Gaussian Mixture Model (GMM), the k-Nearest Neighbor (k-NN) model, and Support Vector Machines (SVM), together with feature extraction methods such as Mel-Frequency Cepstral Coefficients (MFCC) and Linear Predictive Cepstral Coefficients (LPCC), are investigated for the speaker recognition task. Combinations of these models and feature extraction methods are evaluated on multiple datasets that vary in the number of speakers and the amount of training data, so that the performance of the methods can be analyzed in different settings. The results show that the GMM and k-NN methods achieve good accuracy and that LPCC performs better than MFCC. The effects of audio recording duration, training data duration, and number of speakers on prediction accuracy are also analyzed.
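
To make the closed-set setting concrete, here is a minimal sketch of frame-level k-NN identification with a majority vote over an utterance. It is an illustration only, not the thesis's implementation: the feature vectors are synthetic stand-ins for MFCC or LPCC frames, and the names `synth_frames`, `knn_identify`, and the speaker labels are hypothetical.

```python
import math
import random
from collections import Counter


def synth_frames(center, n_frames, rng, spread=0.5):
    """Draw synthetic feature vectors around a speaker-specific center.

    Assumption: these stand in for real MFCC/LPCC frames, which would be
    extracted from audio in an actual system.
    """
    return [[c + rng.gauss(0.0, spread) for c in center] for _ in range(n_frames)]


def knn_identify(train, test_frames, k=5):
    """Closed-set identification: k-NN label per frame, then a vote over frames."""
    votes = Counter()
    for frame in test_frames:
        # The k training frames closest to this test frame (Euclidean distance).
        nearest = sorted(train, key=lambda item: math.dist(item[0], frame))[:k]
        frame_label = Counter(label for _, label in nearest).most_common(1)[0][0]
        votes[frame_label] += 1
    # The utterance is assigned to the enrolled speaker with the most frame votes.
    return votes.most_common(1)[0][0]


rng = random.Random(0)
# Hypothetical enrolled speakers; closed-set means the answer is always one of these.
centers = {
    "spk_a": [0.0, 0.0, 0.0],
    "spk_b": [3.0, 3.0, 3.0],
    "spk_c": [-3.0, 3.0, 0.0],
}
train = [(f, name) for name, c in centers.items() for f in synth_frames(c, 40, rng)]

test_utterance = synth_frames(centers["spk_b"], 20, rng)
predicted = knn_identify(train, test_utterance)
print(predicted)
```

Because the decision is a vote over frames, a longer test recording contributes more votes, which is one intuitive reason recording duration can affect accuracy.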
