A performance measurement of a Speaker Verification system based on a variance in data collection for Gaussian Mixture Model and Universal Background Model

University essay from Malmö universitet/Fakulteten för teknik och samhälle (TS)

Abstract: Voice recognition has become a more focused and researched field in the last century,and new techniques to identify speech has been introduced. A part of voice recognition isspeaker verification which is divided into Front-end and Back-end. The first componentis the front-end or feature extraction where techniques such as Mel-Frequency CepstrumCoefficients (MFCC) is used to extract the speaker specific features of a speech signal,MFCC is mostly used because it is based on the known variations of the humans ear’scritical frequency bandwidth. The second component is the back-end and handles thespeaker modeling. The back-end is based on the Gaussian Mixture Model (GMM) andGaussian Mixture Model-Universal Background Model (GMM-UBM) methods forenrollment and verification of the specific speaker. In addition, normalization techniquessuch as Cepstral Means Subtraction (CMS) and feature warping is also used forrobustness against noise and distortion. In this paper, we are going to build a speakerverification system and experiment with a variance in the amount of training data for thetrue speaker model, and to evaluate the system performance. And further investigate thearea of security in a speaker verification system then two methods are compared (GMMand GMM-UBM) to experiment on which is more secure depending on the amount oftraining data available.This research will therefore give a contribution to how much data is really necessary fora secure system where the False Positive is as close to zero as possible, how will theamount of training data affect the False Negative (FN), and how does this differ betweenGMM and GMM-UBM.The result shows that an increase in speaker specific training data will increase theperformance of the system. However, too much training data has been proven to beunnecessary because the performance of the system will eventually reach its highest point and in this case it was around 48 min of data, and the results also show that the GMMUBM model containing 48- to 60 minutes outperformed the GMM models.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)