Evaluation between Google's and Microsoft's automated speech recognition services regarding performance in Swedish

University essay from Uppsala universitet/Institutionen för informationsteknologi

Author: Nils Sörby; [2022]


Abstract: This thesis compares two Automatic Speech Recognition (ASR) systems for Swedish, used in the context of call centre self-service systems: one provided by Google and one by Microsoft. The evaluation covers several speech recognition challenges, including background noise, voices in the background and distance to the microphone. The dialect and the sex of the people who made the recordings are also included in the comparison. After the introduction, where the perspective of the thesis is presented, the State of the Art chapter describes the current context of call centre self-service systems. This chapter also contains background on the prediction algorithms, acoustic phonetics and natural language processing used in ASR, as well as a section on separation of the target voice. To compare the systems, a benchmark test was created. This consisted of producing a dataset of recordings that could later be run through the application programming interfaces (APIs) of the two ASR systems. The dataset was constructed by writing scripts containing keywords from a selected group of domains and creating instructions on how to record. The instructions were sent to 174 employees at Telia Company, who were asked to record using their phones; 46 of them completed the recordings. The recordings were then gathered and manually transcribed. When the dataset was complete, it was run through the APIs and the recognized words of each ASR system were added to the dataset. These words were compared to the manual transcriptions, producing counts of correctly and incorrectly recognized words. The results show that Google's ASR system performs better than Microsoft's overall, with a word error rate (WER) of 11.9% compared to 14.1%. However, when filtering for specific domains or other attributes, Microsoft's system scores on par with or even better than Google's. For example, on utterances from the transportation domain, Microsoft's ASR system scores a WER of 20.8% while Google's scores 28.3%. This is a substantially worse result, and a call centre self-service system where such utterances are common could benefit from using Microsoft's ASR system instead of Google's.
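
The headline comparison rests on word error rate, computed by aligning each ASR hypothesis against the manual transcription. The sketch below shows a standard way to compute WER via word-level edit distance; the function name and the example sentences are illustrative and not taken from the thesis.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Example: one manually transcribed reference vs. a hypothetical ASR output.
print(wer("jag vill boka en resa till stockholm",
          "jag vill boka en resa till stockholms"))  # 1 error / 7 words ≈ 0.143

In the thesis, the same per-utterance comparison is aggregated over the whole dataset (and over subsets such as the transportation domain) to produce the reported percentages.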
