Text Normalization for Text-to-Speech

University essay from Uppsala universitet/Institutionen för lingvistik och filologi

Author: Zhaorui Zhang (2023)

Abstract: Text normalization plays a crucial role in text-to-speech systems: it ensures that the input text is in an appropriate format and consists of standardized words before grapheme-to-phoneme conversion. This study assessed the performance of five text normalization systems based on different methods. The systems were evaluated on the English Google text normalization dataset, using the similarity between the ground truth and the normalized output of each system. Because several issues with the ground truth surfaced during the evaluation, the original similarity scores had to be manually re-scored. The re-scoring was carried out on a sample extracted semi-randomly from the evaluation dataset. According to the results, the systems rank by accuracy, from highest to lowest, as follows: the Duplex system, the Hybrid system, the VT system, the RS system, and the WFST system. Of the two rule-based systems from ReadSpeaker, the VT system performed slightly better than the RS system, with only a small difference in the original similarity score. An analysis of the error patterns produced during normalization provides insight into the strengths and limitations of these systems. The findings can contribute to the refinement of the systems' internal rules, leading to improved accuracy and effectiveness of text normalization in text-to-speech applications.
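
The abstract does not state which similarity measure was used. As a rough illustration only, the sketch below assumes a token-level sequence ratio computed with Python's standard `difflib`; the function names `similarity` and `mean_similarity` and the example sentence pairs are hypothetical and do not come from the essay.

```python
from difflib import SequenceMatcher


def similarity(reference: str, hypothesis: str) -> float:
    """Token-level similarity in [0, 1] between a ground-truth
    normalization and a system output (assumed metric, not the essay's)."""
    ref_tokens = reference.lower().split()
    hyp_tokens = hypothesis.lower().split()
    # ratio() returns 2 * matches / (len(ref_tokens) + len(hyp_tokens))
    return SequenceMatcher(None, ref_tokens, hyp_tokens).ratio()


def mean_similarity(pairs):
    """Average similarity over (ground truth, system output) pairs."""
    scores = [similarity(ref, hyp) for ref, hyp in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical examples: one exact match, one near miss.
pairs = [
    ("twenty twenty three", "twenty twenty three"),
    ("three dollars", "three dollar"),
]
print(mean_similarity(pairs))
```

A score of this kind penalizes an output that is acceptable but differs from the ground truth, which is one reason a manual re-scoring step, as described in the abstract, can be needed.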
