Disambiguating Italian homographic heterophones with SoundChoice and testing ChatGPT as a data-generating tool

University essay from Uppsala universitet/Institutionen för lingvistik och filologi

Abstract: Text-to-speech systems are challenged by the presence of homographs, words that have more than one possible pronunciation. Rule-based approaches are often still the preferred solution to this issue in the industry. However, there have been multiple attempts to solve the ‘homograph issue’ by exploring statistical, neural, and hybrid techniques, mostly for English. Ploujnikov and Ravanelli (2022) proposed a neural grapheme-to-phoneme framework, SoundChoice, which comes in an RNN and a transformer version and can be fine-tuned for homograph disambiguation thanks to a weighted homograph loss. This thesis trains and tests this framework on Italian, instead of English, to see how it performs on a different language. Moreover, since the available data containing homographs was insufficient for this task, the thesis experiments with ChatGPT as a data-generating tool. SoundChoice was also investigated for out-of-domain evaluation by testing it on data from a corpus. The results showed that the RNN model reached 71% accuracy, up from a baseline of 59%. A better performance was observed for the transformer model, which went from 57% to 74%. Further analysis would be needed to draw more solid conclusions as to the origin of this gap. The models should also be trained on corpus data and tested on ChatGPT data to assess whether ChatGPT-generated data is indeed a suitable replacement for corpus data.
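To make the reported figures easier to compare, the short sketch below computes the gain of each model in percentage points and the share of baseline errors that were removed. It uses only the four accuracies quoted in the abstract; the helper names are illustrative and not taken from the thesis, and the comparison assumes that each baseline and its fine-tuned counterpart were scored on the same test data, which the abstract does not state explicitly.

```python
# Accuracies quoted in the abstract, in percent: (baseline, after fine-tuning).
reported = {
    "RNN": (59, 71),
    "transformer": (57, 74),
}


def absolute_gain(baseline, tuned):
    """Improvement in percentage points."""
    return tuned - baseline


def relative_error_reduction(baseline, tuned):
    """Fraction of the baseline's errors that the tuned model no longer makes."""
    return (tuned - baseline) / (100 - baseline)


# Hypothetical summary table, keyed by model name.
gains = {name: absolute_gain(b, t) for name, (b, t) in reported.items()}

for name, (b, t) in reported.items():
    print(
        f"{name}: +{absolute_gain(b, t)} points, "
        f"{relative_error_reduction(b, t):.1%} of baseline errors removed"
    )
```

Under these assumptions, the transformer gains more than the RNN both in percentage points and in the share of baseline errors removed.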
