Exploring Deep Learning Approaches to Cleft Lip and Palate Speech

University essay from Lunds universitet/Matematisk statistik

Abstract: Cleft lip and palate are among the most common deformities present at birth. The condition hampers normal speech development in children, and treatment involves both surgery and regular sessions with a speech pathologist. The speech pathologist assesses the child’s speech impairment stemming from the condition on a three-point scale: "Competent", "Marginally incompetent" and "Incompetent". This rating forms a basis for future treatment decisions. The procedure is time- and resource-intensive, since close examination of the entire recording is necessary for an accurate aggregate rating. Furthermore, field experience suggests that the rating assigned to a single recording can be biased and that ratings from different speech pathologists are inconsistent. In this thesis, deep learning methods are used to classify audio recordings of children into the three categories. The ambition of this undertaking is to rid the classification of bias and provide speech pathologists with a consistent baseline rating. Different steps in the pre-processing of speech therapy recordings are explored to transform the raw audio input into meaningful information for a neural network. The best performing network structure was a convolutional neural network, which classifies recordings with 89.76% accuracy using Mel-spectrograms of 0.2-second segments of pre-processed audio. Recommendations for further work are discussed, with the end goal of developing a fully automatic classifier with appropriate data gathering methods.
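To make the input representation named in the abstract more concrete, the following is a minimal sketch of how a recording could be cut into 0.2-second segments and each segment turned into a log-scaled Mel-spectrogram. Only the 0.2-second segment length comes from the abstract. The sample rate, FFT size, hop length, number of Mel bands, and all function names are assumptions chosen for illustration, and the thesis's actual pre-processing steps are not reproduced here.

```python
import numpy as np

SAMPLE_RATE = 16_000    # assumption: the abstract does not state a sample rate
SEGMENT_SECONDS = 0.2   # segment length given in the abstract


def split_into_segments(signal, sample_rate=SAMPLE_RATE, seconds=SEGMENT_SECONDS):
    """Cut a 1-D signal into non-overlapping segments, dropping the remainder."""
    seg_len = int(round(sample_rate * seconds))
    n_segments = len(signal) // seg_len
    return signal[: n_segments * seg_len].reshape(n_segments, seg_len)


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def log_mel_spectrogram(segment, sample_rate=SAMPLE_RATE, n_fft=512, hop=128, n_mels=40):
    """Log-scaled Mel-spectrogram of one segment, shape (n_mels, n_frames).

    n_fft, hop and n_mels are illustrative defaults, not values from the thesis.
    """
    n_frames = 1 + (len(segment) - n_fft) // hop
    # Index matrix that gathers overlapping frames of length n_fft.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = segment[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return 10.0 * np.log10(mel + 1e-10).T
```

Under these assumed settings, each 0.2-second segment holds 3200 samples and yields a small two-dimensional array with one row per Mel band, which is the kind of image-like input a convolutional network can consume. One label per recording could then be obtained by aggregating per-segment predictions, though the abstract does not say how the thesis handles this step.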
