Emotionally expressive song synthesis using formants and syllables

University essay from KTH/Skolan för datavetenskap och kommunikation (CSC)

Author: Dexter Gramfors; Andreas Johansson; [2014]


Abstract: Speech synthesis is an area of computer science with many practical uses, such as enabling people with visual impairments to access text and providing more human-like feedback from information systems. A related area of research is text-to-song, where systems comparable to those used in text-to-speech provide mappings from text to melodic units of song. This paper discusses how a text-to-song algorithm can be developed and which parameters affect the emotion that is communicated. Fifty participants listened to music generated with our algorithm. Results show that tempo and mode both strongly influence which emotion is communicated: a melody performed at 250 bpm was perceived as significantly happier than a performance at 120 bpm, and a melody in major tonality was perceived as significantly happier than a melody in minor tonality. Combined, these parameters gave even more significant results. A fast tempo combined with major tonality produced a performance that was perceived as even happier, and the opposite was observed when a slow tempo was combined with minor tonality. When a fast tempo was combined with minor tonality, the average answer was neutral, with answers distributed over the whole spectrum from sad to happy. A slow tempo combined with major tonality gave almost identical results. We concluded that generating emotionally expressive song with an algorithm is definitely possible, but that the methodology can be improved in order to convey emotions even more clearly.
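To make the two parameters in the abstract concrete, the following is a minimal sketch of how tempo and mode could shape a syllable-level melody. It is not the authors' algorithm: the function names, the one-note-per-syllable mapping, the use of MIDI note numbers, the natural minor scale, and the example scale degrees are all assumptions made for illustration, and formant synthesis is left out entirely.

```python
# Hypothetical illustration only; not taken from the essay.
# Semitone offsets from the tonic for each scale degree.
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]   # major scale
MINOR_STEPS = [0, 2, 3, 5, 7, 8, 10]   # natural minor scale (assumed variant)


def beat_seconds(bpm):
    """Duration of one beat in seconds at the given tempo."""
    return 60.0 / bpm


def melody(syllables, degrees, bpm, mode="major", tonic_midi=60):
    """Assign one note per syllable (assumed mapping).

    Returns a list of (syllable, midi_pitch, onset_s, duration_s) tuples.
    `degrees` are zero-based scale degrees; 7 means the tonic one octave up.
    """
    steps = MAJOR_STEPS if mode == "major" else MINOR_STEPS
    dur = beat_seconds(bpm)
    notes = []
    for i, (syl, deg) in enumerate(zip(syllables, degrees)):
        pitch = tonic_midi + steps[deg % 7] + 12 * (deg // 7)
        notes.append((syl, pitch, round(i * dur, 3), round(dur, 3)))
    return notes


if __name__ == "__main__":
    # Same syllables and scale degrees, rendered in two of the four conditions.
    print(melody(["hap", "py"], [0, 2], bpm=250, mode="major"))
    print(melody(["hap", "py"], [0, 2], bpm=120, mode="minor"))
```

In this sketch, changing `bpm` only rescales note onsets and durations (0.24 s per beat at 250 bpm versus 0.5 s at 120 bpm), while changing `mode` only alters the pitches on degrees where the two scales differ, such as the third.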
