How do voiceprints age?

University essay from Uppsala universitet/Institutionen för lingvistik och filologi

Author: Maya Konstantinovna Nachesa; [2023]

Keywords: Voiceprint; Speaker Emotion Recognition; Age; Speaker Verification;

Abstract: Voiceprints, like fingerprints, are a biometric. Where fingerprints record a person's unique pattern on their finger, voiceprints record what a person's voice "sounds like", abstracting away from what the person said. They have been used in speaker recognition, including verification and identification. In other words, they have been used to ask "is this speaker who they say they are?" or "who is this speaker?", respectively. However, people age, and so do their voices. Do voiceprints age, too? That is, can a person's voice change enough that after a while, the original voiceprint can no longer be used to identify them? In this thesis, I use Swedish audio recordings from Riksdagen's (the Swedish parliament) debate speeches to test this idea. Depending on the answer, this influences how well we can search the database for previously unmarked speeches. I find that speaker verification performance decreases as the age-gap between voiceprints increases, and that it decreases more strongly after roughly five years. Additionally, I grouped the speakers into age groups spanning five years, and found that speaker verification has the highest performance for those for whom the initial voiceprint was recorded from 29-33 years of age. Additionally, longer input speech provides higher quality voiceprints, with performance improvements stagnating when voiceprints become longer than 30 seconds. Finally, voiceprints for men age more strongly than those for women after roughly 5 years. I also investigated how emotions are encoded in voiceprints, since this could potentially impede in speaker recognition. I found that it is possible to train a classifier to recognise emotions from voiceprints, and that this classifier does better when recognising emotions from known speakers. That is, emotions are encoded more characteristically per person as opposed to per emotion itself. As such, they are unlikely to interfere with speaker recognition.

AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)

How do voiceprints age?

Searchphrases right now

Popular searches

popular essays yesterday (2024-04-26)