Improvements of the voice activity detector in AMR-WB

University essay from Luleå/Systemteknik

Abstract: In speech coding, periods of speech inactivity can be exploited to
reduce the average bit-rate of the encoded signal. This requires a process
commonly referred to as Voice Activity Detection (VAD), which separates the
speech frames from the frames that contain only background noise. The purpose
of the VAD is to tell the speech encoder to stop or reduce the data flow when
no speech is present. The goal of such a process is to lower the average
bit-rate without affecting the perceived speech quality.
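
As a rough illustration of the frame classification described above, the
following is a minimal sketch of an energy-threshold VAD. It is not the
AMR-WB detector; the function names, the fixed threshold of -40 dB and the
sample values are assumptions chosen only for this example.

```python
import math


def frame_energy_db(frame):
    """Mean power of one frame in dB, with a small floor to avoid log(0)."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)


def simple_vad(frames, threshold_db=-40.0):
    """Return one flag per frame: True = treated as speech, False = noise only.

    Hypothetical fixed-threshold rule, used here only to show the idea of
    separating active frames from inactive ones.
    """
    return [frame_energy_db(f) > threshold_db for f in frames]


def transmitted_fraction(flags):
    """Fraction of frames the encoder would send at the full rate."""
    return sum(flags) / len(flags)


# Two loud frames and two quiet frames of 160 samples each (assumed values).
loud = [0.5] * 160
quiet = [0.001] * 160
flags = simple_vad([loud, quiet, quiet, loud])
print(flags, transmitted_fraction(flags))
```

With these assumed inputs, half of the frames are flagged inactive, which is
where the reduction in average bit-rate would come from.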

This work is an investigation and evaluation of possible improvements of the
voice activity detector in the Adaptive Multirate Wideband (AMR-WB) speech
coder. The purpose of the work was to reduce the sensitivity to babble
background noise and to improve the detection of music. The report gives a
brief introduction to the theory of speech coding and VAD, followed by an
outline of the AMR-WB speech coder. The main part of this thesis discusses
possible improvements of the detector, starting with recent findings in the
Adaptive Multirate Narrowband (AMR-NB) algorithm.

Based on the limited material used for evaluation in this work, the
modifications proposed for the AMR-NB VAD also showed good results for
AMR-WB. It turned out, however, that additional modifications are needed to
ensure reliable detection of high-level non-stationary noise. A music
hangover solution was also suggested for better handling of music when the
suggested modifications are implemented. The solution suggested for reducing
the sensitivity to babble noise offers a compromise between voice activity
and speech clipping that can be tuned to the desired performance.
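
The compromise between voice activity and speech clipping can be illustrated
with a generic hangover mechanism, sketched below. This is not the music
hangover solution proposed in the thesis; the function name and the
`hangover_frames` parameter are assumptions made for this example.

```python
def apply_hangover(raw_flags, hangover_frames):
    """Keep the VAD flag active for a few frames after the last active frame.

    raw_flags:       per-frame decisions before hangover (True = active).
    hangover_frames: hypothetical tuning parameter; a larger value raises
                     the measured voice activity but lowers the risk of
                     clipping the tail of an utterance.
    """
    out = []
    remaining = 0
    for active in raw_flags:
        if active:
            remaining = hangover_frames  # reload the counter on activity
            out.append(True)
        elif remaining > 0:
            remaining -= 1  # inactive frame, still inside the hangover
            out.append(True)
        else:
            out.append(False)
    return out


raw = [True, False, False, False, True, False]
print(apply_hangover(raw, 0))  # no hangover: identical to the raw flags
print(apply_hangover(raw, 2))  # two extra active frames after each burst
```

Setting `hangover_frames` to zero gives the lowest activity, while larger
values trade bit-rate for protection against clipped speech endings.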

The results and conclusions in this thesis are based on objective tests of
limited material and include no formal subjective testing. The conclusions
should therefore be treated as guidance for further studies, but they
indicate that the proposed solutions will reduce the AMR-WB VAD's
sensitivity to non-stationary background noise.
