Adaptive content-based sound compression

University essay from Lunds universitet/Avdelningen för Biomedicinsk teknik

Abstract: Three different classification solutions for distinguishing events from non-events in surveillance audio are described and evaluated. The three methods compared are energy functions, Gaussian mixture modelling (GMM) algorithms and existing voice activity detectors (VAD). Recorded test signals with corresponding manually labelled ground truth files are used to determine the accuracy of each method. The GMM algorithm performed best, with an average accuracy of 86.63 % across different environments, activity levels and noise levels. This can be compared to the accuracy of the energy functions (60.08 %) and the VADs (77.85 %). With the GMM method, the data size is reduced from 57.87 to 25.83 MB per hour on average when the bitrate is decreased instantly for non-event segments, compared to constantly recording the audio at a high bitrate. This is a reduction in storage size of 55.37 % for the test files, which contain 39.9 % activity. A technique we call gradual bitrate decline is also implemented; it reduces the bitrate slowly over time instead of instantly after an event has happened. The technique improves the listening experience, with the trade-off of taking up more disk space. The GMM algorithm gives its best results at a signal-to-noise ratio (SNR) of around 20 dB, where over 97 % of the events are classified as events and the accuracy is above 92 % for the measured test file. At lower SNR fewer events are found; for example, only 22 % of the events are found at an SNR of 0 dB. The algorithm does not work with outdoor noise such as wind; it is instead constructed and optimised for indoor use.
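The storage figures above, and the idea of an energy-based event detector, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the frame length, the threshold and the two-bitrate storage model are all assumptions made for illustration. Only the 57.87 and 25.83 MB per hour figures come from the abstract.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (assumed framing)."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def classify_frames(samples, frame_len, threshold):
    """Label a frame True (event) when its energy exceeds a fixed threshold.

    A fixed threshold is a simplification chosen for this sketch; the
    abstract does not say how the energy functions were thresholded.
    """
    return [e > threshold for e in frame_energies(samples, frame_len)]


def stored_mb_per_hour(event_fraction, high_mb_per_hour, low_mb_per_hour):
    """Hypothetical storage model: events kept at the high bitrate,
    everything else at the low bitrate, switching instantly."""
    return (event_fraction * high_mb_per_hour
            + (1.0 - event_fraction) * low_mb_per_hour)


def reduction_percent(before, after):
    """Relative size reduction in percent."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Figures quoted in the abstract: 57.87 -> 25.83 MB per hour.
    print(round(reduction_percent(57.87, 25.83), 2))

    # Toy signal: one silent frame followed by one loud frame.
    toy = [0.0] * 4 + [1.0] * 4
    print(classify_frames(toy, frame_len=4, threshold=0.5))
```

The reduction helper reproduces the 55.37 % quoted in the abstract from the two storage figures. The storage model is only a rough guide, since the fraction of audio stored at the high bitrate depends on what the classifier labels as an event, not on the true activity level.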
