Classification of Acoustic Scenes Using Convolutional Neural Networks

University essay from Lunds universitet/Matematisk statistik

Abstract: Minut is a startup company that builds a camera-free home monitor called Point. This thesis investigates whether Point can use machine learning techniques to classify acoustic scenes, in particular to detect whether a party is ongoing in the home where Point is located. Machine learning is a mathematical field that uses data to learn models from which one can, if successful, make good predictions about the future. Interest in this field, and in particular in a type of model called artificial neural networks, has grown massively over the last few years. The main reason is the recent access to powerful hardware and large amounts of data, which has made these models exceptional at certain tasks. Artificial neural networks are huge mathematical functions with millions of tunable parameters, which makes them very flexible. By showing a network large amounts of data and specifying the desired output, the network's learning algorithm is able to learn the mapping between input and output. In this thesis, convolutional neural networks are used to classify acoustic scenes by showing the network a time-frequency representation of audio together with the correct label. One of the networks built, which we call SlimNet, is very small, yet it is able to distinguish parties from other acoustic scenes with 98 % accuracy. It is also found that the data representation of an acoustic scene does not have to be very large for a neural network to classify it correctly, which is desirable since Point has hardware limitations.
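The abstract does not say which time-frequency representation the thesis uses. As an illustration only, the sketch below computes a log-magnitude spectrogram with a short-time Fourier transform in NumPy, which is one common way to turn audio into a two-dimensional input for a convolutional network. The function name `log_spectrogram`, the frame length, the hop size, the sample rate, and the synthetic test tone are all assumptions made for this example and are not taken from the thesis.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Return a (frames, frequency bins) log-magnitude spectrogram.

    Illustrative only: frame_len, hop and eps are assumed values.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One real-input FFT per frame gives the magnitude per frequency bin.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # eps avoids log(0) for empty bins.
    return np.log(magnitude + eps)

# Synthetic one-second 1000 Hz tone standing in for recorded audio.
sample_rate = 8000  # assumed sample rate in Hz
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = log_spectrogram(tone)
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256  # bin index times bin width
print(spec.shape, peak_hz)
```

A larger hop or a shorter frame gives a smaller array per audio clip, which is the kind of trade-off that matters when the input has to fit on constrained hardware.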
