Song Popularity Prediction with Deep Learning : Investigating predictive power of low level audio features

University essay from Luleå tekniska universitet/Institutionen för system- och rymdteknik

Author: Gustaf Holst; Jan Niia; [2023]

Keywords: machine learning; deep learning; audio;

Abstract: Streaming services are today the most popular way to consume music, and the field of Music Information Retrieval (MIR) has grown rapidly alongside them. Tangy market is a music investment platform that wants to use MIR techniques to estimate the value of songs that have not yet been released. In this thesis we collaborate with them to investigate how a song’s financial success can be predicted using machine learning models. Previous research has shown that well-known algorithms used for tasks such as image recognition and machine translation can also be used for audio analysis and prediction. We show that a lot of previous work has addressed different aspects of audio analysis and prediction, but that most of it concerns genre classification and hit song prediction. Popularity prediction from audio is still quite new, and this is where we contribute by researching whether low-level audio features can be used to predict streams. We use an existing dataset of more than 100 000 songs with low-level features, which we extend with streaming information. We use the features in two forms, summarized and full; the dataset contains only the summarized representation, so we use Librosa to extend it with the full representation of the audio features. A previous study by Martín-Gutiérrez et al. [1] successfully used a combination of low-level and high-level audio features as well as non-musical features such as the number of social media followers. The aim of this thesis is to explore five of the low-level features used in [1] in order to assess the predictive power that these features have on their own. The five features we explore are Chromagram, Mel Spectrogram, Tonnetz, Spectral Contrast, and MFCC.
These features were selected for our research specifically because they were used in [1], and we want to investigate to what extent they contribute to the final predictions made by that study's model. Our conclusion is that none of these features could be used for prediction with any accuracy, which indicates that other features, high-level and external, are of more importance. However, Chromagram and Mel Spectrogram in their full form show some potential, though they need further research.
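
To make the distinction between the "full" and "summarized" feature forms concrete, the following is a minimal sketch and not the thesis authors' code. It assumes that a full feature is a (bands x frames) matrix as returned by Librosa with default parameters, and that a summarized feature collapses each band over time into a few statistics. The choice of mean and standard deviation, and the helper names `summarize` and `extract_full_features`, are illustrative assumptions; the abstract does not say which statistics or parameters were used.

```python
import statistics
from typing import Dict, List, Sequence


def summarize(feature: Sequence[Sequence[float]]) -> List[float]:
    """Collapse a (bands x frames) matrix into [mean, std] per band.

    Assumption: mean and population standard deviation stand in for
    whatever summary statistics the dataset actually stores.
    """
    summary: List[float] = []
    for band in feature:
        values = [float(v) for v in band]
        summary.append(statistics.fmean(values))
        summary.append(statistics.pstdev(values))
    return summary


def extract_full_features(path: str) -> Dict[str, List[List[float]]]:
    """Compute the five low-level features named in the abstract.

    Librosa is a third-party package, imported lazily so that summarize()
    can be used without it. Default Librosa parameters are an assumption.
    """
    import librosa

    y, sr = librosa.load(path)
    full = {
        "chromagram": librosa.feature.chroma_stft(y=y, sr=sr),
        "mel_spectrogram": librosa.feature.melspectrogram(y=y, sr=sr),
        "tonnetz": librosa.feature.tonnetz(y=y, sr=sr),
        "spectral_contrast": librosa.feature.spectral_contrast(y=y, sr=sr),
        "mfcc": librosa.feature.mfcc(y=y, sr=sr),
    }
    # Convert NumPy arrays to plain nested lists of floats.
    return {name: matrix.tolist() for name, matrix in full.items()}
```

With an audio file at hand, `summarize(extract_full_features("song.mp3")["mfcc"])` would give one fixed-length vector per song (the file name is hypothetical), whereas the full matrix grows with the length of the track.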
