Identification model of musical works using record linkage

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Pierre Cournut; [2019]

Abstract: This thesis is based on a project that is part of IBM’s collaboration with a collecting rights organization that collects and distributes payments of authors’ rights. The project aimed to help this organization identify the rights beneficiaries for musical tracks listened to on online streaming platforms. Given as input a list of tracks described by metadata such as artist names, titles and listening statistics, the goal was to match each line with its corresponding entry in the organization’s documentation. Since each broadcaster has its own music catalogue, finding the correct match for each song can be difficult. In practice, the organization has a dedicated team that manually handles some of the non-trivial cases. Although their identification process focuses on the resources that account for 90% of the revenue in each listening report, it identifies only around 70% of the declared resources, which leaves a substantial number of tracks unprocessed. In this thesis, we investigate whether the current solution can be outperformed, and we design a new identification model that combines concepts and technologies from several fields, including search engines, string metrics and machine learning. First, the identification process used by the organization was reproduced and refined to quickly process the most trivial cases. On top of this, an identification algorithm that relies on a machine learning framework was built to process the non-trivial cases. This method showed very promising results: it achieves an identification rate and a false discovery rate comparable to those of the current solution, without relying on a dedicated team of experts.
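
The two-stage idea described in the abstract (exact matching for trivial cases, then a scored comparison for the rest) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the field names (`artist`, `title`, `id`), the equal weighting, and the 0.85 threshold are all hypothetical, and a fixed threshold on a string-similarity score stands in for the machine learning model that the thesis uses for non-trivial cases.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(kept.split())


def similarity(a, b):
    """String similarity in [0, 1] computed on the normalized forms."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def identify(track, catalogue, threshold=0.85):
    """Return (work_id, score), or (None, best_score) if nothing is close enough.

    The threshold and the 50/50 weighting are illustrative assumptions.
    """
    # Stage 1: trivial cases, exact match on normalized artist and title.
    key = (normalize(track["artist"]), normalize(track["title"]))
    for work in catalogue:
        if (normalize(work["artist"]), normalize(work["title"])) == key:
            return work["id"], 1.0

    # Stage 2: non-trivial cases, keep the best-scoring candidate.
    best_id, best_score = None, 0.0
    for work in catalogue:
        score = (0.5 * similarity(track["artist"], work["artist"])
                 + 0.5 * similarity(track["title"], work["title"]))
        if score > best_score:
            best_id, best_score = work["id"], score

    if best_score >= threshold:
        return best_id, best_score
    return None, best_score


# Hypothetical catalogue entries, for illustration only.
catalogue = [
    {"id": "W1", "artist": "Édith Piaf", "title": "La Vie en rose"},
    {"id": "W2", "artist": "Daft Punk", "title": "Around the World"},
]

exact = identify({"artist": "edith piaf", "title": "la vie en rose"}, catalogue)
fuzzy = identify({"artist": "Daft Punk", "title": "Around the World - Radio Edit"}, catalogue)
missing = identify({"artist": "Unknown", "title": "Zzz"}, catalogue)
```

In this sketch, lowering the threshold would identify more tracks at the cost of more false matches, which is the same trade-off between identification rate and false discovery rate that the abstract reports on.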
