End-to-end Learning for Singing-Language Identification
Abstract: Singing-language identification (SLID) consists of identifying the language of the sung lyrics directly from a given music recording. This task is of special interest to music-streaming businesses, which benefit from music localization applications. However, language is a complex semantic quality of music recordings, making the finding and exploiting of its characteristic features extremely challenging. In recent years, most Music Information Retrieval (MIR) research efforts have been directed to problems that are not related to language, and most of the progress in speech recognition methods remains far from musical applications. This work investigates the SLID problem, its challenges and limitations, with the aim of finding a novel solution that effectively leverages the power of deep learning architectures and a relatively large-scale private dataset. As part of the dataset pre-processing, a novel method for identifying the high-level structure of songs is proposed. As the classifier model, a Temporal Convolutional Network (TCN) is trained and evaluated on music recordings belonging to seven of the most prominent languages in the global music market. Although results show much lower performance than the current state of the art, a thorough discussion is provided with the purpose of exploring the limitations of SLID, identifying the causes of the poor performance, and expanding the current knowledge about the SLID problem. Future improvements and lines of work are outlined, in an attempt to stimulate further research in this direction.