Named Entity Recognition for Search Queries in the Music Domain

University essay from KTH/Skolan för datavetenskap och kommunikation (CSC)

Abstract: This thesis addresses the problem of named entity recognition (NER) in music-related search queries. NER is the task of identifying keywords in text and classifying them into predefined categories. Previous work in the field has mainly focused on longer documents of editorial text. In recent years, however, the application of NER to queries has attracted increased attention. The task is acknowledged to be challenging because queries are short, ungrammatical and contain minimal linguistic context. NER for queries is especially useful for implementing natural language queries in domain-specific search applications. These applications are often backed by a database, where the query format is otherwise restricted to keyword search or a formal query language. In this thesis, two techniques for NER on music-related queries are evaluated: a conditional random field based solution and a probabilistic solution based on context words. As a baseline, the most elementary implementation of NER, commonly applied to editorial text, is used. Both evaluated approaches outperform the baseline, with overall F1 scores of 79.2% and 63.4% respectively. The experimental results show high precision for the probabilistic approach, and the conditional random field based solution achieves an F1 score comparable to previous studies from other domains.
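
Since the results above are reported as precision and F1 scores, a minimal sketch of how such scores can be computed for NER may help. It assumes exact-match scoring, where a predicted entity counts as correct only if both its token span and its category match the gold annotation. The abstract does not state the thesis's exact evaluation scheme, so this is an illustration and not the author's method. The example query, the labels TRACK and ARTIST, and the function name are hypothetical.

```python
from typing import Set, Tuple

# An entity is (start token index, end token index exclusive, category label).
Entity = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]):
    """Entity-level scores under exact matching (an assumed scoring scheme)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical query tokens: ["play", "yesterday", "by", "the", "beatles"]
gold = {(1, 2, "TRACK"), (3, 5, "ARTIST")}
# The prediction finds the track but truncates the artist span.
predicted = {(1, 2, "TRACK"), (4, 5, "ARTIST")}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this sketch the truncated artist span counts as both a false positive and a false negative, which is why short queries with few entities can swing the scores sharply.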
