Named Entity Recognition for Social Media Text

University essay from Uppsala universitet/Institutionen för lingvistik och filologi

Author: Yaxi Zhang; [2019]

Keywords: ;

Abstract: This thesis aims to perform named entity recognition for English social media texts. Named Entity Recognition (NER) is applied in many NLP tasks as an important preprocessing procedure. Social media texts contain lots of real-time data and therefore serve as a valuable source for information extraction. Nevertheless, NER for social media texts is a rather challenging task due to the noisy context. Traditional approaches to deal with this task use hand-crafted features but prove to be both time-consuming and very task-specific. As a result, they fail to deliver satisfactory performance. The goal of this thesis is to tackle this task by automatically identifying and annotating the named entities with multiple types with the help of neural network methods. In this thesis, we experiment with three different word embeddings and character embedding neural network architectures that combine long short- term memory (LSTM), bidirectional LSTM (BI-LSTM) and conditional random field (CRF) to get the best result. The data and evaluation tool comes from the previous shared tasks on Noisy User-generated Text (W- NUT) in 2017. We achieve the best F1 score 42.44 using BI-LSTM-CRF with character-level representation extracted by a BI-LSTM, and pre-trained word embeddings trained by GloVe. We also find out that the results could be improved with larger training data sets.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)