Transfer Learning for Multilingual Offensive Language Detection with BERT
Abstract: The popularity of social media platforms has led to an increase in user-generated content posted on the Internet. Users, shielded by what they perceive as anonymity, can express offensive and hateful thoughts on these platforms, creating a need to detect and filter abusive content. Since the amount of data available on the Internet is far too large to analyze manually, automatic tools are the most effective choice for detecting offensive and abusive messages. Academic research on the detection of offensive language on social media has been on the rise in recent years, with a growing number of shared tasks organized on the topic. State-of-the-art deep learning models such as BERT have achieved promising results on offensive language detection in English. However, multilingual offensive language detection systems, which cover several languages at once, have remained underexplored until recently. In this thesis, we investigate whether transfer learning can improve the performance of a classifier for detecting offensive speech in Danish, Greek, Arabic, Turkish, German, and Italian. More specifically, we first experiment with using machine-translated data as input to a classifier, which allows us to evaluate whether machine-translated data can help classification. We then experiment with fine-tuning multiple pre-trained BERT models at once. This parallel fine-tuning process, named multi-channel BERT (Sohn and Lee, 2019), allows us to exploit cross-lingual information, with the goal of understanding its impact on the detection of offensive language. Both the use of machine-translated data and the exploitation of cross-lingual information could help detect offensive language when little or no annotated data is available, as is the case for low-resource languages.
We find that training a classifier on machine-translated data, either exclusively or mixed with gold data, can often improve its performance. Furthermore, we find that fine-tuning multiple BERT models in parallel can positively impact classification, although it can lead to robustness issues for some languages.