Machine Learning for Detecting Hate Speech in Low Resource Languages

University essay from Göteborgs universitet/Institutionen för data- och informationsteknik

Abstract: This work examines the role of cross-lingual zero-shot learning and data augmentation in detecting online hate speech in low-resource settings. When labeled data is scarce, two solutions are proposed: training on a language with more resources, or creating synthetic data points. The cross-lingual zero-shot results suggest that some knowledge transfer occurs, but they appear to depend heavily on the specific training data set selected. Cross-data-set experiments within a single language support this: results there also fluctuate with the choice of training data, even though no cross-lingual transfer is involved. Data augmentation methods, meanwhile, improve results, especially when little data is available. The work also presents a detailed discussion of how the proposed data augmentation techniques affect the data.
