Machine Learning for Detecting Hate Speech in Low Resource Languages

University essay from Göteborgs universitet/Institutionen för data- och informationsteknik

Abstract: This work examines the role of both cross-lingual zero-shot learning and data augmentation in detecting hate speech online for low resource set-ups. The proposed solutions for situations where the amount of labeled data is scarce are to use a language with more resources during training or to create synthetic data points. Cross-lingual zero-shot results suggest some knowledge transfer is occurring. However, results seem greatly influenced by the specific training data set selected. This is further supported by cross-data set experimentation within the same language, where results were also found to fluctuate based on training data without the need for cross-lingual transfer. Meanwhile, data augmentation methods show an improvement, especially for low amounts of data. Furthermore, a detailed discussion on how the proposed data augmentation techniques impact the data is presented in this work.
