IDENTIFYING HATE SPEECH IN SOCIAL MEDIA THROUGH CONTENT AND SOCIAL CONNECTIONS ANALYSIS

University essay from Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

Abstract: Hate speech is a problem that puts its targets at risk of serious harm. Because of the ubiquity of the internet and social media, it spreads fast and has a real influence on society, and various research efforts have therefore sought solutions for automatic hate speech detection. Despite major developments in the field, challenges with data scarcity and dataset characteristics often cause solutions reported in previous research to overfit the datasets used to train and test them, resulting in dramatic performance losses and failures to generalize. This study addressed this issue: it sought a solution that would mitigate the overfitting effects originating from these problems and enhance a language-based classifier with additional user information about social connections. It compared two single-source models, one based on textual information and the other on information about a user's social connections, and proposed a joint decision engine that selects the model whose class assignment is more certain for a given instance. Although the single-source models' performance dropped drastically on test data, the joint decision engine succeeded in reducing some of the issues related to overfitting and improved the overall performance. This observation suggests that simple solutions can be effective in reducing model overfit and paves the way towards validating these findings.
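The abstract describes the joint decision engine only at a high level: for each instance, keep the prediction of whichever single-source model is more certain. A minimal sketch of that selection rule is shown below; the function name, the use of per-class probabilities, and the tie-breaking in favour of the text model are illustrative assumptions, not details taken from the essay.

import numpy as np

def joint_decision(text_probs: np.ndarray, social_probs: np.ndarray) -> np.ndarray:
    """For each instance, keep the prediction of whichever single-source
    model is more certain (higher maximum class probability).

    text_probs, social_probs: arrays of shape (n_instances, n_classes)
    containing each model's predicted class probabilities.
    """
    text_conf = text_probs.max(axis=1)      # confidence of the text-based model
    social_conf = social_probs.max(axis=1)  # confidence of the social-connections model
    use_text = text_conf >= social_conf     # assumed tie-break: prefer the text model
    return np.where(use_text,
                    text_probs.argmax(axis=1),
                    social_probs.argmax(axis=1))

# Hypothetical example: probabilities for 3 instances over classes [non-hate, hate]
text_probs = np.array([[0.9, 0.1], [0.55, 0.45], [0.4, 0.6]])
social_probs = np.array([[0.6, 0.4], [0.2, 0.8], [0.7, 0.3]])
print(joint_decision(text_probs, social_probs))  # -> [0 1 0]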
