Towards End-User Understanding: Exploring Explanations For Profanity Detection

University essay from Umeå universitet/Institutionen för datavetenskap

Author: Noah Öberg; [2023]


Abstract: Current text classification models can accurately identify instances of specific categories, such as hate speech or bad language, but they often do not provide the end user with clear explanations for their decisions. This can lead to confusion or mistrust in the results, especially in sensitive applications where the consequences of misclassification can be significant. To address this issue, the work in this thesis explores two ways of adding explanations to a classification model. We trained the classification model on a large dataset using three different machine learning algorithms to determine which one was best suited for the task. The profanity explanations were then generated using two different approaches. The first is a ”naive” approach: when a text is classified as profane, the selected model is applied to substrings of the text until it finds the part that is classified as profanity. This way, the user knows which part of the text was considered unacceptable. The second approach is to ”group” the dataset into multiple categories with predefined explanations. This way, the trained model is able to find more subtle hateful content that the ”naive” approach might miss.
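As a rough illustration only, the sketch below shows what the two explanation mechanisms described in the abstract could look like with a scikit-learn-style text-classification pipeline. The function names, category labels, and explanation strings are hypothetical assumptions for the sake of the example and are not taken from the thesis itself.

```python
def find_profane_part(model, text):
    """Naive approach (sketch): return the shortest contiguous word span
    that the model still classifies as profane, or None if the full text
    is not classified as profane. Assumes `model` is a scikit-learn-style
    pipeline whose predict() returns 1 for the "profane" class."""
    if model.predict([text])[0] != 1:
        return None
    words = text.split()
    # Try all contiguous word spans, shortest first, and return the first
    # one the model still flags as profane.
    for length in range(1, len(words) + 1):
        for start in range(len(words) - length + 1):
            span = " ".join(words[start:start + length])
            if model.predict([span])[0] == 1:
                return span
    return text  # fall back to the whole text


# Grouped approach (sketch): the classifier predicts one of several
# categories, each mapped to a predefined explanation shown to the user.
# These labels and explanation strings are illustrative assumptions.
EXPLANATIONS = {
    "hate_speech": "The text contains hostile language directed at a group of people.",
    "profanity": "The text contains swear words or vulgar language.",
    "clean": None,
}


def explain_grouped(model, text):
    """Return the predefined explanation for the predicted category."""
    category = model.predict([text])[0]
    return EXPLANATIONS.get(category)
```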