Analyzing Toxicity in YouTube Comments with the Help of Machine Learning

University essay from Stockholms universitet/Institutionen för data- och systemvetenskap

Abstract: Toxic comments are likely to make someone feel uncomfortable and leave a discussion, and are therefore potentially problematic. They occur on various social media sites and, depending on the site, are detected manually, by machine learning algorithms, or both, and removed depending on severity and other factors. The problem is the lack of research on toxic comments on Swedish YouTube channels, which leaves content creators, especially new ones, unfamiliar with and unprepared for such comments. We aim to expand research in this area by finding out not only the proportion of comments on Swedish YouTube channels that are toxic, but also what types of toxic comments occur and which types are the most common. A survey of documents was the chosen research strategy, and mixed methods were used, combining qualitative and quantitative data analysis with more focus on the quantitative aspect. A random sample of 79 577 YouTube comments was collected as data, and the machine learning program Hatescan was used to generate a toxicity score for each comment, allowing us to sort the comments by score and draw a sample for manual analysis of the type of toxicity. The results show that 0.643% of the total comments analyzed were toxic. Most of the toxic comments were directed toward someone in the video, and toxic comments in the form of personal insults or remarks about someone's intelligence/competence were by far the most common.
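The workflow the abstract describes — score every comment with a toxicity model, sort by score, and draw a sample for manual labeling — can be sketched roughly as below. This is a minimal illustration only: `toxicity_score` is a hypothetical stand-in, since Hatescan's actual interface is not described in the abstract, and the keyword stub inside it is purely for demonstration.

```python
def toxicity_score(comment: str) -> float:
    """Hypothetical placeholder for the Hatescan model: flags a few
    hard-coded toxic keywords and returns a score in [0, 1]. A real
    system would call the actual model here."""
    toxic_words = {"idiot", "stupid", "hate"}
    words = comment.lower().split()
    hits = sum(w.strip(".,!?") in toxic_words for w in words)
    return min(1.0, 3 * hits / max(len(words), 1))

def analyze(comments, threshold=0.5, sample_size=10):
    """Score all comments, compute the proportion above the toxicity
    threshold, and return the top-scoring ones for manual analysis."""
    scored = [(toxicity_score(c), c) for c in comments]
    toxic = [pair for pair in scored if pair[0] >= threshold]
    proportion = len(toxic) / len(comments)
    # Sort by descending score and keep a sample for manual labeling.
    sample = [c for _, c in sorted(toxic, reverse=True)[:sample_size]]
    return proportion, sample

comments = [
    "Great video, thanks!",
    "You are a stupid idiot.",
    "Interesting point about the editing.",
]
proportion, sample = analyze(comments, threshold=0.5, sample_size=2)
print(f"Toxic proportion: {proportion:.3f}")
print("Sample for manual review:", sample)
```

The threshold and sample size here are arbitrary; in the study itself, the sorted scores were used to select which comments to read and categorize by hand.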
