Analysis of online news media through visualisation and text clustering

University essay from Uppsala universitet/Informationssystem

Abstract: For several years, online news has grown in frequency and popularity as a convenient source of information. One result of this surge is increased competition among online news websites for viewership and prolonged relevance. Higher demands from internet audiences have led to the use of sensationalism, such as ‘clickbait’ articles or ‘fake news’, to attract more viewers. The subsequent shift in the journalistic approach of new media has opened new opportunities to study the behaviour and intent behind news content. Because news publications tailor their news to a specific target audience, conclusions about those outlets and their readers can be drawn from the content they choose to broadcast. To understand the reasoning behind a publication’s choice of content, this thesis uses automated text categorisation to analyse the words and phrases used by most news outlets. The thesis is a case study of approximately 143,000 online news articles from 15 different publications focused on the United States, published in 2016 and 2017. Its focus is to create a framework that observes how news articles group themselves based on the most relevant terms in their corpora. Other forms of analysis were also performed to find insights into how the news was structured over a given period of time. A preliminary quantitative analysis was conducted before data processing, and K-means clustering was then applied to the articles after cleansing. The overall categorisation approach and visual analysis provided sufficient data to support re-using this framework with further adjustments. The resulting clusters indicated that the most common news categories or genres for the selected publications were either politics, with a special focus on the U.S. presidential elections, or crime-related news within the U.S. and around the world. The visual formations of these clusters strongly implied that these two categories were also present within groups containing other genres, such as finance or infotainment. Moreover, the added factor of churning out multiple articles and stories per day suggests that mainstream online news websites continue to use broadcast journalism as their main form of communication with their audiences.
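
The clustering step described in the abstract can be illustrated with a small sketch. The abstract does not say how the articles were turned into vectors or how K-means was initialised, so everything below is an assumption made for illustration: TF-IDF term weighting, a deterministic farthest-first initialisation chosen for reproducibility, four made-up toy sentences in place of the real articles, and hypothetical helper names (`tokenize`, `tfidf_vectors`, `kmeans`, `top_terms`). It is not the thesis's implementation.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens of three or more letters."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return [w for w in cleaned.split() if len(w) >= 3]

def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors), one vector per document."""
    tokenised = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenised for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(docs)
    doc_freq = Counter(w for toks in tokenised for w in set(toks))
    vectors = []
    for toks in tokenised:
        vec = [0.0] * len(vocab)
        for w, count in Counter(toks).items():
            # Smoothed inverse document frequency (an assumed weighting scheme).
            vec[index[w]] = count * (math.log((1 + n) / (1 + doc_freq[w])) + 1.0)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Plain K-means with farthest-first initialisation (assumed, deterministic)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def top_terms(vocab, centroid, n=3):
    """Terms with the largest weight in a centroid, used to describe a cluster."""
    ranked = sorted(range(len(vocab)), key=lambda i: centroid[i], reverse=True)
    return [vocab[i] for i in ranked[:n]]

# Toy stand-ins for news articles (invented for this sketch).
docs = [
    "Election campaign: president wins vote",
    "President campaign rally before election vote",
    "Police arrest suspect after robbery",
    "Robbery suspect charged, police say",
]
vocab, vectors = tfidf_vectors(docs)
labels, centroids = kmeans(vectors, k=2)
for cluster_id in range(2):
    print(cluster_id, top_terms(vocab, centroids[cluster_id]))
```

Reading the highest-weighted terms of each centroid is one simple way to attach a genre label such as politics or crime to a cluster. On a corpus of the size described, a library implementation and a principled choice of the number of clusters would be more appropriate than this hand-rolled version.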
