Predicting Swedish News Article Popularity

University essay from Linköpings universitet/Interaktiva och kognitiva system

Abstract: In this work, 132,229 articles from a Swedish news publisher are used to explore news article popularity prediction. Linear regression, k-nearest neighbor regression and support vector regression are evaluated using two metrics: root mean squared error (RMSE) and R². The problem is then relaxed to ranking the articles relative to each other. The prediction problem is also explored as a classification problem with the classes Low, Mid and High popularity. The classifiers evaluated are Naive Bayes and SVM, each using both a set of pre-defined features and a bag-of-words feature set. The results were analyzed to understand what information they can offer editors at the publisher and at news agencies in general. They clearly showed that the manually set metadata field newsvalue had a large impact on article performance. A survey was conducted with editors to compare human prediction performance with classifier performance. Although the SVM classifier achieves higher accuracy than the editors (59% vs 55%), the models are considered weak in their current state.
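
For readers unfamiliar with the two regression metrics named in the abstract, the following is a minimal sketch of how root mean squared error and R2 are commonly computed. The function names and the view counts are hypothetical and chosen only for illustration; they are not taken from the essay, and the essay's own implementation may differ.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical article view counts and model predictions (illustrative only).
views_true = [100.0, 200.0, 300.0, 400.0]
views_pred = [110.0, 190.0, 310.0, 390.0]

print(rmse(views_true, views_pred))
print(r2(views_true, views_pred))
```

RMSE is expressed in the same unit as the target (here, views), so lower is better, while R2 compares the model against always predicting the mean, so values closer to 1 are better.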
