News Value Modeling and Prediction using Textual Features and Machine Learning
Abstract: News value assessment has been done forever in the news media industry and is today often done in real-time without any documentation. Editors take a lot of different qualitative aspects into consideration when deciding what news stories will make it to the first page. This thesis explores how the complex news value assessment process can be translated into a quantitative model, and also how those news values can be predicted in an effective way using machine learning and NLP. Two models for news value were constructed, for which the correlation between modeled and manual news values was measured, and the results show that the more complex model gives a higher correlation. For prediction, different types of features are extracted, Random Forest and SVM are used, and the predictions are evaluated with accuracy, F1-score, RMSE, and MAE. Random Forest shows the best results for all metrics on all datasets, the best result being on the largest dataset, probably due to the smaller datasets having a less even distribution between classes.
AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)