Event-Centric Clustering of News Articles

University essay from Institutionen för informationsteknologi

Author: Jon Borglund; [2013]

Abstract: Entertainity AB plans to build a news service that delivers news to end users in an innovative way. The service must automatically group news articles from different sources and publications based on the stories they cover. This thesis makes three contributions: a survey of known clustering methods, an evaluation of human-versus-human results when grouping news articles in an event-centric manner, and an evaluation of an incremental clustering algorithm to determine whether a reduced input size can still give a sufficient result. The human evaluation indicates that users differ enough from one another that this difference must be taken into account when evaluating algorithms. It is also important to consider this difference when conducting cluster analysis, to avoid overfitting. The evaluation of an incremental event-centric algorithm shows that it is desirable to adjust the similarity threshold depending on the result one wants. Tests with different input sizes imply that a short summary of a news article is a natural feature selection for cluster analysis.
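
The role of the similarity threshold can be illustrated with a minimal sketch of single-pass incremental clustering. This is not the algorithm evaluated in the thesis, whose details the abstract does not give. The bag-of-words representation, the cosine similarity measure, the function names, and the default threshold of 0.3 are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its bag of words as term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def incremental_cluster(articles, threshold=0.3):
    """Assign each article, in arrival order, to its most similar cluster.

    Hypothetical helper: an article joins the best-matching cluster when the
    similarity reaches the threshold, otherwise it starts a new cluster.
    Returns one cluster index per article.
    """
    centroids = []  # summed term counts of the articles in each cluster
    labels = []
    for text in articles:
        vec = tokenize(text)
        best_idx, best_sim = -1, 0.0
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= threshold:
            centroids[best_idx].update(vec)  # Counter.update adds the counts
            labels.append(best_idx)
        else:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
    return labels


articles = [
    "Earthquake strikes coastal city",
    "Coastal city earthquake damage reported",
    "Team wins football final",
]
print(incremental_cluster(articles, threshold=0.3))
print(incremental_cluster(articles, threshold=0.9))
```

In this sketch a higher threshold splits the articles into more, smaller clusters, and a lower one merges them into fewer, broader clusters. Passing only a short summary of each article instead of its full text is one way to reduce the input size.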
