Automation of Editorial Tasks on the Website Content Central

University essay from Umeå universitet/Institutionen för datavetenskap

Author: Markus Sköld; [2016]


Abstract: Content Central is a website that allows freelance journalists and photographers to upload their work so that media outlets can buy and publish it. Content Central must moderate the uploaded content to ensure that everything is of high quality and can be published directly. At present this is done manually by an editor who works at Content Central. The aim of this thesis is to automate the editorial process on Content Central using natural language processing techniques. The automation focuses on the tasks that consume the most time: spell checking, formatting, and word and sign replacement. These tasks are automated through the development of prototypes. Spell checking is handled by two prototypes: one uses a dictionary and handles non-word errors, and the other uses probabilities over word trigrams and bigrams to handle real-word errors. Formatting and sign replacement are handled by a rule-based prototype. The prototypes are tested on data from Content Central and compared with the results of the editor moderating the same data. Problems are found with the spell checkers: they give many false positives and are therefore deemed not very useful. The formatting and sign replacement prototype achieves 52.8% recall and 98.6% precision, which is estimated to decrease the time the editor spends on content with these errors by at least 51 seconds.
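
The abstract does not give implementation details, so the following is only a minimal sketch of the general ideas it names: dictionary-based detection of non-word errors, rule-based sign replacement, and the precision and recall measures used in the evaluation. The word list, the replacement rules, and all function names are hypothetical and are not taken from the thesis.

```python
import re

# Hypothetical word list; a real checker would load a full dictionary.
DICTIONARY = {"the", "editor", "checks", "every", "article"}

# Hypothetical (pattern, replacement) rules, applied in order.
RULES = [
    (re.compile(r" {2,}"), " "),            # collapse repeated spaces
    (re.compile(r"\s+([,.;:!?])"), r"\1"),  # drop whitespace before punctuation
    (re.compile(r"\.{3}"), "…"),            # three dots -> ellipsis character
]


def non_word_errors(text):
    """Return lowercased tokens that are not in the dictionary."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in DICTIONARY]


def apply_rules(text):
    """Apply each replacement rule to the text in sequence."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


def precision_recall(tp, fp, fn):
    """Compute (precision, recall) from counts of true positives,
    false positives and false negatives (assumes non-zero denominators)."""
    return tp / (tp + fp), tp / (tp + fn)
```

A dictionary lookup of this kind can only flag words that do not exist at all. A misspelling that happens to be another valid word (a real-word error) passes unnoticed, which is why the abstract describes a separate prototype based on word bigrams and trigrams for that case.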
