Sequential Aggregation of Textual Features for Domain Independent Author Identification

University essay from KTH/Skolan för datavetenskap och kommunikation (CSC)

Author: Linda Eriksson; [2014]

Abstract: In the area of Author Identification, many approaches have been proposed to identify the author of a written text. By identifying the individual variation that can be found in texts, features can be calculated. These feature values are commonly normalized to an average value over the whole text. With Simple features of this kind, much of the variation that can be found in texts is not captured. This project intends to use the sequential nature of the text to define Sequential features at sentence level. The theory is that the Sequential features will be able to capture more of the variation that can be found in the texts, compared to the Simple features. To evaluate these features, a classification of authors was made on several different datasets. The results showed that the Sequential features perform better than the Simple features in some cases; however, the difference was not large enough to confirm the theory that they are better than the Simple features.
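
The contrast between a whole-text average and sentence-level aggregation can be illustrated with a minimal sketch. It is not the method used in the essay: the chosen feature (sentence length in words), the naive sentence splitter, and the function names `simple_feature` and `sequential_features` are all assumptions made for illustration only.

```python
import re
from statistics import mean, pstdev


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_lengths(text):
    # One value per sentence, in reading order: the number of words.
    return [len(s.split()) for s in split_sentences(text)]


def simple_feature(text):
    # "Simple" style: a single average over the whole text.
    lengths = sentence_lengths(text)
    return mean(lengths) if lengths else 0.0


def sequential_features(text):
    # "Sequential" style (hypothetical aggregation): keep the per-sentence
    # sequence and summarise how the value varies from sentence to sentence.
    lengths = sentence_lengths(text)
    if not lengths:
        return {"mean": 0.0, "std": 0.0, "mean_abs_change": 0.0}
    changes = [abs(b - a) for a, b in zip(lengths, lengths[1:])]
    return {
        "mean": mean(lengths),
        "std": pstdev(lengths),
        "mean_abs_change": mean(changes) if changes else 0.0,
    }


even = "One two three. One two three. One two three."
uneven = "One. One two three four five. One two three."

# Both texts have the same average sentence length, so the simple
# feature cannot tell them apart, while the sequential summary can.
print(simple_feature(even), simple_feature(uneven))
print(sequential_features(even))
print(sequential_features(uneven))
```

In this toy case the two texts share the same average sentence length, so only the sentence-level statistics separate them. Whether such a difference helps classification on real data is exactly what the essay evaluates.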
