Introducing a Hierarchical Attention Transformer for document embeddings: Utilizing state-of-the-art word embeddings to generate numerical representations of text documents for classification

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Viktor Karlsson; [2019]


Abstract: The field of Natural Language Processing has produced a plethora of algorithms for creating numerical representations of words or subsets thereof. These representations encode the semantics of each unit, which enables immediate use in word-level tasks. Document-level tasks, on the other hand, require special treatment in order to generate fixed-length representations from varying-length documents. We develop the Hierarchical Attention Transformer (HAT), a neural network model which exploits the hierarchical nature of written text to create document representations. The network relies entirely on attention, which makes its inferences interpretable and allows context to be attended from anywhere within the sequence. We compare our proposed model to current state-of-the-art algorithms in three scenarios: datasets of documents whose average length is (1) less than three paragraphs, (2) greater than an entire page, and (3) greater than an entire page with a limited amount of training documents. HAT outperforms its competition in cases 1 and 2, reducing the relative error by up to 33% and 32.5%, respectively. HAT becomes increasingly difficult to optimize in case 3, where it does not perform better than its competitors.
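To make the hierarchical idea concrete, the sketch below shows one plausible way such an encoder could be structured in PyTorch: a word-level transformer attends over the words of each sentence, the resulting sentence vectors are pooled, and a sentence-level transformer attends over those vectors to produce a single fixed-length document embedding. This is a minimal illustrative sketch, assuming pretrained word embeddings as input; the class name, layer sizes, and mean-pooling scheme are assumptions for illustration, not the exact architecture described in the thesis.

    # Minimal sketch of a hierarchical attention document encoder (illustrative,
    # not the thesis author's exact HAT architecture).
    import torch
    import torch.nn as nn


    class HierarchicalAttentionEncoder(nn.Module):
        def __init__(self, embed_dim=128, num_heads=4, num_layers=2):
            super().__init__()
            word_layer = nn.TransformerEncoderLayer(
                d_model=embed_dim, nhead=num_heads, batch_first=True
            )
            sent_layer = nn.TransformerEncoderLayer(
                d_model=embed_dim, nhead=num_heads, batch_first=True
            )
            # Word-level encoder: attends over the words within each sentence.
            self.word_encoder = nn.TransformerEncoder(word_layer, num_layers=num_layers)
            # Sentence-level encoder: attends over the pooled sentence vectors.
            self.sent_encoder = nn.TransformerEncoder(sent_layer, num_layers=num_layers)

        def forward(self, word_embeddings):
            # word_embeddings: (num_sentences, words_per_sentence, embed_dim),
            # i.e. pretrained word embeddings for one document.
            word_contexts = self.word_encoder(word_embeddings)
            # Mean-pool word contexts into one vector per sentence.
            sentence_vectors = word_contexts.mean(dim=1).unsqueeze(0)
            sentence_contexts = self.sent_encoder(sentence_vectors)
            # Mean-pool sentence contexts into a fixed-length document embedding.
            return sentence_contexts.mean(dim=1).squeeze(0)


    if __name__ == "__main__":
        # Toy document: 5 sentences, 20 words each, 128-dimensional embeddings.
        doc = torch.randn(5, 20, 128)
        encoder = HierarchicalAttentionEncoder()
        print(encoder(doc).shape)  # torch.Size([128])

Because every step is attention-based, the attention weights at both levels can be inspected to see which words and sentences contributed most to the document embedding, which is the interpretability property the abstract refers to.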
