A lightweight deep learning architecture for text embedding: Comparison between the usage of Transformers and Mixers for textual embedding

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Text embedding is a widely used method for comparing pieces of text by mapping them to a compact vector space. One such application is deduplication, which consists of finding textual records that refer to the same underlying idea in order to merge them or delete one of them. The current state of the art in this domain uses the Transformer architecture trained on a large corpus of text. In this work, we evaluate the performance of a recently proposed architecture: the Mixer. It offers two key advantages: its parameter count scales linearly with the context window, and it is built from simple MLP blocks that benefit from hardware acceleration. We found a 26% increase in performance when using the Mixer compared to a Transformer of similar size.
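
To make the architectural claim concrete, below is a minimal sketch of one Mixer block in PyTorch-style Python. It is not taken from the essay: the class and parameter names (MixerBlock, seq_len, token_hidden, channel_hidden) are illustrative assumptions, and they only show why the token-mixing weights grow linearly with the context window while everything else is plain MLPs.

    # Minimal Mixer-block sketch (assumed PyTorch setup, names are illustrative).
    import torch
    import torch.nn as nn

    class MlpBlock(nn.Module):
        def __init__(self, dim, hidden):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim, hidden),
                nn.GELU(),
                nn.Linear(hidden, dim),
            )

        def forward(self, x):
            return self.net(x)

    class MixerBlock(nn.Module):
        """One Mixer layer: a token-mixing MLP over the sequence axis,
        then a channel-mixing MLP over the feature axis."""
        def __init__(self, seq_len, dim, token_hidden=256, channel_hidden=1024):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            # Token-mixing weights are (seq_len x token_hidden), so the
            # parameter count scales linearly with the context window.
            self.token_mix = MlpBlock(seq_len, token_hidden)
            self.norm2 = nn.LayerNorm(dim)
            self.channel_mix = MlpBlock(dim, channel_hidden)

        def forward(self, x):                   # x: (batch, seq_len, dim)
            y = self.norm1(x).transpose(1, 2)   # (batch, dim, seq_len)
            x = x + self.token_mix(y).transpose(1, 2)
            x = x + self.channel_mix(self.norm2(x))
            return x

    # Usage example: a batch of 2 sequences, 128 tokens, 64 features.
    block = MixerBlock(seq_len=128, dim=64)
    out = block(torch.randn(2, 128, 64))
    print(out.shape)  # torch.Size([2, 128, 64])

Both sub-blocks are ordinary dense layers, which is what makes the architecture map well onto hardware acceleration, in contrast to the attention mechanism of a Transformer.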
