The Impact of Scaling Down a Language Model Used for Text Summarization

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Marcus Lindström; [2020]

Abstract: Machine learning based language models have achieved state-of-the-art results on a variety of tasks. However, their use in commercial settings is becoming increasingly difficult to justify due to their ever-growing size. The usual culprit is the tradeoff between performance and training/inference time, which often leads to less sophisticated solutions being used instead. In this thesis, I investigate how a trained Transformer-based language model specialized in generating sentence embeddings can be scaled down using a technique known as Knowledge Distillation. I evaluate both the original model and its distilled counterpart on the SentEval STS benchmarks, as well as through human evaluation of extractive summaries generated by clustering their embeddings. My results show that a 7.5 times smaller model not only operates at over twice the speed but also achieves almost 98% of the original model’s average result on the STS tasks. Furthermore, the human evaluation shows that summaries generated by the smaller model are subjectively rated as significantly better.
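The abstract describes extractive summarization by clustering sentence embeddings. The following is a minimal sketch of that general approach, assuming the sentence-transformers and scikit-learn libraries; the model name and function are illustrative only and not taken from the thesis itself.

```python
# Sketch: extractive summarization by clustering sentence embeddings.
# Assumes sentence-transformers and scikit-learn are installed; the model
# name below is a placeholder, not the model used in the thesis.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def summarize(sentences, num_sentences=3):
    model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model
    embeddings = model.encode(sentences)             # one vector per sentence
    kmeans = KMeans(n_clusters=num_sentences, n_init=10).fit(embeddings)

    # For each cluster, keep the sentence whose embedding lies closest to the centroid.
    chosen = []
    for centroid in kmeans.cluster_centers_:
        idx = int(np.argmin(np.linalg.norm(embeddings - centroid, axis=1)))
        chosen.append(idx)

    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(set(chosen))]
```

A smaller distilled embedding model can be dropped into the same pipeline, which is what makes the speed/quality tradeoff studied in the thesis practically relevant.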
