Unstructured pruning of pre-trained language models tuned for sentiment classification.

University essay from KTH/Mathematical Statistics

Abstract: Transformer-based models are frequently used in natural language processing. These models are often large and pre-trained for general language understanding, then fine-tuned for a specific task. Because these models are large, they have high memory requirements and high inference times. Several model compression techniques have been developed to mitigate these disadvantages without significantly reducing the inference performance of the models. This thesis studies unstructured pruning methods, which are pruning methods that do not follow a predetermined pattern when removing parameters, to understand which parameters can be removed from language models and the impact of removing a significant portion of a model's parameters. Specifically, magnitude pruning, movement pruning, soft movement pruning, and $L_0$ regularization were applied to the pre-trained language models BERT and M-BERT. The pre-trained models were in turn fine-tuned for sentiment classification tasks, which refers to classifying a given sentence into predetermined labels, such as positive or negative. Magnitude pruning worked best when pruning the models to 15\% of their original parameters, while soft movement pruning worked best at a weight ratio of 3\%. For movement pruning, we were not able to achieve satisfactory results for binary sentiment classification. From investigating the pruning patterns derived from soft movement pruning and $L_0$ regularization, it was found that a large portion of the parameters in the last transformer blocks of the model architecture could be removed without significantly reducing model performance. An example of interesting further work is to remove the last transformer blocks altogether and investigate whether an increase in inference speed is attained without significantly reducing performance.
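To illustrate the unstructured pruning setting described above, the following is a minimal sketch of magnitude pruning for a PyTorch model (e.g. a fine-tuned BERT from the transformers library). The 15\% keep ratio comes from the abstract; the function name, the choice of which parameters to prune, and the global thresholding strategy are illustrative assumptions, not the thesis's exact procedure.

```python
# Minimal sketch of unstructured magnitude pruning (assumptions noted above).
import torch


def magnitude_prune(model: torch.nn.Module, keep_ratio: float = 0.15) -> None:
    """Zero out all but the `keep_ratio` fraction of weights with the largest magnitude."""
    # Collect all weight matrices; biases and other 1-D parameters are left untouched here.
    weights = [p for _, p in model.named_parameters() if p.dim() > 1]
    all_magnitudes = torch.cat([p.detach().abs().flatten() for p in weights])

    # Global threshold: the smallest magnitude among the top-k weights that are kept.
    k = max(1, int(keep_ratio * all_magnitudes.numel()))
    threshold = torch.topk(all_magnitudes, k, largest=True).values.min()

    # Unstructured masking: individual weights below the threshold are set to zero,
    # with no predetermined structural pattern (rows, heads, or blocks).
    with torch.no_grad():
        for p in weights:
            p.mul_((p.abs() >= threshold).to(p.dtype))
```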
