BERTie Bott’s Every Flavor Labels : A Tasty Guide to Developing a Semantic Role Labeling Model for Galician

University essay from Uppsala universitet/Institutionen för lingvistik och filologi

Author: Micaella Bruton; [2023]

Keywords: natural language processing; NLP; Galician; low-resource language; low resource language; semantic role labeling; SRL; mBERT; XLM-R; transfer-learning; transfer learning; Spanish; verbal indexing; procesamento de linguaxe natural; NLP; Galego; lingua de recursos limitados; etiquetado de papeis semánticos; SRL; mBERT; XLM-R; aprendizaxe por transferencia; Español; indexación verbal; språkteknologiska verktyg; NLP; naturlig språkbehandling; galiciska; språk med begränsade resurser; semantisk rollmärkning; SRL; mBERT; XLM-R; överföringsinlärning; spanska; verbal indexering; verbalindexering; procesamiento del lenguaje natural; NLP; Gallego; idioma de bajos recursos; etiquetado de roles semánticos; SRL; mBERT; XLM-R; aprendizaje por transferencia; Español; indexación verbal;

Abstract: For the vast majority of languages, Natural Language Processing (NLP) tools are either entirely absent or leave much to be desired in their final performance. Galician is one such low-resource language, despite having nearly 4 million speakers. In an effort to expand the available NLP resources, this project sought to construct a dataset for Semantic Role Labeling (SRL) and to produce a baseline for future research to use in comparisons. SRL is a task that has been shown to improve the final output of various NLP systems, including Machine Translation and other interactive language models. The project achieved both aims, producing 24 SRL models and two SRL datasets: one Galician and one Spanish. mBERT and XLM-R were chosen as the baseline architectures; additional models were first pre-trained on the SRL task in a language other than the target to measure the effects of transfer learning. Scores are reported on a scale of 0.0 to 1.0. The best-performing Galician SRL model achieved an F1 score of 0.74, introducing a baseline for future Galician SRL systems. The best-performing Spanish SRL model achieved an F1 score of 0.83, outperforming the baseline set by the 2009 CoNLL Shared Task by 0.025. A pre-processing method, verbal indexing, was also introduced; it improved performance in the SRL parsing of highly complex sentences. The effects were more pronounced when the model was both pre-trained and fine-tuned on datasets that used the method, but were still visible when it was used only during fine-tuning.
