Essays about: "bilingual embeddings"

Found 5 essays containing the words bilingual embeddings.

  1. Extending a Text Classifier to Multiple Languages

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Albin Byström; [2021]
    Keywords : Natural language processing; Multilingual; Transformer; Word embeddings; Text classification; Språkteknologi; Flerspråkig; Transformator; Ordinbäddningar; Textklassificering;

    Abstract : This thesis explores the possibility of extending monolingual and bilingual text classifiers to multiple languages. Two different language models are explored: language-aligned word embeddings and a transformer model. The goal was to take a classifier based on Swedish and English samples and extend it to Danish, German, and Finnish samples.

  2. Exploring Cross-lingual Sublanguage Classification with Multi-lingual Word Embeddings

    University essay from Linköpings universitet/Statistik och maskininlärning

    Author : Min-Chun Shih; [2020]

    Abstract : Cross-lingual text classification is an important task due to globalization and the increased availability of multilingual data. This thesis explores a method for implementing cross-lingual classification on Swedish and English medical corpora.

  3. Text and Speech Alignment Methods for Speech Translation Corpora Creation : Augmenting English LibriVox Recordings with Italian Textual Translations

    University essay from Uppsala universitet/Institutionen för lingvistik och filologi

    Author : Giuseppe Della Corte; [2020]
    Keywords : speech translation; parallel corpora; bilingual sentence alignment; sentence embeddings; cosine similarity; forced alignment; text collection; corpora creation; audio signal processing;

    Abstract : The recent rise of end-to-end speech translation models requires a new generation of parallel corpora, composed of a large number of source-language speech utterances aligned with their target-language textual translations. We present a pipeline and a set of methods to collect hundreds of hours of English audiobook recordings and align them with their Italian textual translations, using exclusively public domain resources gathered semi-automatically from the web.

  4. Low Supervision, Low Corpus size, Low Similarity! Challenges in cross-lingual alignment of word embeddings : An exploration of the limitations of cross-lingual word embedding alignment in truly low resource scenarios

    University essay from Uppsala universitet/Institutionen för lingvistik och filologi

    Author : Andrew Dyer; [2019]
    Keywords : word embeddings; cross-lingual; multilingual; low-resource; corpus size; Vecmap; FastText; alignment; orthogonal; eigenvalues; Laplacian; isospectral; isomorphic; bilingual lexicon induction;

    Abstract : Cross-lingual word embeddings are an increasingly important resource in cross-lingual methods for NLP, particularly for their role in transfer learning and unsupervised machine translation, purportedly opening up the opportunity for NLP applications for low-resource languages. However, most research in this area implicitly expects the availability of vast monolingual corpora for training embeddings, a scenario which is not realistic for many of the world's languages.

  5. Word embeddings for monolingual and cross-language domain-specific information retrieval

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Chaya Wigder; [2018]
    Keywords : information retrieval; domain-specific information retrieval; cross-language information retrieval; word embeddings; bilingual embeddings; informationssökning; domänspecifik informationssökning; tvärspråklig informationssökning; ordinbäddningar; tvåspråkiga inbäddningar;

    Abstract : Various studies have shown the usefulness of word embedding models for a wide variety of natural language processing tasks. This thesis examines how word embeddings can be incorporated into domain-specific search engines for both monolingual and cross-language search.