Essays about: "Statistical Machine Translation"

Showing results 1 - 5 of 12 essays containing the words Statistical Machine Translation.

  1. Syntax-based Concept Alignment for Machine Translation

    University essay from Göteborgs universitet/Institutionen för data- och informationsteknik

    Author : Arianna Masciolini; [2023-03-30]
    Keywords : computational linguistics; machine translation; concept alignment; syntax; dependency parsing; Universal Dependencies; Grammatical Framework;

    Abstract : This thesis presents a syntax-based approach to Concept Alignment (CA), the task of finding semantical correspondences between parts of multilingual parallel texts, with a focus on Machine Translation (MT). Two variants of CA are taken into account: Concept Extraction (CE), whose aim is to identify new concepts by means of mere linguistic comparison, and Concept Propagation (CP), which consists in looking for the translation equivalents of a set of known concepts in a new language.

  2. Text and Speech Alignment Methods for Speech Translation Corpora Creation : Augmenting English LibriVox Recordings with Italian Textual Translations

    University essay from Uppsala universitet/Institutionen för lingvistik och filologi

    Author : Giuseppe Della Corte; [2020]
    Keywords : speech translation; parallel corpora; bilingual sentence alignment; sentence embeddings; cosine similarity; forced alignment; text collection; corpora creation; audio signal processing;

    Abstract : The recent rise of end-to-end speech translation models requires a new generation of parallel corpora, composed of a large number of source language speech utterances aligned with their target language textual translations. We present a pipeline and a set of methods to collect hundreds of hours of English audio-book recordings and align them with their Italian textual translations, using exclusively public domain resources gathered semi-automatically from the web.

  3. Spelling Normalization of English Student Writings

    University essay from Uppsala universitet/Institutionen för lingvistik och filologi

    Author : Yuchan HONG; [2018]
    Keywords : spelling normalization; English student writings; phonetic similarity comparison; Levenshtein edit distance; character-based statistical machine translation; character-based neural machine translation;

    Abstract : Spelling normalization is the task of normalizing non-standard words into standard words in texts. It reduces the number of out-of-vocabulary (OOV) words in texts for natural language processing (NLP) tasks such as information retrieval, machine translation, and opinion mining, improving the performance of various NLP applications on normalized texts. In this thesis, we explore different methods for spelling normalization of English student writings, including traditional Levenshtein edit distance comparison, phonetic similarity comparison, character-based Statistical Machine Translation (SMT) and character-based Neural Machine Translation (NMT) methods.

  4. Automatic Identification of Duplicates in Literature in Multiple Languages

    University essay from Linköpings universitet/Statistik och maskininlärning

    Author : Emil Klasson Svensson; [2018]
    Keywords : Text Mining; Topic Model; Polylingual; PLT; Named Entity Recognition; NER; Statistics; Machine Learning; Duplicate Detection; Literature; Fiction; Books; Book; Natural Language Processing; NLP;

    Abstract : As the number of books available online grows, the collections that hold them are growing larger at the same pace and more commonly span multiple languages. Many of these corpora contain duplicates in the form of various editions or translations of books.

  5. Hybrid Machine Translation : Choosing the best translation with Support Vector Machines

    University essay from Uppsala universitet/Institutionen för informationsteknologi

    Author : Hannes Karlbom; [2016]

    Abstract : In the field of machine translation there are various systems available, each with different strengths and weaknesses. This thesis investigates the combination of two systems, a rule-based one and a statistical one, to see if such a hybrid system can provide higher quality translations.