Machine Transcription Conversion Between Perso-Arabic and Romanized Writing Systems

University essay from Institutionen för datavetenskap

Author: Maziar Yaesoubi; [2010]

Abstract: The Perso-Arabic script is the official writing system of Iran. Romanized transcriptions, based on the phonology of Persian, have been used extensively in electronic communication, especially on the Internet. Conversion between these two types of writing systems has been a topic of interest in Natural Language Processing. As in Machine Translation, the conversion can be applied at different grammatical levels, such as the sentence, phrase, or word level. In this thesis we take Dabire as the standard Romanized transcription and introduce two approaches to conversion at the word level. The lexicon-based approach uses finite-state technology for bidirectional conversion between Perso-Arabic and Dabire. The second approach uses association analysis for statistical conversion from Perso-Arabic to Dabire.
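
The idea of a bidirectional, word-level lexicon can be illustrated with a minimal sketch. It is not the thesis implementation: a plain Python dictionary stands in for the finite-state transducer, the function names (`build_index`, `romanize`, `to_perso_arabic`) are hypothetical, and the Romanized forms are ASCII approximations chosen for illustration, not verified Dabire spellings. Because Perso-Arabic writing usually omits short vowels, one written form may correspond to several Romanized words, so each lookup returns a list of candidates.

```python
from collections import defaultdict

# Toy lexicon of (Perso-Arabic, Romanized) word pairs.
# ASSUMPTION: the Romanized forms are ASCII approximations for
# illustration only; they are not verified Dabire spellings.
LEXICON = [
    ("کتاب", "ketab"),
    ("آب", "ab"),
    ("کرم", "kerm"),   # same written form, different reading
    ("کرم", "karam"),
]


def build_index(pairs):
    """Build lookup tables for both directions from one list of pairs."""
    to_roman = defaultdict(list)
    to_script = defaultdict(list)
    for script, roman in pairs:
        to_roman[script].append(roman)
        to_script[roman].append(script)
    return dict(to_roman), dict(to_script)


TO_ROMAN, TO_SCRIPT = build_index(LEXICON)


def romanize(word):
    """Return all Romanized candidates for a Perso-Arabic word."""
    return TO_ROMAN.get(word, [])


def to_perso_arabic(word):
    """Return all Perso-Arabic candidates for a Romanized word."""
    return TO_SCRIPT.get(word, [])


if __name__ == "__main__":
    print(romanize("کرم"))
    print(to_perso_arabic("ketab"))
```

Both directions are derived from the same list of pairs, which mirrors the property that makes a finite-state transducer attractive here: a single relation can be read in either direction. A word missing from the lexicon simply yields an empty list in this sketch.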
