Developing Optical Character Recognition for Ethiopic Scripts

University essay from Högskolan Dalarna/Datateknik

Author: Fitsum Demissie; [2011]

Keywords: Ethiopic; Geez; Amharic; SVM; OCR; Latin; Non-Latin

Abstract:

The Amharic language is the official language of over 70 million people, mainly in Ethiopia. An extensive literature survey and a government report reveal that no Amharic character recognition system is available in the country. The Amharic script has 33 basic characters, each with seven orders, giving 310 distinct characters, including numbers and punctuation symbols. The characters are visually similar; there is a typeface, but no capitalization. In addition, there is no standard font for using the language on a computer: different stakeholders have developed different fonts in their own way and for their own interests, without following a common standard, which creates incompatibility between fonts and documents. This project investigates why Amharic optical character recognition has not been addressed by local and international researchers and developers, and then develops an Amharic optical character recognition system that uses the features and facilities of Microsoft Windows Vista or 7 and the Unicode standard.
