Handwritten Recognition for Ethiopic (Ge’ez) Ancient Manuscript Documents

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: The handwritten recognition system is a process of learning a pattern from a given image of text. The recognition process usually combines a computer vision task with sequence learning techniques. Transcribing texts from the scanned image remains a challenging problem, especially when the documents are highly degraded, or have excessive dusty noises. Nowadays, there are several handwritten recognition systems both commercially and in free versions, especially for Latin based languages. However, there is no prior study that has been built for Ge’ez handwritten ancient manuscript documents. In contrast, the language has many mysteries of the past, in human history of science, architecture, medicine and astronomy. In this thesis, we present two separate recognition systems. (1) A character-level recognition system which combines computer vision for character segmentation from ancient books and a vanilla Convolutional Neural Network (CNN) to recognize characters. (2) An end- to- end segmentation free handwritten recognition system using CNN, Multi-Dimensional Recurrent Neural Network (MDRNN) with Connectionist Temporal Classification (CTC) for the Ethiopic (Ge’ez) manuscript documents. The proposed character label recognition model outperforms 97.78% accuracy. In contrast, the second model provides an encouraging result which indicates to further study the language properties for better recognition of all the ancient books. 

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)