Automatic Handwritten Digit Recognition On Document Images Using Machine Learning Methods

University essay from Blekinge Tekniska Högskola

Abstract:

Context: The main purpose of this thesis is to build an automatic method for recognizing connected handwritten digit strings. To accomplish the recognition task, the digit strings are first segmented into individual digits. A digit recognition module then classifies each segmented digit, completing the handwritten digit string recognition task. In this study, different machine learning methods, namely SVM, ANN and CNN architectures, are used to achieve high performance on the digit string recognition problem. The SVM and ANN models are trained on HOG feature vectors and the CNN model with deep learning methods, by sliding a fixed-size window across the digit string images and labeling each sub-image as part of a digit or not. After segmentation is complete, the segmented digits are classified to achieve complete recognition of the handwritten digits.

Objective: The main objective of this thesis is to determine the recognition performance of the methods. To analyze their performance, data is needed to train the machine learning methods; the digit data is then tested with each technique. The following methods are implemented: (1) HOG feature extraction with SVM; (2) HOG feature extraction with ANN; (3) deep learning methods with CNN.

Methods: This research is carried out using two methods: first a literature review, and second an experiment. Initially, a literature review is conducted to gain a clear understanding of the algorithms and techniques to be used and to answer the first research question (RQ1), i.e., which type of data is required for the machine learning methods; the data analysis is also performed at this stage. Later, with the knowledge gained from RQ1, experimentation is conducted to answer RQ2, RQ3 and RQ4.
Quantitative data is used for the experimentation because qualitative data, which is obtained from case studies and surveys, is non-numerical and cannot be used in this experiment. In this research, an experiment is conducted to find the most suitable machine learning method among the existing methods. As stated in the objectives, the experiment uses SVM, ANN and CNN. The results are compared on the metrics considered, and CNN emerges as the most suitable method for document images.

Results: The results for SVM and ANN with HOG feature extraction and for the CNN method are compared using the segmented results. The experiment shows that SVM and ANN have drawbacks, namely low accuracy and low performance in recognizing digits in document images, whereas CNN performs better, with higher accuracy. The recognition rates of each method are: SVM 39%, ANN 37%, CNN 71%.

Conclusion: This research concentrates on providing an efficient method for automatic handwritten digit recognition. Sample training data is processed with existing machine learning and deep learning methods: SVM, ANN and CNN. The experimental results clearly show that the CNN method is considerably more effective, with a 71% recognition rate, compared with the ANN and SVM methods.

Keywords: Handwritten Digit Recognition, Handwritten Digit Segmentation, Handwritten Digit Classification, Machine Learning Methods, Deep Learning, Image Processing on Document Images, Support Vector Machine, Convolutional Neural Networks, Artificial Neural Networks
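
The sliding-window labeling mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration and not the thesis implementation: the names `sliding_windows`, `label_windows` and `ink_present` are hypothetical, the image is a plain list of pixel rows, and the trained SVM, ANN or CNN model is replaced by a trivial stand-in rule so that the example stays self-contained.

```python
from typing import Callable, Iterator, List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values


def sliding_windows(image: Image, width: int, stride: int = 1) -> Iterator[Tuple[int, Image]]:
    """Yield (x_offset, sub_image) for each full-height window of fixed width."""
    if width <= 0 or stride <= 0:
        raise ValueError("width and stride must be positive")
    n_cols = len(image[0]) if image else 0
    for x in range(0, n_cols - width + 1, stride):
        yield x, [row[x:x + width] for row in image]


def label_windows(
    image: Image,
    width: int,
    stride: int,
    is_digit_part: Callable[[Image], bool],
) -> List[Tuple[int, bool]]:
    """Label each window as part of a digit or not, using the given classifier."""
    return [(x, is_digit_part(window)) for x, window in sliding_windows(image, width, stride)]


def ink_present(window: Image) -> bool:
    """Assumed stand-in for a trained model: any non-zero pixel counts as ink."""
    return any(pixel > 0 for row in window for pixel in row)


# Toy 2x8 "digit string" image with a blank gap between two strokes.
toy = [
    [0, 1, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 0],
]
labels = label_windows(toy, width=2, stride=2, is_digit_part=ink_present)
print(labels)
```

In a real pipeline, `ink_present` would be replaced by a classifier that takes HOG features of the window (for SVM or ANN) or the raw window (for CNN), and runs of windows labeled as non-digit would mark candidate cut points between digits.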
