Clustering of Image Search Results to Support Historical Document Recognition

University essay from Blekinge Tekniska Högskola/Institutionen för datalogi och datorsystemteknik

Abstract: Context. Image searching in historical handwritten documents is a challenging problem in computer vision and pattern recognition. The number of digitized documents grows every day, and finding occurrences of a selected sub-image in a collection of documents is of special interest to historians and genealogists. Objectives. This thesis develops a technique for image searching in historical documents. The technique has three phases. First, the document is segmented into sub-images, one for each word on the page. Second, each sub-image is described by a feature vector of measurable attributes of its content. Third, a clustering algorithm computes the distances between these vectors to decide which images match the one selected by the user. Methods. The research methodology is experimentation. A quasi-experiment is designed based on repeated measures over a single group of data. The image processing, feature selection, and clustering approach are the independent variables, whereas the accuracy measurements are the dependent variable. This design provides a set of interrelated outcome measurements. Results. The statistical analysis uses the F1 score to measure the accuracy of the experimental results, based on the true positives, false positives, and false negatives detected. The average F-measure for the experiment conducted is F1 = 0.59, whereas the actual performance value of the method is a matching ratio of 66.4%. Conclusions. This thesis provides a starting point for developing a search engine for historical document collections based on pattern recognition. The main research findings concern image enhancement and segmentation for degraded documents, and image matching based on feature definition and cluster analysis.
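As a brief illustration of the accuracy measure named in the abstract, the F1 score can be computed directly from counts of true positives, false positives, and false negatives. The sketch below uses the standard definition of F1. The function name and the example counts are hypothetical and are not taken from the thesis.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 score: the harmonic mean of precision and recall.

    Equivalent form: F1 = 2*TP / (2*TP + FP + FN).
    Returns 0.0 when there are no positives of any kind,
    which avoids a division by zero.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts for a single query word (illustrative only):
# 6 correct matches, 2 wrong matches, 4 missed occurrences.
score = f1_score(tp=6, fp=2, fn=4)
print(f"F1 = {score:.2f}")
```

An average F-measure such as the one reported in the abstract would typically be obtained by averaging this per-query score over all queries, although the abstract does not state the exact averaging procedure.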
