Text-aided object segmentation and classification in images

University essay from Lunds universitet/Matematik LTH

Abstract: Object recognition in images is a popular research field with many applications, including medicine, robotics and face recognition. Automatically finding and identifying objects in an image is an extremely challenging task. By approaching the problem from a new angle and including information besides the visual, the problem becomes less ill-posed. In this thesis we investigate how adding text annotations to images affects the classification process. Classification was carried out on different sets of labels as well as on clusters of labels, and the results from using only visual information are compared with those obtained when information from an image description is also included. In most cases the additional information improved the classification accuracy. The results were then used to design an algorithm that, given an image with a description, finds relevant words in the text and marks their presence in the image. A large set of overlapping segments is generated and each segment is classified into a set of categories. The image descriptions are parsed by an algorithm (a so-called chunker) and visually relevant words (key-nouns) are extracted from the text. These key-nouns are then connected to the categories using metrics from WordNet. Combinatorial optimization is used to find an optimal assignment of the visual segments to the key-nouns. The resulting system was compared to manually segmented and classified images. The results are promising and have given rise to several new ideas for continued research.
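
The final step described in the abstract, assigning visual segments to key-nouns, can be illustrated with a minimal sketch. The abstract does not say which combinatorial optimization method the thesis uses, so the exhaustive search below is only an assumption chosen for clarity. The function name `assign_segments`, the score matrix and the example labels are all hypothetical; in the described system the scores would presumably be derived from the segment classifications and the WordNet metrics.

```python
from itertools import permutations


def assign_segments(scores, nouns, segments):
    """Assign each key-noun a distinct segment, maximizing the total score.

    scores[i][j] is a hypothetical similarity between key-noun i and
    segment j (higher is better). Exhaustive search is only practical
    for small inputs; it stands in for whatever method the thesis uses.
    """
    if len(nouns) > len(segments):
        raise ValueError("need at least as many segments as key-nouns")

    best_total, best_perm = float("-inf"), None
    # Try every way of giving each key-noun its own segment.
    for perm in permutations(range(len(segments)), len(nouns)):
        total = sum(scores[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm

    mapping = {nouns[i]: segments[j] for i, j in enumerate(best_perm)}
    return mapping, best_total


# Illustrative data only: two key-nouns, three candidate segments.
nouns = ["dog", "ball"]
segments = ["seg_a", "seg_b", "seg_c"]
scores = [
    [0.9, 0.2, 0.1],  # "dog" against each segment
    [0.8, 0.3, 0.7],  # "ball" against each segment
]
mapping, total = assign_segments(scores, nouns, segments)
print(mapping, total)
```

In this example both key-nouns score highest on `seg_a`, but only one of them can have it, so the search settles on the combination with the best total rather than on each noun's individual best match. For a large set of overlapping segments, a polynomial-time assignment solver would be a more realistic choice than enumerating permutations.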
