Leveraging Dominant Language Image Tags for Automatic Image Annotation in Minor Languages

University essay from Institutionen för informationsteknologi (Department of Information Technology)

Author: Hjalmar Wennerström; [2010]


Abstract: Image annotations, often in the form of tags, are very useful when indexing large image collections. They provide an intuitive, human-centered way to search and browse images using text queries. However, tagging images manually is very time-consuming, so researchers have developed methods for automatic image tagging. These methods rely on a set of example images with tags to learn which tags should be associated with which images. One aspect these systems have overlooked is that the available example images and tags differ from language to language. Researchers have generally built automatic tagging systems only for English and have not considered the problems of building equally good systems in minor languages, where example images and tags are more difficult to obtain. In this thesis we study how an automatic tagging system in Japanese compares to one in English. We find that the Japanese system performs worse, and based on this we improve its performance by leveraging the dominant English-language system. We compare automatic translation of the tags using a dictionary with our proposed translation matrix method, which estimates the translation of tags from the co-occurrence of tags in different languages on the same images. We show that our proposed method, using very simple heuristics, performs about as well as a high-end machine translator in the context of automatic tagging systems. Several improvements remain to be made, but this work shows that the conceptual idea is strong and worth developing further. The main contribution of our approach is its ability to translate words that a dictionary cannot interpret, as well as to take context into account when establishing a translation.
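The abstract does not spell out the heuristics behind the translation matrix, so the following is only a minimal Python sketch of one plausible reading, not the thesis's actual method. It assumes that each training image carries both English and Japanese tags, that co-occurrence counts are row-normalized per English tag, and that the scores of an English auto-tagger are then mapped to Japanese tags by a matrix product. The toy data and the names `IMAGES`, `build_translation_matrix`, and `translate_scores` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data (not from the thesis): each image has
# a set of English tags and a set of Japanese tags.
IMAGES = [
    ({"dog", "park"}, {"犬", "公園"}),
    ({"dog", "beach"}, {"犬", "海"}),
    ({"cat"}, {"猫"}),
    ({"beach", "sunset"}, {"海", "夕日"}),
]


def build_translation_matrix(images):
    """Estimate a sparse matrix T[e][j] from bilingual tag co-occurrence.

    Counts the images carrying both English tag e and Japanese tag j,
    then normalizes each English row so its entries sum to 1
    (an assumed heuristic, read as a distribution over Japanese tags given e).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for en_tags, ja_tags in images:
        for e in en_tags:
            for j in ja_tags:
                counts[e][j] += 1
    matrix = {}
    for e, row in counts.items():
        total = sum(row.values())
        matrix[e] = {j: c / total for j, c in row.items()}
    return matrix


def translate_scores(en_scores, matrix):
    """Map English tag scores to Japanese ones: s_ja[j] = sum_e s_en[e] * T[e][j]."""
    ja_scores = defaultdict(float)
    for e, score in en_scores.items():
        # English tags never seen in the bilingual data contribute nothing.
        for j, p in matrix.get(e, {}).items():
            ja_scores[j] += score * p
    return dict(ja_scores)


if __name__ == "__main__":
    T = build_translation_matrix(IMAGES)
    # Hypothetical output of an English auto-tagger for one new image.
    en_scores = {"dog": 0.9, "beach": 0.4}
    ja = translate_scores(en_scores, T)
    for tag, score in sorted(ja.items(), key=lambda kv: -kv[1]):
        print(tag, round(score, 3))
```

Unlike a one-to-one dictionary lookup, a matrix of this kind spreads each English score over every Japanese tag it co-occurs with. In principle this lets it map tags a dictionary lacks, and it lets the surrounding image context shape the translation.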
