Integration of Image and Word Embeddings for Descriptive Image Similarity

University essay from Lunds universitet/Matematik LTH

Abstract: Many people today possess a private digital photo collection. Such collections are often sorted only chronologically. One way to make photo browsing more interesting would be to suggest photos that are semantically related to the one currently being viewed, and if such a relationship could also be justified in words, that would create extra value for the user. To address this problem, this thesis bridges the domains of images and language by creating vector embeddings for images in an already existing semantic vector space for words. Transfer learning is applied to a convolutional neural network originally trained for object detection: the network takes an image as input and is trained to output the vector representation of a corresponding human-written caption. The transfer and training are carried out in the machine learning framework TensorFlow. The approach shows promising performance overall, and different layouts are compared thoroughly. The best model is tested qualitatively as well as quantitatively, both through a task-specific custom evaluation scheme and on common benchmark datasets. The conclusion based on these results is that the proposed system is well suited to the given tasks and that it opens the way to a number of interesting extensions.
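
The idea of placing images in a word vector space can be illustrated with a minimal sketch. It is not the thesis implementation: the toy vectors, the three-word vocabulary, the use of cosine similarity, and the helper names `related_photos` and `shared_words` are all assumptions made for illustration. In the thesis, the image vectors would come from the trained network and the word vectors from the existing word embedding.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def related_photos(query_idx, image_vecs, k=2):
    """Indices of the k photos closest to the query photo (hypothetical helper)."""
    sims = cosine_similarity(image_vecs[query_idx:query_idx + 1], image_vecs)[0]
    sims[query_idx] = -np.inf  # never suggest the query photo itself
    return np.argsort(-sims)[:k].tolist()

def shared_words(idx_a, idx_b, image_vecs, word_vecs, vocab, k=2):
    """Words closest to both photos, as a verbal justification (hypothetical helper)."""
    unit = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    mid = unit[idx_a] + unit[idx_b]  # direction the two photos have in common
    sims = cosine_similarity(mid[None, :], word_vecs)[0]
    return [vocab[i] for i in np.argsort(-sims)[:k]]

# Toy data (assumed): three words and three photos in a 3-dimensional space.
vocab = ["beach", "dog", "mountain"]
word_vecs = np.eye(3)
image_vecs = np.array([
    [0.9, 0.1, 0.0],  # photo 0: mostly "beach"
    [0.8, 0.2, 0.1],  # photo 1: mostly "beach"
    [0.0, 0.1, 0.9],  # photo 2: mostly "mountain"
])

nearest = related_photos(0, image_vecs, k=1)
reason = shared_words(0, nearest[0], image_vecs, word_vecs, vocab, k=1)
print(nearest, reason)
```

Because images and words share one space, the same similarity measure serves both to suggest a related photo and to pick words that describe what the two photos have in common.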
