Matching Sticky Notes Using Latent Representations

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: his project addresses the issue of accurately identifying repeated images of sticky notes. Due to environmental conditions and the 3D location of the camera, different pictures taken of sticky notes may look distinct enough to be hard to determine if they belong to the same note. More specifically, this thesis aims to create latent representations of these pictures of sticky notes to encode their content so that all the pictures of the same note have a similar representation that allows to identify them. Thus, those representations must be invariant to light conditions, blur and camera position. To that end, a Siamese neural architecture will be trained based on data augmentation methods. The method consists of learning to embed two augmented versions of the same image into similar representations. This architecture has been trained with unsupervised learning and fine-tuned with supervised learning to detect if two representations belong or not to the same note. The performance of ResNet, EfficientNet and Vision Transformers in encoding the images into their representations has been compared with different configurations. The results show that, while the most complex models overfit small amounts of data, the simplest encoders are capable of properly identifying more than 95% of the sticky notes in grey scale. Those models can create invariant representations that are close to each other in the latent space for pictures of the same sticky note. Gathering more data could result in an improvement of the performance of the model and the possibility of applying it to other fields such as handwritten documents.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)