Essays about: "visual modality"
Showing result 6 - 10 of 30 essays containing the words visual modality.
-
6. Multi-modal Models for Product Similarity : Comparative evaluation of unimodal and multi-modal architectures for product similarity prediction and product retrieval
University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)
Abstract: With the rapid growth of e-commerce, enabling effective product recommendation systems and improving product search for shoppers play a crucial role in driving customer satisfaction. Traditional product retrieval approaches have mainly relied on unimodal models focusing on text data.
-
7. There’s a Microwave in the Hallway
University essay from Göteborgs universitet/Institutionen för data- och informationsteknik
Abstract: Embodied Question Answering (EQA) is a task in which an agent situated in a virtual environment navigates from its current position to an object (Navigation) and then answers a question about it (Visual Question Answering, VQA), for example “What color is the table in the kitchen?” This project examines how an agent modelled as a deep neural network uses semantic information from its language model and visual information to answer questions in the second task. This is important because, given the regular nature of the task and the dataset, the model could be answering questions purely from general semantic information in its language model (tables are frequently brown) rather than relying on the visual scene, a phenomenon commonly known as hallucination.
-
8. Audiovisual Cross-Modality in Virtual Reality
University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)
Abstract: What happens when we see an object of a certain material but the sounds it makes come from another material? Although it is an interesting question, it is an under-researched area. There has been some previous research in the field, but there the visuals have been represented using textures on simple shapes such as cubes or spheres.
-
9. News article segmentation using multimodal input : Using Mask R-CNN and sentence transformers
University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)
Abstract: In this century and the last, serious efforts have been made to digitize the content housed by libraries across the world. In order to open up these volumes to content-based information retrieval, independent elements such as headlines, body text, bylines, images and captions ideally need to be connected semantically as article-level units.
-
10. VL Tasks: Which Models Suit? : Investigate Different Models for Swedish Image-Text Relation Task
University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)
Abstract: Informally, modality refers to the number of areas a model covers. Multi-modal or cross-modal models can handle two or more areas simultaneously. Common cross-modal models include Vision-Language models, Speech-Language models, and Vision-Speech models.