Essays about: "visual modality"

Showing results 6-10 of 30 essays containing the words visual modality.

  6. Multi-modal Models for Product Similarity : Comparative evaluation of unimodal and multi-modal architectures for product similarity prediction and product retrieval

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Christos Frantzolas; [2023]
    Keywords : Computer Vision; Natural Language Processing; Representation Learning; Metric Learning; Multimodal Retrieval; Bildigenkänning; Språkteknologi; Representationsinlärning; Metrisk inlärning; Multimodal informationssökning;

    Abstract : With the rapid growth of e-commerce, enabling effective product recommendation systems and improving product search for shoppers plays a crucial role in driving customer satisfaction. Traditional product retrieval approaches have mainly relied on unimodal models focusing on text data.

  7. There’s a Microwave in the Hallway

    University essay from Göteborgs universitet/Institutionen för data- och informationsteknik

    Author : Yasmeen Emampoor; [2022-04-20]
    Keywords : embodied question answering; visual question answering; multi-modality; information fusion;

    Abstract : Embodied Question Answering (EQA) is a task in which an agent situated in a virtual environment navigates from its current position to an object (Navigation) and then answers a question about it (Visual Question Answering, VQA), for example “What color is the table in the kitchen?” This project examines how an agent modelled as a deep neural network uses semantic information from its language model and visual information to answer questions in the second task. This is important because, given the regular nature of the task and the dataset, the model could be answering questions purely based on general semantic information from its language model (tables are frequently brown) rather than relying on the visual scene, a phenomenon commonly known as hallucinating.

  8. Audiovisual Cross-Modality in Virtual Reality

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Samuel Sandberg Bröms; Emil Hansen; [2022]
    Keywords : Cross-modality; Materials; Virtual Reality; VR; Audiovisual; Audio; Visual;

    Abstract : What happens when we see an object of a certain material but the sounds that it makes come from another material? Whilst it is an interesting question, it is an under-researched area. Though there has been some previous research in the field, the visuals have been represented using textures on simple shapes like cubes or spheres.

  9. News article segmentation using multimodal input : Using Mask R-CNN and sentence transformers

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Gustav Henning; [2022]
    Keywords : Historical newspapers; Image segmentation; Multimodal learning; Deep learning; Digital humanities; Mask R-CNN; Historiska tidningar; Bildsegmentering; Multimodal inlärning; Djupinlärning; Digital humaniora; Mask R-CNN;

    Abstract : In this century and the last, serious efforts have been made to digitize the content housed by libraries across the world. In order to open up these volumes to content-based information retrieval, independent elements such as headlines, body text, bylines, images and captions ideally need to be connected semantically as article-level units.

  10. VL Tasks: Which Models Suit? : Investigate Different Models for Swedish Image-Text Relation Task

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Meinan Gou; [2022]
    Keywords : BERT; Visual-Language; Language Understanding; Object Detection; Multimodality; BERT; Visual-Language; Språkförståelse; Objektdetektion; Multimodalitet;

    Abstract : Informally, modality refers to the number of areas a model covers. Multi-modal or cross-modal models can handle two or more areas simultaneously. Some common cross-modal models include Vision-Language models, Speech-Language models, and Vision-Speech models.