Essays about: "Visual Question Answering"

Showing results 1 - 5 of 12 essays containing the words Visual Question Answering.

  1. Where to Fuse

    University essay from Lunds universitet/Matematisk statistik

    Author : Lukas Petersson; [2024]
    Keywords : Technology and Engineering;

    Abstract : This thesis investigates fusion techniques in multimodal transformer models, focusing on enhancing the capabilities of large language models in understanding not just text, but also other modalities like images, audio, and sensor data. The study compares late fusion (concatenating modality tokens after separate encoding) and early fusion (concatenating before encoding) techniques, examining their respective advantages and disadvantages.

  2. COMPLEXITY & RANDOMNESS Exploring the Limits of Pattern Perception

    University essay from Institutionen för tillämpad informationsteknologi

    Author : Beppe Rådvik; Joel Pettersson; [2023-02-01]
    Keywords : pattern perception; complexity; randomness; Aksentijevic-Gibson complexity; Visual short-term memory;

    Abstract : Pattern perception is a core part of human cognition; however, our capacity to process patterns is limited. If a pattern is too complex to process, we no longer perceive it as a pattern but rather as noise. We therefore hypothesize that there is a limit to human pattern perception that can be measured in terms of the complexity of the pattern.

  3. There’s a Microwave in the Hallway

    University essay from Göteborgs universitet/Institutionen för data- och informationsteknik

    Author : Yasmeen Emampoor; [2022-04-20]
    Keywords : embodied question answering; visual question answering; multi-modality; information fusion;

    Abstract : Embodied Question Answering (EQA) is a task in which an agent situated in a virtual environment navigates from its current position to an object (Navigation) and then answers a question about it (Visual Question Answering, VQA), for example “What color is the table in the kitchen?” This project examines how an agent modelled as a deep neural network uses semantic information from its language model and visual information to answer questions in the second task. This matters because, given the regular nature of the task and the dataset, the model could be answering questions purely from general semantic information in its language model (tables are frequently brown) rather than relying on the visual scene, a phenomenon commonly known as hallucinating.

  4. Improving Visual Question Answering by Leveraging Depth and Adapting Explainability

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Amrita Kaur Panesar; [2022]
    Keywords : VQA; RGB-D; Explainability; Grad-CAM; Human-Robot Interaction;

    Abstract : To produce smooth human-robot interactions, it is important for robots to be able to answer users’ questions accurately and provide a suitable explanation for why they arrive at the answer they provide. However, in the wild, the user may ask the robot questions relating to aspects of the scene that the robot is unfamiliar with, and hence it may be unable to answer correctly all of the time.

  5. EMBODIED QUESTION ANSWERING IN ROBOTIC ENVIRONMENT Automatic generation of a synthetic question-answer data-set

    University essay from Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

    Author : Ali Aruqi; [2021-11-12]
    Keywords : Embodied Question Answering; Question Generation; Spatial Relations; Synthetic Data-sets; Multi-Modality;

    Abstract : Embodied question answering is the task of asking a robot about objects in a 3D environment. The robot has to navigate the environment, find the entities in question, and then stop to answer the question. The answering system consists of navigation and visual-question-answering components.