Essays about: "VQA"

Showing results 1-5 of 6 essays containing the word VQA.

  1. Where to Fuse

    University essay from Lunds universitet/Matematisk statistik

    Author : Lukas Petersson; [2024]
    Keywords : Technology and Engineering;

    Abstract : This thesis investigates fusion techniques in multimodal transformer models, focusing on enhancing the capabilities of large language models in understanding not just text, but also other modalities like images, audio, and sensor data. The study compares late fusion (concatenating modality tokens after separate encoding) and early fusion (concatenating before encoding) techniques, examining their respective advantages and disadvantages. READ MORE
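A minimal sketch of the distinction this abstract draws, using a NumPy identity map as a stand-in for a real transformer encoder (all names and shapes here are illustrative, not taken from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy token sequences for two modalities, shape (seq_len, d_model).
text_tokens = rng.standard_normal((4, 8))
image_tokens = rng.standard_normal((6, 8))

def encoder(tokens):
    """Stand-in for a transformer encoder: a fixed linear map for the sketch."""
    W = np.eye(tokens.shape[-1])
    return tokens @ W

# Early fusion: concatenate modality tokens first, then encode jointly,
# so a real encoder's attention could mix modalities in every layer.
early = encoder(np.concatenate([text_tokens, image_tokens], axis=0))

# Late fusion: encode each modality separately, then concatenate,
# so cross-modal interaction only happens after encoding.
late = np.concatenate([encoder(text_tokens), encoder(image_tokens)], axis=0)

print(early.shape, late.shape)  # both (10, 8)
```

With the identity "encoder" the two outputs coincide; with a real self-attention encoder they differ, which is exactly the trade-off the thesis compares.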

  2. There’s a Microwave in the Hallway

    University essay from Göteborgs universitet/Institutionen för data- och informationsteknik

    Author : Yasmeen Emampoor; [2022-04-20]
    Keywords : embodied question answering; visual question answering; multi-modality; information fusion;

    Abstract : Embodied Question Answering (EQA) is a task in which an agent situated in a virtual environment navigates from its current position to an object (Navigation) and then answers a question about it (Visual Question Answering, VQA), for example “What color is the table in the kitchen?” This project examines how an agent modelled as a deep neural network uses semantic information from its language model together with visual information to answer questions in the second task. This is important because, due to the regular nature of the task and the dataset, the model could be answering questions purely from general semantic information in its language model (tables are frequently brown) rather than from the visual scene, a phenomenon commonly known as hallucinating. READ MORE

  3. Improving Visual Question Answering by Leveraging Depth and Adapting Explainability

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Amrita Kaur Panesar; [2022]
    Keywords : VQA; RGB-D; Explainability; Grad-CAM; Human-Robot Interaction;

    Abstract : To produce smooth human-robot interactions, it is important for robots to be able to answer users’ questions accurately and provide a suitable explanation for why they arrive at the answer they provide. However, in the wild, the user may ask the robot questions about aspects of the scene that the robot is unfamiliar with, and hence it may not always be able to answer correctly. READ MORE

  4. Implementing Perceptual Semantics in Type Theory with Records (TTR)

    University essay from Göteborgs universitet/Institutionen för filosofi, lingvistik och vetenskapsteori

    Author : Arild Matsson; [2019-11-18]
    Keywords : visual question answering; artificial intelligence; type theory; image recognition; perceptual semantics; spatial relations;

    Abstract : Type Theory with Records (TTR) provides accounts of a wide range of semantic and linguistic phenomena in a single framework. This work proposes a TTR model of perception and language. Utilizing PyTTR, a Python implementation of TTR, the model is then implemented as an executable script. READ MORE

  5. Using Deep Learning to Answer Visual Questions from Blind People

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Denis Dushi; [2019]
    Keywords : Visual Question Answering; VizWiz; Deep Learning;

    Abstract : A natural application of artificial intelligence is to help blind people overcome their daily visual challenges through AI-based assistive technologies. In this regard, one of the most promising tasks is Visual Question Answering (VQA): the model is presented with an image and a question about that image, and must predict the correct answer. READ MORE
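A toy sketch of the VQA setup the abstract describes: fuse an image representation with a question representation, then classify over a small answer vocabulary. The feature sizes, the element-wise-product fusion, and the answer list are illustrative stand-ins, not details from the thesis:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy inputs; real systems use CNN/transformer encoders.
answers = ["red", "blue", "two", "yes"]
image_feat = rng.standard_normal(16)      # e.g. pooled image features
question_feat = rng.standard_normal(16)   # e.g. pooled word embeddings

# Fuse the modalities (element-wise product is a classic VQA baseline),
# then score each candidate answer with a linear classifier.
fused = image_feat * question_feat
W = rng.standard_normal((len(answers), 16))
logits = W @ fused
prediction = answers[int(np.argmax(logits))]
print(prediction)
```

The prediction is one of the candidate answers; training would fit `W` (and the encoders) so that the highest logit matches the ground-truth answer.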