Essays about: "multimodal input"

Showing result 1 - 5 of 24 essays containing the words multimodal input.

  1. Where to Fuse

    University essay from Lunds universitet/Matematisk statistik

    Author : Lukas Petersson; [2024]
    Keywords : Technology and Engineering;

    Abstract : This thesis investigates fusion techniques in multimodal transformer models, focusing on enhancing the capabilities of large language models in understanding not just text, but also other modalities like images, audio, and sensor data. The study compares late fusion (concatenating modality tokens after separate encoding) and early fusion (concatenating before encoding) techniques, examining their respective advantages and disadvantages.

  2. Automated Interpretation of Lung Ultrasound for COVID-19 and Tuberculosis diagnosis

    University essay from Lunds universitet/Matematik LTH

    Author : Chloé Soormally; [2023]
    Keywords : Tuberculosis; COVID-19; Lung Ultrasound; Computer-aided detection (CAD); Deep learning; Technology and Engineering;

    Abstract : BACKGROUND. Early and accurate detection of infectious respiratory diseases like COVID-19 and tuberculosis (TB) plays a crucial role in effective management and the reduction of preventable mortality.

  3. Real-time visual feedback of emotional expression in singing

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Xuehua Fu; [2023]
    Keywords : Sound and Music Computing; Music Visualization; Cross-modal Feedback; Mapping; Music Expressivity; Multimodal Interaction; Ljud-och musikberäkning; Musikvisualisering; Korsmodal återkoppling; Kartläggning; Musikexpressivitet; Multimodal interaktion;

    Abstract : The thesis project concerns the development and evaluation of a real-time music visualization system aimed at creating a multi-modal perceptual experience of music emotions. The purpose of the project is to provide singers with real-time visual feedback on their singing, to enhance their expression of emotions in the music.

  4. Text-Driven Fashion Image Manipulation with GANs : A case study in full-body human image manipulation in fashion

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Reza Dadfar; [2023]
    Keywords : Multimodal fashion image editing; Generative adversarial network inversion; Text-driven image manipulation; TD-GEM; Multimodal modebildredigering; Generativa adverserial Nätverk inversion; Text-driven bildmanipulation; TD-GEM;

    Abstract : Language-based fashion image editing has promising applications in design, sustainability, and art. However, it is considered a challenging problem in computer vision and graphics. The diversity of human poses and the complexity of clothing shapes and textures make the editing problem difficult.

  5. Playstyle Generation with Multimodal Generative Adversarial Imitation Learning : Style-reward from Human Demonstration for Playtesting Agents

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : William Ahlberg; [2023]
    Keywords : Imitation Learning; Reinforcement Learning; Game-testing; Imitationsinlärning; Förstärkande inlärning; Speltestning;

    Abstract : Playtesting plays a crucial role in video game production. The presence of gameplay issues and faulty design choices can be of great detriment to the overall player experience.