Hierarchical Fusion Approaches for Enhancing Multimodal Emotion Recognition in Dialogue-Based Systems: A Systematic Study of Multimodal Emotion Recognition Fusion Strategy

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Multimodal Emotion Recognition (MER) has attracted growing attention because of its strong performance. In this thesis, we evaluate feature-level fusion, decision-level fusion, and two proposed hierarchical fusion methods for MER systems on a dialogue-based dataset. The first hierarchical approach integrates abstract features across different temporal levels, using RNN-based context modeling to capture nearby context and transformer-based context modeling to capture global context. The second hierarchical strategy incorporates information shared between modalities by letting the modalities interact through attention mechanisms. Results show that RNN-based hierarchical fusion surpasses the baseline by 2%, while transformer-based context modeling and the modality interaction method improve accuracy by 0.5% and 0.6%, respectively. These findings underscore the importance, in dialogue MER systems, of capturing meaningful emotional cues in the nearby context and of capturing emotional invariants. We also emphasize the crucial role of the text modality. Overall, our research highlights the potential of hierarchical fusion approaches for improving MER system performance and presents systematic strategies supported by empirical evidence.
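
To illustrate the difference between the two baseline strategies named in the abstract, the following is a minimal sketch and not the thesis implementation. It assumes two modalities (text and audio), three emotion classes, and simple linear classifiers; all feature values, weights, and function names are hypothetical. Feature-level fusion concatenates the modality features before a single classifier, while decision-level fusion classifies each modality separately and then averages the class probabilities.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(features, weights):
    """Return one score per class: dot product of features with each weight row."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def feature_level_fusion(text_feat, audio_feat, joint_weights):
    """Concatenate the modality features, then classify once."""
    fused = text_feat + audio_feat  # list concatenation
    return softmax(linear(fused, joint_weights))


def decision_level_fusion(text_feat, audio_feat, text_weights, audio_weights):
    """Classify each modality on its own, then average the probabilities."""
    p_text = softmax(linear(text_feat, text_weights))
    p_audio = softmax(linear(audio_feat, audio_weights))
    return [(a + b) / 2 for a, b in zip(p_text, p_audio)]


# Hypothetical toy inputs: a 2-dimensional text feature vector and a
# 3-dimensional audio feature vector for a single utterance.
text_feat = [0.2, 0.9]
audio_feat = [0.5, 0.1, 0.4]

# Hypothetical classifier weights, one row per emotion class (3 classes assumed).
text_weights = [[1.0, -1.0], [-1.0, 1.0], [0.0, 0.5]]
audio_weights = [[0.3, 0.2, -0.1], [-0.2, 0.4, 0.1], [0.1, 0.1, 0.1]]
# For the joint classifier, each row covers the concatenated 5-dimensional input.
joint_weights = [tw + aw for tw, aw in zip(text_weights, audio_weights)]

early = feature_level_fusion(text_feat, audio_feat, joint_weights)
late = decision_level_fusion(text_feat, audio_feat, text_weights, audio_weights)
print("feature-level fusion probabilities: ", early)
print("decision-level fusion probabilities:", late)
```

In this sketch both functions return a probability distribution over the assumed emotion classes. The hierarchical methods described in the abstract go further by adding context modeling across utterances or attention-based interaction between modalities, which this sketch does not cover.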
