Multimodal Machine Learning in Human Motion Analysis

University essay from KTH, School of Electrical Engineering and Computer Science (EECS)

Abstract: Currently, most long-term human motion classification and prediction tasks are driven by spatio-temporal data of the human trunk. Yet other modalities also vary characteristically with human motion, such as the electromyography (EMG) of specific muscles and the respiratory rhythm. Meanwhile, progress in Artificial Intelligence research on the joint understanding of image, video, audio, and semantics relies mainly on MultiModal Machine Learning (MMML). This work explores MMML strategies for human motion classification with multimodal information. The research is conducted on the Unige-Maastricht Dance dataset. Attention-based Deep Learning architectures are proposed for modal fusion at three levels: 1) feature fusion by a Component Attention Network (CANet); 2) model fusion by a novel combination of a Graph Convolution Network (GCN) with CANet; and 3) late fusion by simple voting. All three fusion methods exceed the single-modality (motion-only) benchmark. Moreover, the contribution of each modality under each fusion method is analyzed through comprehensive comparison experiments. Finally, statistical analysis and visualization of the attention scores help distill the most informative temporal and component cues that characterize two qualities of motion.
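For concreteness, the simplest of the three fusion levels, late fusion by voting, can be sketched as below. This is a minimal illustration under assumptions: the function name, the probability-based tie-breaking rule, and the example modalities are hypothetical and not taken from the thesis.

```python
import numpy as np

def late_fusion_vote(per_modality_probs: np.ndarray) -> int:
    """Majority vote over per-modality class predictions.

    per_modality_probs: array of shape (n_modalities, n_classes),
    where each row is one modality classifier's probability
    distribution over the motion-quality classes.
    Returns the class index receiving the most votes; ties are
    broken by the summed probability across modalities (an
    assumed rule, chosen here only for determinism).
    """
    votes = per_modality_probs.argmax(axis=1)
    counts = np.bincount(votes, minlength=per_modality_probs.shape[1])
    best = np.flatnonzero(counts == counts.max())
    if len(best) == 1:
        return int(best[0])
    # Tie-break: among tied classes, pick the highest summed probability.
    summed = per_modality_probs.sum(axis=0)
    return int(best[np.argmax(summed[best])])

# Hypothetical example: three modality classifiers (e.g., motion,
# EMG, respiration) voting over two motion-quality classes.
probs = np.array([[0.7, 0.3],   # motion model
                  [0.4, 0.6],   # EMG model
                  [0.8, 0.2]])  # respiration model
print(late_fusion_vote(probs))  # -> 0 (two of three modalities agree)
```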
