Contextual Augmented Global Contrast for Multimodal Intent Recognition

Kaili Sun, Zhiwen Xie, Mang Ye, Huyin Zhang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 26963-26973

Abstract


Multimodal intent recognition (MIR) aims to perceive human intent polarity from the language, visual, and acoustic modalities. The inherent ambiguity of intent makes it challenging to recognize in multimodal scenarios. Existing MIR methods tend to model each video independently, ignoring global contextual information across videos. This learning manner inevitably introduces perception biases, which are exacerbated by inconsistencies in the multimodal representation and amplify intent uncertainty. This challenge motivates us to explore effective global context modeling. We therefore propose a context-augmented global contrast (CAGC) method that captures rich global context features by mining both intra- and cross-video context interactions for MIR. Concretely, we design a context-augmented transformer module to extract global context dependencies across videos. To further alleviate error accumulation and interference, we develop a cross-video bank that retrieves effective video sources by considering both intent tendency and video similarity. Furthermore, we introduce a global context-guided contrastive learning scheme designed to mitigate the inconsistencies between the global context and individual modalities, which lie in different feature spaces. This scheme uses global cues as supervision to learn a robust multimodal intent representation. Experiments demonstrate that CAGC outperforms state-of-the-art MIR methods. We also generalize our approach to a closely related task, multimodal sentiment analysis, where it achieves comparable performance.
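The abstract names three components (cross-video retrieval, cross-video context modeling, and context-guided contrastive learning) but gives no equations. The NumPy sketch below illustrates, under our own assumptions, the generic techniques such components typically build on. Retrieval is top-k cosine similarity from a memory bank, filtered by a (pseudo-)intent label. Context modeling is single-head scaled dot-product attention over the retrieved videos with a residual connection. Contrastive learning is an InfoNCE loss in which each sample's global context feature is the positive for that sample's modality feature. Every function name, the exact-label filtering rule, the residual fusion, and the temperature are hypothetical illustrations, not the CAGC implementation.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit L2 norm along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def retrieve_from_bank(query, query_intent, bank_feats, bank_intents, k=2):
    """Hypothetical bank lookup: top-k cosine-similar videos sharing the query's intent label."""
    sims = l2_normalize(bank_feats) @ l2_normalize(query)
    # Assumption: "intent tendency" is modelled as an exact (pseudo-)label match.
    sims = np.where(bank_intents == query_intent, sims, -np.inf)
    order = np.argsort(-sims)                  # most similar first
    order = order[np.isfinite(sims[order])]    # drop masked-out videos
    return order[:k]


def cross_video_context(query, bank_feats, idx):
    """Single-head scaled dot-product attention from one video to its retrieved neighbours.

    Assumes `idx` is non-empty; keys double as values in this simplified sketch.
    """
    keys = bank_feats[idx]                                # (k, d)
    scores = keys @ query / np.sqrt(query.shape[-1])
    weights = np.exp(scores - scores.max())               # numerically stable softmax
    weights /= weights.sum()
    return query + weights @ keys                         # residual fusion (an assumption)


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE: row i of `anchors` should match row i of `positives` against all other rows."""
    a, p = l2_normalize(anchors), l2_normalize(positives)
    logits = a @ p.T / temperature
    logits -= logits.max(axis=1, keepdims=True)           # stabilise the log-sum-exp
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


# Toy usage with random features (all sizes are illustrative).
rng = np.random.default_rng(0)
bank = rng.normal(size=(32, 16))            # toy cross-video bank of 32 video features
bank_labels = np.arange(32) % 4             # toy intent labels; every label has 8 entries
batch = rng.normal(size=(8, 16))            # fused features for 8 query videos
batch_labels = rng.integers(0, 4, size=8)

context = np.stack([
    cross_video_context(q, bank, retrieve_from_bank(q, y, bank, bank_labels, k=4))
    for q, y in zip(batch, batch_labels)
])
text_feats = rng.normal(size=(8, 16))       # stand-in for one modality's features
loss = info_nce(text_feats, context)        # global context acts as the supervision signal
```

Filtering by label before ranking by similarity is only one way to combine the two criteria the abstract mentions. A weighted sum of a similarity score and an intent-agreement score would be an equally plausible reading.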

Related Material


@InProceedings{Sun_2024_CVPR,
    author    = {Sun, Kaili and Xie, Zhiwen and Ye, Mang and Zhang, Huyin},
    title     = {Contextual Augmented Global Contrast for Multimodal Intent Recognition},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {26963-26973}
}