Multi-Modal Correlated Network with Emotional Reasoning Knowledge for Social Intelligence Question-Answering

Baijun Xie, Chung Hyuk Park; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 3075-3081

Abstract


The capacity for social reasoning is essential to the development of social intelligence in humans, and we acquire it readily through study and experience. Acquiring such an ability remains challenging for machines, however, even with the diverse deep learning models currently available. Recent artificial social intelligence models have achieved state-of-the-art results on question-answering tasks by employing a variety of methods, including self-supervised setups and multi-modal inputs. However, there is still a gap in the literature regarding the introduction of commonsense knowledge when training models on social intelligence tasks. In this paper, we propose a Multi-Modal Temporal Correlated Network with Emotional Social Cues (MMTC-ESC). An attention-based mechanism is used to model cross-modal correlations, and contrastive learning is driven by emotional social cues. Our findings indicate that combining multi-modal inputs with a contrastive loss is advantageous for the performance of social intelligence learning.
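
The abstract does not specify the implementation, so the following is only a minimal illustrative sketch of the two ingredients it names: an attention-based cross-modal correlation module and a contrastive loss keyed to emotional cues. The class and function names (CrossModalAttention, emotion_contrastive_loss), the use of pre-extracted video and language features of a shared hidden size, and the use of emotion labels as the "emotional social cues" are all assumptions for illustration, not the authors' released MMTC-ESC code.

```python
# Hypothetical sketch (assumptions noted above), not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalAttention(nn.Module):
    """One modality (query) attends to another (key/value), with a residual connection."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_feats, context_feats):
        # query_feats: (B, Tq, dim); context_feats: (B, Tk, dim)
        attended, _ = self.attn(query_feats, context_feats, context_feats)
        return self.norm(query_feats + attended)


def emotion_contrastive_loss(embeddings, emotion_labels, temperature: float = 0.1):
    """Simplified supervised contrastive loss: clips sharing an emotion label are positives."""
    z = F.normalize(embeddings, dim=-1)                  # (B, dim)
    sim = z @ z.t() / temperature                        # (B, B) similarity matrix
    pos_mask = emotion_labels.unsqueeze(0) == emotion_labels.unsqueeze(1)
    pos_mask.fill_diagonal_(False)                       # exclude self-pairs
    exp_sim = torch.exp(sim)
    exp_sim = exp_sim - torch.diag(torch.diag(exp_sim))  # drop self-similarity from denominator
    pos = (exp_sim * pos_mask).sum(dim=1)
    denom = exp_sim.sum(dim=1)
    valid = pos_mask.any(dim=1)                          # anchors with at least one positive
    loss = -torch.log((pos[valid] + 1e-8) / (denom[valid] + 1e-8))
    return loss.mean()


# Toy usage: video features attend to language features, then are pooled and contrasted.
B, Tv, Tl, D = 8, 16, 24, 256
video = torch.randn(B, Tv, D)
language = torch.randn(B, Tl, D)
emotions = torch.randint(0, 6, (B,))                     # e.g., six basic emotion categories

fusion = CrossModalAttention(D)
fused = fusion(video, language)
clip_embedding = fused.mean(dim=1)                       # simple temporal pooling
loss = emotion_contrastive_loss(clip_embedding, emotions)
print(loss.item())
```

In this simplified form, the contrastive term pulls together clips that carry the same emotional cue and pushes apart those that do not; the actual MMTC-ESC objective and fusion details are described in the paper itself.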

Related Material


[pdf]
[bibtex]
@InProceedings{Xie_2023_ICCV,
    author    = {Xie, Baijun and Park, Chung Hyuk},
    title     = {Multi-Modal Correlated Network with Emotional Reasoning Knowledge for Social Intelligence Question-Answering},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {3075-3081}
}