Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models

Lin, Yuxiang; Sun, Jingdong; Cheng, Zhi-Qi; Wang, Jue; Liang, Haomin; Cheng, Zebang; Dong, Yifei; He, Jun-Yan; Peng, Xiaojiang; Hua, Xian-Sheng

Yuxiang Lin, Jingdong Sun, Zhi-Qi Cheng, Jue Wang, Haomin Liang, Zebang Cheng, Yifei Dong, Jun-Yan He, Xiaojiang Peng, Xian-Sheng Hua; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2025, pp. 5235-5245

Abstract

Most existing emotion analysis emphasizes which emotion arises (e.g., happy, sad, angry) but neglects the deeper question of why it arises. We propose Emotion Interpretation (EI), which focuses on the causal factors that drive emotional responses, whether explicit (e.g., observable objects, interpersonal interactions) or implicit (e.g., cultural context, off-screen events). Unlike traditional emotion recognition, EI requires reasoning about triggers rather than mere labeling. To facilitate EI research, we present EIBench, a large-scale benchmark comprising 1,615 basic EI samples and 50 complex EI samples featuring multifaceted emotions. Each instance demands a rationale-based explanation rather than a straightforward categorization. We further propose a Coarse-to-Fine Self-Ask (CFSA) annotation pipeline, which guides Vision-Language Models (VLLMs) through iterative question-answer rounds to yield high-quality labels at scale. Extensive evaluations of open-source and proprietary large language models under four experimental settings reveal consistent performance gaps, especially in more intricate scenarios, underscoring EI's potential to enrich empathetic, context-aware AI applications. Our benchmark and methods are publicly available at https://github.com/Lum1104/EIBench, offering a foundation for advanced multimodal causal analysis and next-generation affective computing.

Related Material


BibTeX

@InProceedings{Lin_2025_CVPR,
    author    = {Lin, Yuxiang and Sun, Jingdong and Cheng, Zhi-Qi and Wang, Jue and Liang, Haomin and Cheng, Zebang and Dong, Yifei and He, Jun-Yan and Peng, Xiaojiang and Hua, Xian-Sheng},
    title     = {Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {5235--5245}
}