@InProceedings{Lu_2025_CVPR,
    author    = {Lu, Xilong and Yu, Jun and Zhang, Yunxiang and Zhu, Lingsi and Zheng, Yang and Wang, Yongqi and Ling, Qiang},
    title     = {Robust Stage-Wise LVLM Adaptation: Multi-Phase Prompt Lora Fine-tuning for Compound Expression Recognition},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {5779-5786}
}
Robust Stage-Wise LVLM Adaptation: Multi-Phase Prompt Lora Fine-tuning for Compound Expression Recognition
Abstract
Compound Expression Recognition (CER) is crucial for understanding human emotions and improving human-computer interaction. However, CER is challenging because facial expressions are complex and subtle emotional cues are difficult to capture. To address these challenges, we present a novel approach that leverages Large Vision-Language Models (LVLMs). Our method combines a two-stage fine-tuning process with dedicated prompts. In the first stage, a pre-trained LVLM is fine-tuned on basic facial expressions to learn fundamental expression patterns. In the second stage, the model is further fine-tuned on a compound-expression dataset to refine its modeling of the interactions between compound expressions. The approach achieves strong accuracy on the RAF-DB dataset and robust zero-shot generalization on the C-EXPR-DB dataset. Notably, in the 8th ABAW Compound Expression Recognition Challenge, our method took first place with an F1 score of 0.5723, highlighting its potential for real-world applications in emotion analysis and human-computer interaction.