Free Lunch Enhancements for Multi-modal Crowd Counting

Haoliang Meng, Xiaopeng Hong, Zhengqin Lai, Miao Shang; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 14013-14023

Abstract


This paper addresses multi-modal crowd counting with a novel "free lunch" training enhancement strategy that requires no additional data, no additional parameters, and no increase in inference complexity. First, we introduce a cross-modal alignment technique as a plug-in post-processing step for the pre-trained backbone network, enhancing the model's ability to capture information shared across modalities. Second, we incorporate a regional density supervision mechanism during the fine-tuning stage, which differentiates features in regions with varying crowd densities. Extensive experiments on three multi-modal crowd counting datasets validate our approach, which is the first to achieve a mean absolute error (MAE) below 10 on RGBT-CC. The code is available at https://github.com/HenryCilence/Free-Lunch-Multimodal-Counting.
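The headline result is stated in terms of MAE. The sketch below is a minimal illustration and assumes that each image's predicted count is the sum of its predicted density map. That is common practice for density-map-based counters, but the abstract does not state it explicitly. Under this assumption, MAE averages the absolute count error over the N test images. The helper names count_from_density and mean_absolute_error are hypothetical. The numbers are toy values, not results from the paper.

```python
from typing import List, Sequence


def count_from_density(density_map: List[List[float]]) -> float:
    """Assumed convention: estimated count = sum of all density-map values."""
    return float(sum(sum(row) for row in density_map))


def mean_absolute_error(pred_counts: Sequence[float], gt_counts: Sequence[float]) -> float:
    """MAE = (1/N) * sum_i |pred_i - gt_i| over N test images."""
    if len(pred_counts) != len(gt_counts) or not pred_counts:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(p - g) for p, g in zip(pred_counts, gt_counts))
    return total / len(pred_counts)


# Toy example with made-up numbers (not results from the paper).
pred_maps = [
    [[1.0] * 4 for _ in range(4)],  # 4x4 map of 1.0 -> count 16.0
    [[2.5] * 4 for _ in range(4)],  # 4x4 map of 2.5 -> count 40.0
]
gt_counts = [20.0, 35.0]
pred_counts = [count_from_density(m) for m in pred_maps]
print(mean_absolute_error(pred_counts, gt_counts))  # (|16-20| + |40-35|) / 2 = 4.5
```

Because MAE is averaged in units of people per image, "MAE below 10" means the predicted counts are off by fewer than 10 people per test image on average.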

Related Material


@InProceedings{Meng_2025_CVPR,
    author    = {Meng, Haoliang and Hong, Xiaopeng and Lai, Zhengqin and Shang, Miao},
    title     = {Free Lunch Enhancements for Multi-modal Crowd Counting},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {14013-14023}
}