@InProceedings{Madan_2025_WACV,
    author    = {Madan, S. and Ghosh, S. and Sookha, L. R. and Ganaie, M.A. and Subramanian, R. and Dhall, A. and Gedeon, T.},
    title     = {MIP-GAF: A MLLM-Annotated Benchmark for Most Important Person Localization and Group Context Understanding},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {1467-1476}
}
MIP-GAF: A MLLM-Annotated Benchmark for Most Important Person Localization and Group Context Understanding
Abstract
Estimating the Most Important Person (MIP) in any social event setup is a challenging problem, mainly due to contextual complexity and the scarcity of labeled data. Moreover, the causality aspects of MIP estimation are quite subjective and diverse. To this end, we address the problem by annotating a large-scale 'in-the-wild' dataset for identifying human perceptions about the 'Most Important Person (MIP)' in an image. The paper provides a thorough description of our proposed Multimodal Large Language Model (MLLM) based data annotation strategy and a detailed data quality analysis. Further, we perform comprehensive benchmarking of the proposed dataset using state-of-the-art MIP localization methods, which shows a significant drop in performance compared to existing datasets. This drop shows that existing MIP localization algorithms must be more robust with respect to 'in-the-wild' situations. We believe the proposed dataset will play a vital role in building the next generation of social situation understanding methods. The dataset and associated code will be made available for research purposes.