CLIPping Imbalances: A Novel Evaluation Baseline and PEARL Dataset for Pedestrian Attribute Recognition

Kamalakar Vijay, Lalit Lohani, Kamakshya Prasad Nayak, Debi Prosad Dogra, Heeseung Choi, Hyungjoo Jung, Ig-Jae Kim; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 7102-7111

Abstract


Pedestrian Attribute Recognition (PAR) is a fundamental task in computer vision and is crucial for upgrading security systems, as it enables the precise identification and characterization of various pedestrian attributes. However, current PAR datasets fail to represent a wide range of attributes correctly, which makes existing PAR methods less effective in real-world scenarios. Addressing this limitation, this paper introduces PEARL, a comprehensive dataset comprising diverse pedestrian images annotated with 146 attributes, sourced from surveillance videos across twelve countries. This paper also formulates image-based PAR with a language-image fusion strategy and utilizes CLIP as a new evaluation baseline. Specifically, we leverage textual information by transforming sets of attributes into meaningful sentences. To address the inherent data imbalance in PAR, we provide three types of prompt settings to optimize the training of the CLIP model. Our evaluation encompasses a thorough assessment of the proposed baseline model across various datasets, including the PEARL dataset as well as established PAR benchmarks such as PA100K, RAP, and PETA.
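The abstract mentions transforming sets of attributes into meaningful sentences for the CLIP text encoder. Below is a minimal illustrative sketch of one such template-based conversion; the attribute phrases and template are hypothetical examples, not PEARL's actual annotation schema or the paper's three prompt settings.

```python
# Illustrative sketch: converting a set of pedestrian attributes into a
# natural-language prompt suitable for a CLIP-style text encoder.
# The template and attribute names are hypothetical, not the paper's exact prompts.

def attributes_to_sentence(attributes):
    """Join attribute phrases into a single descriptive sentence."""
    if not attributes:
        return "a photo of a pedestrian."
    body = ", ".join(attributes[:-1])
    if body:
        body += " and " + attributes[-1]
    else:
        body = attributes[-1]
    return f"a photo of a pedestrian who is {body}."

print(attributes_to_sentence(["wearing a hat", "carrying a bag"]))
```

The resulting sentence would then be tokenized and embedded by CLIP's text encoder, to be matched against the image embedding.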

Related Material


@InProceedings{Vijay_2025_WACV,
  author    = {Vijay, Kamalakar and Lohani, Lalit and Nayak, Kamakshya Prasad and Dogra, Debi Prosad and Choi, Heeseung and Jung, Hyungjoo and Kim, Ig-Jae},
  title     = {CLIPping Imbalances: A Novel Evaluation Baseline and PEARL Dataset for Pedestrian Attribute Recognition},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {7102-7111}
}