AttriVision: Advancing Generalization in Pedestrian Attribute Recognition using CLIP

Mehran Adibi Sedeh, Assia Benbihi, Romain Martin, Marianne Clausel, Cédric Pradalier; Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops, 2025, pp. 354-365

Abstract


Pedestrian Attribute Recognition (PAR) is a critical task in computer vision that identifies semantic attributes, such as gender, age, clothing, and accessories, from images of individuals. This task is essential in applications such as surveillance, smart city infrastructure, and security systems. Despite significant advances in deep learning, PAR remains challenging due to strong imbalances in the attribute classes and the need for robust generalization across different datasets and environments. In this work, we address these two limitations with AttriVision, a novel approach that adopts generic CLIP features to improve the generalization of PAR and introduces a new Focal Cross-Entropy (FCE) loss function to handle the inherent class imbalance in PAR datasets. FCE improves the model's robustness by giving more weight to difficult-to-classify samples. Our method also transfers to other attribute recognition tasks, such as vehicle attribute recognition, without any architectural modifications, which makes AttriVision a versatile tool for attribute recognition. We validate our approach on the Unified Pedestrian Attribute Recognition (UPAR) dataset, which integrates data from several sources, including PA100K, PETA, RAPv2, and Market1501. AttriVision achieves new state-of-the-art results on UPAR, with a mean accuracy of 89.4% and an F1 score of 91.9%. These results demonstrate the model's effectiveness in handling real-world variability, including differences in image sensors, viewing conditions, and person densities, making it suitable for a wide range of real-world applications.
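
As a rough illustration of how a focal-style loss can give more weight to difficult-to-classify samples, the sketch below applies the standard focal modulation factor (1 - p_t)^gamma to a per-attribute binary cross-entropy. This is an assumption made for illustration and is not necessarily the exact FCE formulation used by AttriVision; the function name `focal_bce`, the default `gamma`, and the `eps` clamp are hypothetical.

```python
import math


def focal_bce(probs, targets, gamma=2.0, eps=1e-7):
    """Mean focal binary cross-entropy over per-attribute predictions.

    probs:   predicted probabilities in (0, 1), one per attribute.
    targets: ground-truth labels (0 or 1), same length as probs.
    gamma:   focusing parameter (illustrative default); gamma=0 gives plain BCE.
    eps:     clamp that keeps log() away from zero.
    """
    if len(probs) != len(targets) or len(probs) == 0:
        raise ValueError("probs and targets must be non-empty and equal in length")

    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        # p_t is the probability the model assigns to the true label.
        p_t = p if y == 1 else 1.0 - p
        # Confident, correct predictions (p_t near 1) are down-weighted;
        # hard samples (p_t near 0) keep almost their full loss.
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    easy = focal_bce([0.9], [1])  # well-classified positive attribute
    hard = focal_bce([0.1], [1])  # badly misclassified positive attribute
    print(f"easy sample loss: {easy:.5f}")
    print(f"hard sample loss: {hard:.5f}")
```

With `gamma=0` the function reduces to ordinary binary cross-entropy. Larger values of `gamma` shrink the contribution of well-classified attributes, so rare or difficult attributes account for a larger share of the total loss.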

Related Material


[bibtex]
@InProceedings{Sedeh_2025_WACV,
    author    = {Sedeh, Mehran Adibi and Benbihi, Assia and Martin, Romain and Clausel, Marianne and Pradalier, C\'edric},
    title     = {AttriVision: Advancing Generalization in Pedestrian Attribute Recognition using CLIP},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {February},
    year      = {2025},
    pages     = {354-365}
}