Investigating CLIP Performance for Meta-Data Generation in AD Datasets

Sujan Sai Gannamaneni, Arwin Sadaghiani, Rohil Prakash Rao, Michael Mock, Maram Akila; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023, pp. 3840-3850


Using Machine Learning (ML) models for safety-critical perception tasks in Autonomous Driving (AD) or other domains requires a thorough evaluation of the model performance and the data coverage w.r.t. the intended Operational Design Domain (ODD). However, obtaining the needed per-image semantic meta-data along the relevant dimensions of the ODD for real-world image datasets is non-trivial. Recent advances in self-supervised foundation models, specifically CLIP, suggest that such meta-data could be obtained for real-world images in an automated fashion using zero-shot classification. While CLIP has already been reported to achieve promising performance on tasks such as the recognition of gender or age in facial images, we investigate to what extent less prominent and more fine-grained observables, e.g., the presence of accessories such as spectacles, or shirt or hair color, can be determined. We provide an analysis of CLIP for generating fine-grained meta-data on three datasets from the AD domain: one of synthetic origin that includes ground truth, and the real-world datasets Cityscapes and Railsem19. We also compare with a standard facial dataset for which more elaborate attribute annotations are available. To improve the quality of the generated meta-data, we additionally extend the ensemble approach of CLIP with a simple noise-suppressing technique.
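To make the zero-shot setup mentioned in the abstract concrete, the following is a minimal sketch of prompt-ensemble zero-shot classification, under stated assumptions. It is not the authors' implementation, and it does not include their noise-suppressing technique, which the abstract does not describe. The function names, the toy 3-dimensional vectors standing in for CLIP image and text embeddings, the two label strings, and the temperature of 100 are all hypothetical choices made for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_embedding(vectors):
    """Prompt ensembling: average the unit-normalized prompt embeddings, then re-normalize."""
    unit = [normalize(v) for v in vectors]
    dim = len(unit[0])
    avg = [sum(v[i] for v in unit) / len(unit) for i in range(dim)]
    return normalize(avg)


def zero_shot_classify(image_emb, prompt_embs_by_label, temperature=100.0):
    """Return (best_label, probabilities) from cosine similarity to each class embedding.

    temperature is an assumed logit scale; it only sharpens the softmax.
    """
    img = normalize(image_emb)
    logits = {}
    for label, embs in prompt_embs_by_label.items():
        cls = mean_embedding(embs)
        logits[label] = temperature * sum(a * b for a, b in zip(img, cls))
    # Numerically stable softmax over the class logits.
    top = max(logits.values())
    exps = {k: math.exp(v - top) for k, v in logits.items()}
    total = sum(exps.values())
    probs = {k: v / total for k, v in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs


# Toy vectors stand in for embeddings of several prompt templates per label
# (hypothetical values, not produced by any real model).
prompt_embs = {
    "wearing spectacles": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "not wearing spectacles": [[0.1, 0.9, 0.0], [0.2, 0.8, 0.1]],
}
image_emb = [0.7, 0.3, 0.05]

label, probs = zero_shot_classify(image_emb, prompt_embs)
print(label, probs)
```

In a real pipeline the toy vectors would be replaced by embeddings from a CLIP image encoder and text encoder, with one class embedding per attribute value along each ODD dimension of interest.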

Related Material

@InProceedings{Gannamaneni_2023_CVPR,
    author    = {Gannamaneni, Sujan Sai and Sadaghiani, Arwin and Rao, Rohil Prakash and Mock, Michael and Akila, Maram},
    title     = {Investigating CLIP Performance for Meta-Data Generation in AD Datasets},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {3840-3850}
}