Exploiting CLIP Self-Consistency to Automate Image Augmentation for Safety Critical Scenarios

Sujan Sai Gannamaneni, Frederic Klein, Michael Mock, Maram Akila; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 3594-3604

Abstract


With the current interest in deploying machine learning (ML) models in safety-critical applications like automated driving (AD), there is increased effort in developing sophisticated testing techniques for evaluating the models. One of the primary requirements for testing is the availability of test data, particularly test data that captures the long-tail distributions of traffic events. As collecting such data in the real world is hazardous, there is also a need to generate synthetic data using simulators or deep learning-based approaches. We propose a pipeline to generate augmented safety-critical scenes of the Cityscapes dataset using pre-trained SOTA latent diffusion models with additional conditioning via text and an OpenPose-based ControlNet, giving us fine-grained control over the attributes of the generated pedestrians. In addition, we propose a filtering mechanism, similar to self-consistency checks in large language models (LLMs), to improve the quality of the generated data with respect to adherence to the specified attributes, reaching a 25% improvement in our experiments. Finally, using pre-trained SOTA segmentation models on Cityscapes, we assess the generated dataset's viability by qualitatively inspecting the predicted segmentation maps.
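The abstract page does not spell out how the self-consistency filter operates. As a rough illustrative sketch under our own assumptions (not the authors' implementation), such a filter could embed each generated image and a set of candidate attribute prompts with CLIP, then keep an image only if the intended attribute prompt scores the highest cosine similarity. The function names and the plain-list embeddings below are hypothetical stand-ins for real CLIP outputs.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors (plain lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_self_consistent(image_emb, attribute_embs, intended_attribute):
    """Hypothetical consistency check: accept a generated image only if
    the intended attribute prompt is the closest match among all
    candidate attribute prompts (embeddings assumed to come from CLIP)."""
    sims = {name: cosine_sim(image_emb, emb)
            for name, emb in attribute_embs.items()}
    return max(sims, key=sims.get) == intended_attribute

# Toy 2-D embeddings standing in for CLIP image/text features.
candidates = {"red jacket": [1.0, 0.0], "blue jacket": [0.0, 1.0]}
print(is_self_consistent([0.9, 0.2], candidates, "red jacket"))   # closer to "red jacket"
print(is_self_consistent([0.1, 0.8], candidates, "red jacket"))   # closer to "blue jacket"
```

In a real pipeline the embeddings would come from a CLIP image encoder and text encoder, and images failing the check would be discarded before the dataset is used for evaluation.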

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Gannamaneni_2024_CVPR,
    author    = {Gannamaneni, Sujan Sai and Klein, Frederic and Mock, Michael and Akila, Maram},
    title     = {Exploiting CLIP Self-Consistency to Automate Image Augmentation for Safety Critical Scenarios},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {3594-3604}
}