LLAVAGUARD: VLM-based Safeguard for Vision Dataset Curation and Safety Assessment

Lukas Helff, Felix Friedrich, Manuel Brack, Patrick Schramowski, Kristian Kersting; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 8322-8326

Abstract


We introduce LlavaGuard, a family of multimodal safeguard models based on Llava, offering a robust framework for evaluating the safety compliance of vision datasets and models. Our models come with a new taxonomy designed for assessing safety risks within visual data. With this safety taxonomy, we have collected and annotated a high-quality dataset to guide Vision-Language Models (VLMs) in safety assessment. We present models in two sizes, LlavaGuard-7b and LlavaGuard-13b, both safety-tuned on our novel annotated dataset to perform policy-based safety assessments of visual content. In this context, LlavaGuard goes beyond binary safety classification by providing information on the violated safety categories, a detailed explanation, and a final assessment. In our evaluations, our models demonstrate state-of-the-art performance, with LlavaGuard-13b exhibiting the best results, while the much smaller LlavaGuard-7b model still outperforms the far larger Llava-34b baseline. Furthermore, LlavaGuard is designed to allow customization of the safety taxonomy to align with specific use cases, facilitating zero-shot prompting with individual policies for tailored content moderation.
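For illustration, policy-based prompting of a LlavaGuard checkpoint might look like the following minimal sketch, using the standard Hugging Face transformers interface for Llava-style models. The model identifier, the abbreviated policy text, and the expected output fields (category, rationale, rating) are assumptions for this sketch, not specifications from the paper; the actual taxonomy and prompt format are defined in the paper and accompanying dataset.

```python
# Minimal sketch: querying a LlavaGuard-style model for a policy-based
# safety assessment of one image. Model ID and policy wording are
# illustrative assumptions; see the paper/model card for the real taxonomy.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "AIML-TUDA/LlavaGuard-7B"  # hypothetical checkpoint name
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Shortened, illustrative safety policy. In practice, LlavaGuard is
# prompted with a full safety taxonomy, which can be customized per use
# case for zero-shot, policy-specific moderation.
policy = (
    "Assess the image against the safety policy. Report the violated "
    "safety category (if any), a short rationale, and a final rating: "
    "'safe' or 'unsafe'."
)
prompt = f"USER: <image>\n{policy}\nASSISTANT:"

image = Image.open("example.jpg")
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decodes prompt plus generated assessment (category, rationale, rating).
print(processor.decode(output[0], skip_special_tokens=True))
```

Because the policy is supplied at inference time rather than baked into the weights, swapping in a stricter or more permissive taxonomy requires only editing the prompt, which is what enables the tailored, zero-shot moderation described above.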

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Helff_2024_CVPR,
    author    = {Helff, Lukas and Friedrich, Felix and Brack, Manuel and Schramowski, Patrick and Kersting, Kristian},
    title     = {LLAVAGUARD: VLM-based Safeguard for Vision Dataset Curation and Safety Assessment},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {8322-8326}
}