Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA

Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Shayakh Islam; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops, 2025, pp. 5278-5282

Abstract


Pretraining datasets are foundational to the development of multimodal models, yet they often carry inherent biases and toxic content inherited from the web-scale corpora they are sourced from. In this paper, we investigate the prevalence of toxicity in the LLaVA image-text pretraining dataset, examining how harmful content manifests across modalities. We present a comprehensive analysis of common toxicity categories and propose targeted mitigation strategies, resulting in a refined, toxicity-mitigated dataset that removes 7,531 toxic image-text pairs from the LLaVA pretraining dataset. We also offer guidelines for implementing robust toxicity detection pipelines. Our findings underscore the need to actively identify and filter toxic content - such as hate speech, explicit imagery, and targeted harassment - to build more responsible and equitable multimodal systems. The toxicity-mitigated dataset is open source and available for further research.
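
As a rough illustration of what a caption-side toxicity detection pipeline might look like (this is a sketch, not the authors' actual implementation), the snippet below scores captions with the off-the-shelf Detoxify text classifier and drops image-text pairs whose captions exceed a toxicity threshold. The 0.5 threshold and the filter_pairs helper are assumptions made for illustration; a complete pipeline would also need an image-side classifier to catch explicit imagery.

# Illustrative sketch of a text-side toxicity filter for image-text pairs.
# Not the authors' pipeline: the threshold and helper names are assumptions.
from detoxify import Detoxify

def filter_pairs(pairs, threshold=0.5):
    """Split (image_path, caption) pairs into kept and removed sets based on
    the maximum Detoxify score of the caption."""
    model = Detoxify("original")  # pretrained multi-label toxicity classifier
    kept, removed = [], []
    for image_path, caption in pairs:
        scores = model.predict(caption)  # dict: toxicity, obscene, threat, ...
        if max(scores.values()) >= threshold:
            removed.append((image_path, caption, scores))
        else:
            kept.append((image_path, caption))
    return kept, removed

if __name__ == "__main__":
    sample = [
        ("img_001.jpg", "A dog playing in the park."),
        ("img_002.jpg", "An offensive caption would go here."),
    ]
    kept, removed = filter_pairs(sample)
    print(f"kept {len(kept)} pairs, removed {len(removed)}")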

Related Material


[pdf] [arXiv]
[bibtex]
@InProceedings{Alam_2025_CVPR,
    author    = {Alam, Nahid and Kanjula, Karthik Reddy and Guthikonda, Surya and Islam, Shayakh},
    title     = {Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {5278-5282}
}