Towards Reporting Bias in Visual-Language Datasets: Bi-modal Data Augmentation by Decoupling Object-Attribute Association

Wu, Qiyu; Zhao, Mengjie; He, Yutong; Huang, Lang; Ono, Junya; Wakaki, Hiromi; Mitsufuji, Yuki

Qiyu Wu, Mengjie Zhao, Yutong He, Lang Huang, Junya Ono, Hiromi Wakaki, Yuki Mitsufuji; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 4157-4166

Abstract

Reporting bias occurs when people assume universal understanding and omit explicit details. In this paper, we focus on the widespread object-attribute association in vision-language datasets, which is caused by reporting bias and can consequently degrade models trained on them. To mitigate this, we propose a bi-modal augmentation (BiAug) approach that decouples object-attribute associations to flexibly synthesize vision-language examples with a rich array of object-attribute pairings, and that constructs cross-modal hard negative vision-language examples. First, BiAug decouples object-attribute associations: object candidates verified across both modalities are extracted, and contradictory attributes are then generated for these candidates. Second, BiAug synthesizes hard negative vision-language examples: objects with the generated attributes are integrated into the image through an image inpainting model and into the caption through a large language model. After these two steps, the synthesized examples explicitly supply the objects and attributes omitted from the original examples, while the hard negative examples steer the model to distinguish different attributes of the same object. Extensive experiments and analysis demonstrate that the model trained with our augmented dataset excels in object-attribute comprehension.
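The two BiAug steps described above can be illustrated with a toy data-flow sketch. It is not the authors' implementation: the object detector, image inpainting model, and large language model are replaced by hypothetical placeholders (`verify_candidates`, `inpaint`, `rewrite_caption`), images are represented only by an ID plus a record of edits, and the attribute pool and the "There is a ..." caption template are assumptions made for illustration. The sketch shows how one verified object yields several attribute-specific image-caption pairs, and how swapping captions between those pairs produces hard negatives that differ only in the attribute of the same object.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Image:
    """Hypothetical stand-in for an image: an ID plus the phrases 'inpainted' into it."""
    image_id: str
    edits: Tuple[str, ...] = ()


def verify_candidates(caption_objects: Set[str], detected_objects: Set[str]) -> List[str]:
    """Toy cross-modal check: keep objects that appear in both the caption and the image."""
    return sorted(caption_objects & detected_objects)


def contrasting_attributes(obj: str, seen: Dict[str, str], pool: List[str], k: int) -> List[str]:
    """Pick up to k attributes that differ from the one already associated with obj."""
    return [a for a in pool if a != seen.get(obj)][:k]


def inpaint(img: Image, phrase: str) -> Image:
    # Placeholder for an image inpainting model: only records the requested edit.
    return Image(img.image_id, img.edits + (phrase,))


def rewrite_caption(caption: str, phrase: str) -> str:
    # Placeholder for an LLM rewrite: a fixed template (assumption) stands in for generation.
    return f"{caption} There is a {phrase}."


def biaug(img: Image, caption: str, caption_objects: Set[str], detected_objects: Set[str],
          seen: Dict[str, str], pool: List[str], k: int = 2
          ) -> Tuple[List[Tuple[Image, str]], List[Tuple[Image, str]]]:
    """Return (positive pairs, hard negative pairs) synthesized from one example."""
    positives: List[Tuple[Image, str]] = []
    hard_negatives: List[Tuple[Image, str]] = []
    for obj in verify_candidates(caption_objects, detected_objects):
        phrases = [f"{a} {obj}" for a in contrasting_attributes(obj, seen, pool, k)]
        images = [inpaint(img, p) for p in phrases]
        captions = [rewrite_caption(caption, p) for p in phrases]
        for i, edited in enumerate(images):
            positives.append((edited, captions[i]))
            # Same object, different attribute: every other caption is a hard negative.
            hard_negatives.extend((edited, captions[j]) for j in range(len(captions)) if j != i)
    return positives, hard_negatives
```

For example, with the caption "A man parks his car.", caption objects `{"man", "car"}`, detected objects `{"car", "tree"}`, an observed attribute `{"car": "red"}`, and the pool `["red", "blue", "green"]`, only "car" passes the cross-modal check; the sketch then builds a "blue car" and a "green car" pair, and pairing each edited image with the other caption gives two hard negatives.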

Related Material

@InProceedings{Wu_2025_ICCV,
    author    = {Wu, Qiyu and Zhao, Mengjie and He, Yutong and Huang, Lang and Ono, Junya and Wakaki, Hiromi and Mitsufuji, Yuki},
    title     = {Towards Reporting Bias in Visual-Language Datasets: Bi-modal Data Augmentation by Decoupling Object-Attribute Association},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {4157-4166}
}