Visual Representation Learning through Causal Intervention for Controllable Image Editing

Shanshan Huang, Haoxuan Li, Chunyuan Zheng, Lei Wang, Guorui Liao, Zhili Gong, Huayi Yang, Li Liu; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 23484-23493

Abstract


A key challenge for controllable image editing is that semantically meaningful visual attributes are not always independent, which introduces spurious correlations during model training. However, most existing methods ignore this issue, leading to biased causal visual representation learning and unintended changes to unrelated regions or attributes in the edited images. To bridge this gap, we propose CIDiffuser, a diffusion-based causal visual representation learning framework that uses structural causal models to capture causal representations of visual attributes and thereby address spurious correlations. Specifically, we first decompose the image representation into a high-level semantic representation for the core attributes of the image and a low-level stochastic representation for other random or less structured aspects; the former is extracted by a semantic encoder and the latter by a stochastic encoder. We then introduce a causal effect learning module to capture the direct causal effect, that is, the difference between the potential outcomes before and after intervening on the visual attributes. In addition, a diffusion-based learning strategy is designed to optimize the representation learning process. Empirical evaluations on two benchmark datasets demonstrate that our approach significantly outperforms state-of-the-art methods, enabling highly controllable image editing by modifying the learned visual representations.
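
The notion of a causal effect as a difference of potential outcomes under intervention can be illustrated with a toy example. The sketch below is not the CIDiffuser implementation: the two attribute names ("age", "gray_hair"), the linear structural equations, the edge weight, and the function names are all assumptions chosen for illustration. It only shows how intervening on a cause changes its effect, while intervening on the effect leaves the cause untouched.

```python
from typing import Dict, Optional

# Hypothetical toy SCM over two attribute codes: "age" -> "gray_hair".
# The edge weight is an illustrative assumption, not a value from the paper.
EDGE_WEIGHT = 0.8


def forward(noise: Dict[str, float],
            do: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Evaluate the structural equations, overriding intervened variables."""
    do = do or {}
    values: Dict[str, float] = {}
    # "age" has no parents: it equals its exogenous noise unless intervened on.
    values["age"] = do.get("age", noise["age"])
    # "gray_hair" depends on "age" plus its own noise unless intervened on.
    values["gray_hair"] = do.get(
        "gray_hair", EDGE_WEIGHT * values["age"] + noise["gray_hair"]
    )
    return values


def effect_of_intervention(noise: Dict[str, float], variable: str,
                           before: float, after: float, outcome: str) -> float:
    """Difference of potential outcomes: Y(do(variable=after)) - Y(do(variable=before))."""
    y_before = forward(noise, do={variable: before})[outcome]
    y_after = forward(noise, do={variable: after})[outcome]
    return y_after - y_before


if __name__ == "__main__":
    noise = {"age": 0.3, "gray_hair": 0.1}
    # Intervening on the cause shifts the effect by roughly EDGE_WEIGHT.
    print(effect_of_intervention(noise, "age", 0.0, 1.0, "gray_hair"))
    # Intervening on the effect leaves the cause unchanged.
    print(effect_of_intervention(noise, "gray_hair", 0.0, 1.0, "age"))
```

In an editing setting, the second case corresponds to the desired behavior described in the abstract: changing one attribute should not alter attributes that it does not cause.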

Related Material


@InProceedings{Huang_2025_CVPR,
    author    = {Huang, Shanshan and Li, Haoxuan and Zheng, Chunyuan and Wang, Lei and Liao, Guorui and Gong, Zhili and Yang, Huayi and Liu, Li},
    title     = {Visual Representation Learning through Causal Intervention for Controllable Image Editing},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {23484-23493}
}