Diffusion-Based Visual Anagram as Multi-Task Learning
Abstract
Visual anagrams are images that change appearance under a transformation such as flipping or rotation. With the advent of diffusion models, such optical illusions can be generated by averaging noise predictions across multiple views during the reverse denoising process. However, we observe two critical failure modes in this approach: (i) concept segregation, where the concepts of different views are generated independently in separate image regions, so the result cannot be considered a true anagram; and (ii) concept domination, where certain concepts overpower others. In this work, we cast visual anagram generation as a multi-task learning problem in which different viewpoint prompts are analogous to different tasks, and we derive denoising trajectories that align well across all tasks simultaneously. At the core of our framework are two newly introduced techniques: (i) an anti-segregation optimization strategy that promotes overlap between the cross-attention maps of different concepts, and (ii) a noise vector balancing method that adaptively adjusts the influence of the different tasks. Additionally, we observe that directly averaging noise predictions yields suboptimal performance because their statistical properties may not be preserved, which prompts us to derive a noise variance rectification method. Extensive qualitative and quantitative experiments demonstrate our method's superior ability to generate visual anagrams spanning diverse concepts.
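The multi-view pipeline the abstract describes (per-view noise prediction, alignment back to a common frame, and combination with variance rectification) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the `model` interface, the view transforms, and the specific rectification formula used here, which rescales the averaged prediction to match the mean per-view standard deviation, are assumptions made for the example.

```python
import torch

def anagram_denoise_step(model, x_t, t, prompts, views, inverse_views):
    """One combined denoising step for a visual anagram (illustrative sketch).

    Assumed interfaces (not taken from the paper's code):
      - model(x, t, prompt) returns a noise prediction with the shape of x.
      - views / inverse_views are paired invertible tensor transforms,
        e.g. identity and a horizontal flip.
    """
    eps_views = []
    for prompt, view, inv in zip(prompts, views, inverse_views):
        eps = model(view(x_t), t, prompt)  # predict noise in the transformed view
        eps_views.append(inv(eps))         # map the prediction back to the base view
    eps_stack = torch.stack(eps_views)     # (num_views, *x_t.shape)

    # Naive combination: average the aligned per-view predictions.
    eps_mean = eps_stack.mean(dim=0)

    # Variance rectification, one plausible form (the paper derives its own):
    # averaging shrinks the prediction's variance, so rescale the average to
    # match the mean per-view standard deviation.
    target_std = eps_stack.flatten(1).std(dim=1).mean()
    return eps_mean * (target_std / eps_mean.std().clamp_min(1e-8))

if __name__ == "__main__":
    # Toy wiring check: a random "model" and a horizontal flip stand in for
    # a real diffusion model and the paper's view transformations.
    dummy_model = lambda x, t, p: torch.randn_like(x)
    flip = lambda z: torch.flip(z, dims=[-1])
    x = torch.randn(1, 3, 64, 64)
    eps = anagram_denoise_step(
        dummy_model, x, t=10,
        prompts=["a dog", "a cat"],
        views=[lambda z: z, flip],
        inverse_views=[lambda z: z, flip],
    )
    print(eps.shape)  # torch.Size([1, 3, 64, 64])
```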
Related Material

[pdf] [supp] [arXiv] [bibtex]

@InProceedings{Xu_2025_WACV,
    author    = {Xu, Zhiyuan and Chen, Yinhe and Gao, Huan-ang and Zhao, Weiyan and Zhang, Guiyu and Zhao, Hao},
    title     = {Diffusion-Based Visual Anagram as Multi-Task Learning},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {919-928}
}