MIRAGE: Unsupervised Single Image to Novel View Generation with Cross Attention Guidance

Llukman Cerkezi, Aram Davtyan, Sepehr Sameni, Paolo Favaro; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 3799-3809

Abstract


This paper introduces a novel pipeline to generate novel views of an object from a single image. Our method, MIRAGE, trains a pose-conditioned diffusion model on a dataset of real images of multiple unknown categories, entirely without supervision. The conditioning is obtained by clustering pre-trained self-supervised features to identify approximate object categories and poses. At inference time, we introduce hard-attention guidance and apply cross-view attention to align the appearance of the objects in the generated views with that in the input image. Through our experiments, we show that MIRAGE generates novel views that are on par with or better than those of supervised methods in terms of image realism and 3D consistency. Furthermore, MIRAGE is robust to diverse textures and geometries, is not restricted to simple rigid rotations, and is capable of generating plausible deformations of nonrigid objects, such as animals. Code is available at: https://github.com/llukmancerkezi/mirage
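
As a rough illustration of the cross-view attention idea mentioned in the abstract, the sketch below shows a generic single-head attention step in which tokens of a generated view query the tokens of the input view. It is not taken from the MIRAGE code base: the function name `cross_view_attention`, the projection matrices `w_q`, `w_k`, `w_v`, the token counts, and the feature size are all assumptions chosen for illustration, and the hard-attention guidance described in the paper is not modeled here.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_view_attention(target_feats, source_feats, w_q, w_k, w_v):
    """Generic single-head cross-view attention (hypothetical helper).

    target_feats: (n_t, d) tokens of the view being generated.
    source_feats: (n_s, d) tokens of the input (reference) view.
    Queries come from the target view; keys and values come from the
    source view, so each target token is a mixture of source content.
    """
    q = target_feats @ w_q
    k = source_feats @ w_k
    v = source_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_t, n_s) similarity
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


# Toy data: 16 tokens per view, 8 feature channels (arbitrary sizes).
rng = np.random.default_rng(0)
d = 8
target = rng.normal(size=(16, d))
source = rng.normal(size=(16, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

out, weights = cross_view_attention(target, source, w_q, w_k, w_v)
```

Here `out` has one row per target token, and each row of `weights` is a probability distribution over the input-view tokens. In a diffusion model, a step like this would typically sit inside the denoising network's attention layers; how MIRAGE wires it in is described in the paper and repository, not in this sketch.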

Related Material


@InProceedings{Cerkezi_2025_ICCV,
    author    = {Cerkezi, Llukman and Davtyan, Aram and Sameni, Sepehr and Favaro, Paolo},
    title     = {MIRAGE: Unsupervised Single Image to Novel View Generation with Cross Attention Guidance},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {3799-3809}
}