StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models

Lezhong Wang, Jeppe Revall Frisvad, Mark Bo Jensen, Siavash Arjomand Bigdeli; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 7416-7425

Abstract


The demand for stereo images increases as manufacturers launch more extended reality (XR) devices. To meet this demand, we introduce StereoDiffusion, a method that, unlike traditional inpainting pipelines, is training-free and straightforward to use, with seamless integration into the original Stable Diffusion model. Our method modifies the latent variable to provide an end-to-end, lightweight method for fast generation of stereo image pairs, without the need for fine-tuning model weights or any post-processing of images. Using the original input to generate a left image and estimate a disparity map for it, we generate the latent vector for the right image through Stereo Pixel Shift operations, complemented by Symmetric Pixel Shift Masking Denoise and Self-Attention Layer Modifications to align the right-side image with the left-side image. Moreover, our proposed method maintains a high standard of image quality throughout the stereo generation process, achieving state-of-the-art scores in various quantitative evaluations.
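
To make the idea of shifting a latent by a disparity map concrete, the following is a minimal sketch and not the authors' implementation. It assumes a latent stored as a (C, H, W) NumPy array and a non-negative disparity map already resized to the latent resolution; the function name `stereo_pixel_shift` and the `scale` parameter are hypothetical, introduced only for illustration.

```python
import numpy as np


def stereo_pixel_shift(latent, disparity, scale=1.0):
    """Shift each latent pixel left by its rounded disparity (illustrative only).

    latent:    (C, H, W) array, standing in for the left-view latent.
    disparity: (H, W) non-negative array at latent resolution (assumed given).
    scale:     hypothetical factor converting disparity to latent pixels.
    Returns the shifted latent and a boolean (H, W) mask of filled positions.
    """
    _, h, w = latent.shape
    shifted = np.zeros_like(latent)
    filled = np.zeros((h, w), dtype=bool)
    shift = np.rint(disparity * scale).astype(int)
    for y in range(h):
        # Process small disparities (far content) first so that larger
        # disparities (near content) overwrite them when targets collide.
        for x in np.argsort(shift[y], kind="stable"):
            tx = x - shift[y, x]
            if 0 <= tx < w:
                shifted[:, y, tx] = latent[:, y, x]
                filled[y, tx] = True
    return shifted, filled
```

Positions where `filled` is `False` received no source pixel, which corresponds to regions visible only from the right view. In this sketch they are simply left at zero; how such regions are actually handled is described in the paper, not here.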

Related Material


[bibtex]
@InProceedings{Wang_2024_CVPR,
    author    = {Wang, Lezhong and Frisvad, Jeppe Revall and Jensen, Mark Bo and Bigdeli, Siavash Arjomand},
    title     = {StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {7416-7425}
}