-
[pdf]
[supp]
[bibtex]@InProceedings{Han_2023_ICCV, author = {Han, Xiao and Zhu, Xiatian and Deng, Jiankang and Song, Yi-Zhe and Xiang, Tao}, title = {Controllable Person Image Synthesis with Pose-Constrained Latent Diffusion}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2023}, pages = {22768-22777} }
Controllable Person Image Synthesis with Pose-Constrained Latent Diffusion
Abstract
Controllable person image synthesis aims at rendering a source image based on user-specified changes in body pose or appearance. Prior art approaches leverage pixel-level denoising diffusion models conditioned on the coarse skeleton via cross-attention. This leads to two limitations: low efficiency and inaccurate condition information. To address both issues, a novel Pose-Constrained Latent Diffusion model (PoCoLD) is introduced. Rather than using the skeleton as a sparse pose representation, we exploit DensePose which offers much richer body structure information. To effectively capitalize DensePose at a low cost, we propose an efficient pose-constrained attention module that is capable of modeling the complex interplay between appearance and pose. Extensive experiments show that our PoCoLD outperforms the state-of-the-art competitors in image synthesis fidelity. Critically, it runs 2x faster and consumes 3.6x smaller memory than the latest diffusion-model-based alternative during inference.
Related Material