UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer

Soon Yau Cheong, Armin Mustafa, Andrew Gilbert; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 4173-4182

Abstract


Text-to-image (T2I) models such as Stable Diffusion have been used to generate high-quality images of people. However, due to the random nature of the generation process, the person's appearance (e.g., pose, face, and clothing) varies between samples, even when the same text prompt is used. This appearance inconsistency makes T2I models unsuitable for pose transfer. We address this by proposing a multimodal diffusion model that accepts text, pose, and visual prompting. Our model is the first unified method to perform all person image tasks: generation, pose transfer, and mask-less editing. We also pioneer the direct use of low-dimensional 3D body model parameters to demonstrate a new capability: simultaneous pose and camera-view interpolation while maintaining the person's appearance.
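The interpolation capability described above rests on conditioning the diffusion model on low-dimensional 3D body model parameters (in the style of SMPL) rather than on dense pose images, so intermediate poses and camera views can be obtained by simply interpolating the conditioning vectors. The sketch below illustrates that idea under stated assumptions: the function name, parameter shapes, and camera representation are illustrative, not taken from the paper's code.

```python
import numpy as np

def interpolate_pose_camera(pose_a, pose_b, cam_a, cam_b, steps=8):
    """Linearly interpolate SMPL-style pose vectors (72-D axis-angle)
    and camera translations, yielding one conditioning vector per step.

    Linear interpolation of axis-angle rotations is a rough
    approximation; per-joint slerp on quaternions would be smoother.
    """
    conditions = []
    for t in np.linspace(0.0, 1.0, steps):
        pose = (1.0 - t) * pose_a + t * pose_b  # body pose blend
        cam = (1.0 - t) * cam_a + t * cam_b     # camera-view blend
        # Each concatenated vector would be fed to the diffusion model
        # as the pose condition, alongside fixed text and visual
        # (appearance) prompts, so the person's identity is preserved.
        conditions.append(np.concatenate([pose, cam]))
    return conditions

# Example: interpolate between two random poses/views.
rng = np.random.default_rng(0)
cond = interpolate_pose_camera(
    rng.normal(size=72) * 0.1, rng.normal(size=72) * 0.1,
    np.array([0.0, 0.0, 2.0]), np.array([0.5, 0.0, 2.5]))
print(len(cond), cond[0].shape)  # 8 conditioning vectors of shape (75,)
```

Because the conditioning is a small parameter vector rather than a rendered pose image, pose and camera view can be varied jointly and continuously, which is what enables the simultaneous interpolation the abstract highlights.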

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Cheong_2023_ICCV,
    author    = {Cheong, Soon Yau and Mustafa, Armin and Gilbert, Andrew},
    title     = {UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {4173-4182}
}