PISE: Person Image Synthesis and Editing With Decoupled GAN

Jinsong Zhang, Kun Li, Yu-Kun Lai, Jingyu Yang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 7982-7990

Abstract


Person image synthesis, e.g., pose transfer, is a challenging problem due to large variations and occlusions. Existing methods have difficulty predicting reasonable content for invisible regions and fail to decouple the shape and style of clothing, which limits their application to person image editing. In this paper, we propose PISE, a novel two-stage generative model for person image synthesis and editing, which can generate realistic person images with desired poses, textures, and semantic layouts. To better predict invisible regions, we first use a parsing generator to synthesize a human parsing map aligned with the target pose, representing the shape of clothing, and then generate the final image with an image generator. To decouple the shape and style of clothing, we propose joint global and local per-region encoding and normalization to predict a reasonable clothing style for invisible regions. We also propose spatial-aware normalization to retain the spatial context relationships of the source image. Qualitative and quantitative experiments demonstrate the superiority of our model. Moreover, texture transfer and parsing editing results show that our model can be applied to person image editing.
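
The abstract does not give implementation details, so the following is only a minimal PyTorch sketch of what per-region encoding and normalization could look like: style codes are average-pooled from source features inside each semantic region of the source parsing map, then injected into instance-normalized target-pose features through the target parsing map. The class name PerRegionNorm, the linear layers, and all tensor shapes are illustrative assumptions, not the paper's architecture.

import torch
import torch.nn as nn

class PerRegionNorm(nn.Module):
    # Sketch of per-region style encoding + normalization (assumed design).
    # Shape comes from the parsing maps; style comes from pooled region codes.

    def __init__(self, channels, num_regions):
        super().__init__()
        self.num_regions = num_regions
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_gamma = nn.Linear(channels, channels)  # hypothetical per-region scale
        self.to_beta = nn.Linear(channels, channels)   # hypothetical per-region shift

    def encode_regions(self, src_feat, src_parsing):
        # src_feat: (B, C, H, W) source image features
        # src_parsing: (B, K, H, W) one-hot source parsing map
        masks = src_parsing.unsqueeze(2)              # (B, K, 1, H, W)
        feats = src_feat.unsqueeze(1)                 # (B, 1, C, H, W)
        pooled = (masks * feats).sum(dim=(3, 4))      # (B, K, C)
        area = masks.sum(dim=(3, 4)).clamp(min=1.0)   # (B, K, 1)
        return pooled / area                          # per-region style codes

    def forward(self, tgt_feat, tgt_parsing, styles):
        # tgt_feat: (B, C, H, W) features aligned with the target pose
        # tgt_parsing: (B, K, H, W) one-hot target parsing map
        # styles: (B, K, C) codes from encode_regions
        gamma = torch.einsum('bkhw,bkc->bchw', tgt_parsing, self.to_gamma(styles))
        beta = torch.einsum('bkhw,bkc->bchw', tgt_parsing, self.to_beta(styles))
        return self.norm(tgt_feat) * (1.0 + gamma) + beta

Usage under these assumptions would be: pool styles from the source (layer.encode_regions(src_feat, src_parsing)) and modulate target-pose features (layer(tgt_feat, tgt_parsing, styles)). In such a setup, editing a region's texture amounts to swapping that region's style code, and editing the shape amounts to changing the parsing map, which mirrors the shape/style decoupling the abstract describes.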

Related Material


@InProceedings{Zhang_2021_CVPR,
    author    = {Zhang, Jinsong and Li, Kun and Lai, Yu-Kun and Yang, Jingyu},
    title     = {PISE: Person Image Synthesis and Editing With Decoupled GAN},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {7982-7990}
}