MUSH: Multi-Scale Hierarchical Feature Extraction for Semantic Image Synthesis

Zicong Wang, Qiang Ren, Junli Wang, Chungang Yan, Changjun Jiang; Proceedings of the Asian Conference on Computer Vision (ACCV), 2022, pp. 4126-4142

Abstract
Semantic image synthesis aims to translate semantic label masks into photo-realistic images. Previous methods are limited in that they extract semantic features with convolutional kernels of limited scale and ignore crucial information, such as the relative positions of pixels. To address these issues, we propose MUSH, a novel semantic image synthesis model that exploits multi-scale information. In the generative network, a multi-scale hierarchical architecture is proposed for feature extraction and is merged with a guided sampling operation to enhance semantic image synthesis. Meanwhile, the discriminative network contains two separate modules that extract features from semantic masks and real images, respectively, which helps exploit the semantic mask information more effectively. Our proposed model achieves state-of-the-art results in both qualitative evaluation and quantitative metrics on several challenging datasets. Experimental results also show that our method generalizes to various semantic image synthesis models. Our code is available at https://github.com/WangZC525/MUSH.
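The full architecture is detailed in the paper; as a rough, illustrative sketch of the multi-scale idea only (not the authors' implementation), one can pool a feature map at several scales and keep one "feature map" per scale, so that each pixel carries context from progressively larger neighborhoods. All names and the toy average-pooling "features" below are placeholders:

```python
def multi_scale_features(x, scales=(1, 2, 4)):
    """Toy multi-scale extraction: average-pool a 2-D grid at several
    block sizes, keeping the original resolution, so each output map
    summarizes an increasingly large neighborhood per pixel.
    (Illustrative only -- not the MUSH architecture.)"""
    h, w = len(x), len(x[0])
    maps = []
    for s in scales:
        pooled = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                # Average over the s x s block containing (i, j),
                # clipped at the grid borders.
                i0, j0 = (i // s) * s, (j // s) * s
                block = [x[a][b]
                         for a in range(i0, min(i0 + s, h))
                         for b in range(j0, min(j0 + s, w))]
                pooled[i][j] = sum(block) / len(block)
        maps.append(pooled)
    return maps  # one map per scale, each h x w

# Example: a 4x4 grid with values 0..15
grid = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
feats = multi_scale_features(grid)
print(len(feats), feats[1][0][0])  # 3 maps; scale-2 block mean at (0,0)
```

In a real generator these per-scale maps would come from learned convolutions at different receptive fields and be merged hierarchically rather than simply stacked.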

Related Material
[bibtex]
@InProceedings{Wang_2022_ACCV,
    author    = {Wang, Zicong and Ren, Qiang and Wang, Junli and Yan, Chungang and Jiang, Changjun},
    title     = {MUSH: Multi-Scale Hierarchical Feature Extraction for Semantic Image Synthesis},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2022},
    pages     = {4126-4142}
}