Harnessing the Conditioning Sensorium for Improved Image Translation

Cooper Nederhood, Nicholas Kolkin, Deqing Fu, Jason Salavon; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 6752-6761

Abstract


Existing methods for multi-modal domain translation learn to embed the input images into a domain-invariant "content" space and a domain-specific "style" space from which novel images can be synthesized. Rather than learning to embed the RGB image from scratch, we propose deriving our content representation from conditioning data produced by pretrained off-the-shelf networks. Because "content" is inherently ambiguous, with different meanings depending on the desired level of abstraction, this approach gives intuitive control over which aspects of content are preserved across domains. We evaluate our method on traditional, well-aligned datasets such as CelebA-HQ, and propose two novel datasets for evaluation on more complex scenes: ClassicTV and FFHQ-WildCrops. Our approach, which we call Sensorium, enables higher-quality domain translation for complex scenes than prior work.
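
The central idea, deriving "content" from frozen off-the-shelf conditioning networks while learning only a style embedding and decoder, can be illustrated with a short PyTorch sketch. This is not the authors' implementation: the class name `SensoriumStyleTranslator`, the arguments `conditioning_nets` and `cond_channels`, and all layer shapes are hypothetical, and the sketch assumes the conditioning networks return spatially aligned feature maps of equal resolution.

```python
import torch
import torch.nn as nn


class SensoriumStyleTranslator(nn.Module):
    """Minimal sketch of content/style translation with off-the-shelf content.

    "Content" comes from frozen pretrained conditioning networks (e.g.
    segmentation, keypoint, or depth predictors) rather than being learned
    from the RGB image; a small learned style encoder captures domain-
    specific appearance. All names and shapes here are illustrative.
    """

    def __init__(self, conditioning_nets, cond_channels, style_dim=8):
        super().__init__()
        # Frozen pretrained extractors supply the content representation.
        self.conditioning_nets = nn.ModuleList(conditioning_nets)
        for p in self.conditioning_nets.parameters():
            p.requires_grad_(False)

        # Learned style encoder: RGB image -> low-dimensional style code.
        self.style_encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, style_dim),
        )

        # Decoder synthesizes an image from the stacked conditioning maps,
        # modulated here by naive concatenation of the broadcast style code.
        self.decoder = nn.Sequential(
            nn.Conv2d(cond_channels + style_dim, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, content_image, style_image):
        # Content = outputs of the frozen conditioning networks (no gradients).
        with torch.no_grad():
            cond_maps = [net(content_image) for net in self.conditioning_nets]
        content = torch.cat(cond_maps, dim=1)

        # Style = learned embedding of an image from the target domain,
        # broadcast to the spatial size of the content maps.
        style = self.style_encoder(style_image)
        style_map = style[:, :, None, None].expand(
            -1, -1, content.shape[2], content.shape[3])

        return self.decoder(torch.cat([content, style_map], dim=1))
```

Swapping which conditioning networks are passed in changes which aspects of "content" (layout, pose, geometry) are preserved across domains, which is the control the abstract refers to.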

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Nederhood_2021_ICCV,
    author    = {Nederhood, Cooper and Kolkin, Nicholas and Fu, Deqing and Salavon, Jason},
    title     = {Harnessing the Conditioning Sensorium for Improved Image Translation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {6752-6761}
}