StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

Or Patashnik, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, Dani Lischinski; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 2085-2094


Inspired by the ability of StyleGAN to generate highly re-alistic images in a variety of domains, much recent work hasfocused on understanding how to use the latent spaces ofStyleGAN to manipulate generated and real images. How-ever, discovering semantically meaningful latent manipula-tions typically involves painstaking human examination ofthe many degrees of freedom, or an annotated collectionof images for each desired manipulation. In this work, weexplore leveraging the power of recently introduced Con-trastive Language-Image Pre-training (CLIP) models in or-der to develop a text-based interface for StyleGAN imagemanipulation that does not require such manual effort. Wefirst introduce an optimization scheme that utilizes a CLIP-based loss to modify an input latent vector in response to auser-provided text prompt. Next, we describe a latent map-per that infers a text-guided latent manipulation step fora given input image, allowing faster and more stable text-based manipulation. Finally, we present a method for map-ping a text prompts to input-agnostic directions in Style-GAN's style space, enabling interactive text-driven imagemanipulation. Extensive results and comparisons demon-strate the effectiveness of our approaches.

Related Material

[pdf] [supp] [arXiv]
@InProceedings{Patashnik_2021_ICCV, author = {Patashnik, Or and Wu, Zongze and Shechtman, Eli and Cohen-Or, Daniel and Lischinski, Dani}, title = {StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2021}, pages = {2085-2094} }