Visual Prompt Tuning for Generative Transfer Learning

Kihyuk Sohn, Huiwen Chang, José Lezama, Luisa Polania, Han Zhang, Yuan Hao, Irfan Essa, Lu Jiang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 19840-19851

Abstract


Learning generative image models efficiently across diverse domains requires transferring knowledge from an image synthesis model trained on a large source dataset. We present a recipe for learning vision transformers by generative knowledge transfer. Our framework builds on generative vision transformers that represent an image as a sequence of visual tokens, using either autoregressive or non-autoregressive transformers. To adapt to a new domain, we employ prompt tuning, which prepends learnable tokens called prompts to the image token sequence, and we introduce a new prompt design for our task. We study a variety of visual domains with varying amounts of training images. We demonstrate the effectiveness of knowledge transfer and significantly better image generation quality. Code is available at https://github.com/google-research/generative_transfer.
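The core mechanic the abstract describes, prepending learnable prompt tokens to the image token sequence while keeping the pretrained transformer frozen, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the class name, dimensions, and the use of a generic PyTorch encoder (rather than the paper's specific autoregressive or non-autoregressive backbone) are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class PromptedTokenTransformer(nn.Module):
    """Hypothetical sketch of visual prompt tuning for a token-based
    generative transformer: learnable prompt embeddings are prepended
    to the image token sequence, and only the prompts are trained
    when adapting to a new domain (the backbone stays frozen)."""

    def __init__(self, vocab_size=1024, dim=256, num_prompts=16,
                 depth=4, heads=8):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, dim)
        # Learnable prompt tokens: the new parameters introduced for transfer.
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, vocab_size)
        # Freeze the (pretend-pretrained) backbone; only prompts adapt.
        for p in self.backbone.parameters():
            p.requires_grad = False

    def forward(self, tokens):
        b = tokens.shape[0]
        x = self.token_emb(tokens)                       # (B, L, D)
        p = self.prompts.unsqueeze(0).expand(b, -1, -1)  # (B, P, D)
        x = torch.cat([p, x], dim=1)                     # prepend prompts
        h = self.backbone(x)
        # Predict logits only over the image-token positions.
        return self.head(h[:, self.prompts.shape[0]:])
```

In a transfer setting, an optimizer would be given only `model.prompts` (and optionally the embedding/head), so the frozen backbone's knowledge is reused while a small number of parameters adapt to the new domain.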

Related Material


@InProceedings{Sohn_2023_CVPR,
  author    = {Sohn, Kihyuk and Chang, Huiwen and Lezama, Jos\'e and Polania, Luisa and Zhang, Han and Hao, Yuan and Essa, Irfan and Jiang, Lu},
  title     = {Visual Prompt Tuning for Generative Transfer Learning},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2023},
  pages     = {19840-19851}
}