X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

Lingmin Ran, Xiaodong Cun, Jia-Wei Liu, Rui Zhao, Song Zijie, Xintao Wang, Jussi Keppo, Mike Zheng Shou; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 8775-8784

Abstract


We introduce X-Adapter a universal upgrader to enable the pretrained plug-and-play modules (e.g. ControlNet LoRA) to work directly with the upgraded text-to-image diffusion model (e.g. SDXL) without further retraining. We achieve this goal by training an additional network to control the frozen upgraded model with the new text-image data pairs. In detail X-Adapter keeps a frozen copy of the old model to preserve the connectors of different plugins. Additionally X-Adapter adds trainable mapping layers that bridge the decoders from models of different versions for feature remapping. The remapped features will be used as guidance for the upgraded model. To enhance the guidance ability of X-Adapter we employ a -text training strategy for the upgraded model. After training we also introduce a two-stage denoising strategy to align the initial latents of X-Adapter and the upgraded model. Thanks to our strategies X-Adapter demonstrates universal compatibility with various plugins and also enables plugins of different versions to work together thereby expanding the functionalities of diffusion community. To verify the effectiveness of the proposed method we conduct extensive experiments and the results show that X-Adapter may facilitate wider application in the upgraded foundational diffusion model. Project page at: https://showlab.github.io/X-Adapter.

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Ran_2024_CVPR, author = {Ran, Lingmin and Cun, Xiaodong and Liu, Jia-Wei and Zhao, Rui and Zijie, Song and Wang, Xintao and Keppo, Jussi and Shou, Mike Zheng}, title = {X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {8775-8784} }