Readout Guidance: Learning Control from Diffusion Features

Grace Luo, Trevor Darrell, Oliver Wang, Dan B Goldman, Aleksander Holynski; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 8217-8227

Abstract


We present Readout Guidance, a method for controlling text-to-image diffusion models with learned signals. Readout Guidance uses readout heads, lightweight networks trained to extract signals from the features of a pre-trained, frozen diffusion model at every timestep. These readouts can encode single-image properties, such as pose, depth, and edges, or higher-order properties that relate multiple images, such as correspondence and appearance similarity. Furthermore, by comparing the readout estimates to a user-defined target and back-propagating the gradient through the readout head, these estimates can be used to guide the sampling process. Compared to prior methods for conditional generation, Readout Guidance requires significantly fewer added parameters and training samples, and offers a convenient and simple recipe for reproducing different forms of conditional control under a single framework, with a single architecture and sampling procedure. We showcase these benefits in the applications of drag-based manipulation, identity-consistent generation, and spatially aligned control.
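The guidance mechanism described above can be illustrated with a short sketch. Below is a minimal, PyTorch-style example of a single guided sampling step, assuming a diffusers-like frozen UNet whose intermediate features are captured with forward hooks; the `readout_head` module, the choice of `up_blocks` as feature sources, and the MSE loss are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def readout_guidance_step(latents, t, text_emb, unet, readout_head, target,
                          guidance_weight=1.0):
    # One sampling step's guidance term. `readout_head` is a hypothetical
    # lightweight module mapping frozen diffusion features to the controlled
    # property (e.g. pose or depth); `target` is the user-defined goal.
    latents = latents.detach().requires_grad_(True)

    # Capture intermediate features from the frozen UNet with forward hooks.
    features = []
    hooks = [blk.register_forward_hook(lambda _m, _i, out: features.append(out))
             for blk in unet.up_blocks]
    noise_pred = unet(latents, t, encoder_hidden_states=text_emb).sample
    for h in hooks:
        h.remove()

    # The readout head estimates the property from the features at this timestep.
    readout = readout_head(features, t)

    # Compare the estimate to the target and back-propagate to the latents.
    loss = F.mse_loss(readout, target)
    grad = torch.autograd.grad(loss, latents)[0]

    # Shift the noise prediction along the gradient (classifier-guidance style)
    # so subsequent denoising moves the readout toward the target.
    return noise_pred + guidance_weight * grad
```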

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Luo_2024_CVPR,
    author    = {Luo, Grace and Darrell, Trevor and Wang, Oliver and Goldman, Dan B and Holynski, Aleksander},
    title     = {Readout Guidance: Learning Control from Diffusion Features},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {8217-8227}
}