Text2Concept: Concept Activation Vectors Directly From Text
Concept activation vectors (CAVs) enable interpretability of a model with respect to human concepts, but generating a CAV requires the costly step of curating positive and negative examples for each concept one wishes to encode. To alleviate this bottleneck, we present Text2Concept, an efficient method for obtaining CAVs directly from text. Text2Concept extends the multi-modal accessibility of a CLIP model's feature space to that of an arbitrary off-the-shelf vision model, with only the small extra step of training linear layers on existing data to map one feature space to the other. We validate our method qualitatively, by sorting images by similarity to embedded concepts, and quantitatively, by showing that off-the-shelf vision encoders achieve surprisingly strong zero-shot classification performance when enabled by Text2Concept. Finally, we demonstrate two new interpretability applications of Text2Concept CAVs: building concept bottleneck models with no concept supervision, and diagnosing distribution shifts in terms of human concepts.
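The alignment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: random synthetic arrays stand in for real vision-encoder and CLIP features, a closed-form least-squares fit replaces training linear layers by gradient descent, and the concept vector is a CLIP-space feature standing in for a CLIP text embedding. The function names `fit_linear_aligner` and `concept_similarity` are hypothetical.

```python
import numpy as np


def fit_linear_aligner(model_feats, clip_feats):
    """Fit an affine map (W, b) so that model_feats @ W + b approximates clip_feats.

    Assumption: ordinary least squares is used here for brevity; the abstract
    only says that linear layers are trained on existing data.
    """
    n = model_feats.shape[0]
    # Append a column of ones so the bias is learned jointly with the weights.
    augmented = np.hstack([model_feats, np.ones((n, 1))])
    solution, *_ = np.linalg.lstsq(augmented, clip_feats, rcond=None)
    return solution[:-1], solution[-1]


def concept_similarity(model_feats, W, b, concept_vec):
    """Cosine similarity between aligned image features and one concept vector."""
    aligned = model_feats @ W + b
    aligned = aligned / np.linalg.norm(aligned, axis=1, keepdims=True)
    concept = concept_vec / np.linalg.norm(concept_vec)
    return aligned @ concept


# Synthetic stand-ins (assumption): in practice these would be features of the
# same images from an off-the-shelf vision encoder and from a CLIP image encoder.
rng = np.random.default_rng(0)
n_images, d_model, d_clip = 500, 32, 16
model_feats = rng.normal(size=(n_images, d_model))
true_W = rng.normal(size=(d_model, d_clip))
clip_feats = model_feats @ true_W + 0.01 * rng.normal(size=(n_images, d_clip))

W, b = fit_linear_aligner(model_feats, clip_feats)

# Stand-in for a CLIP text embedding of a concept (assumption): reuse the
# CLIP-space feature of image 0, so image 0 should rank as most similar.
concept_vec = clip_feats[0]
scores = concept_similarity(model_feats, W, b, concept_vec)
ranking = np.argsort(-scores)  # image indices sorted from most to least similar
```

Sorting images by `scores` mirrors the qualitative validation described above. Under the same assumptions, zero-shot classification would compute one score per class concept vector and take the highest-scoring class for each image.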