Wavelet-Based Mechanistic Interpretability of Vision Transformers via Frequency-Aware Ablations

Abraham, Sophia J.; Hauenstein, Jonathan D.; Scheirer, Walter J.

Sophia J. Abraham, Jonathan D. Hauenstein, Walter J. Scheirer; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2025, pp. 4869-4873

Abstract

We explore a wavelet-based interpretability framework for Vision Transformers (ViTs) that analyzes their reliance on frequency-specific representations. Through systematic ablations of wavelet subbands, we assess how different frequency components contribute to latent representations and attention mechanisms. Our empirical study on CIFAR-10 suggests that high-frequency details, particularly those captured by Haar wavelets, may influence reconstruction fidelity and attention distributions. While these preliminary findings point to frequency-dependent behavior in ViT representations, further investigation is needed to generalize across datasets and architectures. This study highlights the potential of frequency-based interpretability but also underscores the need for more rigorous evaluation in larger, more diverse settings. To encourage further exploration, all experiment and method code is available in our GitHub repository.
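The subband ablation described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code. It assumes a single-level orthonormal Haar transform, a single-channel image with even height and width, and hypothetical function names (`haar_dwt2`, `haar_idwt2`, `ablate_subbands`). The LL/LH/HL/HH labels follow one common convention, and other libraries may name the detail subbands differently.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform (even height and width assumed)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]   # bottom-left, bottom-right
    return {
        "LL": (a + b + c + d) / 2.0,      # low-frequency approximation
        "LH": (a - b + c - d) / 2.0,      # difference across columns
        "HL": (a + b - c - d) / 2.0,      # difference across rows
        "HH": (a - b - c + d) / 2.0,      # diagonal detail
    }

def haar_idwt2(bands):
    """Inverse of haar_dwt2: rebuild the image from its four subbands."""
    ll, lh, hl, hh = bands["LL"], bands["LH"], bands["HL"], bands["HH"]
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out

def ablate_subbands(image, drop=("HH",)):
    """Zero the named subbands and reconstruct the image (hypothetical helper)."""
    bands = haar_dwt2(np.asarray(image, dtype=np.float64))
    for name in drop:
        bands[name] = np.zeros_like(bands[name])
    return haar_idwt2(bands)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 32))            # stand-in for one CIFAR-10 channel
    no_detail = ablate_subbands(img, drop=("LH", "HL", "HH"))
    print("mean abs change:", np.abs(no_detail - img).mean())
```

For a 32x32x3 CIFAR-10 image, the same function could be applied per channel. In a study of this kind, the original and ablated images would then be passed through the ViT and their latent representations or attention maps compared. That step is omitted here because the model and metrics are not specified on this page.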

Related Material

[bibtex]

@InProceedings{Abraham_2025_CVPR,
    author    = {Abraham, Sophia J. and Hauenstein, Jonathan D. and Scheirer, Walter J.},
    title     = {Wavelet-Based Mechanistic Interpretability of Vision Transformers via Frequency-Aware Ablations},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {4869-4873}
}