@InProceedings{Torres_2024_CVPR,
    author    = {Torres, Felipe and Zhang, Hanwei and Sicre, Ronan and Ayache, St\'ephane and Avrithis, Yannis},
    title     = {CA-Stream: Attention-based Pooling for Interpretable Image Recognition},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {8206-8211}
}
CA-Stream: Attention-based Pooling for Interpretable Image Recognition
Abstract
Explanations obtained from transformer-based architectures in the form of raw attention can be seen as class-agnostic saliency maps. Additionally, attention-based pooling acts as a form of masking in the feature space. Motivated by this observation, we design an attention-based pooling mechanism intended to replace Global Average Pooling (GAP) at inference. This mechanism, called Cross-Attention Stream (CA-Stream), comprises a stream of cross-attention blocks that interact with features at different network depths. CA-Stream enhances the interpretability of existing image recognition models while preserving their recognition performance.
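The relationship between GAP and attention-based pooling can be illustrated with a minimal sketch. It assumes a single attention head, identity key and value projections, a fixed query vector standing in for a learned one, and that every network stage produces features of the same dimension D (a real backbone would need projections between stages). The names `attention_pool` and `ca_stream` are hypothetical, and this is not the authors' implementation. With a zero query, all attention weights are equal and the result coincides with GAP; otherwise the weights act as a soft spatial mask that does not depend on any class label.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_average_pool(features: List[Vector]) -> Vector:
    """GAP: uniform average over the N flattened spatial positions."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def attention_pool(query: Vector, features: List[Vector]) -> Tuple[Vector, List[float]]:
    """Single-head cross-attention pooling (hypothetical sketch).

    Keys and values are the features themselves (identity projections, an
    assumption). Returns the pooled vector and the per-position weights,
    which can be reshaped to H x W and read as a saliency map.
    """
    d = len(query)
    scores = [dot(query, f) / math.sqrt(d) for f in features]
    weights = softmax(scores)
    pooled = [sum(w * f[j] for w, f in zip(weights, features)) for j in range(d)]
    return pooled, weights


def ca_stream(query: Vector, stages: List[List[Vector]]) -> Tuple[Vector, List[List[float]]]:
    """Hypothetical stream: the query attends to each stage's features in
    order and is updated with a residual connection after every stage."""
    maps = []
    for features in stages:
        pooled, weights = attention_pool(query, features)
        query = [q + p for q, p in zip(query, pooled)]
        maps.append(weights)
    return query, maps


if __name__ == "__main__":
    # A 2 x 2 feature map with D = 2, flattened to 4 positions.
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    print("GAP:", global_average_pool(feats))
    print("attention pool, zero query:", attention_pool([0.0, 0.0], feats))
    print("attention pool, query [2, 0]:", attention_pool([2.0, 0.0], feats))
    final_query, maps = ca_stream([0.0, 0.0], [feats, feats])
    print("stream output:", final_query)
```

In this sketch the vector returned by `attention_pool` (or the final query of `ca_stream`) would be passed to the classifier in place of the GAP output, and each entry of `maps` is one attention map per stage.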