CA-Stream: Attention-based Pooling for Interpretable Image Recognition

Felipe Torres, Hanwei Zhang, Ronan Sicre, Stéphane Ayache, Yannis Avrithis; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 8206-8211

Abstract


Explanations obtained from transformer-based architectures in the form of raw attention can be seen as a class-agnostic saliency map. Additionally, attention-based pooling serves as a form of masking in the feature space. Motivated by this observation, we design an attention-based pooling mechanism intended to replace Global Average Pooling (GAP) at inference. This mechanism, called Cross-Attention Stream (CA-Stream), comprises a stream of cross-attention blocks interacting with features at different network depths. CA-Stream enhances interpretability properties in existing image recognition models while preserving their recognition performance.
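The relation between GAP and attention-based pooling can be illustrated with a minimal sketch. This is not the CA-Stream implementation. It assumes a single query vector (for example a learnable class token, which is our assumption here), a single attention head, no learned key/value projections, and features taken from one network depth only. The names `gap` and `attention_pool` are hypothetical. With an all-zero query the attention weights are uniform and the output reduces to GAP. A nonzero query reweights spatial positions, and those weights act as a soft mask over the feature grid, i.e. a class-agnostic saliency map.

```python
import math

def gap(features):
    """Global Average Pooling: uniform mean over the N spatial positions."""
    n, d = len(features), len(features[0])
    return [sum(f[j] for f in features) / n for j in range(d)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, query):
    """Single-query, single-head cross-attention pooling (illustrative only).

    Hypothetical simplification: keys and values are the raw features,
    with no learned projections as a full cross-attention block would use.
    Returns the pooled vector and the attention weights over positions.
    """
    d = len(query)
    scale = 1.0 / math.sqrt(d)  # scaled dot-product attention
    scores = [scale * sum(q * x for q, x in zip(query, f)) for f in features]
    weights = softmax(scores)  # one weight per spatial position
    pooled = [sum(w * f[j] for w, f in zip(weights, features)) for j in range(d)]
    return pooled, weights

# Usage: 4 spatial positions with d = 2 channels.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
print(gap(features))  # [0.5, 0.5]
pooled, weights = attention_pool(features, [0.0, 0.0])
print(weights)  # [0.25, 0.25, 0.25, 0.25], so pooled equals the GAP output
```

With a query such as `[4.0, 0.0]`, positions whose features align with the query receive larger weights, so the pooled vector emphasizes those positions instead of averaging them uniformly.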

Related Material


@InProceedings{Torres_2024_CVPR,
    author    = {Torres, Felipe and Zhang, Hanwei and Sicre, Ronan and Ayache, St\'ephane and Avrithis, Yannis},
    title     = {CA-Stream: Attention-based Pooling for Interpretable Image Recognition},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {8206-8211}
}