A 0-Shot Self-Attention Mechanism for Accelerated Diagonal Attention

Mario Viti, Nadiya Shvai, Arcadi Llanza, Amir Nakib; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 7308-7315

Abstract


The ability of Transformers to process longer sequences has led to unprecedented levels of generalization in visual tasks. However, the complexity of Transformers is dominated by the quadratic cost of computing the attention blocks, posing a bottleneck that impedes the scaling of sequence length and the realization of more advanced AI solutions. We propose and explore the hypothesis that the self-attention mechanism exhibits regularities that can be exploited to enhance performance and achieve linear-cost attention without significant loss of effectiveness. Specifically, we investigate the attention matrix of Vision Transformers to identify and leverage these regularities in order to simplify the computation process. The resulting procedure significantly reduces the computational cost of Transformers by directly reducing the complexity of the attention block. Moreover, the designed procedure is 0-shot and self-supervised: it requires no retraining, additional data, or parameters, as all Transformer parameters remain unchanged. Consequently, the proposed method can be seamlessly applied to pre-trained Vision Transformers without the need for retraining. Experiments conducted on a series of Vision Transformers pre-trained on the ImageNet-1K dataset demonstrate the effectiveness of our proposed approach.
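
To make the linear-cost idea concrete, below is a minimal PyTorch sketch of banded (near-diagonal) attention, in which each query attends only to keys within a fixed window around its own position, so the cost drops from O(N^2) to O(N*w). This illustrates the general diagonal-attention idea suggested by the title, not the authors' actual procedure; the function name banded_attention, the window half-width w, and the window-gathering scheme are assumptions made for the example.

    # Sketch only: banded attention restricted to a +/- w window around the diagonal.
    import torch
    import torch.nn.functional as F

    def banded_attention(q, k, v, w=16):
        # q, k, v: (batch, heads, N, head_dim)
        B, H, N, d = q.shape
        scale = d ** -0.5
        # Pad the sequence dimension so every query has a full window of 2w+1 keys.
        k_pad = F.pad(k, (0, 0, w, w))                  # (B, H, N + 2w, d)
        v_pad = F.pad(v, (0, 0, w, w))
        # Window indices: row i gathers padded positions i .. i + 2w,
        # i.e. original key positions i - w .. i + w.
        idx = torch.arange(N).unsqueeze(1) + torch.arange(2 * w + 1).unsqueeze(0)
        k_win = k_pad[:, :, idx]                        # (B, H, N, 2w+1, d)
        v_win = v_pad[:, :, idx]
        # Attention scores against local keys only: (B, H, N, 2w+1)
        scores = torch.einsum('bhnd,bhnkd->bhnk', q, k_win) * scale
        # Mask out positions that fall outside the original sequence.
        valid = (idx >= w) & (idx < N + w)
        scores = scores.masked_fill(~valid, float('-inf'))
        attn = scores.softmax(dim=-1)
        return torch.einsum('bhnk,bhnkd->bhnd', attn, v_win)

    # Example: q = k = v = torch.randn(1, 8, 196, 64); banded_attention(q, k, v, w=8) -> (1, 8, 196, 64)

Because such a restriction changes only how attention scores are computed, not any learned weights, it could in principle be dropped into the attention blocks of a pre-trained Vision Transformer, which is the spirit of the 0-shot, retraining-free claim in the abstract.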

Related Material


[pdf]
[bibtex]
@InProceedings{Viti_2025_WACV,
    author    = {Viti, Mario and Shvai, Nadiya and Llanza, Arcadi and Nakib, Amir},
    title     = {A 0-Shot Self-Attention Mechanism for Accelerated Diagonal Attention},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {7308-7315}
}