Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs

Song, Lin; Chen, Yukang; Yang, Shuai; Ding, Xiaohan; Ge, Yixiao; Chen, Ying-Cong; Shan, Ying

Lin Song, Yukang Chen, Shuai Yang, Xiaohan Ding, Yixiao Ge, Ying-Cong Chen, Ying Shan; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 13763-13773

Abstract

This paper focuses on the high computational complexity in Large Language Models (LLMs), a significant challenge in both natural language processing (NLP) and multi-modal tasks. We propose Low-Rank Approximation for Sparse Attention (LoRA-Sparse), an innovative approach that strategically reduces this complexity. LoRA-Sparse introduces low-rank linear projection layers for sparse attention approximation. It utilizes an order-mimic training methodology, which is crucial for efficiently approximating the self-attention mechanism in LLMs. We empirically show that sparse attention not only reduces computational demands but also enhances model performance in both NLP and multi-modal tasks. This surprisingly shows that redundant attention in LLMs might be non-beneficial. We extensively validate LoRA-Sparse through rigorous empirical studies in both NLP and multi-modal tasks, demonstrating its effectiveness and general applicability. Based on LLaMA and LLaVA models, our methods can reduce more than half of the self-attention computation with even better performance than full-attention baselines.
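
The general idea of using low-rank projections to approximate attention scores and then attending only to a sparse subset of keys can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `sparse_attention_lowrank`, the projection matrices `p_q` and `p_k`, and the per-query top-k selection rule are all assumptions made for illustration. The projections here are plain inputs, whereas the abstract describes them as trained with an order-mimic methodology, and causal masking and multiple heads are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention_lowrank(q, k, v, p_q, p_k, top_k):
    """Single-head sketch. q, k, v: (n, d); p_q, p_k: (d, r) with r << d.

    p_q and p_k are hypothetical low-rank projections (assumed, not from
    the paper); top_k is the number of keys kept per query.
    """
    n, d = q.shape
    # Cheap approximate scores in the rank-r space: O(n^2 * r) work.
    approx = (q @ p_q) @ (k @ p_k).T                       # (n, n)
    # Indices of the top_k highest approximate scores for each query.
    idx = np.argpartition(-approx, top_k - 1, axis=-1)[:, :top_k]
    k_sel, v_sel = k[idx], v[idx]                          # (n, top_k, d)
    # Exact scaled dot-product attention, restricted to the selected keys.
    scores = np.einsum("nd,nkd->nk", q, k_sel) / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return np.einsum("nk,nkd->nd", weights, v_sel)

# Usage with random data (illustrative shapes only).
rng = np.random.default_rng(0)
n, d, r = 6, 8, 2
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
p_q, p_k = rng.standard_normal((d, r)), rng.standard_normal((d, r))
out = sparse_attention_lowrank(q, k, v, p_q, p_k, top_k=3)
```

In this sketch the approximate scores are used only to rank keys, so only their ordering matters; when `top_k` equals the sequence length the result coincides with full attention.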

Related Material

@InProceedings{Song_2024_CVPR,
    author    = {Song, Lin and Chen, Yukang and Yang, Shuai and Ding, Xiaohan and Ge, Yixiao and Chen, Ying-Cong and Shan, Ying},
    title     = {Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {13763-13773}
}