Parallel Sequence Modeling via Generalized Spatial Propagation Network
Abstract
We present the Generalized Spatial Propagation Network (GSPN), a new attention mechanism optimized for vision tasks that inherently captures 2D spatial structures. Existing attention models, including transformers, linear attention, and state-space models like Mamba, process multi-dimensional data as 1D sequences, compromising spatial coherence and efficiency. GSPN overcomes these limitations by directly operating on spatially coherent image data and forming dense pairwise connections through a unique line-scan approach. Central to GSPN is the Stability-Context Condition, which ensures stable, context-aware propagation across 2D sequences and reduces the effective sequence length to √N, significantly enhancing computational efficiency. With learnable, input-dependent weights and no reliance on positional embeddings, GSPN achieves superior spatial fidelity and state-of-the-art performance in vision tasks, including ImageNet classification, class-guided image generation, and text-to-image generation. Notably, GSPN accelerates SD-XL with softmax attention by over 84× when generating 16K images.
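To make the line-scan idea concrete, here is a minimal, single-channel PyTorch sketch of one left-to-right scan, assuming the 3-neighbor connectivity used in classic spatial propagation networks. The names (line_scan, w, lam) are illustrative, and the per-element weight normalization is only a stand-in for the paper's Stability-Context Condition, not the exact formulation.

import torch

def line_scan(x, w, lam):
    """One left-to-right line scan (illustrative sketch, not the paper's code).

    x   : (H, W)     input features (single channel for clarity)
    w   : (3, H, W)  weights for the upper/same/lower neighbor
                     in the previous column
    lam : (H, W)     input gates
    """
    H, W = x.shape
    # Stand-in for the Stability-Context Condition: make each element's
    # incoming weights nonnegative and sum to 1, keeping propagation
    # stable while every element stays connected to the scan history.
    w = w.abs()
    w = w / w.sum(dim=0, keepdim=True).clamp(min=1e-6)

    h = torch.zeros_like(x)
    h[:, 0] = lam[:, 0] * x[:, 0]
    for j in range(1, W):
        prev = h[:, j - 1]                       # previous column, (H,)
        up   = torch.cat([prev[:1], prev[:-1]])  # row i-1 (edge replicated)
        down = torch.cat([prev[1:], prev[-1:]])  # row i+1 (edge replicated)
        h[:, j] = (w[0, :, j] * up
                   + w[1, :, j] * prev
                   + w[2, :, j] * down
                   + lam[:, j] * x[:, j])
    return h

# All H rows advance one column per step, so a full scan needs only
# W ≈ √N sequential steps for N = H·W pixels:
x = torch.randn(32, 32)
h = line_scan(x, torch.rand(3, 32, 32), torch.rand(32, 32))

Because every row is updated in parallel at each step, the sequential depth of a scan is the image width rather than the pixel count, which is where the √N effective sequence length in the abstract comes from.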
Related Material
[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Wang_2025_CVPR,
  author    = {Wang, Hongjun and Byeon, Wonmin and Xu, Jiarui and Gu, Jinwei and Cheung, Ka Chun and Wang, Xiaolong and Han, Kai and Kautz, Jan and Liu, Sifei},
  title     = {Parallel Sequence Modeling via Generalized Spatial Propagation Network},
  booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
  month     = {June},
  year      = {2025},
  pages     = {4473-4483}
}