Full-scale Selective Transformer for Semantic Segmentation

Fangjian Lin, Sitong Wu, Yizhe Ma, Shengwei Tian; Proceedings of the Asian Conference on Computer Vision (ACCV), 2022, pp. 2663-2679


In this paper, we rethink multi-scale feature fusion from two perspectives (scale-level and spatial-level) and propose a full-scale selective fusion strategy for semantic segmentation. Based on this strategy, we design a novel segmentation network named Full-scale Selective Transformer (FSFormer). Specifically, FSFormer adaptively selects a subset of tokens from all tokens at all scales to construct a token subset of interest for each scale, so each token interacts only with the tokens in the subset of interest for its scale. This full-scale selective fusion strategy not only filters out noisy information propagation but also reduces computational cost to some extent. We evaluate FSFormer on four challenging semantic segmentation benchmarks, PASCAL Context, ADE20K, COCO-Stuff 10K, and Cityscapes, where it outperforms state-of-the-art methods.
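
The select-then-attend idea can be illustrated with a toy sketch. The abstract does not say how tokens are scored or how many are kept, so the selection rule used here (rank every pooled token by its dot product with the mean query of a scale and keep the top `k`), the helper names `select_subset` and `selective_fusion`, and the omission of learned query/key/value projections are all hypothetical simplifications, not FSFormer's actual design. The sketch only shows the structure: tokens from all scales are pooled, each scale picks its own subset of interest, and that scale's tokens attend to the subset alone.

```python
import math
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_subset(queries: List[Vector], candidates: List[Vector], k: int) -> List[int]:
    # HYPOTHETICAL selection rule (not from the paper): score each candidate
    # against the mean query of this scale and keep the k highest-scoring indices.
    dim = len(queries[0])
    summary = [sum(q[d] for q in queries) / len(queries) for d in range(dim)]
    scores = [dot(c, summary) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return order[:min(k, len(candidates))]


def selective_fusion(scales: List[List[Vector]], k: int) -> List[List[Vector]]:
    # Pool tokens from every scale; each scale then attends only to its own
    # top-k subset instead of to all pooled tokens.
    pool = [tok for scale in scales for tok in scale]
    dim = len(pool[0])
    outputs = []
    for queries in scales:
        subset = [pool[i] for i in select_subset(queries, pool, k)]
        fused = []
        for q in queries:
            # Scaled dot-product attention restricted to the k selected tokens
            # (keys and values are the raw tokens; no learned projections).
            weights = softmax([dot(q, t) / math.sqrt(dim) for t in subset])
            fused.append([sum(w * t[d] for w, t in zip(weights, subset)) for d in range(dim)])
        outputs.append(fused)
    return outputs


# Usage: three scales with 64, 16 and 4 tokens of dimension 8.
random.seed(0)
C = 8
scales = [[[random.gauss(0, 1) for _ in range(C)] for _ in range(n)] for n in (64, 16, 4)]
out = selective_fusion(scales, k=12)
print([len(o) for o in out])  # [64, 16, 4]
```

With 84 pooled tokens and `k=12`, each query scores 12 keys instead of 84, so the attention weights total 84 × 12 = 1,008 entries rather than 84 × 84 = 7,056; the selection step adds its own, smaller scoring cost. This is the kind of saving the abstract describes as reducing computational cost "to some extent".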

@InProceedings{Lin_2022_ACCV,
    author    = {Lin, Fangjian and Wu, Sitong and Ma, Yizhe and Tian, Shengwei},
    title     = {Full-scale Selective Transformer for Semantic Segmentation},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2022},
    pages     = {2663-2679}
}