Gaussian Context Transformer

BibTeX:
@InProceedings{Ruan_2021_CVPR,
  author    = {Ruan, Dongsheng and Wang, Daiyin and Zheng, Yuan and Zheng, Nenggan and Zheng, Min},
  title     = {Gaussian Context Transformer},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2021},
  pages     = {15129-15138}
}
Abstract
Recently, a large number of channel attention blocks have been proposed to boost the representational power of deep convolutional neural networks (CNNs). These approaches commonly learn the relationship between global contexts and attention activations using fully-connected layers or linear transformations. However, we empirically find that, despite the many parameters they introduce, these attention blocks may not learn this relationship well. In this paper, we hypothesize that the relationship is predetermined. Based on this hypothesis, we propose a simple yet extremely efficient channel attention block, called Gaussian Context Transformer (GCT), which achieves contextual feature excitation using a Gaussian function that satisfies the presupposed relationship. Depending on whether the standard deviation of the Gaussian function is learnable, we develop two versions of GCT: GCT-B0 and GCT-B1. GCT-B0 fixes the standard deviation and is therefore a parameter-free channel attention block: it directly maps global contexts to attention activations without learning. In contrast, GCT-B1 is a parameterized channel attention block that adaptively learns the standard deviation to enhance the mapping ability. Extensive experiments on the ImageNet and MS COCO benchmarks demonstrate that our GCTs lead to consistent improvements across various deep CNNs and detectors. Compared with a bank of state-of-the-art channel attention blocks, such as SE and ECA, our GCTs are superior in effectiveness and efficiency.
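To make the excitation mechanism concrete, below is a minimal, hypothetical PyTorch sketch of a GCT-style block. The abstract does not specify how global contexts are aggregated or normalized, so the global average pooling, the cross-channel normalization, and the default standard deviation used here are assumptions for illustration, not the authors' exact formulation; only the Gaussian mapping from context to attention and the fixed-vs-learnable standard deviation (GCT-B0 vs. GCT-B1) follow the abstract.

```python
# Hypothetical sketch of a GCT-style channel attention block (not the official code).
import torch
import torch.nn as nn


class GaussianContextTransformer(nn.Module):
    """Channel attention via a Gaussian mapping from global context to activation.

    learnable_std=False mimics the parameter-free GCT-B0 variant;
    learnable_std=True mimics GCT-B1, which adapts the standard deviation.
    """

    def __init__(self, learnable_std: bool = False, init_std: float = 0.5, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        std = torch.tensor(init_std)  # init_std is an assumed default
        if learnable_std:
            self.std = nn.Parameter(std)          # GCT-B1: learnable standard deviation
        else:
            self.register_buffer("std", std)      # GCT-B0: fixed, no parameters

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W)
        context = x.mean(dim=(2, 3), keepdim=True)            # assumed: global average pooling
        mu = context.mean(dim=1, keepdim=True)
        sigma = context.std(dim=1, keepdim=True)
        normed = (context - mu) / (sigma + self.eps)           # assumed: normalize contexts across channels
        attention = torch.exp(-normed.pow(2) / (2 * self.std ** 2))  # Gaussian excitation
        return x * attention                                   # rescale channel responses


# Usage: wrap a feature map from any CNN stage.
# feats = torch.randn(2, 64, 32, 32)
# out = GaussianContextTransformer(learnable_std=True)(feats)
```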