AFF-CAM: Adaptive Frequency Filtering based Channel Attention Module
Locality arising from bounded receptive fields is one of the major limitations of convolutional neural networks. Performing convolutions in the frequency domain offers a complementary perspective on this dilemma, since a point-wise update in the frequency domain can globally modulate all input features involved in the Discrete Cosine Transform (DCT). However, the DCT concentrates most of its information in a handful of coefficients in the low-frequency region of the spectrum, so other potentially useful components, such as those in the middle and high frequency bands, are often discarded. We believe that valuable feature representations can be learned not only from low-frequency components but also from these disregarded frequency distributions. In this paper, we propose a novel Adaptive Frequency Filtering based Channel Attention Module (AFF-CAM), which exploits the non-local characteristics of the frequency domain and adaptively learns the importance of different bands of the frequency spectrum by modeling global cross-channel interactions, where each channel serves as a distinct frequency distribution. As a result, AFF-CAM is able to re-calibrate channel-wise feature responses and guide feature representations from the spatial domain to reason over high-level, global context that cannot be obtained from the local kernels of spatial convolutions. Extensive experiments are conducted on the ImageNet-1K classification and MS COCO detection benchmarks to validate AFF-CAM. By effectively aggregating global information from various bands of the frequency spectrum with local information from the spatial domain, our method achieves state-of-the-art results compared with other attention mechanisms.
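The two frequency-domain properties the abstract relies on (a point-wise update in the DCT domain is a global operation in space, and smooth signals concentrate their energy in a few low-frequency coefficients) can be illustrated with a small NumPy sketch. This is not AFF-CAM itself. It assumes an orthonormal 2-D DCT-II applied to a single square 8x8 feature map, and the helper names `dct_matrix`, `dct2`, `idct2`, `band_mask`, and `frequency_filter`, as well as the low/mid/high split by `u + v` with fixed per-band gains, are hypothetical choices for illustration rather than the paper's band partition or learned parameters.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # spatial index
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    scale = np.full((n, 1), np.sqrt(2.0 / n))
    scale[0, 0] = np.sqrt(1.0 / n)  # the DC row uses a smaller normalizer
    return scale * c


def dct2(x: np.ndarray) -> np.ndarray:
    """2-D DCT-II of a square array (separable: rows, then columns)."""
    c = dct_matrix(x.shape[0])
    return c @ x @ c.T


def idct2(coeffs: np.ndarray) -> np.ndarray:
    """Inverse of dct2; the basis is orthonormal, so the inverse is the transpose."""
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c


def band_mask(n: int, weights: tuple) -> np.ndarray:
    """Hypothetical low/mid/high split by u + v, with one gain per band."""
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    radius = u + v  # ranges over 0 .. 2n - 2
    band = np.where(radius < n // 2, 0, np.where(radius < n, 1, 2))
    return np.asarray(weights, dtype=float)[band]


def frequency_filter(x: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Element-wise gain on DCT coefficients, mapped back to the spatial domain."""
    return idct2(dct2(x) * mask)


if __name__ == "__main__":
    # Rescaling a single DCT coefficient perturbs every spatial position.
    x = np.random.default_rng(0).standard_normal((8, 8))
    mask = np.ones((8, 8))
    mask[3, 5] = 2.0
    delta = frequency_filter(x, mask) - x
    print("pixels changed:", int(np.sum(np.abs(delta) > 1e-9)), "of", delta.size)

    # Energy compaction on a smooth ramp image x[i, j] = i + j.
    coeffs = dct2(np.add.outer(np.arange(8.0), np.arange(8.0)))
    share = np.sum(coeffs[:2, :2] ** 2) / np.sum(coeffs ** 2)
    print(f"energy in 2x2 low-frequency corner: {share:.4f}")  # 0.9979

    # Band-wise gains that attenuate, rather than discard, mid/high bands.
    y = frequency_filter(x, band_mask(8, (1.0, 0.8, 0.6)))
    print(y.shape)
```

Because the mask multiplies coefficients element-wise, each output pixel depends on every input pixel, which is the non-local behavior that local spatial kernels lack. In an attention module of the kind described above, the band gains would be produced adaptively rather than fixed by hand as in this sketch.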