@InProceedings{Kang_2025_WACV,
    author    = {Kang, Hankyul and Ryu, Jongbin},
    title     = {Enriching Local Patterns with Multi-Token Attention for Broad-Sight Neural Networks},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {8259-8268}
}
Enriching Local Patterns with Multi-Token Attention for Broad-Sight Neural Networks
Abstract
In neural networks, recognizing visual patterns is challenging because global average pooling disregards local patterns and relies solely on over-concentrated activation. Global average pooling forces the network to learn objects regardless of their location, so features tend to be activated only in specific regions. To support this claim, we provide a novel analysis, backed by extensive experiments, of the problems that over-concentration causes in networks. We analyze over-concentration through the problems arising from feature variance and from dead neurons that are not activated. Based on our analysis, we introduce a multi-token attention pooling layer to alleviate the over-concentration problem. Our attention-pooling layer captures broad-sight local patterns by learning multiple tokens with the proposed distillation algorithm. It resolves the high-bias and high-variance errors of the learned multi-tokens, which is crucial when aggregating local patterns with multi-tokens. Our method applies to various vision tasks and network architectures, such as CNN, ViT, and MLP-Mixer. The proposed method improves baselines with few extra resources, and a network employing our pooling method performs favorably against state-of-the-art networks. We open-source the code at https://github.com/Lab-LVM/imagenet-models.
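To make the contrast in the abstract concrete, the following is a minimal sketch of global average pooling next to a generic attention pooling that uses several query tokens. It is not the authors' implementation: the function names `global_average_pool` and `multi_token_attention_pool` are hypothetical, the tokens are passed in as fixed vectors rather than learned, and the proposed distillation algorithm is not modeled at all.

```python
import math


def global_average_pool(features):
    """Average N local feature vectors (each of length D) into one vector."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_token_attention_pool(features, tokens):
    """Generic attention pooling with K query tokens (illustrative assumption).

    Each token scores every spatial location by a scaled dot product,
    turns the scores into weights with softmax, and returns its own
    weighted average of the local features. The result is K vectors.
    """
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    pooled = []
    for token in tokens:
        scores = [scale * sum(t * x for t, x in zip(token, f)) for f in features]
        weights = softmax(scores)
        pooled.append(
            [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
        )
    return pooled


if __name__ == "__main__":
    # Four spatial locations with 2-dimensional features (toy values).
    local_features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.5, 0.5]]
    # Two fixed tokens standing in for learned ones.
    query_tokens = [[1.0, 0.0], [0.0, 1.0]]
    print(global_average_pool(local_features))
    print(multi_token_attention_pool(local_features, query_tokens))
```

In this sketch, an all-zero token assigns equal weight to every location and therefore reproduces global average pooling, while distinct tokens can weight different regions and so retain more than one summary of the local patterns.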