Self-Attention Agreement Among Capsules

Rita Pucci, Christian Micheloni, Niki Martinel; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2021, pp. 272-280

Abstract


At the state of the art, Capsule Networks (CapsNets) have shown to be a promising alternative to Convolutional Neural Networks (CNNs) in many computer vision tasks, due to their ability to encode object viewpoint variations. Network capsules provide maps of votes that focus on entities presence in the image and their pose. Each map is the point of view of a given capsule. To compute such votes, CapsNets rely on the routing-by-agreement mechanism. This computationally costly iterative algorithm selects the most appropriate parent capsule to have nodes in a parse tree for all the active capsules but this behaviour is not ensured by the routing, hence it possibly causes vanishing weights during training. We hypothesise that an attention-like mechanism will help capsules to select the predominant regions among the maps to focus on, hence introducing a more reliable way of learning the agreement between the capsules in a single pass. We propose the Attention Agreement Capsule Networks (AA-Caps) architecture that builds upon CapsNet by introducing a self-attention layer to suppress irrelevant capsule votes thus keeping only the ones that are useful for capsules agreements on a specific entity. The generated capsule attention map is then assigned to classification layer responsible of emitting the predicted image class. The proposed AA-Caps model has been evaluated on five benchmark datasets to validate its ability in dealing with the diverse and complex data that CapsNet often fails with. The achieved results demonstrate that AA-Caps outperforms existing methods without the need of more complex architectures or model ensembles.

Related Material


[pdf]
[bibtex]
@InProceedings{Pucci_2021_ICCV, author = {Pucci, Rita and Micheloni, Christian and Martinel, Niki}, title = {Self-Attention Agreement Among Capsules}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops}, month = {October}, year = {2021}, pages = {272-280} }