Region Pooling with Adaptive Feature Fusion for End-to-End Person Recognition

Vijay Kumar, Anoop Namboodiri, C.V. Jawahar; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 2133-2142

Abstract


Current approaches for person recognition train an ensemble of region-specific convolutional neural networks for representation learning, and then adopt naive fusion strategies to combine their features or predictions during testing. In this paper, we propose a unified end-to-end architecture that generates a complete person representation based on pooling and aggregation of features from multiple body regions. Our network takes a person image and the pre-determined locations of body regions as input, and generates common feature maps that are shared across all the regions. Multiple features corresponding to different regions are then pooled and combined with an aggregation block, where the adaptive weights required for aggregation are obtained through an attention mechanism. Evaluations on three person recognition datasets (PIPA, Soccer, and Hannah) show that a single model trained end-to-end is computationally faster, requires fewer parameters, and achieves improved performance over separately trained models.
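The pipeline described in the abstract (one shared feature map, per-region pooling, attention-weighted aggregation) can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation. It assumes plain average pooling inside each region box in place of whatever region pooling operator the paper uses, a single hypothetical scoring vector `w` that turns each region feature into an attention score, and softmax normalization of those scores. The names `region_pool`, `attention_fuse`, and `person_representation` are illustrative only. In a real network the feature map would come from a CNN backbone and `w` would be learned jointly with it.

```python
import math
from typing import List, Sequence, Tuple

FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]
Box = Tuple[int, int, int, int]        # (x0, y0, x1, y1) in feature-map cells, end-exclusive


def region_pool(fmap: FeatureMap, box: Box) -> List[float]:
    """Average-pool every channel of the shared feature map inside one region box.

    Assumption: simple average pooling stands in for the paper's region pooling.
    """
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    if area <= 0:
        raise ValueError("region box must have positive area")
    return [
        sum(channel[r][c] for r in range(y0, y1) for c in range(x0, x1)) / area
        for channel in fmap
    ]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    features: Sequence[Sequence[float]], w: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Score each region feature with a dot product against the hypothetical
    vector w, normalize the scores with softmax, and return the weighted sum
    of the region features together with the weights."""
    scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in features]
    alphas = softmax(scores)
    dim = len(features[0])
    fused = [sum(a * f[d] for a, f in zip(alphas, features)) for d in range(dim)]
    return fused, alphas


def person_representation(
    fmap: FeatureMap, boxes: Sequence[Box], w: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Pool one feature per body region from the shared map, then fuse them."""
    region_feats = [region_pool(fmap, b) for b in boxes]
    return attention_fuse(region_feats, w)


if __name__ == "__main__":
    # Toy 2-channel, 4x4 shared feature map: channel 0 is all ones,
    # channel 1 holds the row index of each cell.
    fmap = [[[1.0] * 4 for _ in range(4)], [[float(r)] * 4 for r in range(4)]]
    # Two hypothetical regions: an upper box and a lower box.
    boxes = [(0, 0, 2, 2), (0, 2, 4, 4)]
    # With w = 0 every region gets the same weight.
    print(person_representation(fmap, boxes, [0.0, 0.0]))
    # -> ([1.0, 1.5], [0.5, 0.5])
```

Because every region is pooled from the same feature map, the backbone runs once per image rather than once per region, which is the source of the speed and parameter savings the abstract reports over an ensemble of region-specific networks. The softmax keeps the region weights positive and summing to one, so the fused vector stays on the same scale as the individual region features.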

Related Material

@InProceedings{Kumar_2020_WACV,
author = {Kumar, Vijay and Namboodiri, Anoop and Jawahar, C.V.},
title = {Region Pooling with Adaptive Feature Fusion for End-to-End Person Recognition},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {March},
year = {2020}
}