Recurrent Attentive Zooming for Joint Crowd Counting and Precise Localization

Chenchen Liu, Xinyu Weng, Yadong Mu; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 1217-1226


Crowd counting is a new frontier in computer vision with far-reaching applications particularly in social safety management. A majority of existing works adopt a methodology that first estimates a person-density map and then calculates integral over this map to obtain the final count. As noticed by several prior investigations, the learned density map can significantly deviate from the true person density even though the final reported count is precise. This implies that the density map is unreliable for localizing crowd. To address this issue, this work proposes a novel framework that simultaneously solving two inherently related tasks - crowd counting and localization. The contributions are several-fold. First, our formulation is based on a crucial observation that localization tends to be inaccurate at high-density regions, and increasing the resolution is an effective albeit simple solution for improving localization. We thus propose Recurrent Attentive Zooming Network, which recurrently detects ambiguous image region and zooms it into high resolution for re-inspection. Second, the two tasks of counting and localization mutually reinforce each other. We propose an adaptive fusion scheme that effectively elevates the performance. Finally, a well-defined evaluation metric is proposed for the rarely-explored localization task. We conduct comprehensive evaluations on several crowd benchmarks, including the newly-developed large-scale UCF-QNRF dataset and demonstrate superior advantages over state-of-the-art methods.

Related Material

author = {Liu, Chenchen and Weng, Xinyu and Mu, Yadong},
title = {Recurrent Attentive Zooming for Joint Crowd Counting and Precise Localization},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}