DECCNet: Depth Enhanced Crowd Counting

Shuo-Diao Yang, Hung-Ting Su, Winston H. Hsu, Wen-Chin Chen; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 0-0


Crowd counting which aims to calculate the number of total instances on an image is a classic but crucial task that supports many applications. Most of the prior works are based on the RGB channels on the images and achieve satisfied performance. However, previous approaches suffer from counting highly congested region due to the incomplete and blurry shapes. In this paper, we present an effective crowd counting method, Depth Enhanced Crowd Counting Network (DECCNet), which leverages the estimated depth information with our novel Bidirectional Cross-modal Attention (BCA) mechanism. Utilizing the depth information enables our model to explicitly learn to pay attention to those congested regions on the basis of the depth information. Our BCA mechanism interactively fuses two different input modalities by learning to focus on the informative parts according to each other. In our experiments, we demonstrate that DECCNet outperforms the state-of-the-art on the two largest crowd counting datasets available, including UCF-QNRF, which has the highest crowd density. The visualized result shows that our method can accurately regress dense regions through leveraging depth information. Ablation studies also indicate that each component of our method is beneficial to final prediction.

Related Material

author = {Yang, Shuo-Diao and Su, Hung-Ting and Hsu, Winston H. and Chen, Wen-Chin},
title = {DECCNet: Depth Enhanced Crowd Counting},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2019}