Video-Based Crowd Counting Using a Multi-Scale Optical Flow Pyramid Network

Mohammad Asiful Hossain, Kevin Cannons, Daesik Jang, Fabio Cuzzolin, Zhan Xu; Proceedings of the Asian Conference on Computer Vision (ACCV), 2020


This paper presents a novel approach to the task of video-based crowd counting, which can be formalized as the regression problem of learning a mapping from an input image to an output crowd density map. Convolutional neural networks (CNNs) have demonstrated striking accuracy gains in a range of computer vision tasks, including crowd counting. However, the dominant focus within the crowd counting literature has been on the single-frame case or on applying CNNs to videos frame by frame, without leveraging motion information. This paper proposes a novel architecture that exploits the spatiotemporal information captured in a video stream by combining an optical flow pyramid with an appearance-based CNN. Extensive empirical evaluation on five public datasets, comparing against numerous state-of-the-art approaches, demonstrates the efficacy of the proposed architecture, with our methods reporting the best results on all datasets. Finally, a set of transfer learning experiments shows that, once the proposed model is trained on one dataset, it can be transferred to another using a limited number of training examples and still exhibit high accuracy.
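
To make the regression target concrete, the following minimal sketch builds a density map from head annotations and recovers the count by summing it. The abstract does not say how ground-truth density maps are constructed; placing one normalized Gaussian per annotated head is a common convention in the crowd counting literature and is assumed here. The function names, the `sigma` value, and the sample annotations are hypothetical and chosen only for illustration.

```python
import math


def density_map(heads, height, width, sigma=2.0):
    """Build a density map from annotated head positions given as (row, col).

    Assumption: each head contributes a 2D Gaussian that is renormalized
    over the image grid, so every person adds exactly 1 to the total mass.
    """
    density = [[0.0] * width for _ in range(height)]
    for r0, c0 in heads:
        # Unnormalized Gaussian weights centred on the head position.
        weights = [
            [
                math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2.0 * sigma ** 2))
                for c in range(width)
            ]
            for r in range(height)
        ]
        total = sum(map(sum, weights))
        for r in range(height):
            for c in range(width):
                density[r][c] += weights[r][c] / total
    return density


def crowd_count(density):
    """Return the count as the sum (discrete integral) of the density map."""
    return sum(map(sum, density))


# Hypothetical annotations for a 16x16 frame containing three people.
heads = [(4, 5), (10, 12), (11, 3)]
dmap = density_map(heads, height=16, width=16)
print(round(crowd_count(dmap), 6))
```

Under this convention, a network that regresses the density map also yields a count estimate, obtained by summing its output over all pixels.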

Related Material

@InProceedings{Hossain_2020_ACCV,
    author    = {Hossain, Mohammad Asiful and Cannons, Kevin and Jang, Daesik and Cuzzolin, Fabio and Xu, Zhan},
    title     = {Video-Based Crowd Counting Using a Multi-Scale Optical Flow Pyramid Network},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {November},
    year      = {2020}
}