STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes

Peishan Cong, Xinge Zhu, Feng Qiao, Yiming Ren, Xidong Peng, Yuenan Hou, Lan Xu, Ruigang Yang, Dinesh Manocha, Yuexin Ma; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 19608-19617


Accurately detecting and tracking pedestrians in 3D space is challenging due to large variations in rotations, poses and scales. The situation becomes even worse for dense crowds with severe occlusions. However, existing benchmarks either only provide 2D annotations, or have limited 3D annotations with low-density pedestrian distribution, making it difficult to build a reliable pedestrian perception system especially in crowded scenes. To better evaluate pedestrian perception algorithms in crowded scenarios, we introduce a large-scale multimodal dataset, STCrowd. Specifically, in STCrowd, there are a total of 219K pedestrian instances and 20 persons per frame on average, with various levels of occlusion. We provide synchronized LiDAR point clouds and camera images as well as their corresponding 3D labels and joint IDs. STCrowd can be used for various tasks, including LiDAR-only, image-only, and sensor-fusion based pedestrian detection and tracking. We provide baselines for most of the tasks. In addition, considering the property of sparse global distribution and density-varying local distribution of pedestrians, we further propose a novel method, Density-aware Hierarchical heatmap Aggregation (DHA), to enhance pedestrian perception in crowded scenes. Extensive experiments show that our new method achieves state-of-the-art performance on the STCrowd dataset, especially on cases with severe occlusion. The dataset and code will be released to facilitate related research when the paper is published.

Related Material

[pdf] [supp] [arXiv]
@InProceedings{Cong_2022_CVPR, author = {Cong, Peishan and Zhu, Xinge and Qiao, Feng and Ren, Yiming and Peng, Xidong and Hou, Yuenan and Xu, Lan and Yang, Ruigang and Manocha, Dinesh and Ma, Yuexin}, title = {STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {19608-19617} }