Stacked Homography Transformations for Multi-View Pedestrian Detection

Liangchen Song, Jialian Wu, Ming Yang, Qian Zhang, Yuan Li, Junsong Yuan; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 6049-6057

Abstract


Multi-view pedestrian detection aims to predict a bird's eye view (BEV) occupancy map from multiple camera views. This task is confronted with two challenges: how to establish the 3D correspondences from views to the BEV map and how to assemble occupancy information across views. In this paper, we propose a novel Stacked HOmography Transformations (SHOT) approach, which is motivated by approximating projections in 3D world coordinates via a stack of homographies. We first construct a stack of transformations for projecting views to the ground plane at different height levels. Then we design a soft selection module so that the network learns to predict the likelihood of each transformation in the stack. Moreover, we provide an in-depth theoretical analysis of how to construct SHOT and how well it approximates projections in 3D world coordinates. SHOT is empirically verified to be capable of estimating accurate correspondences from individual views to the BEV map, leading to new state-of-the-art performance on standard evaluation benchmarks.
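The geometric fact behind a stack of homographies can be made concrete. For a 3x4 camera projection matrix P with columns p1, p2, p3, p4, a world point (X, Y, h) on the horizontal plane z = h projects through the 3x3 homography H_h = [p1, p2, h·p3 + p4], so one homography per height level forms the stack. The pure-Python sketch below builds such a stack and combines the per-height samples for a single BEV cell with softmax weights. It is an illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. The per-height `logits` stand in for what a learned soft selection module would predict. Nearest-neighbour lookup in a toy single-channel feature map replaces the differentiable feature warping a real network would use.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]
Pixel = Optional[Tuple[float, float]]


def plane_homography(P: Sequence[Sequence[float]], h: float) -> Matrix:
    """3x3 homography taking ground coordinates (X, Y, 1) on the plane z = h to the image.

    With P = [p1 p2 p3 p4], P @ (X, Y, h, 1) = [p1, p2, h * p3 + p4] @ (X, Y, 1).
    """
    return [[row[0], row[1], h * row[2] + row[3]] for row in P]


def stacked_homographies(P, heights: Sequence[float]) -> List[Matrix]:
    """One homography per height level: the 'stack'."""
    return [plane_homography(P, h) for h in heights]


def project_point(P, X: float, Y: float, Z: float) -> Tuple[float, float]:
    """Full 3D projection of (X, Y, Z), used to check the plane homography."""
    u, v, w = (r[0] * X + r[1] * Y + r[2] * Z + r[3] for r in P)
    return u / w, v / w


def apply_homography(H: Matrix, X: float, Y: float) -> Pixel:
    """Map a ground-plane point to pixel coordinates (u, v), or None if not visible."""
    u, v, w = (r[0] * X + r[1] * Y + r[2] for r in H)
    if w <= 0:  # assumes P is scaled so that visible points have positive depth
        return None
    return u / w, v / w


def sample_nearest(feature_map: Sequence[Sequence[float]], uv: Pixel, fill: float = 0.0) -> float:
    """Nearest-neighbour lookup in a row-major map; out-of-view pixels get `fill`."""
    if uv is None:
        return fill
    col, row = round(uv[0]), round(uv[1])
    if 0 <= row < len(feature_map) and 0 <= col < len(feature_map[0]):
        return feature_map[row][col]
    return fill


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bev_cell_feature(P, heights, feature_map, logits, X: float, Y: float) -> float:
    """Soft selection over the stack: softmax-weighted sum of per-height samples.

    `logits` (one per height) is a hypothetical stand-in for a learned module's output.
    """
    values = [sample_nearest(feature_map, apply_homography(H, X, Y))
              for H in stacked_homographies(P, heights)]
    weights = softmax(logits)
    return sum(w * v for w, v in zip(weights, values))


if __name__ == "__main__":
    P = [[2, 0, 1, 0], [0, 2, 1, 0], [0, 0.5, 0, 4]]  # toy 3x4 projection matrix
    H = plane_homography(P, 1.5)
    print(apply_homography(H, 1, 2), project_point(P, 1, 2, 1.5))
    fmap = [[10 * r + c for c in range(8)] for r in range(8)]
    print(bev_cell_feature(P, [0.0, 1.5], fmap, [0.0, 0.0], 1, 2))
```

The point of several planes is that a body point above the ground (a torso or head, say) can be matched by a plane near its true height, which a single ground-plane homography cannot do. The softmax weights let the model favour the appropriate plane at each location while keeping the selection differentiable.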

Related Material


@InProceedings{Song_2021_ICCV,
    author    = {Song, Liangchen and Wu, Jialian and Yang, Ming and Zhang, Qian and Li, Yuan and Yuan, Junsong},
    title     = {Stacked Homography Transformations for Multi-View Pedestrian Detection},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {6049-6057}
}