Human Detection and Segmentation via Multi-View Consensus

Isinsu Katircioglu, Helge Rhodin, Jörg Spörri, Mathieu Salzmann, Pascal Fua; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 2855-2864

Abstract


Self-supervised detection and segmentation of foreground objects aims for accuracy without annotated training data. However, existing approaches predominantly rely on restrictive assumptions on appearance and motion. For scenes with dynamic activities and camera motion, we propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training via coarse 3D localization in a voxel grid and fine-grained offset regression. In this manner, we learn a joint distribution of proposals over multiple views. At inference time, our method operates on single RGB images. We outperform state-of-the-art techniques both on images that visually depart from those of standard benchmarks and on those of the classical Human3.6M dataset.

Related Material


@InProceedings{Katircioglu_2021_ICCV,
    author    = {Katircioglu, Isinsu and Rhodin, Helge and Sp\"orri, J\"org and Salzmann, Mathieu and Fua, Pascal},
    title     = {Human Detection and Segmentation via Multi-View Consensus},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {2855-2864}
}