PairDETR : Joint Detection and Association of Human Bodies and Faces

Ammar Ali, Georgii Gaikov, Denis Rybalchenko, Alexander Chigorin, Ivan Laptev, Sergey Zagoruyko; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 423-432

Abstract


Image and video analysis requires not only accurate object detection but also an understanding of the relationships among detected objects. Common solutions to relation modeling typically resort to stand-alone object detectors followed by non-differentiable post-processing techniques. Recently introduced detection transformers (DETR) perform end-to-end object detection based on a bipartite matching loss. Such methods, however, lack the ability to jointly detect objects and resolve object associations. In this paper, we build on the DETR approach and extend it to the joint detection of objects and their relationships by introducing an approximated bipartite matching. While our method can generalize to an arbitrary number of objects, we focus here on the modeling of object pairs and their relations. In particular, we apply our method, PairDETR, to the problem of detecting human bodies and faces and associating those that belong to the same person. Our approach not only eliminates the need for hand-designed post-processing but also achieves excellent results for body-face associations. We evaluate PairDETR on the challenging CrowdHuman and CityPersons datasets and demonstrate a large improvement over the state of the art. Our training code and pre-trained models are available online.

Related Material


[bibtex]
@InProceedings{Ali_2024_CVPR,
    author    = {Ali, Ammar and Gaikov, Georgii and Rybalchenko, Denis and Chigorin, Alexander and Laptev, Ivan and Zagoruyko, Sergey},
    title     = {PairDETR : Joint Detection and Association of Human Bodies and Faces},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {423-432}
}