Capturing Closely Interacted Two-Person Motions with Reaction Priors

Fang, Qi; Fan, Yinghui; Li, Yanjun; Dong, Junting; Wu, Dingwei; Zhang, Weidong; Chen, Kang

Qi Fang, Yinghui Fan, Yanjun Li, Junting Dong, Dingwei Wu, Weidong Zhang, Kang Chen; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 655-665

Abstract

In this paper, we focus on capturing closely interacted two-person motions from monocular videos, an important yet understudied topic. Unlike less-interacted motions, closely interacted motions contain frequent inter-human occlusions, which pose significant challenges to existing capturing algorithms. To address this problem, our key observation is that close physical interactions between two subjects typically happen in very specific situations (e.g., handshakes, hugs), and such situational contexts carry strong prior semantics that help infer the poses of occluded joints. In this spirit, we introduce reaction priors, which are invertible neural networks that bi-directionally model the probability distribution of one person's pose given the pose of the other. The learned reaction priors are then incorporated into a query-based pose estimator, a decoder-only Transformer with self-attention on both intra-joint and inter-joint relationships. We demonstrate that our design achieves considerably higher performance than previous methods on multiple benchmarks. Moreover, as existing datasets lack sufficient cases of close human-human interactions, we also build a new dataset called Dual-Human to better evaluate different methods. Dual-Human contains around 2k sequences of closely interacted two-person motions, each with synthetic multi-view renderings, contact annotations, and text descriptions. We believe that this new public dataset can significantly promote further research in this area.
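The abstract describes the reaction priors only as invertible networks that model one person's pose conditioned on the other's. To make that idea concrete, here is a minimal pure-Python sketch of one common construction for a conditional invertible network: a single RealNVP-style conditional affine coupling layer. It maps a flattened pose vector of person A to a latent vector given person B's pose, with an exact inverse and a closed-form log-determinant. This is not the authors' architecture: the class and function names (`ConditionalAffineCoupling`, `conditioner`, `log_prob`, `sample`), the 4-D toy poses, and the fixed random tanh/linear conditioner standing in for a learned subnetwork are all hypothetical. "Bi-directional" is modeled here simply as two separate priors, one for A given B and one for B given A.

```python
import math
import random


def conditioner(x_half, cond, w_s, w_t):
    # Hypothetical stand-in for a learned subnetwork: predicts per-dimension
    # log-scale s (bounded by tanh) and shift t from the unchanged half and the
    # partner's pose.
    inp = list(x_half) + list(cond)
    s = [math.tanh(sum(w * v for w, v in zip(row, inp))) for row in w_s]
    t = [sum(w * v for w, v in zip(row, inp)) for row in w_t]
    return s, t


class ConditionalAffineCoupling:
    """One conditional affine coupling layer: z = f(x; cond), exactly invertible."""

    def __init__(self, dim, cond_dim, seed=0):
        assert dim % 2 == 0, "toy layer splits the pose vector into two equal halves"
        rng = random.Random(seed)
        self.h = dim // 2
        n_in = self.h + cond_dim
        # Fixed random weights; a real prior would learn these from data.
        self.w_s = [[rng.gauss(0.0, 0.3) for _ in range(n_in)] for _ in range(self.h)]
        self.w_t = [[rng.gauss(0.0, 0.3) for _ in range(n_in)] for _ in range(self.h)]

    def forward(self, x, cond):
        # First half passes through; second half is scaled and shifted.
        x1, x2 = x[: self.h], x[self.h :]
        s, t = conditioner(x1, cond, self.w_s, self.w_t)
        z2 = [xi * math.exp(si) + ti for xi, si, ti in zip(x2, s, t)]
        log_det = sum(s)  # Jacobian is triangular with diagonal exp(s)
        return list(x1) + z2, log_det

    def inverse(self, z, cond):
        # z1 == x1, so s and t can be recomputed exactly.
        z1, z2 = z[: self.h], z[self.h :]
        s, t = conditioner(z1, cond, self.w_s, self.w_t)
        x2 = [(zi - ti) * math.exp(-si) for zi, si, ti in zip(z2, s, t)]
        return list(z1) + x2


def log_prob(x, cond, layer):
    # Change of variables with a standard normal base distribution.
    z, log_det = layer.forward(x, cond)
    log_base = -0.5 * sum(v * v for v in z) - 0.5 * len(z) * math.log(2.0 * math.pi)
    return log_base + log_det


def sample(layer, cond, rng):
    # Draw a latent from N(0, I) and map it back to a pose for the conditioned person.
    z = [rng.gauss(0.0, 1.0) for _ in range(2 * layer.h)]
    return layer.inverse(z, cond)


# Hypothetical 4-D pose vectors for persons A and B.
pose_a = [0.1, -0.4, 0.3, 0.2]
pose_b = [0.5, 0.0, -0.2, 0.7]
prior_a_given_b = ConditionalAffineCoupling(dim=4, cond_dim=4, seed=1)
prior_b_given_a = ConditionalAffineCoupling(dim=4, cond_dim=4, seed=2)

z, ld = prior_a_given_b.forward(pose_a, pose_b)
recon = prior_a_given_b.inverse(z, pose_b)
lp = log_prob(pose_a, pose_b, prior_a_given_b)
pose_b_sample = sample(prior_b_given_a, pose_a, random.Random(0))
```

A practical flow would stack several such layers with permutations between them so that every dimension gets transformed. The point of the sketch is that invertibility gives both an exact density for scoring a candidate pose and a sampler for proposing plausible poses of an occluded partner.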

Related Material

BibTeX

@InProceedings{Fang_2024_CVPR,
    author    = {Fang, Qi and Fan, Yinghui and Li, Yanjun and Dong, Junting and Wu, Dingwei and Zhang, Weidong and Chen, Kang},
    title     = {Capturing Closely Interacted Two-Person Motions with Reaction Priors},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {655-665}
}