TesseTrack: End-to-End Learnable Multi-Person Articulated 3D Pose Tracking

Reddy, N Dinesh; Guigues, Laurent; Pishchulin, Leonid; Eledath, Jayan; Narasimhan, Srinivasa G.

N Dinesh Reddy, Laurent Guigues, Leonid Pishchulin, Jayan Eledath, Srinivasa G. Narasimhan; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 15190-15200

Abstract

We consider the task of 3D pose estimation and trackingof multiple people seen in an arbitrary number of camerafeeds. We propose TesseTrack, a novel top-down approachthat simultaneously reasons about multiple individuals' 3Dbody joint reconstructions and associations in space andtime in a single end-to-end learnable framework. At the core of our approach is a novel spatio-temporal formulation that operates in a common voxelized feature space aggregated from single- or multiple-camera views. After a person detection step, a 4D CNN produces short-term person-specific representations which are then linked across time by a differentiable matcher. The linked descriptions are then merged and deconvolved into 3D poses. This joint spatio-temporal formulation contrasts with previous piece-wise strategies that treat 2D pose estimation, 2D-to-3D lifting, and 3D pose tracking as independent sub-problems that are error-prone when solved in isolation. Furthermore, unlike previous methods, TesseTrack is robust to changes in the number of camera views and achieves very good results even if a single view is available at inference time. Quantitative evaluation of 3D pose reconstruction accuracy on standard benchmarks shows significant improvements over the state of the art. Evaluation of multi-person articulated 3D pose tracking in our novel evaluation framework demonstrates the superiority of TesseTrack over strong baselines.

Related Material

[pdf] [supp]

[bibtex]

@InProceedings{Reddy_2021_CVPR, author = {Reddy, N Dinesh and Guigues, Laurent and Pishchulin, Leonid and Eledath, Jayan and Narasimhan, Srinivasa G.}, title = {TesseTrack: End-to-End Learnable Multi-Person Articulated 3D Pose Tracking}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2021}, pages = {15190-15200} }