Humans as Checkerboards: Calibrating Camera Motion Scale for World-Coordinate Human Mesh Recovery

Fengyuan Yang, Kerui Gu, Ha Linh Nguyen, Tze Ho Elden Tse, Angela Yao; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 6069-6079

Abstract


Accurate camera motion estimation is essential for recovering global human motion in world coordinates from RGB video inputs. While SLAM is widely used for estimating camera trajectories and point clouds, monocular SLAM does so only up to an unknown scale factor. Previous works estimate the scale factor through optimization, but this is unreliable and time-consuming. This paper presents an optimization-free scale calibration framework, Human as Checkerboard (HAC). HAC explicitly leverages the human body predicted by a human mesh recovery model as a calibration reference. Specifically, it uses the absolute depth of human-scene contact joints as references to calibrate the corresponding relative scene depth from SLAM. HAC benefits from the geometric priors encoded in human mesh recovery models to estimate the SLAM scale and achieves precise global human motion estimation. Simple yet powerful, our method sets a new state of the art for global human mesh estimation tasks. It reduces motion errors by 50% over prior local-to-global methods while using 100x less post-SLAM inference time than optimization-based methods. Our code is available at https://martayang.github.io/HAC/.
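The calibration idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how per-joint depth pairs are aggregated, so the median of depth ratios used here is an assumption, and the names `estimate_slam_scale` and `rescale_translations` are hypothetical. The sketch assumes each contact joint already has a metric depth from a human mesh recovery model and an up-to-scale SLAM depth sampled at the same image location.

```python
from statistics import median


def estimate_slam_scale(human_depths_m, slam_depths, min_depth=1e-6):
    """Estimate a metric scale factor for an up-to-scale SLAM reconstruction.

    human_depths_m: absolute depths (meters) of human-scene contact joints,
        assumed to come from a human mesh recovery model.
    slam_depths: relative SLAM depths sampled at the same image locations.
    The median ratio is an assumed aggregation, chosen for outlier robustness.
    """
    if len(human_depths_m) != len(slam_depths):
        raise ValueError("depth lists must have the same length")
    # Keep only pairs where both depths are positive and non-degenerate.
    ratios = [
        h / s
        for h, s in zip(human_depths_m, slam_depths)
        if h > min_depth and s > min_depth
    ]
    if not ratios:
        raise ValueError("no valid depth pairs")
    return median(ratios)


def rescale_translations(translations, scale):
    """Apply the scale to camera translations given as (x, y, z) tuples."""
    return [tuple(scale * c for c in t) for t in translations]


if __name__ == "__main__":
    # Toy values only: three consistent pairs and one outlier pair.
    scale = estimate_slam_scale([2.0, 4.0, 3.0, 10.0], [1.0, 2.0, 1.5, 1.0])
    print(scale)
    print(rescale_translations([(1.0, 0.0, 0.5)], scale))
```

A closed-form ratio like this involves no iterative optimization, which matches the optimization-free framing of the abstract; the actual method may select contact joints and combine depths differently.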

Related Material


@InProceedings{Yang_2025_ICCV,
    author    = {Yang, Fengyuan and Gu, Kerui and Nguyen, Ha Linh and Tse, Tze Ho Elden and Yao, Angela},
    title     = {Humans as Checkerboards: Calibrating Camera Motion Scale for World-Coordinate Human Mesh Recovery},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {6069-6079}
}