Short-term 3D Human Mesh Recovery with Virtual Markers Disentanglement

Xiyuan Kang, Yi Yuan, Xu Dong, Muhammad Awais, Lilian Tang, Josef Kittler, Zhenhua Feng; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops, 2025, pp. 3901-3911

Abstract


Human mesh recovery is a fundamental and challenging task in computer vision. Existing image-based methods suffer from depth ambiguity due to the absence of explicit 3D contextual information. Conversely, video-based methods leverage multi-frame input and temporal consistency to improve stability, but struggle to capture fine-grained spatial details and incur high computational costs. To effectively combine the spatial precision of image-based techniques with the temporal robustness of video-based approaches, we propose a temporal Transformer framework augmented with the state-of-the-art image-based reconstruction model, Virtual Markers. Specifically, we introduce a novel disentanglement module designed to explicitly separate Virtual Markers into distinct pose and shape representations. Leveraging short-term temporal context, the proposed module enhances the consistency of body shape and the coherence of pose across frames, ensuring both spatial accuracy and computational efficiency. Experimental results demonstrate that the proposed method significantly enhances the performance and interpretability of virtual markers. Our model achieves state-of-the-art results on two widely used benchmark datasets, outperforming previous image-based approaches across different evaluation metrics.
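The abstract describes, at a high level, a module that splits per-frame Virtual Marker estimates into separate pose and shape streams and aggregates short-term temporal context. The sketch below is not the authors' implementation; it is a minimal PyTorch illustration of one plausible realization, in which all layer names, dimensions, and the two-head pose/shape design are assumptions made for clarity.

```python
# Minimal sketch (assumptions only, not the paper's code) of a pose/shape
# disentanglement module over per-frame Virtual Marker positions, with a
# short-term temporal Transformer providing cross-frame context.
import torch
import torch.nn as nn

class MarkerDisentangler(nn.Module):
    def __init__(self, num_markers=64, feat_dim=128, pose_dim=144,
                 shape_dim=10):
        super().__init__()
        self.embed = nn.Linear(3, feat_dim)  # lift 3D marker coordinates to features
        enc_layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4,
                                               batch_first=True)
        self.temporal = nn.TransformerEncoder(enc_layer, num_layers=2)
        # separate heads: per-frame pose vs. a single clip-level body shape
        self.pose_head = nn.Linear(num_markers * feat_dim, pose_dim)
        self.shape_head = nn.Linear(num_markers * feat_dim, shape_dim)

    def forward(self, markers):
        # markers: (batch, frames, num_markers, 3) estimated marker positions
        b, t, m, _ = markers.shape
        per_marker = self.embed(markers)           # (b, t, m, feat_dim)
        frame_tok = per_marker.mean(dim=2)         # pool markers -> one token per frame
        frame_tok = self.temporal(frame_tok)       # short-term temporal context
        # broadcast temporal context back onto the markers before the heads
        fused = (per_marker + frame_tok.unsqueeze(2)).flatten(2)  # (b, t, m*feat_dim)
        pose = self.pose_head(fused)               # (b, t, pose_dim): per-frame pose
        shape = self.shape_head(fused).mean(dim=1) # (b, shape_dim): one shape per clip
        return pose, shape
```

Averaging the shape prediction over frames in this sketch is one simple way to encode the abstract's requirement that body shape stay consistent across a short clip, while the per-frame pose output preserves frame-level motion.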

Related Material


[pdf]
[bibtex]
@InProceedings{Kang_2025_CVPR,
    author    = {Kang, Xiyuan and Yuan, Yi and Dong, Xu and Awais, Muhammad and Tang, Lilian and Kittler, Josef and Feng, Zhenhua},
    title     = {Short-term 3D Human Mesh Recovery with Virtual Markers Disentanglement},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {3901-3911}
}