Registration-Free Learnable Multi-View Capture of Faces in Dense Semantic Correspondence

Panagiotis P. Filntisis, George Retsinas, Radek Danecek, Vanessa Sklyarova, Petros Maragos, Timo Bolkart; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 14512-14523

Abstract


Recent frameworks like ToFu and TEMPEH provide an automated alternative to classical registration pipelines by predicting 3D meshes in dense semantic correspondence directly from calibrated multi-view images. However, these learning-based methods still depend for their training supervision on the slow, manual registration pipelines they aim to replace. We overcome this limitation with MOCHI (Multi-view Optimizable Correspondence of Heads from Images), a multi-view 3D face prediction framework trained without registered data. MOCHI removes the dependency on registered data by enforcing topological consistency through a pseudo-linear inverse kinematic solver. Semantic alignment is guided by dense keypoints from a 2D landmark predictor trained exclusively on synthetic data. Our analysis further reveals that standard point-to-surface distances induce training instabilities and visual artifacts in registration-free settings. We instead propose pointmap- and normal-based losses, which provide smoother gradients and higher reconstruction fidelity. Finally, we introduce a test-time optimization scheme that refines the network weights over a few dozen iterations. This approach bridges the gap between feed-forward efficiency and the precision of iterative optimization, allowing MOCHI to outperform traditional labor-intensive pipelines in both reconstruction accuracy and visual quality. Code and model are publicly available at https://filby89.github.io/mochi.
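The abstract contrasts point-to-surface distances with pointmap- and normal-based losses but does not define either loss. The sketch below is a minimal illustration under our own assumptions, not MOCHI's implementation. It assumes the predicted mesh and the reference geometry have already been rasterized into the same camera view as (H, W, 3) pointmaps, so each pixel pairs a predicted 3D point with a reference point without any closest-point search. The function names `pointmap_loss`, `normals_from_pointmap`, and `normal_loss`, the per-pixel L1 distance, the forward-difference normals, and the 1 - cosine normal penalty are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical sketch: pixel-aligned pointmap and normal losses.
# Not the MOCHI implementation; all names and choices are illustrative.


def pointmap_loss(pred_pts, ref_pts, mask):
    """Mean per-pixel L1 distance between two pointmaps of the same view.

    pred_pts, ref_pts: (H, W, 3) arrays of 3D points, pixel-aligned.
    mask: (H, W) boolean array of pixels that are valid in both maps.
    """
    per_pixel = np.abs(pred_pts - ref_pts).sum(axis=-1)  # L1 over x, y, z
    return float(per_pixel[mask].mean())


def normals_from_pointmap(pts, eps=1e-12):
    """Estimate unit normals from a pointmap with forward differences.

    Returns (normals, valid). The last row and column have no forward
    neighbour, so their cross product is zero and they are marked invalid.
    """
    dx = np.zeros_like(pts)
    dy = np.zeros_like(pts)
    dx[:, :-1] = pts[:, 1:] - pts[:, :-1]  # step along image columns
    dy[:-1, :] = pts[1:, :] - pts[:-1, :]  # step along image rows
    n = np.cross(dx, dy)
    length = np.linalg.norm(n, axis=-1, keepdims=True)
    valid = length[..., 0] > eps
    return n / np.maximum(length, eps), valid


def normal_loss(pred_n, ref_n, mask):
    """Mean (1 - cosine similarity) between unit normals at masked pixels."""
    cos = (pred_n * ref_n).sum(axis=-1)
    return float((1.0 - cos[mask]).mean())


if __name__ == "__main__":
    # Hypothetical toy data: a 4x4 pointmap of the plane z = 0.
    xs, ys = np.meshgrid(np.arange(4.0), np.arange(4.0))
    ref = np.stack([xs, ys, np.zeros_like(xs)], axis=-1)

    shifted = ref + np.array([0.0, 0.0, 0.5])  # same plane, offset in z
    tilted = np.stack([xs, ys, xs], axis=-1)   # plane z = x

    full = np.ones(xs.shape, dtype=bool)
    print(pointmap_loss(shifted, ref, full))   # 0.5

    ref_n, ref_ok = normals_from_pointmap(ref)
    tilt_n, tilt_ok = normals_from_pointmap(tilted)
    print(normal_loss(tilt_n, ref_n, ref_ok & tilt_ok))  # about 0.2929
```

In the toy example, shifting the plane along z is caught only by the pointmap term, while tilting it to z = x yields a normal loss of 1 - 1/√2 on every valid pixel. One plausible intuition, which the abstract does not spell out, is that fixing correspondences by pixel keeps each loss term attached to the same target across iterations, whereas a point-to-surface term may switch to a different closest point as the mesh moves.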

Related Material


@InProceedings{Filntisis_2026_CVPR,
    author    = {Filntisis, Panagiotis P. and Retsinas, George and Danecek, Radek and Sklyarova, Vanessa and Maragos, Petros and Bolkart, Timo},
    title     = {Registration-Free Learnable Multi-View Capture of Faces in Dense Semantic Correspondence},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {14512-14523}
}