MonoFusion: Sparse-View 4D Reconstruction via Monocular Fusion

ICCV submission 111


Given sparse-view videos of dynamic scenes, our approach reconstructs 3D geometry and motion, enabling extreme novel view synthesis, 3D tracking, and feature distillation. Our sparse-view (4-camera) setup strikes a balance between ill-posed reconstructions from casual monocular captures and well-constrained reconstructions from dense multi-view studio captures.


4D Scene Reconstruction that Supports Free-View Synthesis


We show comprehensive results covering all categories of EgoExo4D, including complex, highly occluded scenes. (For clearer visualization, we omit the training views in the results that follow.)

Bike Repair

Health Care - CPR

Music - Piano

Cooking - Scrambled Eggs

Sports - Football

Panoptic - Baseball


EgoView Synthesis (Follow the Dance!)


With a 4-camera setup, we can synthesize egocentric views anywhere in the scene, opening up potential embodied applications! [Left: ground truth provided by EgoExo4D; right: our synthesized view.] (Color differences are due to the different camera sensors; the foreground has been removed from our reconstruction.)