[Videos: six clips, each showing the input video next to a camera trajectory comparison between the ground truth (GT) and the estimate.]
Challenge Categorization
Each clip in ORBIT exhibits a range of challenges. We provide sub-category evaluations on five of them: Low Texture, Low Light, Presence of Crowd (objects moving independently of the camera), Presence of Objects Moving Parallel to the Camera (PO), and Presence of Fluids. Note that the sub-categories overlap; see the supplementary.pdf file for more details.
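Because the sub-categories overlap, a clip that shows several challenges contributes to each of the corresponding groups. The following is a minimal sketch of such a per-category average, not the evaluation code used for ORBIT. The clip names, tag strings, and error values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-clip records: names, tags, and error values are made up
# for illustration and are not results from the benchmark.
clips = [
    {"name": "clip_a", "tags": {"low_texture", "low_light"}, "error": 0.25},
    {"name": "clip_b", "tags": {"crowd"}, "error": 0.5},
    {"name": "clip_c", "tags": {"low_texture", "parallel_object"}, "error": 0.75},
    {"name": "clip_d", "tags": {"fluid", "low_light"}, "error": 1.0},
]


def per_category_mean(records):
    """Average the error over every clip tagged with each category.

    A clip with several tags is counted once in each of its categories,
    so the category groups are not disjoint.
    """
    grouped = defaultdict(list)
    for record in records:
        for tag in record["tags"]:
            grouped[tag].append(record["error"])
    return {tag: mean(values) for tag, values in sorted(grouped.items())}


summary = per_category_mean(clips)
for tag, value in summary.items():
    print(f"{tag}: {value:.3f}")
```

With this grouping the per-category clip counts sum to more than the number of clips, so the category averages should be read side by side and not combined into a single overall score.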
We report the ATE and RPE-R for each method on each sub-category. Based on the results, the most challenging category is the presence of an object moving alongside the camera, which affects MegaSaM and COLMAP significantly. MonST3R and ORB-SLAM2, on the other hand, struggle most with low-texture scenes. VGGT-Long's difficulties are more limited in extent than those of the other methods; it struggles most with low texture and with moving objects, whether independent of or parallel to the camera. We also observe that using RoMo masking with MegaSaM usually improves its rotational estimate and ATE significantly on the parallel-object and low-light challenges while worsening the results on low-texture scenes, which is in line with our expectations of a motion masking method. Overall, the following table shows that ORBIT exposes a diverse set of challenges and is a valuable tool for analyzing current state-of-the-art methods by highlighting their failure modes.