In this folder, we provide video results of our method compared to COLMAP++ on examples from: (a) our StudioSfM data, and (b) the public LVU[1] data. Specifically,

a. StudioSfM_examples.mp4 - Visualization of the 3D reconstruction of four examples from the StudioSfM dataset.
b. LVU_examples.mp4 - Visualization of the 3D reconstruction of four examples from the LVU dataset[1].

All reconstruction results are visualized using the built-in image-capture functionality of COLMAP[2].

As empirically demonstrated in our main paper, COLMAP++ performs better than the other previous methods we considered, so here we focus on how our approach improves over COLMAP++. To help viewers interpret the videos, we briefly describe below what to focus on in each example.

In the file StudioSfM_examples.mp4:

Example 01: COLMAP++ estimates incorrect camera poses as well as an incorrect point cloud (the depth of almost all points essentially collapses onto a single plane). Our method infers both the camera poses and the point cloud much more accurately.

Example 02: COLMAP++ produces a noisy point cloud as well as non-smooth camera motion. Our method estimates a point cloud with much less noise, in which the structure of the room is clearly visible, and it also recovers smoother camera motion.

Example 03: COLMAP++ exhibits the same type of error as in Example 01, while our approach estimates more accurate camera motion and a more accurate point cloud.

Example 04: COLMAP++ exhibits the same type of error as in Example 02, while our approach estimates more accurate camera motion and a more accurate point cloud.

In the file LVU_examples.mp4:

Example 01: COLMAP++ estimates incorrect camera motion, whereas our approach recovers the camera motion much more accurately.

Example 02: COLMAP++ estimates an incorrect point cloud in which points closer to the camera are incorrectly placed farther away, and vice versa. In contrast, our method produces a point cloud with accurate relative depth.

Example 03: COLMAP++ reconstructs a point cloud with significantly fewer points than our method. Unlike COLMAP++, our method generates an initial point cloud with larger scene coverage during the initialization step; this yields a denser final point cloud, which in turn results in more accurate camera motion.

Example 04: Both COLMAP++ and our approach achieve similar accuracy for camera motion and the point cloud, because the scene has sufficient motion parallax.

[1] Chao-Yuan Wu and Philipp Krähenbühl. Towards long-form video understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[2] https://colmap.github.io/