FocusTune: Tuning Visual Localization Through Focus-Guided Sampling

Son Tung Nguyen, Alejandro Fontan, Michael Milford, Tobias Fischer; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 3606-3615

Abstract


We propose FocusTune, a focus-guided sampling technique to improve the performance of visual localization algorithms. FocusTune directs a scene coordinate regression model towards regions critical for 3D point triangulation by exploiting key geometric constraints. Specifically, rather than uniformly sampling points across the image for training the scene coordinate regression model, we re-project 3D scene coordinates onto the 2D image plane and sample within a local neighborhood of the re-projected points. While our proposed sampling strategy is generally applicable, we showcase FocusTune by integrating it with the recently introduced Accelerated Coordinate Encoding (ACE) model. Our results demonstrate that FocusTune matches or improves on state-of-the-art performance while keeping ACE's appealing low storage and compute requirements; for example, on the Cambridge Landmarks dataset it reduces translation error from 25 cm to 19 cm for single models and from 17 cm to 15 cm for ensemble models. This combination of high performance and low compute and storage requirements is particularly promising for applications in areas like mobile robotics and augmented reality. Our code is available at https://github.com/sontung/focus-tune.
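The sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a pinhole camera with a world-to-camera pose, a square neighborhood of hypothetical size `radius`, and a uniform-sampling fallback when no 3D point projects into the image. The function names `project_points` and `focus_guided_samples` are illustrative only and do not come from the FocusTune repository.

```python
import random


def project_points(points_world, R, t, K):
    """Project 3D world points to pixel coordinates with a pinhole model.

    R is a 3x3 world-to-camera rotation (nested lists), t a 3-vector,
    and K = (fx, fy, cx, cy). These conventions are assumptions of this sketch.
    """
    fx, fy, cx, cy = K
    pixels = []
    for X in points_world:
        # Camera-frame coordinates: Xc = R X + t
        xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
        if xc[2] <= 0:
            continue  # point lies behind the camera, so it cannot be observed
        pixels.append((fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy))
    return pixels


def focus_guided_samples(pixels, width, height, radius, n_samples, rng):
    """Draw pixel locations from square neighborhoods of re-projected points."""
    in_view = [(u, v) for u, v in pixels if 0 <= u < width and 0 <= v < height]
    if not in_view:
        # Assumed fallback: plain uniform sampling over the image
        return [(rng.randrange(width), rng.randrange(height))
                for _ in range(n_samples)]
    samples = []
    for _ in range(n_samples):
        u, v = rng.choice(in_view)          # pick one re-projected point
        du = rng.randint(-radius, radius)   # offset within the neighborhood
        dv = rng.randint(-radius, radius)
        x = min(max(int(round(u)) + du, 0), width - 1)   # clamp to image bounds
        y = min(max(int(round(v)) + dv, 0), height - 1)
        samples.append((x, y))
    return samples


# Toy usage: identity pose, three points (the last one is behind the camera)
R_identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
pixels = project_points(
    [(0.0, 0.0, 2.0), (0.25, 0.0, 1.0), (0.0, 0.0, -1.0)],
    R_identity, [0.0, 0.0, 0.0], (100.0, 100.0, 64.0, 48.0),
)
samples = focus_guided_samples(pixels, 128, 96, radius=3, n_samples=50,
                               rng=random.Random(0))
```

In a training loop, the returned pixel locations would take the place of uniformly drawn ones when selecting which image positions supervise the scene coordinate regression model. How the neighborhood size is chosen and how the sampled positions map onto ACE's feature grid are left open here.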

Related Material


[bibtex]
@InProceedings{Nguyen_2024_WACV,
    author    = {Nguyen, Son Tung and Fontan, Alejandro and Milford, Michael and Fischer, Tobias},
    title     = {FocusTune: Tuning Visual Localization Through Focus-Guided Sampling},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {3606-3615}
}