Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation

Havrylov, Volodymyr; Huang, Haiwen; Zhang, Dan; Geiger, Andreas

Volodymyr Havrylov, Haiwen Huang, Dan Zhang, Andreas Geiger; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 210-219

Abstract

Vision Foundation Models (VFMs) are large-scale, pre-trained models that serve as general-purpose backbones for various computer vision tasks. As VFMs grow in popularity, there is increasing interest in understanding their effectiveness for dense prediction tasks. However, VFMs typically produce low-resolution features, which limits their direct applicability in this context. One way to tackle this limitation is to employ a task-agnostic feature upsampling module that refines the resolution of VFM features. To assess the effectiveness of this approach, we investigate Interactive Segmentation (IS) as a novel benchmark for evaluating feature upsampling methods on VFMs. Because of its inherently multimodal input, consisting of an image and a set of user-defined clicks, and its dense mask output, IS creates a challenging environment that demands comprehensive visual scene understanding. Our benchmarking experiments show that selecting an appropriate upsampling strategy significantly improves the quality of VFM features. The code is released at https://github.com/havrylovv/iSegProbe.

Related Material


@InProceedings{Havrylov_2025_ICCV,
    author    = {Havrylov, Volodymyr and Huang, Haiwen and Zhang, Dan and Geiger, Andreas},
    title     = {Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {210-219}
}