Region-Based Representations Revisited

Michal Shlapentokh-Rothman, Ansel Blume, Yao Xiao, Yuqun Wu, Sethuraman TV, Heyi Tao, Jae Yong Lee, Wilfredo Torres, Yu-Xiong Wang, Derek Hoiem; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 17107-17116

Abstract


We investigate whether region-based representations are effective for recognition. Regions were once a mainstay in recognition approaches but pixel and patch-based features are now used almost exclusively. We show that recent class-agnostic segmenters like SAM can be effectively combined with strong unsupervised representations like DINOv2 and used for a wide variety of tasks including semantic segmentation object-based image retrieval and multi-image analysis. Once the masks and features are extracted these representations even with linear decoders enable competitive performance making them well suited to applications that require custom queries. The compactness of the representation also makes it well-suited to video analysis and other problems requiring inference across many images.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Shlapentokh-Rothman_2024_CVPR, author = {Shlapentokh-Rothman, Michal and Blume, Ansel and Xiao, Yao and Wu, Yuqun and TV, Sethuraman and Tao, Heyi and Lee, Jae Yong and Torres, Wilfredo and Wang, Yu-Xiong and Hoiem, Derek}, title = {Region-Based Representations Revisited}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {17107-17116} }