BibTeX:
@InProceedings{Rajic_2025_WACV,
  author    = {Raji\v{c}, Frano and Ke, Lei and Tai, Yu-Wing and Tang, Chi-Keung and Danelljan, Martin and Yu, Fisher},
  title     = {Segment Anything Meets Point Tracking},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {9284-9293}
}
Segment Anything Meets Point Tracking
Abstract
Foundation models have marked a significant stride toward addressing generalization challenges in deep learning. While the Segment Anything Model (SAM) has established a strong foothold in image segmentation, existing video segmentation methods still require extensive mask labeling for fine-tuning, or otherwise suffer performance drops on unseen data domains. In this paper, we show how foundation models for image segmentation take a step toward enhancing domain generalizability in video segmentation. We discover that, combined with long-term point tracking, image segmentation models yield state-of-the-art results in zero-shot video segmentation across multiple benchmarks. Surprisingly, point trackers generalize to domains beyond their synthetic pre-training sequences, which we attribute to the trackers' ability to harness the rich local information in the vicinity of each tracked point. We therefore introduce SAM-PT, a point-centric method for video segmentation that leverages the capabilities of SAM alongside long-term point tracking. SAM-PT extends SAM's capability to tracking and segmenting anything in dynamic videos. Unlike traditional video segmentation methods that focus on object-centric mask propagation, our approach uniquely exploits point propagation to utilize local structure information independent of object semantics. The effectiveness of point-based tracking is underscored by direct evaluation on the zero-shot open-world UVO benchmark. Our experiments on popular video object segmentation and multi-object segmentation tracking benchmarks, including DAVIS, YouTube-VOS, and BDD100K, suggest that a point-based segmentation tracker yields better zero-shot performance and efficient interactions. We release our code at https://github.com/SysCV/sam-pt.
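To make the point-centric pipeline described in the abstract concrete, the following Python sketch shows how propagated query points could drive per-frame mask prediction: points are sampled from a first-frame mask, tracked through the video, and then used as prompts for an image segmenter on every frame. The tracker.track_points interface, the segmenter.set_image/predict calls, and the sample_query_points helper are hypothetical stand-ins for a long-term point tracker and a promptable image segmenter such as SAM; this is an illustrative sketch under those assumptions, not the released SAM-PT implementation (see the linked repository for the actual code).

import numpy as np

def sample_query_points(mask, k, positive=True, rng=None):
    # Uniformly sample k (x, y) points inside (positive) or outside (negative) a boolean mask.
    rng = rng or np.random.default_rng(0)
    ys, xs = np.nonzero(mask if positive else ~mask)
    idx = rng.choice(len(xs), size=k, replace=False)
    return np.stack([xs[idx], ys[idx]], axis=1).astype(np.float32)

def segment_video(frames, first_frame_mask, tracker, segmenter, n_points=8):
    # 1) Sample positive (foreground) and negative (background) query points
    #    from the user-provided mask on the first frame.
    pos = sample_query_points(first_frame_mask, n_points, positive=True)
    neg = sample_query_points(first_frame_mask, n_points, positive=False)
    queries = np.concatenate([pos, neg], axis=0)         # (2*n_points, 2) in (x, y)
    labels = np.array([1] * n_points + [0] * n_points)   # 1 = foreground, 0 = background

    # 2) Propagate every query point through the whole clip with a
    #    long-term point tracker (hypothetical interface).
    #    tracks: (T, 2*n_points, 2) point locations; visible: (T, 2*n_points) visibility flags.
    tracks, visible = tracker.track_points(frames, queries)

    # 3) Prompt the image segmenter with the tracked points on each frame,
    #    turning sparse point prompts back into a dense object mask.
    masks = []
    for t, frame in enumerate(frames):
        keep = visible[t]
        segmenter.set_image(frame)                        # hypothetical promptable-segmenter API
        masks.append(segmenter.predict(point_coords=tracks[t][keep],
                                       point_labels=labels[keep]))
    return masks

The released SAM-PT code includes additional components beyond this sketch; the snippet only illustrates the core idea of propagating point prompts instead of object masks.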