Endow SAM with Keen Eyes: Temporal-spatial Prompt Learning for Video Camouflaged Object Detection

Wenjun Hui, Zhenfeng Zhu, Shuai Zheng, Yao Zhao; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 19058-19067

Abstract


The Segment Anything Model (SAM) a prompt-driven foundational model has demonstrated remarkable performance in natural image segmentation. However its application in video camouflaged object detection (VCOD) encounters challenges chiefly stemming from the overlooked temporal-spatial associations and the unreliability of user-provided prompts for camouflaged objects that are difficult to discern with the naked eye. To tackle the above issues we endow SAM with keen eyes and propose the Temporal-spatial Prompt SAM (TSP-SAM) a novel approach tailored for VCOD via an ingenious prompted learning scheme. Firstly motion-driven self-prompt learning is employed to capture the camouflaged object thereby bypassing the need for user-provided prompts. With the detected subtle motion cues across consecutive video frames the overall movement of the camouflaged object is captured for more precise spatial localization. Subsequently to eliminate the prompt bias resulting from inter-frame discontinuities the long-range consistency within the video sequences is taken into account to promote the robustness of the self-prompts. It is also injected into the encoder of SAM to enhance the representational capabilities. Extensive experimental results on two benchmarks demonstrate that the proposed TSP-SAM achieves a significant improvement over the state-of-the-art methods. With the mIoU metric increasing by 7.8% and 9.6% TSP-SAM emerges as a groundbreaking step forward in the field of VCOD.

Related Material


[pdf]
[bibtex]
@InProceedings{Hui_2024_CVPR, author = {Hui, Wenjun and Zhu, Zhenfeng and Zheng, Shuai and Zhao, Yao}, title = {Endow SAM with Keen Eyes: Temporal-spatial Prompt Learning for Video Camouflaged Object Detection}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {19058-19067} }