Strike the Balance: On-the-Fly Uncertainty based User Interactions for Long-Term Video Object Segmentation

Stéphane Vujasinović, Stefan Becker, Sebastian Bullinger, Norbert Scherer-Negenborn, Michael Arens, Rainer Stiefelhagen; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 2784-2802

Abstract


In this paper, we introduce a variant of video object segmentation (VOS) that bridges interactive and semi-automatic approaches, termed Lazy Video Object Segmentation (ziVOS). In contrast to both tasks, which handle video object segmentation in an offline manner (i.e., on pre-recorded sequences), ziVOS targets sequences recorded online. Here, we strive to strike a balance between performance and robustness for long-term scenarios by soliciting user feedback on-the-fly during the segmentation process. Hence, we aim to maximize the tracking duration of an object of interest while requiring minimal user corrections to maintain tracking over an extended period. We propose Lazy-XMem as a competitive baseline that estimates the uncertainty of the tracking state to determine whether a user interaction is necessary to refine the model's prediction. We introduce complementary metrics alongside those already established in the field to quantitatively assess the performance of our method and the user's workload. We evaluate our approach using the recently introduced LVOS dataset, which offers numerous long-term videos. Our code is available at https://github.com/Vujas-Eteph/LazyXMem.
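The idea of requesting a user correction only when the tracking state looks uncertain can be illustrated with a minimal sketch. This is not the Lazy-XMem implementation: the choice of mean per-pixel binary entropy as the uncertainty measure, the threshold value, and all function names below are assumptions made purely for illustration.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a foreground probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully confident pixel carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def mean_mask_entropy(prob_mask) -> float:
    """Average per-pixel entropy over a 2D list of foreground probabilities."""
    values = [binary_entropy(p) for row in prob_mask for p in row]
    return sum(values) / len(values) if values else 0.0


def needs_user_interaction(prob_mask, threshold: float = 0.5) -> bool:
    """Hypothetical gate: ask the user for a correction when the
    average uncertainty of the predicted mask exceeds the threshold."""
    return mean_mask_entropy(prob_mask) > threshold


# Usage: a confident prediction passes, an ambiguous one triggers a request.
confident = [[0.0, 1.0], [1.0, 0.0]]
ambiguous = [[0.5, 0.5], [0.5, 0.5]]
print(needs_user_interaction(confident))
print(needs_user_interaction(ambiguous))
```

In an online setting, such a gate would be evaluated once per incoming frame, so the threshold directly trades segmentation robustness against the user's workload.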

Related Material


[bibtex]
@InProceedings{Vujasinovic_2024_ACCV,
    author    = {Vujasinovi\'c, St\'ephane and Becker, Stefan and Bullinger, Sebastian and Scherer-Negenborn, Norbert and Arens, Michael and Stiefelhagen, Rainer},
    title     = {Strike the Balance: On-the-Fly Uncertainty based User Interactions for Long-Term Video Object Segmentation},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2024},
    pages     = {2784-2802}
}