Video Object Matting via Hierarchical Space-Time Semantic Guidance

Yumeng Wang, Bo Xu, Ziwen Li, Han Huang, Cheng Lu, Yandong Guo; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 5120-5129

Abstract


Unlike most existing approaches, which require trimap generation for each frame, we reformulate video object matting (VOM) by introducing improved semantic guidance propagation. The proposed approach achieves a higher degree of temporal coherence between frames with only a single coarse mask as reference. In this paper, we adapt a hierarchical memory matching mechanism into the space-time baseline to build an efficient and robust framework for semantic guidance propagation and alpha prediction. To enhance temporal smoothness, we also propose a cross-frame attention refinement (CFAR) module that refines feature representations across multiple adjacent frames (both historical and current) based on the spatio-temporal correlation among cross-frame pixels. Extensive experiments demonstrate the effectiveness of the hierarchical spatio-temporal semantic guidance and the CFAR module, and our model outperforms state-of-the-art VOM methods. We also analyze the significance of the different components of our model.
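The abstract does not give the internals of the CFAR module, but the underlying idea of refining current-frame features by attending over pixels of historical and current frames can be sketched as standard dot-product attention across a cross-frame feature bank. The sketch below is an illustration under that assumption, not the paper's implementation; the function name, residual connection, and NumPy formulation are the editor's own choices.

```python
import numpy as np

def cross_frame_attention_refine(curr, neighbors, scale=None):
    """Illustrative sketch: refine current-frame pixel features by
    attending over pixels from adjacent (historical + current) frames.

    curr:      (N, C) array of current-frame pixel features
    neighbors: list of (N, C) feature arrays from adjacent frames
    """
    # Stack current and neighboring frames into one key/value bank.
    bank = np.concatenate([curr] + list(neighbors), axis=0)  # (T*N, C)
    d = curr.shape[1]
    scale = scale if scale is not None else 1.0 / np.sqrt(d)
    # Spatio-temporal correlation: similarity between every current-frame
    # pixel and every cross-frame pixel.
    logits = curr @ bank.T * scale                 # (N, T*N)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)        # softmax weights
    # Aggregate cross-frame features; a residual keeps the original signal.
    return curr + attn @ bank                      # (N, C)
```

In a real network the queries, keys, and values would come from learned projections of CNN feature maps rather than raw features, and the attention would typically be computed on a downsampled spatial grid for efficiency.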

Related Material


[bibtex]
@InProceedings{Wang_2023_WACV,
    author    = {Wang, Yumeng and Xu, Bo and Li, Ziwen and Huang, Han and Lu, Cheng and Guo, Yandong},
    title     = {Video Object Matting via Hierarchical Space-Time Semantic Guidance},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2023},
    pages     = {5120-5129}
}