Unifying Automatic and Interactive Matting with Pretrained ViTs

Ye, Zixuan; Liu, Wenze; Guo, He; Liang, Yujia; Hong, Chaoyi; Lu, Hao; Cao, Zhiguo

Zixuan Ye, Wenze Liu, He Guo, Yujia Liang, Chaoyi Hong, Hao Lu, Zhiguo Cao; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 25585-25594

Abstract

Automatic and interactive matting improve image matting by, respectively, removing the need for auxiliary input and enabling object selection. Because the two settings differ in whether prompts are present, each has a weakness, either in instance completeness or in region details. Moreover, switching directly between two separate matting models across scenarios is inconvenient and increases workload. We therefore ask whether the limitations of both settings can be alleviated while unifying them for more convenient use. Our key idea is to provide saliency guidance in automatic mode so that the model attends to detailed regions, and to refine instance completeness in interactive mode by replacing the binary mask guidance with a more probabilistic form. With a different guidance for each mode, unification is achieved through an adaptable guidance, defined as saliency information in automatic mode and as the user cue in interactive mode. In our method it is instantiated as a candidate feature: an automatic switch, controlled by the existence of user prompts, between the class token of a pretrained ViT and the average feature of the user prompts. We then use the candidate feature to generate a probabilistic similarity map that serves as guidance, alleviating the over-reliance on a binary mask. Extensive experiments show that our method adapts well to both automatic and interactive scenarios with a more lightweight framework. Code is available at https://github.com/coconuthust/SmartMatting.
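
The candidate-feature switch and the similarity-map guidance described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). The function names, the plain-list feature representation, the use of cosine similarity, and the rescaling to [0, 1] are all assumptions made for illustration; the abstract only states that the candidate feature is the class token when no prompts exist, the average feature of the user prompts otherwise, and that a probabilistic similarity map is derived from it.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def candidate_feature(
    patch_feats: Sequence[Vector],
    class_token: Vector,
    prompt_idx: Optional[Sequence[int]] = None,
) -> List[float]:
    """Hypothetical switch: average the prompted patch features if any
    prompts exist (interactive mode), else fall back to the class token
    (automatic mode)."""
    if prompt_idx:
        picked = [patch_feats[i] for i in prompt_idx]
        dim = len(class_token)
        return [sum(p[d] for p in picked) / len(picked) for d in range(dim)]
    return list(class_token)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_map(patch_feats: Sequence[Vector], candidate: Vector) -> List[float]:
    """Assumed form of the soft guidance: cosine similarity of each patch
    to the candidate feature, rescaled from [-1, 1] to [0, 1]."""
    return [(cosine(p, candidate) + 1.0) / 2.0 for p in patch_feats]


# Toy example with three 2-D patch features and a 2-D class token.
patches = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
cls_token = [1.0, 0.0]

# Automatic mode: no prompts, so the class token is the candidate.
auto_map = similarity_map(patches, candidate_feature(patches, cls_token))

# Interactive mode: the user prompt covers patch 1.
inter_map = similarity_map(patches, candidate_feature(patches, cls_token, [1]))
```

In this toy setting the automatic map scores the patch aligned with the class token highest, while the interactive map scores the prompted patch highest; in both cases the guidance is a soft value per patch rather than a binary mask.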

Related Material

[bibtex]

@InProceedings{Ye_2024_CVPR,
    author    = {Ye, Zixuan and Liu, Wenze and Guo, He and Liang, Yujia and Hong, Chaoyi and Lu, Hao and Cao, Zhiguo},
    title     = {Unifying Automatic and Interactive Matting with Pretrained ViTs},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {25585-25594}
}