GRIB: Combining Global Reception and Inductive Bias For Human Segmentation and Matting

Yezhi Shen, Weichen Xu, Qian Lin, Jan P. Allebach, Fengqing Zhu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 5576-5585

Abstract


Human video segmentation and matting are challenging computer vision tasks with many applications, such as background replacement and background editing. Numerous methods have been proposed for human segmentation and matting in either portrait or first-person view videos. In this paper, we propose a real-time network that performs first-person view hand and manipulated object segmentation as well as second-person view human video matting. We introduce a global reception inductive bias block in the network's encoder that aggregates pixel features at short, medium, and long ranges. Furthermore, we propose a multi-target optimization method that fully leverages segmentation and matting labels to accelerate training. Our model outperforms existing real-time methods, achieving 93.9% mIoU on the HP-Portrait dataset, 95.1% mIoU on VideoMatte, and 72.7% mIoU on EgoHOS, while running faster.
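To make the abstract's description of the global reception inductive bias block more concrete, below is a minimal sketch of one way such a block could be realized: convolutional branches capturing short- and medium-range context (the locality inductive bias) combined with a globally pooled branch (the global reception), fused with a residual connection. The branch design, layer choices, and all names are illustrative assumptions for this page, not the paper's released implementation.

# Illustrative sketch only; the branch layout and names below are assumptions,
# not code from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalReceptionBlock(nn.Module):
    """Hypothetical block aggregating pixel features at short, medium, and long ranges."""

    def __init__(self, channels: int):
        super().__init__()
        # Short range: standard 3x3 convolution (small receptive field).
        self.short = nn.Conv2d(channels, channels, 3, padding=1)
        # Medium range: dilated 3x3 convolution (wider receptive field).
        self.medium = nn.Conv2d(channels, channels, 3, padding=3, dilation=3)
        # Long range / global reception: global average pooling projected back.
        self.global_proj = nn.Conv2d(channels, channels, 1)
        # Fuse the three branches back to the input channel width.
        self.fuse = nn.Conv2d(3 * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.short(x)
        m = self.medium(x)
        g = self.global_proj(F.adaptive_avg_pool2d(x, 1))
        g = g.expand_as(x)  # broadcast the global context to every pixel
        return self.fuse(torch.cat([s, m, g], dim=1)) + x  # residual connection

The multi-target optimization mentioned in the abstract could likewise be read as jointly supervising the network with both label types, for example a weighted sum of a segmentation loss and a matting (alpha) loss; the exact formulation is not given on this page.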

Related Material


[bibtex]
@InProceedings{Shen_2024_CVPR,
  author    = {Shen, Yezhi and Xu, Weichen and Lin, Qian and Allebach, Jan P. and Zhu, Fengqing},
  title     = {GRIB: Combining Global Reception and Inductive Bias For Human Segmentation and Matting},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2024},
  pages     = {5576-5585}
}