Temporally-Consistent Video Semantic Segmentation With Bidirectional Occlusion-Guided Feature Propagation
Despite recent progress in static image segmentation, video segmentation remains challenging because it demands a model that is accurate, fast, and temporally consistent. Running per-frame static image segmentation is impractical, as it is computationally prohibitive and prone to temporal inconsistency. In this paper, we present the bidirectional occlusion-guided feature propagation (BOFP) method, which aims to improve the temporal consistency of segmentation results without sacrificing segmentation accuracy, while keeping computation cost low. It exploits temporal coherence in the video by propagating features from keyframes to other frames along motion paths in both the forward and backward directions. We propose an occlusion-based attention network that estimates distorted areas from bidirectional optical flows and uses them as cues for correcting and fusing the propagated features. Extensive experiments on benchmark datasets demonstrate that BOFP achieves superior temporal consistency while maintaining a comparable level of segmentation accuracy at low computation cost, striking a balance among the three metrics essential for evaluating video segmentation solutions.
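The core pipeline described above — warping keyframe features along forward and backward optical flow, estimating occluded (distorted) regions, and fusing the two propagated streams — can be sketched in simplified form. Note this is only an illustrative numpy sketch under assumptions not stated in the abstract: the paper's occlusion-based attention is a learned network, whereas here occlusion is approximated by a classic forward-backward flow consistency check with nearest-neighbor warping; all function names (`warp`, `occlusion_mask`, `fuse`) and the threshold are hypothetical.

```python
import numpy as np

def warp(feat, flow):
    """Backward-warp a feature map (H, W, C) by a flow field (H, W, 2).

    Nearest-neighbor sampling stands in for the bilinear warping
    typically used in feature propagation methods.
    """
    H, W, _ = feat.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, W - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, H - 1)
    return feat[src_y, src_x]

def occlusion_mask(flow_fw, flow_bw, thresh=1.0):
    """Estimate distorted areas via forward-backward flow consistency.

    For a non-occluded pixel, the forward flow and the warped backward
    flow should roughly cancel; a large residual suggests occlusion.
    Returns a float mask where 1.0 marks occluded pixels.
    """
    bw_warped = warp(flow_bw, flow_fw)
    residual = np.linalg.norm(flow_fw + bw_warped, axis=-1)
    return (residual > thresh).astype(np.float32)

def fuse(feat_fw, feat_bw, occ_fw, occ_bw, eps=1e-6):
    """Occlusion-guided fusion of forward- and backward-propagated features.

    Each direction is weighted by its validity (1 - occlusion), so a
    region occluded along one motion path falls back on the other.
    """
    w_fw = (1.0 - occ_fw)[..., None]
    w_bw = (1.0 - occ_bw)[..., None]
    return (w_fw * feat_fw + w_bw * feat_bw) / (w_fw + w_bw + eps)
```

In practice, `feat_fw` and `feat_bw` would be features propagated from the previous and next keyframes respectively, and the hard threshold would be replaced by the paper's learned attention weights.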