Blind Bitstream-corrupted Video Recovery via Metadata-guided Diffusion Model

Shuyun Wang, Hu Zhang, Xin Shen, Dadong Wang, Xin Yu; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 22975-22984

Abstract


Bitstream-corrupted video recovery aims to restore realistic video content in regions damaged by bitstream corruption during video storage or transmission. Most existing methods assume that predefined masks of the corrupted regions are available in advance. However, manually annotating these masks is laborious and time-consuming, which limits the applicability of existing methods in real-world scenarios. We therefore relax this assumption and define a new blind video recovery setting in which the recovery of corrupted regions does not rely on predefined masks. This setting poses two significant challenges: (i) without predefined masks, how can a model accurately identify the regions requiring recovery? (ii) how can content be recovered in extensive and irregular regions, especially when large portions of frames are severely degraded? To address these challenges, we introduce a Metadata-Guided Diffusion Model, dubbed M-GDM. To enable the diffusion model to focus on the corrupted regions, we leverage intrinsic video metadata as a corruption indicator and design a dual-stream metadata encoder. This encoder first embeds the motion vectors and frame types of a video separately and then merges them into a unified metadata representation. The metadata representation interacts with the corrupted latent features through cross-attention at each diffusion step. Meanwhile, to preserve the intact regions, we propose a prior-driven mask predictor that generates pseudo masks for the corrupted regions by leveraging the metadata prior and the diffusion prior. These pseudo masks enable the separation and recombination of intact and recovered regions through hard masking. However, imperfections in the pseudo masks and the hard-masking process often introduce boundary artifacts, so we further introduce a post-refinement module that refines the hard-masked outputs and enhances the consistency between intact and recovered regions. Extensive experimental results validate the effectiveness of our method and demonstrate its superiority in the blind video recovery task.
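
To make the conditioning pipeline concrete, the following is a minimal sketch of three ideas described in the abstract: a dual-stream encoder that embeds motion vectors and frame types separately and fuses them into a unified metadata representation, a cross-attention layer that injects this representation into the corrupted latent at a diffusion step, and the hard-mask recombination of intact and recovered regions. All module and variable names (MetadataEncoder, MetadataCrossAttention, hard_mask_recombine, the token and channel sizes) are illustrative assumptions, not the authors' released implementation.

# Minimal sketch, assuming PyTorch; names and dimensions are hypothetical,
# not taken from the paper's code.
import torch
import torch.nn as nn

class MetadataEncoder(nn.Module):
    """Dual-stream metadata encoder: motion vectors and frame types are
    embedded separately, then merged into one metadata representation."""
    def __init__(self, mv_dim=2, num_frame_types=3, d_model=256):
        super().__init__()
        # stream 1: per-token motion-vector embedding (e.g., one token per macroblock)
        self.mv_proj = nn.Sequential(
            nn.Linear(mv_dim, d_model), nn.GELU(), nn.Linear(d_model, d_model))
        # stream 2: frame-type embedding (e.g., I/P/B frames)
        self.ftype_emb = nn.Embedding(num_frame_types, d_model)
        # fusion of the two streams into a unified representation
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, motion_vectors, frame_types):
        # motion_vectors: (B, N, mv_dim); frame_types: (B,) integer codes
        mv_tokens = self.mv_proj(motion_vectors)                     # (B, N, d)
        ft_tokens = self.ftype_emb(frame_types)[:, None, :]          # (B, 1, d)
        ft_tokens = ft_tokens.expand(-1, mv_tokens.size(1), -1)      # broadcast per token
        return self.fuse(torch.cat([mv_tokens, ft_tokens], dim=-1))  # (B, N, d)

class MetadataCrossAttention(nn.Module):
    """Injects the metadata representation into the corrupted latent tokens
    at a single diffusion step via cross-attention."""
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, latent_tokens, metadata_tokens):
        # latent_tokens: (B, L, d) flattened latent of the corrupted frame
        ctx, _ = self.attn(query=latent_tokens, key=metadata_tokens, value=metadata_tokens)
        return self.norm(latent_tokens + ctx)

def hard_mask_recombine(recovered, intact, pseudo_mask):
    # pseudo_mask: 1 where a region is predicted corrupted, 0 where it is intact;
    # recovered content fills the masked regions, intact content is kept elsewhere.
    return pseudo_mask * recovered + (1.0 - pseudo_mask) * intact

if __name__ == "__main__":
    enc, xattn = MetadataEncoder(), MetadataCrossAttention()
    mv = torch.randn(2, 64, 2)            # 64 macroblock motion vectors per frame
    ftype = torch.tensor([0, 1])          # e.g., 0 = I-frame, 1 = P-frame
    latent = torch.randn(2, 256, 256)     # (B, L, d) corrupted latent tokens
    cond = xattn(latent, enc(mv, ftype))  # metadata-conditioned latent, (2, 256, 256)

    recovered = torch.rand(2, 3, 64, 64)  # decoded frames from the diffusion model
    intact = torch.rand(2, 3, 64, 64)     # original (uncorrupted) frame content
    pseudo_mask = (torch.rand(2, 1, 64, 64) > 0.5).float()
    blended = hard_mask_recombine(recovered, intact, pseudo_mask)
    print(cond.shape, blended.shape)

The fusion is shown here as a simple concatenation followed by a linear layer, and the hard-masked output would still pass through the paper's post-refinement module; the actual merging strategy, token granularity, and refinement design in M-GDM may differ.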

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Wang_2025_CVPR,
    author    = {Wang, Shuyun and Zhang, Hu and Shen, Xin and Wang, Dadong and Yu, Xin},
    title     = {Blind Bitstream-corrupted Video Recovery via Metadata-guided Diffusion Model},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {22975-22984}
}