@InProceedings{Wang_2026_CVPR,
    author    = {Wang, Kai and Choudhury, Anustup and Su, Guan-Ming},
    title     = {DeGrainVAR: Film Grain Removal with Visual AutoRegressive Modeling},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2026},
    pages     = {5095-5104}
}
DeGrainVAR: Film Grain Removal with Visual AutoRegressive Modeling
Abstract
Film grain noise is present in analog content as a result of capturing content on film. It is also synthetically added to digital content to improve its aesthetic look. From a video coding perspective, film grain noise needs to be removed to improve compression efficiency. Film grain removal requires suppressing stochastic, signal-dependent noise while preserving genuine textures, edges, and local structure. The task is difficult because film grain noise overlaps with image detail in both the spatial and frequency domains, so aggressive denoising blurs structural details while conservative denoising leaves visible residual artifacts. We present DeGrainVAR, a conditional Visual AutoRegressive model for film-grain restoration. Built on Visual AutoRegressive Modeling (VAR), DeGrainVAR predicts clean latent tokens from a grain-corrupted input image with a conditional VAR backbone and reconstructs the final image with a hybrid decoder that combines latent-driven synthesis and noisy-image-guided refinement. The model is built on three components: 1) residual token initialization, which turns latent prediction into residual correction; 2) per-layer cross-attention conditioning, which preserves conditioning signals throughout the transformer; and 3) a hybrid decoder trained with pixel and frequency-aware losses to recover details beyond the quantization bottleneck. Experiments on a public film grain dataset demonstrate that DeGrainVAR outperforms state-of-the-art generative and transformer-based restoration methods.
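The abstract mentions pixel and frequency-aware losses for the hybrid decoder but does not give their form. As a rough illustration only, the sketch below combines a mean absolute pixel error with a mean absolute error on 2D DFT magnitudes. The function names, the choice of L1 distances, the use of DFT magnitudes, and the weight `freq_weight=0.1` are all assumptions made for this example, not details taken from the paper.

```python
import cmath
import math


def dft2_magnitude(img):
    """Magnitude of a naive 2D DFT of a small grayscale patch (list of rows)."""
    h, w = len(img), len(img[0])
    mags = []
    for u in range(h):
        row = []
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2.0 * math.pi * (u * y / h + v * x / w)
                    acc += img[y][x] * cmath.exp(1j * angle)
            row.append(abs(acc))
        mags.append(row)
    return mags


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized 2D lists."""
    h, w = len(a), len(a[0])
    return sum(abs(a[y][x] - b[y][x]) for y in range(h) for x in range(w)) / (h * w)


def pixel_frequency_loss(pred, target, freq_weight=0.1):
    """Hypothetical combined loss: pixel L1 plus weighted L1 on DFT magnitudes."""
    pixel_term = mean_abs_diff(pred, target)
    freq_term = mean_abs_diff(dft2_magnitude(pred), dft2_magnitude(target))
    return pixel_term + freq_weight * freq_term


# A flat (over-smoothed) patch compared with a checkerboard-like target.
target = [[0.0, 1.0], [1.0, 0.0]]
blurred = [[0.5, 0.5], [0.5, 0.5]]
loss_value = pixel_frequency_loss(blurred, target)
```

In this toy case the frequency term penalizes the flat prediction for losing the high-frequency component that the target contains, which is the kind of detail loss a pixel-only objective can under-weight. A practical implementation would use an FFT routine from a numerical library rather than the naive quadruple loop shown here.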

