High-Fidelity Document Stain Removal via A Large-Scale Real-World Dataset and A Memory-Augmented Transformer

Mingxian Li, Hao Sun, Yingtie Lei, Xiaofeng Zhang, Yihang Dong, Yilin Zhou, Zimeng Li, Xuhang Chen; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 7603-7613

Abstract


Document images are often degraded by various stains, which significantly impair their readability and hinder downstream applications such as document digitization and analysis. The absence of a comprehensive stained-document dataset has limited the effectiveness of existing document enhancement methods in removing stains while preserving fine-grained details. To address this challenge, we construct StainDoc, the first large-scale, high-resolution (2145×2245) dataset specifically designed for document stain removal. StainDoc comprises over 5,000 pairs of stained and clean document images across multiple scenes. The dataset encompasses a diverse range of stain types, severities, and document backgrounds, facilitating robust training and evaluation of document stain removal algorithms. Furthermore, we propose StainRestorer, a Transformer-based document stain removal approach. StainRestorer employs a memory-augmented Transformer architecture that captures hierarchical stain representations at the part, instance, and semantic levels via the DocMemory module. The Stain Removal Transformer (SRTransformer) leverages these feature representations through a dual attention mechanism: an enhanced spatial attention with an expanded receptive field, and a channel attention that captures channel-wise feature importance. This combination enables precise stain removal while preserving document content integrity. Extensive experiments demonstrate StainRestorer's superior performance over state-of-the-art methods on the StainDoc dataset and its variants, StainDoc_Mark and StainDoc_Seal, establishing a new benchmark for document stain removal. Our work highlights the potential of memory-augmented Transformers for this task and contributes a valuable dataset to advance future research.
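The abstract describes the memory and the dual attention only at a high level. As a hedged illustration of two generic mechanisms of this kind, the sketch below shows a simplified squeeze-and-excitation-style channel attention (average-pool each channel, gate it with a sigmoid, rescale it) and a scaled dot-product read from a key-value memory. It is not StainRestorer, DocMemory, or SRTransformer. The function names `channel_attention` and `memory_read`, the single linear gating layer (standard squeeze-and-excitation uses a two-layer bottleneck), and the plain-Python list representation are hypothetical choices made for readability. The paper's spatial attention with an expanded receptive field is not modeled.

```python
import math
from typing import List

# Hypothetical representation: one feature map is an H x W grid of floats,
# and a feature tensor is a list of C such maps (one per channel).
FeatureMap = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(features: List[FeatureMap],
                      weights: List[List[float]],
                      bias: List[float]) -> List[FeatureMap]:
    """Simplified squeeze-and-excitation-style channel attention.

    Assumption: a single C x C linear layer plus sigmoid produces the gates,
    instead of the usual two-layer bottleneck with a ReLU.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
              for fmap in features]
    # Excite: linear map over the channel descriptors, then a sigmoid gate in (0, 1).
    gates = [sigmoid(sum(w * p for w, p in zip(w_row, pooled)) + b)
             for w_row, b in zip(weights, bias)]
    # Rescale: every value in channel c is multiplied by gate c.
    return [[[g * v for v in row] for row in fmap]
            for g, fmap in zip(gates, features)]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(query: List[float],
                keys: List[List[float]],
                values: List[List[float]]) -> List[float]:
    """Read from a key-value memory with scaled dot-product attention."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    attn = softmax(scores)
    # Output is the attention-weighted average of the stored value vectors.
    return [sum(a * v[j] for a, v in zip(attn, values))
            for j in range(len(values[0]))]


if __name__ == "__main__":
    feats = [[[2.0, 4.0]], [[1.0, 3.0]]]  # C=2 channels, each 1 x 2
    zero_w = [[0.0, 0.0], [0.0, 0.0]]
    # Zero weights and bias give every gate sigmoid(0) = 0.5.
    print(channel_attention(feats, zero_w, [0.0, 0.0]))  # [[[1.0, 2.0]], [[0.5, 1.5]]]
    # Identical keys give uniform attention, so the read is the mean value.
    print(memory_read([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]],
                      [[2.0, 0.0], [4.0, 2.0]]))  # [3.0, 1.0]
```

A real model would typically apply operations like these to batched tensors in a deep-learning framework and learn the weights during training; the plain-Python form only makes the arithmetic explicit.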

Related Material


BibTeX

@InProceedings{Li_2025_WACV,
  author    = {Li, Mingxian and Sun, Hao and Lei, Yingtie and Zhang, Xiaofeng and Dong, Yihang and Zhou, Yilin and Li, Zimeng and Chen, Xuhang},
  title     = {High-Fidelity Document Stain Removal via A Large-Scale Real-World Dataset and A Memory-Augmented Transformer},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {7603-7613}
}