DREAM: Document Recognition with Explicit Adaptive Memory

Tianqi Zhao, Di Wu, Liangrui Peng, Yifan Huang, Kemeng Zhao, Shuo Li, Zhiyu Li, Yizhu Wang, Borui Jiang, Yuyang Li; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 2715-2724

Abstract


Large multimodal models (LMMs) have shown promising performance on various document recognition tasks. However, LMMs rely on implicit modeling, and their parameters lack interpretability. Inspired by recent advances in human memory and learning research, we propose an explicit multiscale prototype memory that augments document recognition models by explicitly modeling recurrent layout and stylistic patterns across different spatial resolutions. A Memory Retrieval Mechanism enables local regions to sparsely attend to a few prototypes (e.g., image borders, tilted text); the retrieved compositional factors are concatenated with visual features and passed to the decoder, providing explicit region-wise structural context. Prototype memory consolidation updates and stabilizes prototypes via an attention-weighted exponential moving average (EMA) strategy, while sparsity and anti-collapse regularization promote selective activation. We further adopt hierarchical memory for multi-resolution encoding. The proposed DREAM module is a plug-and-play component that can be integrated into various encoder-decoder architectures. We validate on two tasks: document recognition on public datasets and the self-built DreamDoc dataset, and handwriting recognition on the SCUT-HCCDoc and SCUT-EPT datasets. Experimental results show that the proposed method is effective.
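The retrieval and consolidation steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: dot-product scoring, hard top-k selection as a stand-in for the learned sparsity, the momentum value, and the function names `retrieve` and `consolidate` are all assumptions made for illustration, and the regularization terms and hierarchical memory are omitted.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(feature, prototypes, k=2):
    """Sparse retrieval for one region feature (illustrative assumption).

    Scores every prototype by dot product, keeps the top-k, applies a
    softmax over those k, and returns the attention weights together with
    the feature concatenated with the retrieved prototype mixture.
    """
    scores = [dot(feature, p) for p in prototypes]
    top = sorted(range(len(prototypes)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - m) for i in top}
    z = sum(exps.values())
    weights = [exps[i] / z if i in exps else 0.0 for i in range(len(prototypes))]
    retrieved = [
        sum(w * p[d] for w, p in zip(weights, prototypes))
        for d in range(len(feature))
    ]
    # Concatenate the visual feature with the retrieved memory content.
    return weights, list(feature) + retrieved


def consolidate(prototypes, features, all_weights, momentum=0.9):
    """Attention-weighted EMA update (illustrative assumption).

    Each prototype moves toward the attention-weighted mean of the
    features that attended to it; prototypes that received no attention
    are left unchanged.
    """
    updated = []
    for j, proto in enumerate(prototypes):
        mass = sum(w[j] for w in all_weights)
        if mass == 0.0:
            updated.append(list(proto))
            continue
        mean = [
            sum(w[j] * f[d] for w, f in zip(all_weights, features)) / mass
            for d in range(len(proto))
        ]
        updated.append(
            [momentum * p + (1.0 - momentum) * t for p, t in zip(proto, mean)]
        )
    return updated


# Hypothetical toy data: three 2-D prototypes and one region feature.
prototypes = [[1, 0], [0, 1], [-1, 0]]
feature = [2, 0]
weights, augmented = retrieve(feature, prototypes, k=2)
new_prototypes = consolidate(prototypes, [feature], [weights])
```

In this toy setup the region attends to only two of the three prototypes, the augmented vector is twice the feature dimension, and the unattended prototype is not modified by the update.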

Related Material


@InProceedings{Zhao_2026_CVPR,
    author    = {Zhao, Tianqi and Wu, Di and Peng, Liangrui and Huang, Yifan and Zhao, Kemeng and Li, Shuo and Li, Zhiyu and Wang, Yizhu and Jiang, Borui and Li, Yuyang},
    title     = {DREAM: Document Recognition with Explicit Adaptive Memory},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {2715-2724}
}