@InProceedings{Gu_2025_WACV,
  author    = {Gu, Wenhao and Gu, Li and Wang, Ziqiang and Suen, Ching Y. and Wang, Yang},
  title     = {DocTTT: Test-Time Training for Handwritten Document Recognition using Meta-Auxiliary Learning},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {1904-1913}
}
DocTTT: Test-Time Training for Handwritten Document Recognition using Meta-Auxiliary Learning
Abstract
Despite recent significant advances in Handwritten Document Recognition (HDR), the efficient and accurate recognition of text against complex backgrounds, diverse handwriting styles, and varying document layouts remains a practical challenge. Moreover, this issue is seldom addressed in academic research, particularly in scenarios where minimal annotated data is available. In this paper, we introduce the DocTTT framework to address these challenges. The key innovation of our approach is that it uses test-time training to adapt the model to each specific input during testing. We propose a novel meta-auxiliary learning approach that combines meta-learning with a self-supervised Masked Autoencoder (MAE). During testing, we adapt the visual representation parameters using a self-supervised MAE loss; during training, we learn the model parameters within a meta-learning framework so that they can adapt effectively to a new input. Experimental results show that our proposed method significantly outperforms existing state-of-the-art approaches on benchmark datasets.