Learning Beyond Labels: Self-Supervised Handwritten Text Recognition

Shree Mitra, Ajoy Mondal, C.V. Jawahar; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026, pp. 6653-6663

Abstract


This paper addresses a key challenge in Handwritten Text Recognition (HTR): the dependence on large volumes of labeled data. To overcome it, we propose LoGo-HTR, a self-supervised learning (SSL) framework that minimizes labeling requirements while achieving strong recognition performance. We introduce SSL-HWD, a large-scale dataset of 10 million word-level handwritten images drawn from diverse scanned documents, partitioned into a small labeled subset and a much larger unlabeled subset. LoGo-HTR combines a local contrastive loss, which enforces spatial consistency, with a global decorrelation loss, which enhances feature diversity; this dual objective enables robust, invariant, and spatially discriminative feature learning. After self-supervised pretraining, we fine-tune a transformer-based decoder on limited labeled data. Extensive experiments on standard HTR benchmarks, including multilingual and historical collections, demonstrate that, after SSL pretraining on our unlabeled dataset, our method consistently outperforms state-of-the-art approaches, even when fine-tuned on only 80% or even just 20% of each benchmark's labeled training data. Ablation studies confirm the effectiveness of the dual-loss design and demonstrate the potential of scalable, label-efficient handwritten text recognition. The SSL-HWD dataset and the LoGo-HTR model, with code, are publicly available at https://logo-ssl.github.io/.
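The abstract does not give the exact formulas for the two loss terms, but their stated roles suggest a familiar shape: an InfoNCE-style contrastive term over spatially aligned patch features (local consistency) plus a Barlow-Twins-style penalty on off-diagonal feature correlations (global decorrelation). The sketch below is a minimal NumPy illustration under those assumptions; the function names, the temperature, and the weighting factor `lam` are hypothetical, not taken from the paper.

```python
import numpy as np

def local_contrastive_loss(patches_a, patches_b, temperature=0.1):
    # InfoNCE over aligned patch features from two augmented views:
    # patch i of view A should match patch i (same spatial position) of view B.
    a = patches_a / np.linalg.norm(patches_a, axis=1, keepdims=True)
    b = patches_b / np.linalg.norm(patches_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                 # (P, P) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # positives on the diagonal

def global_decorrelation_loss(features):
    # Penalize off-diagonal entries of the feature correlation matrix,
    # pushing embedding dimensions toward being decorrelated (i.e. diverse).
    z = features - features.mean(axis=0)
    z = z / (z.std(axis=0) + 1e-8)
    corr = (z.T @ z) / len(z)                      # (D, D) correlation matrix
    off_diag = corr - np.diag(np.diag(corr))
    return np.sum(off_diag ** 2)

def dual_pretrain_loss(patches_a, patches_b, features, lam=0.5):
    # Hypothetical combination of the two terms; `lam` balances them.
    return (local_contrastive_loss(patches_a, patches_b)
            + lam * global_decorrelation_loss(features))
```

With this formulation, identical views drive the contrastive term toward zero, and perfectly duplicated embedding dimensions are maximally penalized by the decorrelation term, matching the stated goals of spatial consistency and feature diversity.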

Related Material


@InProceedings{Mitra_2026_WACV,
    author    = {Mitra, Shree and Mondal, Ajoy and Jawahar, C.V.},
    title     = {Learning Beyond Labels: Self-Supervised Handwritten Text Recognition},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {March},
    year      = {2026},
    pages     = {6653-6663}
}