- [pdf] [supp]
CMT-Co: Contrastive Learning with Character Movement Task for Handwritten Text Recognition
Mainstream handwritten text recognition (HTR) approaches require large-scale labeled data to achieve satisfactory performance. Recently, contrastive learning has been introduced to perform self-supervised training on unlabeled data to improve representational capacity. It minimizes the distance between positive pairs while maximizing their distance to negative ones. Previous studies typically treat each frame, or a fixed window of frames, in a sequential feature map as a separate instance for contrastive learning. However, owing to the arbitrariness of handwriting and the diversity of word lengths, such an instance may contain information from multiple consecutive characters or from an over-segmented sub-character, which can confuse the model's perception of semantic cues. To address this issue, we design a character-level pretext task, termed the Character Movement Task, to assist word-level contrastive learning, yielding CMT-Co. It moves characters within a word to generate artifacts and guides the model to perceive the text content by using the moving direction and distance as supervision. In addition, we customize a data augmentation strategy specifically for handwritten text, which significantly aids the construction of training pairs for contrastive learning. Experiments show that the proposed CMT-Co achieves competitive or even superior performance compared to previous methods on public handwritten benchmarks.
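The two ingredients the abstract describes can be illustrated with a minimal sketch: an InfoNCE-style contrastive loss over paired view embeddings, and a character-movement label generator that shifts one character's pixel columns and records the moving direction and distance as supervision. The function names (`info_nce`, `move_character`) and all parameters here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    # z1, z2: (N, D) embeddings of two augmented views of the same words.
    # Diagonal entries of the similarity matrix are the positive pairs;
    # all other entries in a row act as negatives.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                      # (N, N) similarity logits
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(logsumexp - np.diag(sim)))

def move_character(word_img, start, width, shift):
    # Shift one character's pixel columns inside a word image and
    # return (modified image, direction label, distance label).
    out = word_img.copy()
    char = word_img[:, start:start + width].copy()
    out[:, start:start + width] = 0
    new_start = start + shift
    out[:, new_start:new_start + width] = char
    direction = 1 if shift > 0 else 0          # 1 = moved right, 0 = left
    return out, direction, abs(shift)
```

In a full pipeline, the direction would be predicted with a classification head and the distance with a regression or classification head, alongside the word-level contrastive objective.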