Dictionary-Guided Scene Text Recognition

Nguyen Nguyen, Thu Nguyen, Vinh Tran, Minh-Triet Tran, Thanh Duc Ngo, Thien Huu Nguyen, Minh Hoai; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 7383-7392


Language prior plays an important role in the way humans perceive and recognize text in the wild. In this work, we present an approach to train and use scene text recognition models by exploiting multiple clues from a language reference. Current scene text recognition methods have used lexicons to improve recognition performance, but their naive approach of simply casting the output into a dictionary word based purely on the edit distance has many limitations. We introduce here a novel approach to incorporate a dictionary in both the training and inference stage of a scene text recognition system. We use the dictionary to generate a list of possible outcomes and find the one that is most compatible with the visual appearance of the text. The proposed method leads to a robust scene text recognition model, which is better at handling ambiguous cases encountered in the wild, and improves the overall performance of a state-of-the-art scene text spotting framework. Our work suggests that incorporating language prior is a potential approach to advance scene text detection and recognition methods. Besides, we contribute a challenging scene text dataset for Vietnamese, where some characters are equivocal in the visual form due to accent symbols. This dataset will serve as a challenging benchmark for measuring the applicability and robustness of scene text detection and recognition algorithms.

Related Material

[pdf] [supp]
@InProceedings{Nguyen_2021_CVPR, author = {Nguyen, Nguyen and Nguyen, Thu and Tran, Vinh and Tran, Minh-Triet and Ngo, Thanh Duc and Nguyen, Thien Huu and Hoai, Minh}, title = {Dictionary-Guided Scene Text Recognition}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2021}, pages = {7383-7392} }