Synthetically Supervised Feature Learning for Scene Text Recognition

Yang Liu, Zhaowen Wang, Hailin Jin, Ian Wassell ; Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 435-451


We address the problem of image feature learning for scene text recognition. The image features in state-of-the-art methods are learned from large-scale synthetic image datasets. However, most methods rely only on the outputs of the synthetic data generation process, namely realistic-looking images, and completely ignore the rest of the process. We propose to leverage the parameters that lead to the output images to improve image feature learning. Specifically, for every image produced by the data generation process, we obtain the associated parameters and render another "clean" image that is free of selected distortion factors applied to the output image. Because these distortion factors are absent, the clean image tends to be easier to recognize than the original image. We design a multi-task network with an encoder-discriminator-generator architecture to guide the features of the original image toward those of the clean image. The experiments show that our method significantly outperforms state-of-the-art methods on standard scene text recognition benchmarks. Furthermore, we show that, without explicit handling, our method works on challenging cases where input images contain severe geometric distortion, such as text on a curved path.
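
The pairing of each synthetic image with a "clean" counterpart can be illustrated with a minimal sketch. It is not the authors' data generation pipeline: the `RenderParams` fields, the `NEUTRAL` values, and the helpers `clean_params` and `make_pair` are all hypothetical names chosen for illustration, and the rendering step itself is omitted. The sketch only shows how the parameters of a clean image might be derived from those of the original by resetting selected distortion factors while keeping the text content fixed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RenderParams:
    """Hypothetical parameters describing one synthetic text image."""
    text: str
    font: str
    rotation_deg: float = 0.0   # geometric distortion (assumed factor)
    blur_sigma: float = 0.0     # blur strength (assumed factor)
    noise_std: float = 0.0      # additive noise level (assumed factor)
    background: str = "plain"   # background texture (assumed factor)


# Neutral value for each distortion factor that may be stripped.
NEUTRAL = {
    "rotation_deg": 0.0,
    "blur_sigma": 0.0,
    "noise_std": 0.0,
    "background": "plain",
}


def clean_params(p, remove=tuple(NEUTRAL)):
    """Return a copy of p with the selected distortion factors reset."""
    return replace(p, **{name: NEUTRAL[name] for name in remove})


def make_pair(p, remove=tuple(NEUTRAL)):
    """Return parameters for the (original, clean) pair sharing the same text."""
    return p, clean_params(p, remove)


if __name__ == "__main__":
    params = RenderParams("hello", "Arial", rotation_deg=15.0,
                          blur_sigma=1.5, noise_std=0.1, background="street")
    original, clean = make_pair(params, remove=("blur_sigma", "noise_std"))
    print(original)
    print(clean)
```

In this sketch, the `remove` argument selects which factors are stripped, mirroring the abstract's statement that the clean image is free of selected distortion factors rather than all of them. Both parameter sets would then be passed to the same renderer to obtain the training pair.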

Related Material

author = {Liu, Yang and Wang, Zhaowen and Jin, Hailin and Wassell, Ian},
title = {Synthetically Supervised Feature Learning for Scene Text Recognition},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}