Recursive Recurrent Nets With Attention Modeling for OCR in the Wild

Lee, Chen-Yu; Osindero, Simon

Chen-Yu Lee, Simon Osindero; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2231-2239

Abstract

We present recursive recurrent neural networks with attention modeling (R2AM) for lexicon-free optical character recognition in natural scene images. The primary advantages of the proposed method are: (1) recursive convolutional neural networks (CNNs), which allow for parametrically efficient and effective image feature extraction; (2) an implicitly learned character-level language model, embodied in a recurrent neural network, which avoids the need for N-grams; and (3) a soft-attention mechanism, which allows the model to selectively exploit image features in a coordinated way and permits end-to-end training within a standard backpropagation framework. We validate our method with state-of-the-art performance on challenging benchmark datasets: Street View Text, IIIT5k, ICDAR and Synth90k.
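
The soft-attention idea in point (3) can be illustrated with a minimal sketch. It is not the authors' implementation: the dot-product scoring, the function names `softmax` and `soft_attention`, and the toy feature vectors are all assumptions made for illustration. The sketch only shows why soft attention fits standard backpropagation: the context vector is a smooth weighted average of the image features, with no hard selection step.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, query):
    """Return (weights, context) for one decoding step.

    features: list of T feature vectors, each a list of D floats
    query:    list of D floats, standing in for the decoder state
    """
    # Assumption: dot-product scoring. The paper may score features differently.
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    # The context is a convex combination of the feature vectors, so it is
    # differentiable with respect to both the features and the query.
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context


# Toy data (hypothetical): three image feature vectors of dimension two.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = soft_attention(features, query)
```

In this toy case the first and third feature vectors score equally against the query and receive equal weight, while the second receives much less; the weights always sum to one.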

Related Material

@InProceedings{Lee_2016_CVPR,
author = {Lee, Chen-Yu and Osindero, Simon},
title = {Recursive Recurrent Nets With Attention Modeling for OCR in the Wild},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2016},
pages = {2231-2239}
}