Towards End-To-End Text Spotting With Convolutional Recurrent Neural Networks

Hui Li, Peng Wang, Chunhua Shen; The IEEE International Conference on Computer Vision (ICCV), 2017, pp. 5238-5246

Abstract


In this work, we jointly address text detection and recognition in natural scene images using convolutional recurrent neural networks. We propose a unified network that simultaneously localizes and recognizes text in a single forward pass, avoiding intermediate steps such as image cropping, feature re-computation, word separation, and character grouping. Existing approaches treat text detection and recognition as two distinct tasks and tackle them one after the other; in contrast, the proposed framework handles both tasks concurrently. The whole framework can be trained end-to-end, requiring only images, ground-truth bounding boxes and text labels. The convolutional features are computed only once and shared by detection and recognition, which saves processing time. Through multi-task training, the learned features become more informative and improve overall performance. Our method achieves competitive performance on several benchmark datasets.
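The central design choice above, computing convolutional features once and reusing them for both a detection loss and a recognition loss, can be illustrated with a toy sketch. Everything in it is hypothetical: `SharedBackbone`, `detection_loss`, `recognition_loss` and the weighting factor `lam` are illustrative stand-ins, not the authors' architecture or training objective, which this abstract does not specify. The sketch only shows the structure of a multi-task objective, L = L_det + lam * L_rec, in which one forward pass feeds both terms.

```python
import math
from typing import List, Sequence, Tuple


class SharedBackbone:
    """Toy stand-in for a CNN feature extractor (hypothetical); counts how often it runs."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, image: Sequence[float]) -> List[float]:
        self.calls += 1
        # Hypothetical "features": sums of neighbouring pixel values.
        return [a + b for a, b in zip(image, image[1:])]


def detection_loss(features: List[float], gt_box: Tuple[float, float]) -> float:
    """Toy box regression (hypothetical): read a 2-number box off the features, squared error."""
    pred = (features[0], features[-1])
    return sum((p - g) ** 2 for p, g in zip(pred, gt_box))


def recognition_loss(features: List[float], label: str, alphabet: str) -> float:
    """Toy per-step cross-entropy (hypothetical): one softmax over the alphabet per label character."""
    if len(label) > len(features):
        raise ValueError("label is longer than the feature sequence")
    total = 0.0
    for t, ch in enumerate(label):
        # Hypothetical logits for step t: feature value scaled by class index.
        logits = [features[t] * k for k in range(len(alphabet))]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))  # stable log-sum-exp
        total += log_z - logits[alphabet.index(ch)]  # equals -log softmax(ch)
    return total


def end_to_end_loss(image: Sequence[float], gt_box: Tuple[float, float], label: str,
                    alphabet: str, backbone: SharedBackbone, lam: float = 1.0) -> float:
    """Multi-task objective: both loss terms reuse a single forward pass of the backbone."""
    features = backbone(image)  # computed once, shared by detection and recognition
    return detection_loss(features, gt_box) + lam * recognition_loss(features, label, alphabet)


backbone = SharedBackbone()
loss = end_to_end_loss([0.1, 0.4, 0.3, 0.2], (0.0, 1.0), "ab", "ab", backbone)
print(round(loss, 4), backbone.calls)
```

Because `backbone.calls` increases by exactly one per image, the shared computation is visible directly. A two-stage pipeline that cropped detected regions and ran a separate feature extractor for recognition would pay for feature extraction twice, and its recognition loss would not shape the detector's features.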

Related Material


BibTeX:
@InProceedings{Li_2017_ICCV,
author = {Li, Hui and Wang, Peng and Shen, Chunhua},
title = {Towards End-To-End Text Spotting With Convolutional Recurrent Neural Networks},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2017},
pages = {5238-5246}
}