Query by Strings and Return Ranking Word Regions with Only One Look

Peng Zhao, Wenyuan Xue, Qingyong Li, Siqi Cai; Proceedings of the Asian Conference on Computer Vision (ACCV), 2020


Word spotting helps people such as archaeologists, historians, and internet censors retrieve regions of interest from document images according to queries they define. However, words in handwritten historical document images are generally densely distributed and have many overlapping strokes, which makes it challenging to apply word spotting in such scenarios. Recently, deep learning based methods have achieved significant performance improvements; they usually adopt two-stage object detectors to produce word segmentation results and then embed the cropped word regions into a word embedding space. Unlike these multi-stage methods, this paper presents an effective end-to-end trainable method for segmentation-free query-by-string word spotting. To the best of our knowledge, this is the first work that uses a single network to simultaneously predict word bounding boxes and word embeddings in only one stage by adopting feature sharing and a multi-task learning strategy. Experiments on several benchmarks demonstrate that the proposed method surpasses the previous state-of-the-art segmentation-free methods.
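
The retrieval step implied by query-by-string spotting can be illustrated with a minimal sketch: each predicted word region carries a bounding box and an embedding, the query string is mapped into the same space, and regions are returned in order of similarity. Everything below is an assumption made for illustration. The character-histogram embedding, the cosine similarity, and the names `char_histogram`, `cosine`, and `rank_regions` are hypothetical stand-ins and are not taken from the paper, and the region embeddings are simulated from transcriptions rather than predicted by a network.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def char_histogram(word):
    """Toy string embedding: L2-normalised character counts.

    Assumption for illustration only; the paper's embedding is not
    specified in the abstract.
    """
    counts = [0.0] * len(ALPHABET)
    for ch in word.lower():
        idx = ALPHABET.find(ch)
        if idx >= 0:
            counts[idx] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_regions(query, regions):
    """Rank (box, embedding) pairs by similarity to the query string."""
    q = char_histogram(query)
    scored = [(cosine(q, emb), box) for box, emb in regions]
    return sorted(scored, key=lambda s: s[0], reverse=True)


# Hypothetical one-stage outputs: boxes are (x1, y1, x2, y2); the
# embeddings are simulated here by embedding each region's transcription.
regions = [
    ((10, 20, 90, 45), char_histogram("letter")),
    ((100, 20, 170, 45), char_histogram("orders")),
    ((15, 60, 80, 85), char_histogram("later")),
]

ranking = rank_regions("letter", regions)
for score, box in ranking:
    print(f"{score:.3f} {box}")
```

In this toy setup the region transcribed as "letter" ranks first for the query "letter", followed by "later" and then "orders". In a one-stage design as described above, the boxes and embeddings would instead come from a single forward pass over the page image.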

Related Material

@InProceedings{Zhao_2020_ACCV,
    author    = {Zhao, Peng and Xue, Wenyuan and Li, Qingyong and Cai, Siqi},
    title     = {Query by Strings and Return Ranking Word Regions with Only One Look},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {November},
    year      = {2020}
}