Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis

Shangbang Long, Siyang Qin, Yasuhisa Fujii, Alessandro Bissacco, Michalis Raptis; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 903-913

Abstract


We propose Hierarchical Text Spotter (HTS), a novel method for the joint task of word-level text spotting and geometric layout analysis. HTS can recognize text in an image and identify its four-level hierarchical structure: characters, words, lines, and paragraphs. HTS is characterized by two novel components: (1) a Unified-Detector-Polygon (UDP) that produces Bézier curve polygons of text lines and an affinity matrix for grouping the detected lines into paragraphs; (2) a Line-to-Character-to-Word (L2C2W) recognizer that splits lines into characters and then merges them back into words. HTS achieves state-of-the-art results on multiple word-level text spotting benchmark datasets as well as on geometric layout analysis tasks.
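To make the two grouping steps in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that paragraphs can be obtained by thresholding the line-to-line affinity matrix and taking connected components, and that words are formed by joining recognized characters between whitespace characters. The function names, the 0.5 threshold, and the (character, x_start, x_end) tuple layout are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def group_lines_into_paragraphs(
    affinity: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[List[int]]:
    """Group line indices into paragraphs (assumed post-processing).

    Two lines are linked when their affinity is at least `threshold`;
    paragraphs are the connected components of the resulting graph.
    """
    n = len(affinity)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if affinity[i][j] >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as root so output order is stable.
                    parent[max(ri, rj)] = min(ri, rj)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [groups[root] for root in sorted(groups)]


def merge_chars_into_words(
    chars: Sequence[Tuple[str, float, float]]
) -> List[Tuple[str, float, float]]:
    """Merge per-character predictions of one line into words.

    Each input item is (character, x_start, x_end). Whitespace characters
    are treated as word separators (an assumption of this sketch). Each
    output item is (word_text, x_start, x_end) spanning its characters.
    """
    words: List[Tuple[str, float, float]] = []
    text, start, end = "", 0.0, 0.0
    for ch, x0, x1 in chars:
        if ch.isspace():
            if text:
                words.append((text, start, end))
            text = ""
            continue
        if not text:
            start = x0  # first character of a new word
        text += ch
        end = x1
    if text:
        words.append((text, start, end))
    return words
```

For example, with three detected lines where only the first two have high mutual affinity, `group_lines_into_paragraphs` returns two paragraphs, and `merge_chars_into_words` turns the characters of "hi yo" into two words with merged horizontal extents.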

Related Material


@InProceedings{Long_2024_WACV,
    author    = {Long, Shangbang and Qin, Siyang and Fujii, Yasuhisa and Bissacco, Alessandro and Raptis, Michalis},
    title     = {Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {903-913}
}