Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis

Bi, Tianci; Zhang, Xiaoyi; Zhang, Zhizheng; Xie, Wenxuan; Lan, Cuiling; Lu, Yan; Zheng, Nanning

Tianci Bi, Xiaoyi Zhang, Zhizheng Zhang, Wenxuan Xie, Cuiling Lan, Yan Lu, Nanning Zheng; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 28150-28159

Abstract

Significant progress has been made in scene text detection models since the rise of deep learning, but scene text layout analysis, which aims to group detected text instances into paragraphs, has not kept pace. Previous works either treat text detection and grouping with separate models or train a unified model from scratch. None of them makes full use of already well-trained text detectors and easily obtainable detection datasets. In this paper, we present Text Grouping Adapter (TGA), a module that enables various pre-trained text detectors to learn layout analysis, allowing us to adopt a well-trained text detector right off the shelf or fine-tune it efficiently. Designed to be compatible with various text detector architectures, TGA takes detected text regions and image features as universal inputs to assemble text instance features. To capture broader contextual information for layout analysis, we propose to predict text group masks from text instance features by one-to-many assignment. Our comprehensive experiments demonstrate that, even with frozen pre-trained models, incorporating TGA into various pre-trained text detectors and text spotters achieves superior layout analysis performance while inheriting the generalized text detection ability from pre-training. With full-parameter fine-tuning, layout analysis performance improves further.
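The two ideas named in the abstract, assembling text instance features from detected regions plus image features and pairing several instances with one group mask, can be illustrated with a toy sketch. This is not the paper's implementation: the masked average pooling, the union-of-masks target, and every name below (`pool_instance_feature`, `group_mask_targets`, the tiny 2x3 feature map) are assumptions chosen only to make the data flow concrete.

```python
from typing import Dict, List

Mask = List[List[int]]                 # H x W binary mask of one text region
FeatureMap = List[List[List[float]]]   # H x W x C image features


def pool_instance_feature(features: FeatureMap, mask: Mask) -> List[float]:
    """Average the feature vectors at the pixels covered by one text region.

    Assumption: masked average pooling stands in for however the real
    module assembles instance features.
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for y, row in enumerate(mask):
        for x, inside in enumerate(row):
            if inside:
                count += 1
                for c in range(channels):
                    total[c] += features[y][x][c]
    if count == 0:
        return total  # empty region: all zeros
    return [value / count for value in total]


def group_mask_targets(instance_masks: List[Mask], group_ids: List[int]) -> List[Mask]:
    """Give each instance the union of all masks in its group as its target.

    Assumption: instances of the same paragraph share one target mask,
    which is how "one-to-many" is read in this sketch.
    """
    height, width = len(instance_masks[0]), len(instance_masks[0][0])
    unions: Dict[int, Mask] = {}
    for mask, gid in zip(instance_masks, group_ids):
        union = unions.setdefault(gid, [[0] * width for _ in range(height)])
        for y in range(height):
            for x in range(width):
                union[y][x] |= mask[y][x]
    return [unions[gid] for gid in group_ids]


# Hypothetical 2x3 image with a single feature channel.
features = [[[1.0], [2.0], [3.0]],
            [[4.0], [5.0], [6.0]]]
masks = [
    [[1, 0, 0], [1, 0, 0]],  # instance 0, paragraph 0
    [[0, 1, 0], [0, 1, 0]],  # instance 1, paragraph 0
    [[0, 0, 1], [0, 0, 0]],  # instance 2, paragraph 1
]
group_ids = [0, 0, 1]

instance_features = [pool_instance_feature(features, m) for m in masks]
targets = group_mask_targets(masks, group_ids)
```

In this toy setup, instances 0 and 1 both receive the mask covering their shared paragraph as a target, so each instance feature is asked to describe context beyond its own region.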

Related Material

@InProceedings{Bi_2024_CVPR,
    author    = {Bi, Tianci and Zhang, Xiaoyi and Zhang, Zhizheng and Xie, Wenxuan and Lan, Cuiling and Lu, Yan and Zheng, Nanning},
    title     = {Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {28150-28159}
}