Harnessing the Power of Multi-Lingual Datasets for Pre-Training: Towards Enhancing Text Spotting Performance

Alloy Das, Sanket Biswas, Ayan Banerjee, Josep Lladós, Umapada Pal, Saumik Bhattacharya; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 718-728

Abstract


The ability to adapt to a wide range of domains is crucial for scene text spotting models deployed in real-world conditions. However, existing state-of-the-art approaches usually combine scene text detection and recognition simply by pre-training on natural scene image datasets, which does not directly exploit the feature interaction between multiple domains. In this work, we investigate the problem of domain-adapted scene text spotting, i.e., training a model on multi-domain source data such that it can directly adapt to target domains rather than being specialized for a specific domain or scenario. Further, we investigate a transformer baseline called Swin-TESTR that addresses scene text spotting for both regular (ICDAR2015) and arbitrarily shaped scene text (CTW1500, Total-Text), along with an exhaustive evaluation. The results clearly demonstrate the potential of intermediate representations on text spotting benchmarks across multiple domains (e.g., language, synthetic-to-real, and documents), both in terms of accuracy and model efficiency.

Related Material


BibTeX:
@InProceedings{Das_2024_WACV,
  author    = {Das, Alloy and Biswas, Sanket and Banerjee, Ayan and Llad\'os, Josep and Pal, Umapada and Bhattacharya, Saumik},
  title     = {Harnessing the Power of Multi-Lingual Datasets for Pre-Training: Towards Enhancing Text Spotting Performance},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  month     = {January},
  year      = {2024},
  pages     = {718-728}
}