TPD-STR: Text Polygon Detection with Split Transformers

Sangyeon Kim, Sangkuk Lee, Jeesoo Kim, Nojun Kwak; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 8940-8949

Abstract


Regressing text in natural scenes with polygonal representations is challenging because polygon shapes are difficult to predict. To address this, we introduce Text Polygon Detection with Split Transformers (TPD-STR), which directly regresses polygonal points. TPD-STR incorporates the Decoder Split (DS) architecture, which separates polygonal point regression from textness classification, and the Positional Information Propagation (PIP) module, which enhances classification. Both modules are effective and compatible with existing methods. TPD-STR achieves state-of-the-art (SOTA) performance among regression-based methods, surpassing segmentation-based methods on MSRA-TD500 without external data. Adding DS and PIP to existing models further improves their performance. Experiments demonstrate the model's ability to detect text instances effectively.
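
The abstract does not specify the internals of DS or PIP. As a rough illustration only, the NumPy sketch below assumes that a decoder split means giving polygon regression and textness classification their own decoder stacks, and that positional information propagation means adding a positional encoding of the predicted polygon points to the classification branch's input. Every name (split_decoder, sine_encode, N_POINTS), every size, and the use of residual MLP blocks in place of transformer decoder layers are hypothetical choices made for this sketch, not the authors' design.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 32          # hypothetical hidden size
Q = 5           # hypothetical number of text queries
N_POINTS = 8    # hypothetical number of polygon vertices per text instance


def mlp_block(x, w1, w2):
    """Residual feed-forward block standing in for one decoder layer."""
    return x + np.maximum(x @ w1, 0.0) @ w2


def init_stack(num_layers):
    """Independent weights per branch: this is the 'split' being illustrated."""
    return [(rng.normal(0, 0.1, (D, D)), rng.normal(0, 0.1, (D, D)))
            for _ in range(num_layers)]


def run_stack(x, stack):
    for w1, w2 in stack:
        x = mlp_block(x, w1, w2)
    return x


def sine_encode(points, dim=D):
    """Sinusoidal encoding of normalized (x, y) points, averaged over vertices."""
    # points: (Q, N_POINTS, 2) with coordinates in [0, 1]
    freqs = 2.0 ** np.arange(dim // 4)                    # dim/4 frequencies
    angles = points[..., None] * freqs * np.pi            # (Q, N, 2, dim/4)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (Q, N, 2, dim/2)
    enc = enc.reshape(points.shape[0], points.shape[1], -1)           # (Q, N, dim)
    return enc.mean(axis=1)                               # (Q, dim)


reg_stack = init_stack(2)   # regression-only decoder stack
cls_stack = init_stack(2)   # classification-only decoder stack
w_reg = rng.normal(0, 0.1, (D, N_POINTS * 2))
w_pos = rng.normal(0, 0.1, (D, D))
w_cls = rng.normal(0, 0.1, (D, 1))


def split_decoder(queries):
    # Regression branch: predicts polygon vertices only.
    reg_feat = run_stack(queries, reg_stack)
    points = 1.0 / (1.0 + np.exp(-(reg_feat @ w_reg)))   # sigmoid -> (0, 1)
    points = points.reshape(-1, N_POINTS, 2)

    # Toy stand-in for positional propagation: encode the predicted polygons
    # and add them to the queries before the classification branch runs.
    cls_in = queries + sine_encode(points) @ w_pos
    cls_feat = run_stack(cls_in, cls_stack)
    textness = (cls_feat @ w_cls)[:, 0]                   # one logit per query
    return points, textness


queries = rng.normal(0, 1, (Q, D))
points, textness = split_decoder(queries)
print(points.shape, textness.shape)  # (5, 8, 2) (5,)
```

In this toy setup, the regression and classification branches share no decoder weights, so each can specialize; the only coupling is the encoded polygon geometry passed from the first branch to the second.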

Related Material

@InProceedings{Kim_2025_WACV,
    author    = {Kim, Sangyeon and Lee, Sangkuk and Kim, Jeesoo and Kwak, Nojun},
    title     = {TPD-STR: Text Polygon Detection with Split Transformers},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {8940-8949}
}