KhmerST: A Low-Resource Khmer Scene Text Detection and Recognition Benchmark

Vannkinh Nom, Souhail Bakkali, Muhammad Muzzamil Luqman, Mickaël Coustaty, Jean-Marc Ogier; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 1777-1792

Abstract


Developing effective scene text detection and recognition models hinges on extensive training data, which can be both laborious and costly to obtain, especially for low-resourced languages. Conventional methods tailored for Latin characters often falter with non-Latin scripts due to challenges like character stacking, diacritics, and variable character widths without clear word boundaries. In this paper, we introduce the first Khmer scene-text dataset, featuring 1,544 expert-annotated images, including 997 indoor and 547 outdoor scenes. This diverse dataset includes flat text, raised text, poorly illuminated text, distant text, and partially obscured text. Annotations provide line-level text and polygonal bounding box coordinates for each scene. The benchmark includes baseline models for scene-text detection and recognition tasks, providing a starting point for future research. The KhmerST dataset is publicly accessible.
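As an illustration of what a line-level annotation with a polygonal bounding box might look like in code, the following is a minimal sketch. The record layout, the `LineAnnotation` class, its field names, and the helper functions are assumptions made for this example; they do not describe the actual KhmerST file format, which should be checked against the released dataset.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class LineAnnotation:
    """Hypothetical record: one text line and its polygon (not the official schema)."""
    text: str               # line-level transcription
    polygon: List[Point]    # polygon vertices in pixel coordinates, in order


def polygon_area(polygon: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    n = len(polygon)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to the first vertex
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def enclosing_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Axis-aligned box (x_min, y_min, x_max, y_max) that encloses the polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


# Made-up example values, for illustration only.
example = LineAnnotation(
    text="សួស្តី",
    polygon=[(10, 20), (110, 20), (110, 60), (10, 60)],
)
area = polygon_area(example.polygon)
box = enclosing_box(example.polygon)
```

Keeping the polygon as an ordered vertex list preserves the shape of slanted or curved text lines, while the enclosing box is a lossy reduction that some detectors expect as input.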

Related Material


[bibtex]
@InProceedings{Nom_2024_ACCV,
    author    = {Nom, Vannkinh and Bakkali, Souhail and Luqman, Muhammad Muzzamil and Coustaty, Micka\"el and Ogier, Jean-Marc},
    title     = {KhmerST: A Low-Resource Khmer Scene Text Detection and Recognition Benchmark},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2024},
    pages     = {1777-1792}
}