@InProceedings{Singh_2025_WACV,
  author    = {Singh, Iknoor and Colom, Miguel and Bontcheva, Kalina},
  title     = {A Comparative Analysis of OCR Models on Diverse Datasets: Insights from Memes and Hiertext Dataset},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops},
  month     = {February},
  year      = {2025},
  pages     = {1343-1353}
}
A Comparative Analysis of OCR Models on Diverse Datasets: Insights from Memes and Hiertext Dataset
Abstract
Optical Character Recognition (OCR) plays a critical role in various text extraction applications, yet its performance varies significantly across languages, visual appearances, and document types. This study benchmarks several state-of-the-art OCR models on two distinct and challenging datasets: MemeDataset, which contains informal meme-based images, and HierText, featuring multilingual text in natural scenes and documents. We comprehensively evaluate a range of OCR models on these datasets, highlighting their strengths and limitations. While large multimodal models excel in complex text extraction, they are hindered by slower processing times and irrelevant text generation in challenging scenarios. We also examine the models' ability to handle multilingual content, finding that some newer models perform well in certain languages but struggle in others. Our results emphasise the need for further enhancements in open-source large multimodal models to improve efficiency and accuracy, particularly in multilingual scenarios. In contrast, other open-source models such as EasyOCR deliver faster inference times with reasonably competitive performance, making them more suitable for time-sensitive applications. The findings offer valuable insights for researchers and practitioners in selecting the right models for diverse use cases.