Zero-Shot Table Extraction in Business Documents: A Unified Benchmark with Error Taxonomy and Ecological Analysis

Eliott Thomas, Mickael Coustaty, Aurélie Joseph, Tri-Cong Pham, Gaspar Deloin, Elodie Carel, Vincent Poulain D'andecy, Jean-Marc Ogier; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026, pp. 4974-4983

Abstract


Tables in business documents power analytics and compliance, yet task-specific datasets are costly to build. Practitioners therefore turn to zero-shot vision-language models (VLMs). We study zero-shot realism for table detection (TD) and table structure recognition (TSR) under a unified protocol on DocILE-QUEST and a private STM154 corpus. We report TD with GIoU, Purity, and Completeness, and TSR with TEDS and TEDS-S, evaluating commercial VLMs (GPT-4o, GPT-5-mini), compact detectors, and supervised YOLO/DETR baselines. Zero-shot VLMs are strong for TSR and competitive for TD, while fine-tuned or from-scratch detectors lead when box quality and robustness to clutter matter. We add an automated error taxonomy that isolates actionable failures (missed tables, merged or split tables, header-body confusions, and cell-topology errors). Finally, we quantify emissions, finding a gap of four orders of magnitude (10^4) between the lightest and heaviest systems.
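For readers unfamiliar with the TD box metric named above, the following is a minimal sketch of Generalized IoU (GIoU) for two axis-aligned boxes, using the standard definition: IoU minus the fraction of the smallest enclosing box not covered by the union. The function names, the `(x_min, y_min, x_max, y_max)` box convention, and the handling of degenerate boxes are assumptions made for illustration; this is not the paper's evaluation code, and the paper's matching protocol between predicted and ground-truth tables is not shown here.

```python
from typing import Tuple

# Assumed box convention (not taken from the paper): (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; zero if the box is empty or inverted."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def giou(a: Box, b: Box) -> float:
    """Generalized IoU in [-1, 1] for two axis-aligned boxes."""
    # Intersection rectangle (clamped to zero when the boxes do not overlap)
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, inter_w) * max(0.0, inter_h)

    union = box_area(a) + box_area(b) - inter

    # Smallest axis-aligned box enclosing both inputs
    hull = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
    hull_area = box_area(hull)

    # Illustrative choice: return 0.0 for degenerate (zero-area) inputs
    if union <= 0.0 or hull_area <= 0.0:
        return 0.0

    iou = inter / union
    return iou - (hull_area - union) / hull_area


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(giou(predicted, ground_truth))
```

Unlike plain IoU, GIoU stays informative when the boxes do not overlap at all: the score becomes negative and decreases as the boxes move further apart, which makes it possible to distinguish near misses from distant ones.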

Related Material


@InProceedings{Thomas_2026_WACV,
    author    = {Thomas, Eliott and Coustaty, Mickael and Joseph, Aur\'elie and Pham, Tri-Cong and Deloin, Gaspar and Carel, Elodie and D'andecy, Vincent Poulain and Ogier, Jean-Marc},
    title     = {Zero-Shot Table Extraction in Business Documents: A Unified Benchmark with Error Taxonomy and Ecological Analysis},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {March},
    year      = {2026},
    pages     = {4974-4983}
}