@InProceedings{Mathur_2023_WACV,
    author    = {Mathur, Puneet and Jain, Rajiv and Mehra, Ashutosh and Gu, Jiuxiang and Dernoncourt, Franck and N., Anandhavelu and Tran, Quan and Kaynig-Fittkau, Verena and Nenkova, Ani and Manocha, Dinesh and Morariu, Vlad I.},
    title     = {LayerDoc: Layer-Wise Extraction of Spatial Hierarchical Structure in Visually-Rich Documents},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2023},
    pages     = {3610-3620}
}
LayerDoc: Layer-Wise Extraction of Spatial Hierarchical Structure in Visually-Rich Documents
Abstract
Digital documents often contain images and scanned text. Parsing such visually-rich documents is a core task for workflow automation, but it remains challenging because most documents do not encode explicit layout information, e.g., how characters and words are grouped into boxes and ordered into larger semantic entities. Current state-of-the-art layout extraction methods struggle on such documents because they rely on word sequences being in the correct reading order and do not exploit the documents' hierarchical structure. We propose LayerDoc, an approach that uses visual features, textual semantics, and spatial coordinates along with constraint inference to extract the hierarchical layout structure of documents in a bottom-up layer-wise fashion. LayerDoc recursively groups smaller regions into larger semantic elements in 2D to infer complex nested hierarchies. Experiments show that our approach outperforms competitive baselines by 10-15% on three diverse datasets of forms and mobile app screen layouts for the tasks of spatial region classification, higher-order group identification, layout hierarchy extraction, reading order detection, and word grouping.
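To make the idea of bottom-up, layer-wise grouping concrete, the following is a minimal toy sketch and not the LayerDoc model. It assumes regions are axis-aligned boxes and merges them purely by spatial gap, whereas the abstract describes a method that also uses visual features, textual semantics, and constraint inference. All names (`Node`, `gap`, `group_layer`, `build_hierarchy`) and the gap thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), hypothetical convention


@dataclass
class Node:
    """A region in the hierarchy: its bounding box plus the regions it contains."""
    box: Box
    children: List["Node"] = field(default_factory=list)


def gap(a: Box, b: Box) -> float:
    """Larger of the horizontal and vertical gaps between two boxes (0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return max(dx, dy)


def union_box(boxes: Sequence[Box]) -> Box:
    """Smallest box enclosing all given boxes."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def group_layer(nodes: List[Node], max_gap: float) -> List[Node]:
    """Build one layer: merge nodes connected by gaps <= max_gap (union-find)."""
    parent = list(range(len(nodes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if gap(nodes[i].box, nodes[j].box) <= max_gap:
                parent[find(i)] = find(j)

    clusters: Dict[int, List[Node]] = {}
    for i, node in enumerate(nodes):
        clusters.setdefault(find(i), []).append(node)

    # Every cluster (including singletons) becomes a parent node in the new layer.
    return [Node(union_box([m.box for m in members]), members)
            for members in clusters.values()]


def build_hierarchy(word_boxes: Sequence[Box], gaps: Sequence[float]) -> List[Node]:
    """Apply group_layer once per threshold, from the smallest regions upward."""
    layer = [Node(b) for b in word_boxes]
    for g in gaps:
        layer = group_layer(layer, g)
    return layer


if __name__ == "__main__":
    words = [(0, 0, 10, 5), (12, 0, 20, 5), (0, 10, 10, 15), (0, 40, 10, 45)]
    for top in build_hierarchy(words, gaps=[3, 10]):
        print(top.box, "with", len(top.children), "child region(s)")
```

In this sketch, the first threshold plays the role of grouping words into lines and the second of grouping lines into blocks; each pass consumes the output of the previous one, which is what yields a nested tree rather than a flat clustering. A learned model would replace the fixed gap test with a decision based on richer features.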