Graph Neural Networks for End-to-End Information Extraction From Handwritten Documents

Yessine Khanfir, Marwa Dhiaf, Emna Ghodhbani, Ahmed Cheikh Rouhou, Yousri Kessentini; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 504-512

Abstract


Automating Information Extraction (IE) from handwritten documents is a challenging task due to the wide variety of handwriting styles, the presence of noise, and the lack of labeled data. In this work, we propose an end-to-end encoder-decoder model that incorporates transformers and Graph Convolutional Networks (GCN) to jointly perform Handwritten Text Recognition (HTR) and Named Entity Recognition (NER). The proposed architecture is composed of two main parts: a Sparse Graph Transformer Encoder (SGTE), which captures efficient representations of input text images while controlling the propagation of information through the model, and a transformer decoder enhanced with a GCN that combines the outputs of the last SGTE layer and the Multi-Head Attention (MHA) block to reinforce the alignment of visual features with characters and Named Entity (NE) tags, resulting in more robust learned representations. The proposed model shows promising results, achieving state-of-the-art performance on the IAM dataset and in the ICDAR 2017 Information Extraction competition on the Esposalles database.
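The abstract does not specify how the decoder's GCN combines the last SGTE-layer output with the MHA output. As one plausible reading, a minimal NumPy sketch of a single GCN fusion step is given below, assuming the two feature streams are concatenated per decoding position and propagated over a simple chain adjacency; the function name `gcn_fuse`, the chain graph, and the concatenation scheme are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def gcn_fuse(encoder_out, mha_out, adj, weight):
    """Hypothetical sketch: fuse encoder and MHA features with one GCN layer.

    encoder_out, mha_out: (T, d) feature matrices for T positions.
    adj: (T, T) adjacency over positions (assumed here, e.g. a chain).
    weight: (2d, d_out) learned projection (random in this sketch).
    """
    # Symmetric normalization with self-loops: A_hat = D^-1/2 (A + I) D^-1/2.
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    a_hat = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Concatenate the two feature streams, then propagate and project.
    h = np.concatenate([encoder_out, mha_out], axis=1)  # (T, 2d)
    return np.maximum(a_hat @ h @ weight, 0.0)          # ReLU(A_hat H W)

# Toy usage on random features over a 5-position chain graph.
rng = np.random.default_rng(0)
T, d = 5, 8
chain = np.eye(T, k=1) + np.eye(T, k=-1)
fused = gcn_fuse(rng.normal(size=(T, d)), rng.normal(size=(T, d)),
                 chain, rng.normal(size=(2 * d, d)))
```

In this reading, the GCN lets each decoding position aggregate the concatenated visual and attention features of its neighbors before character/NE-tag prediction, which is one way the alignment between the two streams could be reinforced.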

Related Material


[bibtex]
@InProceedings{Khanfir_2024_WACV,
  author    = {Khanfir, Yessine and Dhiaf, Marwa and Ghodhbani, Emna and Rouhou, Ahmed Cheikh and Kessentini, Yousri},
  title     = {Graph Neural Networks for End-to-End Information Extraction From Handwritten Documents},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  month     = {January},
  year      = {2024},
  pages     = {504-512}
}