Pix2Poly: A Sequence Prediction Method for End-to-End Polygonal Building Footprint Extraction from Remote Sensing Imagery

Yeshwanth Kumar Adimoolam, Charalambos Poullis, Melinos Averkiou; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 8473-8482

Abstract


Extraction of building footprint polygons from remotely sensed data is essential for several urban understanding tasks, such as reconstruction, navigation, and mapping. Despite significant progress in the area, extracting accurate polygonal vector building footprints remains an open problem. In this paper, we introduce Pix2Poly, an attention-based, end-to-end trainable and differentiable deep neural network capable of directly generating explicit, high-quality building footprints in a ring graph format. Pix2Poly employs a generative encoder-decoder transformer to produce a sequence of graph vertex tokens, whose connectivity information is learned by an optimal matching network. Compared to previous graph learning methods, ours is a truly end-to-end trainable approach that extracts high-quality building footprints and road networks without requiring complicated, computationally intensive raster loss functions or intricate training pipelines. Evaluated on several complex and challenging datasets, Pix2Poly outperforms state-of-the-art methods in several vector shape quality metrics while being an entirely explicit method. Our code is available at https://github.com/yeshwanth95/Pix2Poly.
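To make the "vertex tokens plus learned connectivity" idea concrete, here is a minimal, hypothetical sketch in plain Python. It is not the Pix2Poly implementation. It assumes (1) vertices are tokenized by quantizing each x and y coordinate into an integer bin, (2) connectivity is expressed as a square score matrix whose entry (i, j) scores "vertex j follows vertex i", normalized with Sinkhorn iterations (a common choice in optimal matching networks, assumed here), and (3) a hard permutation is read out with brute-force assignment, swapped in for a proper Hungarian solver because it only has to handle toy sizes. All function names are illustrative. In the ring graph reading used here, each cycle of the resulting successor permutation is one closed polygon, and a vertex mapped to itself belongs to no polygon.

```python
import math
from itertools import permutations


def quantize(coord, image_size, num_bins):
    """Map a continuous pixel coordinate to an integer bin (assumed tokenization)."""
    b = int(coord * num_bins / image_size)
    return min(max(b, 0), num_bins - 1)  # clamp to the valid token range


def vertices_to_tokens(vertices, image_size, num_bins):
    """Flatten (x, y) vertices into a token sequence [x0, y0, x1, y1, ...]."""
    tokens = []
    for x, y in vertices:
        tokens.append(quantize(x, image_size, num_bins))
        tokens.append(quantize(y, image_size, num_bins))
    return tokens


def sinkhorn(scores, iters=20):
    """Alternately normalize rows and columns of exp(scores) toward a doubly stochastic matrix."""
    n = len(scores)
    m = [[math.exp(s) for s in row] for row in scores]
    for _ in range(iters):
        row_sums = [sum(r) for r in m]
        m = [[v / row_sums[i] for v in m[i]] for i in range(n)]
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def best_permutation(prob):
    """Exact assignment by brute force (O(n!); only for toy sizes)."""
    n = len(prob)
    return max(
        permutations(range(n)),
        key=lambda p: sum(math.log(prob[i][p[i]]) for i in range(n)),
    )


def permutation_to_rings(succ):
    """Split a successor permutation into closed rings; fixed points are dropped."""
    seen, rings = set(), []
    for start in range(len(succ)):
        if start in seen or succ[start] == start:
            seen.add(start)
            continue
        ring, v = [], start
        while v not in seen:  # follow successors until the cycle closes
            seen.add(v)
            ring.append(v)
            v = succ[v]
        rings.append(ring)
    return rings


if __name__ == "__main__":
    verts = [(10.0, 10.0), (100.0, 10.0), (100.0, 80.0), (50.0, 50.0)]
    print(vertices_to_tokens(verts, image_size=224, num_bins=224))
    # Toy scores favouring 0 -> 1 -> 2 -> 0, with vertex 3 left unconnected.
    scores = [[0, 5, 0, 0], [0, 0, 5, 0], [5, 0, 0, 0], [0, 0, 0, 5]]
    succ = best_permutation(sinkhorn(scores))
    print(succ, permutation_to_rings(succ))
```

Under these assumptions, the toy scores yield the successor permutation `(1, 2, 0, 3)`, which decodes to a single triangle `[0, 1, 2]`, while vertex 3 maps to itself and is discarded. Encoding connectivity as a permutation keeps the polygon output explicit: closed rings fall out of the cycle structure rather than from post-processing a raster mask.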

Related Material


@InProceedings{Adimoolam_2025_WACV,
    author    = {Adimoolam, Yeshwanth Kumar and Poullis, Charalambos and Averkiou, Melinos},
    title     = {Pix2Poly: A Sequence Prediction Method for End-to-End Polygonal Building Footprint Extraction from Remote Sensing Imagery},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {8473-8482}
}