Hierarchical Semantic Segmentation with Autoregressive Language Modeling

Josh Myers-Dean, Brian Price, Yifei Fan, Danna Gurari; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2025, pp. 4129-4139

Abstract


Hierarchical semantic segmentation entails progressively decomposing objects into smaller nested parts. Existing approaches require either multiple inference passes or multiple fixed decoders. We instead introduce HALLUMI, an autoregressive language modeling framework that performs the task in one inference pass, relying on special tokens to indicate parent-child relationships so the hierarchy can be recovered from the generated text. Experiments on SPIN, a hierarchical semantic segmentation dataset annotated down to the subpart level, show that HALLUMI achieves state-of-the-art results.
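
To illustrate how a hierarchy might be recovered from generated text that marks parent-child relationships with special tokens, the following is a minimal sketch. The token strings `<sub>` and `</sub>`, the function `parse_hierarchy`, the single-word part names, and the example sequence are all assumptions made for illustration. They are not HALLUMI's actual vocabulary or output format, and the sketch covers only the text-to-tree step, not mask prediction.

```python
from typing import Dict, List

# Hypothetical special tokens (an assumption, not the paper's vocabulary):
# OPEN starts the list of children of the most recent name, CLOSE ends it.
OPEN, CLOSE = "<sub>", "</sub>"


def parse_hierarchy(text: str) -> List[Dict]:
    """Turn a flat, whitespace-separated token string into nested nodes.

    Each node is a dict with a "name" and a list of "children".
    """
    roots: List[Dict] = []
    # stack[-1] is always the sibling list currently being filled.
    stack: List[List[Dict]] = [roots]
    for token in text.split():
        if token == OPEN:
            if not stack[-1]:
                raise ValueError("OPEN token has no preceding parent name")
            # Descend into the children of the most recently added node.
            stack.append(stack[-1][-1]["children"])
        elif token == CLOSE:
            if len(stack) == 1:
                raise ValueError("unmatched CLOSE token")
            stack.pop()
        else:
            stack[-1].append({"name": token, "children": []})
    if len(stack) != 1:
        raise ValueError("unclosed OPEN token")
    return roots


# Illustrative sequence: object -> parts -> subparts.
generated = "car <sub> wheel <sub> tire rim </sub> door </sub> person <sub> head </sub>"
tree = parse_hierarchy(generated)
```

Because the nesting is encoded in the token stream itself, a single left-to-right pass with a stack is enough to rebuild the tree, which matches the single-inference-pass idea described in the abstract.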

Related Material


@InProceedings{Myers-Dean_2025_CVPR,
    author    = {Myers-Dean, Josh and Price, Brian and Fan, Yifei and Gurari, Danna},
    title     = {Hierarchical Semantic Segmentation with Autoregressive Language Modeling},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {4129-4139}
}