Semantic Segmentation Using Foundation Models for Cultural Heritage: an Experimental Study on Notre-Dame de Paris

Kévin Réby, Anaïs Guilhelm, Livio De Luca; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 1689-1697

Abstract


The zero-shot performance of foundation models has attracted considerable attention. In particular, the Segment Anything Model (SAM) has gained popularity in computer vision for its label-free segmentation capabilities. Our study proposes applying SAM to cultural heritage data, specifically images of Notre-Dame de Paris, together with a controlled vocabulary. SAM successfully identifies objects within the cathedral's imagery. To improve the segmentation further, we combine Grounding DINO for object detection with CLIP for automatically assigning labels to the segmentation masks generated by SAM. Our study demonstrates the usefulness of foundation models for zero-shot semantic segmentation of cultural heritage data.
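The labeling step described above, attaching a term from a controlled vocabulary to each SAM mask via CLIP, amounts to picking, for each mask's image embedding, the vocabulary term whose text embedding is most similar. A minimal sketch of that nearest-label logic in pure Python, with hand-made toy vectors standing in for real CLIP image and text embeddings (the vocabulary terms and the `label_masks` helper are illustrative, not the authors' code):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def label_masks(mask_embeddings, label_embeddings):
    """For each mask embedding, return the vocabulary label whose
    (CLIP-style) text embedding has the highest cosine similarity."""
    labels = []
    for emb in mask_embeddings:
        best = max(label_embeddings,
                   key=lambda name: cosine(emb, label_embeddings[name]))
        labels.append(best)
    return labels

# Toy demo: stand-in 2-D embeddings instead of real CLIP features.
vocab = {"gargoyle": [1.0, 0.0], "stained glass": [0.0, 1.0]}
masks = [[0.9, 0.1], [0.2, 0.8]]
print(label_masks(masks, vocab))  # ['gargoyle', 'stained glass']
```

In the actual pipeline, the mask embeddings would come from running CLIP's image encoder on each mask crop and the label embeddings from its text encoder over the controlled vocabulary; the argmax-over-cosine-similarity structure is the same.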

Related Material


[bibtex]
@InProceedings{Reby_2023_ICCV,
    author    = {R\'eby, K\'evin and Guilhelm, Ana{\"\i}s and De Luca, Livio},
    title     = {Semantic Segmentation Using Foundation Models for Cultural Heritage: an Experimental Study on Notre-Dame de Paris},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {1689-1697}
}