Visual Narratives: Large-Scale Hierarchical Classification of Art-Historical Images

Springstein, Matthias; Schneider, Stefanie; Rahnama, Javad; Stalter, Julian; Kristen, Maximilian; Müller-Budack, Eric; Ewerth, Ralph

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 7220-7230

Abstract

Iconography refers to the methodical study and interpretation of thematic content in the visual arts, distinguishing it, e.g., from purely formal or aesthetic considerations. In iconographic studies, Iconclass is a widely used taxonomy that encapsulates historical, biblical, and literary themes, among others. However, given the hierarchical nature and inherent complexity of such a taxonomy, it is highly desirable to use automated methods for (Iconclass-based) image classification. Previous studies either focused narrowly on certain subsets of narratives or failed to exploit Iconclass's hierarchical structure. In this paper, we propose a novel approach for Hierarchical Multi-label Classification (HMC) of iconographic concepts in images. We present three strategies, including Large Language Models (LLMs), for the generation of textual image descriptions using keywords extracted from Iconclass. These descriptions are utilized to pre-train a Vision-Language Model (VLM) based on a newly introduced data set of 477,569 images with more than 20,000 Iconclass concepts, far more than considered in previous studies. Furthermore, we present five approaches to multi-label classification, including a novel transformer decoder that leverages hierarchical information from the Iconclass taxonomy. Experimental results show the superiority of this approach over reasonable baselines.
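To illustrate what exploiting the hierarchy can mean for multi-label targets, the following minimal sketch expands each annotated concept into its ancestors before building a multi-hot vector. It is not the authors' method. It assumes a simplified view of Iconclass in which every prefix of a plain notation (for example `71H7`) is an ancestor, and it ignores bracketed text and keys. The function names `ancestors` and `multi_hot` and the toy vocabulary are hypothetical.

```python
from typing import Iterable, List, Sequence


def ancestors(notation: str) -> List[str]:
    """Return every prefix of the notation, root first, ending with the notation itself.

    Assumption: each added character is one level deeper in the taxonomy.
    This is a simplification that does not cover bracketed text or keys.
    """
    return [notation[:i] for i in range(1, len(notation) + 1)]


def multi_hot(notations: Iterable[str], vocab: Sequence[str]) -> List[int]:
    """Build a multi-hot target that also marks all ancestors of each label."""
    index = {concept: i for i, concept in enumerate(vocab)}
    target = [0] * len(vocab)
    for notation in notations:
        for concept in ancestors(notation):
            # Concepts outside the vocabulary are skipped.
            if concept in index:
                target[index[concept]] = 1
    return target


# Toy vocabulary (hypothetical); a real one would hold thousands of concepts.
vocab = ["7", "71", "71H", "71H7", "11"]
target = multi_hot(["71H7"], vocab)
```

Here an image annotated only with `71H7` also receives positive labels for `7`, `71`, and `71H`, so a classifier trained on such targets is pushed toward predictions that are consistent along a path in the taxonomy.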

Related Material

@InProceedings{Springstein_2024_WACV,
    author    = {Springstein, Matthias and Schneider, Stefanie and Rahnama, Javad and Stalter, Julian and Kristen, Maximilian and M\"uller-Budack, Eric and Ewerth, Ralph},
    title     = {Visual Narratives: Large-Scale Hierarchical Classification of Art-Historical Images},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {7220-7230}
}