MAdVerse: A Hierarchical Dataset of Multi-Lingual Ads From Diverse Sources and Categories

Amruth Sagar, Rishabh Srivastava, Rakshitha R. T., Venkata Kesav Venna, Ravi Kiran Sarvadevabhatla; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 8087-8096

Abstract


The convergence of computer vision and advertising has sparked substantial interest lately. Existing advertisement datasets are often either subsets of established datasets with highly specialized annotations, or feature diverse annotations without a cohesive taxonomy among ad images. Notably, no existing dataset encompasses diverse advertisement styles or semantic grouping at various levels of granularity for a better understanding of ads. Our work addresses this gap by introducing MAdVerse, an extensive, multilingual compilation of more than 50,000 ads from the web, social media websites and e-newspapers. Advertisements are hierarchically grouped with uniform granularity into 11 categories, divided into 51 sub-categories, and 524 fine-grained brands at leaf level, each featuring ads in various languages. Furthermore, we provide comprehensive baseline classification results for various pertinent prediction tasks within the realm of advertising analysis. Specifically, these tasks include hierarchical ad classification, source classification, multilingual classification and inducing hierarchy in existing ad datasets. The dataset, code and models are available on the project page: https://madverse24.github.io/

Related Material


@InProceedings{Sagar_2024_WACV,
    author    = {Sagar, Amruth and Srivastava, Rishabh and T., Rakshitha R. and Venna, Venkata Kesav and Sarvadevabhatla, Ravi Kiran},
    title     = {MAdVerse: A Hierarchical Dataset of Multi-Lingual Ads From Diverse Sources and Categories},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {8087-8096}
}