Multimodal Understanding of Memes with Fair Explanations

Yang Zhong, Bhiman Kumar Baghel; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 2007-2017

Abstract


Digital memes are widely used in people's daily lives on social media platforms. Composed of images and descriptive texts, memes are often distributed with a flair of sarcasm or humor, yet they can also spread harmful content or biases rooted in social and cultural factors. Beyond mainstream tasks such as meme generation and classification, generating explanations for memes has become increasingly vital, and it poses the challenge of avoiding the propagation of already embedded biases. Our work studies whether recent advanced Vision-Language models (VL models) can fairly explain meme contents from different domains/topics, contributing a unified benchmark for meme explanation. With this dataset, we semi-automatically and manually evaluate the quality of VL model-generated explanations, identifying the major categories of biases in meme explanations.

Related Material


[bibtex]
@InProceedings{Zhong_2024_CVPR,
    author    = {Zhong, Yang and Baghel, Bhiman Kumar},
    title     = {Multimodal Understanding of Memes with Fair Explanations},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {2007-2017}
}