External Commonsense Knowledge as a Modality for Social Intelligence Question-Answering

Sanika Natu, Shounak Sural, Sulagna Sarkar; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 3044-3050

Abstract


Artificial Social Intelligence (ASI) refers to the perception and understanding of social interactions. It involves using contextual information about social cues to perform tasks such as Question-Answering (QA) in social situations. In this work, we use Social-IQ, a social intelligence dataset of videos with visual, audio, and textual modalities, for QA in such social contexts. Our approach incorporates external commonsense knowledge to address the lack of reasoning in multimodal machine learning models for question answering. We use Commonsense Transformers (COMET) to generate contextual information from the textual modality, along with VisualCOMET for the visual modality. These are incorporated into our model to improve binary QA accuracy over state-of-the-art methods and to highlight the need for commonsense understanding in question-answering tasks.

Related Material


[bibtex]
@InProceedings{Natu_2023_ICCV,
    author    = {Natu, Sanika and Sural, Shounak and Sarkar, Sulagna},
    title     = {External Commonsense Knowledge as a Modality for Social Intelligence Question-Answering},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {3044-3050}
}