Just Ask Plus: Using Transcripts for VideoQA

Mohammad Javad Pirhadi, Motahhare Mirzaei, Sauleh Eetemadi; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 3082-3085

Abstract


The Social-IQ 2.0 challenge is designed to benchmark how well recent AI technologies reason about social interactions, an ability referred to as Artificial Social Intelligence, in the form of a VideoQA task. In this work, we use the Just Ask and SpeechT5 models as feature extractors and reason over the extracted features by adding one attention layer and two transformer encoders. Our best configuration reaches 53.35% accuracy on the validation set. The code is publicly available on GitHub.
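
To make the idea of an attention layer over extracted features concrete, the following is a minimal sketch of scaled dot-product attention for a single query vector, written in plain Python. It is not the authors' implementation: the function names `softmax` and `attend`, the toy two-dimensional vectors, and the choice of using the question as the query are all assumptions made for illustration, and the two transformer encoders mentioned in the abstract are not modeled here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Returns the weighted sum of `values` and the attention weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return pooled, weights


# Hypothetical stand-ins for real features: in practice these would be
# embeddings produced by pretrained extractors, not hand-written vectors.
question = [1.0, 0.0]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

pooled, weights = attend(question, features, features)
print(weights)
print(pooled)
```

In this toy setup the first and third feature vectors align with the question equally well and therefore receive equal, larger weights than the second one; a trained model would additionally apply learned projections to the query, keys, and values.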

Related Material

@InProceedings{Pirhadi_2023_ICCV,
    author    = {Pirhadi, Mohammad Javad and Mirzaei, Motahhare and Eetemadi, Sauleh},
    title     = {Just Ask Plus: Using Transcripts for VideoQA},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {3082-3085}
}