@InProceedings{Pirhadi_2023_ICCV,
    author    = {Pirhadi, Mohammad Javad and Mirzaei, Motahhare and Eetemadi, Sauleh},
    title     = {Just Ask Plus: Using Transcripts for VideoQA},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {3082-3085}
}
Just Ask Plus: Using Transcripts for VideoQA
Abstract
The Social-IQ 2.0 challenge is designed to benchmark the ability of recent AI technologies to reason about social interactions, referred to as Artificial Social Intelligence, in the form of a VideoQA task. In this work, we use the Just Ask and SpeechT5 models as feature extractors and reason over the extracted features by adding one attention layer and two transformer encoders. Our best configuration reaches 53.35% accuracy on the validation set. The code is publicly available on GitHub.