Interactive Video Retrieval With Dialog

Sho Maeoki, Kohei Uehara, Tatsuya Harada; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 952-953


Recording videos has become quick and easy, and the quantity and availability of videos continue to increase; effective video retrieval methods have therefore become important. To retrieve a target video from a large collection, a video retrieval system needs to obtain appropriate queries from the user. Given a single sentence query, however, many similar videos may match it, so the system requires more information beyond the sentence to distinguish the target video from the others. If the system actively collects more information about the target video, retrieval can be performed more effectively. We therefore propose a system that retrieves videos by asking questions about their content and leveraging both the user's responses and the dialog history. We confirm the usefulness of the proposed system through experiments on the AVSD dataset, which includes videos and dialogs about those videos.

