Dual Learning with Dynamic Knowledge Distillation for Partially Relevant Video Retrieval

Dong, Jianfeng; Zhang, Minsong; Zhang, Zheng; Chen, Xianke; Liu, Daizong; Qu, Xiaoye; Wang, Xun; Liu, Baolong

Jianfeng Dong, Minsong Zhang, Zheng Zhang, Xianke Chen, Daizong Liu, Xiaoye Qu, Xun Wang, Baolong Liu; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 11302-11312

Abstract

Almost all previous text-to-video retrieval works assume that videos are pre-trimmed and of short duration. In practice, however, videos are generally untrimmed and contain much background content. In this work, we investigate the more practical but challenging Partially Relevant Video Retrieval (PRVR) task, which aims to retrieve untrimmed videos that are partially relevant to a given text query. In particular, we propose to address PRVR from a new perspective, i.e., distilling generalization knowledge from a large-scale vision-language pre-trained model and transferring it to a task-specific PRVR network. Specifically, we introduce a Dual Learning framework with Dynamic Knowledge Distillation (DL-DKD), which uses a large vision-language model as the teacher to guide a student model. For knowledge distillation, an inheritance student branch is devised to absorb the knowledge of the teacher model. Considering that the large model may perform only moderately well on the target data due to domain gaps, we further develop an exploration student branch to exploit task-specific information. By jointly training the two branches in a dual-learning way, our model is able to selectively acquire appropriate knowledge from the teacher model while capturing task-specific properties. In addition, a dynamic knowledge distillation strategy is devised to adjust the influence of each student branch's learning over the course of training. Experimental results demonstrate that our proposed model achieves state-of-the-art performance on the ActivityNet and TVR datasets for PRVR.
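The abstract describes the training objective only in words. The following is a minimal sketch of how a two-branch objective with a time-varying distillation weight could be put together. It is not the authors' implementation: the KL-based distillation loss, the cross-entropy retrieval loss, the linear decay schedule, and all function and parameter names are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Normalize a list of query-video similarity scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distill_loss(teacher_scores, student_scores, eps=1e-12):
    """KL(teacher || student) over softmax-normalized similarities.

    Assumed form of the loss for the inheritance branch.
    """
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def retrieval_loss(scores, positive_index):
    """Cross-entropy on the ground-truth video.

    Assumed form of the task-specific loss for the exploration branch.
    """
    return -math.log(softmax(scores)[positive_index])


def distill_weight(epoch, total_epochs, start=1.0):
    """Hypothetical schedule: distillation weight decays linearly to zero."""
    return start * (1.0 - epoch / total_epochs)


def total_loss(teacher_scores, inherit_scores, explore_scores,
               positive_index, epoch, total_epochs):
    """Weighted sum of the two branch losses for one text query."""
    w = distill_weight(epoch, total_epochs)
    return (w * distill_loss(teacher_scores, inherit_scores)
            + retrieval_loss(explore_scores, positive_index))
```

Under this assumed schedule the teacher dominates early training and the task-specific signal dominates later; other schedules would fit the abstract's description equally well.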

Related Material

[bibtex]

@InProceedings{Dong_2023_ICCV,
    author    = {Dong, Jianfeng and Zhang, Minsong and Zhang, Zheng and Chen, Xianke and Liu, Daizong and Qu, Xiaoye and Wang, Xun and Liu, Baolong},
    title     = {Dual Learning with Dynamic Knowledge Distillation for Partially Relevant Video Retrieval},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {11302-11312}
}