What Matters for Ad-Hoc Video Search? A Large-Scale Evaluation on TRECVID

Aozhu Chen, Fan Hu, Zihan Wang, Fangming Zhou, Xirong Li; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2021, pp. 2317-2322

Abstract


The annual TRECVID AVS task is an important international evaluation for quantifying progress in Ad-hoc Video Search (AVS). Solutions submitted by the task participants vary in their choices of cross-modal matching models, visual features and training data. As a result, what one may conclude from the evaluation remains at a high level, which is insufficient to reveal the influence of the individual components. To bridge the gap between the current solution-level comparison and the desired component-wise comparison, we propose in this paper a large-scale and systematic evaluation on TRECVID. By selecting combinations of state-of-the-art matching models, visual features and (pre-)training data, we construct a set of 25 different solutions and evaluate them on the TRECVID AVS tasks 2016-2020. The presented evaluation helps answer the key question of what matters for AVS. The resultant observations and lessons learned are also instructive for developing novel AVS solutions.

Related Material


@InProceedings{Chen_2021_ICCV,
    author    = {Chen, Aozhu and Hu, Fan and Wang, Zihan and Zhou, Fangming and Li, Xirong},
    title     = {What Matters for Ad-Hoc Video Search? A Large-Scale Evaluation on TRECVID},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2021},
    pages     = {2317-2322}
}