Order-Prompted Tag Sequence Generation for Video Tagging

Zongyang Ma, Ziqi Zhang, Yuxin Chen, Zhongang Qi, Yingmin Luo, Zekun Li, Chunfeng Yuan, Bing Li, Xiaohu Qie, Ying Shan, Weiming Hu; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 15681-15690

Abstract


Video tagging aims to infer multiple tags that cover the relevant content of a given video. Video tags are typically freely defined and uploaded by a variety of users, so they have two characteristics: they are abundant in quantity and unordered within each video. Existing multi-label classification and generation methods are difficult to adapt directly to this task. Based on these characteristics, this paper proposes a novel generative model, Order-Prompted Tag Sequence Generation (OP-TSG), which treats video tagging as a tag sequence generation problem guided by sample-dependent order prompts. These prompts are semantically aligned with tags and decouple the tag generation order, so that the model can focus on modeling the dependencies among tags. Moreover, the word-based generation strategy enables the model to generate novel tags. To verify the effectiveness and generalization of the proposed method, two benchmarks are established: CREATE-tagging, a Chinese video tagging benchmark, and Pexel-tagging, an English image tagging benchmark. Extensive results show that OP-TSG is significantly superior to other methods. In particular, results on rare tags improve by 3.3% and 3% over SOTA methods on CREATE-tagging and Pexel-tagging, respectively, and novel tags generated on CREATE-tagging exhibit a tag gain of 7.04%.
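As a small illustration of why tags with no intrinsic order within a video are awkward for a sequence generator, the sketch below enumerates the target sequences that one tag set could be serialized into. It only illustrates the motivation and is not the OP-TSG method; the tag strings and the helper `candidate_targets` are hypothetical and do not come from the paper or its benchmarks.

```python
from itertools import permutations
from math import factorial

# Hypothetical tags for one video; not taken from CREATE-tagging or Pexel-tagging.
tags = {"cooking", "street food", "travel"}


def candidate_targets(tag_set):
    """Return every ordering of an unordered tag set as a possible target sequence."""
    return [list(p) for p in permutations(sorted(tag_set))]


targets = candidate_targets(tags)

# A set of n distinct tags can be written as n! different sequences,
# all of which describe the same video equally well.
assert len(targets) == factorial(len(tags))

for sequence in targets:
    print(" | ".join(sequence))
```

With only three tags there are already six equally valid target sequences, and the count grows factorially with the number of tags, so a model trained against one arbitrary ordering is penalized for producing any of the others.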

Related Material

[bibtex]
@InProceedings{Ma_2023_ICCV,
    author    = {Ma, Zongyang and Zhang, Ziqi and Chen, Yuxin and Qi, Zhongang and Luo, Yingmin and Li, Zekun and Yuan, Chunfeng and Li, Bing and Qie, Xiaohu and Shan, Ying and Hu, Weiming},
    title     = {Order-Prompted Tag Sequence Generation for Video Tagging},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {15681-15690}
}