Instance Search via Fusing Hierarchical Multi-Level Retrieval and Human-Object Interaction Detection

Wenhao Yang, Yinan Song, Zhicheng Zhao, Fei Su; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2021, pp. 2323-2327

Abstract


Aiming to retrieve specific persons performing specific actions, instance-based video search (INS) has attracted increasing attention with the development of video understanding. In this paper, a novel hierarchical multi-task INS retrieval framework is proposed. Firstly, a multi-level action recognition framework and a face matching scheme are introduced to obtain initial action and person retrieval scores separately. In particular, a novel graph-based human-object interaction (HOI) detection model, named interaction-centric graph parsing network (iCGPN), is proposed to recognize interactions between humans and objects. Secondly, an improved query extension strategy is adopted to re-rank the initial person retrieval results. Thirdly, more elaborate action features are extracted to recognize complicated actions. Finally, a specially designed fusion strategy is used to integrate the person and action retrieval results into the final INS ranking list. The experimental results show the effectiveness of the proposed framework.
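The abstract does not say how the face matching, query extension, or fusion steps are computed. Purely as a rough illustration of how such a pipeline fits together, the sketch below assumes cosine-similarity face matching, generic average query expansion (AQE) standing in for the paper's improved query extension strategy, and a weighted-sum late fusion of min-max-normalized person and action scores. The function names, the `alpha` weight, `top_k`, and the toy data are hypothetical and are not the authors' method.

```python
# Hypothetical sketch of a person + action retrieval pipeline for INS.
# Not the authors' method: cosine face matching, generic average query
# expansion (AQE), and weighted-sum late fusion are assumptions.
import math
from typing import Dict, List, Sequence, Tuple

Scores = Dict[str, float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def person_scores(query: Sequence[float],
                  shots: Dict[str, Sequence[float]],
                  top_k: int = 2) -> Scores:
    """Face matching re-ranked with average query expansion (AQE)."""
    initial = {sid: cosine(query, feat) for sid, feat in shots.items()}
    top = sorted(initial, key=lambda s: initial[s], reverse=True)[:top_k]
    # Expanded query = mean of the original query and its top-k matches.
    expanded = [
        (query[i] + sum(shots[s][i] for s in top)) / (1 + len(top))
        for i in range(len(query))
    ]
    return {sid: cosine(expanded, feat) for sid, feat in shots.items()}


def min_max(scores: Scores) -> Scores:
    """Rescale scores to [0, 1]; constant scores carry no signal, so map them to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {sid: 0.0 for sid in scores}
    return {sid: (v - lo) / (hi - lo) for sid, v in scores.items()}


def fuse(person: Scores, action: Scores, alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Weighted-sum fusion over shots scored by both modalities, best first."""
    p, a = min_max(person), min_max(action)
    fused = {sid: alpha * p[sid] + (1 - alpha) * a[sid] for sid in p.keys() & a.keys()}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Toy example with made-up 2-D "face features" and action scores.
shots = {"shot1": [1.0, 0.0], "shot2": [0.9, 0.1], "shot3": [0.0, 1.0]}
person = person_scores([1.0, 0.0], shots, top_k=1)
action = {"shot1": 0.2, "shot2": 0.9, "shot3": 0.8}
ranking = fuse(person, action, alpha=0.5)
print([sid for sid, _ in ranking])  # ['shot2', 'shot1', 'shot3']
```

In this toy run, shot1 matches the face best but scores lowest on the action, so shot2, which scores well on both, ranks first. Raising `alpha` toward 1 makes the ranking lean on person identity; fusing only shots scored by both modalities reflects the INS requirement that a result match both the person and the action.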

Related Material


@InProceedings{Yang_2021_ICCV,
    author    = {Yang, Wenhao and Song, Yinan and Zhao, Zhicheng and Su, Fei},
    title     = {Instance Search via Fusing Hierarchical Multi-Level Retrieval and Human-Object Interaction Detection},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2021},
    pages     = {2323-2327}
}