Disentangled Pre-training for Human-Object Interaction Detection

Zhuolong Li, Xingao Li, Changxing Ding, Xiangmin Xu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 28191-28201

Abstract


Detecting human-object interaction (HOI) has long been limited by the amount of supervised data available. Recent approaches address this issue by pre-training according to pseudo-labels, which align object regions with HOI triplets parsed from image captions. However, pseudo-labeling is tricky and noisy, making HOI pre-training a complex process. Therefore, we propose an efficient disentangled pre-training method for HOI detection (DP-HOI) to address this problem. First, DP-HOI utilizes object detection and action recognition datasets to pre-train the detection and interaction decoder layers, respectively. Then, we arrange these decoder layers so that the pre-training architecture is consistent with the downstream HOI detection task. This facilitates efficient knowledge transfer. Specifically, the detection decoder identifies reliable human instances in each action recognition dataset image, generates one corresponding query, and feeds it into the interaction decoder for verb classification. Next, we combine the human instance verb predictions in the same image and impose image-level supervision. The DP-HOI structure can be easily adapted to the HOI detection task, enabling effective model parameter initialization. Therefore, it significantly enhances the performance of existing HOI detection models on a broad range of rare categories. The code and pre-trained weights are available at https://github.com/xingaoli/DP-HOI.
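
The image-level supervision step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the per-human verb predictions are combined or which loss is used, so the element-wise max over human instances and the binary cross-entropy loss below are assumptions made for illustration only, as are the function names and the example numbers. This is not the authors' implementation; see the linked repository for that.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def image_level_verb_scores(per_human_logits):
    """Combine per-human verb logits into one probability per verb.

    ASSUMPTION: element-wise max over human instances. The abstract only
    says the predictions are "combined", not how.
    """
    if not per_human_logits:
        raise ValueError("need at least one human instance")
    num_verbs = len(per_human_logits[0])
    pooled = [max(h[v] for h in per_human_logits) for v in range(num_verbs)]
    return [sigmoid(z) for z in pooled]


def image_level_bce(scores, labels, eps=1e-7):
    """Mean binary cross-entropy against image-level verb labels.

    ASSUMPTION: the loss type is illustrative, not taken from the paper.
    """
    total = 0.0
    for p, y in zip(scores, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(scores)


# Hypothetical example: two detected humans, three verb classes.
logits = [
    [2.0, -3.0, -1.0],   # human 1
    [-2.0, 1.5, -2.5],   # human 2
]
labels = [1, 1, 0]       # image-level labels: verbs 0 and 1 are present

scores = image_level_verb_scores(logits)
loss = image_level_bce(scores, labels)
```

With max pooling, a verb counts as present in the image if any detected human is predicted to perform it, which matches the fact that action recognition datasets label whole images rather than individual people.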

Related Material


[bibtex]
@InProceedings{Li_2024_CVPR,
    author    = {Li, Zhuolong and Li, Xingao and Ding, Changxing and Xu, Xiangmin},
    title     = {Disentangled Pre-training for Human-Object Interaction Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {28191-28201}
}