Bilateral Adaptation for Human-Object Interaction Detection with Occlusion-Robustness

Guangzhi Wang, Yangyang Guo, Ziwei Xu, Mohan Kankanhalli; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 27970-27980

Abstract


Human-Object Interaction (HOI) Detection constitutes an important aspect of human-centric scene understanding which requires precise object detection and interaction recognition. Despite increasing advancement in detection recognizing subtle and intricate interactions remains challenging. Recent methods have endeavored to leverage the rich semantic representation from pre-trained CLIP yet fail to efficiently capture finer-grained spatial features that are highly informative for interaction discrimination. In this work instead of solely using representations from CLIP we fill the gap by proposing a spatial adapter that efficiently utilizes the multi-scale spatial information in the pre-trained detector. This leads to a bilateral adaptation that mutually produces complementary features. To further improve interaction recognition under occlusion which is common in crowded scenarios we propose an Occluded Part Extrapolation module that guides the model to recover the spatial details from manually occluded feature maps. Moreover we design a Conditional Contextual Mining module that further mines informative contextual clues from the spatial features via a tailored cross-attention mechanism. Extensive experiments on V-COCO and HICO-DET benchmarks demonstrate that our method significantly outperforms prior art on both standard and zero-shot settings resulting in new state-of-the-art performance. Additional ablation studies further validate the effectiveness of each component in our method.

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Wang_2024_CVPR, author = {Wang, Guangzhi and Guo, Yangyang and Xu, Ziwei and Kankanhalli, Mohan}, title = {Bilateral Adaptation for Human-Object Interaction Detection with Occlusion-Robustness}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {27970-27980} }