Relation Parsing Neural Network for Human-Object Interaction Detection

Penghao Zhou, Mingmin Chi; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 843-851

Abstract


Human-Object Interaction Detection aims to infer a triplet <human, verb, object> between a human and the surrounding objects. In this paper, we propose a novel model, the Relation Parsing Neural Network (RPNN), to detect human-object interactions. Specifically, the network is represented by two graphs: the Object-Bodypart Graph and the Human-Bodypart Graph. The Object-Bodypart Graph dynamically captures the relationships between body parts and the surrounding objects. The Human-Bodypart Graph infers the relationships between the human and the body parts, and assembles body part contexts to predict actions. The two graphs are associated through an action passing mechanism. The proposed RPNN model implicitly parses pairwise relations in the two graphs without supervised labels. Experiments conducted on the V-COCO and HICO-DET datasets confirm the effectiveness of the proposed RPNN, which significantly outperforms state-of-the-art methods.

Related Material


[bibtex]
@InProceedings{Zhou_2019_ICCV,
author = {Zhou, Penghao and Chi, Mingmin},
title = {Relation Parsing Neural Network for Human-Object Interaction Detection},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}