Detecting Human-Object Interaction With Mixed Supervision

Suresh Kirthi Kumaraswamy, Miaojing Shi, Ewa Kijak; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 1228-1237


Human-object interaction (HOI) detection is an important task in image understanding and reasoning. An HOI takes the form of a triplet <human, verb, object>, so detection requires bounding boxes for the human and the object together with the action between them. In other words, this task requires strong supervision for training, which is however hard to procure. A natural solution is weakly-supervised learning, where we only know that certain HOI triplets are present in an image but their exact locations are unknown. Most weakly-supervised learning methods make no provision for leveraging strongly supervised data when it is available; indeed, a naive combination of these two paradigms in HOI detection fails to make them contribute to each other. In this regard, we propose a mixed-supervised HOI detection pipeline: thanks to a specific design of momentum-independent learning, it learns seamlessly across these two types of supervision. Moreover, in light of the annotation insufficiency in mixed supervision, we introduce an HOI element swapping technique to synthesize diverse and hard negatives across images and improve the robustness of the model. Our method is evaluated on the challenging HICO-DET dataset. It outperforms the state-of-the-art weakly- and fully-supervised methods under the same setting, and performs close to or even better than many fully-supervised methods by using a mixed amount of full and weak supervision.

Related Material

@InProceedings{Kumaraswamy_2021_WACV,
    author    = {Kumaraswamy, Suresh Kirthi and Shi, Miaojing and Kijak, Ewa},
    title     = {Detecting Human-Object Interaction With Mixed Supervision},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2021},
    pages     = {1228-1237}
}