Towards Precise End-to-End Weakly Supervised Object Detection Network

Ke Yang, Dongsheng Li, Yong Dou; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 8372-8381


It is challenging for a weakly supervised object detection network to predict object positions precisely, because no instance-level category annotations are available. Most existing methods address this problem with a two-phase learning procedure: a multiple instance learning (MIL) detector followed by a fully supervised detector with bounding-box regression. We observe that this procedure can lead to local minima for some object categories. In this paper, we propose to train the two phases jointly, in an end-to-end manner, to tackle this problem. Specifically, we design a single network with both multiple instance learning and bounding-box regression branches that share the same backbone. In addition, a guided attention module supervised by a classification loss is added to the backbone to extract the implicit location information in the features more effectively. Experimental results on public datasets show that our method achieves state-of-the-art performance.
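To make the idea of "one network, two branches, one loss" concrete, here is a minimal pure-Python sketch of a single training-loss computation in which an MIL branch and a bounding-box regression branch read the same per-proposal features. It is an illustration under stated assumptions, not the authors' implementation. The MIL branch is a swapped-in WSDDN-style two-stream head. Regression is class-agnostic. The top-scoring proposal of each positive class serves as the pseudo ground truth for regression. The guided attention module is not modeled. All names (`mil_image_scores`, `joint_loss`, `w_cls`, `w_det`, `w_reg`, `iou_thresh`) are hypothetical, and `feats` stands in for backbone features already pooled per proposal.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Vec = Sequence[float]


def softmax(xs: Vec) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(feat: Vec, weights: Sequence[Vec]) -> List[float]:
    # One output per weight row; every head reads the same shared feature.
    return [sum(w * f for w, f in zip(row, feat)) for row in weights]


def mil_image_scores(feats, w_cls, w_det):
    """Two-stream MIL head (WSDDN-style stand-in): returns (R x C, C) scores."""
    num_props, num_cls = len(feats), len(w_cls)
    cls_prob = [softmax(linear(f, w_cls)) for f in feats]  # softmax over classes
    det_logits = [linear(f, w_det) for f in feats]
    # Softmax over proposals, computed separately for each class (C x R).
    det_prob = [softmax([det_logits[r][c] for r in range(num_props)])
                for c in range(num_cls)]
    prop_scores = [[cls_prob[r][c] * det_prob[c][r] for c in range(num_cls)]
                   for r in range(num_props)]
    image_scores = [sum(prop_scores[r][c] for r in range(num_props))
                    for c in range(num_cls)]
    return prop_scores, image_scores


def mil_loss(image_scores: Vec, labels: Sequence[int], eps: float = 1e-6) -> float:
    # Multi-label binary cross-entropy against image-level labels only.
    loss = 0.0
    for p, y in zip(image_scores, labels):
        p = min(max(p, eps), 1.0 - eps)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss


def iou(a: Box, b: Box) -> float:
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def encode(prop: Box, gt: Box) -> Tuple[float, float, float, float]:
    # Standard (dx, dy, dw, dh) box-regression targets relative to the proposal.
    pw, ph = prop[2] - prop[0], prop[3] - prop[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    dx = (gt[0] + 0.5 * gw - (prop[0] + 0.5 * pw)) / pw
    dy = (gt[1] + 0.5 * gh - (prop[1] + 0.5 * ph)) / ph
    return dx, dy, math.log(gw / pw), math.log(gh / ph)


def smooth_l1(d: float, beta: float = 1.0) -> float:
    a = abs(d)
    return 0.5 * a * a / beta if a < beta else a - 0.5 * beta


def joint_loss(feats, boxes, labels, w_cls, w_det, w_reg, iou_thresh=0.5):
    """Hypothetical single-pass loss: MIL branch + regression branch on shared features."""
    prop_scores, image_scores = mil_image_scores(feats, w_cls, w_det)
    loss_mil = mil_loss(image_scores, labels)
    # Pseudo ground truth: the top-scoring proposal of each class in the image.
    pseudo_gts = [boxes[max(range(len(boxes)), key=lambda r: prop_scores[r][c])]
                  for c, y in enumerate(labels) if y == 1]
    loss_reg, n = 0.0, 0
    for f, box in zip(feats, boxes):
        if not pseudo_gts:
            break
        gt = max(pseudo_gts, key=lambda g: iou(box, g))
        if iou(box, gt) < iou_thresh:
            continue  # proposal is treated as background: no regression loss
        pred = linear(f, w_reg)  # 4 class-agnostic offsets (assumption)
        loss_reg += sum(smooth_l1(p - t) for p, t in zip(pred, encode(box, gt)))
        n += 1
    loss_reg /= max(n, 1)
    return loss_mil + loss_reg, loss_mil, loss_reg
```

This version only computes loss values. In an autodiff framework, minimizing the summed loss would let both branches update the shared backbone in the same step, which is the contrast with a two-phase pipeline that trains a separate regression detector afterwards. Selecting the pseudo ground truth with `max` is non-differentiable, so gradients flow only through the scores and offsets.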

BibTeX

author = {Yang, Ke and Li, Dongsheng and Dou, Yong},
title = {Towards Precise End-to-End Weakly Supervised Object Detection Network},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}