Training Multi-Object Detector by Estimating Bounding Box Distribution for Input Image

Jaeyoung Yoo, Hojun Lee, Inseop Chung, Geonseok Seo, Nojun Kwak; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 3437-3446

Abstract


In multi-object detection using neural networks, the fundamental problem is: "How should the network learn a variable number of bounding boxes in different input images?" Previous methods train a multi-object detection network through a procedure that directly assigns the ground truth bounding boxes to specific locations of the network's output. However, this procedure makes the training of a multi-object detection network too heuristic and complicated. In this paper, we reformulate the multi-object detection task as a problem of density estimation of bounding boxes. Instead of assigning each ground truth to specific locations of the network's output, we train a network by estimating the probability density of bounding boxes in an input image using a mixture model. For this purpose, we propose a novel network for object detection called Mixture Density Object Detector (MDOD), and the corresponding objective function for the density-estimation-based training. We applied MDOD to the MS COCO dataset. Our proposed method not only offers a new approach to multi-object detection problems, but also improves detection performance. The code is available at https://github.com/yoojy31/MDOD.
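
To make the density-estimation idea concrete, the following is a minimal sketch of a mixture-model negative log-likelihood over box coordinates. It is not taken from the MDOD code: the abstract says only "a mixture model", so the diagonal Gaussian components, the function name `mixture_nll`, and the parameter names are all assumptions chosen for illustration. The point it shows is that every ground-truth box is scored against the whole mixture, so no box has to be assigned to a particular output location, and the number of boxes per image can vary freely.

```python
import numpy as np

def mixture_nll(boxes, weights, means, scales):
    """Mean negative log-likelihood of boxes under a diagonal Gaussian mixture.

    Illustrative assumption: Gaussian components (the abstract does not
    specify the component distribution).

    boxes:   (N, 4) ground-truth boxes; N may differ from image to image
    weights: (K,)   mixture weights, strictly positive and summing to 1
    means:   (K, 4) component means
    scales:  (K, 4) component standard deviations, strictly positive
    """
    boxes = np.asarray(boxes, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    scales = np.asarray(scales, dtype=float)

    # Standardized residual of every box against every component: (N, K, 4)
    diff = (boxes[:, None, :] - means[None, :, :]) / scales[None, :, :]
    # Per-coordinate Gaussian log-density, summed over the 4 coordinates: (N, K)
    log_comp = (-0.5 * diff**2 - np.log(scales)[None, :, :]
                - 0.5 * np.log(2.0 * np.pi)).sum(axis=2)
    # Add log mixture weights, then a numerically stable log-sum-exp over K
    log_joint = log_comp + np.log(weights)[None, :]
    peak = log_joint.max(axis=1, keepdims=True)
    log_lik = peak[:, 0] + np.log(np.exp(log_joint - peak).sum(axis=1))
    return -log_lik.mean()

# Hypothetical example: two ground-truth boxes, three mixture components.
gt_boxes = np.array([[0.1, 0.2, 0.4, 0.6], [0.5, 0.5, 0.9, 0.8]])
pi = np.array([0.4, 0.4, 0.2])
mu = np.array([[0.1, 0.2, 0.4, 0.6], [0.5, 0.5, 0.9, 0.8], [0.0, 0.0, 1.0, 1.0]])
sigma = np.full((3, 4), 0.1)
loss = mixture_nll(gt_boxes, pi, mu, sigma)
```

In a trained detector the weights, means, and scales would be produced by the network for each input image, and minimizing this loss would pull probability mass toward the ground-truth boxes.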

Related Material


@InProceedings{Yoo_2021_ICCV,
    author    = {Yoo, Jaeyoung and Lee, Hojun and Chung, Inseop and Seo, Geonseok and Kwak, Nojun},
    title     = {Training Multi-Object Detector by Estimating Bounding Box Distribution for Input Image},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {3437-3446}
}