Favoring One Among Equals - Not a Good Idea: Many-to-One Matching for Robust Transformer Based Pedestrian Detection

K.N. Ajay Shastry, K. Ravi Sri Teja, Aditya Nigam, Chetan Arora; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 759-768

Abstract


We investigate why transformer-based pedestrian detection models perform worse than convolutional neural network (CNN) based ones. CNN models generate dense pedestrian proposals, refine each proposal individually, and then apply non-maximum suppression (NMS) to produce sparse predictions. In contrast, transformer models select one proposal per ground-truth (GT) pedestrian box and backpropagate a positive gradient from it. All other proposals, many of them highly similar to the selected ones, receive a negative gradient. Although this leads to sparse predictions and obviates the need for NMS, the arbitrary selection of one among many similar proposals hinders effective training and lowers pedestrian detection accuracy. To mitigate the problem, we replace the commonly used Kuhn-Munkres matching algorithm with a min-cost-flow based formulation that incorporates constraints such as: each ground-truth box is matched to at least one proposal, and many equally good proposals can be matched to a single ground-truth box. We propose the first transformer-based pedestrian detection model incorporating our matching algorithm. Extensive experiments show that our approach achieves a miss rate (lower is better) of 3.7 / 17.4 / 21.8 / 8.3 / 2.0 on the Eurocity / TJU-traffic / TJU-campus / Cityperson / Caltech datasets, compared to 4.7 / 18.7 / 24.8 / 8.5 / 3.1 for the current SOTA. Code is available at https://ajayshastry08.github.io/flow_matcher
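The many-to-one matching idea described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the function name `many_to_one_match`, the integer cost matrix, the background cost `bg_cost`, and the per-box cap `max_per_gt` are all assumptions made for illustration. Each proposal sends one unit of flow either to a ground-truth box or to a background node, and a large reward on one unit of each box's outgoing capacity enforces the "at least one proposal per box" constraint.

```python
from collections import deque

class MinCostFlow:
    """Successive shortest paths with a queue-based Bellman-Ford (SPFA)."""
    def __init__(self, n):
        self.graph = [[] for _ in range(n)]  # edge: [to, cap, cost, rev_index]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t, max_flow):
        n, flow, total = len(self.graph), 0, 0
        while flow < max_flow:
            dist, prev = [float("inf")] * n, [None] * n
            in_queue = [False] * n
            dist[s] = 0
            queue = deque([s])
            while queue:  # cheapest augmenting path in the residual graph
                u = queue.popleft()
                in_queue[u] = False
                for i, (v, cap, cost, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], prev[v] = dist[u] + cost, (u, i)
                        if not in_queue[v]:
                            queue.append(v)
                            in_queue[v] = True
            if prev[t] is None:
                break
            push, v = max_flow - flow, t
            while v != s:  # bottleneck capacity along the path
                u, i = prev[v]
                push, v = min(push, self.graph[u][i][1]), u
            v = t
            while v != s:  # apply the augmentation
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            flow += push
            total += push * dist[t]
        return flow, total

def many_to_one_match(cost, bg_cost, max_per_gt):
    """cost[i][j]: integer cost of matching proposal i to GT box j (assumed).
    Returns one GT index per proposal, or -1 for background.
    Assumes at least as many proposals as GT boxes."""
    n_prop, n_gt = len(cost), len(cost[0])
    # Reward larger than any achievable cost difference, so covering every
    # GT box always beats leaving one unmatched.
    big = 1 + sum(abs(c) for row in cost for c in row) + abs(bg_cost) * n_prop
    src, bg, sink, first_gt = 0, 1, 2, 3 + n_prop
    g = MinCostFlow(3 + n_prop + n_gt)
    for i in range(n_prop):
        g.add_edge(src, 3 + i, 1, 0)        # every proposal is assigned once
        g.add_edge(3 + i, bg, 1, bg_cost)   # ... possibly to background
        for j in range(n_gt):
            g.add_edge(3 + i, first_gt + j, 1, cost[i][j])
    g.add_edge(bg, sink, n_prop, 0)
    for j in range(n_gt):
        g.add_edge(first_gt + j, sink, 1, -big)          # at least one match
        g.add_edge(first_gt + j, sink, max_per_gt - 1, 0)  # optional extras
    g.solve(src, sink, n_prop)
    assignment = [-1] * n_prop
    for i in range(n_prop):
        for v, cap, _, _ in g.graph[3 + i]:
            if v >= first_gt and cap == 0:  # saturated proposal -> GT edge
                assignment[i] = v - first_gt
    return assignment

# Hypothetical costs, e.g. round(100 * (1 - IoU)); proposals 0, 1, 4 are
# near-duplicates of GT box 0 and can all be treated as positives.
costs = [[10, 90], [12, 88], [85, 15], [95, 92], [11, 91]]
print(many_to_one_match(costs, bg_cost=40, max_per_gt=3))
```

With a one-to-one matcher such as Kuhn-Munkres, only one of the three near-duplicate proposals for the first box could be a positive; here all three can be, while the poorly fitting proposal is still sent to background.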

Related Material


@InProceedings{Shastry_2024_WACV,
    author    = {Shastry, K.N. Ajay and Teja, K. Ravi Sri and Nigam, Aditya and Arora, Chetan},
    title     = {Favoring One Among Equals - Not a Good Idea: Many-to-One Matching for Robust Transformer Based Pedestrian Detection},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {759-768}
}