Faster, Lighter, Robuster: A Weakly-Supervised Crowd Analysis Enhancement Network and a Generic Feature Extraction Framework
With bounding box labels needed for training, object detection is viewed unfavorably in terms of crowd analysis, due to the intensive labor for labeling and the unsatisfactory performance in clutters and severe occlusions. Another feasible method, density-based regression, despite its proficiency in counting and only point-level labels used for training, cannot get the location of each person, and the time and space consumption is relatively high. In this paper, we propose a generic feature extraction framework, Adaptive Pyramid Score (APS), based on object detection and designed specifically for extracting quantitative and spatial-semantic features. Moreover, as an intuitive and feasible solution regarding crowd analysis, we propose the weakly-supervised Confidence-Threshold-Foresight Network (CTFNet) under our APS feature extraction framework, which only needs count-level labels for training and improves the performance of various methods dramatically. Our system realizes the triple enhancement of counting, localization, and detection, which is also proved to be faster than advanced crowd analysis methods, lighter to be transplanted to various object detection methods, and robuster to tackle tasks of extreme scenes. Furthermore, the weakly-supervised paradigm leverage the intensive labor for labeling profoundly.