Multi-Frame Attention With Feature-Level Warping for Drone Crowd Tracking
Drone crowd tracking has various applications such as crowd management and video surveillance. Unlike in general multi-object tracking, the size of the objects to be tracked are small, and the ground truth is given by a point-level annotation, which has no region information. This causes the lack of discriminative features for finding the same objects from many similar objects. Thus, similarity-based trackingtechniques, which are widely used for multi-object tracking with bounding-box, are difficult to use. To deal with this problem, we take into account the temporal context of the local area. To aggregate temporal context in a local area, we propose a multi-frame attention with feature-level warping. The feature-level warping can align the features of the same object in multiple frame, and then multi-frame attention can effectively aggregate the temporal context from the warped features. The experimental results show the effectiveness of our method. Our method outperformed the state-of-the-art method in DroneCrowd dataset.