IR Reasoner: Real-Time Infrared Object Detection by Visual Reasoning
Thermal infrared (IR) imagery is used in many applications because of its unique properties. However, several challenges, such as small target objects, image noise, lack of textural information, and background clutter, hinder object detection in IR images. Current real-time object detection methods treat each image region separately, and in the face of these challenges, relying solely on feature maps extracted by convolutional layers is not ideal. In this paper, we introduce a new architecture for real-time object detection in IR images that reasons about the relations between image regions using self-attention. The proposed method, IR Reasoner, takes the spatial and semantic coherency between image regions into account to enhance the feature maps. We integrated this approach into the current state-of-the-art one-stage object detectors YOLOv4, YOLOR, and YOLOv7, and trained them from scratch on the FLIR ADAS dataset. Experimental evaluations show that the Reasoner variants perform better than the baseline models while still running in real time. Our best-performing Reasoner model, YOLOv7-W6-Reasoner, achieves 40.5% AP at 32.7 FPS. The code is publicly available.
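To make the idea of enhancing feature maps through self-attention over image regions concrete, the following is a minimal sketch and not the IR Reasoner architecture itself, whose details are not given in this abstract. It assumes single-head scaled dot-product attention with a residual connection, treats every spatial location of a convolutional feature map as one region, and uses random projection matrices in place of learned weights. The function name `reason_over_regions` and the parameters `w_q`, `w_k`, and `w_v` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def reason_over_regions(feature_map, w_q, w_k, w_v):
    """Enhance a (C, H, W) feature map with attention between its regions.

    Assumption: each of the H*W spatial locations is treated as one region.
    w_q and w_k have shape (C, d); w_v has shape (C, C) so that the attended
    values can be added back to the input as a residual.
    """
    c, h, w = feature_map.shape
    regions = feature_map.reshape(c, h * w).T      # (N, C), with N = H*W

    q = regions @ w_q                              # queries, (N, d)
    k = regions @ w_k                              # keys,    (N, d)
    v = regions @ w_v                              # values,  (N, C)

    # Pairwise region affinities, scaled by sqrt(d) as in standard attention.
    scores = (q @ k.T) / np.sqrt(q.shape[-1])      # (N, N)
    attn = softmax(scores, axis=-1)                # each row sums to 1

    # Residual connection: original features plus context from all regions.
    enhanced = regions + attn @ v                  # (N, C)
    return enhanced.T.reshape(c, h, w), attn


# Toy usage with random data (illustrative values only).
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 5))              # C=8, H=4, W=5
w_q = rng.standard_normal((8, 4))
w_k = rng.standard_normal((8, 4))
w_v = rng.standard_normal((8, 8))
out, attn = reason_over_regions(fmap, w_q, w_k, w_v)
```

In this sketch the output keeps the shape of the input feature map, so a block of this kind could in principle sit between a detector's feature extractor and its prediction head. Where such a block is placed in YOLOv4, YOLOR, or YOLOv7 is not specified in this abstract.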