Multi-Sensor Ensemble-Guided Attention Network for Aerial Vehicle Perception Beyond Visible Spectrum
Researchers across many application domains have made significant advances in Artificial Intelligence (AI), enabling more capable automated sensing systems and thus eliminating the need for time-consuming manual data analysis, which is prone to human error. However, successful deployment of such systems in real-world applications requires careful design and analysis of the proposed models. This work focuses on perception performed on Unmanned Aerial Vehicles (UAVs) using multi-task learning. Such platforms pose multiple challenges. First, they often operate in difficult and dynamic conditions affected by various factors, such as background noise, ego-noise of the motors, and occluded views. At the same time, they require high-performance local compute, co-designed with optimized software solutions that meet small size, weight, and power (SWaP) requirements. The AI models designed for such systems should therefore introduce no additional computational or memory overheads, so that real-time processing at the embedded edge remains feasible. With this in mind, this work proposes a novel neural network-based system that applies ensemble-guided modulations to the audio path and fuses the result with the infrared (IR) visual embedding using an attention mechanism. The ensemble mechanism does not require spawning new ensemble members; instead, it operates on FiLM (Feature-wise Linear Modulation) activations, making it suitable for resource-constrained embedded edge platforms. The performed experiments show that the proposed network outperforms a single FiLM network by 15% and is more robust to noise.
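To make the core ideas concrete, the sketch below illustrates, under assumptions not detailed in the abstract, how a FiLM-based ensemble and an attention fusion step could look in principle: several (gamma, beta) FiLM parameter sets modulate one shared feature tensor (so no extra backbone copies are needed), and the modulated audio embedding attends over the IR embedding via scaled dot-product attention. Function names (`film`, `ensemble_film`, `attention_fuse`) and the averaging choice for combining ensemble outputs are illustrative, not taken from the paper.

```python
import numpy as np

def film(x, gamma, beta):
    # Feature-wise Linear Modulation: per-channel scale and shift.
    return gamma * x + beta

def ensemble_film(x, gammas, betas):
    # Ensemble over K FiLM parameter sets applied to the SAME features:
    # only K (gamma, beta) pairs are stored, no duplicated network weights.
    # Averaging the K modulated outputs is one simple combination choice.
    return np.mean([film(x, g, b) for g, b in zip(gammas, betas)], axis=0)

def attention_fuse(audio, ir):
    # Scaled dot-product attention: audio tokens (queries) attend over
    # IR tokens (keys/values); both are (tokens, dim) arrays.
    d = ir.shape[-1]
    scores = audio @ ir.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over IR tokens
    return weights @ ir

# Toy usage: 6 audio tokens and 10 IR tokens, 16-dim embeddings.
audio_feat = np.random.randn(6, 16)
ir_feat = np.random.randn(10, 16)
gammas = [np.random.randn(16) for _ in range(4)]
betas = [np.random.randn(16) for _ in range(4)]
modulated = ensemble_film(audio_feat, gammas, betas)  # (6, 16)
fused = attention_fuse(modulated, ir_feat)            # (6, 16)
```

The key efficiency point mirrored here is that the ensemble lives entirely in the cheap FiLM parameters rather than in replicated network branches, which is what keeps memory overhead low on SWaP-constrained hardware.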