RL-CAM: Visual Explanations for Convolutional Networks Using Reinforcement Learning
Convolutional Neural Networks (CNNs) are state-of-the-art models for computer vision tasks such as image classification, object detection, and segmentation. However, these models offer little insight into how they reach their decisions, a serious limitation in fields such as healthcare and security, where interpretability is critical. Prior work has developed various methods for interpreting CNNs, including visualization-based approaches (e.g., saliency maps) that aim to reveal the features the model uses to make predictions. In this work, we propose a novel approach that uses reinforcement learning to generate visual explanations for CNNs. Our method treats the CNN as a black box and relies solely on the probability distribution over the model's outputs to localize the features contributing to a particular prediction. The proposed reinforcement learning agent has two actions: a forward action that explores the input image and identifies the most sensitive region to generate a localization mask, and a reverse action that fine-tunes this mask. We evaluate our approach using multiple image segmentation metrics and compare it with existing visualization-based methods. The experimental results demonstrate that the proposed method outperforms existing techniques, producing more accurate localization masks of the regions of interest in the input images.
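The forward/reverse procedure described above can be illustrated with a minimal sketch. This is not the paper's algorithm: the black-box model, the grid-cell action space, and the greedy occlusion-based reward below are all simplified stand-ins, assumed only for illustration of the idea of probing a model's output probability to build and then prune a localization mask.

```python
import numpy as np

def black_box_predict(image):
    # Hypothetical stand-in for the black-box CNN: returns a target-class
    # probability that depends on brightness in the top-left quadrant.
    return float(image[:4, :4].mean())

def sensitivity(image, mask, predict):
    # Drop in target-class probability when the masked region is occluded.
    occluded = image * (1 - mask)
    return predict(image) - predict(occluded)

def localize_sketch(image, predict, cell=2):
    h, w = image.shape
    mask = np.zeros((h, w))
    # "Forward action" analogue: explore grid cells and keep the one whose
    # occlusion most reduces the target-class probability.
    best, best_drop = None, 0.0
    for i in range(0, h, cell):
        for j in range(0, w, cell):
            trial = np.zeros((h, w))
            trial[i:i + cell, j:j + cell] = 1
            drop = sensitivity(image, trial, predict)
            if drop > best_drop:
                best_drop, best = drop, (i, j)
    if best is not None:
        i, j = best
        mask[i:i + cell, j:j + cell] = 1
    # "Reverse action" analogue: fine-tune the mask by removing pixels
    # whose exclusion does not weaken the sensitivity of the mask.
    for i, j in np.argwhere(mask == 1):
        trial = mask.copy()
        trial[i, j] = 0
        if sensitivity(image, trial, predict) >= best_drop - 1e-9:
            mask = trial
    return mask

image = np.zeros((8, 8))
image[:2, :2] = 1.0  # informative region the stand-in model relies on
mask = localize_sketch(image, black_box_predict)
```

In this toy setting, the sketch recovers a mask over the informative top-left pixels using only queries to `black_box_predict`, mirroring the paper's premise that localization can be driven by the output distribution alone.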