True Black-Box Explanation in Facial Analysis
When explaining a recognition approach used in facial analysis, e.g., face verification, face detection, or attribute recognition, the task is to answer how relevant each part of a given image is to the recognition decision. In many cases, however, the trained models cannot be manipulated and must be treated as "black-boxes". In this paper, we present a saliency map methodology, called MinPlus, that can explain any facial analysis approach without any access to the internals of the recognition model, because it only needs the input-output function of the black-box, f(x). The key idea of the method is to measure how the recognition probability of the given image changes when the image is perturbed. Our method removes and aggregates different parts of the image, and measures the contributions of these parts both individually and in combination. We test and compare our method in four different scenarios: face verification (with ArcFace), facial expression recognition (with Xception), face detection (with MTCNN), and masked face detection (with YOLOv5s). We conclude that MinPlus produces saliency maps that are stable and interpretable to humans. In addition, our method shows promising results in comparison with other state-of-the-art methods such as AVG, LIME, and RISE.
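To make the perturbation idea concrete, the following is a minimal sketch of a generic occlusion-based saliency map. It is an illustration of the general principle (mask a part, observe the score drop, aggregate), not the authors' MinPlus algorithm; the black-box function `f`, the patch size, and the fill value are all hypothetical assumptions for the example.

```python
import numpy as np

def occlusion_saliency(f, image, patch=8, stride=8, fill=0.0):
    """Score each region by how much occluding it lowers the black-box output f.

    f     : callable mapping an image array to a scalar score (the black box)
    image : 2-D (grayscale) numpy array
    """
    h, w = image.shape[:2]
    base = f(image)                       # score on the unperturbed image
    sal = np.zeros((h, w), dtype=float)   # accumulated contributions
    cnt = np.zeros((h, w), dtype=float)   # how often each pixel was masked
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            pert = image.copy()
            pert[y:y + patch, x:x + patch] = fill   # remove one part
            drop = base - f(pert)                   # that part's contribution
            sal[y:y + patch, x:x + patch] += drop
            cnt[y:y + patch, x:x + patch] += 1
    return sal / np.maximum(cnt, 1)       # aggregate overlapping perturbations
```

Usage with a toy black box whose score depends only on one fixed region: pixels inside that region receive a higher saliency value than pixels outside it, because masking them causes the largest score drop.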