Towards Black-Box Explainability With Gaussian Discriminant Knowledge Distillation
In this paper, we propose a method for post-hoc explainability of black-box models. The key component of the semantic and quantitative local explanation is a knowledge distillation (KD) process in which an explainable generative model mimics the teacher's behavior. To this end, we introduce a Concept Probability Density Encoder (CPDE) in conjunction with a Gaussian Discriminant Decoder (GDD) to describe the contribution of high-level concepts (e.g., object parts, color, shape). We argue that our objective function encourages both an explanation given by a set of likelihood ratios and a measure of how far the explainer deviates from the training data distribution of the concepts. The method can leverage any pre-trained concept classifier that provides concept scores (e.g., logits) or probabilities. We demonstrate the effectiveness of the proposed method in the context of object detection using the DensePose dataset.
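
To make the two quantities named in the abstract more concrete, the following minimal sketch shows how per-concept log-likelihood ratios and a distance from the concept distribution could be computed from concept scores under class-conditional Gaussians. It is an illustration only and not the CPDE/GDD architecture or objective of the paper: the diagonal-covariance (independent concepts) assumption, the use of a Mahalanobis distance as the deviation measure, the function names, the concept names, and all numeric values are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def concept_log_likelihood_ratios(scores, target, reference):
    """Per-concept log p(s_k | target) - log p(s_k | reference).

    `target` and `reference` are lists of (mean, variance) pairs, one per
    concept. Treating concepts as independent (diagonal covariance) is an
    assumption of this sketch; it makes the total log-ratio a sum of
    per-concept contributions.
    """
    return [
        gaussian_log_pdf(s, mt, vt) - gaussian_log_pdf(s, mr, vr)
        for s, (mt, vt), (mr, vr) in zip(scores, target, reference)
    ]


def mahalanobis_distance(scores, params):
    """Distance of the scores from a diagonal Gaussian.

    Large values indicate concept scores far from the fitted distribution.
    """
    return math.sqrt(sum((s - m) ** 2 / v for s, (m, v) in zip(scores, params)))


# Hypothetical concept scores (e.g. logits) for three illustrative concepts.
concepts = ["head", "torso", "legs"]
scores = [2.0, 0.0, -1.0]

# Hypothetical (mean, variance) pairs per concept for two classes.
person = [(2.0, 1.0), (1.0, 1.0), (0.0, 1.0)]
background = [(-2.0, 1.0), (1.0, 1.0), (0.0, 1.0)]

ratios = concept_log_likelihood_ratios(scores, person, background)
distance = mahalanobis_distance(scores, person)

for name, ratio in zip(concepts, ratios):
    print(f"{name}: log-likelihood ratio = {ratio:.3f}")
print(f"Mahalanobis distance to 'person' concept distribution = {distance:.3f}")
```

In this toy setting only the first concept separates the two classes, so it carries the entire log-likelihood ratio, while the distance indicates how typical the observed concept scores are for the target class.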