Probing for Artifacts: Detecting Imagenet Model Evasions

Jeremiah Rounds, Addie Kingsland, Michael J. Henry, Kayla R. Duskin; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 790-791


While deep learning models have made incredible progress across a variety of machine learning tasks, they remain vulnerable to adversarial examples crafted to fool otherwise trustworthy models. Previous work has proposed examining the internal activations of Imagenet models to detect adversarial examples. Our work expands the scale and scope of that research by simultaneously probing every activation within an Imagenet model using a novel probe block. The probe block model is trained against multiple adversarial algorithms to create a more robust detector. Parameterization of the probe block, and of the adversarial classification networks that consume probe block output, is examined in an ablation experiment with probes of Resnet-50, Inception-v3, and Xception. The adversarial classification networks considered include examples built with Mobilenet-v2, which is shown to outperform a VGG alternative at detecting adversarial artifacts. Results are compared against a logistic-regression feature-squeezing baseline, over which we suggest our approach is an improvement.
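As a rough illustration of the probing idea (a minimal sketch, not the authors' implementation), one way to summarize every internal activation of a network is to global-average-pool each intermediate feature map and concatenate the pooled vectors into a single feature vector, which a downstream adversarial-example detector can then classify. The layer shapes below are hypothetical stand-ins for activations drawn from a model such as Resnet-50.

```python
import numpy as np

def probe_features(activations):
    """Global-average-pool each (H, W, C) activation map to a length-C
    vector, then concatenate across layers into one probe feature vector."""
    return np.concatenate([a.mean(axis=(0, 1)) for a in activations])

# Hypothetical activations from three layers of increasing channel depth.
acts = [
    np.random.rand(56, 56, 64),
    np.random.rand(28, 28, 128),
    np.random.rand(7, 7, 512),
]
feats = probe_features(acts)
print(feats.shape)  # (704,) -- one pooled value per channel, all layers
```

In practice the paper's probe block is a learned module trained against multiple attack algorithms, rather than fixed pooling; this sketch only shows how per-layer activations can be reduced to a fixed-length signal for a detector.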

Related Material

@InProceedings{Rounds_2020_CVPR_Workshops,
    author    = {Rounds, Jeremiah and Kingsland, Addie and Henry, Michael J. and Duskin, Kayla R.},
    title     = {Probing for Artifacts: Detecting Imagenet Model Evasions},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2020},
    pages     = {790-791}
}