Robustness Against Gradient-Based Attacks Through Cost-Effective Network Fine-Tuning
Adversarial perturbations modify image pixels in an imperceptible manner so that a CNN classifier misclassifies the image, even though humans can still recognize the original class. Several defense algorithms against adversarial attacks have been proposed in the literature, such as binary classifiers that detect adversarial examples, and network retraining on perturbed images. The challenge with the adversarial detection approach is that once perturbed samples are detected, they are discarded, and the system requires fresh input. Adversarial training, on the other hand, requires generating adversarial images for data augmentation and is therefore computationally demanding. Training a deep CNN architecture is well known to be resource-intensive, so retraining from scratch is not feasible in resource-constrained scenarios. We propose computationally efficient fine-tuning of pre-trained networks to increase their robustness against prevalent gradient-based attacks. The proposed fine-tuning is performed in a complete black-box fashion, without knowledge of training settings such as the optimizer, batch size, and learning rate used to train the network. Extensive experiments on multiple CNN architectures, including VGG and ResNet, show that the proposed fine-tuning provides significant robustness against various widespread gradient-based attacks.
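For background, the gradient-based attacks referred to above follow the pattern of the fast gradient sign method (FGSM): each input feature is perturbed by a small ε in the direction of the sign of the loss gradient with respect to the input. The following is a minimal sketch on a toy logistic classifier, not the paper's CNN setting; all weights and values are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, w, b, y, eps):
    """FGSM: step each input feature by eps in the sign of dLoss/dx.

    For a logistic model p = sigmoid(w @ x + b) with binary cross-entropy
    loss, the gradient of the loss with respect to x is (p - y) * w.
    """
    p = sigmoid(w @ x + b)
    grad = (p - y) * w
    return x + eps * np.sign(grad)

# Toy classifier and a clean input it correctly assigns to class 1.
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([1.0, 0.0])

x_adv = fgsm(x, w, b, y=1.0, eps=1.0)
print(sigmoid(w @ x + b) > 0.5)      # clean input: predicted class 1 (True)
print(sigmoid(w @ x_adv + b) > 0.5)  # adversarial input: flips to class 0 (False)
```

On a real CNN the same one-step sign-of-gradient perturbation is applied to image pixels, with ε chosen small enough that the change is imperceptible to humans yet still flips the classifier's prediction.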