FAIR-TAT: Improving Model Fairness using Targeted Adversarial Training
Abstract
Deep neural networks are susceptible to adversarial attacks and common corruptions, which undermine their robustness. To enhance model resilience against such challenges, Adversarial Training (AT) has emerged as a prominent solution. Nevertheless, adversarial robustness is often attained at the expense of model fairness during AT, i.e., a disparity in class-wise robustness of the model. While distinctive classes become more robust towards such adversaries, hard-to-detect classes suffer. Recent research has focused on improving model fairness specifically for perturbed images, overlooking accuracy on non-perturbed data, which is the most likely input in practice. Additionally, despite their robustness against the adversaries encountered during model training, state-of-the-art adversarially trained models have difficulty maintaining robustness and fairness when confronted with diverse adversarial threats or common corruptions. In this work, we address the above concerns by introducing a novel approach called Fair Targeted Adversarial Training (FAIR-TAT). We show that using targeted adversarial attacks for adversarial training (instead of untargeted attacks) can allow for more favorable trade-offs with respect to adversarial fairness. Empirical results validate the efficacy of our approach.
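To make the core idea concrete, below is a minimal sketch of the targeted variant of adversarial training, assuming a PyTorch image classifier with inputs in [0, 1]. The L-infinity PGD hyperparameters and the uniform sampling of a wrong class as the attack target are illustrative assumptions only; the paper's actual FAIR-TAT target-selection scheme may differ.

import torch
import torch.nn.functional as F

def targeted_pgd(model, x, y_target, eps=8/255, alpha=2/255, steps=10):
    # Craft perturbations that push inputs TOWARD y_target (targeted attack),
    # rather than merely away from the true label (untargeted attack).
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y_target)
        grad, = torch.autograd.grad(loss, delta)
        # Descend on the target-class loss, i.e., move toward the target class,
        # and project back into the L-infinity ball of radius eps.
        delta = (delta - alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()

def train_step(model, optimizer, x, y, num_classes):
    # Illustrative assumption: sample a wrong class uniformly as the attack target.
    offset = torch.randint(1, num_classes, y.shape, device=y.device)
    y_target = (y + offset) % num_classes
    x_adv = targeted_pgd(model, x, y_target)
    optimizer.zero_grad()
    # As in standard AT, the outer step trains on adversarial inputs with true labels.
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()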
Related Material

[pdf] [supp] [bibtex]

@InProceedings{Medi_2025_WACV,
  author    = {Medi, Tejaswini and Jung, Steffen and Keuper, Margret},
  title     = {FAIR-TAT: Improving Model Fairness using Targeted Adversarial Training},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {7816-7825}
}