The Perils of Learning From Unlabeled Data: Backdoor Attacks on Semi-supervised Learning

Virat Shejwalkar, Lingjuan Lyu, Amir Houmansadr; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 4730-4740

Abstract


Semi-supervised learning (SSL) is gaining popularity as it reduces the cost of machine learning (ML) by training high-performance models using unlabeled data. In this paper, we reveal that the key feature of SSL, i.e., learning from (non-inspected) unlabeled data, exposes SSL to strong poisoning attacks that can severely compromise its security. Poisoning is a long-standing problem in conventional supervised ML, but we argue that, because SSL relies on non-inspected unlabeled data, poisoning poses a more significant threat to SSL. We demonstrate this by designing a backdoor poisoning attack on SSL that can be conducted by a weak adversary with no knowledge of the target SSL pipeline. This is unlike prior poisoning attacks on supervised ML that assume strong adversaries with impractical capabilities. We show that by poisoning only 0.2% of the unlabeled training data, our (weak) adversary can successfully cause misclassification on more than 80% of test inputs (when they contain the backdoor trigger). Our attack remains effective across different benchmark datasets and SSL algorithms, and even circumvents state-of-the-art defenses against backdoor attacks. Our work raises significant concerns about the security of SSL in real-world security-critical applications.
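To make the threat model concrete, the sketch below illustrates (in Python) how an adversary might stamp a small trigger patch onto roughly 0.2% of an unlabeled image pool. This is a minimal, hypothetical illustration, not the paper's actual attack: the function names, patch placement, patch value, and sample-selection strategy are all illustrative assumptions.

```python
import numpy as np

def add_trigger(image, patch_size=3, value=1.0):
    """Stamp a small square trigger patch in the bottom-right corner.

    Assumes `image` is an HxWxC float array in [0, 1]; the patch location,
    size, and value are illustrative choices, not the paper's.
    """
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:, :] = value
    return poisoned

def poison_unlabeled_pool(unlabeled_images, poison_rate=0.002, seed=0):
    """Poison a small fraction (e.g., 0.2%) of the unlabeled pool.

    Because the pool is unlabeled, the adversary only injects triggered
    images; no label manipulation is involved.
    """
    rng = np.random.default_rng(seed)
    n_poison = max(1, int(poison_rate * len(unlabeled_images)))
    idx = rng.choice(len(unlabeled_images), size=n_poison, replace=False)
    poisoned = [img.copy() for img in unlabeled_images]
    for i in idx:
        poisoned[i] = add_trigger(poisoned[i])
    return poisoned, idx
```

The point of the sketch is only to show the scale of the attacker's footprint: a few triggered samples mixed into a large non-inspected unlabeled set, with no access to labels or to the SSL training pipeline.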

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Shejwalkar_2023_ICCV,
    author    = {Shejwalkar, Virat and Lyu, Lingjuan and Houmansadr, Amir},
    title     = {The Perils of Learning From Unlabeled Data: Backdoor Attacks on Semi-supervised Learning},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {4730-4740}
}