ABC-CapsNet: Attention based Cascaded Capsule Network for Audio Deepfake Detection

Taiba Majid Wani, Reeva Gulzar, Irene Amerini; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 2464-2472


In response to the escalating challenge of audio deepfake detection, this study introduces ABC-CapsNet (Attention-Based Cascaded Capsule Network), a novel architecture that combines the perceptual strengths of Mel spectrograms with the robust feature extraction capabilities of VGG18, enhanced by a strategically placed attention mechanism. The architecture pioneers the use of cascaded capsule networks to capture complex patterns in audio data, setting a new standard in the precision of identifying manipulated audio content. ABC-CapsNet not only addresses the inherent limitations of traditional CNN models but also proves effective across diverse datasets. The proposed method achieves an equal error rate (EER) of 0.06% on the ASVspoof2019 dataset and 0.04% on the FoR dataset, underscoring its accuracy and reliability in combating the sophisticated threat of audio deepfakes.
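For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate (spoofed audio accepted as genuine) equals the false rejection rate (genuine audio rejected). The following is a minimal sketch of how it can be estimated from detector scores, not the authors' evaluation code. The function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER by scanning every observed score as a threshold.

    Assumption: a higher score means the detector considers the clip
    more likely to be bona fide (genuine).
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    # An infinite threshold covers the "reject everything" operating point.
    thresholds.append(float("inf"))

    best_gap = float("inf")
    best_eer = None
    for t in thresholds:
        # False acceptance: spoofed clips scoring at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: bona fide clips scoring below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # With finite data the two rates rarely cross exactly,
            # so report their mean at the closest point.
            best_gap = gap
            best_eer = (far + frr) / 2
    return best_eer


# Toy scores (hypothetical, not taken from the paper).
bonafide = [0.9, 0.8, 0.3, 0.7]
spoof = [0.1, 0.2, 0.6, 0.85]
eer = equal_error_rate(bonafide, spoof)
print(f"EER = {eer:.2%}")
```

On this scale, a reported EER of 0.06% corresponds to a return value of 0.0006. Evaluation toolkits typically interpolate between thresholds rather than taking the closest observed point, so this sketch may differ slightly from their output on small score sets.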

Related Material

@InProceedings{Wani_2024_CVPR,
    author    = {Wani, Taiba Majid and Gulzar, Reeva and Amerini, Irene},
    title     = {ABC-CapsNet: Attention based Cascaded Capsule Network for Audio Deepfake Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {2464-2472}
}