CNN-based System for Speaker Independent Cell-Phone Identification from Recorded Audio

Vinay Verma, Nitin Khanna; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019, pp. 53-61

Abstract


This paper proposes a cell-phone identification system that is independent of both the speech content and the speaker. Audio recorded from a cell-phone contains specific signatures corresponding to that cell-phone. These unique signatures, implicitly captured in the recorded audio, can be utilized to identify the cell-phone. They are visually more distinct in the frequency domain than in the time-domain signal. Thus, by utilizing the distinctiveness of the signatures in the frequency domain and the learning capability of the Convolutional Neural Network (CNN), we propose a system which learns unique signatures of the cell-phones from the frequency-domain representation of the audio. In particular, we have used the magnitude of the Discrete Fourier Transform (DFT) as the frequency representation of an audio signal. An extensive set of experiments performed on a large-duration dataset shows that the proposed system outperforms the existing state-of-the-art systems, notably in the cases where the recordings used for training and testing the systems contain mutually exclusive audio content as well as speakers.
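As a rough illustration of the frequency representation mentioned in the abstract, the following sketch computes the magnitude of the DFT of a one-dimensional audio signal with NumPy. The function name `dft_magnitude`, the 8 kHz sample rate, and the synthetic 440 Hz tone are assumptions made for this example only; the abstract does not state the framing, windowing, or scaling the authors apply before the CNN, so this should not be read as their preprocessing pipeline.

```python
import numpy as np


def dft_magnitude(signal, n_fft=None):
    """Return the magnitude of the one-sided DFT of a real 1-D signal.

    Hypothetical helper for illustration; not taken from the paper.
    """
    x = np.asarray(signal, dtype=np.float64)
    if x.ndim != 1:
        raise ValueError("expected a 1-D signal")
    # For real input, rfft returns only the non-negative frequency bins.
    return np.abs(np.fft.rfft(x, n=n_fft))


# Synthetic stand-in for recorded audio (assumed values, not from the paper).
sample_rate = 8000                       # samples per second
t = np.arange(sample_rate) / sample_rate  # one second of samples
tone = np.sin(2 * np.pi * 440.0 * t)      # pure 440 Hz tone

mag = dft_magnitude(tone)

# Convert the index of the strongest bin back to a frequency in Hz.
peak_hz = np.argmax(mag) * sample_rate / len(tone)
print(mag.shape, peak_hz)
```

In a system of the kind described, magnitude vectors like `mag` (typically computed per short frame rather than over a whole recording) would form the input from which a CNN learns device-specific signatures.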

Related Material


[bibtex]
@InProceedings{Verma_2019_CVPR_Workshops,
author = {Verma, Vinay and Khanna, Nitin},
title = {CNN-based System for Speaker Independent Cell-Phone Identification from Recorded Audio},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2019}
}