International Journal of Innovative Research in Computer and Communication Engineering

ISSN Approved Journal | Impact factor: 8.771 | ESTD: 2013 | Follows UGC CARE Journal Norms and Guidelines



TITLE Audio Deepfake Detection using Deep Learning Techniques
ABSTRACT With the rapid development of speech synthesis and voice conversion technologies, audio deepfakes have become a serious threat to Automatic Speaker Verification (ASV) systems. Numerous countermeasures have been proposed to detect this type of attack. In this paper, we report our efforts to combine the self-supervised WavLM model with a Multi-Fusion Attentive classifier for audio deepfake detection. Our method is the first to exploit the WavLM model to extract features conducive to spoofing detection. We then propose a novel Multi-Fusion Attentive (MFA) classifier based on the Attentive Statistics Pooling (ASP) layer; the MFA captures the complementary information of audio features at both the time and layer levels. Experiments demonstrate that our method achieves state-of-the-art results on the ASVspoof 2021 DF set and competitive results on the ASVspoof 2019 and 2021 LA sets.
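The two pooling ideas named in the abstract can be sketched in a few lines: a softmax-weighted sum over encoder layers (layer-level fusion) followed by Attentive Statistics Pooling over time. This is a minimal illustrative sketch, not the authors' exact MFA classifier; the array shapes, random weights, and single attention head are assumptions for demonstration.

```python
import numpy as np

def fuse_layers(layer_feats, w):
    """Layer-level fusion: softmax-weighted sum over encoder layers.
    layer_feats: (L, T, D) hidden states; w: (L,) learnable logits."""
    a = np.exp(w - w.max())
    a /= a.sum()                                  # softmax over layers
    return np.tensordot(a, layer_feats, axes=1)   # -> (T, D)

def attentive_stats_pooling(h, W, b, v):
    """Attentive Statistics Pooling (Okabe et al., 2018).
    h: (T, D) frame-level features -> (2*D,) utterance embedding,
    the concatenation of attention-weighted mean and std."""
    scores = np.tanh(h @ W + b) @ v               # (T,) attention scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                          # softmax over time
    mu = alpha @ h                                # weighted mean, (D,)
    var = alpha @ (h ** 2) - mu ** 2              # weighted variance
    sigma = np.sqrt(np.maximum(var, 1e-9))        # weighted std, (D,)
    return np.concatenate([mu, sigma])

# Toy demo: random stand-ins for WavLM hidden states and weights.
rng = np.random.default_rng(0)
L_layers, T, D = 4, 50, 8
feats = rng.standard_normal((L_layers, T, D))
h = fuse_layers(feats, rng.standard_normal(L_layers))
emb = attentive_stats_pooling(h, rng.standard_normal((D, D)),
                              rng.standard_normal(D), rng.standard_normal(D))
print(emb.shape)  # (16,)
```

In a trained system, `W`, `b`, `v`, and the layer logits would be learned end-to-end together with the classifier head, and the pooled embedding would feed a binary bona fide/spoof decision.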
AUTHOR PROF. DR. CHETANA PRAKASH, SAGAR K, SYED KHALIL Department of Computer Science and Design, Bapuji Institute of Engineering and Technology, Davangere, India
VOLUME 177
DOI 10.15680/IJIRCCE.2025.1312149
PDF pdf/149_Audio Deepfake Detection using Deep Learning Techniques.pdf
KEYWORDS
References 1. Z. K. Anjum and R. K. Swamy, “Spoofing and countermeasures for speaker verification: A review,” in 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), 2017, pp. 467–471.
2. A. Khan, K. M. Malik, J. Ryan, and M. Saravanan, “Voice spoofing countermeasures: Taxonomy, state-of-the-art, experimental analysis of generalizability, open challenges, and the way forward,” arXiv, vol. abs/2210.00417, 2022.
3. Y. Xie, Z. Zhang, and Y. Yang, “Siamese network with wav2vec feature for spoofing speech detection,” in Proc. Interspeech 2021, 2021, pp. 4269–4273.
4. A. Baevski, Y. Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A framework for self-supervised learning of speech representations,” in Advances in Neural Information Processing Systems, 2020, vol. 33, pp. 12449–12460.
5. J. M. Martín-Doñas and A. Álvarez, “The Vicomtech audio deepfake detection system based on wav2vec2 for the 2022 ADD challenge,” in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 9241–9245.
6. X. Wang and J. Yamagishi, “Investigating Self-Supervised Front Ends for Speech Spoofing Countermeasures,” in Proc. The Speaker and Language Recognition Workshop (Odyssey 2022), 2022, pp. 100–106.
7. A. Babu, C. Wang, A. Tjandra, K. Lakhotia, Q. Xu, N. Goyal, K. Singh, P. von Platen, Y. Saraf, J. Pino, A. Baevski, A. Conneau, and M. Auli, “XLS-R: Self-supervised cross-lingual speech representation learning at scale,” in Proc. Interspeech 2022, 2022, pp. 2278–2282.
8. H. Tak, M. Todisco, X. Wang, J.-w. Jung, J. Yamagishi, and N. Evans, “Automatic Speaker Verification Spoofing and Deepfake Detection Using Wav2vec 2.0 and Data Augmentation,” in Proc. The Speaker and Language Recognition Workshop (Odyssey 2022), 2022, pp. 112–119.
9. S. Chen, C. Wang, Z. Chen, Y. Wu, S. Liu, Z. Chen, J. Li, N. Kanda, T. Yoshioka, X. Xiao, J. Wu, L. Zhou, S. Ren, Y. Qian, Y. Qian, J. Wu, M. Zeng, X. Yu, and F. Wei, “WavLM: Large-scale self-supervised pre-training for full stack speech processing,” IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 6, pp. 1505–1518, July 2022.
10. K. Okabe, T. Koshinaka, and K. Shinoda, “Attentive Statistics Pooling for Deep Speaker Embedding,” in Proc. Interspeech 2018, 2018, pp. 2252–2256.
11. B. Desplanques, J. Thienpondt, and K. Demuynck, “ECAPA-TDNN: Emphasized channel attention, propagation and aggregation in TDNN based speaker verification,” in Proc. Interspeech 2020, 2020, pp. 3830–3834.
12. Z. Chen, S. Chen, Y. Wu, Y. Qian, C. Wang, S. Liu, Y. Qian, and M. Zeng, “Large-scale self-supervised speech representation learning for automatic speaker verification,” in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 6147–6151.
13. L. Pepino, P. Riera, and L. Ferrer, “Emotion Recognition from Speech Using wav2vec 2.0 Embeddings,” in Proc. Interspeech 2021, 2021, pp. 3400–3404.
14. S.-w. Yang, P.-H. Chi, Y.-S. Chuang, C.-I. J. Lai, K. Lakhotia, Y. Y. Lin, A. T. Liu, J. Shi, X. Chang, G.-T. Lin, T.-H. Huang, W.-C. Tseng, K.-t. Lee, D.-R. Liu, Z. Huang, S. Dong, S.-W. Li, S. Watanabe, A. Mohamed, and H.-y. Lee, “SUPERB: Speech processing universal performance benchmark,” in Proc. Interspeech 2021, 2021, pp. 1194–1198.
15. J. Yamagishi, X. Wang, M. Todisco, M. Sahidullah, J. Patino, A. Nautsch, X. Liu, K. A. Lee, T. Kinnunen, N. Evans, and H. Delgado, “ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection,” in Proc. 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge, 2021, pp. 47–54.
16. M. Todisco, X. Wang, V. Vestman, M. Sahidullah, H. Delgado, A. Nautsch, J. Yamagishi, N. Evans, T. H. Kinnunen, and K. A. Lee, “ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection,” in Proc. Interspeech 2019, 2019, pp. 1008–1012.
17. T. Kinnunen, K. A. Lee, H. Delgado, N. Evans, M. Todisco, M. Sahidullah, J. Yamagishi, and D. A. Reynolds, “t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification,” in Speaker Odyssey 2018 The Speaker and Language Recognition Workshop, 2018.
18. G. Hua, A. B. J. Teoh, and H. Zhang, “Towards end-to-end synthetic speech detection,” IEEE Signal Processing Letters, vol. 28, pp. 1265–1269, 2021.
19. M. Yang, K. Zheng, X. Wang, Y. Sun, and Z. Chen, “Comparative analysis of ASV spoofing countermeasures: Evaluating Res2Net-based approaches,” IEEE Signal Processing Letters, 2023.
20. Y. Zhang, W. Wang, and P. Zhang, “The effect of silence and dual-band fusion in anti-spoofing system,” in Proc. Interspeech, 2021.
21. H. Tak, J.-w. Jung, J. Patino, M. Kamble, M. Todisco, and N. Evans, “End-to-end spectro-temporal graph attention networks for speaker verification anti-spoofing and speech deepfake detection,” in Proc. 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge, 2021, pp. 1–8.
22. J.-w. Jung, H.-S. Heo, H. Tak, H.-j. Shim, J. S. Chung, B.-J. Lee, H.-J. Yu, and N. Evans, “AASIST: Audio anti-spoofing using integrated spectro-temporal graph attention networks,” in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 6367–6371.
23. B. Huang, S. Cui, J. Huang, and X. Kang, “Discriminative frequency information learning for end-to-end speech anti-spoofing,” IEEE Signal Processing Letters, vol. 30, pp. 185–189, 2023.
Copyright © IJIRCCE 2020. All rights reserved.