Journal Press India®

QLSTM-based Joint-Training for Noise Robust Hindi Speech Recognition

Vol 1, Issue 1, January-June 2021 | Pages: 13-21 | Research Paper

https://doi.org/10.17492/computology.v1i1.2102


Author Details (* denotes corresponding author)

1. * Ankit Kumar, Department of Computer Science & Information Technology, KIET Group of Institution, Ghaziabad, Uttar Pradesh, India (anketvit@gmail.com)

In recent years, speech recognition has benefited greatly from deep learning. Current technology reports substantial improvements; however, speech recognition still performs poorly in noisy environments, so improving recognition under noisy conditions remains a critical task. The goal of this work is to propose a high-accuracy, noise-robust Hindi speech recognition system. To this end, we apply a bidirectional Quaternion Long Short-Term Memory (QLSTM) neural network to train the speech enhancement and speech recognition models jointly. The roles of the i-vector and the Recurrent Neural Network (RNN) language model are also investigated. All experiments were carried out using a 2.5-hour Hindi speech dataset and the Kaldi and PyTorch-Kaldi toolkits. The proposed model achieves a 2% Word Error Rate (WER) reduction over state-of-the-art (SOTA) techniques.
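For readers unfamiliar with the metric, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example strings are illustrative assumptions. This is not the scoring code used in the paper, which reports results obtained with the Kaldi and PyTorch-Kaldi toolkits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenization is plain
    whitespace splitting, which is an assumption of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution (b -> x) and one deletion (d) over four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.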

Keywords

Quaternion Neural Network; Joint-training; Hindi Speech Recognition; Noise-Robust ASR




