Computology: Journal of Applied Computer Science and Intelligent Technologies
QLSTM-based Joint-Training for Noise Robust Hindi Speech Recognition

Ankit Kumar, Department of Computer Science & Information Technology, KIET Group of Institution, Ghaziabad, Uttar Pradesh, India

In recent years, speech recognition has benefited substantially from deep learning. Current systems report substantial improvements; however, they still perform poorly in noisy environments, so improving recognition under noisy conditions remains a critical task. The goal of this work is to propose a high-accuracy, noise-robust Hindi speech recognition system. To this end, we apply a bidirectional Quaternion Long Short-Term Memory (QLSTM) neural network to jointly train the speech enhancement and speech recognition models. The roles of the i-vector and the Recurrent Neural Network (RNN) language model are also investigated. All experiments were carried out on a 2.5-hour Hindi speech dataset using the Kaldi and PyTorch-Kaldi toolkits. The proposed model achieves a 2% Word Error Rate (WER) reduction over state-of-the-art (SOTA) techniques.


Quaternion Neural Network; Joint-training; Hindi Speech Recognition; Noise-Robust ASR

