Author Details
(*) denotes corresponding author
In recent years, the field of speech recognition has benefited greatly from deep learning. Current systems report substantial improvements; however, speech recognition still performs poorly in noisy environments, so improving recognition under noisy conditions remains a critical task. The goal of this work is to propose a high-accuracy, noise-robust Hindi speech recognition system. To this end, we apply a bidirectional Quaternion Long Short-Term Memory (QLSTM) neural network to train the speech enhancement and speech recognition models jointly. The roles of the i-vector and the Recurrent Neural Network (RNN) language model are also investigated. All experiments were carried out using a 2.5-hour Hindi speech dataset and the Kaldi and PyTorch-Kaldi toolkits. The proposed model achieves a 2% Word Error Rate (WER) reduction over state-of-the-art (SOTA) techniques.
Keywords
Quaternion Neural Network; Joint-training; Hindi Speech Recognition; Noise-Robust ASR