A comparison between a conventional LSTM network and a grid LSTM network applied on speech recognition

University essay from KTH/Skolan för teknikvetenskap (SCI)

Author: Gustav Edholm; Xuechen Zuo; [2018]

Abstract: In this paper, the conventional LSTM network and the one-dimensional grid LSTM network are compared on single-word speech recognition. The performance of the networks is measured in terms of accuracy and training time. The conventional LSTM model is the current state-of-the-art method for speech recognition. However, the grid LSTM architecture has proven successful in other empirical tasks such as translation and handwriting recognition. When the two networks were implemented in the same training framework and trained on the same data of single-word audio files, the conventional LSTM network yielded an accuracy of 64.8 % while the grid LSTM network yielded an accuracy of 65.2 %. Statistically, there was no difference in accuracy between the models. In addition, the conventional LSTM network took 2 % longer to train. However, this difference in training time is considered to be of little significance when translated into absolute time. Thus, it can be concluded that the one-dimensional grid LSTM model performs just as well as the conventional one.
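The abstract does not state which statistical test supports the "no difference" conclusion or how many test utterances were used. As a rough illustration only, the sketch below assumes a two-proportion z-test and a hypothetical test set of 1000 utterances per model; neither assumption comes from the essay. It shows why a 0.4 percentage point gap (64.8 % vs 65.2 %) is unlikely to be significant at test-set sizes of that order.

```python
import math


def two_proportion_z_test(acc_a: float, acc_b: float, n_a: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two accuracy rates.

    Assumes each accuracy is the fraction of n independent test utterances
    classified correctly. This is an illustrative assumption, not the essay's method.
    """
    # Pooled accuracy under the null hypothesis that both models are equally accurate
    pooled = (acc_a * n_a + acc_b * n_b) / (n_a + n_b)
    # Standard error of the difference between the two proportions
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (acc_b - acc_a) / se
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Accuracies are from the abstract; the test-set size of 1000 is hypothetical
z, p = two_proportion_z_test(0.648, 0.652, n_a=1000, n_b=1000)
print(f"z = {z:.3f}, p = {p:.3f}")
```

With these assumed sizes the p-value is far above 0.05. Because both models are evaluated on the same test files, a paired test such as McNemar's test on per-utterance predictions would be a stricter choice if those predictions are available.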

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)