Understanding of Speech and Speaker Model for Recognition of a Language

Tilendra Shishir Sinha, Gautam Sanyal


In this paper, two soft-computing techniques, a hybrid approach and a genetic algorithm, are used for simultaneous automatic speech and speaker recognition (ASSR) from a noise-free artificial word model (AWM) and a vowel-diphthong model (VDM). For recognition, the test data are first analysed using the methods adopted for forming the AWM and VDM; the trained data set is then matched for the best fit, not only by forward-backward dynamic programming (FBDP) but also by a genetic algorithm, to obtain the optimal solution. The authors postulate a mechanism for using a noise-free AWM and VDM in the recognition process and propose an algorithm called RCGSTSS (real-coded genetic search technique for the speech and speaker). The algorithm has been implemented and tested on the Bengali (Indian) language, and its complexity has been computed. A noise-free AWM is used for speech recognition with an average accuracy of 86%, and a noise-free VDM is used for speaker recognition with an average accuracy of 92%. The algorithm was tested with 12 Bengali speakers of varying ages, giving an average out-of-vocabulary (OOV) word error rate of 14.82%.


Artificial word model (AWM); Forward-backward dynamic programming (FBDP); Unidirectional temporary associative memory (UTAM); Genetic algorithm (GA); Automatic speech and speaker recognition (ASSR)
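
The abstract describes matching trained data against test data for the best fit with a genetic algorithm. The following is only a minimal illustrative sketch of a generic real-coded genetic search used for template matching; it is not the authors' RCGSTSS algorithm, whose details are not given here. The function names (real_coded_ga, best_fit), the gain/offset parameterisation, the parameter bounds, and the toy feature vectors are all assumptions made for illustration.

```python
import random


def real_coded_ga(fitness, bounds, pop_size=40, generations=60,
                  mutation_rate=0.2, seed=0):
    """Minimise `fitness` over real-valued genes constrained to `bounds`.

    Hypothetical helper: a plain real-coded GA with tournament selection,
    arithmetic crossover, Gaussian mutation, and elitism.
    """
    rng = random.Random(seed)
    # Each individual is a list of floats, one gene per (low, high) bound.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = min(pop, key=fitness)

    def tournament():
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) < fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            w = rng.random()  # arithmetic (blend) crossover weight
            child = [w * x + (1 - w) * y for x, y in zip(p1, p2)]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] += rng.gauss(0, 0.1 * (hi - lo))
                child[i] = min(hi, max(lo, child[i]))  # clip to bounds
            children.append(child)
        pop = children
        best = min(pop, key=fitness)
    return best, fitness(best)


def best_fit(test, templates, seed=0):
    """Return (label, error) of the template closest to `test`.

    Assumption: each template may differ from the test vector by a gain
    and an offset, which the GA searches for.
    """
    errors = {}
    for label, tpl in templates.items():
        def err(genes, tpl=tpl):
            gain, offset = genes
            return sum((gain * t + offset - x) ** 2 for t, x in zip(tpl, test))
        _, errors[label] = real_coded_ga(err, [(0.5, 2.0), (-1.0, 1.0)], seed=seed)
    label = min(errors, key=errors.get)
    return label, errors[label]


# Toy feature vectors (made up for illustration, not taken from the paper).
templates = {"word_a": [0.1, 0.5, 0.9, 0.4], "word_b": [0.8, 0.2, 0.1, 0.7]}
test = [1.2 * v + 0.05 for v in templates["word_a"]]
label, error = best_fit(test, templates)
print(label, error)
```

In this sketch the genes are real numbers rather than bit strings, which is what "real-coded" conventionally means; a dynamic-programming matcher could be run on the same templates for comparison, as the abstract suggests for FBDP.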

