Assamese Numeral Speech Recognition using Multiple Features and Cooperative LVQ-Architectures
Commenced in January 2007
Frequency: Monthly
Edition: International

Authors: Manash Pratim Sarma, Kandarpa Kumar Sarma

Abstract:

This paper presents a set of Artificial Neural Network (ANN) based methods for designing an effective speech recognition system for numerals of the Assamese language, captured under varied recording conditions and moods. The work formulates several ANN models configured to use Linear Predictive Coding (LPC), Principal Component Analysis (PCA) and other features to handle mood and gender variations among speakers uttering numerals, as part of an Automatic Speech Recognition (ASR) system for Assamese, a language spoken by a sizable population in the north-eastern part of India. The ANN models combine a Self-Organizing Map (SOM) and a Multi-Layer Perceptron (MLP) to constitute a Learning Vector Quantization (LVQ) block, which is trained in a cooperative environment to handle male and female speech samples of Assamese numerals. The work provides a comparative evaluation of several such combinations on speech samples with gender-based differences, captured by a microphone under four conditions, namely noiseless, noise-mixed, stressed and stress-free.
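The abstract names LPC features and an LVQ classifier without spelling out how either works. As a rough illustration only, the sketch below computes per-frame LPC coefficients with the standard autocorrelation method and Levinson-Durbin recursion, then classifies the resulting vectors with Kohonen's basic LVQ1 rule. It is not the authors' cooperative SOM/MLP architecture: the PCA stage, the SOM and MLP, and the cooperative training are omitted, and the Hamming window, LPC order, learning-rate schedule, prototype counts and the synthetic AR(1) "frames" (`ar1_frame`, a hypothetical stand-in for recorded numerals) are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def hamming(n: int) -> List[float]:
    """Hamming window of length n (assumed analysis window, n > 1)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of a single frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc(frame: Sequence[float], order: int) -> Tuple[List[float], float]:
    """LPC coefficients a[1..p] for the predictor x[n] ~ sum_k a[k] * x[n-k],
    found with the Levinson-Durbin recursion. Returns (coefficients, residual energy)."""
    r = autocorrelation(frame, order)
    if r[0] == 0.0:  # silent frame: nothing to predict
        return [0.0] * order, 0.0
    a = [0.0] * (order + 1)  # a[0] is unused padding
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coeff.
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


Prototype = Tuple[List[float], str]


def _nearest(prototypes: List[Prototype], x: Sequence[float]) -> int:
    """Index of the prototype closest to x in squared Euclidean distance."""
    return min(range(len(prototypes)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(prototypes[i][0], x)))


def train_lvq1(data, labels, protos_per_class=1, lr=0.1, epochs=30, seed=0):
    """Kohonen's LVQ1: pull the winning prototype toward a sample of its own class,
    push it away from a sample of another class."""
    rng = random.Random(seed)
    prototypes: List[Prototype] = []
    for c in sorted(set(labels)):  # initialise from random samples of each class
        members = [x for x, y in zip(data, labels) if y == c]
        prototypes += [(list(x), c) for x in rng.sample(members, protos_per_class)]
    order = list(range(len(data)))
    for epoch in range(epochs):
        alpha = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        rng.shuffle(order)
        for idx in order:
            x, y = data[idx], labels[idx]
            w, c = prototypes[_nearest(prototypes, x)]
            sign = 1.0 if c == y else -1.0
            for d in range(len(w)):
                w[d] += sign * alpha * (x[d] - w[d])  # updates the prototype in place
    return prototypes


def classify(prototypes: List[Prototype], x: Sequence[float]) -> str:
    """Label of the nearest prototype."""
    return prototypes[_nearest(prototypes, x)][1]


def ar1_frame(coef: float, n: int, rng: random.Random) -> List[float]:
    """Hypothetical stand-in for a speech frame: a noise-driven AR(1) process."""
    x, prev = [], 0.0
    for _ in range(n):
        prev = coef * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


if __name__ == "__main__":
    rng, win = random.Random(1), hamming(256)
    feats, labs = [], []
    for label, coef in (("class-A", 0.9), ("class-B", -0.5)):
        for _ in range(20):
            frame = [s * w for s, w in zip(ar1_frame(coef, 256, rng), win)]
            feats.append(lpc(frame, order=2)[0])
            labs.append(label)
    protos = train_lvq1(feats, labs, protos_per_class=2)
    acc = sum(classify(protos, x) == y for x, y in zip(feats, labs)) / len(labs)
    print(f"training accuracy: {acc:.2f}")
```

To apply this to real recordings, one would replace `ar1_frame` with windowed frames of the captured utterances and decide how per-frame LPC vectors become one feature vector per numeral (for example by averaging or concatenating them); the abstract does not say how the paper does this, so that step is left as an assumption.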

Keywords: Assamese, Recognition, LPC, Spectral, ANN.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1079784

