Recognition by Online Modeling – a New Approach of Recognizing Voice Signals in Linear Time

Jyh-Da Wei; Hsin-Chen Tsai

Abstract:

This work presents a novel means of extracting fixed-length parameters from voice signals, such that words can be recognized in linear time. The power and the zero-crossing rate are first calculated segment by segment from a voice signal, which yields two feature sequences. We then construct an FIR system across these two sequences. The parameters of this FIR system, used as the input of a multilayer perceptron recognizer, can be derived by recursive LSE (least-squares estimation), implying that the complexity of the overall process is linear in the signal size. In the second part of this work, we introduce a weighting factor λ to emphasize recent input, which allows us to further recognize continuous speech signals. Experiments employ the voice signals of numbers, from zero to nine, spoken in Mandarin Chinese. The proposed method is verified to recognize voice signals efficiently and accurately.
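
The feature extraction and recursive estimation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `segment_features` and `rls_fir`, the choice of the zero-crossing rate as the FIR input and the power as the FIR output, the use of non-overlapping segments, and the initialisation constant `delta` are all assumptions made for illustration; the abstract does not specify them. The multilayer perceptron stage is omitted.

```python
def segment_features(signal, seg_len):
    """Split a signal into non-overlapping segments (an assumption) and
    return two sequences: mean power and zero-crossing rate per segment."""
    if seg_len < 2:
        raise ValueError("seg_len must be at least 2")
    powers, zcrs = [], []
    for start in range(0, len(signal) - seg_len + 1, seg_len):
        seg = signal[start:start + seg_len]
        powers.append(sum(s * s for s in seg) / seg_len)
        # A crossing is counted when adjacent samples differ in sign.
        crossings = sum(1 for a, b in zip(seg, seg[1:]) if (a >= 0) != (b >= 0))
        zcrs.append(crossings / (seg_len - 1))
    return powers, zcrs


def rls_fir(x, y, order, lam=1.0, delta=1e6):
    """Fit y[n] ~ sum_k w[k] * x[n-k] by recursive least squares.

    lam is the weighting (forgetting) factor: lam = 1 weights all samples
    equally, lam < 1 emphasises recent input. delta scales the initial
    inverse-correlation matrix (hypothetical default).
    """
    w = [0.0] * order
    P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    for n in range(order - 1, len(x)):
        u = [x[n - k] for k in range(order)]          # x[n], x[n-1], ...
        Pu = [sum(P[i][j] * u[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(u[i] * Pu[i] for i in range(order))
        gain = [v / denom for v in Pu]
        err = y[n] - sum(w[i] * u[i] for i in range(order))
        w = [w[i] + gain[i] * err for i in range(order)]
        # P stays symmetric, so u^T P equals Pu transposed.
        P = [[(P[i][j] - gain[i] * Pu[j]) / lam for j in range(order)]
             for i in range(order)]
    return w
```

Each update costs a fixed amount of work for a given filter order, so one pass over the feature sequences is linear in the number of segments, and the returned weight vector has a fixed length regardless of the signal duration. Under the assumptions above, a call such as `rls_fir(zcrs, powers, order=4, lam=0.98)` would produce the fixed-length vector that is then fed to a recognizer.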

Keywords: Speech Recognition, FIR system, Recursive LSE, Multilayer Perceptron

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1330727
