Formant Tracking Linear Prediction Model using HMMs for Noisy Speech Processing

Zaineb Ben Messaoud; Dorra Gargouri; Saida Zribi; Ahmed Ben Hamida

Abstract:

This paper presents a formant-tracking linear prediction (FTLP) model for speech processing in noise. The main focus of this work is the detection of formant trajectories using Hidden Markov Models (HMMs), for improved formant estimation in noise. The proposed approach provides a systematic framework for modelling and using a time sequence of spectral peaks that satisfies continuity constraints on the parameters; the peaks themselves are modelled by the LP parameters. The estimation of the formant-tracking LP model is composed of three stages: (1) a pre-cleaning multi-band spectral subtraction stage, which reduces the effect of residual noise on the formants; (2) an estimation stage, in which an initial estimate of the LP model of speech is obtained for each frame; and (3) a formant classification stage, which uses probability models of the formants and Viterbi decoders. Evaluation results for the formant-tracking LP model, tested in a white Gaussian noise background, demonstrate that combining the initial noise reduction stage with formant tracking and variable-order LPC analysis results in a significant reduction in errors and distortions. The performance was evaluated with noisy natural vowels extracted from international French and English vocabulary speech signals at an SNR of 10 dB. In each case, the estimated formants are compared to reference formants.
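
To make the third stage more concrete, the following is a minimal Python sketch of Viterbi decoding over per-frame peak candidates for a single formant. It is not the authors' implementation: the function name `viterbi_formant_track`, the single Gaussian prior (`mean_hz`, `sigma_hz`) standing in for a trained formant probability model, and the quadratic continuity penalty scaled by `jump_hz` are all assumptions made for illustration. The candidate frequencies are assumed to come from an earlier LP analysis stage.

```python
def viterbi_formant_track(candidates, mean_hz, sigma_hz=150.0, jump_hz=100.0):
    """Pick one candidate frequency per frame for a single formant.

    candidates: list of frames, each a non-empty list of peak frequencies in Hz.
    mean_hz, sigma_hz: hypothetical Gaussian prior for this formant.
    jump_hz: hypothetical scale of the frame-to-frame continuity penalty.
    Returns the lowest-cost track, one frequency per frame.
    """
    def emission(f):
        # Negative log-likelihood (up to a constant) under the Gaussian prior.
        return 0.5 * ((f - mean_hz) / sigma_hz) ** 2

    def transition(f_prev, f):
        # Continuity constraint: large jumps between frames are penalised.
        return 0.5 * ((f - f_prev) / jump_hz) ** 2

    # cost[j] is the best accumulated cost of any path ending at candidate j.
    cost = [emission(f) for f in candidates[0]]
    backptrs = []
    for t in range(1, len(candidates)):
        prev, cur = candidates[t - 1], candidates[t]
        new_cost, ptr = [], []
        for f in cur:
            best_i = min(range(len(prev)),
                         key=lambda i: cost[i] + transition(prev[i], f))
            new_cost.append(cost[best_i] + transition(prev[best_i], f) + emission(f))
            ptr.append(best_i)
        cost = new_cost
        backptrs.append(ptr)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(cost)), key=cost.__getitem__)
    track = [candidates[-1][j]]
    for t in range(len(candidates) - 1, 0, -1):
        j = backptrs[t - 1][j]
        track.append(candidates[t - 1][j])
    track.reverse()
    return track
```

For example, with the candidate frames `[[480, 1500], [510, 1450], [900, 520, 1480], [530, 1400]]` and `mean_hz=500`, the decoder follows the low-frequency peaks and skips the spurious 900 Hz peak in the third frame, because both the prior and the continuity penalty make that path expensive. A full system in the spirit of the abstract would replace the fixed Gaussian prior with trained HMM state distributions.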

Keywords: Formant Estimation, HMM, Multi-Band Spectral Subtraction, Variable-Order LPC Coding, White Gaussian Noise.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1060325
