Robust Features for Impulsive Noisy Speech Recognition Using Relative Spectral Analysis
Authors: Hajer Rahali, Zied Hajaiej, Noureddine Ellouze
Abstract:
The goal of speech parameterization is to extract the relevant information about what is being spoken from the audio signal. In speech recognition systems, Mel-Frequency Cepstral Coefficients (MFCC) and Relative Spectral Mel-Frequency Cepstral Coefficients (RASTA-MFCC) are the two main techniques used. This paper presents some modifications to the original MFCC method. The effectiveness of the proposed changes, called Modified Function Cepstral Coefficients (MODFCC), was tested and compared against the original MFCC and RASTA-MFCC features. Prosodic features such as jitter and shimmer are added to the baseline spectral features. The above-mentioned techniques were tested with impulsive signals under various noisy conditions within the AURORA databases.
Keywords: Auditory filter, impulsive noise, MFCC, prosodic features, RASTA filter.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1095933