Commenced in January 2007
Frequency: Monthly
Edition: International
Speaker Identification Using Admissible Wavelet Packet Based Decomposition
Authors: Mangesh S. Deshpande, Raghunath S. Holambe
Abstract:
Mel Frequency Cepstral Coefficient (MFCC) features are widely used as acoustic features for both speech recognition and speaker recognition. In the MFCC representation, the Mel frequency scale gives high resolution in the low-frequency region and low resolution in the high-frequency region. This processing is well suited to obtaining stable phonetic information, but not to capturing speaker features located in the high-frequency regions. The speaker-specific information, which is non-uniformly distributed in the high frequencies, is equally important for speaker recognition. Based on this fact, we propose an admissible wavelet packet based filter structure for speaker identification. The multiresolution capabilities of the wavelet packet transform are used to derive the new features. The proposed scheme differs from previous wavelet-based work mainly in the design of the filter structure: unlike the others, it does not follow the Mel scale. Closed-set speaker identification experiments performed on the TIMIT database show improved identification performance compared to other commonly used Mel scale based filter structures using wavelets.
Keywords: Speaker identification, Wavelet transform, Feature extraction, MFCC, GMM.
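To make the resolution trade-off of the Mel scale concrete, the following minimal Python sketch computes the edges of bands spaced uniformly on the Mel scale and prints how their widths in Hz grow with frequency. It is an illustration only, not the filter structure proposed in the paper. The conversion formula mel = 2595 log10(1 + f/700) is one commonly used variant and is an assumption here, as are the choice of 24 bands and the 8 kHz upper limit (the Nyquist frequency for 16 kHz speech such as TIMIT). The function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    # Common Mel-scale formula (assumed variant; the abstract does not specify one).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_bands, f_max_hz):
    # Edges of n_bands bands that are equally wide on the Mel scale,
    # returned in Hz, covering 0 Hz up to f_max_hz.
    m_max = hz_to_mel(f_max_hz)
    return [mel_to_hz(m_max * i / n_bands) for i in range(n_bands + 1)]


# Illustrative values: 24 bands up to 8 kHz (assumptions, not from the paper).
edges = mel_band_edges(n_bands=24, f_max_hz=8000.0)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]

for i, (lo, w) in enumerate(zip(edges, widths)):
    print(f"band {i:2d}: starts at {lo:7.1f} Hz, width {w:6.1f} Hz")
```

The printed widths increase steadily from the lowest band to the highest, which is the behaviour the abstract describes: fine resolution at low frequencies and coarse resolution at high frequencies, where speaker-specific information is said to lie.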
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1330917