Accent Identification by Clustering and Scoring Formants

Dejan Stantic; Jun Jo

Abstract:

There have been significant improvements in automatic voice recognition technology. However, existing systems still face difficulties, particularly when used by non-native speakers with accents. In this paper we address the problem of identifying the accent in English speech produced by speakers from different backgrounds. Once an accent is identified, the speech recognition software can use a training set for the appropriate accent and thereby improve the efficiency and accuracy of the speech recognition system. We introduce the Q factor, which is defined as the sum of relationships between the frequencies of the formants. Experiments were carried out on four different accents. A scoring method is introduced in order to analyse accents effectively. The proposed concept indicates that an accent can be identified by analysing its formants.

Keywords: Accent Identification, Formants, Q Factor.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1060992
