Adaption Model for Building Agile Pronunciation Dictionaries Using Phonemic Distance Measurements

Akella Amarendra Babu; Rama Devi Yellasiri; Natukula Sainath

Commenced in January 2007

Frequency: Monthly

Edition: International

Abstract:

Whereas human beings can easily learn and adopt pronunciation variations, machines need training before being put into use. Humans also keep a minimal vocabulary, with its pronunciation variations stored at the front end of memory for ready reference, while machines keep the entire pronunciation dictionary for ready reference. The supervised methods used to prepare pronunciation dictionaries take large amounts of manual effort, cost, and time, and are not suitable for real-time use. This paper presents an unsupervised adaptation model for building agile and dynamic pronunciation dictionaries online. These methods mimic the human approach to learning new pronunciations in real time. A new algorithm for measuring sound distances, called Dynamic Phone Warping, is presented and tested. System performance is measured using the adaptation model, and the precision is found to be better than 86 percent.
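
The abstract names Dynamic Phone Warping but does not give its details. As a rough illustration of what a dynamic-programming distance between two pronunciations could look like, the sketch below computes a generic weighted edit distance over phoneme sequences. It should not be read as the authors' algorithm: the function names, the uniform substitution and insertion/deletion costs, and the ARPAbet-style phoneme symbols are all assumptions made for this example.

```python
from typing import Callable, Sequence


def substitution_cost(a: str, b: str) -> float:
    """Hypothetical cost: 0 for identical phonemes, 1 otherwise.

    A real system would likely use a phonemic distance table here.
    """
    return 0.0 if a == b else 1.0


def phoneme_distance(
    p: Sequence[str],
    q: Sequence[str],
    sub_cost: Callable[[str, str], float] = substitution_cost,
    indel_cost: float = 1.0,  # assumed uniform insertion/deletion cost
) -> float:
    """Weighted edit distance between two phoneme sequences."""
    n, m = len(p), len(q)
    # d[i][j] = cheapest alignment of p[:i] with q[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,  # delete p[i-1]
                d[i][j - 1] + indel_cost,  # insert q[j-1]
                d[i - 1][j - 1] + sub_cost(p[i - 1], q[j - 1]),  # substitute
            )
    return d[n][m]


def normalized_distance(p: Sequence[str], q: Sequence[str]) -> float:
    """Distance divided by the longer sequence length (0 for two empty inputs)."""
    return phoneme_distance(p, q) / max(len(p), len(q), 1)


# Two illustrative pronunciations of "tomato" (ARPAbet-style symbols, assumed).
variant_a = ["T", "AH", "M", "EY", "T", "OW"]
variant_b = ["T", "AH", "M", "AA", "T", "OW"]
print(phoneme_distance(variant_a, variant_b))
print(normalized_distance(variant_a, variant_b))
```

Under these assumptions, a small normalized distance would suggest that two phoneme sequences are variants of the same word, which is the kind of decision an unsupervised dictionary-adaptation step needs to make. Any threshold for that decision would have to come from the paper itself.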

Keywords: Pronunciation variations, dynamic programming, machine learning, natural language processing.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1316865
