Application of Smooth Ergodic Hidden Markov Model in Text to Speech Systems

Authors: Armin Ghayoori, Faramarz Hendessi, Asrar Sheikh

Abstract:

In developing a text-to-speech system, it is well known that the accuracy of the information extracted from the text is crucial to producing high-quality synthesized speech. This paper introduces and develops a new scheme for converting text into its equivalent phonetic spelling. The method is applicable to many text-to-speech conversion systems and offers many advantages over other methods; it can also complement those methods to improve their performance. The proposed method is a probabilistic model based on the Smooth Ergodic Hidden Markov Model (SEHMM), which can be considered an extension of the hidden Markov model (HMM). The method is applied to the Persian language, and its accuracy in converting text to its phonetic form is evaluated using simulations.

Keywords: Hidden Markov Models, text, synthesis.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1070939
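
The abstract treats text-to-phoneme conversion as decoding with a hidden Markov model. As a rough illustration only, the sketch below runs standard Viterbi decoding on a plain HMM in which hidden states stand for phonemes and observations stand for letters. It is not the SEHMM of the paper, whose smoothing scheme the abstract does not describe. The names `viterbi`, `STATES`, `START_P`, `TRANS_P` and `EMIT_P`, the Latin-letter toy word, and all probabilities are hypothetical values chosen for the example, not data from the paper.

```python
import math


def _log(p):
    # Log of a probability; zero probability maps to -inf instead of raising.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state (phoneme) sequence."""
    # best[t][s]: log-probability of the best path that ends in state s at step t.
    best = [{s: _log(start_p[s]) + _log(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path score into s.
            prev, score = max(
                ((r, best[t - 1][r] + _log(trans_p[r].get(s, 0.0))) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + _log(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy model: the letter "c" may be pronounced as phoneme k or s.
STATES = ["k", "s", "a", "t"]
START_P = {"k": 0.4, "s": 0.3, "a": 0.2, "t": 0.1}
TRANS_P = {
    "k": {"k": 0.1, "s": 0.1, "a": 0.6, "t": 0.2},
    "s": {"k": 0.1, "s": 0.1, "a": 0.5, "t": 0.3},
    "a": {"k": 0.2, "s": 0.2, "a": 0.1, "t": 0.5},
    "t": {"k": 0.2, "s": 0.3, "a": 0.4, "t": 0.1},
}
EMIT_P = {
    "k": {"c": 0.7, "k": 0.3},
    "s": {"c": 0.4, "s": 0.6},
    "a": {"a": 1.0},
    "t": {"t": 1.0},
}

print(viterbi(list("cat"), STATES, START_P, TRANS_P, EMIT_P))  # prints ['k', 'a', 't']
```

In this toy model the ambiguous letter "c" resolves to the phoneme k because that path has the higher joint probability. In a trained system the probabilities would be estimated from a pronunciation lexicon rather than set by hand.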

