SMaTTS: Standard Malay Text to Speech System

Othman O. Khalifa; Zakiah Hanim Ahmad; Teddy Surya Gunawan

Abstract:

This paper presents SMaTTS, a rule-based text-to-speech (TTS) synthesis system for Standard Malay (SM). The proposed system uses a sinusoidal method and some pre-recorded wave files to generate speech. The use of a phone database significantly decreases the amount of computer memory used, making the system very light and embeddable. The overall system comprises two phases. The first is Natural Language Processing (NLP), which consists of the high-level processing: text analysis, phonetic analysis, text normalization and a morphophonemic module. The morphophonemic module was designed specifically for SM to overcome several problems in defining rules for the SM orthography before the text is passed to the DSP module. The second phase is Digital Signal Processing (DSP), which handles the low-level process of speech waveform generation. An intelligible and adequately natural-sounding formant-based speech synthesis system with a light and user-friendly Graphical User Interface (GUI) is introduced. An SM phoneme set and an inclusive phone database have been carefully constructed for this phone-based speech synthesizer. By applying generative phonology, a comprehensive set of letter-to-sound (LTS) rules and a pronunciation lexicon have been developed for SMaTTS. For evaluation, a Diagnostic Rhyme Test (DRT) word list was compiled, and several experiments were performed to assess the quality of the synthesized speech by analyzing the Mean Opinion Score (MOS) obtained. The overall performance of the system and the room for improvement are discussed.
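As an illustration only, the combination of a pronunciation lexicon with fallback letter-to-sound rules, and the averaging behind a Mean Opinion Score, might be sketched as follows. This is not the SMaTTS implementation: the function names, the tiny rule tables and the one-word lexicon are hypothetical, and the rules cover only a few well-known Malay digraphs. The 1 to 5 rating scale is the conventional MOS scale and is assumed here, since the abstract does not state it.

```python
# Minimal sketch, NOT the authors' rule set: all names below are hypothetical.

# A few Malay digraphs and single letters whose sound differs from the letter.
DIGRAPHS = {"ng": "ŋ", "ny": "ɲ", "sy": "ʃ", "kh": "x"}
LETTERS = {"c": "tʃ", "j": "dʒ", "y": "j", "e": "ə"}  # "e" defaults to schwa

# Toy exception lexicon: words where "e" is /e/ rather than schwa.
LEXICON = {"meja": ["m", "e", "dʒ", "a"]}


def letter_to_sound(word):
    """Return a phoneme list: lexicon lookup first, rule fallback otherwise."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    phones, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:  # try the two-letter match before single letters
            phones.append(DIGRAPHS[pair])
            i += 2
        else:
            ch = word[i]
            phones.append(LETTERS.get(ch, ch))  # unlisted letters map to themselves
            i += 1
    return phones


def mean_opinion_score(ratings):
    """Arithmetic mean of listener ratings (assumed 1-5 scale)."""
    if not ratings:
        raise ValueError("no ratings")
    return sum(ratings) / len(ratings)
```

For instance, `letter_to_sound("bunga")` falls through to the rules and treats "ng" as a single phone, while `letter_to_sound("meja")` is answered from the lexicon because the default schwa rule for "e" would be wrong for that word.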

Keywords: Natural Language Processing, Text-To-Speech (TTS), Diphone, source filter, low-/high-level synthesis.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1079188
