Effect of Visual Speech in Sign Speech Synthesis

Zdenek Krnoul

Abstract:

This article investigates the contribution of synthesized visual speech. Computer synthesis of visual speech consists of animation, in particular of lip movements. Visual speech is also a necessary part of the non-manual component of a sign language. A methodology is proposed to determine the quality and accuracy of synthesized visual speech, and it is examined on Czech speech. To this end, the article presents a procedure for recording speech data, used both to configure the synthesis system and to evaluate the synthesized speech. Furthermore, one option for the evaluation process is elaborated in the form of a perceptual test. This test procedure is verified on the measured data with two settings of the synthesis system. The results of the perceptual test show a statistically significant increase in intelligibility evoked by both real and synthesized visual speech. The aim is to present one part of the evaluation process, which leads toward a more comprehensive evaluation of the sign speech synthesis system.

Keywords: Perception test, Sign speech synthesis, Talking head, Visual speech.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1332124
