Effect of Visual Speech in Sign Speech Synthesis
Authors: Zdenek Krnoul
Abstract:
This article investigates the contribution of synthesized visual speech. Computer synthesis of visual speech consists of animation, in particular of lip movements. Visual speech is also a necessary part of the non-manual component of sign language. An appropriate methodology is proposed to determine the quality and accuracy of synthesized visual speech, and it is tested on Czech speech. Accordingly, this article presents a procedure for recording speech data that are used both to set up the synthesis system and to evaluate the synthesized speech. Furthermore, one variant of the evaluation process is elaborated in the form of a perceptual test. The test procedure is verified on the measured data with two settings of the synthesis system. The results of the perceptual test show a statistically significant increase in intelligibility evoked by both real and synthesized visual speech. The aim is to present one part of the evaluation process, leading toward a more comprehensive evaluation of the sign speech synthesis system.
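The abstract does not state which statistical test underlies the reported significant increase in intelligibility. The following is a minimal sketch under stated assumptions. Intelligibility is scored per listener as the fraction of correctly recognized words, and each listener is tested in both an audio-only and an audio-visual condition. A one-sided exact sign test then checks whether the audio-visual condition scores higher. The function names, the scoring rule, and the choice of the sign test are illustrative assumptions, not the author's procedure.

```python
from math import comb


def intelligibility(recognized: int, total: int) -> float:
    """Fraction of words a listener reported correctly (hypothetical scoring rule)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recognized / total


def sign_test_one_sided(audio_only: list[float], audio_visual: list[float]) -> float:
    """Exact one-sided sign test p-value for H1: audio-visual scores exceed audio-only scores.

    Scores are paired per listener (an assumption); tied pairs are discarded,
    as in the standard sign test.
    """
    if len(audio_only) != len(audio_visual):
        raise ValueError("score lists must be paired per listener")
    # Keep only non-tied pairs; a positive difference means visual speech helped.
    diffs = [av - a for a, av in zip(audio_only, audio_visual) if av != a]
    n = len(diffs)
    if n == 0:
        return 1.0
    k = sum(1 for d in diffs if d > 0)  # number of listeners who improved
    # Under H0 (no effect of visual speech), improvements follow Binomial(n, 0.5);
    # the p-value is P(X >= k).
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
```

The sign test is chosen here only because it makes no assumption about the distribution of score differences; a paired t-test or a Wilcoxon signed-rank test would be equally plausible choices.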
Keywords: Perception test, Sign speech synthesis, Talking head, Visual speech.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1332124