Commenced in January 2007
Frequency: Monthly
Edition: International
Combined Automatic Speech Recognition and Machine Translation in Business Correspondence Domain for English-Croatian

Authors: Sanja Seljan, Ivan Dunđer


The paper presents combined automatic speech recognition (ASR) of English and machine translation (MT) for the English-Croatian and Croatian-English language pairs in the domain of business correspondence. The first part presents the results of training a commercial ASR system on English data sets, enriched by error analysis. The second part presents the results of machine translation performed by a free online tool for the English-Croatian and Croatian-English language pairs. Human evaluation of usability is conducted, with internal consistency calculated using Cronbach's alpha coefficient, and is likewise enriched by error analysis. Automatic evaluation is performed using the WER (Word Error Rate) and PER (Position-independent word Error Rate) metrics, followed by an investigation of Pearson's correlation with human evaluation.
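
The evaluation measures named in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation. It assumes whitespace tokenization, computes WER as the word-level Levenshtein distance divided by the number of reference words, and uses one common formulation of PER that compares hypothesis and reference as bags of words: (max(N_ref, N_hyp) minus matched words) divided by N_ref. Cronbach's alpha follows the standard formula with sample variances, with scores arranged as one row per evaluator, and Pearson's r is computed directly from its definition. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import variance

# Sketch only: whitespace tokenization and one common PER formulation are assumed.


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,          # delete a reference word
                          curr[j - 1] + 1,      # insert a hypothesis word
                          prev[j - 1] + cost)   # substitute or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: edit distance normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)


def per(reference: str, hypothesis: str) -> float:
    """Position-independent error rate: word order is ignored (bag of words)."""
    ref, hyp = reference.split(), hypothesis.split()
    matches = sum((Counter(ref) & Counter(hyp)).values())  # multiset intersection
    return (max(len(ref), len(hyp)) - matches) / len(ref)


def cronbach_alpha(scores: list[list[float]]) -> float:
    """scores[e][s] is the score evaluator e gave to sentence s (needs k >= 2)."""
    k = len(scores)
    item_vars = sum(variance(row) for row in scores)   # per-evaluator variances
    totals = [sum(col) for col in zip(*scores)]         # summed score per sentence
    return k / (k - 1) * (1 - item_vars / variance(totals))


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    ref = "we confirm receipt of your order"
    hyp = "we confirm the receipt of order"
    print(round(wer(ref, hyp), 3))  # 0.333
    print(round(per(ref, hyp), 3))  # 0.167
```

In the example, WER counts the inserted "the" and the missing "your" as two edits, whereas PER, which only compares word counts, records a single unmatched word. In a setup like the one the abstract describes, per-sentence WER or PER values could be passed to `pearson` together with averaged human scores to see how closely the automatic metrics track human judgement.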

Keywords: speech recognition, automatic machine translation, quality evaluation, integrated language technologies


