Possibilities, Challenges and the State of the Art of Automatic Speech Recognition in Air Traffic Control

Authors: Van Nhan Nguyen, Harald Holone

Abstract:

Over the past few years, considerable research has been conducted to bring Automatic Speech Recognition (ASR) into various areas of Air Traffic Control (ATC), such as air traffic control simulation and training, monitoring live operators with the aim of improving safety, measuring air traffic controller workload, and analyzing large quantities of controller-pilot speech. However, due to the high accuracy requirements of the ATC context and its unique challenges, automatic speech recognition has not been widely adopted in this field. To provide a good starting point for researchers who are interested in bringing automatic speech recognition into ATC, this paper gives an overview of the possibilities and challenges of applying automatic speech recognition in air traffic control. To provide this overview, we present an updated literature review of speech recognition technologies in general, as well as specific approaches relevant to the ATC context. Based on this literature review, we present criteria for selecting speech recognition approaches for the ATC domain and discuss the remaining challenges and possible solutions.

Keywords: Automatic Speech Recognition, ASR, Air Traffic Control, ATC.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1108428

