{"title":"A Smart-Visio Microphone for Audio-Visual Speech Recognition \u201cVmike\u201d","authors":"Y. Ni, K. Sebri","volume":2,"journal":"International Journal of Electronics and Communication Engineering","pagesStart":157,"pagesEnd":161,"ISSN":"1307-6892","URL":"https:\/\/publications.waset.org\/pdf\/6893","abstract":"
The practical implementation of coupled audio-video speech recognition systems is mainly limited by the hardware complexity of integrating two radically different information-capturing devices with good temporal synchronisation. In this paper, we propose a solution based on a smart CMOS image sensor that simplifies this hardware integration. Using on-chip image processing, the smart sensor calculates the X\/Y projections of the captured image in real time. This on-chip projection considerably reduces the volume of output data, which allows the condensed visual information to be transmitted over the same audio channel, using the stereophonic input available on most standard computing devices such as PCs, PDAs and mobile phones. A prototype called VMIKE (Visio-Microphone) has been designed and realised in a standard 0.35 \u00b5m CMOS technology. A preliminary experiment gives encouraging results. Its efficiency will be further investigated in a wide variety of applications such as biometrics, speech recognition in noisy environments, and voice control for military personnel or disabled persons.","references":"
[1] Yashwanth H, Harish Mahendrakar and Suman David, \"Automatic Speech Recognition using Audio Visual Cues\", IEEE India Annual Conference 2004, INDICON 2004.\r\n[2] L. Liang, X. Liu, Y. Zhao, X. Pi, and A. V. Nefian, \"Speaker Independent Audio-Visual Continuous Speech Recognition\", in IEEE International Conference on Acoustics, 2002.\r\n[3] J. Huang, G. Potamianos, and C. Neti, \"Improving Audio Visual Speech Recognition with an Infrared Headset\", AVSP 2003 - International Conference on Audio-Visual Processing, St. Jorioz, France, September 4-7, 2003.\r\n[4] M. Kaynak, Q. Zhi, A. D. Cheok, K. Sengupta, Z. Jian, and K. Chi Chung, \"Analysis of Lip Geometric Features for Audio-Visual Speech Recognition\", IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, Vol. 34, No. 4, July 2004.\r\n[5] C. Neti, G. Potamianos, J. Luettin, I. Matthews, D. Vergyri, J. Sison, A. Mashari, and J. Zhou, \"Audio-Visual Speech Recognition\", Workshop 2000 Final Report, October 12, 2000.\r\n[6] www.globalsecurity.org\/security\/systems\/voice.htm\r\n[7] A. K. Jain, A. Ross and S. Prabhakar, \"An Introduction to Biometric Recognition\", IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on Image- and Video-Based Biometrics, Vol. 14, No. 1, January 2004.\r\n[8] www.thalesgroup.com\/avionics\/markets\/military_aircraft","publisher":"World Academy of Science, Engineering and Technology","index":"Open Science Index 2, 2007"}