A Smart-Visio Microphone for Audio-Visual Speech Recognition “Vmike”

Authors: Y. Ni, K. Sebri

Abstract:

The practical implementation of coupled audio-video speech recognition systems is limited mainly by the hardware complexity of integrating two radically different capture devices with good temporal synchronisation. In this paper, we propose a solution based on a smart CMOS image sensor that simplifies this hardware integration. Using on-chip image processing, the sensor calculates the X/Y projections of the captured image in real time. This on-chip projection considerably reduces the volume of output data, which allows the condensed visual information to be transmitted over the same audio channel through the stereophonic input available on most standard computing devices, such as PCs, PDAs and mobile phones. A prototype called VMIKE (Visio-Microphone) has been designed and realised in a standard 0.35 µm CMOS technology. A preliminary experiment gave encouraging results. Its efficiency will be further investigated in a wide variety of applications, such as biometrics, speech recognition in noisy environments, and voice control for military users or disabled persons.
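
The data-volume reduction described in the abstract can be illustrated with a minimal software sketch. It rests on assumptions: the abstract does not define the projection operator, so the X and Y projections are taken here to be plain column and row sums of pixel intensities, and the function names and frame sizes are hypothetical rather than taken from the VMIKE design.

```python
def xy_projections(image):
    """Return (x_proj, y_proj) for a 2D list of pixel intensities.

    Assumption: a projection is a plain sum along one axis.
    x_proj[c] is the sum of column c; y_proj[r] is the sum of row r.
    """
    if not image or not image[0]:
        return [], []
    width = len(image[0])
    if any(len(row) != width for row in image):
        raise ValueError("all rows must have the same length")
    y_proj = [sum(row) for row in image]          # one value per row
    x_proj = [sum(col) for col in zip(*image)]    # one value per column
    return x_proj, y_proj


def reduction_factor(width, height):
    """Ratio of pixels per frame to projection samples per frame."""
    return (width * height) / (width + height)


# Hypothetical 4x3 frame with a small bright region in the middle row.
frame = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
x_proj, y_proj = xy_projections(frame)
```

Under these assumptions a W x H frame is condensed from W*H pixel values to W+H projection samples. For a hypothetical 64 x 64 sensor, `reduction_factor(64, 64)` gives 32, which is the kind of reduction that makes an audio-rate channel plausible for the visual data.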

Keywords: Audio-visual speech recognition, CMOS smart sensor, on-chip image processing.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1332808

