Realtime Lip Contour Tracking For Audio-Visual Speech Recognition Applications

Mehran Yazdi; Mehdi Seyfi; Amirhossein Rafati; Meghdad Asadi

Abstract:

Detection and tracking of the lip contour is an important problem in speechreading. While solutions exist for lip tracking once a good contour initialization is available in the first frame, finding such an initialization has not yet been automated and is still done manually. We have developed a new tracking solution for lip contour detection that uses only a few landmarks (15 to 25) and applies the well-known Active Shape Models (ASM). The proposed method is a new LMS-like adaptive scheme based on an autoregressive (AR) model fitted to the landmark variations in successive video frames. Moreover, we propose an additional motion compensation model to address more general cases in lip tracking. Computer simulations demonstrate a fair match between the true and the estimated spatial pixel positions. Significant improvements over the well-known LMS approach have been obtained, as measured by a defined Frobenius norm index.
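
To make the idea of adaptive AR prediction of landmark motion concrete, the following is a minimal sketch and not the authors' scheme. It uses the textbook normalized LMS update rather than the LMS-like variant proposed in the paper, it omits the motion compensation model and the ASM fitting, and it assumes that "landmark variations" means per-coordinate displacements from a mean shape. The names `lms_ar_predict` and `frobenius_error`, the AR order, the step size `mu`, and the synthetic sinusoidal data are all illustrative assumptions.

```python
import math


def lms_ar_predict(frames, order=2, mu=0.5, eps=1e-12):
    """One-step-ahead prediction of landmark displacements.

    frames: list of T frames, each a list of K displacement values
            (assumed: x and y offsets of the landmarks from a mean shape).
    Each coordinate gets its own AR(order) coefficients, adapted with a
    standard normalized LMS update. Returns a T x K list of predictions;
    the first `order` frames are copied from the input (no history yet).
    """
    T, K = len(frames), len(frames[0])
    # Start from "same as previous frame": weight 1 on lag 1, 0 on older lags.
    w = [[1.0 if lag == 0 else 0.0 for lag in range(order)] for _ in range(K)]
    pred = [list(f) for f in frames[:order]]
    for t in range(order, T):
        row = []
        for k in range(K):
            # Most recent sample first.
            past = [frames[t - 1 - lag][k] for lag in range(order)]
            y_hat = sum(wi * pi for wi, pi in zip(w[k], past))
            err = frames[t][k] - y_hat
            power = sum(p * p for p in past) + eps
            # Normalized LMS: move the weights along the regressor,
            # scaled by the prediction error.
            w[k] = [wi + mu * err * pi / power for wi, pi in zip(w[k], past)]
            row.append(y_hat)
        pred.append(row)
    return pred


def frobenius_error(true, est):
    """Frobenius norm of the difference between two T x K lists."""
    return math.sqrt(sum((a - b) ** 2
                         for ra, rb in zip(true, est)
                         for a, b in zip(ra, rb)))


# Synthetic displacements for three coordinates (illustrative data only).
frames = [[math.sin(0.9 * t),
           0.5 * math.cos(0.7 * t),
           2.0 * math.sin(1.1 * t + 0.3)] for t in range(400)]
pred = lms_ar_predict(frames)
naive = frames[:1] + frames[:-1]  # baseline: repeat the previous frame
print("adaptive AR:", frobenius_error(frames[300:], pred[300:]))
print("previous-frame baseline:", frobenius_error(frames[300:], naive[300:]))
```

The Frobenius norm here is taken over the frames-by-coordinates error matrix, which is one plausible reading of the "Frobenius norm index" mentioned above; the paper may define its index differently.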

Keywords: Lip contour, Tracking, LMS-Like

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1059881
