Face Localization Using Illumination-dependent Face Model for Visual Speech Recognition

Authors: Robert E. Hursig, Jane X. Zhang

Abstract:

A robust still-image face localization algorithm capable of operating in an unconstrained visual environment is proposed. First, construction of a robust skin classifier within a shifted HSV color space is described. Then, various filtering operations are performed to better isolate face candidates and mitigate the effect of substantial non-skin regions. Finally, a novel Bhattacharyya-based face detection algorithm compares candidate regions of interest with an approximation of a unique illumination-dependent face model probability distribution function. Experimental results show a 90% face detection success rate despite the demands of the visually noisy environment.
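The comparison step in the abstract relies on the Bhattacharyya coefficient, which for two discrete distributions p and q is the sum over bins of sqrt(p_i * q_i). It equals 1 for identical distributions and 0 for distributions with no overlap. The following is a minimal sketch of that measure only, assuming each candidate region and the face model are summarized as histograms with the same bins. The function names (`normalize`, `bhattacharyya_coefficient`, `best_candidate`) and the histogram representation are illustrative assumptions; the abstract does not specify how the paper builds its illumination-dependent model or selects among candidates.

```python
import math


def normalize(hist):
    """Scale a non-negative histogram so that its bins sum to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive total mass")
    return [h / total for h in hist]


def bhattacharyya_coefficient(p, q):
    """Return sum_i sqrt(p_i * q_i) for two histograms with matching bins."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    p, q = normalize(p), normalize(q)
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def best_candidate(candidates, model):
    """Hypothetical selection rule: pick the candidate histogram that
    overlaps most with the model histogram (an assumption, not the
    paper's stated decision procedure)."""
    scores = [bhattacharyya_coefficient(c, model) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

In this sketch a candidate whose histogram matches the model scores close to 1, so a threshold on the coefficient could in principle be used to reject regions that are unlikely to be faces. Any such threshold would have to be tuned on data and is not given in the abstract.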

Keywords: Audio-visual speech recognition, Bhattacharyya coefficient, face detection

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1085285

