Analysis of Vocal Fold Vibrations from High-Speed Digital Images Based on Dynamic Time Warping

Authors: K. Ahmad, A. I. A. Rahman, Sh-Hussain Salleh, K. Anuar


Analysis of vocal fold vibration is essential for understanding the mechanism of voice production and for improving the clinical assessment of voice disorders. This paper presents an approach based on Dynamic Time Warping (DTW) for analyzing and objectively classifying vocal fold vibration patterns. The technique operates on a Glottal Area Waveform (GAW), which was extracted from high-speed laryngeal images by delineating the glottal edges in each image frame. Features were extracted from the GAW using Linear Predictive Coding (LPC). Reference templates for several voice types, obtained from simulated clear, breathy, fry, pressed, and hyperfunctional voice productions, were used. The patterns of these templates were first verified using the analytic signal obtained through the Hilbert transform of the GAW. Samples from voice recordings of normal speakers were then used to evaluate the effectiveness of the approach. Classifying the voice patterns with LPC features and DTW achieved an accuracy of 81%.
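The abstract names the processing chain (GAW from segmented frames, frame-wise LPC features, DTW matching against reference templates) but not its parameters. The block below is a minimal pure-Python sketch of that chain, not the authors' implementation. It assumes the glottis has already been segmented into binary masks (1 = glottal pixel), so the GAW is simply the glottal pixel count per frame; it also assumes mean removal per analysis frame, LPC by the autocorrelation method with Levinson-Durbin recursion, a Euclidean local cost, and the basic symmetric DTW step pattern. The helper names (`glottal_area_waveform`, `lpc`, `lpc_features`, `dtw_distance`, `classify`) and the frame length, hop, and order values in the usage note are hypothetical. The Hilbert-transform verification step is not shown.

```python
import math

def glottal_area_waveform(masks):
    """GAW sketch: count glottal pixels (value 1) in each binary mask frame (assumed input)."""
    return [sum(sum(row) for row in frame) for frame in masks]

def lpc(frame, order):
    """LPC coefficients a[1..order] of A(z) = 1 + sum a_k z^-k,
    via the autocorrelation method and Levinson-Durbin recursion."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]
    if r[0] == 0:
        return [0.0] * order  # silent/constant frame: no predictable structure
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        a_new = a[:]
        for j in range(1, i):
            a_new[j] = a[j] + k * a[i - j]
        a_new[i] = k
        a = a_new
        err *= 1.0 - k * k  # prediction error shrinks at each stage
        if err <= 0:
            break
    return a[1:]

def lpc_features(signal, frame_len, hop, order):
    """One LPC vector per frame; each frame is mean-subtracted (assumption) before LPC."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mean = sum(frame) / len(frame)
        feats.append(lpc([x - mean for x in frame], order))
    return feats

def dtw_distance(seq_a, seq_b):
    """Cumulative DTW cost with Euclidean local cost and steps (1,0), (0,1), (1,1)."""
    n, m = len(seq_a), len(seq_b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(seq_a[i - 1], seq_b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def classify(features, templates):
    """Nearest-template rule: return the label whose template has the lowest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))
```

As a usage sketch, templates would be built once per voice type (for example `templates = {"clear": lpc_features(gaw_clear, 40, 20, 4), ...}` with the hypothetical parameters shown), and each test recording's GAW would be converted with the same `lpc_features` call before `classify` is applied. DTW is used here because it aligns feature sequences of different lengths and local timing, which suits recordings whose vibratory cycles do not line up frame by frame with a template.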

Keywords: dynamic time warping, glottal area waveform, linear predictive coding, high-speed laryngeal images, Hilbert transform

Digital Object Identifier (DOI):


