Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2

multiscale product Related Abstracts

2 Automatic Segmentation of the Clean Speech Signal

Authors: A. Bouzid, M. A. Ben Messaoud, N. Ellouze

Abstract:

Speech Segmentation is the measure of the change point detection for partitioning an input speech signal into regions each of which accords to only one speaker. In this paper, we apply two features based on multi-scale product (MP) of the clean speech, namely the spectral centroid of MP, and the zero crossings rate of MP. We focus on multi-scale product analysis as an important tool for segmentation extraction. The multi-scale product is based on making the product of the speech wavelet transform coefficients at three successive dyadic scales. We have evaluated our method on the Keele database. Experimental results show the effectiveness of our method presenting a good performance. It shows that the two simple features can find word boundaries, and extracted the segments of the clean speech.

Keywords: spectral centroid, multiscale product, speech segmentation, zero crossings rate

Procedia PDF Downloads 349
1 A Normalized Non-Stationary Wavelet Based Analysis Approach for a Computer Assisted Classification of Laryngoscopic High-Speed Video Recordings

Authors: Mona K. Fehling, Jakob Unger, Dietmar J. Hecker, Bernhard Schick, Joerg Lohscheller

Abstract:

Voice disorders origin from disturbances of the vibration patterns of the two vocal folds located within the human larynx. Consequently, the visual examination of vocal fold vibrations is an integral part within the clinical diagnostic process. For an objective analysis of the vocal fold vibration patterns, the two-dimensional vocal fold dynamics are captured during sustained phonation using an endoscopic high-speed camera. In this work, we present an approach allowing a fully automatic analysis of the high-speed video data including a computerized classification of healthy and pathological voices. The approach bases on a wavelet-based analysis of so-called phonovibrograms (PVG), which are extracted from the high-speed videos and comprise the entire two-dimensional vibration pattern of each vocal fold individually. Using a principal component analysis (PCA) strategy a low-dimensional feature set is computed from each phonovibrogram. From the PCA-space clinically relevant measures can be derived that quantify objectively vibration abnormalities. In the first part of the work it will be shown that, using a machine learning approach, the derived measures are suitable to distinguish automatically between healthy and pathological voices. Within the approach the formation of the PCA-space and consequently the extracted quantitative measures depend on the clinical data, which were used to compute the principle components. Therefore, in the second part of the work we proposed a strategy to achieve a normalization of the PCA-space by registering the PCA-space to a coordinate system using a set of synthetically generated vibration patterns. The results show that owing to the normalization step potential ambiguousness of the parameter space can be eliminated. The normalization further allows a direct comparison of research results, which bases on PCA-spaces obtained from different clinical subjects.

Keywords: multiscale product, normalization, Wavelet-based analysis, computer assisted classification, high-speed laryngoscopy, vocal fold analysis, phonovibrogram

Procedia PDF Downloads 131