Search results for: development of coherent speech
4336 Delineating Students’ Speaking Anxieties and Assessment Gaps in Online Speech Performances
Authors: Mary Jane B. Suarez
Abstract:
Speech anxiety is innumerable in any traditional communication classes especially for ESL students. The speech anxiety intensifies when communication skills assessments have taken its toll in an online mode of learning due to the perils of the COVID-19 virus. Teachers and students have experienced vast ambiguity on how to realize a still effective way to teach and learn various speaking skills amidst the pandemic. This mixed method study determined the factors that affected the public speaking skills of students in online performances, delineated the assessment gaps in assessing speaking skills in an online setup, and recommended ways to address students’ speech anxieties. Using convergent parallel design, quantitative data were gathered by examining the desired learning competencies of the English course including a review of the teacher’s class record to analyze how students’ performances reflected a significantly high level of anxiety in online speech delivery. Focus group discussion was also conducted for qualitative data describing students’ public speaking anxiety and assessment gaps. Results showed a significantly high level of students’ speech anxiety affected by time constraints, use of technology, lack of audience response, being conscious of making mistakes, and the use of English as a second language. The study presented recommendations to redesign curricular assessments of English teachers and to have a robust diagnosis of students’ speaking anxiety to better cater to the needs of learners in attempt to bridge any gaps in cultivating public speaking skills of students as educational institutions segue from the pandemic to the post-pandemic milieu.
Keywords: Blended learning, communication skills assessment, online speech delivery, public speaking anxiety, speech anxiety.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1784335 Pure and Mixed Nash Equilibria Domain of a Discrete Game Model with Dichotomous Strategy Space
Authors: A. S. Mousa, F. Shoman
Abstract:
We present a discrete game theoretical model with homogeneous individuals who make simultaneous decisions. In this model the strategy space of all individuals is a discrete and dichotomous set which consists of two strategies. We fully characterize the coherent, split and mixed strategies that form Nash equilibria and we determine the corresponding Nash domains for all individuals. We find all strategic thresholds in which individuals can change their mind if small perturbations in the parameters of the model occurs.Keywords: Coherent strategy, split strategy, pure strategy, mixed strategy, Nash Equilibrium, game theory.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7154334 Environmentally Adaptive Acoustic Echo Suppression for Barge-in Speech Recognition
Authors: Jong Han Joo, Jeong Hun Lee, Young Sun Kim, Jae Young Kang, Seung Ho Choi
Abstract:
In this study, we propose a novel technique for acoustic echo suppression (AES) during speech recognition under barge-in conditions. Conventional AES methods based on spectral subtraction apply fixed weights to the estimated echo path transfer function (EPTF) at the current signal segment and to the EPTF estimated until the previous time interval. However, the effects of echo path changes should be considered for eliminating the undesired echoes. We describe a new approach that adaptively updates weight parameters in response to abrupt changes in the acoustic environment due to background noises or double-talk. Furthermore, we devised a voice activity detector and an initial time-delay estimator for barge-in speech recognition in communication networks. The initial time delay is estimated using log-spectral distance measure, as well as cross-correlation coefficients. The experimental results show that the developed techniques can be successfully applied in barge-in speech recognition systems.
Keywords: Acoustic echo suppression, barge-in, speech recognition, echo path transfer function, initial delay estimator, voice activity detector.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23174333 A Novel Receiver Algorithm for Coherent Underwater Acoustic Communications
Authors: Liang Zhao, Jianhua Ge
Abstract:
In this paper, we proposed a novel receiver algorithm for coherent underwater acoustic communications. The proposed receiver is composed of three parts: (1) Doppler tracking and correction, (2) Time reversal channel estimation and combining, and (3) Joint iterative equalization and decoding (JIED). To reduce computational complexity and optimize the equalization algorithm, Time reversal (TR) channel estimation and combining is adopted to simplify multi-channel adaptive decision feedback equalizer (ADFE) into single channel ADFE without reducing the system performance. Simultaneously, the turbo theory is adopted to form joint iterative ADFE and convolutional decoder (JIED). In JIED scheme, the ADFE and decoder exchange soft information in an iterative manner, which can enhance the equalizer performance using decoding gain. The simulation results show that the proposed algorithm can reduce computational complexity and improve the performance of equalizer. Therefore, the performance of coherent underwater acoustic communications can be improved greatly.Keywords: Underwater acoustic communication, Time reversal (TR) combining, joint iterative equalization and decoding (JIED)
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17254332 Orchestra/Percussion Classification Algorithm for United Speech Audio Coding System
Authors: Yueming Wang, Rendong Ying, Sumxin Jiang, Peilin Liu
Abstract:
Unified Speech Audio Coding (USAC), the latest MPEG standardization for unified speech and audio coding, uses a speech/audio classification algorithm to distinguish speech and audio segments of the input signal. The quality of the recovered audio can be increased by well-designed orchestra/percussion classification and subsequent processing. However, owing to the shortcoming of the system, introducing an orchestra/percussion classification and modifying subsequent processing can enormously increase the quality of the recovered audio. This paper proposes an orchestra/percussion classification algorithm for the USAC system which only extracts 3 scales of Mel-Frequency Cepstral Coefficients (MFCCs) rather than traditional 13 scales of MFCCs and use Iterative Dichotomiser 3 (ID3) Decision Tree rather than other complex learning method, thus the proposed algorithm has lower computing complexity than most existing algorithms. Considering that frequent changing of attributes may lead to quality loss of the recovered audio signal, this paper also design a modified subsequent process to help the whole classification system reach an accurate rate as high as 97% which is comparable to classical 99%.
Keywords: ID3 Decision Tree, MFCC, Orchestra/Percussion Classification, USAC
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16734331 A High Quality Speech Coder at 600 bps
Authors: Yong Zhang, Ruimin Hu
Abstract:
This paper presents a vocoder to obtain high quality synthetic speech at 600 bps. To reduce the bit rate, the algorithm is based on a sinusoidally excited linear prediction model which extracts few coding parameters, and three consecutive frames are grouped into a superframe and jointly vector quantization is used to obtain high coding efficiency. The inter-frame redundancy is exploited with distinct quantization schemes for different unvoiced/voiced frame combinations in the superframe. Experimental results show that the quality of the proposed coder is better than that of 2.4kbps LPC10e and achieves approximately the same as that of 2.4kbps MELP and with high robustness.
Keywords: Speech coding, Vector quantization, linear predicition, Mixed sinusoidal excitation
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21894330 An ICA Algorithm for Separation of Convolutive Mixture of Speech Signals
Authors: Rajkishore Prasad, Hiroshi Saruwatari, Kiyohiro Shikano
Abstract:
This paper describes Independent Component Analysis (ICA) based fixed-point algorithm for the blind separation of the convolutive mixture of speech, picked-up by a linear microphone array. The proposed algorithm extracts independent sources by non- Gaussianizing the Time-Frequency Series of Speech (TFSS) in a deflationary way. The degree of non-Gaussianization is measured by negentropy. The relative performances of algorithm under random initialization and Null beamformer (NBF) based initialization are studied. It has been found that an NBF based initial value gives speedy convergence as well as better separation performance
Keywords: Blind signal separation, independent component analysis, negentropy, convolutive mixture.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17794329 An Advanced Method for Speech Recognition
Authors: Meysam Mohamad pour, Fardad Farokhi
Abstract:
In this paper in consideration of each available techniques deficiencies for speech recognition, an advanced method is presented that-s able to classify speech signals with the high accuracy (98%) at the minimum time. In the presented method, first, the recorded signal is preprocessed that this section includes denoising with Mels Frequency Cepstral Analysis and feature extraction using discrete wavelet transform (DWT) coefficients; Then these features are fed to Multilayer Perceptron (MLP) network for classification. Finally, after training of neural network effective features are selected with UTA algorithm.Keywords: Multilayer perceptron (MLP) neural network, Discrete Wavelet Transform (DWT) , Mels Scale Frequency Filter , UTA algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23674328 The Effect of Different Compression Schemes on Speech Signals
Authors: Jalal Karam, Raed Saad
Abstract:
This paper studies the effect of different compression constraints and schemes presented in a new and flexible paradigm to achieve high compression ratios and acceptable signal to noise ratios of Arabic speech signals. Compression parameters are computed for variable frame sizes of a level 5 to 7 Discrete Wavelet Transform (DWT) representation of the signals for different analyzing mother wavelet functions. Results are obtained and compared for Global threshold and level dependent threshold techniques. The results obtained also include comparisons with Signal to Noise Ratios, Peak Signal to Noise Ratios and Normalized Root Mean Square Error.Keywords: Speech Compression, Wavelets.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17354327 Bidirectional Dynamic Time Warping Algorithm for the Recognition of Isolated Words Impacted by Transient Noise Pulses
Authors: G. Tamulevičius, A. Serackis, T. Sledevič, D. Navakauskas
Abstract:
We consider the biggest challenge in speech recognition – noise reduction. Traditionally detected transient noise pulses are removed with the corrupted speech using pulse models. In this paper we propose to cope with the problem directly in Dynamic Time Warping domain. Bidirectional Dynamic Time Warping algorithm for the recognition of isolated words impacted by transient noise pulses is proposed. It uses simple transient noise pulse detector, employs bidirectional computation of dynamic time warping and directly manipulates with warping results. Experimental investigation with several alternative solutions confirms effectiveness of the proposed algorithm in the reduction of impact of noise on recognition process – 3.9% increase of the noisy speech recognition is achieved.
Keywords: Transient noise pulses, noise reduction, dynamic time warping, speech recognition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19464326 Unit Selection Algorithm Using Bi-grams Model For Corpus-Based Speech Synthesis
Authors: Mohamed Ali KAMMOUN, Ahmed Ben HAMIDA
Abstract:
In this paper, we present a novel statistical approach to corpus-based speech synthesis. Classically, phonetic information is defined and considered as acoustic reference to be respected. In this way, many studies were elaborated for acoustical unit classification. This type of classification allows separating units according to their symbolic characteristics. Indeed, target cost and concatenation cost were classically defined for unit selection. In Corpus-Based Speech Synthesis System, when using large text corpora, cost functions were limited to a juxtaposition of symbolic criteria and the acoustic information of units is not exploited in the definition of the target cost. In this manuscript, we token in our consideration the unit phonetic information corresponding to acoustic information. This would be realized by defining a probabilistic linguistic Bi-grams model basically used for unit selection. The selected units would be extracted from the English TIMIT corpora.Keywords: Unit selection, Corpus-based Speech Synthesis, Bigram model
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14414325 Puff Noise Detection and Cancellation for Robust Speech Recognition
Authors: Sangjun Park, Jungpyo Hong, Byung-Ok Kang, Yun-keun Lee, Minsoo Hahn
Abstract:
In this paper, an algorithm for detecting and attenuating puff noises frequently generated under the mobile environment is proposed. As a baseline system, puff detection system is designed based on Gaussian Mixture Model (GMM), and 39th Mel Frequency Cepstral Coefficient (MFCC) is extracted as feature parameters. To improve the detection performance, effective acoustic features for puff detection are proposed. In addition, detected puff intervals are attenuated by high-pass filtering. The speech recognition rate was measured for evaluation and confusion matrix and ROC curve are used to confirm the validity of the proposed system.Keywords: Gaussian mixture model, puff detection and cancellation, speech enhancement.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22344324 Analysis of Linguistic Disfluencies in Bilingual Children’s Discourse
Authors: Sheena Christabel Pravin, M. Palanivelan
Abstract:
Speech disfluencies are common in spontaneous speech. The primary purpose of this study was to distinguish linguistic disfluencies from stuttering disfluencies in bilingual Tamil–English (TE) speaking children. The secondary purpose was to determine whether their disfluencies are mediated by native language dominance and/or on an early onset of developmental stuttering at childhood. A detailed study was carried out to identify the prosodic and acoustic features that uniquely represent the disfluent regions of speech. This paper focuses on statistical modeling of repetitions, prolongations, pauses and interjections in the speech corpus encompassing bilingual spontaneous utterances from school going children – English and Tamil. Two classifiers including Hidden Markov Models (HMM) and the Multilayer Perceptron (MLP), which is a class of feed-forward artificial neural network, were compared in the classification of disfluencies. The results of the classifiers document the patterns of disfluency in spontaneous speech samples of school-aged children to distinguish between Children Who Stutter (CWS) and Children with Language Impairment CLI). The ability of the models in classifying the disfluencies was measured in terms of F-measure, Recall, and Precision.
Keywords: Bilingual, children who stutter, children with language impairment, Hidden Markov Models, multi-layer perceptron, linguistic disfluencies, stuttering disfluencies.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10294323 Applications of Support Vector Machines on Smart Phone Systems for Emotional Speech Recognition
Authors: Wernhuar Tarng, Yuan-Yuan Chen, Chien-Lung Li, Kun-Rong Hsie, Mingteh Chen
Abstract:
An emotional speech recognition system for the applications on smart phones was proposed in this study to combine with 3G mobile communications and social networks to provide users and their groups with more interaction and care. This study developed a mechanism using the support vector machines (SVM) to recognize the emotions of speech such as happiness, anger, sadness and normal. The mechanism uses a hierarchical classifier to adjust the weights of acoustic features and divides various parameters into the categories of energy and frequency for training. In this study, 28 commonly used acoustic features including pitch and volume were proposed for training. In addition, a time-frequency parameter obtained by continuous wavelet transforms was also used to identify the accent and intonation in a sentence during the recognition process. The Berlin Database of Emotional Speech was used by dividing the speech into male and female data sets for training. According to the experimental results, the accuracies of male and female test sets were increased by 4.6% and 5.2% respectively after using the time-frequency parameter for classifying happy and angry emotions. For the classification of all emotions, the average accuracy, including male and female data, was 63.5% for the test set and 90.9% for the whole data set.Keywords: Smart phones, emotional speech recognition, socialnetworks, support vector machines, time-frequency parameter, Mel-scale frequency cepstral coefficients (MFCC).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18434322 Voice Disorders Identification Using Hybrid Approach: Wavelet Analysis and Multilayer Neural Networks
Authors: L. Salhi, M. Talbi, A. Cherif
Abstract:
This paper presents a new strategy of identification and classification of pathological voices using the hybrid method based on wavelet transform and neural networks. After speech acquisition from a patient, the speech signal is analysed in order to extract the acoustic parameters such as the pitch, the formants, Jitter, and shimmer. Obtained results will be compared to those normal and standard values thanks to a programmable database. Sounds are collected from normal people and patients, and then classified into two different categories. Speech data base is consists of several pathological and normal voices collected from the national hospital “Rabta-Tunis". Speech processing algorithm is conducted in a supervised mode for discrimination of normal and pathology voices and then for classification between neural and vocal pathologies (Parkinson, Alzheimer, laryngeal, dyslexia...). Several simulation results will be presented in function of the disease and will be compared with the clinical diagnosis in order to have an objective evaluation of the developed tool.Keywords: Formants, Neural Networks, Pathological Voices, Pitch, Wavelet Transform.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 28424321 Continuous Feature Adaptation for Non-Native Speech Recognition
Authors: Y. Deng, X. Li, C. Kwan, B. Raj, R. Stern
Abstract:
The current speech interfaces in many military applications may be adequate for native speakers. However, the recognition rate drops quite a lot for non-native speakers (people with foreign accents). This is mainly because the nonnative speakers have large temporal and intra-phoneme variations when they pronounce the same words. This problem is also complicated by the presence of large environmental noise such as tank noise, helicopter noise, etc. In this paper, we proposed a novel continuous acoustic feature adaptation algorithm for on-line accent and environmental adaptation. Implemented by incremental singular value decomposition (SVD), the algorithm captures local acoustic variation and runs in real-time. This feature-based adaptation method is then integrated with conventional model-based maximum likelihood linear regression (MLLR) algorithm. Extensive experiments have been performed on the NATO non-native speech corpus with baseline acoustic model trained on native American English. The proposed feature-based adaptation algorithm improved the average recognition accuracy by 15%, while the MLLR model based adaptation achieved 11% improvement. The corresponding word error rate (WER) reduction was 25.8% and 2.73%, as compared to that without adaptation. The combined adaptation achieved overall recognition accuracy improvement of 29.5%, and WER reduction of 31.8%, as compared to that without adaptation. Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 32174320 Automatic Detection of Syllable Repetition in Read Speech for Objective Assessment of Stuttered Disfluencies
Authors: K. M. Ravikumar, Balakrishna Reddy, R. Rajagopal, H. C. Nagaraj
Abstract:
Automatic detection of syllable repetition is one of the important parameter in assessing the stuttered speech objectively. The existing method which uses artificial neural network (ANN) requires high levels of agreement as prerequisite before attempting to train and test ANNs to separate fluent and nonfluent. We propose automatic detection method for syllable repetition in read speech for objective assessment of stuttered disfluencies which uses a novel approach and has four stages comprising of segmentation, feature extraction, score matching and decision logic. Feature extraction is implemented using well know Mel frequency Cepstra coefficient (MFCC). Score matching is done using Dynamic Time Warping (DTW) between the syllables. The Decision logic is implemented by Perceptron based on the score given by score matching. Although many methods are available for segmentation, in this paper it is done manually. Here the assessment by human judges on the read speech of 10 adults who stutter are described using corresponding method and the result was 83%.Keywords: Assessment, DTW, MFCC, Objective, Perceptron, Stuttering.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 28124319 A Mixing Matrix Estimation Algorithm for Speech Signals under the Under-Determined Blind Source Separation Model
Authors: Jing Wu, Wei Lv, Yibing Li, Yuanfan You
Abstract:
The separation of speech signals has become a research hotspot in the field of signal processing in recent years. It has many applications and influences in teleconferencing, hearing aids, speech recognition of machines and so on. The sounds received are usually noisy. The issue of identifying the sounds of interest and obtaining clear sounds in such an environment becomes a problem worth exploring, that is, the problem of blind source separation. This paper focuses on the under-determined blind source separation (UBSS). Sparse component analysis is generally used for the problem of under-determined blind source separation. The method is mainly divided into two parts. Firstly, the clustering algorithm is used to estimate the mixing matrix according to the observed signals. Then the signal is separated based on the known mixing matrix. In this paper, the problem of mixing matrix estimation is studied. This paper proposes an improved algorithm to estimate the mixing matrix for speech signals in the UBSS model. The traditional potential algorithm is not accurate for the mixing matrix estimation, especially for low signal-to noise ratio (SNR).In response to this problem, this paper considers the idea of an improved potential function method to estimate the mixing matrix. The algorithm not only avoids the inuence of insufficient prior information in traditional clustering algorithm, but also improves the estimation accuracy of mixing matrix. This paper takes the mixing of four speech signals into two channels as an example. The results of simulations show that the approach in this paper not only improves the accuracy of estimation, but also applies to any mixing matrix.Keywords: Clustering algorithm, potential function, speech signal, the UBSS model.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6824318 Spectral Entropy Employment in Speech Enhancement based on Wavelet Packet
Authors: Talbi Mourad, Salhi Lotfi, Chérif Adnen
Abstract:
In this work, we are interested in developing a speech denoising tool by using a discrete wavelet packet transform (DWPT). This speech denoising tool will be employed for applications of recognition, coding and synthesis. For noise reduction, instead of applying the classical thresholding technique, some wavelet packet nodes are set to zero and the others are thresholded. To estimate the non stationary noise level, we employ the spectral entropy. A comparison of our proposed technique to classical denoising methods based on thresholding and spectral subtraction is made in order to evaluate our approach. The experimental implementation uses speech signals corrupted by two sorts of noise, white and Volvo noises. The obtained results from listening tests show that our proposed technique is better than spectral subtraction. The obtained results from SNR computation show the superiority of our technique when compared to the classical thresholding method using the modified hard thresholding function based on u-law algorithm.
Keywords: Enhancement, spectral subtraction, SNR, discrete wavelet packet transform, spectral entropy Histogram
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19924317 Bangla Vowel Characterization Based on Analysis by Synthesis
Authors: Syed Akhter Hossain, M. Lutfar Rahman, Farruk Ahmed
Abstract:
Bangla Vowel characterization determines the spectral properties of Bangla vowels for efficient synthesis as well as recognition of Bangla vowels. In this paper, Bangla vowels in isolated word have been analyzed based on speech production model within the framework of Analysis-by-Synthesis. This has led to the extraction of spectral parameters for the production model in order to produce different Bangla vowel sounds. The real and synthetic spectra are compared and a weighted square error has been computed along with the error in the formant bandwidths for efficient representation of Bangla vowels. The extracted features produced good representation of targeted Bangla vowel. Such a representation also plays essential role in low bit rate speech coding and vocoders.
Keywords: Speech, vowel, formant, synthesis, spectrum, LPC.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23714316 Speech Recognition Using Scaly Neural Networks
Authors: Akram M. Othman, May H. Riadh
Abstract:
This research work is aimed at speech recognition using scaly neural networks. A small vocabulary of 11 words were established first, these words are “word, file, open, print, exit, edit, cut, copy, paste, doc1, doc2". These chosen words involved with executing some computer functions such as opening a file, print certain text document, cutting, copying, pasting, editing and exit. It introduced to the computer then subjected to feature extraction process using LPC (linear prediction coefficients). These features are used as input to an artificial neural network in speaker dependent mode. Half of the words are used for training the artificial neural network and the other half are used for testing the system; those are used for information retrieval. The system components are consist of three parts, speech processing and feature extraction, training and testing by using neural networks and information retrieval. The retrieve process proved to be 79.5-88% successful, which is quite acceptable, considering the variation to surrounding, state of the person, and the microphone type.Keywords: Feature extraction, Liner prediction coefficients, neural network, Speech Recognition, Scaly ANN.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17384315 Forensic Speaker Verification in Noisy Environmental by Enhancing the Speech Signal Using ICA Approach
Authors: Ahmed Kamil Hasan Al-Ali, Bouchra Senadji, Ganesh Naik
Abstract:
We propose a system to real environmental noise and channel mismatch for forensic speaker verification systems. This method is based on suppressing various types of real environmental noise by using independent component analysis (ICA) algorithm. The enhanced speech signal is applied to mel frequency cepstral coefficients (MFCC) or MFCC feature warping to extract the essential characteristics of the speech signal. Channel effects are reduced using an intermediate vector (i-vector) and probabilistic linear discriminant analysis (PLDA) approach for classification. The proposed algorithm is evaluated by using an Australian forensic voice comparison database, combined with car, street and home noises from QUT-NOISE at a signal to noise ratio (SNR) ranging from -10 dB to 10 dB. Experimental results indicate that the MFCC feature warping-ICA achieves a reduction in equal error rate about (48.22%, 44.66%, and 50.07%) over using MFCC feature warping when the test speech signals are corrupted with random sessions of street, car, and home noises at -10 dB SNR.Keywords: Noisy forensic speaker verification, ICA algorithm, MFCC, MFCC feature warping.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9914314 A Smart-Visio Microphone for Audio-Visual Speech Recognition “Vmike“
Abstract:
The practical implementation of audio-video coupled speech recognition systems is mainly limited by the hardware complexity to integrate two radically different information capturing devices with good temporal synchronisation. In this paper, we propose a solution based on a smart CMOS image sensor in order to simplify the hardware integration difficulties. By using on-chip image processing, this smart sensor can calculate in real time the X/Y projections of the captured image. This on-chip projection reduces considerably the volume of the output data. This data-volume reduction permits a transmission of the condensed visual information via the same audio channel by using a stereophonic input available on most of the standard computation devices such as PC, PDA and mobile phones. A prototype called VMIKE (Visio-Microphone) has been designed and realised by using standard 0.35um CMOS technology. A preliminary experiment gives encouraged results. Its efficiency will be further investigated in a large variety of applications such as biometrics, speech recognition in noisy environments, and vocal control for military or disabled persons, etc.
Keywords: Audio-Visual Speech recognition, CMOS Smartsensor, On-Chip image processing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18274313 Development of Effective Cooling Schemes of Gas Turbine Blades Based on Computer Simulation
Authors: Pasayev, A., C. Askerov, R. Sadiqov, C. Ardil
Abstract:
In contrast to existing of calculation of temperature field of a profile part a blade with convective cooling which are not taking into account multi connective in a broad sense of this term, we develop mathematical models and highly effective combination (BIEM AND FDM) numerical methods from the point of view of a realization on the PC. The theoretical substantiation of these methods is proved by the appropriate theorems.
Keywords: multi coherent systems, method of the boundary integrated equations, singular operators, gas turbines
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16494312 Computationally Efficient Signal Quality Improvement Method for VoIP System
Authors: H. P. Singh, S. Singh
Abstract:
The voice signal in Voice over Internet protocol (VoIP) system is processed through the best effort policy based IP network, which leads to the network degradations including delay, packet loss jitter. The work in this paper presents the implementation of finite impulse response (FIR) filter for voice quality improvement in the VoIP system through distributed arithmetic (DA) algorithm. The VoIP simulations are conducted with AMR-NB 6.70 kbps and G.729a speech coders at different packet loss rates and the performance of the enhanced VoIP signal is evaluated using the perceptual evaluation of speech quality (PESQ) measurement for narrowband signal. The results show reduction in the computational complexity in the system and significant improvement in the quality of the VoIP voice signal.
Keywords: VoIP, Signal Quality, Distributed Arithmetic, Packet Loss, Speech Coder.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18304311 Extracting Tongue Shape Dynamics from Magnetic Resonance Image Sequences
Authors: María S. Avila-García, John N. Carter, Robert I. Damper
Abstract:
An important problem in speech research is the automatic extraction of information about the shape and dimensions of the vocal tract during real-time speech production. We have previously developed Southampton dynamic magnetic resonance imaging (SDMRI) as an approach to the solution of this problem.However, the SDMRI images are very noisy so that shape extraction is a major challenge. In this paper, we address the problem of tongue shape extraction, which poses difficulties because this is a highly deforming non-parametric shape. We show that combining active shape models with the dynamic Hough transform allows the tongue shape to be reliably tracked in the image sequence.
Keywords: Vocal tract imaging, speech production, active shapemodels, dynamic Hough transform, object tracking.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17354310 Influence of Loudness Compression on Hearing with Bone Anchored Hearing Implants
Authors: Anja Kurz, Marc Flynn, Tobias Good, Marco Caversaccio, Martin Kompis
Abstract:
Bone Anchored Hearing Implants (BAHI) are routinely used in patients with conductive or mixed hearing loss, e.g. if conventional air conduction hearing aids cannot be used. New sound processors and new fitting software now allow the adjustment of parameters such as loudness compression ratios or maximum power output separately. Today it is unclear, how the choice of these parameters influences aided speech understanding in BAHI users. In this prospective experimental study, the effect of varying the compression ratio and lowering the maximum power output in a BAHI were investigated. Twelve experienced adult subjects with a mixed hearing loss participated in this study. Four different compression ratios (1.0; 1.3; 1.6; 2.0) were tested along with two different maximum power output settings, resulting in a total of eight different programs. Each participant tested each program during two weeks. A blinded Latin square design was used to minimize bias. For each of the eight programs, speech understanding in quiet and in noise was assessed. For speech in quiet, the Freiburg number test and the Freiburg monosyllabic word test at 50, 65, and 80 dB SPL were used. For speech in noise, the Oldenburg sentence test was administered. Speech understanding in quiet and in noise was improved significantly in the aided condition in any program, when compared to the unaided condition. However, no significant differences were found between any of the eight programs. In contrast, on a subjective level there was a significant preference for medium compression ratios of 1.3 to 1.6 and higher maximum power output.
Keywords: Bone Anchored Hearing Implant, Compression, Maximum Power Output, Speech understanding.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20664309 Comparison of Fricative Vocal Tract Transfer Functions Derived using Two Different Segmentation Techniques
Authors: K. S. Subari, C. H. Shadle, A. Barney, R. I. Damper
Abstract:
The acoustic and articulatory properties of fricative speech sounds are being studied using magnetic resonance imaging (MRI) and acoustic recordings from a single subject. Area functions were derived from a complete set of axial and coronal MR slices using two different methods: the Mermelstein technique and the Blum transform. Area functions derived from the two techniques were shown to differ significantly in some cases. Such differences will lead to different acoustic predictions and it is important to know which is the more accurate. The vocal tract acoustic transfer function (VTTF) was derived from these area functions for each fricative and compared with measured speech signals for the same fricative and same subject. The VTTFs for /f/ in two vowel contexts and the corresponding acoustic spectra are derived here; the Blum transform appears to show a better match between prediction and measurement than the Mermelstein technique.
Keywords: Area functions, fricatives, vocal tract transferfunction, MRI, speech.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16534308 Automotive 3-Microphone Noise Canceller in a Frequently Moving Noise Source Environment
Authors: Z. Qi, T. J. Moir
Abstract:
A combined three-microphone voice activity detector (VAD) and noise-canceling system is studied to enhance speech recognition in an automobile environment. A previous experiment clearly shows the ability of the composite system to cancel a single noise source outside of a defined zone. This paper investigates the performance of the composite system when there are frequently moving noise sources (noise sources are coming from different locations but are not always presented at the same time) e.g. there is other passenger speech or speech from a radio when a desired speech is presented. To work in a frequently moving noise sources environment, whilst a three-microphone voice activity detector (VAD) detects voice from a “VAD valid zone", the 3-microphone noise canceller uses a “noise canceller valid zone" defined in freespace around the users head. Therefore, a desired voice should be in the intersection of the noise canceller valid zone and VAD valid zone. Thus all noise is suppressed outside this intersection of area. Experiments are shown for a real environment e.g. all results were recorded in a car by omni-directional electret condenser microphones.
Keywords: Signal processing, voice activity detection, noise canceller, microphone array beam forming.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16124307 Phase-Averaged Analysis of Three-Dimensional Vorticity in the Wake of Two Yawed Side-By-Side Circular Cylinders
Authors: T. Zhou, S. F. Mohd. Razali, Y. Zhou, H. Wang, L. Cheng
Abstract:
Thewake flow behind two yawed side-by-sidecircular cylinders is investigated using athree-dimensional vorticity probe. Four yaw angles (α), namely, 0°, 15°, 30° and 45° and twocylinder spacing ratios T* of 1.7 and 3.0 were tested. For T* = 3.0, there exist two vortex streets and the cylinders behave as independent and isolated ones. The maximum contour value of the coherent streamwise vorticity ~* ωx is only about 10% of that of the spanwise vorticity ~* ωz . With the increase of α, ~* ωx increases whereas ~* ωz decreases. At α = 45°, ~* ωx is about 67% of ~* ωz .For T* = 1.7, only a single peak is detected in the energy spectrum. The spanwise vorticity contours have an organized pattern only at α = 0°. The maximum coherent vorticity contours of ~* ω x and ~* ωz for T* = 1.7 are about 30% and 7% of those for T* = 3.0.The independence principle (IP)in terms of Strouhal numbers is applicable in both wakes when α< 40°.
Keywords: Circular cylinder wake, vorticity, vortex shedding.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1795