Search results for: Corpus-based Speech Synthesis

610 An ICA Algorithm for Separation of Convolutive Mixture of Speech Signals

Authors: Rajkishore Prasad, Hiroshi Saruwatari, Kiyohiro Shikano

Abstract:

This paper describes Independent Component Analysis (ICA) based fixed-point algorithm for the blind separation of the convolutive mixture of speech, picked-up by a linear microphone array. The proposed algorithm extracts independent sources by non- Gaussianizing the Time-Frequency Series of Speech (TFSS) in a deflationary way. The degree of non-Gaussianization is measured by negentropy. The relative performances of algorithm under random initialization and Null beamformer (NBF) based initialization are studied. It has been found that an NBF based initial value gives speedy convergence as well as better separation performance

Keywords: Blind signal separation, independent component analysis, negentropy, convolutive mixture.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1729

609 An Advanced Method for Speech Recognition

Authors: Meysam Mohamad pour, Fardad Farokhi

Abstract:

In this paper in consideration of each available techniques deficiencies for speech recognition, an advanced method is presented that-s able to classify speech signals with the high accuracy (98%) at the minimum time. In the presented method, first, the recorded signal is preprocessed that this section includes denoising with Mels Frequency Cepstral Analysis and feature extraction using discrete wavelet transform (DWT) coefficients; Then these features are fed to Multilayer Perceptron (MLP) network for classification. Finally, after training of neural network effective features are selected with UTA algorithm.

Keywords: Multilayer perceptron (MLP) neural network, Discrete Wavelet Transform (DWT) , Mels Scale Frequency Filter , UTA algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2309

608 The Effect of Different Compression Schemes on Speech Signals

Authors: Jalal Karam, Raed Saad

Abstract:

This paper studies the effect of different compression constraints and schemes presented in a new and flexible paradigm to achieve high compression ratios and acceptable signal to noise ratios of Arabic speech signals. Compression parameters are computed for variable frame sizes of a level 5 to 7 Discrete Wavelet Transform (DWT) representation of the signals for different analyzing mother wavelet functions. Results are obtained and compared for Global threshold and level dependent threshold techniques. The results obtained also include comparisons with Signal to Noise Ratios, Peak Signal to Noise Ratios and Normalized Root Mean Square Error.

Keywords: Speech Compression, Wavelets.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1690

607 Bidirectional Dynamic Time Warping Algorithm for the Recognition of Isolated Words Impacted by Transient Noise Pulses

Authors: G. Tamulevičius, A. Serackis, T. Sledevič, D. Navakauskas

Abstract:

We consider the biggest challenge in speech recognition – noise reduction. Traditionally detected transient noise pulses are removed with the corrupted speech using pulse models. In this paper we propose to cope with the problem directly in Dynamic Time Warping domain. Bidirectional Dynamic Time Warping algorithm for the recognition of isolated words impacted by transient noise pulses is proposed. It uses simple transient noise pulse detector, employs bidirectional computation of dynamic time warping and directly manipulates with warping results. Experimental investigation with several alternative solutions confirms effectiveness of the proposed algorithm in the reduction of impact of noise on recognition process – 3.9% increase of the noisy speech recognition is achieved.

Keywords: Transient noise pulses, noise reduction, dynamic time warping, speech recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1861

606 Investigation of the Synthesis of Alcohols Byproducts in Fischer-Tropsch Synthesis on Modified Fe-Cu Catalyst: Reactivity and Mechanism

Authors: Wanyu Mao, Qiwen Sun, Weiyong Ying, Dingye Fang

Abstract:

The influence of copper promoters and reaction conditions on the formation of alcohols byproducts of a common Fischer-Tropsch synthesis used iron-based catalysts were investigated. A good compromise of 28%Cu/FeKLaSiO2 can lead to the optimization of an improved Fischer-Tropsch catalyst. The product distribution shifts towards hydrocarbons with increasing the reaction temperature, while pressure promotes the formation of alcohols. It was found that the production of either alcohols or hydrocarbons followed A-S-F distributions, and their α parameters were essentially different which indicated a competition in the growing chain between the two species. TPD after acetaldehyde adsorption gave strong evidence of the insertion of a C1 oxygen-containing species into an alkyl chain.

Keywords: Fischer-Tropsch synthesis, Fe-Cu catalyst, alcohols byproducts, reaction pathways

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1587

605 Synthesis of Filtering in Stochastic Systems on Continuous-Time Memory Observations in the Presence of Anomalous Noises

Authors: S. Rozhkova, O. Rozhkova, A. Harlova, V. Lasukov

Abstract:

We have conducted the optimal synthesis of rootmean- squared objective filter to estimate the state vector in the case if within the observation channel with memory the anomalous noises with unknown mathematical expectation are complement in the function of the regular noises. The synthesis has been carried out for linear stochastic systems of continuous - time.

Keywords: Mathematical expectation, filtration, anomalous noise, memory.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1924

604 Hybrid Temporal Correlation Based on Gaussian Mixture Model Framework for View Synthesis

Authors: Deng Zengming, Wang Mingjiang

Abstract:

As 3D video is explored as a hot research topic in the last few decades, free-viewpoint TV (FTV) is no doubt a promising field for its better visual experience and incomparable interactivity. View synthesis is obviously a crucial technology for FTV; it enables to render images in unlimited numbers of virtual viewpoints with the information from limited numbers of reference view. In this paper, a novel hybrid synthesis framework is proposed and blending priority is explored. In contrast to the commonly used View Synthesis Reference Software (VSRS), the presented synthesis process is driven in consideration of the temporal correlation of image sequences. The temporal correlations will be exploited to produce fine synthesis results even near the foreground boundaries. As for the blending priority, this scheme proposed that one of the two reference views is selected to be the main reference view based on the distance between the reference views and virtual view, another view is chosen as the auxiliary viewpoint, just assist to fill the hole pixel with the help of background information. Significant improvement of the proposed approach over the state-of –the-art pixel-based virtual view synthesis method is presented, the results of the experiments show that subjective gains can be observed, and objective PSNR average gains range from 0.5 to 1.3 dB, while SSIM average gains range from 0.01 to 0.05.

Keywords: View synthesis, Gaussian mixture model, hybrid framework, fusion method.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 943

603 Puff Noise Detection and Cancellation for Robust Speech Recognition

Authors: Sangjun Park, Jungpyo Hong, Byung-Ok Kang, Yun-keun Lee, Minsoo Hahn

Abstract:

In this paper, an algorithm for detecting and attenuating puff noises frequently generated under the mobile environment is proposed. As a baseline system, puff detection system is designed based on Gaussian Mixture Model (GMM), and 39th Mel Frequency Cepstral Coefficient (MFCC) is extracted as feature parameters. To improve the detection performance, effective acoustic features for puff detection are proposed. In addition, detected puff intervals are attenuated by high-pass filtering. The speech recognition rate was measured for evaluation and confusion matrix and ROC curve are used to confirm the validity of the proposed system.

Keywords: Gaussian mixture model, puff detection and cancellation, speech enhancement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2182

602 Synthesis of TiO2 Nanoparticles by Sol-Gel and Sonochemical Combination

Authors: Sabriye Piskin, Sibel Kasap, Muge Sari Yilmaz

Abstract:

Nanocrystalline TiO₂ particles were successfully synthesized via sol-gel and sonochemical combination using titanium tetraisopropoxide as a precursor at lower temperature for a short time. The effect of the reaction parameters (hydrolysis media, acid media, and reaction temperatures) on the synthesis of TiO₂ particles were investigated in the present study. Characterizations of synthesized samples were prepared by X-ray diffraction (XRD) analysis. It was shown that the reaction parameters played a significant role in the synthesis of TiO₂ particles.

Keywords: Crystalline TiO2, sonochemical mechanism, sol-gel reaction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1972

601 Analysis of Linguistic Disfluencies in Bilingual Children’s Discourse

Authors: Sheena Christabel Pravin, M. Palanivelan

Abstract:

Speech disfluencies are common in spontaneous speech. The primary purpose of this study was to distinguish linguistic disfluencies from stuttering disfluencies in bilingual Tamil–English (TE) speaking children. The secondary purpose was to determine whether their disfluencies are mediated by native language dominance and/or on an early onset of developmental stuttering at childhood. A detailed study was carried out to identify the prosodic and acoustic features that uniquely represent the disfluent regions of speech. This paper focuses on statistical modeling of repetitions, prolongations, pauses and interjections in the speech corpus encompassing bilingual spontaneous utterances from school going children – English and Tamil. Two classifiers including Hidden Markov Models (HMM) and the Multilayer Perceptron (MLP), which is a class of feed-forward artificial neural network, were compared in the classification of disfluencies. The results of the classifiers document the patterns of disfluency in spontaneous speech samples of school-aged children to distinguish between Children Who Stutter (CWS) and Children with Language Impairment CLI). The ability of the models in classifying the disfluencies was measured in terms of F-measure, Recall, and Precision.

Keywords: Bilingual, children who stutter, children with language impairment, Hidden Markov Models, multi-layer perceptron, linguistic disfluencies, stuttering disfluencies.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 978

600 Applications of Support Vector Machines on Smart Phone Systems for Emotional Speech Recognition

Authors: Wernhuar Tarng, Yuan-Yuan Chen, Chien-Lung Li, Kun-Rong Hsie, Mingteh Chen

Abstract:

An emotional speech recognition system for the applications on smart phones was proposed in this study to combine with 3G mobile communications and social networks to provide users and their groups with more interaction and care. This study developed a mechanism using the support vector machines (SVM) to recognize the emotions of speech such as happiness, anger, sadness and normal. The mechanism uses a hierarchical classifier to adjust the weights of acoustic features and divides various parameters into the categories of energy and frequency for training. In this study, 28 commonly used acoustic features including pitch and volume were proposed for training. In addition, a time-frequency parameter obtained by continuous wavelet transforms was also used to identify the accent and intonation in a sentence during the recognition process. The Berlin Database of Emotional Speech was used by dividing the speech into male and female data sets for training. According to the experimental results, the accuracies of male and female test sets were increased by 4.6% and 5.2% respectively after using the time-frequency parameter for classifying happy and angry emotions. For the classification of all emotions, the average accuracy, including male and female data, was 63.5% for the test set and 90.9% for the whole data set.

Keywords: Smart phones, emotional speech recognition, socialnetworks, support vector machines, time-frequency parameter, Mel-scale frequency cepstral coefficients (MFCC).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1807

599 BDD Package Based on Boolean NOR Operation

Authors: M. Raseen, A.Assi, P.W. C. Prasad, A. Harb

Abstract:

Binary Decision Diagrams (BDDs) are useful data structures for symbolic Boolean manipulations. BDDs are used in many tasks in VLSI/CAD, such as equivalence checking, property checking, logic synthesis, and false paths. In this paper we describe a new approach for the realization of a BDD package. To perform manipulations of Boolean functions, the proposed approach does not depend on the recursive synthesis operation of the IF-Then-Else (ITE). Instead of using the ITE operation, the basic synthesis algorithm is done using Boolean NOR operation.

Keywords: Binary Decision Diagram (BDD), ITE Operation, Boolean Function, NOR operation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1905

598 Plasma Chemical Gasification of Solid Fuel with Mineral Mass Processing

Authors: V. E. Messerle, O. A. Lavrichshev, A. B. Ustimenko

Abstract:

The article presents a plasma chemical technology for processing solid fuels, using examples of bituminous and brown coals. Thermodynamic and experimental investigation of the technology was made. The technology allows producing synthesis gas from the coal organic mass and valuable components (technical silicon, ferrosilicon, aluminum, and carbon silicon, as well as microelements of rare metals, such as uranium, molybdenum, vanadium, etc.) from the mineral mass. The thusly produced highcalorific synthesis gas can be used for synthesis of methanol, as a high-calorific reducing gas instead of blast-furnace coke as well as power gas for thermal power plants.

Keywords: Gasification, mineral mass, organic mass, plasma, processing, solid fuel, synthesis gas, valuable components.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1919

597 Voice Disorders Identification Using Hybrid Approach: Wavelet Analysis and Multilayer Neural Networks

Authors: L. Salhi, M. Talbi, A. Cherif

Abstract:

This paper presents a new strategy of identification and classification of pathological voices using the hybrid method based on wavelet transform and neural networks. After speech acquisition from a patient, the speech signal is analysed in order to extract the acoustic parameters such as the pitch, the formants, Jitter, and shimmer. Obtained results will be compared to those normal and standard values thanks to a programmable database. Sounds are collected from normal people and patients, and then classified into two different categories. Speech data base is consists of several pathological and normal voices collected from the national hospital “Rabta-Tunis". Speech processing algorithm is conducted in a supervised mode for discrimination of normal and pathology voices and then for classification between neural and vocal pathologies (Parkinson, Alzheimer, laryngeal, dyslexia...). Several simulation results will be presented in function of the disease and will be compared with the clinical diagnosis in order to have an objective evaluation of the developed tool.

Keywords: Formants, Neural Networks, Pathological Voices, Pitch, Wavelet Transform.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2795

596 Preliminary Study of the Phonological Development in Three- and Four-Year-Old Bulgarian Children

Authors: Tsvetomira Braynova, Miglena Simonska

Abstract:

The article presents the results of a research of phonological processes in three- and four-year-old children. A test, created for the purpose of the study, was developed and conducted among 120 children. The study included three areas of research - at the level of words (96 words), at the level of sentence repetition (10 sentences) and at the level of generating own speech from a picture (15 pictures). The test also gives us additional information about the articulation errors of the assessed children. The main purpose of the research is to analyze all phonological processes that occur at this age in Bulgarian children and to identify which are typical and atypical for this age. The results show that the most common phonology errors that children make are: sound substitution, elision of sound, metathesis of sound, elision of syllable, elision of consonants clustered in a syllable. Measuring the correlation between average length of repeated speech and average length of generated speech, the analysis does not prove that the more words a child can repeat in part “repeated speech”, the more words they can be expected to generate in part “generating sentence”. The results of this study show that the task of naming a word provides sufficient and representative information to assess the child's phonology.

Keywords: Articulation, phonology, speech, language development.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 305

595 Continuous Feature Adaptation for Non-Native Speech Recognition

Authors: Y. Deng, X. Li, C. Kwan, B. Raj, R. Stern

Abstract:

The current speech interfaces in many military applications may be adequate for native speakers. However, the recognition rate drops quite a lot for non-native speakers (people with foreign accents). This is mainly because the nonnative speakers have large temporal and intra-phoneme variations when they pronounce the same words. This problem is also complicated by the presence of large environmental noise such as tank noise, helicopter noise, etc. In this paper, we proposed a novel continuous acoustic feature adaptation algorithm for on-line accent and environmental adaptation. Implemented by incremental singular value decomposition (SVD), the algorithm captures local acoustic variation and runs in real-time. This feature-based adaptation method is then integrated with conventional model-based maximum likelihood linear regression (MLLR) algorithm. Extensive experiments have been performed on the NATO non-native speech corpus with baseline acoustic model trained on native American English. The proposed feature-based adaptation algorithm improved the average recognition accuracy by 15%, while the MLLR model based adaptation achieved 11% improvement. The corresponding word error rate (WER) reduction was 25.8% and 2.73%, as compared to that without adaptation. The combined adaptation achieved overall recognition accuracy improvement of 29.5%, and WER reduction of 31.8%, as compared to that without adaptation.

Keywords: speaker adaptation; environment adaptation; robust speech recognition; SVD; non-native speech recognition

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3177

594 Automatic Detection of Syllable Repetition in Read Speech for Objective Assessment of Stuttered Disfluencies

Authors: K. M. Ravikumar, Balakrishna Reddy, R. Rajagopal, H. C. Nagaraj

Abstract:

Automatic detection of syllable repetition is one of the important parameter in assessing the stuttered speech objectively. The existing method which uses artificial neural network (ANN) requires high levels of agreement as prerequisite before attempting to train and test ANNs to separate fluent and nonfluent. We propose automatic detection method for syllable repetition in read speech for objective assessment of stuttered disfluencies which uses a novel approach and has four stages comprising of segmentation, feature extraction, score matching and decision logic. Feature extraction is implemented using well know Mel frequency Cepstra coefficient (MFCC). Score matching is done using Dynamic Time Warping (DTW) between the syllables. The Decision logic is implemented by Perceptron based on the score given by score matching. Although many methods are available for segmentation, in this paper it is done manually. Here the assessment by human judges on the read speech of 10 adults who stutter are described using corresponding method and the result was 83%.

Keywords: Assessment, DTW, MFCC, Objective, Perceptron, Stuttering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2750

593 A Mixing Matrix Estimation Algorithm for Speech Signals under the Under-Determined Blind Source Separation Model

Authors: Jing Wu, Wei Lv, Yibing Li, Yuanfan You

Abstract:

The separation of speech signals has become a research hotspot in the field of signal processing in recent years. It has many applications and influences in teleconferencing, hearing aids, speech recognition of machines and so on. The sounds received are usually noisy. The issue of identifying the sounds of interest and obtaining clear sounds in such an environment becomes a problem worth exploring, that is, the problem of blind source separation. This paper focuses on the under-determined blind source separation (UBSS). Sparse component analysis is generally used for the problem of under-determined blind source separation. The method is mainly divided into two parts. Firstly, the clustering algorithm is used to estimate the mixing matrix according to the observed signals. Then the signal is separated based on the known mixing matrix. In this paper, the problem of mixing matrix estimation is studied. This paper proposes an improved algorithm to estimate the mixing matrix for speech signals in the UBSS model. The traditional potential algorithm is not accurate for the mixing matrix estimation, especially for low signal-to noise ratio (SNR).In response to this problem, this paper considers the idea of an improved potential function method to estimate the mixing matrix. The algorithm not only avoids the inuence of insufficient prior information in traditional clustering algorithm, but also improves the estimation accuracy of mixing matrix. This paper takes the mixing of four speech signals into two channels as an example. The results of simulations show that the approach in this paper not only improves the accuracy of estimation, but also applies to any mixing matrix.

Keywords: Clustering algorithm, potential function, speech signal, the UBSS model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 625

592 Technologies of Acylation of Hydroxyanthraquinones

Authors: Dmitry Yu. Korulkin, Raissa A. Muzychkina

Abstract:

In review the generalized data about different methods of synthesis of biological activity acylatedhydrohyanthraquinones is presented. The basic regularity of a synthesis is analyzed. Action of temperature, pH, solubility, catalysts and other factors on a reaction product yield is revealed.

Keywords: Aminoacidic acylation, hydroxyanthraquinones, nucleophilic exchange, physiologically active substances.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1763

591 Technologies of Halogenation of Hydroxyanthraquinones

Authors: Dmitriy Yu. Korulkin, Raissa A. Muzychkina

Abstract:

In review the generalized data about different methods of synthesis of biological activity halogenated di-, tri- and tetrahydroxyanthraquinones is presented. The basic regularity of a synthesis is analyzed. Action of temperature, pH, solubility, catalysts and other factors on a reaction product yield is revealed.

Keywords: Electrophilic substitution, halogenation, hydroxyanthraquinones, physiologically active substances.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2150

590 Technologies of Amination of Hydroxyanthraquinones

Authors: Dmitry Yu. Korulkin, Raissa A. Muzychkina

Abstract:

In review the generalized data about different methods of synthesis of biological activity aminated hydroxyanthraquinones is presented. The basic regularity of a synthesis is analyzed. Action of temperature, pH, solubility, catalysts and other factors on a reaction product yield is revealed.

Keywords: Amination, hydroxyanthraquinones, nucleophilic exchange, physiologically active substances.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2543

589 Analysis of Effect of Pre-Logic Factoring on Cell Based Combinatorial Logic Synthesis

Authors: Padmanabhan Balasubramanian, Bashetty Raghavendra

Abstract:

In this paper, an analysis is presented, which demonstrates the effect pre-logic factoring could have on an automated combinational logic synthesis process succeeding it. The impact of pre-logic factoring for some arbitrary combinatorial circuits synthesized within a FPGA based logic design environment has been analyzed previously. This paper explores a similar effect, but with the non-regenerative logic synthesized using elements of a commercial standard cell library. On an overall basis, the results obtained pertaining to the analysis on a variety of MCNC/IWLS combinational logic benchmark circuits indicate that pre-logic factoring has the potential to facilitate simultaneous power, delay and area optimized synthesis solutions in many cases.

Keywords: Algebraic factoring, Combinational logic synthesis, Standard cells, Low power, Delay optimization, Area reduction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1329

588 Speech Recognition Using Scaly Neural Networks

Authors: Akram M. Othman, May H. Riadh

Abstract:

This research work is aimed at speech recognition using scaly neural networks. A small vocabulary of 11 words were established first, these words are “word, file, open, print, exit, edit, cut, copy, paste, doc1, doc2". These chosen words involved with executing some computer functions such as opening a file, print certain text document, cutting, copying, pasting, editing and exit. It introduced to the computer then subjected to feature extraction process using LPC (linear prediction coefficients). These features are used as input to an artificial neural network in speaker dependent mode. Half of the words are used for training the artificial neural network and the other half are used for testing the system; those are used for information retrieval. The system components are consist of three parts, speech processing and feature extraction, training and testing by using neural networks and information retrieval. The retrieve process proved to be 79.5-88% successful, which is quite acceptable, considering the variation to surrounding, state of the person, and the microphone type.

Keywords: Feature extraction, Liner prediction coefficients, neural network, Speech Recognition, Scaly ANN.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1696

587 Thermodynamic Attainable Region for Direct Synthesis of Dimethyl Ether from Synthesis Gas

Authors: Thulane Paepae, Tumisang Seodigeng

Abstract:

This paper demonstrates the use of a method of synthesizing process flowsheets using a graphical tool called the GH-plot and in particular, to look at how it can be used to compare the reactions of a combined simultaneous process with regard to their thermodynamics. The technique uses fundamental thermodynamic principles to allow the mass, energy and work balances locate the attainable region for chemical processes in a reactor. This provides guidance on what design decisions would be best suited to developing new processes that are more effective and make lower demands on raw material and energy usage.

Keywords: Attainable region, dimethyl ether synthesis, mass balance, optimal reaction networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1444

586 Forensic Speaker Verification in Noisy Environmental by Enhancing the Speech Signal Using ICA Approach

Authors: Ahmed Kamil Hasan Al-Ali, Bouchra Senadji, Ganesh Naik

Abstract:

We propose a system to real environmental noise and channel mismatch for forensic speaker verification systems. This method is based on suppressing various types of real environmental noise by using independent component analysis (ICA) algorithm. The enhanced speech signal is applied to mel frequency cepstral coefficients (MFCC) or MFCC feature warping to extract the essential characteristics of the speech signal. Channel effects are reduced using an intermediate vector (i-vector) and probabilistic linear discriminant analysis (PLDA) approach for classification. The proposed algorithm is evaluated by using an Australian forensic voice comparison database, combined with car, street and home noises from QUT-NOISE at a signal to noise ratio (SNR) ranging from -10 dB to 10 dB. Experimental results indicate that the MFCC feature warping-ICA achieves a reduction in equal error rate about (48.22%, 44.66%, and 50.07%) over using MFCC feature warping when the test speech signals are corrupted with random sessions of street, car, and home noises at -10 dB SNR.

Keywords: Noisy forensic speaker verification, ICA algorithm, MFCC, MFCC feature warping.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 928

585 A Smart-Visio Microphone for Audio-Visual Speech Recognition “Vmike“

Authors: Y. Ni, K. Sebri

Abstract:

The practical implementation of audio-video coupled speech recognition systems is mainly limited by the hardware complexity to integrate two radically different information capturing devices with good temporal synchronisation. In this paper, we propose a solution based on a smart CMOS image sensor in order to simplify the hardware integration difficulties. By using on-chip image processing, this smart sensor can calculate in real time the X/Y projections of the captured image. This on-chip projection reduces considerably the volume of the output data. This data-volume reduction permits a transmission of the condensed visual information via the same audio channel by using a stereophonic input available on most of the standard computation devices such as PC, PDA and mobile phones. A prototype called VMIKE (Visio-Microphone) has been designed and realised by using standard 0.35um CMOS technology. A preliminary experiment gives encouraged results. Its efficiency will be further investigated in a large variety of applications such as biometrics, speech recognition in noisy environments, and vocal control for military or disabled persons, etc.

Keywords: Audio-Visual Speech recognition, CMOS Smartsensor, On-Chip image processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1782

584 Computationally Efficient Signal Quality Improvement Method for VoIP System

Authors: H. P. Singh, S. Singh

Abstract:

The voice signal in Voice over Internet protocol (VoIP) system is processed through the best effort policy based IP network, which leads to the network degradations including delay, packet loss jitter. The work in this paper presents the implementation of finite impulse response (FIR) filter for voice quality improvement in the VoIP system through distributed arithmetic (DA) algorithm. The VoIP simulations are conducted with AMR-NB 6.70 kbps and G.729a speech coders at different packet loss rates and the performance of the enhanced VoIP signal is evaluated using the perceptual evaluation of speech quality (PESQ) measurement for narrowband signal. The results show reduction in the computational complexity in the system and significant improvement in the quality of the VoIP voice signal.

Keywords: VoIP, Signal Quality, Distributed Arithmetic, Packet Loss, Speech Coder.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1781

583 Facile Synthesis of Vertically Aligned ZnO Nanowires on Carbon Layer by Vapour Deposition

Authors: Kh. A. Abdullin, N. B. Bakranov, S. E. Kudaibergenov, S.E. Kumekov, V. N. Ermolaev, L. V. Podrezova

Abstract:

A facile vapour deposition method of synthesis of vertically aligned ZnO nanowires on carbon seed layer was developed. The received samples were investigated on electronic microscope JSM-6490 LA JEOL and x-ray diffractometer X, pert MPD PRO. The photoluminescence spectra (PL) of obtained ZnO samples at a room temperature were studied using He-Cd laser (325 nm line) as excitation source.

Keywords: ZnO nanowires, vapor-phase deposition, Nicatalytic layer, facile method of synthesis, carbon catalytic layer, thephotoluminescence spectra, X-ray spectrum.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1406

582 Extracting Tongue Shape Dynamics from Magnetic Resonance Image Sequences

Authors: María S. Avila-García, John N. Carter, Robert I. Damper

Abstract:

An important problem in speech research is the automatic extraction of information about the shape and dimensions of the vocal tract during real-time speech production. We have previously developed Southampton dynamic magnetic resonance imaging (SDMRI) as an approach to the solution of this problem.However, the SDMRI images are very noisy so that shape extraction is a major challenge. In this paper, we address the problem of tongue shape extraction, which poses difficulties because this is a highly deforming non-parametric shape. We show that combining active shape models with the dynamic Hough transform allows the tongue shape to be reliably tracked in the image sequence.

Keywords: Vocal tract imaging, speech production, active shapemodels, dynamic Hough transform, object tracking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1693

581 The Green Synthesis AgNPs from Basil Leaf Extract

Authors: W. Wonsawat

Abstract:

Bioreduction of silver nanoparticles (AgNPs) from silver ions (Ag⁺) using water extract of Thai basil leaf was successfully carried out. The basil leaf extract provided a reducing agent and stabilizing agent for a synthesis of metal nanoparticles. Silver nanoparticles received from cut and uncut basil leaf was compared. The resulting silver nanoparticles are characterized by UV-Vis spectroscopy. The maximum intensities of silver nanoparticle from cut and uncut basil leaf were 410 and 420, respectively. The techniques involved are simple, eco-friendly and rapid.

Keywords: Basil leaves, Silver Nanoparticles, Green Synthesis, Plant Extract.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4627