Search results for: speech signal processing.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2633

Search results for: speech signal processing.

2603 Automatic Distance Compensation for Robust Voice-based Human-Computer Interaction

Authors: Randy Gomez, Keisuke Nakamura, Kazuhiro Nakadai

Abstract:

Distant-talking voice-based HCI system suffers from performance degradation due to mismatch between the acoustic speech (runtime) and the acoustic model (training). Mismatch is caused by the change in the power of the speech signal as observed at the microphones. This change is greatly influenced by the change in distance, affecting speech dynamics inside the room before reaching the microphones. Moreover, as the speech signal is reflected, its acoustical characteristic is also altered by the room properties. In general, power mismatch due to distance is a complex problem. This paper presents a novel approach in dealing with distance-induced mismatch by intelligently sensing instantaneous voice power variation and compensating model parameters. First, the distant-talking speech signal is processed through microphone array processing, and the corresponding distance information is extracted. Distance-sensitive Gaussian Mixture Models (GMMs), pre-trained to capture both speech power and room property are used to predict the optimal distance of the speech source. Consequently, pre-computed statistic priors corresponding to the optimal distance is selected to correct the statistics of the generic model which was frozen during training. Thus, model combinatorics are post-conditioned to match the power of instantaneous speech acoustics at runtime. This results to an improved likelihood in predicting the correct speech command at farther distances. We experiment using real data recorded inside two rooms. Experimental evaluation shows voice recognition performance using our method is more robust to the change in distance compared to the conventional approach. In our experiment, under the most acoustically challenging environment (i.e., Room 2: 2.5 meters), our method achieved 24.2% improvement in recognition performance against the best-performing conventional method.

Keywords: Human Machine Interaction, Human Computer Interaction, Voice Recognition, Acoustic Model Compensation, Acoustic Speech Enhancement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1845
2602 Noise Estimation for Speech Enhancement in Non-Stationary Environments-A New Method

Authors: Ch.V.Rama Rao, Gowthami., Harsha., Rajkumar., M.B.Rama Murthy, K.Srinivasa Rao, K.AnithaSheela

Abstract:

This paper presents a new method for estimating the nonstationary noise power spectral density given a noisy signal. The method is based on averaging the noisy speech power spectrum using time and frequency dependent smoothing factors. These factors are adjusted based on signal-presence probability in individual frequency bins. Signal presence is determined by computing the ratio of the noisy speech power spectrum to its local minimum, which is updated continuously by averaging past values of the noisy speech power spectra with a look-ahead factor. This method adapts very quickly to highly non-stationary noise environments. The proposed method achieves significant improvements over a system that uses voice activity detector (VAD) in noise estimation.

Keywords: Noise estimation, Non-stationary noise, Speechenhancement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2305
2601 End Point Detection for Wavelet Based Speech Compression

Authors: Jalal Karam

Abstract:

In real-field applications, the correct determination of voice segments highly improves the overall system accuracy and minimises the total computation time. This paper presents reliable measures of speech compression by detcting the end points of the speech signals prior to compressing them. The two different compession schemes used are the Global threshold and the Level- Dependent threshold techniques. The performance of the proposed method is tested wirh the Signal to Noise Ratios, Peak Signal to Noise Ratios and Normalized Root Mean Square Error parameter measures.

Keywords: Wavelets, End-points Detection, Compression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1346
2600 Stochastic Resonance in Nonlinear Signal Detection

Authors: Youguo Wang, Lenan Wu

Abstract:

Stochastic resonance (SR) is a phenomenon whereby the signal transmission or signal processing through certain nonlinear systems can be improved by adding noise. This paper discusses SR in nonlinear signal detection by a simple test statistic, which can be computed from multiple noisy data in a binary decision problem based on a maximum a posteriori probability criterion. The performance of detection is assessed by the probability of detection error Per . When the input signal is subthreshold signal, we establish that benefit from noise can be gained for different noises and confirm further that the subthreshold SR exists in nonlinear signal detection. The efficacy of SR is significantly improved and the minimum of Per can dramatically approach to zero as the sample number increases. These results show the robustness of SR in signal detection and extend the applicability of SR in signal processing.

Keywords: Probability of detection error, signal detection, stochastic resonance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1488
2599 Forensic Speaker Verification in Noisy Environmental by Enhancing the Speech Signal Using ICA Approach

Authors: Ahmed Kamil Hasan Al-Ali, Bouchra Senadji, Ganesh Naik

Abstract:

We propose a system to real environmental noise and channel mismatch for forensic speaker verification systems. This method is based on suppressing various types of real environmental noise by using independent component analysis (ICA) algorithm. The enhanced speech signal is applied to mel frequency cepstral coefficients (MFCC) or MFCC feature warping to extract the essential characteristics of the speech signal. Channel effects are reduced using an intermediate vector (i-vector) and probabilistic linear discriminant analysis (PLDA) approach for classification. The proposed algorithm is evaluated by using an Australian forensic voice comparison database, combined with car, street and home noises from QUT-NOISE at a signal to noise ratio (SNR) ranging from -10 dB to 10 dB. Experimental results indicate that the MFCC feature warping-ICA achieves a reduction in equal error rate about (48.22%, 44.66%, and 50.07%) over using MFCC feature warping when the test speech signals are corrupted with random sessions of street, car, and home noises at -10 dB SNR.

Keywords: Noisy forensic speaker verification, ICA algorithm, MFCC, MFCC feature warping.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 940
2598 Signal Driven Sampling and Filtering a Promising Approach for Time Varying Signals Processing

Authors: Saeed Mian Qaisar, Laurent Fesquet, Marc Renaudin

Abstract:

The mobile systems are powered by batteries. Reducing the system power consumption is a key to increase its autonomy. It is known that mostly the systems are dealing with time varying signals. Thus, we aim to achieve power efficiency by smartly adapting the system processing activity in accordance with the input signal local characteristics. It is done by completely rethinking the processing chain, by adopting signal driven sampling and processing. In this context, a signal driven filtering technique, based on the level crossing sampling is devised. It adapts the sampling frequency and the filter order by analysing the input signal local variations. Thus, it correlates the processing activity with the signal variations. It leads towards a drastic computational gain of the proposed technique compared to the classical one.

Keywords: Level Crossing Sampling, Activity Selection, Adaptive Rate Filtering, Computational Complexity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1329
2597 Comparative Analysis of Two Approaches to Joint Signal Detection, ToA and AoA Estimation in Multi-Element Antenna Arrays

Authors: Olesya Bolkhovskaya, Alexey Davydov, Alexander Maltsev

Abstract:

In this paper two approaches to joint signal detection, time of arrival (ToA) and angle of arrival (AoA) estimation in multi-element antenna array are investigated. Two scenarios were considered: first one, when the waveform of the useful signal is known a priori and, second one, when the waveform of the desired signal is unknown. For first scenario, the antenna array signal processing based on multi-element matched filtering (MF) with the following non-coherent detection scheme and maximum likelihood (ML) parameter estimation blocks is exploited. For second scenario, the signal processing based on the antenna array elements covariance matrix estimation with the following eigenvector analysis and ML parameter estimation blocks is applied. The performance characteristics of both signal processing schemes are thoroughly investigated and compared for different useful signals and noise parameters.

Keywords: Antenna array, signal detection, ToA, AoA estimation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2013
2596 Hybrid Modeling Algorithm for Continuous Tamil Speech Recognition

Authors: M. Kalamani, S. Valarmathy, M. Krishnamoorthi

Abstract:

In this paper, Fuzzy C-Means clustering with Expectation Maximization-Gaussian Mixture Model based hybrid modeling algorithm is proposed for Continuous Tamil Speech Recognition. The speech sentences from various speakers are used for training and testing phase and objective measures are between the proposed and existing Continuous Speech Recognition algorithms. From the simulated results, it is observed that the proposed algorithm improves the recognition accuracy and F-measure up to 3% as compared to that of the existing algorithms for the speech signal from various speakers. In addition, it reduces the Word Error Rate, Error Rate and Error up to 4% as compared to that of the existing algorithms. In all aspects, the proposed hybrid modeling for Tamil speech recognition provides the significant improvements for speechto- text conversion in various applications.

Keywords: Speech Segmentation, Feature Extraction, Clustering, HMM, EM-GMM, CSR.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2100
2595 Assamese Numeral Corpus for Speech Recognition using Cooperative ANN Architecture

Authors: Mousmita Sarma, Krishna Dutta, Kandarpa Kumar Sarma

Abstract:

Speech corpus is one of the major components in a Speech Processing System where one of the primary requirements is to recognize an input sample. The quality and details captured in speech corpus directly affects the precision of recognition. The current work proposes a platform for speech corpus generation using an adaptive LMS filter and LPC cepstrum, as a part of an ANN based Speech Recognition System which is exclusively designed to recognize isolated numerals of Assamese language- a major language in the North Eastern part of India. The work focuses on designing an optimal feature extraction block and a few ANN based cooperative architectures so that the performance of the Speech Recognition System can be improved.

Keywords: Filter, Feature, LMS, LPC, Cepstrum, ANN.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2351
2594 A Semi- One Time Pad Using Blind Source Separation for Speech Encryption

Authors: Long Jye Sheu, Horng-Shing Chiou, Wei Ching Chen

Abstract:

We propose a new perspective on speech communication using blind source separation. The original speech is mixed with key signals which consist of the mixing matrix, chaotic signals and a random noise. However, parts of the keys (the mixing matrix and the random noise) are not necessary in decryption. In practice implement, one can encrypt the speech by changing the noise signal every time. Hence, the present scheme obtains the advantages of a One Time Pad encryption while avoiding its drawbacks in key exchange. It is demonstrated that the proposed scheme is immune against traditional attacks.

Keywords: one time pad, blind source separation, independentcomponent analysis, speech encryption.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1531
2593 Design of Medical Information Storage System – ECG Signal

Authors: A. Rubiano F, N. Olarte, D. Lara

Abstract:

This paper presents the design, implementation and results related to the storage system of medical information associated to the ECG (Electrocardiography) signal. The system includes the signal acquisition modules, the preprocessing and signal processing, followed by a module of transmission and reception of the signal, along with the storage and web display system of the medical platform. The tests were initially performed with this signal, with the purpose to include more biosignal under the same system in the future.

Keywords: Acquisition, ECG Signal, Storage, Web Platform

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2231
2592 Pulsed Multi-Layered Image Filtering: A VLSI Implementation

Authors: Christian Mayr, Holger Eisenreich, Stephan Henker, René Schüffny

Abstract:

Image convolution similar to the receptive fields found in mammalian visual pathways has long been used in conventional image processing in the form of Gabor masks. However, no VLSI implementation of parallel, multi-layered pulsed processing has been brought forward which would emulate this property. We present a technical realization of such a pulsed image processing scheme. The discussed IC also serves as a general testbed for VLSI-based pulsed information processing, which is of interest especially with regard to the robustness of representing an analog signal in the phase or duration of a pulsed, quasi-digital signal, as well as the possibility of direct digital manipulation of such an analog signal. The network connectivity and processing properties are reconfigurable so as to allow adaptation to various processing tasks.

Keywords: Neural image processing, pulse computation application, pulsed Gabor convolution, VLSI pulse routing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1355
2591 Speaker Identification using Neural Networks

Authors: R.V Pawar, P.P.Kajave, S.N.Mali

Abstract:

The speech signal conveys information about the identity of the speaker. The area of speaker identification is concerned with extracting the identity of the person speaking the utterance. As speech interaction with computers becomes more pervasive in activities such as the telephone, financial transactions and information retrieval from speech databases, the utility of automatically identifying a speaker is based solely on vocal characteristic. This paper emphasizes on text dependent speaker identification, which deals with detecting a particular speaker from a known population. The system prompts the user to provide speech utterance. System identifies the user by comparing the codebook of speech utterance with those of the stored in the database and lists, which contain the most likely speakers, could have given that speech utterance. The speech signal is recorded for N speakers further the features are extracted. Feature extraction is done by means of LPC coefficients, calculating AMDF, and DFT. The neural network is trained by applying these features as input parameters. The features are stored in templates for further comparison. The features for the speaker who has to be identified are extracted and compared with the stored templates using Back Propogation Algorithm. Here, the trained network corresponds to the output; the input is the extracted features of the speaker to be identified. The network does the weight adjustment and the best match is found to identify the speaker. The number of epochs required to get the target decides the network performance.

Keywords: Average Mean Distance function, Backpropogation, Linear Predictive Coding, MultilayeredPerceptron,

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1860
2590 Voice Driven Applications in Non-stationary and Chaotic Environment

Authors: C. Kwan, X. Li, D. Lao, Y. Deng, Z. Ren, B. Raj, R. Singh, R. Stern

Abstract:

Automated operations based on voice commands will become more and more important in many applications, including robotics, maintenance operations, etc. However, voice command recognition rates drop quite a lot under non-stationary and chaotic noise environments. In this paper, we tried to significantly improve the speech recognition rates under non-stationary noise environments. First, 298 Navy acronyms have been selected for automatic speech recognition. Data sets were collected under 4 types of noisy environments: factory, buccaneer jet, babble noise in a canteen, and destroyer. Within each noisy environment, 4 levels (5 dB, 15 dB, 25 dB, and clean) of Signal-to-Noise Ratio (SNR) were introduced to corrupt the speech. Second, a new algorithm to estimate speech or no speech regions has been developed, implemented, and evaluated. Third, extensive simulations were carried out. It was found that the combination of the new algorithm, the proper selection of language model and a customized training of the speech recognizer based on clean speech yielded very high recognition rates, which are between 80% and 90% for the four different noisy conditions. Fourth, extensive comparative studies have also been carried out.

Keywords: Non-stationary, speech recognition, voice commands.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1498
2589 Voice Disorders Identification Using Hybrid Approach: Wavelet Analysis and Multilayer Neural Networks

Authors: L. Salhi, M. Talbi, A. Cherif

Abstract:

This paper presents a new strategy of identification and classification of pathological voices using the hybrid method based on wavelet transform and neural networks. After speech acquisition from a patient, the speech signal is analysed in order to extract the acoustic parameters such as the pitch, the formants, Jitter, and shimmer. Obtained results will be compared to those normal and standard values thanks to a programmable database. Sounds are collected from normal people and patients, and then classified into two different categories. Speech data base is consists of several pathological and normal voices collected from the national hospital “Rabta-Tunis". Speech processing algorithm is conducted in a supervised mode for discrimination of normal and pathology voices and then for classification between neural and vocal pathologies (Parkinson, Alzheimer, laryngeal, dyslexia...). Several simulation results will be presented in function of the disease and will be compared with the clinical diagnosis in order to have an objective evaluation of the developed tool.

Keywords: Formants, Neural Networks, Pathological Voices, Pitch, Wavelet Transform.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2803
2588 Spectral Analysis of Speech: A New Technique

Authors: Neeta Awasthy, J.P.Saini, D.S.Chauhan

Abstract:

ICA which is generally used for blind source separation problem has been tested for feature extraction in Speech recognition system to replace the phoneme based approach of MFCC. Applying the Cepstral coefficients generated to ICA as preprocessing has developed a new signal processing approach. This gives much better results against MFCC and ICA separately, both for word and speaker recognition. The mixing matrix A is different before and after MFCC as expected. As Mel is a nonlinear scale. However, cepstrals generated from Linear Predictive Coefficient being independent prove to be the right candidate for ICA. Matlab is the tool used for all comparisons. The database used is samples of ISOLET.

Keywords: Cepstral Coefficient, Distance measures, Independent Component Analysis, Linear Predictive Coefficients.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1917
2587 A Tool for Audio Quality Evaluation Under Hostile Environment

Authors: Akhil Kumar Arya, Jagdeep Singh Lather, Lillie Dewan

Abstract:

In this paper is to evaluate audio and speech quality with the help of Digital Audio Watermarking Technique under the different types of attacks (signal impairments) like Gaussian Noise, Compression Error and Jittering Effect. Further attacks are considered as Hostile Environment. Audio and Speech Quality Evaluation is an important research topic. The traditional way for speech quality evaluation is using subjective tests. They are reliable, but very expensive, time consuming, and cannot be used in certain applications such as online monitoring. Objective models, based on human perception, were developed to predict the results of subjective tests. The existing objective methods require either the original speech or complicated computation model, which makes some applications of quality evaluation impossible.

Keywords: Digital Watermarking, DCT, Speech Quality, Attacks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1586
2586 Automatic Recognition of Emotionally Coloured Speech

Authors: Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou

Abstract:

Emotion in speech is an issue that has been attracting the interest of the speech community for many years, both in the context of speech synthesis as well as in automatic speech recognition (ASR). In spite of the remarkable recent progress in Large Vocabulary Recognition (LVR), it is still far behind the ultimate goal of recognising free conversational speech uttered by any speaker in any environment. Current experimental tests prove that using state of the art large vocabulary recognition systems the error rate increases substantially when applied to spontaneous/emotional speech. This paper shows that recognition rate for emotionally coloured speech can be improved by using a language model based on increased representation of emotional utterances.

Keywords: Statistical language model, N-grams, emotionallycoloured speech

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1577
2585 EEG Signal Processing Methods to Differentiate Mental States

Authors: Sun H. Hwang, Young E. Lee, Yunhan Ga, Gilwon Yoon

Abstract:

EEG is a very complex signal with noises and other bio-potential interferences. EOG is the most distinct interfering signal when EEG signals are measured and analyzed. It is very important how to process raw EEG signals in order to obtain useful information. In this study, the EEG signal processing techniques such as EOG filtering and outlier removal were examined to minimize unwanted EOG signals and other noises. The two different mental states of resting and focusing were examined through EEG analysis. A focused state was induced by letting subjects to watch a red dot on the white screen. EEG data for 32 healthy subjects were measured. EEG data after 60-Hz notch filtering were processed by a commercially available EOG filtering and our presented algorithm based on the removal of outliers. The ratio of beta wave to theta wave was used as a parameter for determining the degree of focusing. The results show that our algorithm was more appropriate than the existing EOG filtering.

Keywords: EEG, focus, mental state, outlier, signal processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1488
2584 Environmental Interference Cancellation of Speech with the Radial Basis Function Networks: An Experimental Comparison

Authors: Nima Hatami

Abstract:

In this paper, we use Radial Basis Function Networks (RBFN) for solving the problem of environmental interference cancellation of speech signal. We show that the Second Order Thin- Plate Spline (SOTPS) kernel cancels the interferences effectively. For make comparison, we test our experiments on two conventional most used RBFN kernels: the Gaussian and First order TPS (FOTPS) basis functions. The speech signals used here were taken from the OGI Multi-Language Telephone Speech Corpus database and were corrupted with six type of environmental noise from NOISEX-92 database. Experimental results show that the SOTPS kernel can considerably outperform the Gaussian and FOTPS functions on speech interference cancellation problem.

Keywords: Environmental interference, interference cancellation of speech, Radial Basis Function networks, Gaussian and TPS kernels.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1527
2583 Effect of Visual Speech in Sign Speech Synthesis

Authors: Zdenek Krnoul

Abstract:

This article investigates a contribution of synthesized visual speech. Synthesis of visual speech expressed by a computer consists in an animation in particular movements of lips. Visual speech is also necessary part of the non-manual component of a sign language. Appropriate methodology is proposed to determine the quality and the accuracy of synthesized visual speech. Proposed methodology is inspected on Czech speech. Hence, this article presents a procedure of recording of speech data in order to set a synthesis system as well as to evaluate synthesized speech. Furthermore, one option of the evaluation process is elaborated in the form of a perceptual test. This test procedure is verified on the measured data with two settings of the synthesis system. The results of the perceptual test are presented as a statistically significant increase of intelligibility evoked by real and synthesized visual speech. Now, the aim is to show one part of evaluation process which leads to more comprehensive evaluation of the sign speech synthesis system.

Keywords: Perception test, Sign speech synthesis, Talking head, Visual speech.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1428
2582 Online Prediction of Nonlinear Signal Processing Problems Based Kernel Adaptive Filtering

Authors: Hamza Nejib, Okba Taouali

Abstract:

This paper presents two of the most knowing kernel adaptive filtering (KAF) approaches, the kernel least mean squares and the kernel recursive least squares, in order to predict a new output of nonlinear signal processing. Both of these methods implement a nonlinear transfer function using kernel methods in a particular space named reproducing kernel Hilbert space (RKHS) where the model is a linear combination of kernel functions applied to transform the observed data from the input space to a high dimensional feature space of vectors, this idea known as the kernel trick. Then KAF is the developing filters in RKHS. We use two nonlinear signal processing problems, Mackey Glass chaotic time series prediction and nonlinear channel equalization to figure the performance of the approaches presented and finally to result which of them is the adapted one.

Keywords: KLMS, online prediction, KAF, signal processing, RKHS, Kernel methods, KRLS, KLMS.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1012
2581 Efficient Filtering of Graph Based Data Using Graph Partitioning

Authors: Nileshkumar Vaishnav, Aditya Tatu

Abstract:

An algebraic framework for processing graph signals axiomatically designates the graph adjacency matrix as the shift operator. In this setup, we often encounter a problem wherein we know the filtered output and the filter coefficients, and need to find out the input graph signal. Solution to this problem using direct approach requires O(N3) operations, where N is the number of vertices in graph. In this paper, we adapt the spectral graph partitioning method for partitioning of graphs and use it to reduce the computational cost of the filtering problem. We use the example of denoising of the temperature data to illustrate the efficacy of the approach.

Keywords: Graph signal processing, graph partitioning, inverse filtering on graphs, algebraic signal processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1177
2580 A Two-Stage Adaptation towards Automatic Speech Recognition System for Malay-Speaking Children

Authors: Mumtaz Begum Mustafa, Siti Salwah Salim, Feizal Dani Rahman

Abstract:

Recently, Automatic Speech Recognition (ASR) systems were used to assist children in language acquisition as it has the ability to detect human speech signal. Despite the benefits offered by the ASR system, there is a lack of ASR systems for Malay-speaking children. One of the contributing factors for this is the lack of continuous speech database for the target users. Though cross-lingual adaptation is a common solution for developing ASR systems for under-resourced language, it is not viable for children as there are very limited speech databases as a source model. In this research, we propose a two-stage adaptation for the development of ASR system for Malay-speaking children using a very limited database. The two stage adaptation comprises the cross-lingual adaptation (first stage) and cross-age adaptation. For the first stage, a well-known speech database that is phonetically rich and balanced, is adapted to the medium-sized Malay adults using supervised MLLR. The second stage adaptation uses the speech acoustic model generated from the first adaptation, and the target database is a small-sized database of the target users. We have measured the performance of the proposed technique using word error rate, and then compare them with the conventional benchmark adaptation. The two stage adaptation proposed in this research has better recognition accuracy as compared to the benchmark adaptation in recognizing children’s speech.

Keywords: Automatic speech recognition system, children speech, adaptation, Malay.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1724
2579 On Preprocessing of Speech Signals

Authors: Ayaz Keerio, Bhargav Kumar Mitra, Philip Birch, Rupert Young, Chris Chatwin

Abstract:

Preprocessing of speech signals is considered a crucial step in the development of a robust and efficient speech or speaker recognition system. In this paper, we present some popular statistical outlier-detection based strategies to segregate the silence/unvoiced part of the speech signal from the voiced portion. The proposed methods are based on the utilization of the 3 σ edit rule, and the Hampel Identifier which are compared with the conventional techniques: (i) short-time energy (STE) based methods, and (ii) distribution based methods. The results obtained after applying the proposed strategies on some test voice signals are encouraging.

Keywords: STE based methods, Mahalanobis distance, 3 edit σ rule, Hampel Identifier.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1658
2578 A New Time-Frequency Speech Analysis Approach Based On Adaptive Fourier Decomposition

Authors: Liming Zhang

Abstract:

In this paper, a new adaptive Fourier decomposition (AFD) based time-frequency speech analysis approach is proposed. Given the fact that the fundamental frequency of speech signals often undergo fluctuation, the classical short-time Fourier transform (STFT) based spectrogram analysis suffers from the difficulty of window size selection. AFD is a newly developed signal decomposition theory. It is designed to deal with time-varying non-stationary signals. Its outstanding characteristic is to provide instantaneous frequency for each decomposed component, so the time-frequency analysis becomes easier. Experiments are conducted based on the sample sentence in TIMIT Acoustic-Phonetic Continuous Speech Corpus. The results show that the AFD based time-frequency distribution outperforms the STFT based one.

Keywords: Adaptive fourier decomposition, instantaneous frequency, speech analysis, time-frequency distribution.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1686
2577 AC Signals Estimation from Irregular Samples

Authors: Predrag B. Petrović

Abstract:

The paper deals with the estimation of amplitude and phase of an analogue multi-harmonic band-limited signal from irregularly spaced sampling values. To this end, assuming the signal fundamental frequency is known in advance (i.e., estimated at an independent stage), a complexity-reduced algorithm for signal reconstruction in time domain is proposed. The reduction in complexity is achieved owing to completely new analytical and summarized expressions that enable a quick estimation at a low numerical error. The proposed algorithm for the calculation of the unknown parameters requires O((2M+1)2) flops, while the straightforward solution of the obtained equations takes O((2M+1)3) flops (M is the number of the harmonic components). It is applied in signal reconstruction, spectral estimation, system identification, as well as in other important signal processing problems. The proposed method of processing can be used for precise RMS measurements (for power and energy) of a periodic signal based on the presented signal reconstruction. The paper investigates the errors related to the signal parameter estimation, and there is a computer simulation that demonstrates the accuracy of these algorithms.

Keywords: Band-limited signals, Fourier coefficient estimation, analytical solutions, signal reconstruction, time.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1709
2576 Enhanced Gram-Schmidt Process for Improving the Stability in Signal and Image Processing

Authors: Mario Mastriani, Marcelo Naiouf

Abstract:

The Gram-Schmidt Process (GSP) is used to convert a non-orthogonal basis (a set of linearly independent vectors) into an orthonormal basis (a set of orthogonal, unit-length vectors). The process consists of taking each vector and then subtracting the elements in common with the previous vectors. This paper introduces an Enhanced version of the Gram-Schmidt Process (EGSP) with inverse, which is useful for signal and image processing applications.

Keywords: Digital filters, digital signal and image processing, Gram-Schmidt Process, orthonormalization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2846
2575 Automotive 3-Microphone Noise Canceller in a Frequently Moving Noise Source Environment

Authors: Z. Qi, T. J. Moir

Abstract:

A combined three-microphone voice activity detector (VAD) and noise-canceling system is studied to enhance speech recognition in an automobile environment. A previous experiment clearly shows the ability of the composite system to cancel a single noise source outside of a defined zone. This paper investigates the performance of the composite system when there are frequently moving noise sources (noise sources are coming from different locations but are not always presented at the same time) e.g. there is other passenger speech or speech from a radio when a desired speech is presented. To work in a frequently moving noise sources environment, whilst a three-microphone voice activity detector (VAD) detects voice from a “VAD valid zone", the 3-microphone noise canceller uses a “noise canceller valid zone" defined in freespace around the users head. Therefore, a desired voice should be in the intersection of the noise canceller valid zone and VAD valid zone. Thus all noise is suppressed outside this intersection of area. Experiments are shown for a real environment e.g. all results were recorded in a car by omni-directional electret condenser microphones.

Keywords: Signal processing, voice activity detection, noise canceller, microphone array beam forming.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1576
2574 Slovenian Text-to-Speech Synthesis for Speech User Interfaces

Authors: Jerneja Žganec Gros, Aleš Mihelič, Nikola Pavešić, Mario Žganec, Stanislav Gruden

Abstract:

The paper presents the design concept of a unitselection text-to-speech synthesis system for the Slovenian language. Due to its modular and upgradable architecture, the system can be used in a variety of speech user interface applications, ranging from server carrier-grade voice portal applications, desktop user interfaces to specialized embedded devices. Since memory and processing power requirements are important factors for a possible implementation in embedded devices, lexica and speech corpora need to be reduced. We describe a simple and efficient implementation of a greedy subset selection algorithm that extracts a compact subset of high coverage text sentences. The experiment on a reference text corpus showed that the subset selection algorithm produced a compact sentence subset with a small redundancy. The adequacy of the spoken output was evaluated by several subjective tests as they are recommended by the International Telecommunication Union ITU.

Keywords: text-to-speech synthesis, prosody modeling, speech user interface.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1406