Search results for: Speech signal
1373 Minimum Data of a Speech Signal as Special Indicators of Identification in Phonoscopy
Authors: Nazaket Gazieva
Abstract:
Voice biometric data associated with physiological, psychological and other factors are widely used in forensic phonoscopy. There are various methods for identifying and verifying a person by voice. This article explores the minimum speech signal data as individual parameters of a speech signal. Monozygotic twins are believed to be genetically identical. Using the minimum data of the speech signal, we came to the conclusion that the voice imprint of monozygotic twins is individual. According to the conclusion of the experiment, we can conclude that the minimum indicators of the speech signal are more stable and reliable for phonoscopic examinations.
Keywords: Biometric voice prints, fundamental frequency, phonogram, speech signal, temporal characteristics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5731372 Speech Enhancement Using Kalman Filter in Communication
Authors: Eng. Alaa K. Satti Salih
Abstract:
Revolutions Applications such as telecommunications, hands-free communications, recording, etc. which need at least one microphone, the signal is usually infected by noise and echo. The important application is the speech enhancement, which is done to remove suppressed noises and echoes taken by a microphone, beside preferred speech. Accordingly, the microphone signal has to be cleaned using digital signal processing DSP tools before it is played out, transmitted, or stored. Engineers have so far tried different approaches to improving the speech by get back the desired speech signal from the noisy observations. Especially Mobile communication, so in this paper will do reconstruction of the speech signal, observed in additive background noise, using the Kalman filter technique to estimate the parameters of the Autoregressive Process (AR) in the state space model and the output speech signal obtained by the MATLAB. The accurate estimation by Kalman filter on speech would enhance and reduce the noise then compare and discuss the results between actual values and estimated values which produce the reconstructed signals.
Keywords: Autoregressive Process, Kalman filter, Matlab and Noise speech.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 40251371 On SNR Estimation by the Likelihood of near Pitch for Speech Detection
Authors: Young-Hwan Song, Doo-Heon Kyun, Jong-Kuk Kim, Myung-Jin Bae
Abstract:
People have the habitual pitch level which is used when people say something generally. However this pitch should be changed irregularly in the presence of noise. So it is useful to estimate SNR of speech signal by pitch. In this paper, we obtain the energy of input speech signal and then we detect a stationary region on voiced speech. And we get the pitch period by NAMDF for the stationary region that is not varied pitch rapidly. After getting pitch, each frame is divided by pitch period and the likelihood of closed pitch is estimated. In this paper, we proposed new parameter, NLF, to estimate the SNR of received speech signal. The NLF is derived from the correlation of near pitch periods. The NLF is obtained for each stationary region in voiced speech. Finally we confirmed good performance of the estimation of the SNR of received input speech in the presence of noise.
Keywords: Likelihood, pitch, SNR, speech.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15741370 Narrowband Speech Hiding using Vector Quantization
Authors: Driss Guerchi, Fatiha Djebbar
Abstract:
In this work we introduce an efficient method to limit the impact of the hiding process on the quality of the cover speech. Vector quantization of the speech spectral information reduces drastically the number of the secret speech parameters to be embedded in the cover signal. Compared to scalar hiding, vector quantization hiding technique provides a stego signal that is indistinguishable from the cover speech. The objective and subjective performance measures reveal that the current hiding technique attracts no suspicion about the presence of the secret message in the stego speech, while being able to recover an intelligible copy of the secret message at the receiver side.Keywords: Speech steganography, LSF vector quantization, fast Fourier transform
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15141369 Detection of Clipped Fragments in Speech Signals
Authors: Sergei Aleinik, Yuri Matveev
Abstract:
In this paper a novel method for the detection of clipping in speech signals is described. It is shown that the new method has better performance than known clipping detection methods, is easy to implement, and is robust to changes in signal amplitude, size of data, etc. Statistical simulation results are presented.
Keywords: Clipping, clipped signal, speech signal processing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 26731368 High Quality Speech Coding using Combined Parametric and Perceptual Modules
Authors: M. Kulesza, G. Szwoch, A. Czyżewski
Abstract:
A novel approach to speech coding using the hybrid architecture is presented. Advantages of parametric and perceptual coding methods are utilized together in order to create a speech coding algorithm assuring better signal quality than in traditional CELP parametric codec. Two approaches are discussed. One is based on selection of voiced signal components that are encoded using parametric algorithm, unvoiced components that are encoded perceptually and transients that remain unencoded. The second approach uses perceptual encoding of the residual signal in CELP codec. The algorithm applied for precise transient selection is described. Signal quality achieved using the proposed hybrid codec is compared to quality of some standard speech codecs.
Keywords: CELP residual coding, hybrid codec architecture, perceptual speech coding, speech codecs comparison.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15291367 Adaptive Noise Reduction Algorithm for Speech Enhancement
Authors: M. Kalamani, S. Valarmathy, M. Krishnamoorthi
Abstract:
In this paper, Least Mean Square (LMS) adaptive noise reduction algorithm is proposed to enhance the speech signal from the noisy speech. In this, the speech signal is enhanced by varying the step size as the function of the input signal. Objective and subjective measures are made under various noises for the proposed and existing algorithms. From the experimental results, it is seen that the proposed LMS adaptive noise reduction algorithm reduces Mean square Error (MSE) and Log Spectral Distance (LSD) as compared to that of the earlier methods under various noise conditions with different input SNR levels. In addition, the proposed algorithm increases the Peak Signal to Noise Ratio (PSNR) and Segmental SNR improvement (ΔSNRseg) values; improves the Mean Opinion Score (MOS) as compared to that of the various existing LMS adaptive noise reduction algorithms. From these experimental results, it is observed that the proposed LMS adaptive noise reduction algorithm reduces the speech distortion and residual noise as compared to that of the existing methods.
Keywords: LMS, speech enhancement, speech quality, residual noise.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 28051366 Automatic Segmentation of the Clean Speech Signal
Authors: M. A. Ben Messaoud, A. Bouzid, N. Ellouze
Abstract:
Speech Segmentation is the measure of the change point detection for partitioning an input speech signal into regions each of which accords to only one speaker. In this paper, we apply two features based on multi-scale product (MP) of the clean speech, namely the spectral centroid of MP, and the zero crossings rate of MP. We focus on multi-scale product analysis as an important tool for segmentation extraction. The MP is based on making the product of the speech wavelet transform coefficients (WTC). We have estimated our method on the Keele database. The results show the effectiveness of our method. It indicates that the two features can find word boundaries, and extracted the segments of the clean speech.
Keywords: Speech segmentation, Multi-scale product, Spectral centroid, Zero crossings rate.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 25081365 Recognition of Isolated Speech Signals using Simplified Statistical Parameters
Authors: Abhijit Mitra, Bhargav Kumar Mitra, Biswajoy Chatterjee
Abstract:
We present a novel scheme to recognize isolated speech signals using certain statistical parameters derived from those signals. The determination of the statistical estimates is based on extracted signal information rather than the original signal information in order to reduce the computational complexity. Subtle details of these estimates, after extracting the speech signal from ambience noise, are first exploited to segregate the polysyllabic words from the monosyllabic ones. Precise recognition of each distinct word is then carried out by analyzing the histogram, obtained from these information.Keywords: Isolated speech signals, Block overlapping technique, Positive peaks, Histogram analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14251364 Hybrid Method Using Wavelets and Predictive Method for Compression of Speech Signal
Authors: Karima Siham Aoubid, Mohamed Boulemden
Abstract:
The development of the signal compression algorithms is having compressive progress. These algorithms are continuously improved by new tools and aim to reduce, an average, the number of bits necessary to the signal representation by means of minimizing the reconstruction error. The following article proposes the compression of Arabic speech signal by a hybrid method combining the wavelet transform and the linear prediction. The adopted approach rests, on one hand, on the original signal decomposition by ways of analysis filters, which is followed by the compression stage, and on the other hand, on the application of the order 5, as well as, the compression signal coefficients. The aim of this approach is the estimation of the predicted error, which will be coded and transmitted. The decoding operation is then used to reconstitute the original signal. Thus, the adequate choice of the bench of filters is useful to the transform in necessary to increase the compression rate and induce an impercevable distortion from an auditive point of view.Keywords: Compression, linear prediction analysis, multiresolution analysis, speech signal.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13361363 A proposed High-Resolution Time-Frequency Distribution for the Analysis of Multicomponent and Speech Signals
Authors: D. Boutana, B. Barkat , F. Marir
Abstract:
In this paper, we propose a novel time-frequency distribution (TFD) for the analysis of multi-component signals. In particular, we use synthetic as well as real-life speech signals to prove the superiority of the proposed TFD in comparison to some existing ones. In the comparison, we consider the cross-terms suppression and the high energy concentration of the signal around its instantaneous frequency (IF).
Keywords: Cohen's Class, Multicomponent signal, SeparableKernel, Speech signal, Time- frequency resolution.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18681362 Computationally Efficient Signal Quality Improvement Method for VoIP System
Authors: H. P. Singh, S. Singh
Abstract:
The voice signal in Voice over Internet protocol (VoIP) system is processed through the best effort policy based IP network, which leads to the network degradations including delay, packet loss jitter. The work in this paper presents the implementation of finite impulse response (FIR) filter for voice quality improvement in the VoIP system through distributed arithmetic (DA) algorithm. The VoIP simulations are conducted with AMR-NB 6.70 kbps and G.729a speech coders at different packet loss rates and the performance of the enhanced VoIP signal is evaluated using the perceptual evaluation of speech quality (PESQ) measurement for narrowband signal. The results show reduction in the computational complexity in the system and significant improvement in the quality of the VoIP voice signal.
Keywords: VoIP, Signal Quality, Distributed Arithmetic, Packet Loss, Speech Coder.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18301361 Speech Intelligibility Improvement Using Variable Level Decomposition DWT
Authors: Samba Raju, Chiluveru, Manoj Tripathy
Abstract:
Intelligibility is an essential characteristic of a speech signal, which is used to help in the understanding of information in speech signal. Background noise in the environment can deteriorate the intelligibility of a recorded speech. In this paper, we presented a simple variance subtracted - variable level discrete wavelet transform, which improve the intelligibility of speech. The proposed algorithm does not require an explicit estimation of noise, i.e., prior knowledge of the noise; hence, it is easy to implement, and it reduces the computational burden. The proposed algorithm decides a separate decomposition level for each frame based on signal dominant and dominant noise criteria. The performance of the proposed algorithm is evaluated with speech intelligibility measure (STOI), and results obtained are compared with Universal Discrete Wavelet Transform (DWT) thresholding and Minimum Mean Square Error (MMSE) methods. The experimental results revealed that the proposed scheme outperformed competing methodsKeywords: Discrete Wavelet Transform, speech intelligibility, STOI, standard deviation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6931360 On the Effectivity of Different Pseudo-Noise and Orthogonal Sequences for Speech Encryption from Correlation Properties
Authors: V. Anil Kumar, Abhijit Mitra, S. R. Mahadeva Prasanna
Abstract:
We analyze the effectivity of different pseudo noise (PN) and orthogonal sequences for encrypting speech signals in terms of perceptual intelligence. Speech signal can be viewed as sequence of correlated samples and each sample as sequence of bits. The residual intelligibility of the speech signal can be reduced by removing the correlation among the speech samples. PN sequences have random like properties that help in reducing the correlation among speech samples. The mean square aperiodic auto-correlation (MSAAC) and the mean square aperiodic cross-correlation (MSACC) measures are used to test the randomness of the PN sequences. Results of the investigation show the effectivity of large Kasami sequences for this purpose among many PN sequences.
Keywords: Speech encryption, pseudo-noise codes, maximallength, Gold, Barker, Kasami, Walsh-Hadamard, autocorrelation, crosscorrelation, figure of merit.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20401359 The Capacity of Mel Frequency Cepstral Coefficients for Speech Recognition
Authors: Fawaz S. Al-Anzi, Dia AbuZeina
Abstract:
Speech recognition is of an important contribution in promoting new technologies in human computer interaction. Today, there is a growing need to employ speech technology in daily life and business activities. However, speech recognition is a challenging task that requires different stages before obtaining the desired output. Among automatic speech recognition (ASR) components is the feature extraction process, which parameterizes the speech signal to produce the corresponding feature vectors. Feature extraction process aims at approximating the linguistic content that is conveyed by the input speech signal. In speech processing field, there are several methods to extract speech features, however, Mel Frequency Cepstral Coefficients (MFCC) is the popular technique. It has been long observed that the MFCC is dominantly used in the well-known recognizers such as the Carnegie Mellon University (CMU) Sphinx and the Markov Model Toolkit (HTK). Hence, this paper focuses on the MFCC method as the standard choice to identify the different speech segments in order to obtain the language phonemes for further training and decoding steps. Due to MFCC good performance, the previous studies show that the MFCC dominates the Arabic ASR research. In this paper, we demonstrate MFCC as well as the intermediate steps that are performed to get these coefficients using the HTK toolkit.
Keywords: Speech recognition, acoustic features, Mel Frequency Cepstral Coefficients.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19731358 A Modified Speech Enhancement Using Adaptive Gain Equalizer with Non linear Spectral Subtraction for Robust Speech Recognition
Authors: C. Ganesh Babu, P. T. Vanathi
Abstract:
In this paper we present an enhanced noise reduction method for robust speech recognition using Adaptive Gain Equalizer with Non linear Spectral Subtraction. In Adaptive Gain Equalizer method (AGE), the input signal is divided into a number of subbands that are individually weighed in time domain, in accordance to the short time Signal-to-Noise Ratio (SNR) in each subband estimation at every time instant. Instead of focusing on suppression the noise on speech enhancement is focused. When analysis was done under various noise conditions for speech recognition, it was found that Adaptive Gain Equalizer method algorithm has an obvious failing point for a SNR of -5 dB, with inadequate levels of noise suppression for SNR less than this point. This work proposes the implementation of AGE when coupled with Non linear Spectral Subtraction (AGE-NSS) for robust speech recognition. The experimental result shows that out AGE-NSS performs the AGE when SNR drops below -5db level.
Keywords: Adaptive Gain Equalizer, Non Linear Spectral Subtraction, Speech Enhancement, and Speech Recognition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17011357 Efficient DTW-Based Speech Recognition System for Isolated Words of Arabic Language
Authors: Khalid A. Darabkh, Ala F. Khalifeh, Baraa A. Bathech, Saed W. Sabah
Abstract:
Despite the fact that Arabic language is currently one of the most common languages worldwide, there has been only a little research on Arabic speech recognition relative to other languages such as English and Japanese. Generally, digital speech processing and voice recognition algorithms are of special importance for designing efficient, accurate, as well as fast automatic speech recognition systems. However, the speech recognition process carried out in this paper is divided into three stages as follows: firstly, the signal is preprocessed to reduce noise effects. After that, the signal is digitized and hearingized. Consequently, the voice activity regions are segmented using voice activity detection (VAD) algorithm. Secondly, features are extracted from the speech signal using Mel-frequency cepstral coefficients (MFCC) algorithm. Moreover, delta and acceleration (delta-delta) coefficients have been added for the reason of improving the recognition accuracy. Finally, each test word-s features are compared to the training database using dynamic time warping (DTW) algorithm. Utilizing the best set up made for all affected parameters to the aforementioned techniques, the proposed system achieved a recognition rate of about 98.5% which outperformed other HMM and ANN-based approaches available in the literature.Keywords: Arabic speech recognition, MFCC, DTW, VAD.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 40751356 A Novel RLS Based Adaptive Filtering Method for Speech Enhancement
Authors: Pogula Rakesh, T. Kishore Kumar
Abstract:
Speech enhancement is a long standing problem with numerous applications like teleconferencing, VoIP, hearing aids and speech recognition. The motivation behind this research work is to obtain a clean speech signal of higher quality by applying the optimal noise cancellation technique. Real-time adaptive filtering algorithms seem to be the best candidate among all categories of the speech enhancement methods. In this paper, we propose a speech enhancement method based on Recursive Least Squares (RLS) adaptive filter of speech signals. Experiments were performed on noisy data which was prepared by adding AWGN, Babble and Pink noise to clean speech samples at -5dB, 0dB, 5dB and 10dB SNR levels. We then compare the noise cancellation performance of proposed RLS algorithm with existing NLMS algorithm in terms of Mean Squared Error (MSE), Signal to Noise ratio (SNR) and SNR Loss. Based on the performance evaluation, the proposed RLS algorithm was found to be a better optimal noise cancellation technique for speech signals.
Keywords: Adaptive filter, Adaptive Noise Canceller, Mean Squared Error, Noise reduction, NLMS, RLS, SNR, SNR Loss.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 31831355 The Effect of Different Compression Schemes on Speech Signals
Authors: Jalal Karam, Raed Saad
Abstract:
This paper studies the effect of different compression constraints and schemes presented in a new and flexible paradigm to achieve high compression ratios and acceptable signal to noise ratios of Arabic speech signals. Compression parameters are computed for variable frame sizes of a level 5 to 7 Discrete Wavelet Transform (DWT) representation of the signals for different analyzing mother wavelet functions. Results are obtained and compared for Global threshold and level dependent threshold techniques. The results obtained also include comparisons with Signal to Noise Ratios, Peak Signal to Noise Ratios and Normalized Root Mean Square Error.Keywords: Speech Compression, Wavelets.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17331354 Blind Speech Separation Using SRP-PHAT Localization and Optimal Beamformer in Two-Speaker Environments
Authors: Hai Quang Hong Dam, Hai Ho, Minh Hoang Le Ngo
Abstract:
This paper investigates the problem of blind speech separation from the speech mixture of two speakers. A voice activity detector employing the Steered Response Power - Phase Transform (SRP-PHAT) is presented for detecting the activity information of speech sources and then the desired speech signals are extracted from the speech mixture by using an optimal beamformer. For evaluation, the algorithm effectiveness, a simulation using real speech recordings had been performed in a double-talk situation where two speakers are active all the time. Evaluations show that the proposed blind speech separation algorithm offers a good interference suppression level whilst maintaining a low distortion level of the desired signal.Keywords: Blind speech separation, voice activity detector, SRP-PHAT, optimal beamformer.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13881353 Optimum Cascaded Design for Speech Enhancement Using Kalman Filter
Authors: T. Kishore Kumar
Abstract:
Speech enhancement is the process of eliminating noise and increasing the quality of a speech signal, which is contaminated with other kinds of distortions. This paper is on developing an optimum cascaded system for speech enhancement. This aim is attained without diminishing any relevant speech information and without much computational and time complexity. LMS algorithm, Spectral Subtraction and Kalman filter have been deployed as the main de-noising algorithms in this work. Since these algorithms suffer from respective shortcomings, this work has been undertaken to design cascaded systems in different combinations and the evaluation of such cascades by qualitative (listening) and quantitative (SNR) tests.Keywords: LMS, Kalman filter, Speech Enhancement and Spectral Subtraction.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17311352 Noise Estimation for Speech Enhancement in Non-Stationary Environments-A New Method
Authors: Ch.V.Rama Rao, Gowthami., Harsha., Rajkumar., M.B.Rama Murthy, K.Srinivasa Rao, K.AnithaSheela
Abstract:
This paper presents a new method for estimating the nonstationary noise power spectral density given a noisy signal. The method is based on averaging the noisy speech power spectrum using time and frequency dependent smoothing factors. These factors are adjusted based on signal-presence probability in individual frequency bins. Signal presence is determined by computing the ratio of the noisy speech power spectrum to its local minimum, which is updated continuously by averaging past values of the noisy speech power spectra with a look-ahead factor. This method adapts very quickly to highly non-stationary noise environments. The proposed method achieves significant improvements over a system that uses voice activity detector (VAD) in noise estimation.Keywords: Noise estimation, Non-stationary noise, Speechenhancement.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23401351 End Point Detection for Wavelet Based Speech Compression
Authors: Jalal Karam
Abstract:
In real-field applications, the correct determination of voice segments highly improves the overall system accuracy and minimises the total computation time. This paper presents reliable measures of speech compression by detcting the end points of the speech signals prior to compressing them. The two different compession schemes used are the Global threshold and the Level- Dependent threshold techniques. The performance of the proposed method is tested wirh the Signal to Noise Ratios, Peak Signal to Noise Ratios and Normalized Root Mean Square Error parameter measures.
Keywords: Wavelets, End-points Detection, Compression.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13771350 A Review in Advanced Digital Signal Processing Systems
Authors: Roza Dastres, Mohsen Soori
Abstract:
Digital Signal Processing (DSP) is the use of digital processing systems by computers in order to perform a variety of signal processing operations. It is the mathematical manipulation of a digital signal's numerical values in order to increase quality as well as effects of signals. DSP can include linear or nonlinear operators in order to process and analyze the input signals. The nonlinear DSP processing is closely related to nonlinear system detection and can be implemented in time, frequency and space-time domains. Applications of the DSP can be presented as control systems, digital image processing, biomedical engineering, speech recognition systems, industrial engineering, health care systems, radar signal processing and telecommunication systems. In this study, advanced methods and different applications of DSP are reviewed in order to move forward the interesting research filed.Keywords: Digital signal processing, advanced telecommunication, nonlinear signal processing, speech recognition systems.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10371349 Speech Enhancement of Vowels Based on Pitch and Formant Frequency
Authors: R. Rishma Rodrigo, R. Radhika, M. Vanitha Lakshmi
Abstract:
Numerous signal processing based speech enhancement systems have been proposed to improve intelligibility in the presence of noise. Traditionally, studies of neural vowel encoding have focused on the representation of formants (peaks in vowel spectra) in the discharge patterns of the population of auditory-nerve (AN) fibers. A method is presented for recording high-frequency speech components into a low-frequency region, to increase audibility for hearing loss listeners. The purpose of the paper is to enhance the formant of the speech based on the Kaiser window. The pitch and formant of the signal is based on the auto correlation, zero crossing and magnitude difference function. The formant enhancement stage aims to restore the representation of formants at the level of the midbrain. A MATLAB software’s are used for the implementation of the system with low complexity is developed.
Keywords: Formant estimation, formant enhancement, pitch detection, speech analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16411348 Forensic Speaker Verification in Noisy Environmental by Enhancing the Speech Signal Using ICA Approach
Authors: Ahmed Kamil Hasan Al-Ali, Bouchra Senadji, Ganesh Naik
Abstract:
We propose a system to real environmental noise and channel mismatch for forensic speaker verification systems. This method is based on suppressing various types of real environmental noise by using independent component analysis (ICA) algorithm. The enhanced speech signal is applied to mel frequency cepstral coefficients (MFCC) or MFCC feature warping to extract the essential characteristics of the speech signal. Channel effects are reduced using an intermediate vector (i-vector) and probabilistic linear discriminant analysis (PLDA) approach for classification. The proposed algorithm is evaluated by using an Australian forensic voice comparison database, combined with car, street and home noises from QUT-NOISE at a signal to noise ratio (SNR) ranging from -10 dB to 10 dB. Experimental results indicate that the MFCC feature warping-ICA achieves a reduction in equal error rate about (48.22%, 44.66%, and 50.07%) over using MFCC feature warping when the test speech signals are corrupted with random sessions of street, car, and home noises at -10 dB SNR.Keywords: Noisy forensic speaker verification, ICA algorithm, MFCC, MFCC feature warping.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9901347 Hybrid Modeling Algorithm for Continuous Tamil Speech Recognition
Authors: M. Kalamani, S. Valarmathy, M. Krishnamoorthi
Abstract:
In this paper, Fuzzy C-Means clustering with Expectation Maximization-Gaussian Mixture Model based hybrid modeling algorithm is proposed for Continuous Tamil Speech Recognition. The speech sentences from various speakers are used for training and testing phase and objective measures are between the proposed and existing Continuous Speech Recognition algorithms. From the simulated results, it is observed that the proposed algorithm improves the recognition accuracy and F-measure up to 3% as compared to that of the existing algorithms for the speech signal from various speakers. In addition, it reduces the Word Error Rate, Error Rate and Error up to 4% as compared to that of the existing algorithms. In all aspects, the proposed hybrid modeling for Tamil speech recognition provides the significant improvements for speechto- text conversion in various applications.
Keywords: Speech Segmentation, Feature Extraction, Clustering, HMM, EM-GMM, CSR.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21391346 A Semi- One Time Pad Using Blind Source Separation for Speech Encryption
Authors: Long Jye Sheu, Horng-Shing Chiou, Wei Ching Chen
Abstract:
We propose a new perspective on speech communication using blind source separation. The original speech is mixed with key signals which consist of the mixing matrix, chaotic signals and a random noise. However, parts of the keys (the mixing matrix and the random noise) are not necessary in decryption. In practice implement, one can encrypt the speech by changing the noise signal every time. Hence, the present scheme obtains the advantages of a One Time Pad encryption while avoiding its drawbacks in key exchange. It is demonstrated that the proposed scheme is immune against traditional attacks.Keywords: one time pad, blind source separation, independentcomponent analysis, speech encryption.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15711345 A Mixing Matrix Estimation Algorithm for Speech Signals under the Under-Determined Blind Source Separation Model
Authors: Jing Wu, Wei Lv, Yibing Li, Yuanfan You
Abstract:
The separation of speech signals has become a research hotspot in the field of signal processing in recent years. It has many applications and influences in teleconferencing, hearing aids, speech recognition of machines and so on. The sounds received are usually noisy. The issue of identifying the sounds of interest and obtaining clear sounds in such an environment becomes a problem worth exploring, that is, the problem of blind source separation. This paper focuses on the under-determined blind source separation (UBSS). Sparse component analysis is generally used for the problem of under-determined blind source separation. The method is mainly divided into two parts. Firstly, the clustering algorithm is used to estimate the mixing matrix according to the observed signals. Then the signal is separated based on the known mixing matrix. In this paper, the problem of mixing matrix estimation is studied. This paper proposes an improved algorithm to estimate the mixing matrix for speech signals in the UBSS model. The traditional potential algorithm is not accurate for the mixing matrix estimation, especially for low signal-to noise ratio (SNR).In response to this problem, this paper considers the idea of an improved potential function method to estimate the mixing matrix. The algorithm not only avoids the inuence of insufficient prior information in traditional clustering algorithm, but also improves the estimation accuracy of mixing matrix. This paper takes the mixing of four speech signals into two channels as an example. The results of simulations show that the approach in this paper not only improves the accuracy of estimation, but also applies to any mixing matrix.Keywords: Clustering algorithm, potential function, speech signal, the UBSS model.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6791344 Speaker Identification using Neural Networks
Authors: R.V Pawar, P.P.Kajave, S.N.Mali
Abstract:
The speech signal conveys information about the identity of the speaker. The area of speaker identification is concerned with extracting the identity of the person speaking the utterance. As speech interaction with computers becomes more pervasive in activities such as the telephone, financial transactions and information retrieval from speech databases, the utility of automatically identifying a speaker is based solely on vocal characteristic. This paper emphasizes on text dependent speaker identification, which deals with detecting a particular speaker from a known population. The system prompts the user to provide speech utterance. System identifies the user by comparing the codebook of speech utterance with those of the stored in the database and lists, which contain the most likely speakers, could have given that speech utterance. The speech signal is recorded for N speakers further the features are extracted. Feature extraction is done by means of LPC coefficients, calculating AMDF, and DFT. The neural network is trained by applying these features as input parameters. The features are stored in templates for further comparison. The features for the speaker who has to be identified are extracted and compared with the stored templates using Back Propogation Algorithm. Here, the trained network corresponds to the output; the input is the extracted features of the speaker to be identified. The network does the weight adjustment and the best match is found to identify the speaker. The number of epochs required to get the target decides the network performance.Keywords: Average Mean Distance function, Backpropogation, Linear Predictive Coding, MultilayeredPerceptron,
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1892