Search results for: Speech coding.
505 High Quality Speech Coding using Combined Parametric and Perceptual Modules
Authors: M. Kulesza, G. Szwoch, A. Czyżewski
Abstract:
A novel approach to speech coding using the hybrid architecture is presented. Advantages of parametric and perceptual coding methods are utilized together in order to create a speech coding algorithm assuring better signal quality than in traditional CELP parametric codec. Two approaches are discussed. One is based on selection of voiced signal components that are encoded using parametric algorithm, unvoiced components that are encoded perceptually and transients that remain unencoded. The second approach uses perceptual encoding of the residual signal in CELP codec. The algorithm applied for precise transient selection is described. Signal quality achieved using the proposed hybrid codec is compared to quality of some standard speech codecs.
Keywords: CELP residual coding, hybrid codec architecture, perceptual speech coding, speech codecs comparison.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1529504 A High Quality Speech Coder at 600 bps
Authors: Yong Zhang, Ruimin Hu
Abstract:
This paper presents a vocoder to obtain high quality synthetic speech at 600 bps. To reduce the bit rate, the algorithm is based on a sinusoidally excited linear prediction model which extracts few coding parameters, and three consecutive frames are grouped into a superframe and jointly vector quantization is used to obtain high coding efficiency. The inter-frame redundancy is exploited with distinct quantization schemes for different unvoiced/voiced frame combinations in the superframe. Experimental results show that the quality of the proposed coder is better than that of 2.4kbps LPC10e and achieves approximately the same as that of 2.4kbps MELP and with high robustness.
Keywords: Speech coding, Vector quantization, linear predicition, Mixed sinusoidal excitation
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2187503 Speech Coding and Recognition
Authors: M. Satya Sai Ram, P. Siddaiah, M. Madhavi Latha
Abstract:
This paper investigates the performance of a speech recognizer in an interactive voice response system for various coded speech signals, coded by using a vector quantization technique namely Multi Switched Split Vector Quantization Technique. The process of recognizing the coded output can be used in Voice banking application. The recognition technique used for the recognition of the coded speech signals is the Hidden Markov Model technique. The spectral distortion performance, computational complexity, and memory requirements of Multi Switched Split Vector Quantization Technique and the performance of the speech recognizer at various bit rates have been computed. From results it is found that the speech recognizer is showing better performance at 24 bits/frame and it is found that the percentage of recognition is being varied from 100% to 93.33% for various bit rates.Keywords: Linear predictive coding, Speech Recognition, Voice banking, Multi Switched Split Vector Quantization, Hidden Markov Model, Linear Predictive Coefficients.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1844502 Orchestra/Percussion Classification Algorithm for United Speech Audio Coding System
Authors: Yueming Wang, Rendong Ying, Sumxin Jiang, Peilin Liu
Abstract:
Unified Speech Audio Coding (USAC), the latest MPEG standardization for unified speech and audio coding, uses a speech/audio classification algorithm to distinguish speech and audio segments of the input signal. The quality of the recovered audio can be increased by well-designed orchestra/percussion classification and subsequent processing. However, owing to the shortcoming of the system, introducing an orchestra/percussion classification and modifying subsequent processing can enormously increase the quality of the recovered audio. This paper proposes an orchestra/percussion classification algorithm for the USAC system which only extracts 3 scales of Mel-Frequency Cepstral Coefficients (MFCCs) rather than traditional 13 scales of MFCCs and use Iterative Dichotomiser 3 (ID3) Decision Tree rather than other complex learning method, thus the proposed algorithm has lower computing complexity than most existing algorithms. Considering that frequent changing of attributes may lead to quality loss of the recovered audio signal, this paper also design a modified subsequent process to help the whole classification system reach an accurate rate as high as 97% which is comparable to classical 99%.
Keywords: ID3 Decision Tree, MFCC, Orchestra/Percussion Classification, USAC
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1673501 Speech Data Compression using Vector Quantization
Authors: H. B. Kekre, Tanuja K. Sarode
Abstract:
Mostly transforms are used for speech data compressions which are lossy algorithms. Such algorithms are tolerable for speech data compression since the loss in quality is not perceived by the human ear. However the vector quantization (VQ) has a potential to give more data compression maintaining the same quality. In this paper we propose speech data compression algorithm using vector quantization technique. We have used VQ algorithms LBG, KPE and FCG. The results table shows computational complexity of these three algorithms. Here we have introduced a new performance parameter Average Fractional Change in Speech Sample (AFCSS). Our FCG algorithm gives far better performance considering mean absolute error, AFCSS and complexity as compared to others.Keywords: Vector Quantization, Data Compression, Encoding, , Speech coding.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2402500 Investigation of Combined use of MFCC and LPC Features in Speech Recognition Systems
Authors: К. R. Aida–Zade, C. Ardil, S. S. Rustamov
Abstract:
Statement of the automatic speech recognition problem, the assignment of speech recognition and the application fields are shown in the paper. At the same time as Azerbaijan speech, the establishment principles of speech recognition system and the problems arising in the system are investigated. The computing algorithms of speech features, being the main part of speech recognition system, are analyzed. From this point of view, the determination algorithms of Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC) coefficients expressing the basic speech features are developed. Combined use of cepstrals of MFCC and LPC in speech recognition system is suggested to improve the reliability of speech recognition system. To this end, the recognition system is divided into MFCC and LPC-based recognition subsystems. The training and recognition processes are realized in both subsystems separately, and recognition system gets the decision being the same results of each subsystems. This results in decrease of error rate during recognition. The training and recognition processes are realized by artificial neural networks in the automatic speech recognition system. The neural networks are trained by the conjugate gradient method. In the paper the problems observed by the number of speech features at training the neural networks of MFCC and LPC-based speech recognition subsystems are investigated. The variety of results of neural networks trained from different initial points in training process is analyzed. Methodology of combined use of neural networks trained from different initial points in speech recognition system is suggested to improve the reliability of recognition system and increase the recognition quality, and obtained practical results are shown.Keywords: Speech recognition, cepstral analysis, Voice activation detection algorithm, Mel Frequency Cepstral Coefficients, features of speech, Cepstral Mean Subtraction, neural networks, Linear Predictive Coding.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 912499 Multi Switched Split Vector Quantization of Narrowband Speech Signals
Authors: M. Satya Sai Ram, P. Siddaiah, M. Madhavi Latha
Abstract:
Vector quantization is a powerful tool for speech coding applications. This paper deals with LPC Coding of speech signals which uses a new technique called Multi Switched Split Vector Quantization (MSSVQ), which is a hybrid of Multi, switched, split vector quantization techniques. The spectral distortion performance, computational complexity, and memory requirements of MSSVQ are compared to split vector quantization (SVQ), multi stage vector quantization(MSVQ) and switched split vector quantization (SSVQ) techniques. It has been proved from results that MSSVQ has better spectral distortion performance, lower computational complexity and lower memory requirements when compared to all the above mentioned product code vector quantization techniques. Computational complexity is measured in floating point operations (flops), and memory requirements is measured in (floats).Keywords: Linear predictive Coding, Multi stage vectorquantization, Switched Split vector quantization, Split vectorquantization, Line Spectral Frequencies (LSF).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1671498 Coding of DWT Coefficients using Run-length Coding and Huffman Coding for the Purpose of Color Image Compression
Authors: Varun Setia, Vinod Kumar
Abstract:
In present paper we proposed a simple and effective method to compress an image. Here we found success in size reduction of an image without much compromising with it-s quality. Here we used Haar Wavelet Transform to transform our original image and after quantization and thresholding of DWT coefficients Run length coding and Huffman coding schemes have been used to encode the image. DWT is base for quite populate JPEG 2000 technique.
Keywords: Lossy compression, DWT, quantization, Run length coding, Huffman coding, JPEG2000.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2922497 Near-Lossless Image Coding based on Orthogonal Polynomials
Authors: Krishnamoorthy R, Rajavijayalakshmi K, Punidha R
Abstract:
In this paper, a near lossless image coding scheme based on Orthogonal Polynomials Transform (OPT) has been presented. The polynomial operators and polynomials basis operators are obtained from set of orthogonal polynomials functions for the proposed transform coding. The image is partitioned into a number of distinct square blocks and the proposed transform coding is applied to each of these individually. After applying the proposed transform coding, the transformed coefficients are rearranged into a sub-band structure. The Embedded Zerotree (EZ) coding algorithm is then employed to quantize the coefficients. The proposed transform is implemented for various block sizes and the performance is compared with existing Discrete Cosine Transform (DCT) transform coding scheme.Keywords: Near-lossless Coding, Orthogonal Polynomials Transform, Embedded Zerotree Coding
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1944496 Peak-to-Average Power Ratio Reduction in OFDM Systems using Huffman Coding
Authors: Ashraf A. Eltholth, Adel R. Mikhail, A. Elshirbini, Moawad I. Moawad, A. I. Abdelfattah
Abstract:
In this paper we proposed the use of Huffman coding to reduce the PAR of an OFDM system as a distortionless scrambling technique, and we utilize the amount saved in the total bit rate by the Huffman coding to send the encoding table for accurate decoding at the receiver without reducing the effective throughput. We found that the use of Huffman coding reduces the PAR by about 6 dB. Also we have investigated the effect of PAR reduction due to Huffman coding through testing the spectral spreading and the inband distortion due to HPA with different IBO values. We found a complete match of our expectation from the proposed solution with the obtained simulation results.Keywords: HPA, Huffman coding, OFDM, PAR
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2596495 Eukaryotic Gene Prediction by an Investigation of Nonlinear Dynamical Modeling Techniques on EIIP Coded Sequences
Authors: Mai S. Mabrouk, Nahed H. Solouma, Abou-Bakr M. Youssef, Yasser M. Kadah
Abstract:
Many digital signal processing, techniques have been used to automatically distinguish protein coding regions (exons) from non-coding regions (introns) in DNA sequences. In this work, we have characterized these sequences according to their nonlinear dynamical features such as moment invariants, correlation dimension, and largest Lyapunov exponent estimates. We have applied our model to a number of real sequences encoded into a time series using EIIP sequence indicators. In order to discriminate between coding and non coding DNA regions, the phase space trajectory was first reconstructed for coding and non-coding regions. Nonlinear dynamical features are extracted from those regions and used to investigate a difference between them. Our results indicate that the nonlinear dynamical characteristics have yielded significant differences between coding (CR) and non-coding regions (NCR) in DNA sequences. Finally, the classifier is tested on real genes where coding and non-coding regions are well known.
Keywords: Gene prediction, nonlinear dynamics, correlation dimension, Lyapunov exponent.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1824494 Automatic Recognition of Emotionally Coloured Speech
Authors: Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou
Abstract:
Emotion in speech is an issue that has been attracting the interest of the speech community for many years, both in the context of speech synthesis as well as in automatic speech recognition (ASR). In spite of the remarkable recent progress in Large Vocabulary Recognition (LVR), it is still far behind the ultimate goal of recognising free conversational speech uttered by any speaker in any environment. Current experimental tests prove that using state of the art large vocabulary recognition systems the error rate increases substantially when applied to spontaneous/emotional speech. This paper shows that recognition rate for emotionally coloured speech can be improved by using a language model based on increased representation of emotional utterances.Keywords: Statistical language model, N-grams, emotionallycoloured speech
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1617493 MIM: A Species Independent Approach for Classifying Coding and Non-Coding DNA Sequences in Bacterial and Archaeal Genomes
Authors: Achraf El Allali, John R. Rose
Abstract:
A number of competing methodologies have been developed to identify genes and classify DNA sequences into coding and non-coding sequences. This classification process is fundamental in gene finding and gene annotation tools and is one of the most challenging tasks in bioinformatics and computational biology. An information theory measure based on mutual information has shown good accuracy in classifying DNA sequences into coding and noncoding. In this paper we describe a species independent iterative approach that distinguishes coding from non-coding sequences using the mutual information measure (MIM). A set of sixty prokaryotes is used to extract universal training data. To facilitate comparisons with the published results of other researchers, a test set of 51 bacterial and archaeal genomes was used to evaluate MIM. These results demonstrate that MIM produces superior results while remaining species independent.Keywords: Coding Non-coding Classification, Entropy, GeneRecognition, Mutual Information.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1726492 Effect of Visual Speech in Sign Speech Synthesis
Authors: Zdenek Krnoul
Abstract:
This article investigates a contribution of synthesized visual speech. Synthesis of visual speech expressed by a computer consists in an animation in particular movements of lips. Visual speech is also necessary part of the non-manual component of a sign language. Appropriate methodology is proposed to determine the quality and the accuracy of synthesized visual speech. Proposed methodology is inspected on Czech speech. Hence, this article presents a procedure of recording of speech data in order to set a synthesis system as well as to evaluate synthesized speech. Furthermore, one option of the evaluation process is elaborated in the form of a perceptual test. This test procedure is verified on the measured data with two settings of the synthesis system. The results of the perceptual test are presented as a statistically significant increase of intelligibility evoked by real and synthesized visual speech. Now, the aim is to show one part of evaluation process which leads to more comprehensive evaluation of the sign speech synthesis system.
Keywords: Perception test, Sign speech synthesis, Talking head, Visual speech.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1476491 Speaker Identification using Neural Networks
Authors: R.V Pawar, P.P.Kajave, S.N.Mali
Abstract:
The speech signal conveys information about the identity of the speaker. The area of speaker identification is concerned with extracting the identity of the person speaking the utterance. As speech interaction with computers becomes more pervasive in activities such as the telephone, financial transactions and information retrieval from speech databases, the utility of automatically identifying a speaker is based solely on vocal characteristic. This paper emphasizes on text dependent speaker identification, which deals with detecting a particular speaker from a known population. The system prompts the user to provide speech utterance. System identifies the user by comparing the codebook of speech utterance with those of the stored in the database and lists, which contain the most likely speakers, could have given that speech utterance. The speech signal is recorded for N speakers further the features are extracted. Feature extraction is done by means of LPC coefficients, calculating AMDF, and DFT. The neural network is trained by applying these features as input parameters. The features are stored in templates for further comparison. The features for the speaker who has to be identified are extracted and compared with the stored templates using Back Propogation Algorithm. Here, the trained network corresponds to the output; the input is the extracted features of the speaker to be identified. The network does the weight adjustment and the best match is found to identify the speaker. The number of epochs required to get the target decides the network performance.Keywords: Average Mean Distance function, Backpropogation, Linear Predictive Coding, MultilayeredPerceptron,
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1892490 Parallel Joint Channel Coding and Cryptography
Authors: Nataša Živić, Christoph Ruland
Abstract:
Method of Parallel Joint Channel Coding and Cryptography has been analyzed and simulated in this paper. The method is an extension of Soft Input Decryption with feedback, which is used for improvement of channel decoding of secured messages. Parallel Joint Channel Coding and Cryptography results in improved coding gain of channel decoding, which achieves more than 2 dB. Such results are an implication of a combination of receiver components and their interoperability.Keywords: Block length, Coding gain, Feedback, L-values, Parallel Joint Channel Coding and Cryptography, Soft Input Decryption.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1584489 Enhanced Frame-based Video Coding to Support Content-based Functionalities
Authors: Prabhudev Hosur, Rolando Carrasco
Abstract:
This paper presents the enhanced frame-based video coding scheme. The input source video to the enhanced frame-based video encoder consists of a rectangular-size video and shapes of arbitrarily-shaped objects on video frames. The rectangular frame texture is encoded by the conventional frame-based coding technique and the video object-s shape is encoded using the contour-based vertex coding. It is possible to achieve several useful content-based functionalities by utilizing the shape information in the bitstream at the cost of a very small overhead to the bitrate.
Keywords: Video coding, content-based, hyper video, interactivity, shape coding, polygon.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1661488 The Main Principles of Text-to-Speech Synthesis System
Authors: K.R. Aida–Zade, C. Ardil, A.M. Sharifova
Abstract:
In this paper, the main principles of text-to-speech synthesis system are presented. Associated problems which arise when developing speech synthesis system are described. Used approaches and their application in the speech synthesis systems for Azerbaijani language are shown.
Keywords: synthesis of Azerbaijani language, morphemes, phonemes, sounds, sentence, speech synthesizer, intonation, accent, pronunciation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5652487 TeleMe Speech Booster: Web-Based Speech Therapy and Training Program for Children with Articulation Disorders
Authors: C. Treerattanaphan, P. Boonpramuk, P. Singla
Abstract:
Frequent, continuous speech training has proven to be a necessary part of a successful speech therapy process, but constraints of traveling time and employment dispensation become key obstacles especially for individuals living in remote areas or for dependent children who have working parents. In order to ameliorate speech difficulties with ample guidance from speech therapists, a website has been developed that supports speech therapy and training for people with articulation disorders in the standard Thai language. This web-based program has the ability to record speech training exercises for each speech trainee. The records will be stored in a database for the speech therapist to investigate, evaluate, compare and keep track of all trainees’ progress in detail. Speech trainees can request live discussions via video conference call when needed. Communication through this web-based program facilitates and reduces training time in comparison to walk-in training or appointments. This type of training also allows people with articulation disorders to practice speech lessons whenever or wherever is convenient for them, which can lead to a more regular training processes.
Keywords: Web-Based Remote Training Program, Thai Speech Therapy, Articulation Disorders.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1859486 Blind Speech Separation Using SRP-PHAT Localization and Optimal Beamformer in Two-Speaker Environments
Authors: Hai Quang Hong Dam, Hai Ho, Minh Hoang Le Ngo
Abstract:
This paper investigates the problem of blind speech separation from the speech mixture of two speakers. A voice activity detector employing the Steered Response Power - Phase Transform (SRP-PHAT) is presented for detecting the activity information of speech sources and then the desired speech signals are extracted from the speech mixture by using an optimal beamformer. For evaluation, the algorithm effectiveness, a simulation using real speech recordings had been performed in a double-talk situation where two speakers are active all the time. Evaluations show that the proposed blind speech separation algorithm offers a good interference suppression level whilst maintaining a low distortion level of the desired signal.Keywords: Blind speech separation, voice activity detector, SRP-PHAT, optimal beamformer.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1388485 Bangla Vowel Characterization Based on Analysis by Synthesis
Authors: Syed Akhter Hossain, M. Lutfar Rahman, Farruk Ahmed
Abstract:
Bangla Vowel characterization determines the spectral properties of Bangla vowels for efficient synthesis as well as recognition of Bangla vowels. In this paper, Bangla vowels in isolated word have been analyzed based on speech production model within the framework of Analysis-by-Synthesis. This has led to the extraction of spectral parameters for the production model in order to produce different Bangla vowel sounds. The real and synthetic spectra are compared and a weighted square error has been computed along with the error in the formant bandwidths for efficient representation of Bangla vowels. The extracted features produced good representation of targeted Bangla vowel. Such a representation also plays essential role in low bit rate speech coding and vocoders.
Keywords: Speech, vowel, formant, synthesis, spectrum, LPC.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2370484 Evaluation of a Multi-Resolution Dyadic Wavelet Transform Method for usable Speech Detection
Authors: Wajdi Ghezaiel, Amel Ben Slimane Rahmouni, Ezzedine Ben Braiek
Abstract:
Many applications of speech communication and speaker identification suffer from the problem of co-channel speech. This paper deals with a multi-resolution dyadic wavelet transform method for usable segments of co-channel speech detection that could be processed by a speaker identification system. Evaluation of this method is performed on TIMIT database referring to the Target to Interferer Ratio measure. Co-channel speech is constructed by mixing all possible gender speakers. Results do not show much difference for different mixtures. For the overall mixtures 95.76% of usable speech is correctly detected with false alarms of 29.65%.Keywords: Co-channel speech, usable speech, multi-resolutionanalysis, speaker identification
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1365483 Narrowband Speech Hiding using Vector Quantization
Authors: Driss Guerchi, Fatiha Djebbar
Abstract:
In this work we introduce an efficient method to limit the impact of the hiding process on the quality of the cover speech. Vector quantization of the speech spectral information reduces drastically the number of the secret speech parameters to be embedded in the cover signal. Compared to scalar hiding, vector quantization hiding technique provides a stego signal that is indistinguishable from the cover speech. The objective and subjective performance measures reveal that the current hiding technique attracts no suspicion about the presence of the secret message in the stego speech, while being able to recover an intelligible copy of the secret message at the receiver side.Keywords: Speech steganography, LSF vector quantization, fast Fourier transform
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1514482 Using Teager Energy Cepstrum and HMM distancesin Automatic Speech Recognition and Analysis of Unvoiced Speech
Authors: Panikos Heracleous
Abstract:
In this study, the use of silicon NAM (Non-Audible Murmur) microphone in automatic speech recognition is presented. NAM microphones are special acoustic sensors, which are attached behind the talker-s ear and can capture not only normal (audible) speech, but also very quietly uttered speech (non-audible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech conversion etc.) for sound-impaired people. Using a small amount of training data and adaptation approaches, 93.9% word accuracy was achieved for a 20k Japanese vocabulary dictation task. Non-audible murmur recognition in noisy environments is also investigated. In this study, further analysis of the NAM speech has been made using distance measures between hidden Markov model (HMM) pairs. It has been shown the reduced spectral space of NAM speech using a metric distance, however the location of the different phonemes of NAM are similar to the location of the phonemes of normal speech, and the NAM sounds are well discriminated. Promising results in using nonlinear features are also introduced, especially under noisy conditions.Keywords: Speech recognition, unvoiced speech, nonlinear features, HMM distance measures
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1646481 Analysis of Combined Use of NN and MFCC for Speech Recognition
Authors: Safdar Tanweer, Abdul Mobin, Afshar Alam
Abstract:
The performance and analysis of speech recognition system is illustrated in this paper. An approach to recognize the English word corresponding to digit (0-9) spoken by 2 different speakers is captured in noise free environment. For feature extraction, speech Mel frequency cepstral coefficients (MFCC) has been used which gives a set of feature vectors from recorded speech samples. Neural network model is used to enhance the recognition performance. Feed forward neural network with back propagation algorithm model is used. However other speech recognition techniques such as HMM, DTW exist. All experiments are carried out on Matlab.
Keywords: Speech Recognition, MFCC, Neural Network, classifier.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3267480 On SNR Estimation by the Likelihood of near Pitch for Speech Detection
Authors: Young-Hwan Song, Doo-Heon Kyun, Jong-Kuk Kim, Myung-Jin Bae
Abstract:
People have the habitual pitch level which is used when people say something generally. However this pitch should be changed irregularly in the presence of noise. So it is useful to estimate SNR of speech signal by pitch. In this paper, we obtain the energy of input speech signal and then we detect a stationary region on voiced speech. And we get the pitch period by NAMDF for the stationary region that is not varied pitch rapidly. After getting pitch, each frame is divided by pitch period and the likelihood of closed pitch is estimated. In this paper, we proposed new parameter, NLF, to estimate the SNR of received speech signal. The NLF is derived from the correlation of near pitch periods. The NLF is obtained for each stationary region in voiced speech. Finally we confirmed good performance of the estimation of the SNR of received input speech in the presence of noise.
Keywords: Likelihood, pitch, SNR, speech.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1574479 Spectral Entropy Employment in Speech Enhancement based on Wavelet Packet
Authors: Talbi Mourad, Salhi Lotfi, Chérif Adnen
Abstract:
In this work, we are interested in developing a speech denoising tool by using a discrete wavelet packet transform (DWPT). This speech denoising tool will be employed for applications of recognition, coding and synthesis. For noise reduction, instead of applying the classical thresholding technique, some wavelet packet nodes are set to zero and the others are thresholded. To estimate the non stationary noise level, we employ the spectral entropy. A comparison of our proposed technique to classical denoising methods based on thresholding and spectral subtraction is made in order to evaluate our approach. The experimental implementation uses speech signals corrupted by two sorts of noise, white and Volvo noises. The obtained results from listening tests show that our proposed technique is better than spectral subtraction. The obtained results from SNR computation show the superiority of our technique when compared to the classical thresholding method using the modified hard thresholding function based on u-law algorithm.
Keywords: Enhancement, spectral subtraction, SNR, discrete wavelet packet transform, spectral entropy Histogram
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1991478 A Multi-Signature Scheme based on Coding Theory
Authors: Mohammed Meziani, Pierre-Louis Cayrel
Abstract:
In this paper we propose two first non-generic constructions of multisignature scheme based on coding theory. The first system make use of the CFS signature scheme and is secure in random oracle while the second scheme is based on the KKS construction and is a few times. The security of our construction relies on a difficult problems in coding theory: The Syndrome Decoding problem which has been proved NP-complete [4].Keywords: Post-quantum cryptography, Coding-based cryptography, Digital signature, Multisignature scheme.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1879477 Speech Impact Realization via Manipulative Argumentation Techniques in Modern American Political Discourse
Authors: Zarine Avetisyan
Abstract:
The present paper presents the discussion of scholars concerning speech impact, peculiarities of its realization, speech strategies and techniques in particular. Departing from the viewpoints of many prominent linguists, the paper suggests that manipulative argumentation be viewed as a most pervasive speech strategy with a certain set of techniques which are to be found in modern American political discourse. The precedence of their occurrence allows us to regard them as pragmatic patterns of speech impact realization in effective public speaking.Keywords: Manipulative argumentation, political discourse, speech impact, technique.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2289476 Speech Enhancement Using Kalman Filter in Communication
Authors: Eng. Alaa K. Satti Salih
Abstract:
Revolutions Applications such as telecommunications, hands-free communications, recording, etc. which need at least one microphone, the signal is usually infected by noise and echo. The important application is the speech enhancement, which is done to remove suppressed noises and echoes taken by a microphone, beside preferred speech. Accordingly, the microphone signal has to be cleaned using digital signal processing DSP tools before it is played out, transmitted, or stored. Engineers have so far tried different approaches to improving the speech by get back the desired speech signal from the noisy observations. Especially Mobile communication, so in this paper will do reconstruction of the speech signal, observed in additive background noise, using the Kalman filter technique to estimate the parameters of the Autoregressive Process (AR) in the state space model and the output speech signal obtained by the MATLAB. The accurate estimation by Kalman filter on speech would enhance and reduce the noise then compare and discuss the results between actual values and estimated values which produce the reconstructed signals.
Keywords: Autoregressive Process, Kalman filter, Matlab and Noise speech.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4025