Search results for: perceptual speech coding
548 High Quality Speech Coding using Combined Parametric and Perceptual Modules
Authors: M. Kulesza, G. Szwoch, A. Czyżewski
Abstract:
A novel approach to speech coding using the hybrid architecture is presented. Advantages of parametric and perceptual coding methods are utilized together in order to create a speech coding algorithm assuring better signal quality than in traditional CELP parametric codec. Two approaches are discussed. One is based on selection of voiced signal components that are encoded using parametric algorithm, unvoiced components that are encoded perceptually and transients that remain unencoded. The second approach uses perceptual encoding of the residual signal in CELP codec. The algorithm applied for precise transient selection is described. Signal quality achieved using the proposed hybrid codec is compared to quality of some standard speech codecs.
Keywords: CELP residual coding, hybrid codec architecture, perceptual speech coding, speech codecs comparison.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1529547 From Maskee to Audible Noise in Perceptual Speech Enhancement
Authors: Asmaa Amehraye, Dominique Pastor, Ahmed Tamtaoui, Driss Aboutajdine
Abstract:
A new analysis of perceptual speech enhancement is presented. It focuses on the fact that if only noise above the masking threshold is filtered, then noise below the masking threshold, but above the absolute threshold of hearing, can become audible after the masker filtering. This particular drawback of some perceptual filters, hereafter called the maskee-to-audible-noise (MAN) phenomenon, favours the emergence of isolated tonals that increase musical noise. Two filtering techniques that avoid or correct the MAN phenomenon are proposed to effectively suppress background noise without introducing much distortion. Experimental results, including objective and subjective measurements, show that these techniques improve the enhanced speech quality and the gain they bring emphasizes the importance of the MAN phenomenon.Keywords: Perceptual speech filtering, maskee to audible noise, distorsion, musical noise.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1491546 Effect of Visual Speech in Sign Speech Synthesis
Authors: Zdenek Krnoul
Abstract:
This article investigates a contribution of synthesized visual speech. Synthesis of visual speech expressed by a computer consists in an animation in particular movements of lips. Visual speech is also necessary part of the non-manual component of a sign language. Appropriate methodology is proposed to determine the quality and the accuracy of synthesized visual speech. Proposed methodology is inspected on Czech speech. Hence, this article presents a procedure of recording of speech data in order to set a synthesis system as well as to evaluate synthesized speech. Furthermore, one option of the evaluation process is elaborated in the form of a perceptual test. This test procedure is verified on the measured data with two settings of the synthesis system. The results of the perceptual test are presented as a statistically significant increase of intelligibility evoked by real and synthesized visual speech. Now, the aim is to show one part of evaluation process which leads to more comprehensive evaluation of the sign speech synthesis system.
Keywords: Perception test, Sign speech synthesis, Talking head, Visual speech.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1476545 A High Quality Speech Coder at 600 bps
Authors: Yong Zhang, Ruimin Hu
Abstract:
This paper presents a vocoder to obtain high quality synthetic speech at 600 bps. To reduce the bit rate, the algorithm is based on a sinusoidally excited linear prediction model which extracts few coding parameters, and three consecutive frames are grouped into a superframe and jointly vector quantization is used to obtain high coding efficiency. The inter-frame redundancy is exploited with distinct quantization schemes for different unvoiced/voiced frame combinations in the superframe. Experimental results show that the quality of the proposed coder is better than that of 2.4kbps LPC10e and achieves approximately the same as that of 2.4kbps MELP and with high robustness.
Keywords: Speech coding, Vector quantization, linear predicition, Mixed sinusoidal excitation
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2187544 Speech Coding and Recognition
Authors: M. Satya Sai Ram, P. Siddaiah, M. Madhavi Latha
Abstract:
This paper investigates the performance of a speech recognizer in an interactive voice response system for various coded speech signals, coded by using a vector quantization technique namely Multi Switched Split Vector Quantization Technique. The process of recognizing the coded output can be used in Voice banking application. The recognition technique used for the recognition of the coded speech signals is the Hidden Markov Model technique. The spectral distortion performance, computational complexity, and memory requirements of Multi Switched Split Vector Quantization Technique and the performance of the speech recognizer at various bit rates have been computed. From results it is found that the speech recognizer is showing better performance at 24 bits/frame and it is found that the percentage of recognition is being varied from 100% to 93.33% for various bit rates.Keywords: Linear predictive coding, Speech Recognition, Voice banking, Multi Switched Split Vector Quantization, Hidden Markov Model, Linear Predictive Coefficients.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1844543 Voice Features as the Diagnostic Marker of Autism
Authors: Elena Lyakso, Olga Frolova, Yuri Matveev
Abstract:
The aim of the study is to determine the acoustic features of voice and speech of children with autism spectrum disorders (ASD) as a possible additional diagnostic criterion. The participants in the study were 95 children with ASD aged 5-16 years, 150 typically development (TD) children, and 103 adults – listening to children’s speech samples. Three types of experimental methods for speech analysis were performed: spectrographic, perceptual by listeners, and automatic recognition. In the speech of children with ASD, the pitch values, pitch range, values of frequency and intensity of the third formant (emotional) leading to the “atypical” spectrogram of vowels are higher than corresponding parameters in the speech of TD children. High values of vowel articulation index (VAI) are specific for ASD children’s speech signals. These acoustic features can be considered as diagnostic marker of autism. The ability of humans and automatic recognition of the psychoneurological state of children via their speech is determined.
Keywords: Autism spectrum disorders, biomarker of autism, child speech, voice features.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 619542 On the Effectivity of Different Pseudo-Noise and Orthogonal Sequences for Speech Encryption from Correlation Properties
Authors: V. Anil Kumar, Abhijit Mitra, S. R. Mahadeva Prasanna
Abstract:
We analyze the effectivity of different pseudo noise (PN) and orthogonal sequences for encrypting speech signals in terms of perceptual intelligence. Speech signal can be viewed as sequence of correlated samples and each sample as sequence of bits. The residual intelligibility of the speech signal can be reduced by removing the correlation among the speech samples. PN sequences have random like properties that help in reducing the correlation among speech samples. The mean square aperiodic auto-correlation (MSAAC) and the mean square aperiodic cross-correlation (MSACC) measures are used to test the randomness of the PN sequences. Results of the investigation show the effectivity of large Kasami sequences for this purpose among many PN sequences.
Keywords: Speech encryption, pseudo-noise codes, maximallength, Gold, Barker, Kasami, Walsh-Hadamard, autocorrelation, crosscorrelation, figure of merit.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2040541 Orchestra/Percussion Classification Algorithm for United Speech Audio Coding System
Authors: Yueming Wang, Rendong Ying, Sumxin Jiang, Peilin Liu
Abstract:
Unified Speech Audio Coding (USAC), the latest MPEG standardization for unified speech and audio coding, uses a speech/audio classification algorithm to distinguish speech and audio segments of the input signal. The quality of the recovered audio can be increased by well-designed orchestra/percussion classification and subsequent processing. However, owing to the shortcoming of the system, introducing an orchestra/percussion classification and modifying subsequent processing can enormously increase the quality of the recovered audio. This paper proposes an orchestra/percussion classification algorithm for the USAC system which only extracts 3 scales of Mel-Frequency Cepstral Coefficients (MFCCs) rather than traditional 13 scales of MFCCs and use Iterative Dichotomiser 3 (ID3) Decision Tree rather than other complex learning method, thus the proposed algorithm has lower computing complexity than most existing algorithms. Considering that frequent changing of attributes may lead to quality loss of the recovered audio signal, this paper also design a modified subsequent process to help the whole classification system reach an accurate rate as high as 97% which is comparable to classical 99%.
Keywords: ID3 Decision Tree, MFCC, Orchestra/Percussion Classification, USAC
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1673540 Speech Data Compression using Vector Quantization
Authors: H. B. Kekre, Tanuja K. Sarode
Abstract:
Mostly transforms are used for speech data compressions which are lossy algorithms. Such algorithms are tolerable for speech data compression since the loss in quality is not perceived by the human ear. However the vector quantization (VQ) has a potential to give more data compression maintaining the same quality. In this paper we propose speech data compression algorithm using vector quantization technique. We have used VQ algorithms LBG, KPE and FCG. The results table shows computational complexity of these three algorithms. Here we have introduced a new performance parameter Average Fractional Change in Speech Sample (AFCSS). Our FCG algorithm gives far better performance considering mean absolute error, AFCSS and complexity as compared to others.Keywords: Vector Quantization, Data Compression, Encoding, , Speech coding.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2402539 Investigation of Combined use of MFCC and LPC Features in Speech Recognition Systems
Authors: К. R. Aida–Zade, C. Ardil, S. S. Rustamov
Abstract:
Statement of the automatic speech recognition problem, the assignment of speech recognition and the application fields are shown in the paper. At the same time as Azerbaijan speech, the establishment principles of speech recognition system and the problems arising in the system are investigated. The computing algorithms of speech features, being the main part of speech recognition system, are analyzed. From this point of view, the determination algorithms of Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC) coefficients expressing the basic speech features are developed. Combined use of cepstrals of MFCC and LPC in speech recognition system is suggested to improve the reliability of speech recognition system. To this end, the recognition system is divided into MFCC and LPC-based recognition subsystems. The training and recognition processes are realized in both subsystems separately, and recognition system gets the decision being the same results of each subsystems. This results in decrease of error rate during recognition. The training and recognition processes are realized by artificial neural networks in the automatic speech recognition system. The neural networks are trained by the conjugate gradient method. In the paper the problems observed by the number of speech features at training the neural networks of MFCC and LPC-based speech recognition subsystems are investigated. The variety of results of neural networks trained from different initial points in training process is analyzed. Methodology of combined use of neural networks trained from different initial points in speech recognition system is suggested to improve the reliability of recognition system and increase the recognition quality, and obtained practical results are shown.Keywords: Speech recognition, cepstral analysis, Voice activation detection algorithm, Mel Frequency Cepstral Coefficients, features of speech, Cepstral Mean Subtraction, neural networks, Linear Predictive Coding.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 912538 Perceptual JPEG Compliant Coding by Using DCT-Based Visibility Thresholds of Color Images
Authors: Kuo-Cheng Liu
Abstract:
Effective estimation of just noticeable distortion (JND) for images is helpful to increase the efficiency of a compression algorithm in which both the statistical redundancy and the perceptual redundancy should be accurately removed. In this paper, we design a DCT-based model for estimating JND profiles of color images. Based on a mathematical model of measuring the base detection threshold for each DCT coefficient in the color component of color images, the luminance masking adjustment, the contrast masking adjustment, and the cross masking adjustment are utilized for luminance component, and the variance-based masking adjustment based on the coefficient variation in the block is proposed for chrominance components. In order to verify the proposed model, the JND estimator is incorporated into the conventional JPEG coder to improve the compression performance. A subjective and fair viewing test is designed to evaluate the visual quality of the coding image under the specified viewing condition. The simulation results show that the JPEG coder integrated with the proposed DCT-based JND model gives better coding bit rates at visually lossless quality for a variety of color images.
Keywords: Just-noticeable distortion (JND), discrete cosine transform (DCT), JPEG.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2581537 Multi Switched Split Vector Quantization of Narrowband Speech Signals
Authors: M. Satya Sai Ram, P. Siddaiah, M. Madhavi Latha
Abstract:
Vector quantization is a powerful tool for speech coding applications. This paper deals with LPC Coding of speech signals which uses a new technique called Multi Switched Split Vector Quantization (MSSVQ), which is a hybrid of Multi, switched, split vector quantization techniques. The spectral distortion performance, computational complexity, and memory requirements of MSSVQ are compared to split vector quantization (SVQ), multi stage vector quantization(MSVQ) and switched split vector quantization (SSVQ) techniques. It has been proved from results that MSSVQ has better spectral distortion performance, lower computational complexity and lower memory requirements when compared to all the above mentioned product code vector quantization techniques. Computational complexity is measured in floating point operations (flops), and memory requirements is measured in (floats).Keywords: Linear predictive Coding, Multi stage vectorquantization, Switched Split vector quantization, Split vectorquantization, Line Spectral Frequencies (LSF).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1671536 A Perceptually Optimized Wavelet Embedded Zero Tree Image Coder
Authors: A. Bajit, M. Nahid, A. Tamtaoui, E. H. Bouyakhf
Abstract:
In this paper, we propose a Perceptually Optimized Embedded ZeroTree Image Coder (POEZIC) that introduces a perceptual weighting to wavelet transform coefficients prior to control SPIHT encoding algorithm in order to reach a targeted bit rate with a perceptual quality improvement with respect to the coding quality obtained using the SPIHT algorithm only. The paper also, introduces a new objective quality metric based on a Psychovisual model that integrates the properties of the HVS that plays an important role in our POEZIC quality assessment. Our POEZIC coder is based on a vision model that incorporates various masking effects of human visual system HVS perception. Thus, our coder weights the wavelet coefficients based on that model and attempts to increase the perceptual quality for a given bit rate and observation distance. The perceptual weights for all wavelet subbands are computed based on 1) luminance masking and Contrast masking, 2) the contrast sensitivity function CSF to achieve the perceptual decomposition weighting, 3) the Wavelet Error Sensitivity WES used to reduce the perceptual quantization errors. The new perceptually optimized codec has the same complexity as the original SPIHT techniques. However, the experiments results show that our coder demonstrates very good performance in terms of quality measurement.
Keywords: DWT, linear-phase 9/7 filter, 9/7 Wavelets Error Sensitivity WES, CSF implementation approaches, JND Just Noticeable Difference, Luminance masking, Contrast masking, standard SPIHT, Objective Quality Measure, Probability Score PS.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2050535 Coding of DWT Coefficients using Run-length Coding and Huffman Coding for the Purpose of Color Image Compression
Authors: Varun Setia, Vinod Kumar
Abstract:
In present paper we proposed a simple and effective method to compress an image. Here we found success in size reduction of an image without much compromising with it-s quality. Here we used Haar Wavelet Transform to transform our original image and after quantization and thresholding of DWT coefficients Run length coding and Huffman coding schemes have been used to encode the image. DWT is base for quite populate JPEG 2000 technique.
Keywords: Lossy compression, DWT, quantization, Run length coding, Huffman coding, JPEG2000.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2922534 Subjective Evaluation of Spectral and Time Domain Cascading Algorithm for Speech Enhancement for Mobile Communication
Authors: Harish Chander, Balwinder Singh, Ravinder Khanna
Abstract:
In this paper, we present the comparative subjective analysis of Improved Minima Controlled Recursive Averaging (IMCRA) Algorithm, the Kalman filter and the cascading of IMCRA and Kalman filter algorithms. Performance of speech enhancement algorithms can be predicted in two different ways. One is the objective method of evaluation in which the speech quality parameters are predicted computationally. The second is a subjective listening test in which the processed speech signal is subjected to the listeners who judge the quality of speech on certain parameters. The comparative objective evaluation of these algorithms was analyzed in terms of Global SNR, Segmental SNR and Perceptual Evaluation of Speech Quality (PESQ) by the authors and it was reported that with cascaded algorithms there is a substantial increase in objective parameters. Since subjective evaluation is the real test to judge the quality of speech enhancement algorithms, the authenticity of superiority of cascaded algorithms over individual IMCRA and Kalman algorithms is tested through subjective analysis in this paper. The results of subjective listening tests have confirmed that the cascaded algorithms perform better under all types of noise conditions.
Keywords: Speech enhancement, spectral domain, time domain, PESQ, subjective analysis, objective analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1231533 Near-Lossless Image Coding based on Orthogonal Polynomials
Authors: Krishnamoorthy R, Rajavijayalakshmi K, Punidha R
Abstract:
In this paper, a near lossless image coding scheme based on Orthogonal Polynomials Transform (OPT) has been presented. The polynomial operators and polynomials basis operators are obtained from set of orthogonal polynomials functions for the proposed transform coding. The image is partitioned into a number of distinct square blocks and the proposed transform coding is applied to each of these individually. After applying the proposed transform coding, the transformed coefficients are rearranged into a sub-band structure. The Embedded Zerotree (EZ) coding algorithm is then employed to quantize the coefficients. The proposed transform is implemented for various block sizes and the performance is compared with existing Discrete Cosine Transform (DCT) transform coding scheme.Keywords: Near-lossless Coding, Orthogonal Polynomials Transform, Embedded Zerotree Coding
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1944532 A Perceptually Optimized Foveation Based Wavelet Embedded Zero Tree Image Coding
Authors: A. Bajit, M. Nahid, A. Tamtaoui, E. H. Bouyakhf
Abstract:
In this paper, we propose a Perceptually Optimized Foveation based Embedded ZeroTree Image Coder (POEFIC) that introduces a perceptual weighting to wavelet coefficients prior to control SPIHT encoding algorithm in order to reach a targeted bit rate with a perceptual quality improvement with respect to a given bit rate a fixation point which determines the region of interest ROI. The paper also, introduces a new objective quality metric based on a Psychovisual model that integrates the properties of the HVS that plays an important role in our POEFIC quality assessment. Our POEFIC coder is based on a vision model that incorporates various masking effects of human visual system HVS perception. Thus, our coder weights the wavelet coefficients based on that model and attempts to increase the perceptual quality for a given bit rate and observation distance. The perceptual weights for all wavelet subbands are computed based on 1) foveation masking to remove or reduce considerable high frequencies from peripheral regions 2) luminance and Contrast masking, 3) the contrast sensitivity function CSF to achieve the perceptual decomposition weighting. The new perceptually optimized codec has the same complexity as the original SPIHT techniques. However, the experiments results show that our coder demonstrates very good performance in terms of quality measurement.
Keywords: DWT, linear-phase 9/7 filter, Foveation Filtering, CSF implementation approaches, 9/7 Wavelet JND Thresholds and Wavelet Error Sensitivity WES, Luminance and Contrast masking, standard SPIHT, Objective Quality Measure, Probability Score PS.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1794531 A Perceptual Image Coding method of High Compression Rate
Authors: Fahmi Kammoun, Mohamed Salim Bouhlel
Abstract:
In the framework of the image compression by Wavelet Transforms, we propose a perceptual method by incorporating Human Visual System (HVS) characteristics in the quantization stage. Indeed, human eyes haven-t an equal sensitivity across the frequency bandwidth. Therefore, the clarity of the reconstructed images can be improved by weighting the quantization according to the Contrast Sensitivity Function (CSF). The visual artifact at low bit rate is minimized. To evaluate our method, we use the Peak Signal to Noise Ratio (PSNR) and a new evaluating criteria witch takes into account visual criteria. The experimental results illustrate that our technique shows improvement on image quality at the same compression ratio.Keywords: Contrast Sensitivity Function, Human Visual System, Image compression, Wavelet transforms.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1873530 Computationally Efficient Signal Quality Improvement Method for VoIP System
Authors: H. P. Singh, S. Singh
Abstract:
The voice signal in Voice over Internet protocol (VoIP) system is processed through the best effort policy based IP network, which leads to the network degradations including delay, packet loss jitter. The work in this paper presents the implementation of finite impulse response (FIR) filter for voice quality improvement in the VoIP system through distributed arithmetic (DA) algorithm. The VoIP simulations are conducted with AMR-NB 6.70 kbps and G.729a speech coders at different packet loss rates and the performance of the enhanced VoIP signal is evaluated using the perceptual evaluation of speech quality (PESQ) measurement for narrowband signal. The results show reduction in the computational complexity in the system and significant improvement in the quality of the VoIP voice signal.
Keywords: VoIP, Signal Quality, Distributed Arithmetic, Packet Loss, Speech Coder.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1830529 Peak-to-Average Power Ratio Reduction in OFDM Systems using Huffman Coding
Authors: Ashraf A. Eltholth, Adel R. Mikhail, A. Elshirbini, Moawad I. Moawad, A. I. Abdelfattah
Abstract:
In this paper we proposed the use of Huffman coding to reduce the PAR of an OFDM system as a distortionless scrambling technique, and we utilize the amount saved in the total bit rate by the Huffman coding to send the encoding table for accurate decoding at the receiver without reducing the effective throughput. We found that the use of Huffman coding reduces the PAR by about 6 dB. Also we have investigated the effect of PAR reduction due to Huffman coding through testing the spectral spreading and the inband distortion due to HPA with different IBO values. We found a complete match of our expectation from the proposed solution with the obtained simulation results.Keywords: HPA, Huffman coding, OFDM, PAR
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2596528 Watermark Bit Rate in Diverse Signal Domains
Authors: Nedeljko Cvejic, Tapio Sepp
Abstract:
A study of the obtainable watermark data rate for information hiding algorithms is presented in this paper. As the perceptual entropy for wideband monophonic audio signals is in the range of four to five bits per sample, a significant amount of additional information can be inserted into signal without causing any perceptual distortion. Experimental results showed that transform domain watermark embedding outperforms considerably watermark embedding in time domain and that signal decompositions with a high gain of transform coding, like the wavelet transform, are the most suitable for high data rate information hiding. Keywords?Digital watermarking, information hiding, audio watermarking, watermark data rate.
Keywords: Digital watermarking, information hiding, audio watermarking, watermark data rate.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1627527 Eukaryotic Gene Prediction by an Investigation of Nonlinear Dynamical Modeling Techniques on EIIP Coded Sequences
Authors: Mai S. Mabrouk, Nahed H. Solouma, Abou-Bakr M. Youssef, Yasser M. Kadah
Abstract:
Many digital signal processing, techniques have been used to automatically distinguish protein coding regions (exons) from non-coding regions (introns) in DNA sequences. In this work, we have characterized these sequences according to their nonlinear dynamical features such as moment invariants, correlation dimension, and largest Lyapunov exponent estimates. We have applied our model to a number of real sequences encoded into a time series using EIIP sequence indicators. In order to discriminate between coding and non coding DNA regions, the phase space trajectory was first reconstructed for coding and non-coding regions. Nonlinear dynamical features are extracted from those regions and used to investigate a difference between them. Our results indicate that the nonlinear dynamical characteristics have yielded significant differences between coding (CR) and non-coding regions (NCR) in DNA sequences. Finally, the classifier is tested on real genes where coding and non-coding regions are well known.
Keywords: Gene prediction, nonlinear dynamics, correlation dimension, Lyapunov exponent.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1824526 Automatic Recognition of Emotionally Coloured Speech
Authors: Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou
Abstract:
Emotion in speech is an issue that has been attracting the interest of the speech community for many years, both in the context of speech synthesis as well as in automatic speech recognition (ASR). In spite of the remarkable recent progress in Large Vocabulary Recognition (LVR), it is still far behind the ultimate goal of recognising free conversational speech uttered by any speaker in any environment. Current experimental tests prove that using state of the art large vocabulary recognition systems the error rate increases substantially when applied to spontaneous/emotional speech. This paper shows that recognition rate for emotionally coloured speech can be improved by using a language model based on increased representation of emotional utterances.Keywords: Statistical language model, N-grams, emotionallycoloured speech
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1617525 MIM: A Species Independent Approach for Classifying Coding and Non-Coding DNA Sequences in Bacterial and Archaeal Genomes
Authors: Achraf El Allali, John R. Rose
Abstract:
A number of competing methodologies have been developed to identify genes and classify DNA sequences into coding and non-coding sequences. This classification process is fundamental in gene finding and gene annotation tools and is one of the most challenging tasks in bioinformatics and computational biology. An information theory measure based on mutual information has shown good accuracy in classifying DNA sequences into coding and noncoding. In this paper we describe a species independent iterative approach that distinguishes coding from non-coding sequences using the mutual information measure (MIM). A set of sixty prokaryotes is used to extract universal training data. To facilitate comparisons with the published results of other researchers, a test set of 51 bacterial and archaeal genomes was used to evaluate MIM. These results demonstrate that MIM produces superior results while remaining species independent.Keywords: Coding Non-coding Classification, Entropy, GeneRecognition, Mutual Information.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1726524 Speaker Identification using Neural Networks
Authors: R.V Pawar, P.P.Kajave, S.N.Mali
Abstract:
The speech signal conveys information about the identity of the speaker. The area of speaker identification is concerned with extracting the identity of the person speaking the utterance. As speech interaction with computers becomes more pervasive in activities such as the telephone, financial transactions and information retrieval from speech databases, the utility of automatically identifying a speaker is based solely on vocal characteristic. This paper emphasizes on text dependent speaker identification, which deals with detecting a particular speaker from a known population. The system prompts the user to provide speech utterance. System identifies the user by comparing the codebook of speech utterance with those of the stored in the database and lists, which contain the most likely speakers, could have given that speech utterance. The speech signal is recorded for N speakers further the features are extracted. Feature extraction is done by means of LPC coefficients, calculating AMDF, and DFT. The neural network is trained by applying these features as input parameters. The features are stored in templates for further comparison. The features for the speaker who has to be identified are extracted and compared with the stored templates using Back Propogation Algorithm. Here, the trained network corresponds to the output; the input is the extracted features of the speaker to be identified. The network does the weight adjustment and the best match is found to identify the speaker. The number of epochs required to get the target decides the network performance.Keywords: Average Mean Distance function, Backpropogation, Linear Predictive Coding, MultilayeredPerceptron,
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1892523 NewPerceptual Organization within Temporal Displacement
Authors: Michele Sinico
Abstract:
The psychological present has an actual extension. When a sequence of instantaneous stimuli falls in this short interval of time, observers perceive a compresence of events in succession and the temporal order depends on the qualitative relationships between the perceptual properties of the events. Two experiments were carried out to study the influence of perceptual grouping, with and without temporal displacement, on the duration of auditory sequences. The psychophysical method of adjustment was adopted. The first experiment investigated the effect of temporal displacement of a white noise on sequence duration. The second experiment investigated the effect of temporal displacement, along the pitch dimension, on temporal shortening of sequence. The results suggest that the temporal order of sounds, in the case of temporal displacement, is organized along the pitch dimension.Keywords: Time perception, perceptual present, temporal displacement, gestalt laws of perceptual organization
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 808522 Parallel Joint Channel Coding and Cryptography
Authors: Nataša Živić, Christoph Ruland
Abstract:
Method of Parallel Joint Channel Coding and Cryptography has been analyzed and simulated in this paper. The method is an extension of Soft Input Decryption with feedback, which is used for improvement of channel decoding of secured messages. Parallel Joint Channel Coding and Cryptography results in improved coding gain of channel decoding, which achieves more than 2 dB. Such results are an implication of a combination of receiver components and their interoperability.Keywords: Block length, Coding gain, Feedback, L-values, Parallel Joint Channel Coding and Cryptography, Soft Input Decryption.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1584521 Enhanced Frame-based Video Coding to Support Content-based Functionalities
Authors: Prabhudev Hosur, Rolando Carrasco
Abstract:
This paper presents the enhanced frame-based video coding scheme. The input source video to the enhanced frame-based video encoder consists of a rectangular-size video and shapes of arbitrarily-shaped objects on video frames. The rectangular frame texture is encoded by the conventional frame-based coding technique and the video object-s shape is encoded using the contour-based vertex coding. It is possible to achieve several useful content-based functionalities by utilizing the shape information in the bitstream at the cost of a very small overhead to the bitrate.
Keywords: Video coding, content-based, hyper video, interactivity, shape coding, polygon.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1661520 The Main Principles of Text-to-Speech Synthesis System
Authors: K.R. Aida–Zade, C. Ardil, A.M. Sharifova
Abstract:
In this paper, the main principles of text-to-speech synthesis system are presented. Associated problems which arise when developing speech synthesis system are described. Used approaches and their application in the speech synthesis systems for Azerbaijani language are shown.
Keywords: synthesis of Azerbaijani language, morphemes, phonemes, sounds, sentence, speech synthesizer, intonation, accent, pronunciation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5652519 TeleMe Speech Booster: Web-Based Speech Therapy and Training Program for Children with Articulation Disorders
Authors: C. Treerattanaphan, P. Boonpramuk, P. Singla
Abstract:
Frequent, continuous speech training has proven to be a necessary part of a successful speech therapy process, but constraints of traveling time and employment dispensation become key obstacles especially for individuals living in remote areas or for dependent children who have working parents. In order to ameliorate speech difficulties with ample guidance from speech therapists, a website has been developed that supports speech therapy and training for people with articulation disorders in the standard Thai language. This web-based program has the ability to record speech training exercises for each speech trainee. The records will be stored in a database for the speech therapist to investigate, evaluate, compare and keep track of all trainees’ progress in detail. Speech trainees can request live discussions via video conference call when needed. Communication through this web-based program facilitates and reduces training time in comparison to walk-in training or appointments. This type of training also allows people with articulation disorders to practice speech lessons whenever or wherever is convenient for them, which can lead to a more regular training processes.
Keywords: Web-Based Remote Training Program, Thai Speech Therapy, Articulation Disorders.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1859