Search results for: development of coherent speech
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4364

Search results for: development of coherent speech

4334 On Preprocessing of Speech Signals

Authors: Ayaz Keerio, Bhargav Kumar Mitra, Philip Birch, Rupert Young, Chris Chatwin

Abstract:

Preprocessing of speech signals is considered a crucial step in the development of a robust and efficient speech or speaker recognition system. In this paper, we present some popular statistical outlier-detection based strategies to segregate the silence/unvoiced part of the speech signal from the voiced portion. The proposed methods are based on the utilization of the 3 σ edit rule, and the Hampel Identifier which are compared with the conventional techniques: (i) short-time energy (STE) based methods, and (ii) distribution based methods. The results obtained after applying the proposed strategies on some test voice signals are encouraging.

Keywords: STE based methods, Mahalanobis distance, 3 edit σ rule, Hampel Identifier.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1654
4333 Effect of Atmospheric Turbulence on AcquisitionTime of Ground to Deep Space Optical Communication System

Authors: Hemani Kaushal, V.K.Jain, Subrat Kar

Abstract:

The performance of ground to deep space optical communication systems is degraded by distortion of the beam as it propagates through the turbulent atmosphere. Turbulence causes fluctuations in the intensity of the received signal which ultimately affects the acquisition time required to acquire and locate the spaceborne target using narrow laser beam. In this paper, performance of free-space optical (FSO) communication system in atmospheric turbulence has been analyzed in terms of acquisition time for coherent and non-coherent modulation schemes. Numerical results presented in graphical and tabular forms show that the acquisition time increases with the increase in turbulence level. This is true for both schemes. The BPSK has lowest acquisition time among all schemes. In non-coherent schemes, M-PPM performs better than the other schemes. With the increase in M, acquisition time becomes lower, but at the cost of increase in system complexity.

Keywords: Atmospheric Turbulence, Acquisition Time, BinaryPhase Shift Keying (BPSK), Free-Space Optical (FSO)Communication System, M-ary Pulse Position Modulation (M-PPM), Coherent/Non-coherent Modulation Schemes.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1731
4332 The Capacity of Mel Frequency Cepstral Coefficients for Speech Recognition

Authors: Fawaz S. Al-Anzi, Dia AbuZeina

Abstract:

Speech recognition is of an important contribution in promoting new technologies in human computer interaction. Today, there is a growing need to employ speech technology in daily life and business activities. However, speech recognition is a challenging task that requires different stages before obtaining the desired output. Among automatic speech recognition (ASR) components is the feature extraction process, which parameterizes the speech signal to produce the corresponding feature vectors. Feature extraction process aims at approximating the linguistic content that is conveyed by the input speech signal. In speech processing field, there are several methods to extract speech features, however, Mel Frequency Cepstral Coefficients (MFCC) is the popular technique. It has been long observed that the MFCC is dominantly used in the well-known recognizers such as the Carnegie Mellon University (CMU) Sphinx and the Markov Model Toolkit (HTK). Hence, this paper focuses on the MFCC method as the standard choice to identify the different speech segments in order to obtain the language phonemes for further training and decoding steps. Due to MFCC good performance, the previous studies show that the MFCC dominates the Arabic ASR research. In this paper, we demonstrate MFCC as well as the intermediate steps that are performed to get these coefficients using the HTK toolkit.

Keywords: Speech recognition, acoustic features, Mel Frequency Cepstral Coefficients.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1923
4331 A Sparse Representation Speech Denoising Method Based on Adapted Stopping Residue Error

Authors: Qianhua He, Weili Zhou, Aiwu Chen

Abstract:

A sparse representation speech denoising method based on adapted stopping residue error was presented in this paper. Firstly, the cross-correlation between the clean speech spectrum and the noise spectrum was analyzed, and an estimation method was proposed. In the denoising method, an over-complete dictionary of the clean speech power spectrum was learned with the K-singular value decomposition (K-SVD) algorithm. In the sparse representation stage, the stopping residue error was adaptively achieved according to the estimated cross-correlation and the adjusted noise spectrum, and the orthogonal matching pursuit (OMP) approach was applied to reconstruct the clean speech spectrum from the noisy speech. Finally, the clean speech was re-synthesised via the inverse Fourier transform with the reconstructed speech spectrum and the noisy speech phase. The experiment results show that the proposed method outperforms the conventional methods in terms of subjective and objective measure.

Keywords: Speech denoising, sparse representation, K-singular value decomposition, orthogonal matching pursuit.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 970
4330 Eisenhower’s Farewell Speech: Initial and Continuing Communication Effects

Authors: B. Kuiper

Abstract:

When Dwight D. Eisenhower delivered his final Presidential speech in 1961, he was using the opportunity to bid farewell to America, but he was also trying to warn his fellow countrymen about deeper challenges threatening the country. In this analysis, Eisenhower’s speech is examined in light of the impact it had on American culture, communication concepts, and political ramifications. The paper initially highlights the previous literature on the speech, especially in light of its 50th anniversary, and reveals a man whose main concern was how the speech’s words would affect his beloved country. The painstaking approach to the wording of the speech to reveal the intent is key, particularly in light of analyzing the motivations according to “virtuous communication.” This philosophical construct indicates that Eisenhower’s Farewell Address was crafted carefully according to a departing President’s deepest values and concerns, concepts that he wanted to pass along to his successor, to his country, and even to the world.

Keywords: Eisenhower, mass communication, political speech, rhetoric.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1818
4329 Hybrid Modeling Algorithm for Continuous Tamil Speech Recognition

Authors: M. Kalamani, S. Valarmathy, M. Krishnamoorthi

Abstract:

In this paper, Fuzzy C-Means clustering with Expectation Maximization-Gaussian Mixture Model based hybrid modeling algorithm is proposed for Continuous Tamil Speech Recognition. The speech sentences from various speakers are used for training and testing phase and objective measures are between the proposed and existing Continuous Speech Recognition algorithms. From the simulated results, it is observed that the proposed algorithm improves the recognition accuracy and F-measure up to 3% as compared to that of the existing algorithms for the speech signal from various speakers. In addition, it reduces the Word Error Rate, Error Rate and Error up to 4% as compared to that of the existing algorithms. In all aspects, the proposed hybrid modeling for Tamil speech recognition provides the significant improvements for speechto- text conversion in various applications.

Keywords: Speech Segmentation, Feature Extraction, Clustering, HMM, EM-GMM, CSR.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2094
4328 Neural Network Based Speech to Text in Malay Language

Authors: H. F. A. Abdul Ghani, R. R. Porle

Abstract:

Speech to text in Malay language is a system that converts Malay speech into text. The Malay language recognition system is still limited, thus, this paper aims to investigate the performance of ten Malay words obtained from the online Malay news. The methodology consists of three stages, which are preprocessing, feature extraction, and speech classification. In preprocessing stage, the speech samples are filtered using pre emphasis. After that, feature extraction method is applied to the samples using Mel Frequency Cepstrum Coefficient (MFCC). Lastly, speech classification is performed using Feedforward Neural Network (FFNN). The accuracy of the classification is further investigated based on the hidden layer size. From experimentation, the classifier with 40 hidden neurons shows the highest classification rate which is 94%.  

Keywords: Feed-Forward Neural Network, FFNN, Malay speech recognition, Mel Frequency Cepstrum Coefficient, MFCC, speech-to-text.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 669
4327 On the Effectivity of Different Pseudo-Noise and Orthogonal Sequences for Speech Encryption from Correlation Properties

Authors: V. Anil Kumar, Abhijit Mitra, S. R. Mahadeva Prasanna

Abstract:

We analyze the effectivity of different pseudo noise (PN) and orthogonal sequences for encrypting speech signals in terms of perceptual intelligence. Speech signal can be viewed as sequence of correlated samples and each sample as sequence of bits. The residual intelligibility of the speech signal can be reduced by removing the correlation among the speech samples. PN sequences have random like properties that help in reducing the correlation among speech samples. The mean square aperiodic auto-correlation (MSAAC) and the mean square aperiodic cross-correlation (MSACC) measures are used to test the randomness of the PN sequences. Results of the investigation show the effectivity of large Kasami sequences for this purpose among many PN sequences.

Keywords: Speech encryption, pseudo-noise codes, maximallength, Gold, Barker, Kasami, Walsh-Hadamard, autocorrelation, crosscorrelation, figure of merit.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2000
4326 On Developing an Automatic Speech Recognition System for Standard Arabic Language

Authors: R. Walha, F. Drira, H. El-Abed, A. M. Alimi

Abstract:

The Automatic Speech Recognition (ASR) applied to Arabic language is a challenging task. This is mainly related to the language specificities which make the researchers facing multiple difficulties such as the insufficient linguistic resources and the very limited number of available transcribed Arabic speech corpora. In this paper, we are interested in the development of a HMM-based ASR system for Standard Arabic (SA) language. Our fundamental research goal is to select the most appropriate acoustic parameters describing each audio frame, acoustic models and speech recognition unit. To achieve this purpose, we analyze the effect of varying frame windowing (size and period), acoustic parameter number resulting from features extraction methods traditionally used in ASR, speech recognition unit, Gaussian number per HMM state and number of embedded re-estimations of the Baum-Welch Algorithm. To evaluate the proposed ASR system, a multi-speaker SA connected-digits corpus is collected, transcribed and used throughout all experiments. A further evaluation is conducted on a speaker-independent continue SA speech corpus. The phonemes recognition rate is 94.02% which is relatively high when comparing it with another ASR system evaluated on the same corpus.

Keywords: ASR, HMM, acoustical analysis, acoustic modeling, Standard Arabic language

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1732
4325 A Novel RLS Based Adaptive Filtering Method for Speech Enhancement

Authors: Pogula Rakesh, T. Kishore Kumar

Abstract:

Speech enhancement is a long standing problem with numerous applications like teleconferencing, VoIP, hearing aids and speech recognition. The motivation behind this research work is to obtain a clean speech signal of higher quality by applying the optimal noise cancellation technique. Real-time adaptive filtering algorithms seem to be the best candidate among all categories of the speech enhancement methods. In this paper, we propose a speech enhancement method based on Recursive Least Squares (RLS) adaptive filter of speech signals. Experiments were performed on noisy data which was prepared by adding AWGN, Babble and Pink noise to clean speech samples at -5dB, 0dB, 5dB and 10dB SNR levels. We then compare the noise cancellation performance of proposed RLS algorithm with existing NLMS algorithm in terms of Mean Squared Error (MSE), Signal to Noise ratio (SNR) and SNR Loss. Based on the performance evaluation, the proposed RLS algorithm was found to be a better optimal noise cancellation technique for speech signals.

Keywords: Adaptive filter, Adaptive Noise Canceller, Mean Squared Error, Noise reduction, NLMS, RLS, SNR, SNR Loss.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3134
4324 A Modified Speech Enhancement Using Adaptive Gain Equalizer with Non linear Spectral Subtraction for Robust Speech Recognition

Authors: C. Ganesh Babu, P. T. Vanathi

Abstract:

In this paper we present an enhanced noise reduction method for robust speech recognition using Adaptive Gain Equalizer with Non linear Spectral Subtraction. In Adaptive Gain Equalizer method (AGE), the input signal is divided into a number of subbands that are individually weighed in time domain, in accordance to the short time Signal-to-Noise Ratio (SNR) in each subband estimation at every time instant. Instead of focusing on suppression the noise on speech enhancement is focused. When analysis was done under various noise conditions for speech recognition, it was found that Adaptive Gain Equalizer method algorithm has an obvious failing point for a SNR of -5 dB, with inadequate levels of noise suppression for SNR less than this point. This work proposes the implementation of AGE when coupled with Non linear Spectral Subtraction (AGE-NSS) for robust speech recognition. The experimental result shows that out AGE-NSS performs the AGE when SNR drops below -5db level.

Keywords: Adaptive Gain Equalizer, Non Linear Spectral Subtraction, Speech Enhancement, and Speech Recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1663
4323 Speech Acts and Politeness Strategies in an EFL Classroom in Georgia

Authors: Tinatin Kurdghelashvili

Abstract:

The paper deals with the usage of speech acts and politeness strategies in an EFL classroom in Georgia (Rep of). It explores the students’ and the teachers’ practice of the politeness strategies and the speech acts of apology, thanking, request, compliment / encouragement, command, agreeing / disagreeing, addressing and code switching. The research method includes observation as well as a questionnaire. The target group involves the students from Georgian public schools and two certified, experienced local English teachers. The analysis is based on Searle’s Speech Act Theory and Brown and Levinson’s politeness strategies. The findings show that the students have certain knowledge regarding politeness yet they fail to apply them in English communication. In addition, most of the speech acts from the classroom interaction are used by the teachers and not the students. Thereby, it is suggested that teachers should cultivate the students’ communicative competence and attempt to give them opportunities to practise more English speech acts than they do today.

Keywords: English as a foreign language, Georgia, politeness principles, speech acts.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6116
4322 Preliminary Study of the Phonological Development in Three- and Four-Year-Old Bulgarian Children

Authors: Tsvetomira Braynova, Miglena Simonska

Abstract:

The article presents the results of a research of phonological processes in three- and four-year-old children. A test, created for the purpose of the study, was developed and conducted among 120 children. The study included three areas of research - at the level of words (96 words), at the level of sentence repetition (10 sentences) and at the level of generating own speech from a picture (15 pictures). The test also gives us additional information about the articulation errors of the assessed children. The main purpose of the research is to analyze all phonological processes that occur at this age in Bulgarian children and to identify which are typical and atypical for this age. The results show that the most common phonology errors that children make are: sound substitution, elision of sound, metathesis of sound, elision of syllable, elision of consonants clustered in a syllable. Measuring the correlation between average length of repeated speech and average length of generated speech, the analysis does not prove that the more words a child can repeat in part “repeated speech”, the more words they can be expected to generate in part “generating sentence”. The results of this study show that the task of naming a word provides sufficient and representative information to assess the child's phonology.

Keywords: Articulation, phonology, speech, language development.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 305
4321 Voice Driven Applications in Non-stationary and Chaotic Environment

Authors: C. Kwan, X. Li, D. Lao, Y. Deng, Z. Ren, B. Raj, R. Singh, R. Stern

Abstract:

Automated operations based on voice commands will become more and more important in many applications, including robotics, maintenance operations, etc. However, voice command recognition rates drop quite a lot under non-stationary and chaotic noise environments. In this paper, we tried to significantly improve the speech recognition rates under non-stationary noise environments. First, 298 Navy acronyms have been selected for automatic speech recognition. Data sets were collected under 4 types of noisy environments: factory, buccaneer jet, babble noise in a canteen, and destroyer. Within each noisy environment, 4 levels (5 dB, 15 dB, 25 dB, and clean) of Signal-to-Noise Ratio (SNR) were introduced to corrupt the speech. Second, a new algorithm to estimate speech or no speech regions has been developed, implemented, and evaluated. Third, extensive simulations were carried out. It was found that the combination of the new algorithm, the proper selection of language model and a customized training of the speech recognizer based on clean speech yielded very high recognition rates, which are between 80% and 90% for the four different noisy conditions. Fourth, extensive comparative studies have also been carried out.

Keywords: Non-stationary, speech recognition, voice commands.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1491
4320 A Semi- One Time Pad Using Blind Source Separation for Speech Encryption

Authors: Long Jye Sheu, Horng-Shing Chiou, Wei Ching Chen

Abstract:

We propose a new perspective on speech communication using blind source separation. The original speech is mixed with key signals which consist of the mixing matrix, chaotic signals and a random noise. However, parts of the keys (the mixing matrix and the random noise) are not necessary in decryption. In practice implement, one can encrypt the speech by changing the noise signal every time. Hence, the present scheme obtains the advantages of a One Time Pad encryption while avoiding its drawbacks in key exchange. It is demonstrated that the proposed scheme is immune against traditional attacks.

Keywords: one time pad, blind source separation, independentcomponent analysis, speech encryption.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1523
4319 Advances in Artificial Intelligence Using Speech Recognition

Authors: Khaled M. Alhawiti

Abstract:

This research study aims to present a retrospective study about speech recognition systems and artificial intelligence. Speech recognition has become one of the widely used technologies, as it offers great opportunity to interact and communicate with automated machines. Precisely, it can be affirmed that speech recognition facilitates its users and helps them to perform their daily routine tasks, in a more convenient and effective manner. This research intends to present the illustration of recent technological advancements, which are associated with artificial intelligence. Recent researches have revealed the fact that speech recognition is found to be the utmost issue, which affects the decoding of speech. In order to overcome these issues, different statistical models were developed by the researchers. Some of the most prominent statistical models include acoustic model (AM), language model (LM), lexicon model, and hidden Markov models (HMM). The research will help in understanding all of these statistical models of speech recognition. Researchers have also formulated different decoding methods, which are being utilized for realistic decoding tasks and constrained artificial languages. These decoding methods include pattern recognition, acoustic phonetic, and artificial intelligence. It has been recognized that artificial intelligence is the most efficient and reliable methods, which are being used in speech recognition.

Keywords: Speech recognition, acoustic phonetic, artificial intelligence, Hidden Markov Models (HMM), statistical models of speech recognition, human machine performance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7901
4318 Speech Enhancement Using Wavelet Coefficients Masking with Local Binary Patterns

Authors: Christian Arcos, Marley Vellasco, Abraham Alcaim

Abstract:

In this paper, we present a wavelet coefficients masking based on Local Binary Patterns (WLBP) approach to enhance the temporal spectra of the wavelet coefficients for speech enhancement. This technique exploits the wavelet denoising scheme, which splits the degraded speech into pyramidal subband components and extracts frequency information without losing temporal information. Speech enhancement in each high-frequency subband is performed by binary labels through the local binary pattern masking that encodes the ratio between the original value of each coefficient and the values of the neighbour coefficients. This approach enhances the high-frequency spectra of the wavelet transform instead of eliminating them through a threshold. A comparative analysis is carried out with conventional speech enhancement algorithms, demonstrating that the proposed technique achieves significant improvements in terms of PESQ, an international recommendation of objective measure for estimating subjective speech quality. Informal listening tests also show that the proposed method in an acoustic context improves the quality of speech, avoiding the annoying musical noise present in other speech enhancement techniques. Experimental results obtained with a DNN based speech recognizer in noisy environments corroborate the superiority of the proposed scheme in the robust speech recognition scenario.

Keywords: Binary labels, local binary patterns, mask, wavelet coefficients, speech enhancement, speech recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 967
4317 Investigation of Combined use of MFCC and LPC Features in Speech Recognition Systems

Authors: К. R. Aida–Zade, C. Ardil, S. S. Rustamov

Abstract:

Statement of the automatic speech recognition problem, the assignment of speech recognition and the application fields are shown in the paper. At the same time as Azerbaijan speech, the establishment principles of speech recognition system and the problems arising in the system are investigated. The computing algorithms of speech features, being the main part of speech recognition system, are analyzed. From this point of view, the determination algorithms of Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC) coefficients expressing the basic speech features are developed. Combined use of cepstrals of MFCC and LPC in speech recognition system is suggested to improve the reliability of speech recognition system. To this end, the recognition system is divided into MFCC and LPC-based recognition subsystems. The training and recognition processes are realized in both subsystems separately, and recognition system gets the decision being the same results of each subsystems. This results in decrease of error rate during recognition. The training and recognition processes are realized by artificial neural networks in the automatic speech recognition system. The neural networks are trained by the conjugate gradient method. In the paper the problems observed by the number of speech features at training the neural networks of MFCC and LPC-based speech recognition subsystems are investigated. The variety of results of neural networks trained from different initial points in training process is analyzed. Methodology of combined use of neural networks trained from different initial points in speech recognition system is suggested to improve the reliability of recognition system and increase the recognition quality, and obtained practical results are shown.

Keywords: Speech recognition, cepstral analysis, Voice activation detection algorithm, Mel Frequency Cepstral Coefficients, features of speech, Cepstral Mean Subtraction, neural networks, Linear Predictive Coding.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 863
4316 Adaptive Noise Reduction Algorithm for Speech Enhancement

Authors: M. Kalamani, S. Valarmathy, M. Krishnamoorthi

Abstract:

In this paper, Least Mean Square (LMS) adaptive noise reduction algorithm is proposed to enhance the speech signal from the noisy speech. In this, the speech signal is enhanced by varying the step size as the function of the input signal. Objective and subjective measures are made under various noises for the proposed and existing algorithms. From the experimental results, it is seen that the proposed LMS adaptive noise reduction algorithm reduces Mean square Error (MSE) and Log Spectral Distance (LSD) as compared to that of the earlier methods under various noise conditions with different input SNR levels. In addition, the proposed algorithm increases the Peak Signal to Noise Ratio (PSNR) and Segmental SNR improvement (ΔSNRseg) values; improves the Mean Opinion Score (MOS) as compared to that of the various existing LMS adaptive noise reduction algorithms. From these experimental results, it is observed that the proposed LMS adaptive noise reduction algorithm reduces the speech distortion and residual noise as compared to that of the existing methods.

Keywords: LMS, speech enhancement, speech quality, residual noise.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2764
4315 Speech Intelligibility Improvement Using Variable Level Decomposition DWT

Authors: Samba Raju, Chiluveru, Manoj Tripathy

Abstract:

Intelligibility is an essential characteristic of a speech signal, which is used to help in the understanding of information in speech signal. Background noise in the environment can deteriorate the intelligibility of a recorded speech. In this paper, we presented a simple variance subtracted - variable level discrete wavelet transform, which improve the intelligibility of speech. The proposed algorithm does not require an explicit estimation of noise, i.e., prior knowledge of the noise; hence, it is easy to implement, and it reduces the computational burden. The proposed algorithm decides a separate decomposition level for each frame based on signal dominant and dominant noise criteria. The performance of the proposed algorithm is evaluated with speech intelligibility measure (STOI), and results obtained are compared with Universal Discrete Wavelet Transform (DWT) thresholding and Minimum Mean Square Error (MMSE) methods. The experimental results revealed that the proposed scheme outperformed competing methods

Keywords: Discrete Wavelet Transform, speech intelligibility, STOI, standard deviation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 639
4314 Various Speech Processing Techniques For Speech Compression And Recognition

Authors: Jalal Karam

Abstract:

Years of extensive research in the field of speech processing for compression and recognition in the last five decades, resulted in a severe competition among the various methods and paradigms introduced. In this paper we include the different representations of speech in the time-frequency and time-scale domains for the purpose of compression and recognition. The examination of these representations in a variety of related work is accomplished. In particular, we emphasize methods related to Fourier analysis paradigms and wavelet based ones along with the advantages and disadvantages of both approaches.

Keywords: Time-Scale, Wavelets, Time-Frequency, Compression, Recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2291
4313 High Quality Speech Coding using Combined Parametric and Perceptual Modules

Authors: M. Kulesza, G. Szwoch, A. Czyżewski

Abstract:

A novel approach to speech coding using the hybrid architecture is presented. Advantages of parametric and perceptual coding methods are utilized together in order to create a speech coding algorithm assuring better signal quality than in traditional CELP parametric codec. Two approaches are discussed. One is based on selection of voiced signal components that are encoded using parametric algorithm, unvoiced components that are encoded perceptually and transients that remain unencoded. The second approach uses perceptual encoding of the residual signal in CELP codec. The algorithm applied for precise transient selection is described. Signal quality achieved using the proposed hybrid codec is compared to quality of some standard speech codecs.

Keywords: CELP residual coding, hybrid codec architecture, perceptual speech coding, speech codecs comparison.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1478
4312 Effect of Cooling Coherent Nozzle Orientation on the Machinability of Ti-6Al-4V in Step Shoulder Milling

Authors: Salah Gariani, Islam Shyha, Osama Elgadi, Khaled Jegandi

Abstract:

In this work, a cooling coherent round nozzle was developed and the impact of nozzle placement (i.e. nozzle angle and stand-off/impinging distance) on the machinability of Ti-6Al-4V was evaluated. Key process measures were cutting force, workpiece temperature, tool wear, burr formation and average surface roughness (Ra). Experimental results showed that nozzle position at a 15° angle in the feed direction and 45°/60° against feed direction assisted in minimising workpiece temperature. A stand-off distance of 55 and 75 mm is also necessary to control burr formation, workpiece temperature and Ra, but coherent nozzle orientation has no statistically significant impact on the mean values of cutting force and tool wear. It can be concluded that stand-off distance is more substantially significant than nozzle angles when step shoulder milling Ti-6Al- 4V using vegetable oil-based cutting fluid.

Keywords: Coherent round nozzle, step shoulder milling, Ti-6Al-4V, vegetable oil-based cutting fluid.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 356
4311 Design of Coherent Thermal Emission Source by Excitation of Magnetic Polaritons between Metallic Gratings and an Opaque Metallic Film

Authors: Samah G. Babiker, Yong Shuai, Mohamed Osman Sid-Ahmed, Ming Xie, Mu Lei

Abstract:

The present paper studies a structure consisting of a periodic metallic grating, coated on a dielectric spacer atop an opaque metal substrate, using coherent thermal emission source in the infrared region. It has been theoretically demonstrated that by exciting surface magnetic polaritons between metallic gratings and an opaque metallic film, separated by a dielectric spacer, large emissivity peaks are almost independent of the emission angle and they can be achieved at the resonance frequencies. The reflectance spectrum of the proposed structure shows two resonances dip, which leads to a sharp emissivity peak. The relations of the reflection and absorption properties and the influence of geometric parameters on the radiative properties are investigated by rigorous coupled-wave analysis (RCWA). The proposed structure can be easily constructed, using micro/nanofabrication and can be used as the coherent thermal emission source.

Keywords: Coherent thermal emission, Polartons, Reflectance, Resonance frequency, Rigorous coupled wave analysis (RCWA).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2129
4310 Speech Coding and Recognition

Authors: M. Satya Sai Ram, P. Siddaiah, M. Madhavi Latha

Abstract:

This paper investigates the performance of a speech recognizer in an interactive voice response system for various coded speech signals, coded by using a vector quantization technique namely Multi Switched Split Vector Quantization Technique. The process of recognizing the coded output can be used in Voice banking application. The recognition technique used for the recognition of the coded speech signals is the Hidden Markov Model technique. The spectral distortion performance, computational complexity, and memory requirements of Multi Switched Split Vector Quantization Technique and the performance of the speech recognizer at various bit rates have been computed. From results it is found that the speech recognizer is showing better performance at 24 bits/frame and it is found that the percentage of recognition is being varied from 100% to 93.33% for various bit rates.

Keywords: Linear predictive coding, Speech Recognition, Voice banking, Multi Switched Split Vector Quantization, Hidden Markov Model, Linear Predictive Coefficients.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1788
4309 Slovenian Text-to-Speech Synthesis for Speech User Interfaces

Authors: Jerneja Žganec Gros, Aleš Mihelič, Nikola Pavešić, Mario Žganec, Stanislav Gruden

Abstract:

The paper presents the design concept of a unitselection text-to-speech synthesis system for the Slovenian language. Due to its modular and upgradable architecture, the system can be used in a variety of speech user interface applications, ranging from server carrier-grade voice portal applications, desktop user interfaces to specialized embedded devices. Since memory and processing power requirements are important factors for a possible implementation in embedded devices, lexica and speech corpora need to be reduced. We describe a simple and efficient implementation of a greedy subset selection algorithm that extracts a compact subset of high coverage text sentences. The experiment on a reference text corpus showed that the subset selection algorithm produced a compact sentence subset with a small redundancy. The adequacy of the spoken output was evaluated by several subjective tests as they are recommended by the International Telecommunication Union ITU.

Keywords: text-to-speech synthesis, prosody modeling, speech user interface.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1401
4308 Application of Smooth Ergodic Hidden Markov Model in Text to Speech Systems

Authors: Armin Ghayoori, Faramarz Hendessi, Asrar Sheikh

Abstract:

In developing a text-to-speech system, it is well known that the accuracy of information extracted from a text is crucial to produce high quality synthesized speech. In this paper, a new scheme for converting text into its equivalent phonetic spelling is introduced and developed. This method is applicable to many applications in text to speech converting systems and has many advantages over other methods. The proposed method can also complement the other methods with a purpose of improving their performance. The proposed method is a probabilistic model and is based on Smooth Ergodic Hidden Markov Model. This model can be considered as an extension to HMM. The proposed method is applied to Persian language and its accuracy in converting text to speech phonetics is evaluated using simulations.

Keywords: Hidden Markov Models, text, synthesis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1509
4307 Environmental Interference Cancellation of Speech with the Radial Basis Function Networks: An Experimental Comparison

Authors: Nima Hatami

Abstract:

In this paper, we use Radial Basis Function Networks (RBFN) for solving the problem of environmental interference cancellation of speech signal. We show that the Second Order Thin- Plate Spline (SOTPS) kernel cancels the interferences effectively. For make comparison, we test our experiments on two conventional most used RBFN kernels: the Gaussian and First order TPS (FOTPS) basis functions. The speech signals used here were taken from the OGI Multi-Language Telephone Speech Corpus database and were corrupted with six type of environmental noise from NOISEX-92 database. Experimental results show that the SOTPS kernel can considerably outperform the Gaussian and FOTPS functions on speech interference cancellation problem.

Keywords: Environmental interference, interference cancellation of speech, Radial Basis Function networks, Gaussian and TPS kernels.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1520
4306 A Tool for Audio Quality Evaluation Under Hostile Environment

Authors: Akhil Kumar Arya, Jagdeep Singh Lather, Lillie Dewan

Abstract:

In this paper is to evaluate audio and speech quality with the help of Digital Audio Watermarking Technique under the different types of attacks (signal impairments) like Gaussian Noise, Compression Error and Jittering Effect. Further attacks are considered as Hostile Environment. Audio and Speech Quality Evaluation is an important research topic. The traditional way for speech quality evaluation is using subjective tests. They are reliable, but very expensive, time consuming, and cannot be used in certain applications such as online monitoring. Objective models, based on human perception, were developed to predict the results of subjective tests. The existing objective methods require either the original speech or complicated computation model, which makes some applications of quality evaluation impossible.

Keywords: Digital Watermarking, DCT, Speech Quality, Attacks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1581
4305 Recognition of Isolated Speech Signals using Simplified Statistical Parameters

Authors: Abhijit Mitra, Bhargav Kumar Mitra, Biswajoy Chatterjee

Abstract:

We present a novel scheme to recognize isolated speech signals using certain statistical parameters derived from those signals. The determination of the statistical estimates is based on extracted signal information rather than the original signal information in order to reduce the computational complexity. Subtle details of these estimates, after extracting the speech signal from ambience noise, are first exploited to segregate the polysyllabic words from the monosyllabic ones. Precise recognition of each distinct word is then carried out by analyzing the histogram, obtained from these information.

Keywords: Isolated speech signals, Block overlapping technique, Positive peaks, Histogram analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1377