Search results for: Speech Quality
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3150

3120 Possibilities, Challenges and the State of the Art of Automatic Speech Recognition in Air Traffic Control

Authors: Van Nhan Nguyen, Harald Holone

Abstract:

Over the past few years, a lot of research has been conducted to bring Automatic Speech Recognition (ASR) into various areas of Air Traffic Control (ATC), such as air traffic control simulation and training, monitoring live operators with the aim of improving safety, measuring air traffic controller workload, and analyzing large quantities of controller-pilot speech. Due to the high accuracy requirements of the ATC context and its unique challenges, automatic speech recognition has not been widely adopted in this field. With the aim of providing a good starting point for researchers who are interested in bringing automatic speech recognition into ATC, this paper gives an overview of the possibilities and challenges of applying automatic speech recognition in air traffic control. To provide this overview, we present an updated literature review of speech recognition technologies in general, as well as specific approaches relevant to the ATC context. Based on this literature review, criteria for selecting speech recognition approaches for the ATC domain are presented, and remaining challenges and possible solutions are discussed.

Keywords: Automatic Speech Recognition, ASR, Air Traffic Control, ATC.

Downloads: 3984
3119 Performance Evaluation of Acoustic-Spectrographic Voice Identification Method in Native and Non-Native Speech

Authors: E. Krasnova, E. Bulgakova, V. Shchemelinin

Abstract:

The paper deals with the acoustic-spectrographic voice identification method in terms of its performance on non-native language speech. Performance evaluation is conducted by comparing the results of the analysis of recordings containing native language speech with those of recordings that contain foreign language speech. Our research is based on Tajik and Russian speech of Tajik native speakers, owing to the character of the criminal situation with drug trafficking. We propose a pilot experiment that represents a first attempt to enter the field.

Keywords: Speaker identification, acoustic-spectrographic method, non-native speech.

Downloads: 825
3118 Minimum Data of a Speech Signal as Special Indicators of Identification in Phonoscopy

Authors: Nazaket Gazieva

Abstract:

Voice biometric data associated with physiological, psychological and other factors are widely used in forensic phonoscopy. There are various methods for identifying and verifying a person by voice. This article explores the minimum speech signal data as individual parameters of a speech signal. Monozygotic twins are believed to be genetically identical. Using the minimum data of the speech signal, we came to the conclusion that the voice imprint of monozygotic twins is individual. From the experiment, we conclude that the minimum indicators of the speech signal are more stable and reliable for phonoscopic examinations.

Keywords: Biometric voice prints, fundamental frequency, phonogram, speech signal, temporal characteristics.

Downloads: 496
3117 From Maskee to Audible Noise in Perceptual Speech Enhancement

Authors: Asmaa Amehraye, Dominique Pastor, Ahmed Tamtaoui, Driss Aboutajdine

Abstract:

A new analysis of perceptual speech enhancement is presented. It focuses on the fact that if only noise above the masking threshold is filtered, then noise below the masking threshold, but above the absolute threshold of hearing, can become audible after the masker filtering. This particular drawback of some perceptual filters, hereafter called the maskee-to-audible-noise (MAN) phenomenon, favours the emergence of isolated tonals that increase musical noise. Two filtering techniques that avoid or correct the MAN phenomenon are proposed to effectively suppress background noise without introducing much distortion. Experimental results, including objective and subjective measurements, show that these techniques improve the enhanced speech quality and the gain they bring emphasizes the importance of the MAN phenomenon.

Keywords: Perceptual speech filtering, maskee to audible noise, distortion, musical noise.

Downloads: 1450
3116 The Capacity of Mel Frequency Cepstral Coefficients for Speech Recognition

Authors: Fawaz S. Al-Anzi, Dia AbuZeina

Abstract:

Speech recognition makes an important contribution to promoting new technologies in human-computer interaction. Today, there is a growing need to employ speech technology in daily life and business activities. However, speech recognition is a challenging task that requires several stages before the desired output is obtained. Among the components of automatic speech recognition (ASR) is the feature extraction process, which parameterizes the speech signal to produce the corresponding feature vectors. The feature extraction process aims at approximating the linguistic content conveyed by the input speech signal. In the speech processing field, there are several methods for extracting speech features; however, Mel Frequency Cepstral Coefficients (MFCC) is the most popular technique. It has long been observed that MFCC is dominantly used in well-known recognizers such as the Carnegie Mellon University (CMU) Sphinx and the Hidden Markov Model Toolkit (HTK). Hence, this paper focuses on the MFCC method as the standard choice for identifying the different speech segments in order to obtain the language phonemes for further training and decoding steps. Owing to the good performance of MFCC, previous studies show that it dominates Arabic ASR research. In this paper, we demonstrate MFCC as well as the intermediate steps that are performed to obtain these coefficients using the HTK toolkit.
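
As a rough illustration of one intermediate MFCC step, the sketch below converts between hertz and the mel scale and spaces filter centres evenly in mel. It uses the common 2595 * log10(1 + f/700) mapping; the function names and the filter count are illustrative assumptions and are not taken from the paper.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to the mel scale (common 2595/700 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centres(n_filters, f_low, f_high):
    """Centre frequencies (Hz) of n_filters filters spaced evenly in mel.

    n_filters, f_low and f_high are hypothetical parameters chosen by the caller.
    """
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    # Example: 10 filters between 0 Hz and 4 kHz; spacing widens with frequency.
    for centre in mel_filter_centres(10, 0.0, 4000.0):
        print(round(centre, 1))
```

Because the centres are equally spaced in mel, they are packed densely at low frequencies and sparsely at high frequencies, which is the perceptual motivation for the mel filter bank.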

Keywords: Speech recognition, acoustic features, Mel Frequency Cepstral Coefficients.

Downloads: 1922
3115 Voice Features as the Diagnostic Marker of Autism

Authors: Elena Lyakso, Olga Frolova, Yuri Matveev

Abstract:

The aim of the study is to determine the acoustic features of voice and speech of children with autism spectrum disorders (ASD) as a possible additional diagnostic criterion. The participants in the study were 95 children with ASD aged 5-16 years, 150 typically developing (TD) children, and 103 adults who listened to children’s speech samples. Three types of experimental methods for speech analysis were used: spectrographic, perceptual by listeners, and automatic recognition. In the speech of children with ASD, the pitch values, pitch range, and values of frequency and intensity of the third formant (emotional), which lead to the “atypical” spectrogram of vowels, are higher than the corresponding parameters in the speech of TD children. High values of the vowel articulation index (VAI) are specific to ASD children’s speech signals. These acoustic features can be considered a diagnostic marker of autism. The ability of human listeners and of automatic recognition to identify the psychoneurological state of children from their speech is determined.

Keywords: Autism spectrum disorders, biomarker of autism, child speech, voice features.

Downloads: 541
3114 A Sparse Representation Speech Denoising Method Based on Adapted Stopping Residue Error

Authors: Qianhua He, Weili Zhou, Aiwu Chen

Abstract:

A sparse representation speech denoising method based on adapted stopping residue error is presented in this paper. Firstly, the cross-correlation between the clean speech spectrum and the noise spectrum was analyzed, and an estimation method was proposed. In the denoising method, an over-complete dictionary of the clean speech power spectrum was learned with the K-singular value decomposition (K-SVD) algorithm. In the sparse representation stage, the stopping residue error was adaptively achieved according to the estimated cross-correlation and the adjusted noise spectrum, and the orthogonal matching pursuit (OMP) approach was applied to reconstruct the clean speech spectrum from the noisy speech. Finally, the clean speech was re-synthesised via the inverse Fourier transform with the reconstructed speech spectrum and the noisy speech phase. The experimental results show that the proposed method outperforms the conventional methods in terms of subjective and objective measures.

Keywords: Speech denoising, sparse representation, K-singular value decomposition, orthogonal matching pursuit.

Downloads: 967
3113 Eisenhower’s Farewell Speech: Initial and Continuing Communication Effects

Authors: B. Kuiper

Abstract:

When Dwight D. Eisenhower delivered his final Presidential speech in 1961, he was using the opportunity to bid farewell to America, but he was also trying to warn his fellow countrymen about deeper challenges threatening the country. In this analysis, Eisenhower’s speech is examined in light of the impact it had on American culture, communication concepts, and political ramifications. The paper initially highlights the previous literature on the speech, especially in light of its 50th anniversary, and reveals a man whose main concern was how the speech’s words would affect his beloved country. The painstaking approach to the wording of the speech to reveal the intent is key, particularly in light of analyzing the motivations according to “virtuous communication.” This philosophical construct indicates that Eisenhower’s Farewell Address was crafted carefully according to a departing President’s deepest values and concerns, concepts that he wanted to pass along to his successor, to his country, and even to the world.

Keywords: Eisenhower, mass communication, political speech, rhetoric.

Downloads: 1817
3112 Hybrid Modeling Algorithm for Continuous Tamil Speech Recognition

Authors: M. Kalamani, S. Valarmathy, M. Krishnamoorthi

Abstract:

In this paper, a hybrid modeling algorithm based on Fuzzy C-Means clustering with an Expectation Maximization-Gaussian Mixture Model is proposed for Continuous Tamil Speech Recognition. Speech sentences from various speakers are used for the training and testing phases, and objective measures are compared between the proposed and existing Continuous Speech Recognition algorithms. From the simulated results, it is observed that the proposed algorithm improves the recognition accuracy and F-measure by up to 3% compared to the existing algorithms for speech signals from various speakers. In addition, it reduces the Word Error Rate, Error Rate and Error by up to 4% compared to the existing algorithms. In all aspects, the proposed hybrid modeling for Tamil speech recognition provides significant improvements for speech-to-text conversion in various applications.

Keywords: Speech Segmentation, Feature Extraction, Clustering, HMM, EM-GMM, CSR.

Downloads: 2093
3111 Neural Network Based Speech to Text in Malay Language

Authors: H. F. A. Abdul Ghani, R. R. Porle

Abstract:

Speech to text in the Malay language is a system that converts Malay speech into text. Malay language recognition systems are still limited; thus, this paper investigates recognition performance on ten Malay words obtained from online Malay news. The methodology consists of three stages: preprocessing, feature extraction, and speech classification. In the preprocessing stage, the speech samples are filtered using pre-emphasis. After that, feature extraction is applied to the samples using Mel Frequency Cepstrum Coefficients (MFCC). Lastly, speech classification is performed using a Feedforward Neural Network (FFNN). The accuracy of the classification is further investigated as a function of the hidden layer size. In the experiments, the classifier with 40 hidden neurons shows the highest classification rate, which is 94%.
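
The pre-emphasis step mentioned in the abstract is commonly implemented as a first-order high-pass filter, y[n] = x[n] - alpha * x[n-1]. The sketch below is a minimal illustration under that assumption; the coefficient 0.97 is a typical default and is not reported in the abstract.

```python
def pre_emphasis(samples, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample is passed through.

    alpha=0.97 is an assumed, commonly used value, not taken from the paper.
    """
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


if __name__ == "__main__":
    # A constant (low-frequency) signal is strongly attenuated after the first sample,
    # while a rapidly alternating (high-frequency) signal is boosted.
    print(pre_emphasis([1.0, 1.0, 1.0, 1.0]))
    print(pre_emphasis([1.0, -1.0, 1.0, -1.0]))
```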

Keywords: Feed-Forward Neural Network, FFNN, Malay speech recognition, Mel Frequency Cepstrum Coefficient, MFCC, speech-to-text.

Downloads: 667
3110 On the Effectivity of Different Pseudo-Noise and Orthogonal Sequences for Speech Encryption from Correlation Properties

Authors: V. Anil Kumar, Abhijit Mitra, S. R. Mahadeva Prasanna

Abstract:

We analyze the effectivity of different pseudo noise (PN) and orthogonal sequences for encrypting speech signals in terms of perceptual intelligence. A speech signal can be viewed as a sequence of correlated samples, and each sample as a sequence of bits. The residual intelligibility of the speech signal can be reduced by removing the correlation among the speech samples. PN sequences have random-like properties that help in reducing the correlation among speech samples. The mean square aperiodic auto-correlation (MSAAC) and the mean square aperiodic cross-correlation (MSACC) measures are used to test the randomness of the PN sequences. Results of the investigation show the effectivity of large Kasami sequences for this purpose among many PN sequences.
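
To make the MSAAC measure concrete, the sketch below computes it for a bipolar (+1/-1) sequence. It assumes the commonly used definition in which the aperiodic autocorrelation is normalized by the sequence length N and the squared values are summed over all non-zero lags from 1-N to N-1; the abstract does not give the exact formula, so this normalization is an assumption.

```python
def aperiodic_autocorrelation(seq, lag):
    """Aperiodic autocorrelation of a +/-1 sequence at the given lag, normalized by N."""
    n = len(seq)
    lag = abs(lag)
    return sum(seq[i] * seq[i + lag] for i in range(n - lag)) / n


def msaac(seq):
    """Mean square aperiodic auto-correlation: sum of squared values over non-zero lags."""
    n = len(seq)
    return sum(
        aperiodic_autocorrelation(seq, lag) ** 2
        for lag in range(1 - n, n)
        if lag != 0
    )


# Length-13 Barker code, one of the sequence families named in the keywords.
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]

if __name__ == "__main__":
    print(msaac(BARKER_13))   # small: the sidelobes are at most 1/13 in magnitude
    print(msaac([1] * 13))    # large: a constant sequence is highly self-correlated
```

A lower MSAAC indicates smaller autocorrelation sidelobes, which is the sense in which a sequence looks "more random" under this measure.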

Keywords: Speech encryption, pseudo-noise codes, maximal-length, Gold, Barker, Kasami, Walsh-Hadamard, autocorrelation, crosscorrelation, figure of merit.

Downloads: 1999
3109 A Modified Speech Enhancement Using Adaptive Gain Equalizer with Non linear Spectral Subtraction for Robust Speech Recognition

Authors: C. Ganesh Babu, P. T. Vanathi

Abstract:

In this paper we present an enhanced noise reduction method for robust speech recognition using an Adaptive Gain Equalizer with Non linear Spectral Subtraction. In the Adaptive Gain Equalizer (AGE) method, the input signal is divided into a number of subbands that are individually weighted in the time domain according to the short-time Signal-to-Noise Ratio (SNR) estimated in each subband at every time instant. Instead of focusing on suppressing the noise, the method focuses on enhancing the speech. When analysis was done under various noise conditions for speech recognition, it was found that the AGE algorithm has an obvious failing point at an SNR of -5 dB, with inadequate levels of noise suppression for SNRs below this point. This work proposes coupling AGE with Non linear Spectral Subtraction (AGE-NSS) for robust speech recognition. The experimental results show that AGE-NSS outperforms AGE when the SNR drops below -5 dB.

Keywords: Adaptive Gain Equalizer, Non Linear Spectral Subtraction, Speech Enhancement, and Speech Recognition.

Downloads: 1662
3108 Speech Acts and Politeness Strategies in an EFL Classroom in Georgia

Authors: Tinatin Kurdghelashvili

Abstract:

The paper deals with the usage of speech acts and politeness strategies in an EFL classroom in Georgia (Rep of). It explores the students’ and the teachers’ practice of the politeness strategies and the speech acts of apology, thanking, request, compliment / encouragement, command, agreeing / disagreeing, addressing and code switching. The research method includes observation as well as a questionnaire. The target group involves students from Georgian public schools and two certified, experienced local English teachers. The analysis is based on Searle’s Speech Act Theory and Brown and Levinson’s politeness strategies. The findings show that the students have certain knowledge regarding politeness, yet they fail to apply it in English communication. In addition, most of the speech acts in the classroom interaction are used by the teachers and not the students. It is therefore suggested that teachers should cultivate the students’ communicative competence and attempt to give them opportunities to practise more English speech acts than they do today.

Keywords: English as a foreign language, Georgia, politeness principles, speech acts.

Downloads: 6112
3107 Voice Driven Applications in Non-stationary and Chaotic Environment

Authors: C. Kwan, X. Li, D. Lao, Y. Deng, Z. Ren, B. Raj, R. Singh, R. Stern

Abstract:

Automated operations based on voice commands will become more and more important in many applications, including robotics, maintenance operations, etc. However, voice command recognition rates drop quite a lot under non-stationary and chaotic noise environments. In this paper, we tried to significantly improve the speech recognition rates under non-stationary noise environments. First, 298 Navy acronyms have been selected for automatic speech recognition. Data sets were collected under 4 types of noisy environments: factory, buccaneer jet, babble noise in a canteen, and destroyer. Within each noisy environment, 4 levels (5 dB, 15 dB, 25 dB, and clean) of Signal-to-Noise Ratio (SNR) were introduced to corrupt the speech. Second, a new algorithm to estimate speech or no speech regions has been developed, implemented, and evaluated. Third, extensive simulations were carried out. It was found that the combination of the new algorithm, the proper selection of language model and a customized training of the speech recognizer based on clean speech yielded very high recognition rates, which are between 80% and 90% for the four different noisy conditions. Fourth, extensive comparative studies have also been carried out.
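
Corrupting clean speech at a chosen SNR, as described above, amounts to scaling the noise so that the ratio of speech power to noise power matches the target. The sketch below shows that arithmetic on plain Python lists; the function names and the synthetic signals are illustrative assumptions, not the authors' data or code.

```python
import math


def mean_power(x):
    """Average power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Return speech + g * noise, with g chosen so that
    10 * log10(P_speech / P_scaled_noise) equals snr_db.
    Both inputs are assumed to have the same length and non-zero power.
    """
    target_ratio = 10.0 ** (snr_db / 10.0)
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * target_ratio))
    return [s + gain * n for s, n in zip(speech, noise)]


if __name__ == "__main__":
    # Synthetic stand-ins for a speech recording and a noise recording.
    speech = [math.sin(0.1 * i) for i in range(1000)]
    noise = [0.3 * (-1) ** i for i in range(1000)]
    for snr in (5.0, 15.0, 25.0):
        mixed = mix_at_snr(speech, noise, snr)
        residual = [m - s for m, s in zip(mixed, speech)]
        achieved = 10.0 * math.log10(mean_power(speech) / mean_power(residual))
        print(snr, round(achieved, 3))
```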

Keywords: Non-stationary, speech recognition, voice commands.

Downloads: 1490
3106 A Semi- One Time Pad Using Blind Source Separation for Speech Encryption

Authors: Long Jye Sheu, Horng-Shing Chiou, Wei Ching Chen

Abstract:

We propose a new perspective on speech communication using blind source separation. The original speech is mixed with key signals which consist of the mixing matrix, chaotic signals and a random noise. However, parts of the keys (the mixing matrix and the random noise) are not necessary in decryption. In a practical implementation, one can encrypt the speech by changing the noise signal every time. Hence, the present scheme obtains the advantages of a One Time Pad encryption while avoiding its drawbacks in key exchange. It is demonstrated that the proposed scheme is immune to traditional attacks.

Keywords: one time pad, blind source separation, independent component analysis, speech encryption.

Downloads: 1521
3105 Advances in Artificial Intelligence Using Speech Recognition

Authors: Khaled M. Alhawiti

Abstract:

This research study aims to present a retrospective study of speech recognition systems and artificial intelligence. Speech recognition has become one of the most widely used technologies, as it offers a great opportunity to interact and communicate with automated machines. It helps its users perform their daily routine tasks in a more convenient and effective manner. This research intends to illustrate recent technological advancements associated with artificial intelligence. Recent research has revealed that speech recognition is the foremost issue affecting the decoding of speech. To overcome these issues, researchers have developed different statistical models. Some of the most prominent include the acoustic model (AM), language model (LM), lexicon model, and hidden Markov models (HMM). The research will help in understanding all of these statistical models of speech recognition. Researchers have also formulated different decoding methods, which are used for realistic decoding tasks and constrained artificial languages. These decoding methods include pattern recognition, acoustic phonetic, and artificial intelligence. It has been recognized that artificial intelligence is the most efficient and reliable of the methods used in speech recognition.

Keywords: Speech recognition, acoustic phonetic, artificial intelligence, Hidden Markov Models (HMM), statistical models of speech recognition, human machine performance.

Downloads: 7898
3104 SMaTTS: Standard Malay Text to Speech System

Authors: Othman O. Khalifa, Zakiah Hanim Ahmad, Teddy Surya Gunawan

Abstract:

This paper presents a rule-based text-to-speech (TTS) synthesis system for Standard Malay (SM), namely SMaTTS. The proposed system uses a sinusoidal method and some pre-recorded wave files to generate speech. The use of a phone database significantly decreases the amount of computer memory used, thus making the system very light and embeddable. The overall system comprises two phases. The first is Natural Language Processing (NLP), which consists of the high-level processing of text analysis, phonetic analysis, text normalization and a morphophonemic module. The module was designed specially for SM to overcome a few problems in defining the rules for the SM orthography system before the text can be passed to the DSP module. The second phase is Digital Signal Processing (DSP), which operates on the low-level process of speech waveform generation. An intelligible and adequately natural-sounding formant-based speech synthesis system with a light and user-friendly Graphical User Interface (GUI) is introduced. An SM phoneme set and an inclusive phone database have been constructed carefully for this phone-based speech synthesizer. By applying generative phonology, comprehensive letter-to-sound (LTS) rules and a pronunciation lexicon have been developed for SMaTTS. For the evaluation, a Diagnostic Rhyme Test (DRT) word list was compiled and several experiments were performed to evaluate the quality of the synthesized speech by analyzing the Mean Opinion Score (MOS) obtained. The overall performance of the system as well as the room for improvement is thoroughly discussed.

Keywords: Natural Language Processing, Text-To-Speech (TTS), Diphone, source filter, low-/high-level synthesis.

Downloads: 1930
3103 Speech Intelligibility Improvement Using Variable Level Decomposition DWT

Authors: Samba Raju Chiluveru, Manoj Tripathy

Abstract:

Intelligibility is an essential characteristic of a speech signal, which helps in understanding the information in the signal. Background noise in the environment can deteriorate the intelligibility of recorded speech. In this paper, we present a simple variance-subtracted, variable-level discrete wavelet transform, which improves the intelligibility of speech. The proposed algorithm does not require an explicit estimation of noise, i.e., prior knowledge of the noise; hence, it is easy to implement, and it reduces the computational burden. The proposed algorithm decides a separate decomposition level for each frame based on signal-dominant and noise-dominant criteria. The performance of the proposed algorithm is evaluated with the short-time objective intelligibility (STOI) measure, and the results obtained are compared with Universal Discrete Wavelet Transform (DWT) thresholding and Minimum Mean Square Error (MMSE) methods. The experimental results revealed that the proposed scheme outperformed competing methods.

Keywords: Discrete Wavelet Transform, speech intelligibility, STOI, standard deviation.

Downloads: 638
3102 Various Speech Processing Techniques For Speech Compression And Recognition

Authors: Jalal Karam

Abstract:

Extensive research in speech processing for compression and recognition over the last five decades has resulted in intense competition among the various methods and paradigms introduced. In this paper we cover the different representations of speech in the time-frequency and time-scale domains for the purpose of compression and recognition. These representations are examined across a variety of related work. In particular, we emphasize methods related to Fourier analysis paradigms and wavelet-based ones, along with the advantages and disadvantages of both approaches.

Keywords: Time-Scale, Wavelets, Time-Frequency, Compression, Recognition.

Downloads: 2290
3101 A Two-Stage Adaptation towards Automatic Speech Recognition System for Malay-Speaking Children

Authors: Mumtaz Begum Mustafa, Siti Salwah Salim, Feizal Dani Rahman

Abstract:

Recently, Automatic Speech Recognition (ASR) systems have been used to assist children in language acquisition, as they can detect human speech signals. Despite the benefits offered by ASR systems, there is a lack of ASR systems for Malay-speaking children. One of the contributing factors is the lack of a continuous speech database for the target users. Though cross-lingual adaptation is a common solution for developing ASR systems for under-resourced languages, it is not viable for children as there are very few speech databases that can serve as a source model. In this research, we propose a two-stage adaptation for the development of an ASR system for Malay-speaking children using a very limited database. The two-stage adaptation comprises cross-lingual adaptation (first stage) and cross-age adaptation (second stage). In the first stage, a well-known speech database that is phonetically rich and balanced is adapted to a medium-sized database of Malay adult speech using supervised MLLR. The second stage uses the acoustic model generated by the first adaptation, and the target database is a small database of the target users. We measured the performance of the proposed technique using word error rate and compared it with the conventional benchmark adaptation. The two-stage adaptation proposed in this research achieves better recognition accuracy than the benchmark adaptation in recognizing children’s speech.
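
Word error rate, the metric used in this evaluation, is conventionally the word-level edit distance (substitutions, deletions and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition; the example sentences are made up and do not come from the paper.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Since insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.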

Keywords: Automatic speech recognition system, children speech, adaptation, Malay.

Downloads: 1712
3100 Speech Coding and Recognition

Authors: M. Satya Sai Ram, P. Siddaiah, M. Madhavi Latha

Abstract:

This paper investigates the performance of a speech recognizer in an interactive voice response system for various coded speech signals, coded using a vector quantization technique, namely the Multi Switched Split Vector Quantization technique. Recognition of the coded output can be used in voice banking applications. The coded speech signals are recognized using the Hidden Markov Model technique. The spectral distortion performance, computational complexity, and memory requirements of Multi Switched Split Vector Quantization and the performance of the speech recognizer at various bit rates have been computed. The results show that the speech recognizer performs better at 24 bits/frame and that the recognition rate varies from 100% to 93.33% across the bit rates.

Keywords: Linear predictive coding, Speech Recognition, Voice banking, Multi Switched Split Vector Quantization, Hidden Markov Model, Linear Predictive Coefficients.

Downloads: 1787
3099 Slovenian Text-to-Speech Synthesis for Speech User Interfaces

Authors: Jerneja Žganec Gros, Aleš Mihelič, Nikola Pavešić, Mario Žganec, Stanislav Gruden

Abstract:

The paper presents the design concept of a unit-selection text-to-speech synthesis system for the Slovenian language. Due to its modular and upgradable architecture, the system can be used in a variety of speech user interface applications, ranging from carrier-grade server voice portal applications and desktop user interfaces to specialized embedded devices. Since memory and processing power requirements are important factors for a possible implementation in embedded devices, lexica and speech corpora need to be reduced. We describe a simple and efficient implementation of a greedy subset selection algorithm that extracts a compact subset of high-coverage text sentences. The experiment on a reference text corpus showed that the subset selection algorithm produced a compact sentence subset with small redundancy. The adequacy of the spoken output was evaluated by several subjective tests recommended by the International Telecommunication Union (ITU).
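
A greedy subset selection of the kind described can be sketched as follows: at each step, pick the sentence that adds the most not-yet-covered units. The abstract does not state which units are covered or how ties are broken, so this sketch uses character bigrams as a hypothetical stand-in for phonetic units and keeps the first sentence on ties.

```python
def units(sentence):
    """Units contributed by a sentence; character bigrams are an assumed
    stand-in for the phonetic units a real system would use."""
    text = sentence.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def greedy_select(sentences, max_sentences):
    """Repeatedly pick the sentence covering the most new units.

    Stops when max_sentences is reached or no sentence adds anything new.
    Returns the chosen sentences and the set of covered units.
    """
    covered = set()
    chosen = []
    remaining = list(sentences)
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=lambda s: len(units(s) - covered))
        gain = units(best) - covered
        if not gain:
            break  # every remaining sentence is fully redundant
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen, covered


if __name__ == "__main__":
    corpus = ["abcd", "abc", "cdef", "xyz"]
    selected, covered = greedy_select(corpus, max_sentences=10)
    print(selected)          # "abc" is never picked: "abcd" already covers it
    print(sorted(covered))
```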

Keywords: text-to-speech synthesis, prosody modeling, speech user interface.

Downloads: 1400
3098 Environmental Interference Cancellation of Speech with the Radial Basis Function Networks: An Experimental Comparison

Authors: Nima Hatami

Abstract:

In this paper, we use Radial Basis Function Networks (RBFN) to solve the problem of environmental interference cancellation of speech signals. We show that the Second Order Thin-Plate Spline (SOTPS) kernel cancels the interference effectively. For comparison, we also run our experiments with the two most commonly used conventional RBFN kernels: the Gaussian and First Order TPS (FOTPS) basis functions. The speech signals used here were taken from the OGI Multi-Language Telephone Speech Corpus database and were corrupted with six types of environmental noise from the NOISEX-92 database. Experimental results show that the SOTPS kernel can considerably outperform the Gaussian and FOTPS functions on the speech interference cancellation problem.

Keywords: Environmental interference, interference cancellation of speech, Radial Basis Function networks, Gaussian and TPS kernels.

Downloads: 1518
3097 Author's Approach to the Problem of Correctional Speech Therapy with Children Suffering from Alalia

Authors: E. V. Kutsina, S. A. Tarasova

Abstract:

In this article we present a methodology which enables preschool and primary school unlanguaged children to remember words, phrases and texts with the help of graphic signs: letters, syllables and words. Reading becomes a support for the child's speech development. Teaching is based on the principle "from simple to complex": "a letter - a syllable - a word - a sentence - a text." The availability of multi-level texts allows this methodology to be used with children who have different levels of speech development.

Keywords: Alalia, analytic-synthetic method, development of coherent speech, formation of vocabulary, learning to read, sentence formation, three-level stories, unlanguaged children.

Downloads: 1898
3096 Recognition of Isolated Speech Signals using Simplified Statistical Parameters

Authors: Abhijit Mitra, Bhargav Kumar Mitra, Biswajoy Chatterjee

Abstract:

We present a novel scheme to recognize isolated speech signals using certain statistical parameters derived from those signals. The determination of the statistical estimates is based on extracted signal information rather than the original signal information in order to reduce the computational complexity. Subtle details of these estimates, after extracting the speech signal from ambient noise, are first exploited to segregate the polysyllabic words from the monosyllabic ones. Precise recognition of each distinct word is then carried out by analyzing the histogram obtained from this information.

Keywords: Isolated speech signals, Block overlapping technique, Positive peaks, Histogram analysis.

Downloads 1376
3095 Efficient DTW-Based Speech Recognition System for Isolated Words of Arabic Language

Authors: Khalid A. Darabkh, Ala F. Khalifeh, Baraa A. Bathech, Saed W. Sabah

Abstract:

Although Arabic is currently one of the most widely spoken languages worldwide, there has been relatively little research on Arabic speech recognition compared with other languages such as English and Japanese. Generally, digital speech processing and voice recognition algorithms are of special importance for designing efficient, accurate, and fast automatic speech recognition systems. The speech recognition process carried out in this paper is divided into three stages. First, the signal is preprocessed to reduce noise effects; it is then digitized and hearingized, and the voice activity regions are segmented using a voice activity detection (VAD) algorithm. Second, features are extracted from the speech signal using the Mel-frequency cepstral coefficients (MFCC) algorithm, and delta and acceleration (delta-delta) coefficients are added to improve the recognition accuracy. Finally, each test word's features are compared to the training database using the dynamic time warping (DTW) algorithm. With the best setup found for all parameters affecting the aforementioned techniques, the proposed system achieved a recognition rate of about 98.5%, outperforming other HMM- and ANN-based approaches available in the literature.
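The final matching stage can be sketched as follows. This is a minimal illustration of classic dynamic time warping with a Euclidean frame distance and nearest-template classification, not the authors' implementation: the function names, the unconstrained step pattern and the template dictionary are assumptions, and the feature vectors would in practice be MFCC frames with delta coefficients rather than the toy values used here.

```python
import math

def frame_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best monotonic alignment of two sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat frame of seq_b
                                 cost[i][j - 1],      # repeat frame of seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def classify(test_seq, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(test_seq, templates[label]))
```

For example, `classify(features, {"word_a": template_a, "word_b": template_b})` returns the label of the closest stored word; the labels here are hypothetical.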

Keywords: Arabic speech recognition, MFCC, DTW, VAD.

Downloads 4030
3094 Detection of Clipped Fragments in Speech Signals

Authors: Sergei Aleinik, Yuri Matveev

Abstract:

In this paper, a novel method for the detection of clipping in speech signals is described. It is shown that the new method has better performance than known clipping detection methods, is easy to implement, and is robust to changes in signal amplitude, size of data, etc. Statistical simulation results are presented.

Keywords: Clipping, clipped signal, speech signal processing.

Downloads 2621
3093 A New Time-Frequency Speech Analysis Approach Based On Adaptive Fourier Decomposition

Authors: Liming Zhang

Abstract:

In this paper, a new adaptive Fourier decomposition (AFD) based time-frequency speech analysis approach is proposed. Because the fundamental frequency of speech signals often fluctuates, the classical short-time Fourier transform (STFT) based spectrogram analysis suffers from the difficulty of window size selection. AFD is a newly developed signal decomposition theory designed to deal with time-varying non-stationary signals. Its outstanding characteristic is that it provides an instantaneous frequency for each decomposed component, which makes the time-frequency analysis easier. Experiments are conducted on a sample sentence from the TIMIT Acoustic-Phonetic Continuous Speech Corpus. The results show that the AFD based time-frequency distribution outperforms the STFT based one.

Keywords: Adaptive Fourier decomposition, instantaneous frequency, speech analysis, time-frequency distribution.

Downloads 1682
3092 Speech Enhancement of Vowels Based on Pitch and Formant Frequency

Authors: R. Rishma Rodrigo, R. Radhika, M. Vanitha Lakshmi

Abstract:

Numerous signal processing based speech enhancement systems have been proposed to improve intelligibility in the presence of noise. Traditionally, studies of neural vowel encoding have focused on the representation of formants (peaks in vowel spectra) in the discharge patterns of the population of auditory-nerve (AN) fibers. A method is presented for recoding high-frequency speech components into a low-frequency region to increase audibility for listeners with hearing loss. The purpose of the paper is to enhance the formants of the speech based on the Kaiser window. The pitch and formants of the signal are estimated using the autocorrelation, zero crossing and magnitude difference functions. The formant enhancement stage aims to restore the representation of formants at the level of the midbrain. A low-complexity system is developed and implemented in MATLAB.
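Of the pitch cues named in this abstract, the autocorrelation one can be illustrated with a short sketch. The code below is written in Python rather than the MATLAB used by the authors, and it is only a generic textbook estimator: the function name, the 60 to 400 Hz search range and the use of an unnormalized autocorrelation are assumptions, not details taken from the paper.

```python
import math

def autocorrelation_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one frame from the lag
    with the largest autocorrelation inside an assumed pitch range."""
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    if lag_min < 1 or lag_min > lag_max:
        raise ValueError("frame too short for the requested pitch range")
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the frame at this lag
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

# Usage: a synthetic 200 Hz tone sampled at 8 kHz (40 ms frame)
tone = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(320)]
estimate = autocorrelation_pitch(tone, 8000)
```

Real speech frames are noisier than this synthetic tone, so practical systems usually add windowing, normalization or a voicing decision on top of such an estimator.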

Keywords: Formant estimation, formant enhancement, pitch detection, speech analysis.

Downloads 1586
3091 On Preprocessing of Speech Signals

Authors: Ayaz Keerio, Bhargav Kumar Mitra, Philip Birch, Rupert Young, Chris Chatwin

Abstract:

Preprocessing of speech signals is considered a crucial step in the development of a robust and efficient speech or speaker recognition system. In this paper, we present some popular statistical outlier-detection based strategies to segregate the silence/unvoiced part of the speech signal from the voiced portion. The proposed methods are based on the 3σ edit rule and the Hampel Identifier, and are compared with conventional techniques: (i) short-time energy (STE) based methods, and (ii) distribution based methods. The results obtained after applying the proposed strategies to some test voice signals are encouraging.
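The two outlier rules named in this abstract can be sketched in their standard statistical form. The abstract does not say how the rules are mapped onto silence and voiced segments, so the code below only shows the rules themselves on a generic list of values (for instance, hypothetical per-frame energies); the function names and the threshold factor `k = 3` are assumptions, and 1.4826 is the usual constant that makes the median absolute deviation comparable to a standard deviation for Gaussian data.

```python
import statistics

def three_sigma_flags(values, k=3.0):
    """3-sigma edit rule: flag values more than k standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [abs(v - mean) > k * std for v in values]

def hampel_flags(values, k=3.0):
    """Hampel identifier: flag values more than k scaled MADs from the median."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    threshold = k * 1.4826 * mad
    return [abs(v - median) > threshold for v in values]

# Usage on toy data with one obviously large value at the end
energies = [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 1.1, 0.9, 50.0]
sigma_result = three_sigma_flags(energies)
hampel_result = hampel_flags(energies)
```

In this toy input the single large value inflates the mean and standard deviation enough that the 3σ rule flags nothing, while the median-based Hampel identifier still flags the last value.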

Keywords: STE based methods, Mahalanobis distance, 3σ edit rule, Hampel Identifier.

Downloads 1652