Search results for: speech signal processing
5596 Dual-Channel Multi-Band Spectral Subtraction Algorithm Dedicated to a Bilateral Cochlear Implant
Authors: Fathi Kallel, Ahmed Ben Hamida, Christian Berger-Vachon
Abstract:
In this paper, a Speech Enhancement Algorithm based on Multi-Band Spectral Subtraction (MBSS) principle is evaluated for Bilateral Cochlear Implant (BCI) users. Specifically, dual-channel noise power spectral estimation algorithm using Power Spectral Densities (PSD) and Cross Power Spectral Densities (CPSD) of the observed signals is studied. The enhanced speech signal is obtained using Dual-Channel Multi-Band Spectral Subtraction ‘DC-MBSS’ algorithm. For performance evaluation, objective speech assessment test relying on Perceptual Evaluation of Speech Quality (PESQ) score is performed to fix the optimal number of frequency bands needed in DC-MBSS algorithm. In order to evaluate the speech intelligibility, subjective listening tests are assessed with 3 deafened BCI patients. Experimental results obtained using French Lafon database corrupted by an additive babble noise at different Signal-to-Noise Ratios (SNR) showed that DC-MBSS algorithm improves speech understanding for single and multiple interfering noise sources.Keywords: speech enhancement, spectral substracion, noise estimation, cochlear impalnt
Procedia PDF Downloads 5495595 Exploiting Fast Independent Component Analysis Based Algorithm for Equalization of Impaired Baseband Received Signal
Authors: Muhammad Umair, Syed Qasim Gilani
Abstract:
A technique using Independent Component Analysis (ICA) for blind receiver signal processing is investigated. The problem of the receiver signal processing is viewed as of signal equalization and implementation imperfections compensation. Based on this, a model similar to a general ICA problem is developed for the received signal. Then, the use of ICA technique for blind signal equalization in the time domain is presented. The equalization is regarded as a signal separation problem, since the desired signal is separated from interference terms. This problem is addressed in the paper by over-sampling of the received signal. By using ICA for equalization, besides channel equalization, other transmission imperfections such as Direct current (DC) bias offset, carrier phase and In phase Quadrature phase imbalance will also be corrected. Simulation results for a system using 16-Quadraure Amplitude Modulation(QAM) are presented to show the performance of the proposed scheme.Keywords: blind equalization, blind signal separation, equalization, independent component analysis, transmission impairments, QAM receiver
Procedia PDF Downloads 2145594 Distant Speech Recognition Using Laser Doppler Vibrometer
Authors: Yunbin Deng
Abstract:
Most existing applications of automatic speech recognition relies on cooperative subjects at a short distance to a microphone. Standoff speech recognition using microphone arrays can extend the subject to sensor distance somewhat, but it is still limited to only a few feet. As such, most deployed applications of standoff speech recognitions are limited to indoor use at short range. Moreover, these applications require air passway between the subject and the sensor to achieve reasonable signal to noise ratio. This study reports long range (50 feet) automatic speech recognition experiments using a Laser Doppler Vibrometer (LDV) sensor. This study shows that the LDV sensor modality can extend the speech acquisition standoff distance far beyond microphone arrays to hundreds of feet. In addition, LDV enables 'listening' through the windows for uncooperative subjects. This enables new capabilities in automatic audio and speech intelligence, surveillance, and reconnaissance (ISR) for law enforcement, homeland security and counter terrorism applications. The Polytec LDV model OFV-505 is used in this study. To investigate the impact of different vibrating materials, five parallel LDV speech corpora, each consisting of 630 speakers, are collected from the vibrations of a glass window, a metal plate, a plastic box, a wood slate, and a concrete wall. These are the common materials the application could encounter in a daily life. These data were compared with the microphone counterpart to manifest the impact of various materials on the spectrum of the LDV speech signal. State of the art deep neural network modeling approaches is used to conduct continuous speaker independent speech recognition on these LDV speech datasets. Preliminary phoneme recognition results using time-delay neural network, bi-directional long short term memory, and model fusion shows great promise of using LDV for long range speech recognition. To author’s best knowledge, this is the first time an LDV is reported for long distance speech recognition application.Keywords: covert speech acquisition, distant speech recognition, DSR, laser Doppler vibrometer, LDV, speech intelligence surveillance and reconnaissance, ISR
Procedia PDF Downloads 1795593 An Analysis of Illocutioary Act in Martin Luther King Jr.'s Propaganda Speech Entitled 'I Have a Dream'
Authors: Mahgfirah Firdaus Soberatta
Abstract:
Language cannot be separated from human life. Humans use language to convey ideas, thoughts, and feelings. We can use words for different things for example like asserted, advising, promise, give opinions, hopes, etc. Propaganda is an attempt which seeks to obtain stable behavior to adopt everyone to his everyday life. It also controls the thoughts and attitudes of individuals in social settings permanent. In this research, the writer will discuss about the speech act in a propaganda speech delivered by Martin Luther King Jr. in Washington at Lincoln Memorial on August 28, 1963. 'I Have a Dream' is a public speech delivered by American civil rights activist MLK, he calls from an end to racism in USA. In this research, the writer uses Searle theory to analyze the types of illocutionary speech act that used by Martin Luther King Jr. in his propaganda speech. In this research, the writer uses a qualitative method described in descriptive, because the research wants to describe and explain the types of illocutionary speech acts used by Martin Luther King Jr. in his propaganda speech. The findings indicate that there are five types of speech acts in Martin Luther King Jr. speech. MLK also used direct speech and indirect speech in his propaganda speech. However, direct speech is the dominant speech act that MLK used in his propaganda speech. It is hoped that this research is useful for the readers to enrich their knowledge in a particular field of pragmatic speech acts.Keywords: speech act, propaganda, Martin Luther King Jr., speech
Procedia PDF Downloads 4415592 EEG Signal Processing Methods to Differentiate Mental States
Authors: Sun H. Hwang, Young E. Lee, Yunhan Ga, Gilwon Yoon
Abstract:
EEG is a very complex signal with noises and other bio-potential interferences. EOG is the most distinct interfering signal when EEG signals are measured and analyzed. It is very important how to process raw EEG signals in order to obtain useful information. In this study, the EEG signal processing techniques such as EOG filtering and outlier removal were examined to minimize unwanted EOG signals and other noises. The two different mental states of resting and focusing were examined through EEG analysis. A focused state was induced by letting subjects to watch a red dot on the white screen. EEG data for 32 healthy subjects were measured. EEG data after 60-Hz notch filtering were processed by a commercially available EOG filtering and our presented algorithm based on the removal of outliers. The ratio of beta wave to theta wave was used as a parameter for determining the degree of focusing. The results show that our algorithm was more appropriate than the existing EOG filtering.Keywords: EEG, focus, mental state, outlier, signal processing
Procedia PDF Downloads 2835591 Online Prediction of Nonlinear Signal Processing Problems Based Kernel Adaptive Filtering
Authors: Hamza Nejib, Okba Taouali
Abstract:
This paper presents two of the most knowing kernel adaptive filtering (KAF) approaches, the kernel least mean squares and the kernel recursive least squares, in order to predict a new output of nonlinear signal processing. Both of these methods implement a nonlinear transfer function using kernel methods in a particular space named reproducing kernel Hilbert space (RKHS) where the model is a linear combination of kernel functions applied to transform the observed data from the input space to a high dimensional feature space of vectors, this idea known as the kernel trick. Then KAF is the developing filters in RKHS. We use two nonlinear signal processing problems, Mackey Glass chaotic time series prediction and nonlinear channel equalization to figure the performance of the approaches presented and finally to result which of them is the adapted one.Keywords: online prediction, KAF, signal processing, RKHS, Kernel methods, KRLS, KLMS
Procedia PDF Downloads 3995590 Time Delay Estimation Using Signal Envelopes for Synchronisation of Recordings
Authors: Sergei Aleinik, Mikhail Stolbov
Abstract:
In this work, a method of time delay estimation for dual-channel acoustic signals (speech, music, etc.) recorded under reverberant conditions is investigated. Standard methods based on cross-correlation of the signals show poor results in cases involving strong reverberation, large distances between microphones and asynchronous recordings. Under similar conditions, a method based on cross-correlation of temporal envelopes of the signals delivers a delay estimation of acceptable quality. This method and its properties are described and investigated in detail, including its limits of applicability. The method’s optimal parameter estimation and a comparison with other known methods of time delay estimation are also provided.Keywords: cross-correlation, delay estimation, signal envelope, signal processing
Procedia PDF Downloads 4855589 Efficient Filtering of Graph Based Data Using Graph Partitioning
Authors: Nileshkumar Vaishnav, Aditya Tatu
Abstract:
An algebraic framework for processing graph signals axiomatically designates the graph adjacency matrix as the shift operator. In this setup, we often encounter a problem wherein we know the filtered output and the filter coefficients, and need to find out the input graph signal. Solution to this problem using direct approach requires O(N3) operations, where N is the number of vertices in graph. In this paper, we adapt the spectral graph partitioning method for partitioning of graphs and use it to reduce the computational cost of the filtering problem. We use the example of denoising of the temperature data to illustrate the efficacy of the approach.Keywords: graph signal processing, graph partitioning, inverse filtering on graphs, algebraic signal processing
Procedia PDF Downloads 3115588 A Two-Stage Adaptation towards Automatic Speech Recognition System for Malay-Speaking Children
Authors: Mumtaz Begum Mustafa, Siti Salwah Salim, Feizal Dani Rahman
Abstract:
Recently, Automatic Speech Recognition (ASR) systems were used to assist children in language acquisition as it has the ability to detect human speech signal. Despite the benefits offered by the ASR system, there is a lack of ASR systems for Malay-speaking children. One of the contributing factors for this is the lack of continuous speech database for the target users. Though cross-lingual adaptation is a common solution for developing ASR systems for under-resourced language, it is not viable for children as there are very limited speech databases as a source model. In this research, we propose a two-stage adaptation for the development of ASR system for Malay-speaking children using a very limited database. The two stage adaptation comprises the cross-lingual adaptation (first stage) and cross-age adaptation. For the first stage, a well-known speech database that is phonetically rich and balanced, is adapted to the medium-sized Malay adults using supervised MLLR. The second stage adaptation uses the speech acoustic model generated from the first adaptation, and the target database is a small-sized database of the target users. We have measured the performance of the proposed technique using word error rate, and then compare them with the conventional benchmark adaptation. The two stage adaptation proposed in this research has better recognition accuracy as compared to the benchmark adaptation in recognizing children’s speech.Keywords: Automatic Speech Recognition System, children speech, adaptation, Malay
Procedia PDF Downloads 3975587 Lab Bench for Synthetic Aperture Radar Imaging System
Authors: Karthiyayini Nagarajan, P. V. Ramakrishna
Abstract:
Radar Imaging techniques provides extensive applications in the field of remote sensing, majorly Synthetic Aperture Radar (SAR) that provide high resolution target images. This paper work puts forward the effective and realizable signal generation and processing for SAR images. The major units in the system include camera, signal generation unit, signal processing unit and display screen. The real radio channel is replaced by its mathematical model based on optical image to calculate a reflected signal model in real time. Signal generation realizes the algorithm and forms the radar reflection model. Signal processing unit provides range and azimuth resolution through matched filtering and spectrum analysis procedure to form radar image on the display screen. The restored image has the same quality as that of the optical image. This SAR imaging system has been designed and implemented using MATLAB and Quartus II tools on Stratix III device as a System (Lab Bench) that works in real time to study/investigate on radar imaging rudiments and signal processing scheme for educational and research purposes.Keywords: synthetic aperture radar, radio reflection model, lab bench, imaging engineering
Procedia PDF Downloads 4975586 Design and Implementation of a Lab Bench for Synthetic Aperture Radar Imaging System
Authors: Karthiyayini Nagarajan, P. V. RamaKrishna
Abstract:
Radar Imaging techniques provides extensive applications in the field of remote sensing, majorly Synthetic Aperture Radar(SAR) that provide high resolution target images. This paper work puts forward the effective and realizable signal generation and processing for SAR images. The major units in the system include camera, signal generation unit, signal processing unit and display screen. The real radio channel is replaced by its mathematical model based on optical image to calculate a reflected signal model in real time. Signal generation realizes the algorithm and forms the radar reflection model. Signal processing unit provides range and azimuth resolution through matched filtering and spectrum analysis procedure to form radar image on the display screen. The restored image has the same quality as that of the optical image. This SAR imaging system has been designed and implemented using MATLAB and Quartus II tools on Stratix III device as a System(lab bench) that works in real time to study/investigate on radar imaging rudiments and signal processing scheme for educational and research purposes.Keywords: synthetic aperture radar, radio reflection model, lab bench
Procedia PDF Downloads 4685585 Morpheme Based Parts of Speech Tagger for Kannada Language
Authors: M. C. Padma, R. J. Prathibha
Abstract:
Parts of speech tagging is the process of assigning appropriate parts of speech tags to the words in a given text. The critical or crucial information needed for tagging a word come from its internal structure rather from its neighboring words. The internal structure of a word comprises of its morphological features and grammatical information. This paper presents a morpheme based parts of speech tagger for Kannada language. This proposed work uses hierarchical tag set for assigning tags. The system is tested on some Kannada words taken from EMILLE corpus. Experimental result shows that the performance of the proposed system is above 90%.Keywords: hierarchical tag set, morphological analyzer, natural language processing, paradigms, parts of speech
Procedia PDF Downloads 2965584 Detection of Phoneme [S] Mispronounciation for Sigmatism Diagnosis in Adults
Authors: Michal Krecichwost, Zauzanna Miodonska, Pawel Badura
Abstract:
The diagnosis of sigmatism is mostly based on the observation of articulatory organs. It is, however, not always possible to precisely observe the vocal apparatus, in particular in the oral cavity of the patient. Speech processing can allow to objectify the therapy and simplify the verification of its progress. In the described study the methodology for classification of incorrectly pronounced phoneme [s] is proposed. The recordings come from adults. They were registered with the speech recorder at the sampling rate of 44.1 kHz and the resolution of 16 bit. The database of pathological and normative speech has been collected for the study including reference assessments provided by the speech therapy experts. Ten adult subjects were asked to simulate a certain type of stigmatism under the speech therapy expert supervision. In the recordings, the analyzed phone [s] was surrounded by vowels, viz: ASA, ESE, ISI, SPA, USU, YSY. Thirteen MFCC (mel-frequency cepstral coefficients) and RMS (root mean square) values are calculated within each frame being a part of the analyzed phoneme. Additionally, 3 fricative formants along with corresponding amplitudes are determined for the entire segment. In order to aggregate the information within the segment, the average value of each MFCC coefficient is calculated. All features of other types are aggregated by means of their 75th percentile. The proposed method of features aggregation reduces the size of the feature vector used in the classification. Binary SVM (support vector machine) classifier is employed at the phoneme recognition stage. The first group consists of pathological phones, while the other of the normative ones. The proposed feature vector yields classification sensitivity and specificity measures above 90% level in case of individual logo phones. The employment of a fricative formants-based information improves the sole-MFCC classification results average of 5 percentage points. The study shows that the employment of specific parameters for the selected phones improves the efficiency of pathology detection referred to the traditional methods of speech signal parameterization.Keywords: computer-aided pronunciation evaluation, sibilants, sigmatism diagnosis, speech processing
Procedia PDF Downloads 2835583 The Effect of Speech-Shaped Noise and Speaker’s Voice Quality on First-Grade Children’s Speech Perception and Listening Comprehension
Authors: I. Schiller, D. Morsomme, A. Remacle
Abstract:
Children’s ability to process spoken language develops until the late teenage years. At school, where efficient spoken language processing is key to academic achievement, listening conditions are often unfavorable. High background noise and poor teacher’s voice represent typical sources of interference. It can be assumed that these factors particularly affect primary school children, because their language and literacy skills are still low. While it is generally accepted that background noise and impaired voice impede spoken language processing, there is an increasing need for analyzing impacts within specific linguistic areas. Against this background, the aim of the study was to investigate the effect of speech-shaped noise and imitated dysphonic voice on first-grade primary school children’s speech perception and sentence comprehension. Via headphones, 5 to 6-year-old children, recruited within the French-speaking community of Belgium, listened to and performed a minimal-pair discrimination task and a sentence-picture matching task. Stimuli were randomly presented according to four experimental conditions: (1) normal voice / no noise, (2) normal voice / noise, (3) impaired voice / no noise, and (4) impaired voice / noise. The primary outcome measure was task score. How did performance vary with respect to listening condition? Preliminary results will be presented with respect to speech perception and sentence comprehension and carefully interpreted in the light of past findings. This study helps to support our understanding of children’s language processing skills under adverse conditions. Results shall serve as a starting point for probing new measures to optimize children’s learning environment.Keywords: impaired voice, sentence comprehension, speech perception, speech-shaped noise, spoken language processing
Procedia PDF Downloads 1925582 The Online Advertising Speech that Effect to the Thailand Internet User Decision Making
Authors: Panprae Bunyapukkna
Abstract:
This study investigated figures of speech used in fragrance advertising captions on the Internet. The objectives of the study were to find out the frequencies of figures of speech in fragrance advertising captions and the types of figures of speech most commonly applied in captions. The relation between figures of speech and fragrance was also examined in order to analyze how figures of speech were used to represent fragrance. Thirty-five fragrance advertisements were randomly selected from the Internet. Content analysis was applied in order to consider the relation between figures of speech and fragrance. The results showed that figures of speech were found in almost every fragrance advertisement except one advertisement of Lancôme. Thirty-four fragrance advertising captions used at least one kind of figure of speech. Metaphor was most frequently found and also most frequently applied in fragrance advertising captions, followed by alliteration, rhyme, simile and personification, and hyperbole respectively.Keywords: advertising speech, fragrance advertisements, figures of speech, metaphor
Procedia PDF Downloads 2415581 Embedded System of Signal Processing on FPGA: Underwater Application Architecture
Authors: Abdelkader Elhanaoui, Mhamed Hadji, Rachid Skouri, Said Agounad
Abstract:
The purpose of this paper is to study the phenomenon of acoustic scattering by using a new method. The signal processing (Fast Fourier Transform FFT Inverse Fast Fourier Transform iFFT and BESSEL functions) is widely applied to obtain information with high precision accuracy. Signal processing has a wider implementation in general-purpose pro-cessors. Our interest was focused on the use of FPGAs (Field-Programmable Gate Ar-rays) in order to minimize the computational complexity in single processor architecture, then be accelerated on FPGA and meet real-time and energy efficiency requirements. Gen-eral-purpose processors are not efficient for signal processing. We implemented the acous-tic backscattered signal processing model on the Altera DE-SOC board and compared it to Odroid xu4. By comparison, the computing latency of Odroid xu4 and FPGA is 60 sec-onds and 3 seconds, respectively. The detailed SoC FPGA-based system has shown that acoustic spectra are performed up to 20 times faster than the Odroid xu4 implementation. FPGA-based system of processing algorithms is realized with an absolute error of about 10⁻³. This study underlines the increasing importance of embedded systems in underwater acoustics, especially in non-destructive testing. It is possible to obtain information related to the detection and characterization of submerged cells. So we have achieved good exper-imental results in real-time and energy efficiency.Keywords: DE1 FPGA, acoustic scattering, form function, signal processing, non-destructive testing
Procedia PDF Downloads 795580 TeleMe Speech Booster: Web-Based Speech Therapy and Training Program for Children with Articulation Disorders
Authors: C. Treerattanaphan, P. Boonpramuk, P. Singla
Abstract:
Frequent, continuous speech training has proven to be a necessary part of a successful speech therapy process, but constraints of traveling time and employment dispensation become key obstacles especially for individuals living in remote areas or for dependent children who have working parents. In order to ameliorate speech difficulties with ample guidance from speech therapists, a website has been developed that supports speech therapy and training for people with articulation disorders in the standard Thai language. This web-based program has the ability to record speech training exercises for each speech trainee. The records will be stored in a database for the speech therapist to investigate, evaluate, compare and keep track of all trainees’ progress in detail. Speech trainees can request live discussions via video conference call when needed. Communication through this web-based program facilitates and reduces training time in comparison to walk-in training or appointments. This type of training also allows people with articulation disorders to practice speech lessons whenever or wherever is convenient for them, which can lead to a more regular training processes.Keywords: web-based remote training program, Thai speech therapy, articulation disorders, speech booster
Procedia PDF Downloads 3755579 Development of Non-Intrusive Speech Evaluation Measure Using S-Transform and Light-Gbm
Authors: Tusar Kanti Dash, Ganapati Panda
Abstract:
The evaluation of speech quality and intelligence is critical to the overall effectiveness of the Speech Enhancement Algorithms. Several intrusive and non-intrusive measures are employed to calculate these parameters. Non-Intrusive Evaluation is most challenging as, very often, the reference clean speech data is not available. In this paper, a novel non-intrusive speech evaluation measure is proposed using audio features derived from the Stockwell transform. These features are used with the Light Gradient Boosting Machine for the effective prediction of speech quality and intelligibility. The proposed model is analyzed using noisy and reverberant speech from four databases, and the results are compared with the standard Intrusive Evaluation Measures. It is observed from the comparative analysis that the proposed model is performing better than the standard Non-Intrusive models.Keywords: non-Intrusive speech evaluation, S-transform, light GBM, speech quality, and intelligibility
Procedia PDF Downloads 2595578 A Voice Signal Encryption Scheme Based on Chaotic Theory
Authors: Hailang Yang
Abstract:
To ensure the confidentiality and integrity of speech signals in communication transmission, this paper proposes a voice signal encryption scheme based on chaotic theory. Firstly, the scheme utilizes chaotic mapping to generate a key stream and then employs the key stream to perform bitwise exclusive OR (XOR) operations for encrypting the speech signal. Additionally, the scheme utilizes a chaotic hash function to generate a Message Authentication Code (MAC), which is appended to the encrypted data to verify the integrity of the data. Subsequently, we analyze the security performance and encryption efficiency of the scheme, comparing and optimizing it against existing solutions. Finally, experimental results demonstrate that the proposed scheme can resist common attacks, achieving high-quality encryption and speed.Keywords: chaotic theory, XOR encryption, chaotic hash function, Message Authentication Code (MAC)
Procedia PDF Downloads 515577 Annexation (Al-Iḍāfah) in Thariq bin Ziyad’s Speech
Authors: Annisa D. Febryandini
Abstract:
Annexation is a typical construction that commonly used in Arabic language. The use of the construction appears in Arabic speech such as the speech of Thariq bin Ziyad. The speech as one of the most famous speeches in the history of Islam uses many annexations. This qualitative research paper uses the secondary data by library method. Based on the data, this paper concludes that the speech has two basic structures with some variations and has some grammatical relationship. Different from the other researches that identify the speech in sociology field, the speech in this paper will be analyzed in linguistic field to take a look at the structure of its annexation as well as the grammatical relationship.Keywords: annexation, Thariq bin Ziyad, grammatical relationship, Arabic syntax
Procedia PDF Downloads 3185576 ICanny: CNN Modulation Recognition Algorithm
Authors: Jingpeng Gao, Xinrui Mao, Zhibin Deng
Abstract:
Aiming at the low recognition rate on the composite signal modulation in low signal to noise ratio (SNR), this paper proposes a modulation recognition algorithm based on ICanny-CNN. Firstly, the radar signal is transformed into the time-frequency image by Choi-Williams Distribution (CWD). Secondly, we propose an image processing algorithm using the Guided Filter and the threshold selection method, which is combined with the hole filling and the mask operation. Finally, the shallow convolutional neural network (CNN) is combined with the idea of the depth-wise convolution (Dw Conv) and the point-wise convolution (Pw Conv). The proposed CNN is designed to complete image classification and realize modulation recognition of radar signal. The simulation results show that the proposed algorithm can reach 90.83% at 0dB and 71.52% at -8dB. Therefore, the proposed algorithm has a good classification and anti-noise performance in radar signal modulation recognition and other fields.Keywords: modulation recognition, image processing, composite signal, improved Canny algorithm
Procedia PDF Downloads 1915575 Voice Commands Recognition of Mentor Robot in Noisy Environment Using HTK
Authors: Khenfer-Koummich Fatma, Hendel Fatiha, Mesbahi Larbi
Abstract:
this paper presents an approach based on Hidden Markov Models (HMM: Hidden Markov Model) using HTK tools. The goal is to create a man-machine interface with a voice recognition system that allows the operator to tele-operate a mentor robot to execute specific tasks as rotate, raise, close, etc. This system should take into account different levels of environmental noise. This approach has been applied to isolated words representing the robot commands spoken in two languages: French and Arabic. The recognition rate obtained is the same in both speeches, Arabic and French in the neutral words. However, there is a slight difference in favor of the Arabic speech when Gaussian white noise is added with a Signal to Noise Ratio (SNR) equal to 30 db, the Arabic speech recognition rate is 69% and 80% for French speech recognition rate. This can be explained by the ability of phonetic context of each speech when the noise is added.Keywords: voice command, HMM, TIMIT, noise, HTK, Arabic, speech recognition
Procedia PDF Downloads 3825574 The Principle Probabilities of Space-Distance Resolution for a Monostatic Radar and Realization in Cylindrical Array
Authors: Anatoly D. Pluzhnikov, Elena N. Pribludova, Alexander G. Ryndyk
Abstract:
In conjunction with the problem of the target selection on a clutter background, the analysis of the scanning rate influence on the spatial-temporal signal structure, the generalized multivariate correlation function and the quality of the resolution with the increase pulse repetition frequency is made. The possibility of the object space-distance resolution, which is conditioned by the range-to-angle conversion with an increased scanning rate, is substantiated. The calculations for the real cylindrical array at high scanning rate are presented. The high scanning rate let to get the signal to noise improvement of the order of 10 dB for the space-time signal processing.Keywords: antenna pattern, array, signal processing, spatial resolution
Procedia PDF Downloads 1805573 A Simple Adaptive Atomic Decomposition Voice Activity Detector Implemented by Matching Pursuit
Authors: Thomas Bryan, Veton Kepuska, Ivica Kostanic
Abstract:
A simple adaptive voice activity detector (VAD) is implemented using Gabor and gammatone atomic decomposition of speech for high Gaussian noise environments. Matching pursuit is used for atomic decomposition, and is shown to achieve optimal speech detection capability at high data compression rates for low signal to noise ratios. The most active dictionary elements found by matching pursuit are used for the signal reconstruction so that the algorithm adapts to the individual speakers dominant time-frequency characteristics. Speech has a high peak to average ratio enabling matching pursuit greedy heuristic of highest inner products to isolate high energy speech components in high noise environments. Gabor and gammatone atoms are both investigated with identical logarithmically spaced center frequencies, and similar bandwidths. The algorithm performs equally well for both Gabor and gammatone atoms with no significant statistical differences. The algorithm achieves 70% accuracy at a 0 dB SNR, 90% accuracy at a 5 dB SNR and 98% accuracy at a 20dB SNR using 30dB SNR as a reference for voice activity.Keywords: atomic decomposition, gabor, gammatone, matching pursuit, voice activity detection
Procedia PDF Downloads 2905572 Speech Perception by Monolingual and Bilingual Dravidian Speakers under Adverse Listening Conditions
Authors: S. B. Rathna Kumar, Sale Kranthi, Sandya K. Varudhini
Abstract:
The precise perception of spoken language is influenced by several variables, including the listeners’ native language, distance between speaker and listener, reverberation and background noise. When noise is present in an acoustic environment, it masks the speech signal resulting in reduction in the redundancy of the acoustic and linguistic cues of speech. There is strong evidence that bilinguals face difficulty in speech perception for their second language compared with monolingual speakers under adverse listening conditions such as presence of background noise. This difficulty persists even for speakers who are highly proficient in their second language and is greater in those who have learned the second language later in life. The present study aimed to assess the performance of monolingual (Telugu speaking) and bilingual (Tamil as first language and Telugu as second language) speakers on Telugu speech perception task under quiet and noisy environments. The results indicated that both the groups performed similar in both quiet and noisy environments. The findings of the present study are not in accordance with the findings of previous studies which strongly report poorer speech perception in adverse listening conditions such as noise with bilingual speakers for their second language compared with monolinguals.Keywords: monolingual, bilingual, second language, speech perception, quiet, noise
Procedia PDF Downloads 3895571 An Intelligent Text Independent Speaker Identification Using VQ-GMM Model Based Multiple Classifier System
Authors: Ben Soltane Cheima, Ittansa Yonas Kelbesa
Abstract:
Speaker Identification (SI) is the task of establishing identity of an individual based on his/her voice characteristics. The SI task is typically achieved by two-stage signal processing: training and testing. The training process calculates speaker specific feature parameters from the speech and generates speaker models accordingly. In the testing phase, speech samples from unknown speakers are compared with the models and classified. Even though performance of speaker identification systems has improved due to recent advances in speech processing techniques, there is still need of improvement. In this paper, a Closed-Set Tex-Independent Speaker Identification System (CISI) based on a Multiple Classifier System (MCS) is proposed, using Mel Frequency Cepstrum Coefficient (MFCC) as feature extraction and suitable combination of vector quantization (VQ) and Gaussian Mixture Model (GMM) together with Expectation Maximization algorithm (EM) for speaker modeling. The use of Voice Activity Detector (VAD) with a hybrid approach based on Short Time Energy (STE) and Statistical Modeling of Background Noise in the pre-processing step of the feature extraction yields a better and more robust automatic speaker identification system. Also investigation of Linde-Buzo-Gray (LBG) clustering algorithm for initialization of GMM, for estimating the underlying parameters, in the EM step improved the convergence rate and systems performance. It also uses relative index as confidence measures in case of contradiction in identification process by GMM and VQ as well. Simulation results carried out on voxforge.org speech database using MATLAB highlight the efficacy of the proposed method compared to earlier work.Keywords: feature extraction, speaker modeling, feature matching, Mel frequency cepstrum coefficient (MFCC), Gaussian mixture model (GMM), vector quantization (VQ), Linde-Buzo-Gray (LBG), expectation maximization (EM), pre-processing, voice activity detection (VAD), short time energy (STE), background noise statistical modeling, closed-set tex-independent speaker identification system (CISI)
Procedia PDF Downloads 3095570 Visual Speech Perception of Arabic Emphatics
Authors: Maha Saliba Foster
Abstract:
Speech perception has been recognized as a bi-sensory process involving the auditory and visual channels. Compared to the auditory modality, the contribution of the visual signal to speech perception is not very well understood. Studying how the visual modality affects speech recognition can have pedagogical implications in second language learning, as well as clinical application in speech therapy. The current investigation explores the potential effect of speech visual cues on the perception of Arabic emphatics (AEs). The corpus consists of 36 minimal pairs each containing two contrasting consonants, an AE versus a non-emphatic (NE). Movies of four Lebanese speakers were edited to allow perceivers to have partial view of facial regions: lips only, lips-cheeks, lips-chin, lips-cheeks-chin, lips-cheeks-chin-neck. In the absence of any auditory information and relying solely on visual speech, perceivers were above chance at correctly identifying AEs or NEs across vowel contexts; moreover, the models were able to predict the probability of perceivers’ accuracy in identifying some of the COIs produced by certain speakers; additionally, results showed an overlap between the measurements selected by the computer and those selected by human perceivers. The lack of significant face effect on the perception of AEs seems to point to the lips, present in all of the videos, as the most important and often sufficient facial feature for emphasis recognition. Future investigations will aim at refining the analyses of visual cues used by perceivers by using Principal Component Analysis and including time evolution of facial feature measurements.Keywords: Arabic emphatics, machine learning, speech perception, visual speech perception
Procedia PDF Downloads 3065569 Signal Processing of Barkhausen Noise Signal for Assessment of Increasing Down Feed in Surface Ground Components with Poor Micro-Magnetic Response
Authors: Tanmaya Kumar Dash, Tarun Karamshetty, Soumitra Paul
Abstract:
The Barkhausen Noise Analysis (BNA) technique has been utilized to assess surface integrity of steels. But the BNA technique is not very successful in evaluating surface integrity of ground steels that exhibit poor micro-magnetic response. A new approach has been proposed for the processing of BN signal with Fast Fourier transforms while Wavelet transforms has been used to remove noise from the BN signal, with judicious choice of the ‘threshold’ value, when the micro-magnetic response of the work material is poor. In the present study, the effect of down feed induced upon conventional plunge surface grinding of hardened bearing steel has been investigated along with an ultrasonically cleaned, wet polished and a sample ground with spark out technique for benchmarking. Moreover, the FFT analysis has been established, at different sets of applied voltages and applied frequency and the pattern of the BN signal in the frequency domain is analyzed. The study also depicts the wavelet transforms technique with different levels of decomposition and different mother wavelets, which has been used to reduce the noise value in BN signal of materials with poor micro-magnetic response, in order to standardize the procedure for all BN signals depending on the frequency of the applied voltage.Keywords: barkhausen noise analysis, grinding, magnetic properties, signal processing, micro-magnetic response
Procedia PDF Downloads 6675568 Speech Impact Realization via Manipulative Argumentation Techniques in Modern American Political Discourse
Authors: Zarine Avetisyan
Abstract:
Paper presents the discussion of scholars concerning speech impact, peculiarities of its realization, speech strategies, and techniques. Departing from the viewpoints of many prominent linguists, the paper suggests manipulative argumentation be viewed as a most pervasive speech strategy with a certain set of techniques which are to be found in modern American political discourse. The precedence of their occurrence allows us to regard them as pragmatic patterns of speech impact realization in effective public speaking.Keywords: speech impact, manipulative argumentation, political discourse, technique
Procedia PDF Downloads 5085567 Recognition of Voice Commands of Mentor Robot in Noisy Environment Using Hidden Markov Model
Authors: Khenfer Koummich Fatma, Hendel Fatiha, Mesbahi Larbi
Abstract:
This paper presents an approach based on Hidden Markov Models (HMM: Hidden Markov Model) using HTK tools. The goal is to create a human-machine interface with a voice recognition system that allows the operator to teleoperate a mentor robot to execute specific tasks as rotate, raise, close, etc. This system should take into account different levels of environmental noise. This approach has been applied to isolated words representing the robot commands pronounced in two languages: French and Arabic. The obtained recognition rate is the same in both speeches, Arabic and French in the neutral words. However, there is a slight difference in favor of the Arabic speech when Gaussian white noise is added with a Signal to Noise Ratio (SNR) equals 30 dB, in this case; the Arabic speech recognition rate is 69%, and the French speech recognition rate is 80%. This can be explained by the ability of phonetic context of each speech when the noise is added.Keywords: Arabic speech recognition, Hidden Markov Model (HMM), HTK, noise, TIMIT, voice command
Procedia PDF Downloads 385