Search results for: speech signals

766 Performance Evaluation of Music and Minimum Norm Eigenvector Algorithms in Resolving Noisy Multiexponential Signals

Authors: Abdussamad U. Jibia, Momoh-Jimoh E. Salami

Abstract:

Eigenvector methods are gaining increasing acceptance in the area of spectrum estimation. This paper presents a successful attempt at testing and evaluating the performance of two of the most popular types of subspace techniques in determining the parameters of multiexponential signals with real decay constants buried in noise. In particular, MUSIC (Multiple Signal Classification) and minimum-norm techniques are examined. It is shown that these methods perform almost equally well on multiexponential signals with MUSIC displaying better defined peaks.

Keywords: Eigenvector, minimum norm, multiexponential, subspace.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1738

765 Accent Identification by Clustering and Scoring Formants

Authors: Dejan Stantic, Jun Jo

Abstract:

There have been significant improvements in automatic voice recognition technology. However, existing systems still face difficulties, particularly when used by non-native speakers with accents. In this paper we address a problem of identifying the English accented speech of speakers from different backgrounds. Once an accent is identified the speech recognition software can utilise training set from appropriate accent and therefore improve the efficiency and accuracy of the speech recognition system. We introduced the Q factor, which is defined by the sum of relationships between frequencies of the formants. Four different accents were considered and experimented for this research. A scoring method was introduced in order to effectively analyse accents. The proposed concept indicates that the accent could be identified by analysing their formants.

Keywords: Accent Identification, Formants, Q Factor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2089

764 Delineating Students’ Speaking Anxieties and Assessment Gaps in Online Speech Performances

Authors: Mary Jane B. Suarez

Abstract:

Speech anxiety is innumerable in any traditional communication classes especially for ESL students. The speech anxiety intensifies when communication skills assessments have taken its toll in an online mode of learning due to the perils of the COVID-19 virus. Teachers and students have experienced vast ambiguity on how to realize a still effective way to teach and learn various speaking skills amidst the pandemic. This mixed method study determined the factors that affected the public speaking skills of students in online performances, delineated the assessment gaps in assessing speaking skills in an online setup, and recommended ways to address students’ speech anxieties. Using convergent parallel design, quantitative data were gathered by examining the desired learning competencies of the English course including a review of the teacher’s class record to analyze how students’ performances reflected a significantly high level of anxiety in online speech delivery. Focus group discussion was also conducted for qualitative data describing students’ public speaking anxiety and assessment gaps. Results showed a significantly high level of students’ speech anxiety affected by time constraints, use of technology, lack of audience response, being conscious of making mistakes, and the use of English as a second language. The study presented recommendations to redesign curricular assessments of English teachers and to have a robust diagnosis of students’ speaking anxiety to better cater to the needs of learners in attempt to bridge any gaps in cultivating public speaking skills of students as educational institutions segue from the pandemic to the post-pandemic milieu.

Keywords: Blended learning, communication skills assessment, online speech delivery, public speaking anxiety, speech anxiety.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 178

763 Multi Switched Split Vector Quantizer

Authors: M. Satya Sai Ram, P. Siddaiah, M. Madhavi Latha

Abstract:

Vector quantization is a powerful tool for speech coding applications. This paper deals with LPC Coding of speech signals which uses a new technique called Multi Switched Split Vector Quantization, This is a hybrid of two product code vector quantization techniques namely the Multi stage vector quantization technique, and Switched split vector quantization technique,. Multi Switched Split Vector Quantization technique quantizes the linear predictive coefficients in terms of line spectral frequencies. From results it is proved that Multi Switched Split Vector Quantization provides better trade off between bitrate and spectral distortion performance, computational complexity and memory requirements when compared to Switched Split Vector Quantization, Multi stage vector quantization, and Split Vector Quantization techniques. By employing the switching technique at each stage of the vector quantizer the spectral distortion, computational complexity and memory requirements were greatly reduced. Spectral distortion was measured in dB, Computational complexity was measured in floating point operations (flops), and memory requirements was measured in (floats).

Keywords: Unconstrained vector quantization, Linear predictiveCoding, Split vector quantization, Multi stage vector quantization, Switched Split vector quantization, Line Spectral Frequencies.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1741

762 Environmentally Adaptive Acoustic Echo Suppression for Barge-in Speech Recognition

Authors: Jong Han Joo, Jeong Hun Lee, Young Sun Kim, Jae Young Kang, Seung Ho Choi

Abstract:

In this study, we propose a novel technique for acoustic echo suppression (AES) during speech recognition under barge-in conditions. Conventional AES methods based on spectral subtraction apply fixed weights to the estimated echo path transfer function (EPTF) at the current signal segment and to the EPTF estimated until the previous time interval. However, the effects of echo path changes should be considered for eliminating the undesired echoes. We describe a new approach that adaptively updates weight parameters in response to abrupt changes in the acoustic environment due to background noises or double-talk. Furthermore, we devised a voice activity detector and an initial time-delay estimator for barge-in speech recognition in communication networks. The initial time delay is estimated using log-spectral distance measure, as well as cross-correlation coefficients. The experimental results show that the developed techniques can be successfully applied in barge-in speech recognition systems.

Keywords: Acoustic echo suppression, barge-in, speech recognition, echo path transfer function, initial delay estimator, voice activity detector.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2317

761 Computing Fractal Dimension of Signals using Multiresolution Box-counting Method

Authors: B. S. Raghavendra, D. Narayana Dutt

Abstract:

In this paper, we have developed a method to compute fractal dimension (FD) of discrete time signals, in the time domain, by modifying the box-counting method. The size of the box is dependent on the sampling frequency of the signal. The number of boxes required to completely cover the signal are obtained at multiple time resolutions. The time resolutions are made coarse by decimating the signal. The loglog plot of total number of boxes required to cover the curve versus size of the box used appears to be a straight line, whose slope is taken as an estimate of FD of the signal. The results are provided to demonstrate the performance of the proposed method using parametric fractal signals. The estimation accuracy of the method is compared with that of Katz, Sevcik, and Higuchi methods. In addition, some properties of the FD are discussed.

Keywords: Box-counting, Fractal dimension, Higuchi method, Katz method, Parametric fractal signals, Sevcik method.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4594

760 A Side-Peak Cancellation Scheme for CBOC Code Acquisition

Authors: Youngpo Lee, Seokho Yoon

Abstract:

In this paper, we propose a side-peak cancellation scheme for code acquisition of composite binary offset carrier (CBOC) signals. We first model the family of CBOC signals in a generic form, and then, propose a side-peak cancellation scheme by combining correlation functions between the divided sub-carrier and received signals. From numerical results, it is shown that the proposed scheme removes the side-peak completely, and moreover, the resulting correlation function demonstrates the better power ratio performance than the CBOC autocorrelation.

Keywords: CBOC, side-peak, ambiguity problem, synchronization

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1740

759 An Approach of the Inverter Voltage Used for the Linear Machine with Multi Air-Gap Structure

Authors: Pierre Kenfack

Abstract:

In this paper we present a contribution for the modelling and control of the inverter voltage of a permanent magnet linear generator with multi air-gap structure. The time domain control method is based on instant comparison of reference signals, in the form of current or voltage, with actual or measured signals. The reference current or voltage must be kept close to the actual signal with a reasonable tolerance. In this work, the time domain control method is used to control tracking signals. The performance evaluation concerns the continuation of reference signal. Simulations validate very well the tracking of reference variables (current, voltage) by measured or actual signals. All is simulated and presented under PSIM Software to show the performance and robustness of the proposed controller.

Keywords: Control, permanent magnet, linear machine, multi air-gap structure.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 582

758 Orchestra/Percussion Classification Algorithm for United Speech Audio Coding System

Authors: Yueming Wang, Rendong Ying, Sumxin Jiang, Peilin Liu

Abstract:

Unified Speech Audio Coding (USAC), the latest MPEG standardization for unified speech and audio coding, uses a speech/audio classification algorithm to distinguish speech and audio segments of the input signal. The quality of the recovered audio can be increased by well-designed orchestra/percussion classification and subsequent processing. However, owing to the shortcoming of the system, introducing an orchestra/percussion classification and modifying subsequent processing can enormously increase the quality of the recovered audio. This paper proposes an orchestra/percussion classification algorithm for the USAC system which only extracts 3 scales of Mel-Frequency Cepstral Coefficients (MFCCs) rather than traditional 13 scales of MFCCs and use Iterative Dichotomiser 3 (ID3) Decision Tree rather than other complex learning method, thus the proposed algorithm has lower computing complexity than most existing algorithms. Considering that frequent changing of attributes may lead to quality loss of the recovered audio signal, this paper also design a modified subsequent process to help the whole classification system reach an accurate rate as high as 97% which is comparable to classical 99%.

Keywords: ID3 Decision Tree, MFCC, Orchestra/Percussion Classification, USAC

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1673

757 Wave Atom Transform Based Two Class Motor Imagery Classification

Authors: Nebi Gedik

Abstract:

Electroencephalography (EEG) investigations of the brain computer interfaces are based on the electrical signals resulting from neural activities in the brain. In this paper, it is offered a method for classifying motor imagery EEG signals. The suggested method classifies EEG signals into two classes using the wave atom transform, and the transform coefficients are assessed, creating the feature set. Classification is done with SVM and k-NN algorithms with and without feature selection. For feature selection t-test approaches are utilized. A test of the approach is performed on the BCI competition III dataset IIIa.

Keywords: motor imagery, EEG, wave atom transform, SVM, k-NN, t-test

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 492

756 A High Quality Speech Coder at 600 bps

Authors: Yong Zhang, Ruimin Hu

Abstract:

This paper presents a vocoder to obtain high quality synthetic speech at 600 bps. To reduce the bit rate, the algorithm is based on a sinusoidally excited linear prediction model which extracts few coding parameters, and three consecutive frames are grouped into a superframe and jointly vector quantization is used to obtain high coding efficiency. The inter-frame redundancy is exploited with distinct quantization schemes for different unvoiced/voiced frame combinations in the superframe. Experimental results show that the quality of the proposed coder is better than that of 2.4kbps LPC10e and achieves approximately the same as that of 2.4kbps MELP and with high robustness.

Keywords: Speech coding, Vector quantization, linear predicition, Mixed sinusoidal excitation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2188

755 Recovery of Missing Samples in Multi-channel Oversampling of Multi-banded Signals

Authors: J. M. Kim, K. H. Kwon

Abstract:

We show that in a two-channel sampling series expansion of band-pass signals, any finitely many missing samples can always be recovered via oversampling in a larger band-pass region. We also obtain an analogous result for multi-channel oversampling of harmonic signals.

Keywords: oversampling, multi-channel sampling, recovery of missing samples, band-pass signal, harmonic signal

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1283

754 Signal Reconstruction Using Cepstrum of Higher Order Statistics

Authors: Adnan Al-Smadi, Mahmoud Smadi

Abstract:

This paper presents an algorithm for reconstructing phase and magnitude responses of the impulse response when only the output data are available. The system is driven by a zero-mean independent identically distributed (i.i.d) non-Gaussian sequence that is not observed. The additive noise is assumed to be Gaussian. This is an important and essential problem in many practical applications of various science and engineering areas such as biomedical, seismic, and speech processing signals. The method is based on evaluating the bicepstrum of the third-order statistics of the observed output data. Simulations results are presented that demonstrate the performance of this method.

Keywords: Cepstrum, bicepstrum, third order statistics

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2037

753 Applying Similarity Theory and Hilbert Huang Transform for Estimating the Differences of Pig-s Blood Pressure Signals between Situations of Intestinal Artery Blocking and Unblocking

Authors: Jia-Rong Yeh, Tzu-Yu Lin, Jiann-Shing Shieh, Yun Chen

Abstract:

A mammal-s body can be seen as a blood vessel with complex tunnels. When heart pumps blood periodically, blood runs through blood vessels and rebounds from walls of blood vessels. Blood pressure signals can be measured with complex but periodic patterns. When an artery is clamped during a surgical operation, the spectrum of blood pressure signals will be different from that of normal situation. In this investigation, intestinal artery clamping operations were conducted to a pig for simulating the situation of intestinal blocking during a surgical operation. Similarity theory is a convenient and easy tool to prove that patterns of blood pressure signals of intestinal artery blocking and unblocking are surely different. And, the algorithm of Hilbert Huang Transform can be applied to extract the character parameters of blood pressure pattern. In conclusion, the patterns of blood pressure signals of two different situations, intestinal artery blocking and unblocking, can be distinguished by these character parameters defined in this paper.

Keywords: Blood pressure, spectrum, intestinal artery, similarity theory and Hilbert Huang Transform.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1625

752 Brainwave Classification for Brain Balancing Index (BBI) via 3D EEG Model Using k-NN Technique

Authors: N. Fuad, M. N. Taib, R. Jailani, M. E. Marwan

Abstract:

In this paper, the comparison between k-Nearest Neighbor (kNN) algorithms for classifying the 3D EEG model in brain balancing is presented. The EEG signal recording was conducted on 51 healthy subjects. Development of 3D EEG models involves pre-processing of raw EEG signals and construction of spectrogram images. Then, maximum PSD values were extracted as features from the model. There are three indexes for balanced brain; index 3, index 4 and index 5. There are significant different of the EEG signals due to the brain balancing index (BBI). Alpha-α (8–13 Hz) and beta-β (13–30 Hz) were used as input signals for the classification model. The k-NN classification result is 88.46% accuracy. These results proved that k-NN can be used in order to predict the brain balancing application.

Keywords: Brain balancing, kNN, power spectral density, 3D EEG model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2628

751 Classification of Non Stationary Signals Using Ben Wavelet and Artificial Neural Networks

Authors: Mohammed Benbrahim, Khalid Benjelloun, Aomar Ibenbrahim, Adil Daoudi

Abstract:

The automatic classification of non stationary signals is an important practical goal in several domains. An essential classification task is to allocate the incoming signal to a group associated with the kind of physical phenomena producing it. In this paper, we present a modular system composed by three blocs: 1) Representation, 2) Dimensionality reduction and 3) Classification. The originality of our work consists in the use of a new wavelet called "Ben wavelet" in the representation stage. For the dimensionality reduction, we propose a new algorithm based on the random projection and the principal component analysis.

Keywords: Seismic signals, Ben Wavelet, Dimensionality reduction, Artificial neural networks, Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1450

750 Detection Characteristics of the Random and Deterministic Signals in Antenna Arrays

Authors: Olesya Bolkhovskaya, Alexey Davydov, Alexander Maltsev

Abstract:

In this paper, approach to incoherent signal detection in multi-element antenna array are researched and modeled. Two types of useful signals with unknown wavefront were considered: first one, deterministic (Barker code), and second one, random (Gaussian distribution). The derivation of the sufficient statistics took into account the linearity of the antenna array. The performance characteristics and detecting curves are modeled and compared for different useful signals parameters and for different number of elements of the antenna array. Results of researches in case of some additional conditions can be applied to a digital communications systems.

Keywords: Antenna array, detection curves, performance characteristics, quadrature processing, signal detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1774

749 Quality-Controlled Compression Method using Wavelet Transform for Electrocardiogram Signals

Authors: Redha Benzid, Farid Marir, Nour-Eddine Bouguechal

Abstract:

This paper presents a new Quality-Controlled, wavelet based, compression method for electrocardiogram (ECG) signals. Initially, an ECG signal is decomposed using the wavelet transform. Then, the resulting coefficients are iteratively thresholded to guarantee that a predefined goal percent root mean square difference (GPRD) is matched within tolerable boundaries. The quantization strategy of extracted non-zero wavelet coefficients (NZWC), according to the combination of RLE, HUFFMAN and arithmetic encoding of the NZWC and a resulting look up table, allow the accomplishment of high compression ratios with good quality reconstructed signals.

Keywords: ECG compression, Non-uniform Max-Lloydquantizer, PRD, Quality-Controlled, Wavelet transform

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1747

748 The Statistical Properties of Filtered Signals

Authors: Ephraim Gower, Thato Tsalaile, Monageng Kgwadi, Malcolm Hawksford.

Abstract:

In this paper, the statistical properties of filtered or convolved signals are considered by deriving the resulting density functions as well as the exact mean and variance expressions given a prior knowledge about the statistics of the individual signals in the filtering or convolution process. It is shown that the density function after linear convolution is a mixture density, where the number of density components is equal to the number of observations of the shortest signal. For circular convolution, the observed samples are characterized by a single density function, which is a sum of products.

Keywords: Circular Convolution, linear Convolution, mixture density function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1516

747 Bidirectional Dynamic Time Warping Algorithm for the Recognition of Isolated Words Impacted by Transient Noise Pulses

Authors: G. Tamulevičius, A. Serackis, T. Sledevič, D. Navakauskas

Abstract:

We consider the biggest challenge in speech recognition – noise reduction. Traditionally detected transient noise pulses are removed with the corrupted speech using pulse models. In this paper we propose to cope with the problem directly in Dynamic Time Warping domain. Bidirectional Dynamic Time Warping algorithm for the recognition of isolated words impacted by transient noise pulses is proposed. It uses simple transient noise pulse detector, employs bidirectional computation of dynamic time warping and directly manipulates with warping results. Experimental investigation with several alternative solutions confirms effectiveness of the proposed algorithm in the reduction of impact of noise on recognition process – 3.9% increase of the noisy speech recognition is achieved.

Keywords: Transient noise pulses, noise reduction, dynamic time warping, speech recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1945

746 Time-Delay Estimation Using Cross-ΨB-Energy Operator

Authors: Z. Saidi, A.O. Boudraa, J.C. Cexus, S. Bourennane

Abstract:

In this paper, a new time-delay estimation technique based on the cross IB-energy operator [5] is introduced. This quadratic energy detector measures how much a signal is present in another one. The location of the peak of the energy operator, corresponding to the maximum of interaction between the two signals, is the estimate of the delay. The method is a fully data-driven approach. The discrete version of the continuous-time form of the cross IBenergy operator, for its implementation, is presented. The effectiveness of the proposed method is demonstrated on real underwater acoustic signals arriving from targets and the results compared to the cross-correlation method.

Keywords: Teager-Kaiser energy operator, Cross-energyoperator, Time-Delay, Underwater acoustic signals.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5647

745 Unit Selection Algorithm Using Bi-grams Model For Corpus-Based Speech Synthesis

Authors: Mohamed Ali KAMMOUN, Ahmed Ben HAMIDA

Abstract:

In this paper, we present a novel statistical approach to corpus-based speech synthesis. Classically, phonetic information is defined and considered as acoustic reference to be respected. In this way, many studies were elaborated for acoustical unit classification. This type of classification allows separating units according to their symbolic characteristics. Indeed, target cost and concatenation cost were classically defined for unit selection. In Corpus-Based Speech Synthesis System, when using large text corpora, cost functions were limited to a juxtaposition of symbolic criteria and the acoustic information of units is not exploited in the definition of the target cost. In this manuscript, we token in our consideration the unit phonetic information corresponding to acoustic information. This would be realized by defining a probabilistic linguistic Bi-grams model basically used for unit selection. The selected units would be extracted from the English TIMIT corpora.

Keywords: Unit selection, Corpus-based Speech Synthesis, Bigram model

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1441

744 Puff Noise Detection and Cancellation for Robust Speech Recognition

Authors: Sangjun Park, Jungpyo Hong, Byung-Ok Kang, Yun-keun Lee, Minsoo Hahn

Abstract:

In this paper, an algorithm for detecting and attenuating puff noises frequently generated under the mobile environment is proposed. As a baseline system, puff detection system is designed based on Gaussian Mixture Model (GMM), and 39th Mel Frequency Cepstral Coefficient (MFCC) is extracted as feature parameters. To improve the detection performance, effective acoustic features for puff detection are proposed. In addition, detected puff intervals are attenuated by high-pass filtering. The speech recognition rate was measured for evaluation and confusion matrix and ROC curve are used to confirm the validity of the proposed system.

Keywords: Gaussian mixture model, puff detection and cancellation, speech enhancement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2233

743 The Estimation of Human Vital Signs Complexity

Authors: L. Bikulciene, E. Venskaityte, G. Jarusevicius

Abstract:

Nonstationary and nonlinear signals generated by living complex systems defy traditional mechanistic approaches, which are based on homeostasis. Previous our studies have shown that the evaluation of the interactions of physiological signals by using special analysis methods is suitable for observation of physiological processes. It is demonstrated the possibility of using deep physiological model, based on the interpretation of the changes of the human body’s functional states combined with an application of the analytical method based on matrix theory for the physiological signals analysis, which was applied on high risk cardiac patients. It is shown that evaluation of cardiac signals interactions show peculiar for each individual functional changes at the onset of hemodynamic restoration procedure. Therefore, we suggest that the alterations of functional state of the body, after patients overcome surgery can be complemented by the data received from the suggested approach of the evaluation of functional variables’ interactions.

Keywords: Cardiac diseases, Complex systems theory, ECG analysis, matrix analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2247

742 Analysis of Linguistic Disfluencies in Bilingual Children’s Discourse

Authors: Sheena Christabel Pravin, M. Palanivelan

Abstract:

Speech disfluencies are common in spontaneous speech. The primary purpose of this study was to distinguish linguistic disfluencies from stuttering disfluencies in bilingual Tamil–English (TE) speaking children. The secondary purpose was to determine whether their disfluencies are mediated by native language dominance and/or on an early onset of developmental stuttering at childhood. A detailed study was carried out to identify the prosodic and acoustic features that uniquely represent the disfluent regions of speech. This paper focuses on statistical modeling of repetitions, prolongations, pauses and interjections in the speech corpus encompassing bilingual spontaneous utterances from school going children – English and Tamil. Two classifiers including Hidden Markov Models (HMM) and the Multilayer Perceptron (MLP), which is a class of feed-forward artificial neural network, were compared in the classification of disfluencies. The results of the classifiers document the patterns of disfluency in spontaneous speech samples of school-aged children to distinguish between Children Who Stutter (CWS) and Children with Language Impairment CLI). The ability of the models in classifying the disfluencies was measured in terms of F-measure, Recall, and Precision.

Keywords: Bilingual, children who stutter, children with language impairment, Hidden Markov Models, multi-layer perceptron, linguistic disfluencies, stuttering disfluencies.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1029

741 An Eigen-Approach for Estimating the Direction-of Arrival of Unknown Number of Signals

Authors: Dia I. Abu-Al-Nadi, M. J. Mismar, T. H. Ismail

Abstract:

A technique for estimating the direction-of-arrival (DOA) of unknown number of source signals is presented using the eigen-approach. The eigenvector corresponding to the minimum eigenvalue of the autocorrelation matrix yields the minimum output power of the array. Also, the array polynomial with this eigenvector possesses roots on the unit circle. Therefore, the pseudo-spectrum is found by perturbing the phases of the roots one by one and calculating the corresponding array output power. The results indicate that the DOAs and the number of source signals are estimated accurately in the presence of a wide range of input noise levels.

Keywords: Array signal processing, direction-of-arrival, antenna arrays, eigenvalues, eigenvectors.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1378

740 Applications of Support Vector Machines on Smart Phone Systems for Emotional Speech Recognition

Authors: Wernhuar Tarng, Yuan-Yuan Chen, Chien-Lung Li, Kun-Rong Hsie, Mingteh Chen

Abstract:

An emotional speech recognition system for the applications on smart phones was proposed in this study to combine with 3G mobile communications and social networks to provide users and their groups with more interaction and care. This study developed a mechanism using the support vector machines (SVM) to recognize the emotions of speech such as happiness, anger, sadness and normal. The mechanism uses a hierarchical classifier to adjust the weights of acoustic features and divides various parameters into the categories of energy and frequency for training. In this study, 28 commonly used acoustic features including pitch and volume were proposed for training. In addition, a time-frequency parameter obtained by continuous wavelet transforms was also used to identify the accent and intonation in a sentence during the recognition process. The Berlin Database of Emotional Speech was used by dividing the speech into male and female data sets for training. According to the experimental results, the accuracies of male and female test sets were increased by 4.6% and 5.2% respectively after using the time-frequency parameter for classifying happy and angry emotions. For the classification of all emotions, the average accuracy, including male and female data, was 63.5% for the test set and 90.9% for the whole data set.

Keywords: Smart phones, emotional speech recognition, socialnetworks, support vector machines, time-frequency parameter, Mel-scale frequency cepstral coefficients (MFCC).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1842

739 A Dictionary Learning Method Based On EMD for Audio Sparse Representation

Authors: Yueming Wang, Zenghui Zhang, Rendong Ying, Peilin Liu

Abstract:

Sparse representation has long been studied and several dictionary learning methods have been proposed. The dictionary learning methods are widely used because they are adaptive. In this paper, a new dictionary learning method for audio is proposed. Signals are at first decomposed into different degrees of Intrinsic Mode Functions (IMF) using Empirical Mode Decomposition (EMD) technique. Then these IMFs form a learned dictionary. To reduce the size of the dictionary, the K-means method is applied to the dictionary to generate a K-EMD dictionary. Compared to K-SVD algorithm, the K-EMD dictionary decomposes audio signals into structured components, thus the sparsity of the representation is increased by 34.4% and the SNR of the recovered audio signals is increased by 20.9%.

Keywords: Dictionary Learning, EMD, K-means Method, Sparse Representation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2628

738 Various Information Obtained from Acoustic Emissions Owing to Discharges in XLPE Cable

Authors: Tatsuya Sakoda, Yuta Nakamura, Junichiro Kitajima, Masaki Sugiura, Satoshi Kurihara, Kenji Baba, Koichiro Kaneko, Takayoshi Yarimitsu

Abstract:

An acoustic emission (AE) technique is useful for detection of partial discharges (PDs) at a joint and a terminal section of a cross-linked polyethylene (XLPE) cable. For AE technique, it is not difficult to detect a PD using AE sensors. However, it is difficult to grasp whether the detected AE signal is owing to a single discharge or not. Additionally, when an AE technique is applied at a terminal section of a XLPE cable in salt pollution district, for example, there is possibility of detection of AE signals owing to creeping discharges on the surface of electric power apparatus. In this study, we evaluated AE signals in order to grasp what kind of information we can get from detected AE signals. The results showed that envelop detection of AE signal and a period which some AE signals were continuously detected were good indexes for estimating state-of-discharge.

Keywords: acoustic emission, creeping discharge, partial discharge, XLPE cable

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1644

737 Voice Disorders Identification Using Hybrid Approach: Wavelet Analysis and Multilayer Neural Networks

Authors: L. Salhi, M. Talbi, A. Cherif

Abstract:

This paper presents a new strategy of identification and classification of pathological voices using the hybrid method based on wavelet transform and neural networks. After speech acquisition from a patient, the speech signal is analysed in order to extract the acoustic parameters such as the pitch, the formants, Jitter, and shimmer. Obtained results will be compared to those normal and standard values thanks to a programmable database. Sounds are collected from normal people and patients, and then classified into two different categories. Speech data base is consists of several pathological and normal voices collected from the national hospital “Rabta-Tunis". Speech processing algorithm is conducted in a supervised mode for discrimination of normal and pathology voices and then for classification between neural and vocal pathologies (Parkinson, Alzheimer, laryngeal, dyslexia...). Several simulation results will be presented in function of the disease and will be compared with the clinical diagnosis in order to have an objective evaluation of the developed tool.

Keywords: Formants, Neural Networks, Pathological Voices, Pitch, Wavelet Transform.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2842