Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 9

Search results for: Spectrogram

9 Speaker Identification by Atomic Decomposition of Learned Features Using Computational Auditory Scene Analysis Principals in Noisy Environments

Authors: Thomas Bryan, Veton Kepuska, Ivica Kostanic

Abstract:

Speaker recognition is performed in high Additive White Gaussian Noise (AWGN) environments using principles of Computational Auditory Scene Analysis (CASA). CASA methods often classify sounds from images in the time-frequency (T-F) plane using spectrograms or cochleagrams as the image. In this paper, atomic decomposition implemented by matching pursuit performs a transform from time series speech signals to the T-F plane. The atomic decomposition creates a sparsely populated T-F vector in “weight space” where each populated T-F position contains an amplitude weight. The weight space vector along with the atomic dictionary represents a denoised, compressed version of the original signal. The arrangement of the atomic indices in the T-F vector is used for classification. Unsupervised feature learning implemented by a sparse autoencoder learns a single dictionary of basis features from a collection of envelope samples from all speakers. The approach is demonstrated using pairs of speakers from the TIMIT data set. Pairs of speakers are selected randomly from a single district. Each speaker has 10 sentences: two are used for training and eight for testing. Atomic index probabilities are created for each training sentence and for each test sentence. Classification is performed by finding the lowest Euclidean distance between the probabilities from the training sentences and the test sentences. Training is done at a 30 dB Signal-to-Noise Ratio (SNR). Testing is performed at SNRs of 0 dB, 5 dB, 10 dB and 30 dB. The algorithm has a baseline classification accuracy of ~93% averaged over 10 pairs of speakers from the TIMIT data set. The baseline accuracy is attributable to short sequences of training and test data as well as the overall simplicity of the classification algorithm. The accuracy is not affected by AWGN: the algorithm still achieves ~93% accuracy at 0 dB SNR.
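The classification step described above (atomic index probabilities compared by Euclidean distance) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the four-atom dictionary, and the toy index lists are all assumptions, and the matching pursuit and autoencoder stages are not shown.

```python
from collections import Counter
import math

def index_probabilities(atom_indices, dictionary_size):
    """Normalised histogram of how often each dictionary atom was selected."""
    counts = Counter(atom_indices)
    total = len(atom_indices)
    return [counts.get(k, 0) / total for k in range(dictionary_size)]

def euclidean(p, q):
    """Euclidean distance between two equal-length probability vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def classify(test_probs, training):
    """Return the label of the training vector closest to test_probs.

    training is a list of (speaker_label, probability_vector) pairs.
    """
    return min(training, key=lambda item: euclidean(test_probs, item[1]))[0]

# Hypothetical atom indices for two training sentences and one test sentence.
train = [
    ("speaker_A", index_probabilities([0, 0, 1, 2, 0, 1], 4)),
    ("speaker_B", index_probabilities([3, 3, 2, 3, 1, 3], 4)),
]
test_vec = index_probabilities([0, 1, 0, 0, 2, 3], 4)
predicted = classify(test_vec, train)
print(predicted)
```

With these toy values the test sentence is closest to the first training vector, so the sketch assigns it to `speaker_A`.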

Keywords: Time-frequency plane, atomic decomposition, envelope sampling, Gabor atoms, matching pursuit, sparse dictionary learning, sparse autoencoder.

Downloads: 946
8 Brainwave Classification for Brain Balancing Index (BBI) via 3D EEG Model Using k-NN Technique

Authors: N. Fuad, M. N. Taib, R. Jailani, M. E. Marwan

Abstract:

This paper presents a comparison of k-Nearest Neighbor (k-NN) algorithms for classifying 3D EEG models in brain balancing. The EEG signal recording was conducted on 51 healthy subjects. Development of the 3D EEG models involves pre-processing of raw EEG signals and construction of spectrogram images. Maximum PSD values were then extracted as features from the model. There are three indexes for a balanced brain: index 3, index 4 and index 5. The EEG signals differ significantly with the brain balancing index (BBI). Alpha-α (8–13 Hz) and beta-β (13–30 Hz) were used as input signals for the classification model. The k-NN classifier achieves 88.46% accuracy. These results show that k-NN can be used for prediction in the brain balancing application.
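As a rough illustration of k-NN classification on PSD-style features, the sketch below votes among the k nearest labelled samples. The two-dimensional feature vectors (standing in for maximum alpha and beta PSD values), the sample values, and the choice k = 3 are assumptions made for the example and are not taken from the paper.

```python
from collections import Counter
import math

def knn_predict(query, samples, k=3):
    """Majority vote among the k samples nearest to query.

    samples is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (alpha, beta) features, three labelled samples per index.
samples = [
    ((0.90, 0.20), "index 3"), ((1.00, 0.30), "index 3"), ((0.80, 0.25), "index 3"),
    ((0.50, 0.50), "index 4"), ((0.55, 0.45), "index 4"), ((0.45, 0.55), "index 4"),
    ((0.20, 0.90), "index 5"), ((0.25, 1.00), "index 5"), ((0.30, 0.85), "index 5"),
]

prediction = knn_predict((0.52, 0.48), samples, k=3)
print(prediction)
```

Here all three nearest neighbours of the query carry the label `index 4`, so the vote is unanimous.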

Keywords: Brain balancing, kNN, power spectral density, 3D EEG model.

Downloads: 2032
7 Characterization of 3D-MRP for Analyzing of Brain Balancing Index (BBI) Pattern

Authors: N. Fuad, M. N. Taib, R. Jailani, M. E. Marwan

Abstract:

This paper discusses power spectral density (PSD) characteristics extracted from three-dimensional (3D) electroencephalogram (EEG) models. The EEG signal recording was conducted on 150 healthy subjects. Development of the 3D EEG models involves pre-processing of raw EEG signals and construction of spectrogram images. The maximum PSD values were then extracted as features from the model. These features are analyzed using the mean relative power (MRP) and different mean relative power (DMRP) techniques to observe the pattern among different brain balancing indexes. The results showed that, by implementing these techniques, the pattern of brain balancing indexes can be clearly observed. Patterns are indicated from index 1 to index 5 for the left frontal (LF) and right frontal (RF) regions.

Keywords: Power spectral density, 3D EEG model, brain balancing, mean relative power, different mean relative power.

Downloads: 1513
6 A New Time-Frequency Speech Analysis Approach Based On Adaptive Fourier Decomposition

Authors: Liming Zhang

Abstract:

In this paper, a new adaptive Fourier decomposition (AFD) based time-frequency speech analysis approach is proposed. Because the fundamental frequency of speech signals often fluctuates, the classical short-time Fourier transform (STFT) based spectrogram analysis suffers from the difficulty of window size selection. AFD is a newly developed signal decomposition theory designed to deal with time-varying non-stationary signals. Its outstanding characteristic is that it provides an instantaneous frequency for each decomposed component, which makes time-frequency analysis easier. Experiments are conducted on a sample sentence from the TIMIT Acoustic-Phonetic Continuous Speech Corpus. The results show that the AFD based time-frequency distribution outperforms the STFT based one.
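The window size difficulty mentioned above comes from the STFT trade-off between time and frequency resolution. The small sketch below makes the arithmetic concrete; the helper name is hypothetical, the 16 kHz sampling rate is assumed because TIMIT audio is sampled at that rate, and the frequency figure is the DFT bin spacing (the effective resolution also depends on the window shape).

```python
def stft_resolution(sample_rate_hz, window_length):
    """Return (window duration in ms, DFT bin spacing in Hz) for one STFT frame."""
    time_res_ms = 1000.0 * window_length / sample_rate_hz
    freq_res_hz = sample_rate_hz / window_length
    return time_res_ms, freq_res_hz

# Longer windows sharpen frequency resolution but blur events in time.
for n in (128, 512, 2048):
    t_ms, f_hz = stft_resolution(16000, n)
    print(f"window={n:5d} samples  duration={t_ms:7.2f} ms  bin spacing={f_hz:8.4f} Hz")
```

A 128-sample window spans 8 ms but has 125 Hz bins, while a 2048-sample window has bins under 8 Hz wide but averages over 128 ms, which is long compared with typical pitch fluctuations in speech.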

Keywords: Adaptive Fourier decomposition, instantaneous frequency, speech analysis, time-frequency distribution.

Downloads: 1307
5 Multiclass Support Vector Machines for Environmental Sounds Classification Using log-Gabor Filters

Authors: S. Souli, Z. Lachiri

Abstract:

In this paper we propose a robust environmental sound classification approach based on spectrogram features derived from log-Gabor filters. This approach includes two methods. In the first method, the spectrograms are passed through an appropriate log-Gabor filter bank, and the outputs are averaged and then undergo an optimal feature selection procedure based on a mutual information criterion. The second method uses the same steps but applies them only to three patches extracted from each spectrogram.

To investigate the accuracy of the proposed methods, we conduct experiments using a large database containing 10 environmental sound classes. The classification results based on Multiclass Support Vector Machines show that the second method is the most efficient, with an average classification accuracy of 89.62%.

Keywords: Environmental sounds, Log-Gabor filters, Spectrogram, SVM Multiclass, Visual features.

Downloads: 1403
4 Study on Performance of Wigner Ville Distribution for Linear FM and Transient Signal Analysis

Authors: Azeemsha Thacham Poyil, Nasimudeen KM

Abstract:

This research paper presents methods to assess the performance of the Wigner Ville Distribution (WVD) for time-frequency representation of non-stationary signals, in comparison with other representations such as the STFT and the spectrogram. The simultaneous time-frequency resolution of the WVD is one of the important properties that make it preferable for analysis and detection of linear FM and transient signals. Two algorithms are proposed here to assess the resolution and to compare the performance of signal detection. The first method is based on measuring the area under the time-frequency plot and is used for linear FM signal analysis. The second method is based on the instantaneous power calculation and is used for transient, non-stationary signals. The implementation is explained briefly for both methods with suitable diagrams. The accuracy of the measurements is validated to show the better performance of the WVD representation in comparison with the STFT and spectrograms.
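For context, the standard continuous-time definition of the WVD and its well-known closed form for an ideal linear FM signal are given below. The symbols (start frequency f_0, chirp rate k) are notation assumed here for illustration and are not taken from the paper.

```latex
% Wigner Ville Distribution of a signal x(t)
W_x(t, f) = \int_{-\infty}^{\infty}
    x\!\left(t + \tfrac{\tau}{2}\right)\,
    x^{*}\!\left(t - \tfrac{\tau}{2}\right)\,
    e^{-j 2\pi f \tau}\, d\tau

% Ideal linear FM signal with start frequency f_0 and chirp rate k
x(t) = e^{\,j 2\pi \left(f_0 t + \tfrac{1}{2} k t^{2}\right)}

% The lag product reduces to a pure tone in \tau
x\!\left(t + \tfrac{\tau}{2}\right)\, x^{*}\!\left(t - \tfrac{\tau}{2}\right)
    = e^{\,j 2\pi (f_0 + k t)\,\tau}

% so the WVD is concentrated exactly on the instantaneous frequency line
W_x(t, f) = \delta\!\left(f - f_0 - k t\right)
```

This ideal concentration along the line f = f_0 + k t is what makes the area under the time-frequency plot a natural resolution measure for linear FM signals; a windowed spectrogram spreads the same signal over a band of nonzero width.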

Keywords: WVD: Wigner Ville Distribution, STFT: Short Time Fourier Transform, FT: Fourier Transform, TFR: Time-Frequency Representation, FM: Frequency Modulation, LFM Signal: Linear FM Signal, JTFA: Joint time frequency analysis.

Downloads: 2034
3 A Study of Visual Attention in Diagnosing Cerebellar Tumours

Authors: Kuryati Kipli, Kasumawati Lias, Dayang Azra Awang Mat, Al-Khalid Othman, Ade Syaheda Wani Marzuki, Nurdiani Zamhari

Abstract:

Visual attention allows a user to select the information most relevant to ongoing behaviour. This paper presents a study on: i) the performance of human measurements, ii) the accuracy of human measurements of the peaks that correspond to chemical quantities in Magnetic Resonance Spectroscopy (MRS) graphs, and iii) the effects of human measurements on the algorithm-based diagnosis. Participants' eye movements were recorded using an eye-tracker tool (Eyelink II). The experiment involves three participants examining 20 MRS graphs to estimate the peaks of chemical quantities which indicate the abnormalities associated with Cerebellar Tumours (CT). The status of each MRS is verified using a decision algorithm. The analysis involves determining humans' eye movement patterns in measuring the peaks of spectrograms, the scan paths, and the relationship between the distributions of fixation durations and the accuracy of measurement. In particular, the eye-tracking data revealed which aspects of the spectrogram received more visual attention and in what order they were viewed. This preliminary investigation provides a proof of concept for the use of eye tracking technology as the basis for expanded CT diagnosis.

Keywords: eye tracking, fixation durations, pattern, scan paths, spectrograms, visual.

Downloads: 1085
2 SySRA: A System of a Continuous Speech Recognition in Arab Language

Authors: Samir Abdelhamid, Noureddine Bouguechal

Abstract:

In this paper we report the model adopted by SySRA, our system for continuous speech recognition in the Arabic language, and the results obtained so far. The system uses the Arabdic-10 database, a manually segmented corpus of words for the Arabic language. Phonetic decoding is represented by an expert system whose knowledge base is expressed in the form of production rules. This expert system transforms a vocal signal into a phonetic lattice. The higher level of the system takes care of the recognition of the lattice thus obtained by deferring it in the form of written sentences (orthographical form). This level initially contains the lexical analyzer, which is none other than the recognition module. We subjected this analyzer to a set of spectrograms obtained by dictating a score of sentences in Arabic. The recognition rate for these sentences is about 70%, which is, to our knowledge, the best result for the recognition of the Arabic language. The test set consists of twenty sentences from four speakers who did not take part in the training.

Keywords: Continuous speech recognition, lexical analyzer, phonetic decoding, phonetic lattice, vocal signal.

Downloads: 1023
1 Sperm Whale Signal Analysis: Comparison using the Auto Regressive model and the Daubechies 15 Wavelets Transform

Authors: Olivier Adam, Maciej Lopatka, Christophe Laplanche, Jean-François Motsch

Abstract:

This article presents the results of using a parametric approach and a Wavelet Transform to analyse signals emitted by the sperm whale. Extracting the intrinsic characteristics of these unique signals emitted by marine mammals remains a difficult exercise for two reasons: firstly, the signals are non-stationary, and secondly, they are obstructed by interfering background noise. In this article, we compare the advantages and disadvantages of both methods: Auto Regressive models and the Wavelet Transform. These approaches serve as an alternative to the commonly used estimators based on the Fourier Transform, for which the hypotheses necessary for its application are, in certain cases, not sufficiently proven. These modern approaches provide effective results, particularly for the periodic tracking of the signal's characteristics and notably when the signal-to-noise ratio negatively affects signal tracking. Our objectives are twofold. Our first goal is to identify the animal through its acoustic signature. This includes recognition of the marine mammal species and ultimately of the individual animal (within the species). The second is much more ambitious and directly involves the intervention of cetologists to study the sounds emitted by marine mammals in an effort to characterize their behaviour. We are working on an approach based on recordings of marine mammal signals, and the findings from these data result from the Wavelet Transform. This article will explore the reasons for using this approach. In addition, thanks to the use of new processors, these algorithms, once heavy in calculation time, can be integrated into a real-time system.

Keywords: Autoregressive model, Daubechies Wavelet, Fourier Transform, marine mammals, signal processing, spectrogram, sperm whale, Wavelet Transform.

Downloads: 1662