Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 26

Search results for: MFCC

26 Neural Network Based Speech to Text in Malay Language

Authors: H. F. A. Abdul Ghani, R. R. Porle

Abstract:

Speech to text in Malay language is a system that converts Malay speech into text. The Malay language recognition system is still limited, thus, this paper aims to investigate the performance of ten Malay words obtained from the online Malay news. The methodology consists of three stages, which are preprocessing, feature extraction, and speech classification. In preprocessing stage, the speech samples are filtered using pre emphasis. After that, feature extraction method is applied to the samples using Mel Frequency Cepstrum Coefficient (MFCC). Lastly, speech classification is performed using Feedforward Neural Network (FFNN). The accuracy of the classification is further investigated based on the hidden layer size. From experimentation, the classifier with 40 hidden neurons shows the highest classification rate which is 94%.  

Keywords: Feed-Forward Neural Network, FFNN, Malay speech recognition, Mel Frequency Cepstrum Coefficient, MFCC, speech-to-text.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22
25 The Capacity of Mel Frequency Cepstral Coefficients for Speech Recognition

Authors: Fawaz S. Al-Anzi, Dia AbuZeina

Abstract:

Speech recognition is of an important contribution in promoting new technologies in human computer interaction. Today, there is a growing need to employ speech technology in daily life and business activities. However, speech recognition is a challenging task that requires different stages before obtaining the desired output. Among automatic speech recognition (ASR) components is the feature extraction process, which parameterizes the speech signal to produce the corresponding feature vectors. Feature extraction process aims at approximating the linguistic content that is conveyed by the input speech signal. In speech processing field, there are several methods to extract speech features, however, Mel Frequency Cepstral Coefficients (MFCC) is the popular technique. It has been long observed that the MFCC is dominantly used in the well-known recognizers such as the Carnegie Mellon University (CMU) Sphinx and the Markov Model Toolkit (HTK). Hence, this paper focuses on the MFCC method as the standard choice to identify the different speech segments in order to obtain the language phonemes for further training and decoding steps. Due to MFCC good performance, the previous studies show that the MFCC dominates the Arabic ASR research. In this paper, we demonstrate MFCC as well as the intermediate steps that are performed to get these coefficients using the HTK toolkit.

Keywords: Speech recognition, acoustic features, Mel Frequency Cepstral Coefficients.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1304
24 Forensic Speaker Verification in Noisy Environmental by Enhancing the Speech Signal Using ICA Approach

Authors: Ahmed Kamil Hasan Al-Ali, Bouchra Senadji, Ganesh Naik

Abstract:

We propose a system to real environmental noise and channel mismatch for forensic speaker verification systems. This method is based on suppressing various types of real environmental noise by using independent component analysis (ICA) algorithm. The enhanced speech signal is applied to mel frequency cepstral coefficients (MFCC) or MFCC feature warping to extract the essential characteristics of the speech signal. Channel effects are reduced using an intermediate vector (i-vector) and probabilistic linear discriminant analysis (PLDA) approach for classification. The proposed algorithm is evaluated by using an Australian forensic voice comparison database, combined with car, street and home noises from QUT-NOISE at a signal to noise ratio (SNR) ranging from -10 dB to 10 dB. Experimental results indicate that the MFCC feature warping-ICA achieves a reduction in equal error rate about (48.22%, 44.66%, and 50.07%) over using MFCC feature warping when the test speech signals are corrupted with random sessions of street, car, and home noises at -10 dB SNR.

Keywords: Noisy forensic speaker verification, ICA algorithm, MFCC, MFCC feature warping.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 526
23 A Non-Parametric Based Mapping Algorithm for Use in Audio Fingerprinting

Authors: Analise Borg, Paul Micallef

Abstract:

Over the past few years, the online multimedia collection has grown at a fast pace. Several companies showed interest to study the different ways to organise the amount of audio information without the need of human intervention to generate metadata. In the past few years, many applications have emerged on the market which are capable of identifying a piece of music in a short time. Different audio effects and degradation make it much harder to identify the unknown piece. In this paper, an audio fingerprinting system which makes use of a non-parametric based algorithm is presented. Parametric analysis is also performed using Gaussian Mixture Models (GMMs). The feature extraction methods employed are the Mel Spectrum Coefficients and the MPEG-7 basic descriptors. Bin numbers replaced the extracted feature coefficients during the non-parametric modelling. The results show that nonparametric analysis offer potential results as the ones mentioned in the literature.

Keywords: Audio fingerprinting, mapping algorithm, Gaussian Mixture Models, MFCC, MPEG-7.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1936
22 An Intelligent Text Independent Speaker Identification Using VQ-GMM Model Based Multiple Classifier System

Authors: Cheima Ben Soltane, Ittansa Yonas Kelbesa

Abstract:

Speaker Identification (SI) is the task of establishing identity of an individual based on his/her voice characteristics. The SI task is typically achieved by two-stage signal processing: training and testing. The training process calculates speaker specific feature parameters from the speech and generates speaker models accordingly. In the testing phase, speech samples from unknown speakers are compared with the models and classified. Even though performance of speaker identification systems has improved due to recent advances in speech processing techniques, there is still need of improvement. In this paper, a Closed-Set Tex-Independent Speaker Identification System (CISI) based on a Multiple Classifier System (MCS) is proposed, using Mel Frequency Cepstrum Coefficient (MFCC) as feature extraction and suitable combination of vector quantization (VQ) and Gaussian Mixture Model (GMM) together with Expectation Maximization algorithm (EM) for speaker modeling. The use of Voice Activity Detector (VAD) with a hybrid approach based on Short Time Energy (STE) and Statistical Modeling of Background Noise in the pre-processing step of the feature extraction yields a better and more robust automatic speaker identification system. Also investigation of Linde-Buzo-Gray (LBG) clustering algorithm for initialization of GMM, for estimating the underlying parameters, in the EM step improved the convergence rate and systems performance. It also uses relative index as confidence measures in case of contradiction in identification process by GMM and VQ as well. Simulation results carried out on voxforge.org speech database using MATLAB highlight the efficacy of the proposed method compared to earlier work.

Keywords: Feature Extraction, Speaker Modeling, Feature Matching, Mel Frequency Cepstrum Coefficient (MFCC), Gaussian mixture model (GMM), Vector Quantization (VQ), Linde-Buzo-Gray (LBG), Expectation Maximization (EM), pre-processing, Voice Activity Detection (VAD), Short Time Energy (STE), Background Noise Statistical Modeling, Closed-Set Tex-Independent Speaker Identification System (CISI).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1265
21 Robust Features for Impulsive Noisy Speech Recognition Using Relative Spectral Analysis

Authors: Hajer Rahali, Zied Hajaiej, Noureddine Ellouze

Abstract:

The goal of speech parameterization is to extract the relevant information about what is being spoken from the audio signal. In speech recognition systems Mel-Frequency Cepstral Coefficients (MFCC) and Relative Spectral Mel-Frequency Cepstral Coefficients (RASTA-MFCC) are the two main techniques used. It will be shown in this paper that it presents some modifications to the original MFCC method. In our work the effectiveness of proposed changes to MFCC called Modified Function Cepstral Coefficients (MODFCC) were tested and compared against the original MFCC and RASTA-MFCC features. The prosodic features such as jitter and shimmer are added to baseline spectral features. The above-mentioned techniques were tested with impulsive signals under various noisy conditions within AURORA databases.

Keywords: Auditory filter, impulsive noise, MFCC, prosodic features, RASTA filter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1994
20 Analysis of Combined Use of NN and MFCC for Speech Recognition

Authors: Safdar Tanweer, Abdul Mobin, Afshar Alam

Abstract:

The performance and analysis of speech recognition system is illustrated in this paper. An approach to recognize the English word corresponding to digit (0-9) spoken by 2 different speakers is captured in noise free environment. For feature extraction, speech Mel frequency cepstral coefficients (MFCC) has been used which gives a set of feature vectors from recorded speech samples. Neural network model is used to enhance the recognition performance. Feed forward neural network with back propagation algorithm model is used. However other speech recognition techniques such as HMM, DTW exist. All experiments are carried out on Matlab.

Keywords: Speech Recognition, MFCC, Neural Network, classifier.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2755
19 Evaluation of Features Extraction Algorithms for a Real-Time Isolated Word Recognition System

Authors: Tomyslav Sledevič, Artūras Serackis, Gintautas Tamulevičius, Dalius Navakauskas

Abstract:

Paper presents an comparative evaluation of features extraction algorithm for a real-time isolated word recognition system based on FPGA. The Mel-frequency cepstral, linear frequency cepstral, linear predictive and their cepstral coefficients were implemented in hardware/software design. The proposed system was investigated in speaker dependent mode for 100 different Lithuanian words. The robustness of features extraction algorithms was tested recognizing the speech records at different signal to noise rates. The experiments on clean records show highest accuracy for Mel-frequency cepstral and linear frequency cepstral coefficients. For records with 15 dB signal to noise rate the linear predictive cepstral coefficients gives best result. The hard and soft part of the system is clocked on 50 MHz and 100 MHz accordingly. For the classification purpose the pipelined dynamic time warping core was implemented. The proposed word recognition system satisfy the real-time requirements and is suitable for applications in embedded systems.

Keywords: Isolated word recognition, features extraction, MFCC, LFCC, LPCC, LPC, FPGA, DTW.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3176
18 Orchestra/Percussion Classification Algorithm for United Speech Audio Coding System

Authors: Yueming Wang, Rendong Ying, Sumxin Jiang, Peilin Liu

Abstract:

Unified Speech Audio Coding (USAC), the latest MPEG standardization for unified speech and audio coding, uses a speech/audio classification algorithm to distinguish speech and audio segments of the input signal. The quality of the recovered audio can be increased by well-designed orchestra/percussion classification and subsequent processing. However, owing to the shortcoming of the system, introducing an orchestra/percussion classification and modifying subsequent processing can enormously increase the quality of the recovered audio. This paper proposes an orchestra/percussion classification algorithm for the USAC system which only extracts 3 scales of Mel-Frequency Cepstral Coefficients (MFCCs) rather than traditional 13 scales of MFCCs and use Iterative Dichotomiser 3 (ID3) Decision Tree rather than other complex learning method, thus the proposed algorithm has lower computing complexity than most existing algorithms. Considering that frequent changing of attributes may lead to quality loss of the recovered audio signal, this paper also design a modified subsequent process to help the whole classification system reach an accurate rate as high as 97% which is comparable to classical 99%.

Keywords: ID3 Decision Tree, MFCC, Orchestra/Percussion Classification, USAC

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1325
17 Echo State Networks for Arabic Phoneme Recognition

Authors: Nadia Hmad, Tony Allen

Abstract:

This paper presents an ESN-based Arabic phoneme recognition system trained with supervised, forced and combined supervised/forced supervised learning algorithms. Mel-Frequency Cepstrum Coefficients (MFCCs) and Linear Predictive Code (LPC) techniques are used and compared as the input feature extraction technique. The system is evaluated using 6 speakers from the King Abdulaziz Arabic Phonetics Database (KAPD) for Saudi Arabia dialectic and 34 speakers from the Center for Spoken Language Understanding (CSLU2002) database of speakers with different dialectics from 12 Arabic countries. Results for the KAPD and CSLU2002 Arabic databases show phoneme recognition performances of 72.31% and 38.20% respectively.

Keywords: Arabic phonemes recognition, echo state networks (ESNs), neural networks (NNs), supervised learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2064
16 Efficient DTW-Based Speech Recognition System for Isolated Words of Arabic Language

Authors: Khalid A. Darabkh, Ala F. Khalifeh, Baraa A. Bathech, Saed W. Sabah

Abstract:

Despite the fact that Arabic language is currently one of the most common languages worldwide, there has been only a little research on Arabic speech recognition relative to other languages such as English and Japanese. Generally, digital speech processing and voice recognition algorithms are of special importance for designing efficient, accurate, as well as fast automatic speech recognition systems. However, the speech recognition process carried out in this paper is divided into three stages as follows: firstly, the signal is preprocessed to reduce noise effects. After that, the signal is digitized and hearingized. Consequently, the voice activity regions are segmented using voice activity detection (VAD) algorithm. Secondly, features are extracted from the speech signal using Mel-frequency cepstral coefficients (MFCC) algorithm. Moreover, delta and acceleration (delta-delta) coefficients have been added for the reason of improving the recognition accuracy. Finally, each test word-s features are compared to the training database using dynamic time warping (DTW) algorithm. Utilizing the best set up made for all affected parameters to the aforementioned techniques, the proposed system achieved a recognition rate of about 98.5% which outperformed other HMM and ANN-based approaches available in the literature.

Keywords: Arabic speech recognition, MFCC, DTW, VAD.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3717
15 Comparison of Parameterization Methods in Recognizing Spoken Arabic Digits

Authors: Ali Ganoun

Abstract:

This paper proposes evaluation of sound parameterization methods in recognizing some spoken Arabic words, namely digits from zero to nine. Each isolated spoken word is represented by a single template based on a specific recognition feature, and the recognition is based on the Euclidean distance from those templates. The performance analysis of recognition is based on four parameterization features: the Burg Spectrum Analysis, the Walsh Spectrum Analysis, the Thomson Multitaper Spectrum Analysis and the Mel Frequency Cepstral Coefficients (MFCC) features. The main aim of this paper was to compare, analyze, and discuss the outcomes of spoken Arabic digits recognition systems based on the selected recognition features. The results acqired confirm that the use of MFCC features is a very promising method in recognizing Spoken Arabic digits.

Keywords: Speech Recognition, Spectrum Analysis, Burg Spectrum, Walsh Spectrum Analysis, Thomson Multitaper Spectrum, MFCC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1180
14 Puff Noise Detection and Cancellation for Robust Speech Recognition

Authors: Sangjun Park, Jungpyo Hong, Byung-Ok Kang, Yun-keun Lee, Minsoo Hahn

Abstract:

In this paper, an algorithm for detecting and attenuating puff noises frequently generated under the mobile environment is proposed. As a baseline system, puff detection system is designed based on Gaussian Mixture Model (GMM), and 39th Mel Frequency Cepstral Coefficient (MFCC) is extracted as feature parameters. To improve the detection performance, effective acoustic features for puff detection are proposed. In addition, detected puff intervals are attenuated by high-pass filtering. The speech recognition rate was measured for evaluation and confusion matrix and ROC curve are used to confirm the validity of the proposed system.

Keywords: Gaussian mixture model, puff detection and cancellation, speech enhancement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1814
13 Musical Instrument Classification Using Embedded Hidden Markov Models

Authors: Ehsan Amid, Sina Rezaei Aghdam

Abstract:

In this paper, a novel method for recognition of musical instruments in a polyphonic music is presented by using an embedded hidden Markov model (EHMM). EHMM is a doubly embedded HMM structure where each state of the external HMM is an independent HMM. The classification is accomplished for two different internal HMM structures where GMMs are used as likelihood estimators for the internal HMMs. The results are compared to those achieved by an artificial neural network with two hidden layers. Appropriate classification accuracies were achieved both for solo instrument performance and instrument combinations which demonstrates that the new approach outperforms the similar classification methods by means of the dynamic of the signal.

Keywords: hidden Markov model (HMM), embedded hidden Markov models (EHMM), MFCC, musical instrument.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1525
12 Applications of Support Vector Machines on Smart Phone Systems for Emotional Speech Recognition

Authors: Wernhuar Tarng, Yuan-Yuan Chen, Chien-Lung Li, Kun-Rong Hsie, Mingteh Chen

Abstract:

An emotional speech recognition system for the applications on smart phones was proposed in this study to combine with 3G mobile communications and social networks to provide users and their groups with more interaction and care. This study developed a mechanism using the support vector machines (SVM) to recognize the emotions of speech such as happiness, anger, sadness and normal. The mechanism uses a hierarchical classifier to adjust the weights of acoustic features and divides various parameters into the categories of energy and frequency for training. In this study, 28 commonly used acoustic features including pitch and volume were proposed for training. In addition, a time-frequency parameter obtained by continuous wavelet transforms was also used to identify the accent and intonation in a sentence during the recognition process. The Berlin Database of Emotional Speech was used by dividing the speech into male and female data sets for training. According to the experimental results, the accuracies of male and female test sets were increased by 4.6% and 5.2% respectively after using the time-frequency parameter for classifying happy and angry emotions. For the classification of all emotions, the average accuracy, including male and female data, was 63.5% for the test set and 90.9% for the whole data set.

Keywords: Smart phones, emotional speech recognition, socialnetworks, support vector machines, time-frequency parameter, Mel-scale frequency cepstral coefficients (MFCC).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1498
11 Comparison of MFCC and Cepstral Coefficients as a Feature Set for PCG Biometric Systems

Authors: Justin Leo Cheang Loong, Khazaimatol S Subari, Muhammad Kamil Abdullah, Nurul Nadia Ahmad, RosliBesar

Abstract:

Heart sound is an acoustic signal and many techniques used nowadays for human recognition tasks borrow speech recognition techniques. One popular choice for feature extraction of accoustic signals is the Mel Frequency Cepstral Coefficients (MFCC) which maps the signal onto a non-linear Mel-Scale that mimics the human hearing. However the Mel-Scale is almost linear in the frequency region of heart sounds and thus should produce similar results with the standard cepstral coefficients (CC). In this paper, MFCC is investigated to see if it produces superior results for PCG based human identification system compared to CC. Results show that the MFCC system is still superior to CC despite linear filter-banks in the lower frequency range, giving up to 95% correct recognition rate for MFCC and 90% for CC. Further experiments show that the high recognition rate is due to the implementation of filter-banks and not from Mel-Scaling.

Keywords: Biometric, Phonocardiogram, Cepstral Coefficients, Mel Frequency

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3143
10 Speaker Identification Using Admissible Wavelet Packet Based Decomposition

Authors: Mangesh S. Deshpande, Raghunath S. Holambe

Abstract:

Mel Frequency Cepstral Coefficient (MFCC) features are widely used as acoustic features for speech recognition as well as speaker recognition. In MFCC feature representation, the Mel frequency scale is used to get a high resolution in low frequency region, and a low resolution in high frequency region. This kind of processing is good for obtaining stable phonetic information, but not suitable for speaker features that are located in high frequency regions. The speaker individual information, which is non-uniformly distributed in the high frequencies, is equally important for speaker recognition. Based on this fact we proposed an admissible wavelet packet based filter structure for speaker identification. Multiresolution capabilities of wavelet packet transform are used to derive the new features. The proposed scheme differs from previous wavelet based works, mainly in designing the filter structure. Unlike others, the proposed filter structure does not follow Mel scale. The closed-set speaker identification experiments performed on the TIMIT database shows improved identification performance compared to other commonly used Mel scale based filter structures using wavelets.

Keywords: Speaker identification, Wavelet transform, Feature extraction, MFCC, GMM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1579
9 Practical Method for Digital Music Matching Robust to Various Sound Qualities

Authors: Bokyung Sung, Jungsoo Kim, Jinman Kwun, Junhyung Park, Jihye Ryeo, Ilju Ko

Abstract:

In this paper, we propose a practical digital music matching system that is robust to variation in sound qualities. The proposed system is subdivided into two parts: client and server. The client part consists of the input, preprocessing and feature extraction modules. The preprocessing module, including the music onset module, revises the value gap occurring on the time axis between identical songs of different formats. The proposed method uses delta-grouped Mel frequency cepstral coefficients (MFCCs) to extract music features that are robust to changes in sound quality. According to the number of sound quality formats (SQFs) used, a music server is constructed with a feature database (FD) that contains different sub feature databases (SFDs). When the proposed system receives a music file, the selection module selects an appropriate SFD from a feature database; the selected SFD is subsequently used by the matching module. In this study, we used 3,000 queries for matching experiments in three cases with different FDs. In each case, we used 1,000 queries constructed by mixing 8 SQFs and 125 songs. The success rate of music matching improved from 88.6% when using single a single SFD to 93.2% when using quadruple SFDs. By this experiment, we proved that the proposed method is robust to various sound qualities.

Keywords: Digital Music, Music Matching, Variation in Sound Qualities, Robust Matching method.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1062
8 Improved Text-Independent Speaker Identification using Fused MFCC and IMFCC Feature Sets based on Gaussian Filter

Authors: Sandipan Chakroborty, Goutam Saha

Abstract:

A state of the art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC) modeled on the human auditory system has been used as a standard acoustic feature set for speech related applications. On a recent contribution by authors, it has been shown that the Inverted Mel- Frequency Cepstral Coefficients (IMFCC) is useful feature set for SI, which contains complementary information present in high frequency region. This paper introduces the Gaussian shaped filter (GF) while calculating MFCC and IMFCC in place of typical triangular shaped bins. The objective is to introduce a higher amount of correlation between subband outputs. The performances of both MFCC & IMFCC improve with GF over conventional triangular filter (TF) based implementation, individually as well as in combination. With GMM as speaker modeling paradigm, the performances of proposed GF based MFCC and IMFCC in individual and fused mode have been verified in two standard databases YOHO, (Microphone Speech) and POLYCOST (Telephone Speech) each of which has more than 130 speakers.

Keywords: Gaussian Filter, Triangular Filter, Subbands, Correlation, MFCC, IMFCC, GMM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1943
7 Voice Command Recognition System Based on MFCC and VQ Algorithms

Authors: Mahdi Shaneh, Azizollah Taheri

Abstract:

The goal of this project is to design a system to recognition voice commands. Most of voice recognition systems contain two main modules as follow “feature extraction" and “feature matching". In this project, MFCC algorithm is used to simulate feature extraction module. Using this algorithm, the cepstral coefficients are calculated on mel frequency scale. VQ (vector quantization) method will be used for reduction of amount of data to decrease computation time. In the feature matching stage Euclidean distance is applied as similarity criterion. Because of high accuracy of used algorithms, the accuracy of this voice command system is high. Using these algorithms, by at least 5 times repetition for each command, in a single training session, and then twice in each testing session zero error rate in recognition of commands is achieved.

Keywords: MFCC, Vector quantization, Vocal tract, Voicecommand.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2621
6 Effective Digital Music Retrieval System through Content-based Features

Authors: Bokyung Sung, Kwanghyo Koo, Jungsoo Kim, Myung-Bum Jung, Jinman Kwon, Ilju Ko

Abstract:

In this paper, we propose effective system for digital music retrieval. We divided proposed system into Client and Server. Client part consists of pre-processing and Content-based feature extraction stages. In pre-processing stage, we minimized Time code Gap that is occurred among same music contents. As content-based feature, first-order differentiated MFCC were used. These presented approximately envelop of music feature sequences. Server part included Music Server and Music Matching stage. Extracted features from 1,000 digital music files were stored in Music Server. In Music Matching stage, we found retrieval result through similarity measure by DTW. In experiment, we used 450 queries. These were made by mixing different compression standards and sound qualities from 50 digital music files. Retrieval accurate indicated 97% and retrieval time was average 15ms in every single query. Out experiment proved that proposed system is effective in retrieve digital music and robust at various user environments of web.

Keywords: Music Retrieval, Content-based, Music Feature and Digital Music.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1096
5 Improved Closed Set Text-Independent Speaker Identification by Combining MFCC with Evidence from Flipped Filter Banks

Authors: Sandipan Chakroborty, Anindya Roy, Goutam Saha

Abstract:

A state of the art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC) modeled on the human auditory system has been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, it captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker specific cues present in the higher frequency zone. Unlike high level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set outperforms baseline MFCC significantly. This proposition is validated by experiments conducted on two different kinds of public databases namely YOHO (microphone speech) and POLYCOST (telephone speech) with Gaussian Mixture Models (GMM) as a Classifier for various model orders.

Keywords: Complementary Information, Filter Bank, GMM, IMFCC, MFCC, Speaker Identification, Speaker Recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1697
4 Automatic Detection of Syllable Repetition in Read Speech for Objective Assessment of Stuttered Disfluencies

Authors: K. M. Ravikumar, Balakrishna Reddy, R. Rajagopal, H. C. Nagaraj

Abstract:

Automatic detection of syllable repetition is one of the important parameter in assessing the stuttered speech objectively. The existing method which uses artificial neural network (ANN) requires high levels of agreement as prerequisite before attempting to train and test ANNs to separate fluent and nonfluent. We propose automatic detection method for syllable repetition in read speech for objective assessment of stuttered disfluencies which uses a novel approach and has four stages comprising of segmentation, feature extraction, score matching and decision logic. Feature extraction is implemented using well know Mel frequency Cepstra coefficient (MFCC). Score matching is done using Dynamic Time Warping (DTW) between the syllables. The Decision logic is implemented by Perceptron based on the score given by score matching. Although many methods are available for segmentation, in this paper it is done manually. Here the assessment by human judges on the read speech of 10 adults who stutter are described using corresponding method and the result was 83%.

Keywords: Assessment, DTW, MFCC, Objective, Perceptron, Stuttering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2302
3 Spectral Analysis of Speech: A New Technique

Authors: Neeta Awasthy, J.P.Saini, D.S.Chauhan

Abstract:

ICA which is generally used for blind source separation problem has been tested for feature extraction in Speech recognition system to replace the phoneme based approach of MFCC. Applying the Cepstral coefficients generated to ICA as preprocessing has developed a new signal processing approach. This gives much better results against MFCC and ICA separately, both for word and speaker recognition. The mixing matrix A is different before and after MFCC as expected. As Mel is a nonlinear scale. However, cepstrals generated from Linear Predictive Coefficient being independent prove to be the right candidate for ICA. Matlab is the tool used for all comparisons. The database used is samples of ISOLET.

Keywords: Cepstral Coefficient, Distance measures, Independent Component Analysis, Linear Predictive Coefficients.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1528
2 Speaker Identification by Joint Statistical Characterization in the Log Gabor Wavelet Domain

Authors: Suman Senapati, Goutam Saha

Abstract:

Real world Speaker Identification (SI) application differs from ideal or laboratory conditions causing perturbations that leads to a mismatch between the training and testing environment and degrade the performance drastically. Many strategies have been adopted to cope with acoustical degradation; wavelet based Bayesian marginal model is one of them. But Bayesian marginal models cannot model the inter-scale statistical dependencies of different wavelet scales. Simple nonlinear estimators for wavelet based denoising assume that the wavelet coefficients in different scales are independent in nature. However wavelet coefficients have significant inter-scale dependency. This paper enhances this inter-scale dependency property by a Circularly Symmetric Probability Density Function (CS-PDF) related to the family of Spherically Invariant Random Processes (SIRPs) in Log Gabor Wavelet (LGW) domain and corresponding joint shrinkage estimator is derived by Maximum a Posteriori (MAP) estimator. A framework is proposed based on these to denoise speech signal for automatic speaker identification problems. The robustness of the proposed framework is tested for Text Independent Speaker Identification application on 100 speakers of POLYCOST and 100 speakers of YOHO speech database in three different noise environments. Experimental results show that the proposed estimator yields a higher improvement in identification accuracy compared to other estimators on popular Gaussian Mixture Model (GMM) based speaker model and Mel-Frequency Cepstral Coefficient (MFCC) features.

Keywords: Speaker Identification, Log Gabor Wavelet, Bayesian Bivariate Estimator, Circularly Symmetric Probability Density Function, SIRP.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1278
1 Investigation of Combined use of MFCC and LPC Features in Speech Recognition Systems

Authors: К. R. Aida–Zade, C. Ardil, S. S. Rustamov

Abstract:

Statement of the automatic speech recognition problem, the assignment of speech recognition and the application fields are shown in the paper. At the same time as Azerbaijan speech, the establishment principles of speech recognition system and the problems arising in the system are investigated. The computing algorithms of speech features, being the main part of speech recognition system, are analyzed. From this point of view, the determination algorithms of Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC) coefficients expressing the basic speech features are developed. Combined use of cepstrals of MFCC and LPC in speech recognition system is suggested to improve the reliability of speech recognition system. To this end, the recognition system is divided into MFCC and LPC-based recognition subsystems. The training and recognition processes are realized in both subsystems separately, and recognition system gets the decision being the same results of each subsystems. This results in decrease of error rate during recognition. The training and recognition processes are realized by artificial neural networks in the automatic speech recognition system. The neural networks are trained by the conjugate gradient method. In the paper the problems observed by the number of speech features at training the neural networks of MFCC and LPC-based speech recognition subsystems are investigated. The variety of results of neural networks trained from different initial points in training process is analyzed. Methodology of combined use of neural networks trained from different initial points in speech recognition system is suggested to improve the reliability of recognition system and increase the recognition quality, and obtained practical results are shown.

Keywords: Speech recognition, cepstral analysis, Voice activation detection algorithm, Mel Frequency Cepstral Coefficients, features of speech, Cepstral Mean Subtraction, neural networks, Linear Predictive Coding.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 450