Search results for: Speech segmentation
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 566

536 A Quantum-Inspired Evolutionary Algorithm for Multiobjective Image Segmentation

Authors: Hichem Talbi, Mohamed Batouche, Amer Draa

Abstract:

In this paper we present a new approach to image segmentation. A single segmentation result does not generally allow a higher-level process to take into account all the elements included in the image, which has motivated treating image segmentation as a multiobjective optimization problem. The proposed algorithm adopts a split/merge strategy that uses the result of the k-means algorithm as input for a quantum evolutionary algorithm to establish a set of non-dominated solutions. Candidate solutions are evaluated simultaneously on two distinct features: intra-region homogeneity and inter-region heterogeneity. Experiments on natural images demonstrated the efficiency and usefulness of the new approach.

Keywords: Image segmentation, multiobjective optimization, quantum computing, evolutionary algorithms.

Downloads: 2323
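
The "set of non-dominated solutions" mentioned in the abstract above can be illustrated with a small Pareto filter. This is only a sketch, not the authors' quantum evolutionary algorithm: the score pairs are made up, and it assumes both objectives (intra-region homogeneity and inter-region heterogeneity) are to be maximized.

```python
from typing import List, Tuple

# (intra-region homogeneity, inter-region heterogeneity); both assumed "higher is better".
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def non_dominated(scores: List[Score]) -> List[Score]:
    """Keep only the candidates that no other candidate dominates."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]


# Hypothetical scores for five candidate segmentations.
candidates = [(0.9, 0.2), (0.7, 0.6), (0.4, 0.8), (0.6, 0.5), (0.3, 0.3)]
front = non_dominated(candidates)
print(front)
```

The last two candidates are dropped because (0.7, 0.6) is better on both objectives; the remaining three are trade-offs that a higher-level process could choose among.
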
535 Automatic Segmentation of Thigh Magnetic Resonance Images

Authors: Lorena Urricelqui, Armando Malanda, Arantxa Villanueva

Abstract:

Purpose: To develop a method for automatic segmentation of adipose and muscular tissue in thighs from magnetic resonance images. Materials and methods: Thirty obese women were scanned on a Siemens Impact Expert 1T resonance machine, and 1500 images were finally used in the tests. The developed segmentation method is a recursive, multilevel process that makes use of several concepts such as shaped histograms, adaptive thresholding and connectivity. The segmentation process was implemented in Matlab and operates without any user interaction. The whole set of images was segmented with the developed method. An expert radiologist segmented the same set of images following a manual procedure with the aid of the SliceOmatic software (Tomovision). These manual segmentations constituted our 'gold standard'. Results: The number of coincident pixels between the automatic and manual segmentation procedures was measured. The average agreement was above 90% in most of the images. Conclusions: The proposed approach allows effective automatic segmentation of MRIs from thighs, comparable to expert manual performance.

Keywords: Segmentation, thigh, magnetic resonance image, fat, muscle.

Downloads: 1870
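
The evaluation in the abstract above counts coincident pixels between the automatic and the manual segmentation. A minimal sketch of such a measure follows; the label coding (0 background, 1 fat, 2 muscle) and the tiny masks are assumptions for illustration, not data from the study.

```python
def pixel_agreement(auto_mask, manual_mask):
    """Percentage of pixels that carry the same label in both masks."""
    total = 0
    same = 0
    for row_auto, row_manual in zip(auto_mask, manual_mask):
        for a, m in zip(row_auto, row_manual):
            total += 1
            same += int(a == m)
    return 100.0 * same / total


# Hypothetical 3x4 label masks: 0 = background, 1 = fat, 2 = muscle.
auto = [[0, 1, 1, 0],
        [0, 2, 2, 1],
        [0, 2, 2, 0]]
manual = [[0, 1, 1, 0],
          [0, 2, 2, 1],
          [0, 2, 1, 0]]

agreement = pixel_agreement(auto, manual)
print(f"{agreement:.1f}% of pixels agree")
```
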
534 Using Teager Energy Cepstrum and HMM distances in Automatic Speech Recognition and Analysis of Unvoiced Speech

Authors: Panikos Heracleous

Abstract:

In this study, the use of a silicon NAM (Non-Audible Murmur) microphone in automatic speech recognition is presented. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible) speech, but also very quietly uttered speech (non-audible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech conversion etc.) for sound-impaired people. Using a small amount of training data and adaptation approaches, 93.9% word accuracy was achieved for a 20k Japanese vocabulary dictation task. Non-audible murmur recognition in noisy environments is also investigated. In this study, further analysis of the NAM speech has been made using distance measures between hidden Markov model (HMM) pairs. Using a distance metric, the spectral space of NAM speech is shown to be reduced; however, the locations of the different NAM phonemes are similar to those of the phonemes of normal speech, and the NAM sounds are well discriminated. Promising results in using nonlinear features are also introduced, especially under noisy conditions.

Keywords: Speech recognition, unvoiced speech, nonlinear features, HMM distance measures

Downloads: 1616
533 Analysis of Combined Use of NN and MFCC for Speech Recognition

Authors: Safdar Tanweer, Abdul Mobin, Afshar Alam

Abstract:

This paper illustrates the performance and analysis of a speech recognition system. The English words corresponding to the digits (0-9), spoken by 2 different speakers, are captured in a noise-free environment and recognized. For feature extraction, Mel frequency cepstral coefficients (MFCC) are used, which give a set of feature vectors from the recorded speech samples. A neural network model is used to enhance the recognition performance: a feed-forward neural network trained with the back-propagation algorithm. Other speech recognition techniques, such as HMM and DTW, also exist. All experiments are carried out in Matlab.

Keywords: Speech Recognition, MFCC, Neural Network, classifier.

Downloads: 3231
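
MFCC feature extraction, as used in the entry above, places its filters on the mel frequency scale. The sketch below shows one widely used Hz-to-mel mapping, 2595 * log10(1 + f / 700); the abstract does not say which variant the authors used, so the formula, the 0 to 4000 Hz range and the count of 10 points are assumptions.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (common 2595 * log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(f_min, f_max, count):
    """Return `count` frequencies (Hz) equally spaced on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (count - 1)
    return [mel_to_hz(m_min + i * step) for i in range(count)]


# Assumed analysis band of 0 to 4000 Hz, sampled at 10 mel-spaced points.
centers = mel_spaced_frequencies(0.0, 4000.0, 10)
print([round(c) for c in centers])
```

Points that are evenly spaced in mels grow further apart in Hz as frequency rises, which is why mel filterbanks are denser at low frequencies.
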
532 Image Segmentation by Mathematical Morphology: An Approach through Linear, Bilinear and Conformal Transformation

Authors: Dibyendu Ghoshal, Pinaki Pratim Acharjya

Abstract:

An image segmentation process based on mathematical morphology is studied in this paper. It is established from the first principles of the morphological process that, although the segmentation as a whole is a nonlinear signal processing task, its constituent intermediate steps are linear, bilinear and conformal transformations, and they give rise to a nonlinear effect in a cumulative manner.

Keywords: Image segmentation, linear transform, bilinear transform, conformal transform, mathematical morphology.

Downloads: 2141
531 On SNR Estimation by the Likelihood of near Pitch for Speech Detection

Authors: Young-Hwan Song, Doo-Heon Kyun, Jong-Kuk Kim, Myung-Jin Bae

Abstract:

People have a habitual pitch level that they generally use when speaking. However, this pitch changes irregularly in the presence of noise, so it is useful to estimate the SNR of a speech signal from its pitch. In this paper, we obtain the energy of the input speech signal and then detect a stationary region of voiced speech. We obtain the pitch period by NAMDF for the stationary region, in which the pitch does not vary rapidly. After obtaining the pitch, each frame is divided by pitch period and the likelihood of closed pitch is estimated. We propose a new parameter, NLF, to estimate the SNR of the received speech signal. The NLF is derived from the correlation of near pitch periods and is obtained for each stationary region in voiced speech. Finally, we confirmed good performance in estimating the SNR of received input speech in the presence of noise.

Keywords: Likelihood, pitch, SNR, speech.

Downloads: 1543
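
The pitch-period step in the abstract above relies on NAMDF. As a rough illustration of the underlying idea, the sketch below uses a plain average magnitude difference function (AMDF), averaged over the overlapping samples at each lag; the exact normalization used by the authors is not given, and the sampling rate, test tone and lag range are assumptions.

```python
import math


def amdf(frame, lag):
    """Average magnitude difference between the frame and itself shifted by `lag`."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def pitch_period(frame, min_lag, max_lag):
    """Lag (in samples) with the smallest AMDF value inside the search range."""
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(frame, lag))


# Synthetic "voiced" frame: a 100 Hz sine sampled at 8 kHz (period of 80 samples).
fs = 8000
f0 = 100
frame = [math.sin(2.0 * math.pi * f0 * n / fs) for n in range(400)]

period = pitch_period(frame, 40, 120)
print(period, "samples, about", fs / period, "Hz")
```

The AMDF dips towards zero when the lag matches the pitch period, so the search range should be chosen to exclude multiples of the expected period.
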
530 Dual Pyramid of Agents for Image Segmentation

Authors: K. Idir, H. Merouani, Y. Tlili.

Abstract:

Mammographic screening is an effective method for the early detection of breast cancer. One of the most important signs of early breast cancer is the presence of microcalcifications. For the detection of microcalcifications in a mammography image, we propose a multiagent system based on a dual irregular pyramid. An initial segmentation is obtained by an incremental approach; the result represents level zero of the pyramid. The edge information obtained by applying the Canny filter is taken into account to refine the segmentation. The edge-agents and region-agents cooperate level by level through the pyramid, exploiting its various characteristics to ensure the convergence of the segmentation process.

Keywords: Dual Pyramid, Image Segmentation, Multi-agent System, Region/Edge Cooperation.

Downloads: 1890
529 Deficiencies of Lung Segmentation Techniques using CT Scan Images for CAD

Authors: Nisar Ahmed Memon, Anwar Majid Mirza, S.A.M. Gilani

Abstract:

Segmentation is an important step in medical image analysis and classification for radiological evaluation or computer aided diagnosis. This paper presents the problem of inaccurate lung segmentation as observed in algorithms presented by researchers working in the area of medical image analysis. The different lung segmentation techniques have been tested using the dataset of 19 patients consisting of a total of 917 images. We obtained datasets of 11 patients from Ackron University, USA and of 8 patients from AGA Khan Medical University, Pakistan. After testing the algorithms against datasets, the deficiencies of each algorithm have been highlighted.

Keywords: Computer Aided Diagnosis (CAD), Mathematical Morphology, Medical Image Analysis, Region Growing, Segmentation, Thresholding.

Downloads: 2309
528 Speech Impact Realization via Manipulative Argumentation Techniques in Modern American Political Discourse

Authors: Zarine Avetisyan

Abstract:

The present paper presents the discussion of scholars concerning speech impact, peculiarities of its realization, speech strategies and techniques in particular. Departing from the viewpoints of many prominent linguists, the paper suggests that manipulative argumentation be viewed as a most pervasive speech strategy with a certain set of techniques which are to be found in modern American political discourse. The precedence of their occurrence allows us to regard them as pragmatic patterns of speech impact realization in effective public speaking.

Keywords: Manipulative argumentation, political discourse, speech impact, technique.

Downloads: 2235
527 Review of the Software Used for 3D Volumetric Reconstruction of the Liver

Authors: P. Strakos, M. Jaros, T. Karasek, T. Kozubek, P. Vavra, T. Jonszta

Abstract:

In medical imaging, segmentation of different areas of the human body such as bones, organs and tissues is an important issue. Image segmentation allows isolating the object of interest for further processing that can lead, for example, to 3D model reconstruction of whole organs. The difficulty of this procedure varies from trivial for bones to quite difficult for organs like the liver. The liver is considered one of the most difficult organs of the human body to segment, mainly because of its complexity, shape versatility and proximity to other organs and tissues. Because of these factors, substantial user effort usually has to be applied to obtain satisfactory image segmentation results, and the process degrades from automatic or semi-automatic to fairly manual. In this paper, an overview of selected available software applications that can handle semi-automatic image segmentation with further 3D volume reconstruction of the human liver is presented. The applications are evaluated based on the segmentation results of several consecutive DICOM images covering the abdominal area of the human body.

Keywords: Image segmentation, semi-automatic, software, 3D volumetric reconstruction.

Downloads: 4428
526 Speech Enhancement Using Kalman Filter in Communication

Authors: Eng. Alaa K. Satti Salih

Abstract:

In applications such as telecommunications, hands-free communications and recording, which need at least one microphone, the signal is usually corrupted by noise and echo. An important application is speech enhancement, which removes the noise and echoes picked up by a microphone alongside the desired speech. Accordingly, the microphone signal has to be cleaned using digital signal processing (DSP) tools before it is played out, transmitted, or stored. Engineers have so far tried different approaches to improving the speech by recovering the desired speech signal from the noisy observations, especially in mobile communication. In this paper, the speech signal, observed in additive background noise, is reconstructed using the Kalman filter technique to estimate the parameters of the autoregressive (AR) process in the state-space model, and the output speech signal is obtained in MATLAB. Accurate estimation by the Kalman filter enhances the speech and reduces the noise; the actual and estimated values that produce the reconstructed signals are then compared and discussed.

Keywords: Autoregressive Process, Kalman filter, Matlab and Noise speech.

Downloads: 3991
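
To make the state-space idea in the abstract above concrete, here is a minimal scalar Kalman filter for a first-order AR process observed in additive noise. It is a sketch under strong assumptions rather than the paper's method: the AR coefficient `a` and the noise variances `q` and `r` are taken as known (the paper estimates the AR parameters), the signal is synthetic, and Python stands in for MATLAB.

```python
import random


def kalman_ar1(observations, a, q, r, x0=0.0, p0=1.0):
    """Filter y[k] = x[k] + v[k], where x[k] = a * x[k-1] + w[k].

    q is the variance of the process noise w, r the variance of the
    measurement noise v. Returns the filtered state estimates.
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Predict the next state and its error variance.
        x_pred = a * x
        p_pred = a * a * p + q
        # Update with the new observation.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
        estimates.append(x)
    return estimates


def mse(u, v):
    """Mean squared error between two equal-length sequences."""
    return sum((s - t) ** 2 for s, t in zip(u, v)) / len(u)


# Synthetic AR(1) "clean" signal plus white measurement noise (all values assumed).
random.seed(0)
a, q, r = 0.95, 0.01, 0.25
clean, noisy, state = [], [], 0.0
for _ in range(500):
    state = a * state + random.gauss(0.0, q ** 0.5)
    clean.append(state)
    noisy.append(state + random.gauss(0.0, r ** 0.5))

estimates = kalman_ar1(noisy, a, q, r)
print("MSE before filtering:", round(mse(noisy, clean), 4))
print("MSE after filtering: ", round(mse(estimates, clean), 4))
```

Comparing the error before and after filtering mirrors the abstract's comparison of actual and estimated values.
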
525 Medical Image Segmentation and Detection of MR Images Based on Spatial Multiple-Kernel Fuzzy C-Means Algorithm

Authors: J. Mehena, M. C. Adhikary

Abstract:

In this paper, a spatial multiple-kernel fuzzy C-means (SMKFCM) algorithm is introduced for the segmentation problem. A linear combination of multiple kernels with spatial information is used in the kernel FCM (KFCM), and the updating rules for the linear coefficients of the composite kernels are derived as well. Fuzzy C-means (FCM) based techniques have been widely used in medical image segmentation due to their simplicity and fast convergence. The proposed SMKFCM algorithm provides a new, flexible vehicle to fuse different pixel information in medical image segmentation and detection of MR images. To evaluate the robustness of the proposed segmentation algorithm in a noisy environment, we add noise to medical brain tumor MR images and calculate the success rate and segmentation accuracy. The experimental results show that the proposed algorithm performs better than other FCM-based techniques on noisy medical MR images.

Keywords: Clustering, fuzzy C-means, image segmentation, MR images, multiple kernels.

Downloads: 2102
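
For orientation, the baseline that the entry above extends is standard fuzzy C-means. The sketch below implements only that baseline on one-dimensional intensities, with no kernels and no spatial term, so it is not the SMKFCM algorithm; the intensities, initial centers and fuzzifier `m = 2` are assumptions.

```python
def fcm_1d(values, centers, m=2.0, iterations=20):
    """Plain fuzzy C-means on scalar values.

    Returns the final centers and the membership rows computed in the
    last iteration (one row per value, one column per cluster).
    """
    eps = 1e-12  # avoids division by zero when a value sits exactly on a center
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        memberships = []
        for x in values:
            dists = [abs(x - c) + eps for c in centers]
            # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1))
            row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
            memberships.append(row)
        # Each center becomes the mean of the values weighted by u^m.
        centers = [
            sum((u[i] ** m) * x for u, x in zip(memberships, values))
            / sum(u[i] ** m for u in memberships)
            for i in range(len(centers))
        ]
    return centers, memberships


# Hypothetical pixel intensities: a dark group and a bright group.
intensities = [10.0, 12.0, 11.0, 200.0, 205.0, 198.0]
final_centers, final_memberships = fcm_1d(intensities, centers=[0.0, 255.0])
print([round(c, 1) for c in final_centers])
```

Each membership row sums to one, so a pixel can belong partly to several clusters, which is what distinguishes FCM from hard clustering.
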
524 Segmental and Subsegmental Lung Vessel Segmentation in CTA Images

Authors: H. Özkan

Abstract:

In this paper, a novel and fast algorithm for segmental and subsegmental lung vessel segmentation is introduced using Computed Tomography Angiography images. This process is particularly important for the detection of pulmonary embolism, lung nodules, and interstitial lung disease. The applied method is carried out in five steps. In the first step, lung segmentation is achieved. In the second, the images are thresholded and the differences between the images are detected. In the third, the left and right lungs are combined with the differences obtained in the second step to produce the Exact Lung Image (ELI). In the fourth, the image thresholded for vessels is combined with the ELI. Lastly, the segmental and subsegmental lung vessels are identified and segmented using the image obtained in the fourth step. The performance of the applied method was found to be quite good for radiologists, and it gives results adequate for surgical purposes.

Keywords: Computed tomography angiography (CTA), Computer aided detection (CAD), Lung segmentation, Lung vessel segmentation

Downloads: 2136
523 An Automatic Gridding and Contour Based Segmentation Approach Applied to DNA Microarray Image Analysis

Authors: Alexandra Oliveros, Miguel Sotaquirá

Abstract:

DNA microarray technology is widely used by geneticists to diagnose or treat diseases through gene expression. This technology is based on the hybridization of a tissue's DNA sequence into a substrate and the further analysis of the image formed by the thousands of genes in the DNA as green, red or yellow spots. The process of DNA microarray image analysis involves finding the location of the spots and quantifying their expression level. In this paper, a tool to perform DNA microarray image analysis is presented, including a spot addressing method based on the image projections, spot segmentation through contour-based segmentation, and the extraction of relevant information related to gene expression.

Keywords: Contour segmentation, DNA microarrays, edge detection, image processing, segmentation, spot addressing.

Downloads: 1363
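
Spot addressing "based on the image projections", as described in the entry above, can be pictured as summing the image along rows and columns and looking for runs where the profile rises above the background. The sketch below is one simple reading of that idea; the tiny image and the zero threshold are assumptions, and the authors' actual gridding procedure may differ.

```python
def projections(image):
    """Row sums and column sums of a 2-D intensity grid."""
    row_profile = [sum(row) for row in image]
    col_profile = [sum(col) for col in zip(*image)]
    return row_profile, col_profile


def runs_above(profile, threshold):
    """Inclusive (start, end) index pairs of consecutive entries above the threshold."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs


# Hypothetical 5x7 microarray patch with three bright spots on a dark background.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 8, 0],
    [0, 9, 9, 0, 0, 8, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 7, 7, 0, 0, 0, 0],
]

row_profile, col_profile = projections(image)
grid_rows = runs_above(row_profile, 0)
grid_cols = runs_above(col_profile, 0)
print("row bands:", grid_rows, "column bands:", grid_cols)
```

The intersections of the row bands and column bands give candidate grid cells, each of which can then be passed to a spot segmentation step.
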
522 Color Image Segmentation and Multi-Level Thresholding by Maximization of Conditional Entropy

Authors: R.Sukesh Kumar, Abhisek Verma, Jasprit Singh

Abstract:

In this work a novel approach for color image segmentation using higher-order entropy as a textural feature for determination of thresholds over a two-dimensional image histogram is discussed. A similar approach is applied to achieve multi-level thresholding in both grayscale and color images. The paper discusses two methods of color image segmentation using RGB space as the standard processing space. The threshold for segmentation is decided by the maximization of conditional entropy in the two-dimensional histogram of the color image separated into three grayscale images of R, G and B. The features are first developed independently for the three (R, G, B) spaces, and combined to get different color component segmentation. Considering local maxima instead of the global maximum of conditional entropy yields multiple thresholds for the same image, which forms the basis for multilevel thresholding.

Keywords: conditional entropy, multi-level thresholding, segmentation, two dimensional image histogram

Downloads: 2965
521 Nature Inspired Metaheuristic Algorithms for Multilevel Thresholding Image Segmentation - A Survey

Authors: C. Deepika, J. Nithya

Abstract:

Segmentation is one of the essential tasks in image processing. Thresholding is one of the simplest techniques for performing image segmentation, and multilevel thresholding is a simple and effective technique. The primary objective of bi-level or multilevel thresholding for image segmentation is to determine the best threshold values. Various techniques have been proposed to achieve multilevel thresholding. This survey studies some nature-inspired metaheuristic algorithms for multilevel thresholding in image segmentation: the particle swarm optimization (PSO) algorithm, artificial bee colony optimization (ABC), the ant colony optimization (ACO) algorithm and the cuckoo search (CS) algorithm.

Keywords: Ant colony optimization, Artificial bee colony optimization, Cuckoo search algorithm, Image segmentation, Multilevel thresholding, Particle swarm optimization.

Downloads: 3479
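
The metaheuristics surveyed in the entry above all search for threshold values that optimize some objective. The abstract does not name the objective, so the sketch below assumes Otsu's between-class variance, and it replaces the metaheuristic with an exhaustive search over a tiny 12-level histogram so that the result is easy to verify.

```python
from itertools import combinations


def between_class_variance(hist, thresholds):
    """Otsu-style objective: weighted squared distance of class means from the global mean.

    A threshold t sends grey levels <= t to the lower class.
    """
    total = sum(hist)
    mean_all = sum(level * count for level, count in enumerate(hist)) / total
    bounds = [0] + [t + 1 for t in thresholds] + [len(hist)]
    variance = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        class_count = sum(hist[lo:hi])
        if class_count == 0:
            continue  # an empty class contributes nothing
        class_mean = sum(level * hist[level] for level in range(lo, hi)) / class_count
        variance += (class_count / total) * (class_mean - mean_all) ** 2
    return variance


def best_thresholds(hist, k):
    """Exhaustive search over all sorted k-tuples of thresholds (small histograms only)."""
    levels = range(len(hist) - 1)
    return max(combinations(levels, k), key=lambda t: between_class_variance(hist, t))


# Hypothetical histogram with three peaks at grey levels 1, 5 and 10.
hist = [0, 10, 0, 0, 0, 10, 0, 0, 0, 0, 10, 0]
t1, t2 = best_thresholds(hist, 2)
print("thresholds:", t1, t2)
```

The number of candidate tuples grows combinatorially with the number of thresholds and grey levels, which is the usual motivation for replacing the exhaustive search with PSO, ABC, ACO or cuckoo search.
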
520 Optimum Cascaded Design for Speech Enhancement Using Kalman Filter

Authors: T. Kishore Kumar

Abstract:

Speech enhancement is the process of eliminating noise and increasing the quality of a speech signal, which is contaminated with other kinds of distortions. This paper is on developing an optimum cascaded system for speech enhancement. This aim is attained without diminishing any relevant speech information and without much computational and time complexity. LMS algorithm, Spectral Subtraction and Kalman filter have been deployed as the main de-noising algorithms in this work. Since these algorithms suffer from respective shortcomings, this work has been undertaken to design cascaded systems in different combinations and the evaluation of such cascades by qualitative (listening) and quantitative (SNR) tests.

Keywords: LMS, Kalman filter, Speech Enhancement and Spectral Subtraction.

Downloads: 1699
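
Of the three de-noising algorithms named in the entry above, LMS is the simplest to sketch. The example below shows the basic LMS weight update in a system-identification setting, where the filter learns an unknown two-tap response from input and desired signals. The tap values, step size and random input are assumptions, and the cascade with spectral subtraction and the Kalman filter is not reproduced.

```python
import random


def lms_filter(x, d, num_taps, mu):
    """Least-mean-squares adaptive FIR filter.

    x is the input signal, d the desired signal, mu the step size.
    Returns the final weights and the error at every sample.
    """
    weights = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Most recent num_taps input samples, newest first (zeros before the start).
        window = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(w * s for w, s in zip(weights, window))
        e = d[n] - y
        # Move each weight along the negative gradient of the squared error.
        weights = [w + mu * e * s for w, s in zip(weights, window)]
        errors.append(e)
    return weights, errors


# Hypothetical unknown system: d[n] = 0.5 * x[n] - 0.3 * x[n-1].
random.seed(1)
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

weights, errors = lms_filter(x, d, num_taps=2, mu=0.1)
print([round(w, 3) for w in weights])
```

In a noise-cancelling arrangement the same update is used, with a noise reference as the input and the noisy speech as the desired signal, so that the error signal carries the enhanced speech.
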
519 Seed-Based Region Growing (SBRG) vs Adaptive Network-Based Inference System (ANFIS) vs Fuzzy c-Means (FCM): Brain Abnormalities Segmentation

Authors: Shafaf Ibrahim, Noor Elaiza Abdul Khalid, Mazani Manaf

Abstract:

Segmentation of Magnetic Resonance Imaging (MRI) images is one of the most challenging problems in medical imaging. This paper compares the performance of Seed-Based Region Growing (SBRG), Adaptive Network-Based Fuzzy Inference System (ANFIS) and Fuzzy c-Means (FCM) in brain abnormalities segmentation. Controlled experimental data are used, designed in such a way that the sizes of the abnormalities are known in advance. This is done by cutting out abnormalities of various sizes and pasting them onto normal brain tissues. The normal tissues, or background, are divided into three different categories. The segmentation is done with fifty-seven data items of each category. The known sizes of the abnormalities, in number of pixels, are then compared with the segmentation results of the three techniques. The ANFIS was found to return the best segmentation performance for light abnormalities, whereas the SBRG performed well in dark abnormality segmentation.

Keywords: Seed-Based Region Growing (SBRG), Adaptive Network-Based Fuzzy Inference System (ANFIS), Fuzzy c-Means (FCM), Brain segmentation.

Downloads: 2269
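
Seed-based region growing, one of the three techniques compared in the entry above, can be sketched as a flood fill that accepts neighbours whose intensity is close to the seed's. The 4-connectivity, the fixed tolerance and the small test image are assumptions; the study's own acceptance criterion is not described in the abstract.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Grow a 4-connected region of pixels within `tolerance` of the seed intensity."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


# Hypothetical 4x4 slice: a bright 2x2 "abnormality" pasted onto darker tissue.
image = [
    [10, 12, 11, 10],
    [11, 200, 205, 12],
    [10, 198, 202, 11],
    [12, 10, 11, 10],
]

region = region_grow(image, seed=(1, 1), tolerance=10)
print(len(region), "pixels in the grown region")
```

Because the pasted abnormality has a known size in pixels, the size of the grown region can be compared with it directly, which is the kind of check the abstract describes.
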
518 Possibilities, Challenges and the State of the Art of Automatic Speech Recognition in Air Traffic Control

Authors: Van Nhan Nguyen, Harald Holone

Abstract:

Over the past few years, a lot of research has been conducted to bring Automatic Speech Recognition (ASR) into various areas of Air Traffic Control (ATC), such as air traffic control simulation and training, monitoring live operators with the aim of improving safety, air traffic controller workload measurement, and analysis of large quantities of controller-pilot speech. Due to the high accuracy requirements of the ATC context and its unique challenges, automatic speech recognition has not been widely adopted in this field. With the aim of providing a good starting point for researchers who are interested in bringing automatic speech recognition into ATC, this paper gives an overview of the possibilities and challenges of applying automatic speech recognition in air traffic control. To provide this overview, we present an updated literature review of speech recognition technologies in general, as well as specific approaches relevant to the ATC context. Based on this literature review, criteria for selecting speech recognition approaches for the ATC domain are presented, and remaining challenges and possible solutions are discussed.

Keywords: Automatic Speech Recognition, ASR, Air Traffic Control, ATC.

Downloads: 3994
517 Marketing Segmentation of Students Willing to Study Abroad based on Cluster Analysis

Authors: Kamila Tislerova, Marta Zambochova

Abstract:

Market segmentation is one of the most fundamental strategic marketing concepts. The better the segment chosen for targeting by a particular organisation, the more successful the organisation is assumed to be in the marketplace. Higher education institutions also have to improve their marketing tools for attracting foreign students, particularly when charging tuition fees. This contribution aims to demonstrate the proper use of cluster analysis for segmentation (represented by students' willingness to study abroad) and, based on a large international survey, offers some practical marketing implications.

Keywords: Market Segmentation, Students' Preferences, Study Abroad, Cluster Analysis

Downloads: 2185
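
The abstract above does not say which clustering method was applied to the survey. As one common choice, the sketch below runs k-means (Lloyd's algorithm) on made-up two-feature respondent scores; the features, the data, the value of k and the "first k points" initialization are all assumptions for illustration.

```python
def kmeans(points, k, iterations=10):
    """Lloyd's algorithm with the first k points as initial centroids."""
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical respondents: (willingness to study abroad, budget), both on a 0-10 scale.
respondents = [
    (1.0, 1.0), (9.0, 8.0), (1.5, 2.0),
    (8.0, 9.0), (1.0, 1.5), (9.0, 9.5),
]

centroids, segments = kmeans(respondents, k=2)
for center, members in zip(centroids, segments):
    print("segment center", tuple(round(c, 2) for c in center), "size", len(members))
```

Each resulting cluster is a candidate market segment; in practice the features would be standardized first and the number of clusters chosen with a validity measure.
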
516 A Neural Approach for Color-Textured Images Segmentation

Authors: Khalid Salhi, El Miloud Jaara, Mohammed Talibi Alaoui

Abstract:

In this paper, we present a neural approach for unsupervised natural color-texture image segmentation, based on both Kohonen maps and mathematical morphology, that combines the texture and color information of the image. Fractal features based on the fractal dimension are selected to represent the texture information, and the color features are represented in RGB color space. These features are then used to train the Kohonen network, which is represented by the underlying probability density function; the segmentation of this map is performed by the morphological watershed transformation. The performance of our color-texture segmentation approach is compared first to methods based on color only or texture only, and then to the k-means method.

Keywords: Segmentation, color-texture, neural networks, fractal, watershed.

Downloads: 1348
515 Speech Data Compression using Vector Quantization

Authors: H. B. Kekre, Tanuja K. Sarode

Abstract:

Speech data compression mostly relies on transforms, which are lossy algorithms. Such algorithms are tolerable for speech data compression since the loss in quality is not perceived by the human ear. However, vector quantization (VQ) has the potential to give more data compression while maintaining the same quality. In this paper we propose a speech data compression algorithm using the vector quantization technique. We have used the VQ algorithms LBG, KPE and FCG. The results table shows the computational complexity of these three algorithms. We also introduce a new performance parameter, Average Fractional Change in Speech Sample (AFCSS). Our FCG algorithm gives far better performance than the others in terms of mean absolute error, AFCSS and complexity.

Keywords: Vector Quantization, Data Compression, Encoding, Speech coding.

Downloads: 2366
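
The compression idea in the entry above is that each block of speech samples is replaced by the index of its nearest codebook vector. The sketch below shows only this encode/decode step and the mean absolute error; the codebook is written by hand rather than trained with LBG, KPE or FCG, and the samples are made up.

```python
def nearest_codeword(codebook, vector):
    """Index of the codebook entry with the smallest squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, codebook[i])),
    )


def vq_encode(samples, codebook):
    """Split samples into blocks of the codeword length and map each block to an index."""
    dim = len(codebook[0])
    blocks = [samples[i:i + dim] for i in range(0, len(samples) - dim + 1, dim)]
    return [nearest_codeword(codebook, block) for block in blocks]


def vq_decode(indices, codebook):
    """Rebuild the signal by concatenating the selected codewords."""
    return [value for index in indices for value in codebook[index]]


def mean_absolute_error(original, reconstructed):
    return sum(abs(a - b) for a, b in zip(original, reconstructed)) / len(original)


# Hypothetical 3-entry codebook of 2-sample vectors and a short sample sequence.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, -1.0)]
samples = [0.1, 0.2, 0.9, 1.1, -1.0, -0.8, 0.0, 0.1]

indices = vq_encode(samples, codebook)
decoded = vq_decode(indices, codebook)
error = mean_absolute_error(samples, decoded)
print(indices, round(error, 3))
```

Eight samples are stored as four small indices, so the compression ratio depends on the block length and the codebook size, while the codebook quality determines the reconstruction error.
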
514 Performance Evaluation of Acoustic-Spectrographic Voice Identification Method in Native and Non-Native Speech

Authors: E. Krasnova, E. Bulgakova, V. Shchemelinin

Abstract:

The paper deals with the acoustic-spectrographic voice identification method in terms of its performance on non-native language speech. Performance evaluation is conducted by comparing the results of the analysis of recordings containing native language speech with those of recordings that contain foreign language speech. Our research is based on Tajik and Russian speech of Tajik native speakers, owing to the nature of the criminal situation with drug trafficking. We propose a pilot experiment that represents a first attempt to enter the field.

Keywords: Speaker identification, acoustic-spectrographic method, non-native speech.

Downloads: 833
513 Minimum Data of a Speech Signal as Special Indicators of Identification in Phonoscopy

Authors: Nazaket Gazieva

Abstract:

Voice biometric data associated with physiological, psychological and other factors are widely used in forensic phonoscopy. There are various methods for identifying and verifying a person by voice. This article explores the minimum speech signal data as individual parameters of a speech signal. Monozygotic twins are believed to be genetically identical. Using the minimum data of the speech signal, we came to the conclusion that the voice imprint of monozygotic twins is individual. According to the conclusion of the experiment, we can conclude that the minimum indicators of the speech signal are more stable and reliable for phonoscopic examinations.

Keywords: Biometric voice prints, fundamental frequency, phonogram, speech signal, temporal characteristics.

Downloads: 510
512 High-Individuality Voice Conversion Based on Concatenative Speech Synthesis

Authors: Kei Fujii, Jun Okawa, Kaori Suigetsu

Abstract:

Concatenative speech synthesis is a method that can produce speech with the naturalness and high individuality of a speaker by using a large speech corpus. Based on this method, we propose a voice conversion method whose converted speech has high individuality and naturalness. We also conducted two subjective evaluation experiments to assess the individuality and sound quality of the converted speech. From the results, the following three facts have been confirmed: (a) the proposed method can convert the individuality of speakers well, (b) introducing the framework of unit selection (especially join cost) from concatenative speech synthesis into conventional voice conversion improves the sound quality of the converted speech, and (c) the proposed method is robust against gender differences between a source speaker and a target speaker.

Keywords: concatenative speech synthesis, join cost, speaker individuality, unit selection, voice conversion

Downloads: 1896
511 Assamese Numeral Corpus for Speech Recognition using Cooperative ANN Architecture

Authors: Mousmita Sarma, Krishna Dutta, Kandarpa Kumar Sarma

Abstract:

A speech corpus is one of the major components of a Speech Processing System in which one of the primary requirements is to recognize an input sample. The quality and detail captured in the speech corpus directly affect the precision of recognition. The current work proposes a platform for speech corpus generation using an adaptive LMS filter and LPC cepstrum, as part of an ANN-based Speech Recognition System designed exclusively to recognize isolated numerals of the Assamese language, a major language in the North Eastern part of India. The work focuses on designing an optimal feature extraction block and a few ANN-based cooperative architectures so that the performance of the Speech Recognition System can be improved.

Keywords: Filter, Feature, LMS, LPC, Cepstrum, ANN.

Downloads: 2353
510 Manipulation of Image Segmentation Using Cleverness Artificial Bee Colony Approach

Authors: Y. Harold Robinson, E. Golden Julie, P. Joyce Beryl Princess

Abstract:

Image segmentation is the process of splitting an image into several regions. An image segmentation algorithm is used to carry out this process. The advantage of ABC is that it conducts both global exploration and local exploration in every iteration. Particle Swarm Optimization (PSO) and Evolutionary Particle Swarm Optimization (EPSO) encompass a number of search problems. The Cleverness Artificial Bee Colony algorithm has been applied to improve the performance of the neighborhood search. The simulation results show that the presented ABC methods outperform the existing methods, and that the algorithms can be used to implement a manipulator for grasping colored objects. The efficiency of the presented method is considerably improved compared to other methods.

Keywords: Color information, EPSO, ABC, image segmentation, particle swarm optimization, active contour, GMM.

Downloads: 1259
509 A Novel Approach towards Segmentation of Breast Tumors from Screening Mammograms for Efficient Decision Support System

Authors: M.Suganthi, M.Madheswaran

Abstract:

This paper presents a novel approach to finding a priori interesting regions in mammograms. In order to delineate the regions of interest (ROIs) in mammograms that appear to be prominent, a topographic representation called the iso-level contour map, consisting of iso-level contours at multiple intensity levels, and region segmentation based on thresholding are proposed. The simulation results indicate that the computed boundary gives a detection accuracy of 99.5%.

Keywords: Breast Cancer, Mammogram, and Segmentation.

Downloads: 1441
508 The Influence of Audio on Perceived Quality of Segmentation

Authors: Silvio R. R. Sanches, Bianca C. Barbosa, Beatriz R. Brum, Cléber G.Corrêa

Abstract:

In order to evaluate the quality of a segmentation algorithm, the researchers use subjective or objective metrics. Although subjective metrics are more accurate than objective ones, objective metrics do not require user feedback to test an algorithm. Objective metrics require subjective experiments only during their development. Subjective experiments typically display to users some videos (generated from frames with segmentation errors) that simulate the environment of an application domain. This user feedback is crucial information for metric definition. In the subjective experiments applied to develop some state-of-the-art metrics used to test segmentation algorithms, the videos displayed during the experiments did not contain audio. Audio is an essential component in applications such as videoconference and augmented reality. If the audio influences the user’s perception, using only videos without audio in subjective experiments can compromise the efficiency of an objective metric generated using data from these experiments. This work aims to identify if the audio influences the user’s perception of segmentation quality in background substitution applications with audio. The proposed approach used a subjective method based on formal video quality assessment methods. The results showed that audio influences the quality of segmentation perceived by a user.

Keywords: Background substitution, influence of audio, segmentation evaluation, segmentation quality.

Downloads: 298
507 The Capacity of Mel Frequency Cepstral Coefficients for Speech Recognition

Authors: Fawaz S. Al-Anzi, Dia AbuZeina

Abstract:

Speech recognition makes an important contribution to promoting new technologies in human-computer interaction. Today, there is a growing need to employ speech technology in daily life and business activities. However, speech recognition is a challenging task that requires different stages before obtaining the desired output. Among the components of automatic speech recognition (ASR) is the feature extraction process, which parameterizes the speech signal to produce the corresponding feature vectors. The feature extraction process aims at approximating the linguistic content that is conveyed by the input speech signal. In the speech processing field there are several methods to extract speech features; however, Mel Frequency Cepstral Coefficients (MFCC) is the most popular technique. It has long been observed that MFCC is dominantly used in well-known recognizers such as the Carnegie Mellon University (CMU) Sphinx and the Hidden Markov Model Toolkit (HTK). Hence, this paper focuses on the MFCC method as the standard choice to identify the different speech segments in order to obtain the language phonemes for further training and decoding steps. Owing to the good performance of MFCC, previous studies show that MFCC dominates Arabic ASR research. In this paper, we demonstrate MFCC as well as the intermediate steps that are performed to obtain these coefficients using the HTK toolkit.

Keywords: Speech recognition, acoustic features, Mel Frequency Cepstral Coefficients.

Downloads: 1928
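
The "intermediate steps" mentioned in the entry above usually begin with pre-emphasis, framing and windowing before any spectral analysis. The sketch below covers only these first steps in plain Python, not the HTK implementation; the pre-emphasis coefficient 0.97, the 400-sample frames with a 160-sample hop (25 ms and 10 ms at an assumed 16 kHz) and the test signal are typical values chosen for illustration.

```python
import math


def pre_emphasis(signal, coeff=0.97):
    """First-order high-pass: y[n] = x[n] - coeff * x[n-1]."""
    return [signal[0]] + [signal[n] - coeff * signal[n - 1] for n in range(1, len(signal))]


def split_frames(signal, frame_len, hop):
    """Overlapping frames; a trailing partial frame is dropped."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def hamming(n_points):
    """Hamming window: 0.54 - 0.46 * cos(2 * pi * n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1)) for n in range(n_points)]


def windowed_frames(signal, frame_len=400, hop=160):
    """Pre-emphasize, frame, and apply a Hamming window to each frame."""
    window = hamming(frame_len)
    emphasized = pre_emphasis(signal)
    return [[s * w for s, w in zip(frame, window)]
            for frame in split_frames(emphasized, frame_len, hop)]


# Synthetic 1000-sample test tone standing in for a speech recording.
signal = [math.sin(2.0 * math.pi * 440.0 * n / 16000.0) for n in range(1000)]
frames = windowed_frames(signal)
print(len(frames), "frames of", len(frames[0]), "samples")
```

Each windowed frame would then go through a Fourier transform, a mel filterbank, a logarithm and a discrete cosine transform to yield the cepstral coefficients.
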