Search results for: speech enhancement.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 743

Search results for: speech enhancement.

653 Bidirectional Dynamic Time Warping Algorithm for the Recognition of Isolated Words Impacted by Transient Noise Pulses

Authors: G. Tamulevičius, A. Serackis, T. Sledevič, D. Navakauskas

Abstract:

We consider the biggest challenge in speech recognition – noise reduction. Traditionally detected transient noise pulses are removed with the corrupted speech using pulse models. In this paper we propose to cope with the problem directly in Dynamic Time Warping domain. Bidirectional Dynamic Time Warping algorithm for the recognition of isolated words impacted by transient noise pulses is proposed. It uses simple transient noise pulse detector, employs bidirectional computation of dynamic time warping and directly manipulates with warping results. Experimental investigation with several alternative solutions confirms effectiveness of the proposed algorithm in the reduction of impact of noise on recognition process – 3.9% increase of the noisy speech recognition is achieved.

Keywords: Transient noise pulses, noise reduction, dynamic time warping, speech recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1946
652 Design of a DCT-based Image Compression with Efficient Enhancement Filter

Authors: Yen-Yu Chen, Pao-Ching Chu, Ya-Ling Tsai

Abstract:

The algorithm represents the DCT coefficients to concentrate signal energy and proposes combination and dictator to eliminate the correlation in the same level subband for encoding the DCT-based images. This work adopts DCT and modifies the SPIHT algorithm to encode DCT coefficients. The proposed algorithm also provides the enhancement function in low bit rate in order to improve the perceptual quality. Experimental results indicate that the proposed technique improves the quality of the reconstructed image in terms of both PSNR and the perceptual results close to JPEG2000 at the same bit rate.

Keywords: JPEG 2000, enhancement filter

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1693
651 Unit Selection Algorithm Using Bi-grams Model For Corpus-Based Speech Synthesis

Authors: Mohamed Ali KAMMOUN, Ahmed Ben HAMIDA

Abstract:

In this paper, we present a novel statistical approach to corpus-based speech synthesis. Classically, phonetic information is defined and considered as acoustic reference to be respected. In this way, many studies were elaborated for acoustical unit classification. This type of classification allows separating units according to their symbolic characteristics. Indeed, target cost and concatenation cost were classically defined for unit selection. In Corpus-Based Speech Synthesis System, when using large text corpora, cost functions were limited to a juxtaposition of symbolic criteria and the acoustic information of units is not exploited in the definition of the target cost. In this manuscript, we token in our consideration the unit phonetic information corresponding to acoustic information. This would be realized by defining a probabilistic linguistic Bi-grams model basically used for unit selection. The selected units would be extracted from the English TIMIT corpora.

Keywords: Unit selection, Corpus-based Speech Synthesis, Bigram model

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1441
650 Analysis of Linguistic Disfluencies in Bilingual Children’s Discourse

Authors: Sheena Christabel Pravin, M. Palanivelan

Abstract:

Speech disfluencies are common in spontaneous speech. The primary purpose of this study was to distinguish linguistic disfluencies from stuttering disfluencies in bilingual Tamil–English (TE) speaking children. The secondary purpose was to determine whether their disfluencies are mediated by native language dominance and/or on an early onset of developmental stuttering at childhood. A detailed study was carried out to identify the prosodic and acoustic features that uniquely represent the disfluent regions of speech. This paper focuses on statistical modeling of repetitions, prolongations, pauses and interjections in the speech corpus encompassing bilingual spontaneous utterances from school going children – English and Tamil. Two classifiers including Hidden Markov Models (HMM) and the Multilayer Perceptron (MLP), which is a class of feed-forward artificial neural network, were compared in the classification of disfluencies. The results of the classifiers document the patterns of disfluency in spontaneous speech samples of school-aged children to distinguish between Children Who Stutter (CWS) and Children with Language Impairment CLI). The ability of the models in classifying the disfluencies was measured in terms of F-measure, Recall, and Precision.

Keywords: Bilingual, children who stutter, children with language impairment, Hidden Markov Models, multi-layer perceptron, linguistic disfluencies, stuttering disfluencies.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1029
649 Applications of Support Vector Machines on Smart Phone Systems for Emotional Speech Recognition

Authors: Wernhuar Tarng, Yuan-Yuan Chen, Chien-Lung Li, Kun-Rong Hsie, Mingteh Chen

Abstract:

An emotional speech recognition system for the applications on smart phones was proposed in this study to combine with 3G mobile communications and social networks to provide users and their groups with more interaction and care. This study developed a mechanism using the support vector machines (SVM) to recognize the emotions of speech such as happiness, anger, sadness and normal. The mechanism uses a hierarchical classifier to adjust the weights of acoustic features and divides various parameters into the categories of energy and frequency for training. In this study, 28 commonly used acoustic features including pitch and volume were proposed for training. In addition, a time-frequency parameter obtained by continuous wavelet transforms was also used to identify the accent and intonation in a sentence during the recognition process. The Berlin Database of Emotional Speech was used by dividing the speech into male and female data sets for training. According to the experimental results, the accuracies of male and female test sets were increased by 4.6% and 5.2% respectively after using the time-frequency parameter for classifying happy and angry emotions. For the classification of all emotions, the average accuracy, including male and female data, was 63.5% for the test set and 90.9% for the whole data set.

Keywords: Smart phones, emotional speech recognition, socialnetworks, support vector machines, time-frequency parameter, Mel-scale frequency cepstral coefficients (MFCC).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1842
648 Voice Disorders Identification Using Hybrid Approach: Wavelet Analysis and Multilayer Neural Networks

Authors: L. Salhi, M. Talbi, A. Cherif

Abstract:

This paper presents a new strategy of identification and classification of pathological voices using the hybrid method based on wavelet transform and neural networks. After speech acquisition from a patient, the speech signal is analysed in order to extract the acoustic parameters such as the pitch, the formants, Jitter, and shimmer. Obtained results will be compared to those normal and standard values thanks to a programmable database. Sounds are collected from normal people and patients, and then classified into two different categories. Speech data base is consists of several pathological and normal voices collected from the national hospital “Rabta-Tunis". Speech processing algorithm is conducted in a supervised mode for discrimination of normal and pathology voices and then for classification between neural and vocal pathologies (Parkinson, Alzheimer, laryngeal, dyslexia...). Several simulation results will be presented in function of the disease and will be compared with the clinical diagnosis in order to have an objective evaluation of the developed tool.

Keywords: Formants, Neural Networks, Pathological Voices, Pitch, Wavelet Transform.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2842
647 Preliminary Study of the Phonological Development in Three- and Four-Year-Old Bulgarian Children

Authors: Tsvetomira Braynova, Miglena Simonska

Abstract:

The article presents the results of a research of phonological processes in three- and four-year-old children. A test, created for the purpose of the study, was developed and conducted among 120 children. The study included three areas of research - at the level of words (96 words), at the level of sentence repetition (10 sentences) and at the level of generating own speech from a picture (15 pictures). The test also gives us additional information about the articulation errors of the assessed children. The main purpose of the research is to analyze all phonological processes that occur at this age in Bulgarian children and to identify which are typical and atypical for this age. The results show that the most common phonology errors that children make are: sound substitution, elision of sound, metathesis of sound, elision of syllable, elision of consonants clustered in a syllable. Measuring the correlation between average length of repeated speech and average length of generated speech, the analysis does not prove that the more words a child can repeat in part “repeated speech”, the more words they can be expected to generate in part “generating sentence”. The results of this study show that the task of naming a word provides sufficient and representative information to assess the child's phonology.

Keywords: Articulation, phonology, speech, language development.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 384
646 An Adaptive Mammographic Image Enhancement in Orthogonal Polynomials Domain

Authors: R. Krishnamoorthy, N. Amudhavalli, M.K. Sivakkolunthu

Abstract:

X-ray mammography is the most effective method for the early detection of breast diseases. However, the typical diagnostic signs such as microcalcifications and masses are difficult to detect because mammograms are of low-contrast and noisy. In this paper, a new algorithm for image denoising and enhancement in Orthogonal Polynomials Transformation (OPT) is proposed for radiologists to screen mammograms. In this method, a set of OPT edge coefficients are scaled to a new set by a scale factor called OPT scale factor. The new set of coefficients is then inverse transformed resulting in contrast improved image. Applications of the proposed method to mammograms with subtle lesions are shown. To validate the effectiveness of the proposed method, we compare the results to those obtained by the Histogram Equalization (HE) and the Unsharp Masking (UM) methods. Our preliminary results strongly suggest that the proposed method offers considerably improved enhancement capability over the HE and UM methods.

Keywords: mammograms, image enhancement, orthogonalpolynomials, contrast improvement

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2011
645 Continuous Feature Adaptation for Non-Native Speech Recognition

Authors: Y. Deng, X. Li, C. Kwan, B. Raj, R. Stern

Abstract:

The current speech interfaces in many military applications may be adequate for native speakers. However, the recognition rate drops quite a lot for non-native speakers (people with foreign accents). This is mainly because the nonnative speakers have large temporal and intra-phoneme variations when they pronounce the same words. This problem is also complicated by the presence of large environmental noise such as tank noise, helicopter noise, etc. In this paper, we proposed a novel continuous acoustic feature adaptation algorithm for on-line accent and environmental adaptation. Implemented by incremental singular value decomposition (SVD), the algorithm captures local acoustic variation and runs in real-time. This feature-based adaptation method is then integrated with conventional model-based maximum likelihood linear regression (MLLR) algorithm. Extensive experiments have been performed on the NATO non-native speech corpus with baseline acoustic model trained on native American English. The proposed feature-based adaptation algorithm improved the average recognition accuracy by 15%, while the MLLR model based adaptation achieved 11% improvement. The corresponding word error rate (WER) reduction was 25.8% and 2.73%, as compared to that without adaptation. The combined adaptation achieved overall recognition accuracy improvement of 29.5%, and WER reduction of 31.8%, as compared to that without adaptation.

Keywords: speaker adaptation; environment adaptation; robust speech recognition; SVD; non-native speech recognition

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3217
644 Automatic Detection of Syllable Repetition in Read Speech for Objective Assessment of Stuttered Disfluencies

Authors: K. M. Ravikumar, Balakrishna Reddy, R. Rajagopal, H. C. Nagaraj

Abstract:

Automatic detection of syllable repetition is one of the important parameter in assessing the stuttered speech objectively. The existing method which uses artificial neural network (ANN) requires high levels of agreement as prerequisite before attempting to train and test ANNs to separate fluent and nonfluent. We propose automatic detection method for syllable repetition in read speech for objective assessment of stuttered disfluencies which uses a novel approach and has four stages comprising of segmentation, feature extraction, score matching and decision logic. Feature extraction is implemented using well know Mel frequency Cepstra coefficient (MFCC). Score matching is done using Dynamic Time Warping (DTW) between the syllables. The Decision logic is implemented by Perceptron based on the score given by score matching. Although many methods are available for segmentation, in this paper it is done manually. Here the assessment by human judges on the read speech of 10 adults who stutter are described using corresponding method and the result was 83%.

Keywords: Assessment, DTW, MFCC, Objective, Perceptron, Stuttering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2811
643 Improved Posterized Color Images based on Color Quantization and Contrast Enhancement

Authors: Oh-Yeol Kwon, Sung-Il Chien

Abstract:

A conventional image posterization method occasionally fails to preserve the shape and color of objects due to the uneffective color reduction. This paper proposes a new image posterizartion method by using modified color quantization for preserving the shape and color of objects and color contrast enhancement for improving lightness contrast and saturation. Experiment results show that our proposed method can provide visually more satisfactory posterization result than that of the conventional method.

Keywords: Color contrast enhancement, color quantization, color segmentation, image posterization

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2675
642 A Mixing Matrix Estimation Algorithm for Speech Signals under the Under-Determined Blind Source Separation Model

Authors: Jing Wu, Wei Lv, Yibing Li, Yuanfan You

Abstract:

The separation of speech signals has become a research hotspot in the field of signal processing in recent years. It has many applications and influences in teleconferencing, hearing aids, speech recognition of machines and so on. The sounds received are usually noisy. The issue of identifying the sounds of interest and obtaining clear sounds in such an environment becomes a problem worth exploring, that is, the problem of blind source separation. This paper focuses on the under-determined blind source separation (UBSS). Sparse component analysis is generally used for the problem of under-determined blind source separation. The method is mainly divided into two parts. Firstly, the clustering algorithm is used to estimate the mixing matrix according to the observed signals. Then the signal is separated based on the known mixing matrix. In this paper, the problem of mixing matrix estimation is studied. This paper proposes an improved algorithm to estimate the mixing matrix for speech signals in the UBSS model. The traditional potential algorithm is not accurate for the mixing matrix estimation, especially for low signal-to noise ratio (SNR).In response to this problem, this paper considers the idea of an improved potential function method to estimate the mixing matrix. The algorithm not only avoids the inuence of insufficient prior information in traditional clustering algorithm, but also improves the estimation accuracy of mixing matrix. This paper takes the mixing of four speech signals into two channels as an example. The results of simulations show that the approach in this paper not only improves the accuracy of estimation, but also applies to any mixing matrix.

Keywords: Clustering algorithm, potential function, speech signal, the UBSS model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 679
641 Enhancement of Low Contrast Satellite Images using Discrete Cosine Transform and Singular Value Decomposition

Authors: A. K. Bhandari, A. Kumar, P. K. Padhy

Abstract:

In this paper, a novel contrast enhancement technique for contrast enhancement of a low-contrast satellite image has been proposed based on the singular value decomposition (SVD) and discrete cosine transform (DCT). The singular value matrix represents the intensity information of the given image and any change on the singular values change the intensity of the input image. The proposed technique converts the image into the SVD-DCT domain and after normalizing the singular value matrix; the enhanced image is reconstructed by using inverse DCT. The visual and quantitative results suggest that the proposed SVD-DCT method clearly shows the increased efficiency and flexibility of the proposed method over the exiting methods such as Linear Contrast Stretching technique, GHE technique, DWT-SVD technique, DWT technique, Decorrelation Stretching technique, Gamma Correction method based techniques.

Keywords: Singular Value Decomposition (SVD), discretecosine transforms (DCT), image equalization and satellite imagecontrast enhancement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3838
640 Bangla Vowel Characterization Based on Analysis by Synthesis

Authors: Syed Akhter Hossain, M. Lutfar Rahman, Farruk Ahmed

Abstract:

Bangla Vowel characterization determines the spectral properties of Bangla vowels for efficient synthesis as well as recognition of Bangla vowels. In this paper, Bangla vowels in isolated word have been analyzed based on speech production model within the framework of Analysis-by-Synthesis. This has led to the extraction of spectral parameters for the production model in order to produce different Bangla vowel sounds. The real and synthetic spectra are compared and a weighted square error has been computed along with the error in the formant bandwidths for efficient representation of Bangla vowels. The extracted features produced good representation of targeted Bangla vowel. Such a representation also plays essential role in low bit rate speech coding and vocoders.

Keywords: Speech, vowel, formant, synthesis, spectrum, LPC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2371
639 Speech Recognition Using Scaly Neural Networks

Authors: Akram M. Othman, May H. Riadh

Abstract:

This research work is aimed at speech recognition using scaly neural networks. A small vocabulary of 11 words were established first, these words are “word, file, open, print, exit, edit, cut, copy, paste, doc1, doc2". These chosen words involved with executing some computer functions such as opening a file, print certain text document, cutting, copying, pasting, editing and exit. It introduced to the computer then subjected to feature extraction process using LPC (linear prediction coefficients). These features are used as input to an artificial neural network in speaker dependent mode. Half of the words are used for training the artificial neural network and the other half are used for testing the system; those are used for information retrieval. The system components are consist of three parts, speech processing and feature extraction, training and testing by using neural networks and information retrieval. The retrieve process proved to be 79.5-88% successful, which is quite acceptable, considering the variation to surrounding, state of the person, and the microphone type.

Keywords: Feature extraction, Liner prediction coefficients, neural network, Speech Recognition, Scaly ANN.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1737
638 Enhancement of Methane Productivity of Anaerobic Reactors of Wastewater Treatment Plants

Authors: Aare Kuusik, E. Loigu, O. Sokk, Argo Kuusik

Abstract:

This paper describes technological possibilities to enhance methane productionin the anaerobic stabilization of wastewater treatment plant excess sludge. This objective can be achieved by the addition of waste residues: crude glycerol from biodiesel production and residues from fishery. The addition ofglycerol in an amount by weight of 2 – 5% causes enhancement of methane production of about 250 – 400%. At the same time the percentage increase of total solids concentration in the outgoing sludge is ten or more times less. The containment of methane in biogas is higher in case of admixed substrate.

Keywords: Enhancement of methane production, fishery residues, waste glycerol

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1715
637 Microstrip Patch Antenna Enhancement Techniques

Authors: Ahmad H. Abdelgwad

Abstract:

Microstrip patch antennas are widely used in many wireless communication applications because of their various advantages such as light weight, compact size, inexpensive, ease of fabrication and high reliability. However, narrow bandwidth and low gain are the major drawbacks of microstrip antennas. The radiation properties of microstrip antenna is affected by many designing factors like feeding techniques, manufacturing substrate, patch and ground structure. This manuscript presents a review of the most popular gain and bandwidth enhancement methods of microstrip antenna and reports a brief description of its feeding techniques.

Keywords: Gain and bandwidth enhancement, slotted patch, parasitic patch, electromagnetic band gap, defected ground, feeding techniques.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1819
636 Improved Lung Nodule Visualization on Chest Radiographs using Digital Filtering and Contrast Enhancement

Authors: Benjamin Y. M. Kwan, Hon Keung Kwan

Abstract:

Early detection of lung cancer through chest radiography is a widely used method due to its relatively affordable cost. In this paper, an approach to improve lung nodule visualization on chest radiographs is presented. The approach makes use of linear phase high-frequency emphasis filter for digital filtering and histogram equalization for contrast enhancement to achieve improvements. Results obtained indicate that a filtered image can reveal sharper edges and provide more details. Also, contrast enhancement offers a way to further enhance the global (or local) visualization by equalizing the histogram of the pixel values within the whole image (or a region of interest). The work aims to improve lung nodule visualization of chest radiographs to aid detection of lung cancer which is currently the leading cause of cancer deaths worldwide.

Keywords: Chest radiographs, Contrast enhancement, Digital filtering, Lung nodule detection

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1729
635 Image Enhancement using α-Trimmed Mean ε-Filters

Authors: Mahdi Shaneh, Arash Golibagh Mahyari

Abstract:

Image enhancement is the most important challenging preprocessing for almost all applications of Image Processing. By now, various methods such as Median filter, α-trimmed mean filter, etc. have been suggested. It was proved that the α-trimmed mean filter is the modification of median and mean filters. On the other hand, ε-filters have shown excellent performance in suppressing noise. In spite of their simplicity, they achieve good results. However, conventional ε-filter is based on moving average. In this paper, we suggested a new ε-filter which utilizes α-trimmed mean. We argue that this new method gives better outcomes compared to previous ones and the experimental results confirmed this claim.

Keywords: Image enhancement, median filter, ε-filter – α-trimmed mean filter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5502
634 Forensic Speaker Verification in Noisy Environmental by Enhancing the Speech Signal Using ICA Approach

Authors: Ahmed Kamil Hasan Al-Ali, Bouchra Senadji, Ganesh Naik

Abstract:

We propose a system to real environmental noise and channel mismatch for forensic speaker verification systems. This method is based on suppressing various types of real environmental noise by using independent component analysis (ICA) algorithm. The enhanced speech signal is applied to mel frequency cepstral coefficients (MFCC) or MFCC feature warping to extract the essential characteristics of the speech signal. Channel effects are reduced using an intermediate vector (i-vector) and probabilistic linear discriminant analysis (PLDA) approach for classification. The proposed algorithm is evaluated by using an Australian forensic voice comparison database, combined with car, street and home noises from QUT-NOISE at a signal to noise ratio (SNR) ranging from -10 dB to 10 dB. Experimental results indicate that the MFCC feature warping-ICA achieves a reduction in equal error rate about (48.22%, 44.66%, and 50.07%) over using MFCC feature warping when the test speech signals are corrupted with random sessions of street, car, and home noises at -10 dB SNR.

Keywords: Noisy forensic speaker verification, ICA algorithm, MFCC, MFCC feature warping.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 990
633 A Smart-Visio Microphone for Audio-Visual Speech Recognition “Vmike“

Authors: Y. Ni, K. Sebri

Abstract:

The practical implementation of audio-video coupled speech recognition systems is mainly limited by the hardware complexity to integrate two radically different information capturing devices with good temporal synchronisation. In this paper, we propose a solution based on a smart CMOS image sensor in order to simplify the hardware integration difficulties. By using on-chip image processing, this smart sensor can calculate in real time the X/Y projections of the captured image. This on-chip projection reduces considerably the volume of the output data. This data-volume reduction permits a transmission of the condensed visual information via the same audio channel by using a stereophonic input available on most of the standard computation devices such as PC, PDA and mobile phones. A prototype called VMIKE (Visio-Microphone) has been designed and realised by using standard 0.35um CMOS technology. A preliminary experiment gives encouraged results. Its efficiency will be further investigated in a large variety of applications such as biometrics, speech recognition in noisy environments, and vocal control for military or disabled persons, etc.

Keywords: Audio-Visual Speech recognition, CMOS Smartsensor, On-Chip image processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1826
632 Computationally Efficient Signal Quality Improvement Method for VoIP System

Authors: H. P. Singh, S. Singh

Abstract:

The voice signal in Voice over Internet protocol (VoIP) system is processed through the best effort policy based IP network, which leads to the network degradations including delay, packet loss jitter. The work in this paper presents the implementation of finite impulse response (FIR) filter for voice quality improvement in the VoIP system through distributed arithmetic (DA) algorithm. The VoIP simulations are conducted with AMR-NB 6.70 kbps and G.729a speech coders at different packet loss rates and the performance of the enhanced VoIP signal is evaluated using the perceptual evaluation of speech quality (PESQ) measurement for narrowband signal. The results show reduction in the computational complexity in the system and significant improvement in the quality of the VoIP voice signal.

Keywords: VoIP, Signal Quality, Distributed Arithmetic, Packet Loss, Speech Coder.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1830
631 Extracting Tongue Shape Dynamics from Magnetic Resonance Image Sequences

Authors: María S. Avila-García, John N. Carter, Robert I. Damper

Abstract:

An important problem in speech research is the automatic extraction of information about the shape and dimensions of the vocal tract during real-time speech production. We have previously developed Southampton dynamic magnetic resonance imaging (SDMRI) as an approach to the solution of this problem.However, the SDMRI images are very noisy so that shape extraction is a major challenge. In this paper, we address the problem of tongue shape extraction, which poses difficulties because this is a highly deforming non-parametric shape. We show that combining active shape models with the dynamic Hough transform allows the tongue shape to be reliably tracked in the image sequence.

Keywords: Vocal tract imaging, speech production, active shapemodels, dynamic Hough transform, object tracking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1735
630 Investigation of Enhancement of Heat Transfer in Natural Convection Utilizing of Nanofluids

Authors: S. Etaig, R. Hasan, N. Perera

Abstract:

This paper analyses the heat transfer performance and fluid flow using different nanofluids in a square enclosure. The energy equation and Navier-Stokes equation are solved numerically using finite volume scheme. The effect of volume fraction concentration on the enhancement of heat transfer has been studied icorporating the Brownian motion; the influence of effective thermal conductivity on the enhancement was also investigated for a range of volume fraction concentration. The velocity profile for different Rayleigh number. Water-Cu, water AL2O3 and water-TiO2 were tested.

Keywords: Computational fluid Dynamics, Natural convection, Nanofluid and Thermal conductivity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1838
629 Numerical and Experimental Study of Heat Transfer Enhancement with Metal Foams and Ultrasounds

Authors: L. Slimani, A. Bousri, A. Hamadouche, H. Ben Hamed

Abstract:

The aim of this experimental and numerical study is to analyze the effects of acoustic streaming generated by 40 kHz ultrasonic waves on heat transfer in forced convection, with and without 40 PPI aluminum metal foam. Preliminary dynamic and thermal studies were done with COMSOL Multiphase, to see heat transfer enhancement degree by inserting a 40PPI metal foam (10 × 2 × 3 cm) on a heat sink, after having determined experimentally its permeability and Forchheimer's coefficient. The results obtained numerically are in accordance with those obtained experimentally, with an enhancement factor of 205% for a velocity of 0.4 m/s compared to an empty channel. The influence of 40 kHz ultrasound on heat transfer was also tested with and without metallic foam. Results show a remarkable increase in Nusselt number in an empty channel with an enhancement factor of 37,5%, while no influence of ultrasound on heat transfer in metal foam presence.

Keywords: Enhancing heat transfer, metal foam, ultrasound, acoustic streaming, laminar flow.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 491
628 Influence of Loudness Compression on Hearing with Bone Anchored Hearing Implants

Authors: Anja Kurz, Marc Flynn, Tobias Good, Marco Caversaccio, Martin Kompis

Abstract:

Bone Anchored Hearing Implants (BAHI) are  routinely used in patients with conductive or mixed hearing loss, e.g.  if conventional air conduction hearing aids cannot be used. New  sound processors and new fitting software now allow the adjustment  of parameters such as loudness compression ratios or maximum  power output separately. Today it is unclear, how the choice of these  parameters influences aided speech understanding in BAHI users.  In this prospective experimental study, the effect of varying the  compression ratio and lowering the maximum power output in a  BAHI were investigated.  Twelve experienced adult subjects with a mixed hearing loss  participated in this study. Four different compression ratios (1.0; 1.3;  1.6; 2.0) were tested along with two different maximum power output  settings, resulting in a total of eight different programs. Each  participant tested each program during two weeks. A blinded Latin  square design was used to minimize bias.  For each of the eight programs, speech understanding in quiet and  in noise was assessed. For speech in quiet, the Freiburg number test  and the Freiburg monosyllabic word test at 50, 65, and 80 dB SPL  were used. For speech in noise, the Oldenburg sentence test was  administered.  Speech understanding in quiet and in noise was improved  significantly in the aided condition in any program, when compared  to the unaided condition. However, no significant differences were  found between any of the eight programs. In contrast, on a subjective  level there was a significant preference for medium compression  ratios of 1.3 to 1.6 and higher maximum power output.

 

Keywords: Bone Anchored Hearing Implant, Compression, Maximum Power Output, Speech understanding.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2066
627 Comparison of Fricative Vocal Tract Transfer Functions Derived using Two Different Segmentation Techniques

Authors: K. S. Subari, C. H. Shadle, A. Barney, R. I. Damper

Abstract:

The acoustic and articulatory properties of fricative speech sounds are being studied using magnetic resonance imaging (MRI) and acoustic recordings from a single subject. Area functions were derived from a complete set of axial and coronal MR slices using two different methods: the Mermelstein technique and the Blum transform. Area functions derived from the two techniques were shown to differ significantly in some cases. Such differences will lead to different acoustic predictions and it is important to know which is the more accurate. The vocal tract acoustic transfer function (VTTF) was derived from these area functions for each fricative and compared with measured speech signals for the same fricative and same subject. The VTTFs for /f/ in two vowel contexts and the corresponding acoustic spectra are derived here; the Blum transform appears to show a better match between prediction and measurement than the Mermelstein technique.

Keywords: Area functions, fricatives, vocal tract transferfunction, MRI, speech.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1652
626 An Image Enhancement Method Based on Curvelet Transform for CBCT-Images

Authors: Shahriar Farzam, Maryam Rastgarpour

Abstract:

Image denoising plays extremely important role in digital image processing. Enhancement of clinical image research based on Curvelet has been developed rapidly in recent years. In this paper, we present a method for image contrast enhancement for cone beam CT (CBCT) images based on fast discrete curvelet transforms (FDCT) that work through Unequally Spaced Fast Fourier Transform (USFFT). These transforms return a table of Curvelet transform coefficients indexed by a scale parameter, an orientation and a spatial location. Accordingly, the coefficients obtained from FDCT-USFFT can be modified in order to enhance contrast in an image. Our proposed method first uses a two-dimensional mathematical transform, namely the FDCT through unequal-space fast Fourier transform on input image and then applies thresholding on coefficients of Curvelet to enhance the CBCT images. Consequently, applying unequal-space fast Fourier Transform leads to an accurate reconstruction of the image with high resolution. The experimental results indicate the performance of the proposed method is superior to the existing ones in terms of Peak Signal to Noise Ratio (PSNR) and Effective Measure of Enhancement (EME).

Keywords: Curvelet transform, image enhancement, CBCT, image denoising.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1260
625 Automotive 3-Microphone Noise Canceller in a Frequently Moving Noise Source Environment

Authors: Z. Qi, T. J. Moir

Abstract:

A combined three-microphone voice activity detector (VAD) and noise-canceling system is studied to enhance speech recognition in an automobile environment. A previous experiment clearly shows the ability of the composite system to cancel a single noise source outside of a defined zone. This paper investigates the performance of the composite system when there are frequently moving noise sources (noise sources are coming from different locations but are not always presented at the same time) e.g. there is other passenger speech or speech from a radio when a desired speech is presented. To work in a frequently moving noise sources environment, whilst a three-microphone voice activity detector (VAD) detects voice from a “VAD valid zone", the 3-microphone noise canceller uses a “noise canceller valid zone" defined in freespace around the users head. Therefore, a desired voice should be in the intersection of the noise canceller valid zone and VAD valid zone. Thus all noise is suppressed outside this intersection of area. Experiments are shown for a real environment e.g. all results were recorded in a car by omni-directional electret condenser microphones.

Keywords: Signal processing, voice activity detection, noise canceller, microphone array beam forming.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1611
624 Robust Features for Impulsive Noisy Speech Recognition Using Relative Spectral Analysis

Authors: Hajer Rahali, Zied Hajaiej, Noureddine Ellouze

Abstract:

The goal of speech parameterization is to extract the relevant information about what is being spoken from the audio signal. In speech recognition systems Mel-Frequency Cepstral Coefficients (MFCC) and Relative Spectral Mel-Frequency Cepstral Coefficients (RASTA-MFCC) are the two main techniques used. It will be shown in this paper that it presents some modifications to the original MFCC method. In our work the effectiveness of proposed changes to MFCC called Modified Function Cepstral Coefficients (MODFCC) were tested and compared against the original MFCC and RASTA-MFCC features. The prosodic features such as jitter and shimmer are added to baseline spectral features. The above-mentioned techniques were tested with impulsive signals under various noisy conditions within AURORA databases.

Keywords: Auditory filter, impulsive noise, MFCC, prosodic features, RASTA filter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2323