Search results for: perceptual speech coding
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 547

Search results for: perceptual speech coding

427 End Point Detection for Wavelet Based Speech Compression

Authors: Jalal Karam

Abstract:

In real-field applications, the correct determination of voice segments highly improves the overall system accuracy and minimises the total computation time. This paper presents reliable measures of speech compression by detcting the end points of the speech signals prior to compressing them. The two different compession schemes used are the Global threshold and the Level- Dependent threshold techniques. The performance of the proposed method is tested wirh the Signal to Noise Ratios, Peak Signal to Noise Ratios and Normalized Root Mean Square Error parameter measures.

Keywords: Wavelets, End-points Detection, Compression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1343
426 Assamese Numeral Speech Recognition using Multiple Features and Cooperative LVQ -Architectures

Authors: Manash Pratim Sarma, Kandarpa Kumar Sarma

Abstract:

A set of Artificial Neural Network (ANN) based methods for the design of an effective system of speech recognition of numerals of Assamese language captured under varied recording conditions and moods is presented here. The work is related to the formulation of several ANN models configured to use Linear Predictive Code (LPC), Principal Component Analysis (PCA) and other features to tackle mood and gender variations uttering numbers as part of an Automatic Speech Recognition (ASR) system in Assamese. The ANN models are designed using a combination of Self Organizing Map (SOM) and Multi Layer Perceptron (MLP) constituting a Learning Vector Quantization (LVQ) block trained in a cooperative environment to handle male and female speech samples of numerals of Assamese- a language spoken by a sizable population in the North-Eastern part of India. The work provides a comparative evaluation of several such combinations while subjected to handle speech samples with gender based differences captured by a microphone in four different conditions viz. noiseless, noise mixed, stressed and stress-free.

Keywords: Assamese, Recognition, LPC, Spectral, ANN.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1951
425 SMaTTS: Standard Malay Text to Speech System

Authors: Othman O. Khalifa, Zakiah Hanim Ahmad, Teddy Surya Gunawan

Abstract:

This paper presents a rule-based text- to- speech (TTS) Synthesis System for Standard Malay, namely SMaTTS. The proposed system using sinusoidal method and some pre- recorded wave files in generating speech for the system. The use of phone database significantly decreases the amount of computer memory space used, thus making the system very light and embeddable. The overall system was comprised of two phases the Natural Language Processing (NLP) that consisted of the high-level processing of text analysis, phonetic analysis, text normalization and morphophonemic module. The module was designed specially for SM to overcome few problems in defining the rules for SM orthography system before it can be passed to the DSP module. The second phase is the Digital Signal Processing (DSP) which operated on the low-level process of the speech waveform generation. A developed an intelligible and adequately natural sounding formant-based speech synthesis system with a light and user-friendly Graphical User Interface (GUI) is introduced. A Standard Malay Language (SM) phoneme set and an inclusive set of phone database have been constructed carefully for this phone-based speech synthesizer. By applying the generative phonology, a comprehensive letter-to-sound (LTS) rules and a pronunciation lexicon have been invented for SMaTTS. As for the evaluation tests, a set of Diagnostic Rhyme Test (DRT) word list was compiled and several experiments have been performed to evaluate the quality of the synthesized speech by analyzing the Mean Opinion Score (MOS) obtained. The overall performance of the system as well as the room for improvements was thoroughly discussed.

Keywords: Natural Language Processing, Text-To-Speech (TTS), Diphone, source filter, low-/ high- level synthesis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1932
424 Comparative Study of Filter Characteristics as Statistical Vocal Correlates of Clinical Psychiatric State in Human

Authors: Thaweesak Yingthawornsuk, Chusak Thanawattano

Abstract:

Acoustical properties of speech have been shown to be related to mental states of speaker with symptoms: depression and remission. This paper describes way to address the issue of distinguishing depressed patients from remitted subjects based on measureable acoustics change of their spoken sound. The vocal-tract related frequency characteristics of speech samples from female remitted and depressed patients were analyzed via speech processing techniques and consequently, evaluated statistically by cross-validation with Support Vector Machine. Our results comparatively show the classifier's performance with effectively correct separation of 93% determined from testing with the subjectbased feature model and 88% from the frame-based model based on the same speech samples collected from hospital visiting interview sessions between patients and psychiatrists.

Keywords: Depression, SVM, Vocal Extract, Vocal Tract

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1508
423 Speech Activated Automation

Authors: Rui Antunes

Abstract:

This article presents a simple way to perform programmed voice commands for the interface with commercial Digital and Analogue Input/Output PCI cards, used in Robotics and Automation applications. Robots and Automation equipment can "listen" to voice commands and perform several different tasks, approaching to the human behavior, and improving the human- machine interfaces for the Automation Industry. Since most PCI Digital and Analogue Input/Output cards are sold with several DLLs included (for use with different programming languages), it is possible to add speech recognition capability, using a standard speech recognition engine, compatible with the programming languages used. It was created in this work a Visual Basic 6 (the world's most popular language) application, that listens to several voice commands, and is capable to communicate directly with several standard 128 Digital I/O PCI Cards, used to control complete Automation Systems, with up to (number of boards used) x 128 Sensors and/or Actuators.

Keywords: Speech Recognition, Automation, Robotics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1794
422 Codebook Generation for Vector Quantization on Orthogonal Polynomials based Transform Coding

Authors: R. Krishnamoorthi, N. Kannan

Abstract:

In this paper, a new algorithm for generating codebook is proposed for vector quantization (VQ) in image coding. The significant features of the training image vectors are extracted by using the proposed Orthogonal Polynomials based transformation. We propose to generate the codebook by partitioning these feature vectors into a binary tree. Each feature vector at a non-terminal node of the binary tree is directed to one of the two descendants by comparing a single feature associated with that node to a threshold. The binary tree codebook is used for encoding and decoding the feature vectors. In the decoding process the feature vectors are subjected to inverse transformation with the help of basis functions of the proposed Orthogonal Polynomials based transformation to get back the approximated input image training vectors. The results of the proposed coding are compared with the VQ using Discrete Cosine Transform (DCT) and Pairwise Nearest Neighbor (PNN) algorithm. The new algorithm results in a considerable reduction in computation time and provides better reconstructed picture quality.

Keywords: Orthogonal Polynomials, Image Coding, Vector Quantization, TSVQ, Binary Tree Classifier

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2099
421 Emotion Recognition Using Neural Network: A Comparative Study

Authors: Nermine Ahmed Hendy, Hania Farag

Abstract:

Emotion recognition is an important research field that finds lots of applications nowadays. This work emphasizes on recognizing different emotions from speech signal. The extracted features are related to statistics of pitch, formants, and energy contours, as well as spectral, perceptual and temporal features, jitter, and shimmer. The Artificial Neural Networks (ANN) was chosen as the classifier. Working on finding a robust and fast ANN classifier suitable for different real life application is our concern. Several experiments were carried out on different ANN to investigate the different factors that impact the classification success rate. Using a database containing 7 different emotions, it will be shown that with a proper and careful adjustment of features format, training data sorting, number of features selected and even the ANN type and architecture used, a success rate of 85% or even more can be achieved without increasing the system complicity and the computation time

Keywords: Classification, emotion recognition, features extraction, feature selection, neural network

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4645
420 Virtual Speaking Head for Hearing Impaired Students

Authors: Eva Pajorová, Ladislav Hluchý

Abstract:

Developed tool is one of system tools for easier access to various scientific areas and real time interactive learning between lecturer and for hearing impaired students. There is no demand for the lecturer to know Sign Language (SL). Instead, the new software tools will perform the translation of the regular speech into SL, after which it will be transferred to the student. On the other side, the questions of the student (in SL) will be translated and transferred to the lecturer in text or speech. One of those tools is presented tool. It-s too for developing the correct Speech Visemes as a root of total communication method for hearing impared students.

Keywords: Impared people, sing language, communication methods.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1805
419 Noise Estimation for Speech Enhancement in Non-Stationary Environments-A New Method

Authors: Ch.V.Rama Rao, Gowthami., Harsha., Rajkumar., M.B.Rama Murthy, K.Srinivasa Rao, K.AnithaSheela

Abstract:

This paper presents a new method for estimating the nonstationary noise power spectral density given a noisy signal. The method is based on averaging the noisy speech power spectrum using time and frequency dependent smoothing factors. These factors are adjusted based on signal-presence probability in individual frequency bins. Signal presence is determined by computing the ratio of the noisy speech power spectrum to its local minimum, which is updated continuously by averaging past values of the noisy speech power spectra with a look-ahead factor. This method adapts very quickly to highly non-stationary noise environments. The proposed method achieves significant improvements over a system that uses voice activity detector (VAD) in noise estimation.

Keywords: Noise estimation, Non-stationary noise, Speechenhancement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2298
418 Automatic Distance Compensation for Robust Voice-based Human-Computer Interaction

Authors: Randy Gomez, Keisuke Nakamura, Kazuhiro Nakadai

Abstract:

Distant-talking voice-based HCI system suffers from performance degradation due to mismatch between the acoustic speech (runtime) and the acoustic model (training). Mismatch is caused by the change in the power of the speech signal as observed at the microphones. This change is greatly influenced by the change in distance, affecting speech dynamics inside the room before reaching the microphones. Moreover, as the speech signal is reflected, its acoustical characteristic is also altered by the room properties. In general, power mismatch due to distance is a complex problem. This paper presents a novel approach in dealing with distance-induced mismatch by intelligently sensing instantaneous voice power variation and compensating model parameters. First, the distant-talking speech signal is processed through microphone array processing, and the corresponding distance information is extracted. Distance-sensitive Gaussian Mixture Models (GMMs), pre-trained to capture both speech power and room property are used to predict the optimal distance of the speech source. Consequently, pre-computed statistic priors corresponding to the optimal distance is selected to correct the statistics of the generic model which was frozen during training. Thus, model combinatorics are post-conditioned to match the power of instantaneous speech acoustics at runtime. This results to an improved likelihood in predicting the correct speech command at farther distances. We experiment using real data recorded inside two rooms. Experimental evaluation shows voice recognition performance using our method is more robust to the change in distance compared to the conventional approach. In our experiment, under the most acoustically challenging environment (i.e., Room 2: 2.5 meters), our method achieved 24.2% improvement in recognition performance against the best-performing conventional method.

Keywords: Human Machine Interaction, Human Computer Interaction, Voice Recognition, Acoustic Model Compensation, Acoustic Speech Enhancement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1843
417 Absence of Developmental Change in Epenthetic Vowel Duration in Japanese Speakers’ English

Authors: Takayuki Konishi, Kakeru Yazawa, Mariko Kondo

Abstract:

This study examines developmental change in the production of epenthetic vowels by Japanese learners of English in relation to acquisition of L2 English speech rhythm. Seventy-two Japanese learners of English in the J-AESOP corpus were divided into lower- and higher-level learners according to their proficiency score and the frequency of vowel epenthesis. Three learners were excluded because no vowel epenthesis was observed in their utterances. The analysis of their read English speech data showed no statistical difference between lower- and higher-level learners, implying the absence of any developmental change in durations of epenthetic vowels. This result, together with the findings of previous studies, will be discussed in relation to the transfer of L1 phonology and manifestation of L2 English rhythm.

Keywords: Vowel epenthesis, Japanese learners of English, L2 speech corpus, speech rhythm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1084
416 Analysis of Cooperative Hybrid ARQ with Adaptive Modulation and Coding on a Correlated Fading Channel Environment

Authors: Ibrahim Ozkan

Abstract:

In this study, a cross-layer design which combines adaptive modulation and coding (AMC) and hybrid automatic repeat request (HARQ) techniques for a cooperative wireless network is investigated analytically. Previous analyses of such systems in the literature are confined to the case where the fading channel is independent at each retransmission, which can be unrealistic unless the channel is varying very fast. On the other hand, temporal channel correlation can have a significant impact on the performance of HARQ systems. In this study, utilizing a Markov channel model which accounts for the temporal correlation, the performance of non-cooperative and cooperative networks are investigated in terms of packet loss rate and throughput metrics for Chase combining HARQ strategy.

Keywords: Cooperative network, adaptive modulation and coding, hybrid ARQ, correlated fading.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 540
415 The Use of Information for Inventory Decision in the Healthcare Industry

Authors: H. L. Chan, T. M. Choi, C. L. Hui, S. F. Ng

Abstract:

In this study, we explore the use of information for inventory decision in the healthcare organization (HO). We consider the scenario when the HO can make use of the information collected from some correlated products to enhance its inventory planning. Motivated by our real world observations that HOs adopt RFID and bar-coding system for information collection purpose, we examine the effectiveness of these systems for inventory planning with Bayesian information updating. We derive the optimal ordering decision and study the issue of Pareto improvement in the supply chain. Our analysis demonstrates that RFID system will outperform the bar-coding system when the RFID system installation cost and the tag cost reduce to a level that is comparable with that of the barcoding system. We also show how an appropriately set wholesale pricing contract can achieve Pareto improvement in the HO supply chain.

Keywords: Efficient consumer response program, healthcare, inventory management, RFID system, bar-coding system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1868
414 Accent Identification by Clustering and Scoring Formants

Authors: Dejan Stantic, Jun Jo

Abstract:

There have been significant improvements in automatic voice recognition technology. However, existing systems still face difficulties, particularly when used by non-native speakers with accents. In this paper we address a problem of identifying the English accented speech of speakers from different backgrounds. Once an accent is identified the speech recognition software can utilise training set from appropriate accent and therefore improve the efficiency and accuracy of the speech recognition system. We introduced the Q factor, which is defined by the sum of relationships between frequencies of the formants. Four different accents were considered and experimented for this research. A scoring method was introduced in order to effectively analyse accents. The proposed concept indicates that the accent could be identified by analysing their formants.

Keywords: Accent Identification, Formants, Q Factor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2047
413 Delineating Students’ Speaking Anxieties and Assessment Gaps in Online Speech Performances

Authors: Mary Jane B. Suarez

Abstract:

Speech anxiety is innumerable in any traditional communication classes especially for ESL students. The speech anxiety intensifies when communication skills assessments have taken its toll in an online mode of learning due to the perils of the COVID-19 virus. Teachers and students have experienced vast ambiguity on how to realize a still effective way to teach and learn various speaking skills amidst the pandemic. This mixed method study determined the factors that affected the public speaking skills of students in online performances, delineated the assessment gaps in assessing speaking skills in an online setup, and recommended ways to address students’ speech anxieties. Using convergent parallel design, quantitative data were gathered by examining the desired learning competencies of the English course including a review of the teacher’s class record to analyze how students’ performances reflected a significantly high level of anxiety in online speech delivery. Focus group discussion was also conducted for qualitative data describing students’ public speaking anxiety and assessment gaps. Results showed a significantly high level of students’ speech anxiety affected by time constraints, use of technology, lack of audience response, being conscious of making mistakes, and the use of English as a second language. The study presented recommendations to redesign curricular assessments of English teachers and to have a robust diagnosis of students’ speaking anxiety to better cater to the needs of learners in attempt to bridge any gaps in cultivating public speaking skills of students as educational institutions segue from the pandemic to the post-pandemic milieu.

Keywords: Blended learning, communication skills assessment, online speech delivery, public speaking anxiety, speech anxiety.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 116
412 Environmentally Adaptive Acoustic Echo Suppression for Barge-in Speech Recognition

Authors: Jong Han Joo, Jeong Hun Lee, Young Sun Kim, Jae Young Kang, Seung Ho Choi

Abstract:

In this study, we propose a novel technique for acoustic echo suppression (AES) during speech recognition under barge-in conditions. Conventional AES methods based on spectral subtraction apply fixed weights to the estimated echo path transfer function (EPTF) at the current signal segment and to the EPTF estimated until the previous time interval. However, the effects of echo path changes should be considered for eliminating the undesired echoes. We describe a new approach that adaptively updates weight parameters in response to abrupt changes in the acoustic environment due to background noises or double-talk. Furthermore, we devised a voice activity detector and an initial time-delay estimator for barge-in speech recognition in communication networks. The initial time delay is estimated using log-spectral distance measure, as well as cross-correlation coefficients. The experimental results show that the developed techniques can be successfully applied in barge-in speech recognition systems.

Keywords: Acoustic echo suppression, barge-in, speech recognition, echo path transfer function, initial delay estimator, voice activity detector.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2269
411 Shape Error Concealment for Shape Independent Transform Coding

Authors: Sandra Ondrušová, Jaroslav Polec

Abstract:

Arbitrarily shaped video objects are an important concept in modern video coding methods. The techniques presently used are not based on image elements but rather video objects having an arbitrary shape. In this paper, spatial shape error concealment techniques to be used for object-based image in error-prone environments are proposed. We consider a geometric shape representation consisting of the object boundary, which can be extracted from the α-plane. Three different approaches are used to replace a missing boundary segment: Bézier interpolation, Bézier approximation and NURBS approximation. Experimental results on object shape with different concealment difficulty demonstrate the performance of the proposed methods. Comparisons with proposed methods are also presented.

Keywords: error concealment, shape coding, object-based image, NURBS, Bézier curves.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1237
410 Motion Estimator Architecture with Optimized Number of Processing Elements for High Efficiency Video Coding

Authors: Seongsoo Lee

Abstract:

Motion estimation occupies the heaviest computation in HEVC (high efficiency video coding). Many fast algorithms such as TZS (test zone search) have been proposed to reduce the computation. Still the huge computation of the motion estimation is a critical issue in the implementation of HEVC video codec. In this paper, motion estimator architecture with optimized number of PEs (processing element) is presented by exploiting early termination. It also reduces hardware size by exploiting parallel processing. The presented motion estimator architecture has 8 PEs, and it can efficiently perform TZS with very high utilization of PEs.

Keywords: Motion estimation, test zone search, high efficiency video coding, processing element, optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1512
409 An ICA Algorithm for Separation of Convolutive Mixture of Speech Signals

Authors: Rajkishore Prasad, Hiroshi Saruwatari, Kiyohiro Shikano

Abstract:

This paper describes Independent Component Analysis (ICA) based fixed-point algorithm for the blind separation of the convolutive mixture of speech, picked-up by a linear microphone array. The proposed algorithm extracts independent sources by non- Gaussianizing the Time-Frequency Series of Speech (TFSS) in a deflationary way. The degree of non-Gaussianization is measured by negentropy. The relative performances of algorithm under random initialization and Null beamformer (NBF) based initialization are studied. It has been found that an NBF based initial value gives speedy convergence as well as better separation performance

Keywords: Blind signal separation, independent component analysis, negentropy, convolutive mixture.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1730
408 An Advanced Method for Speech Recognition

Authors: Meysam Mohamad pour, Fardad Farokhi

Abstract:

In this paper in consideration of each available techniques deficiencies for speech recognition, an advanced method is presented that-s able to classify speech signals with the high accuracy (98%) at the minimum time. In the presented method, first, the recorded signal is preprocessed that this section includes denoising with Mels Frequency Cepstral Analysis and feature extraction using discrete wavelet transform (DWT) coefficients; Then these features are fed to Multilayer Perceptron (MLP) network for classification. Finally, after training of neural network effective features are selected with UTA algorithm.

Keywords: Multilayer perceptron (MLP) neural network, Discrete Wavelet Transform (DWT) , Mels Scale Frequency Filter , UTA algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2309
407 Wavelet Compression of ECG Signals Using SPIHT Algorithm

Authors: Mohammad Pooyan, Ali Taheri, Morteza Moazami-Goudarzi, Iman Saboori

Abstract:

In this paper we present a novel approach for wavelet compression of electrocardiogram (ECG) signals based on the set partitioning in hierarchical trees (SPIHT) coding algorithm. SPIHT algorithm has achieved prominent success in image compression. Here we use a modified version of SPIHT for one dimensional signals. We applied wavelet transform with SPIHT coding algorithm on different records of MIT-BIH database. The results show the high efficiency of this method in ECG compression.

Keywords: ECG compression, wavelet, SPIHT.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2353
406 The Effect of Different Compression Schemes on Speech Signals

Authors: Jalal Karam, Raed Saad

Abstract:

This paper studies the effect of different compression constraints and schemes presented in a new and flexible paradigm to achieve high compression ratios and acceptable signal to noise ratios of Arabic speech signals. Compression parameters are computed for variable frame sizes of a level 5 to 7 Discrete Wavelet Transform (DWT) representation of the signals for different analyzing mother wavelet functions. Results are obtained and compared for Global threshold and level dependent threshold techniques. The results obtained also include comparisons with Signal to Noise Ratios, Peak Signal to Noise Ratios and Normalized Root Mean Square Error.

Keywords: Speech Compression, Wavelets.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1691
405 Highly Scalable, Reversible and Embedded Image Compression System

Authors: Federico Pérez González, Iñaki Goiricelaia Ordorika, Pedro Iriondo Bengoa

Abstract:

A new method for low complexity image coding is presented, that permits different settings and great scalability in the generation of the final bit stream. This coding presents a continuoustone still image compression system that groups loss and lossless compression making use of finite arithmetic reversible transforms. Both transformation in the space of color and wavelet transformation are reversible. The transformed coefficients are coded by means of a coding system in depending on a subdivision into smaller components (CFDS) similar to the bit importance codification. The subcomponents so obtained are reordered by means of a highly configure alignment system depending on the application that makes possible the re-configure of the elements of the image and obtaining different levels of importance from which the bit stream will be generated. The subcomponents of each level of importance are coded using a variable length entropy coding system (VBLm) that permits the generation of an embedded bit stream. This bit stream supposes itself a bit stream that codes a compressed still image. However, the use of a packing system on the bit stream after the VBLm allows the realization of a final highly scalable bit stream from a basic image level and one or several enhance levels.

Keywords: Image compression, wavelet transform, highlyscalable, reversible transform, embedded, subcomponents.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1361
404 Real Time Object Tracking in H.264/ AVC Using Polar Vector Median and Block Coding Modes

Authors: T. Kusuma, K. Ashwini

Abstract:

This paper presents a real time video surveillance system which is capable of tracking multiple real time objects using Polar Vector Median (PVM) and Block Coding Modes (BCM) with Global Motion Compensation (GMC). This strategy works in the packed area and furthermore utilizes the movement vectors and BCM from the compressed bit stream to perform real time object tracking. We propose to do this in view of the neighboring Motion Vectors (MVs) using a method called PVM. Since GM adds to the object’s native motion, for accurate tracking, it is important to remove GM from the MV field prior to further processing. The proposed method is tested on a number of standard sequences and the results show its advantages over some of the current modern methods.

Keywords: Block coding mode, global motion compensation, object tracking, polar vector median, video surveillance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 701
403 Bidirectional Dynamic Time Warping Algorithm for the Recognition of Isolated Words Impacted by Transient Noise Pulses

Authors: G. Tamulevičius, A. Serackis, T. Sledevič, D. Navakauskas

Abstract:

We consider the biggest challenge in speech recognition – noise reduction. Traditionally detected transient noise pulses are removed with the corrupted speech using pulse models. In this paper we propose to cope with the problem directly in Dynamic Time Warping domain. Bidirectional Dynamic Time Warping algorithm for the recognition of isolated words impacted by transient noise pulses is proposed. It uses simple transient noise pulse detector, employs bidirectional computation of dynamic time warping and directly manipulates with warping results. Experimental investigation with several alternative solutions confirms effectiveness of the proposed algorithm in the reduction of impact of noise on recognition process – 3.9% increase of the noisy speech recognition is achieved.

Keywords: Transient noise pulses, noise reduction, dynamic time warping, speech recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1862
402 Unit Selection Algorithm Using Bi-grams Model For Corpus-Based Speech Synthesis

Authors: Mohamed Ali KAMMOUN, Ahmed Ben HAMIDA

Abstract:

In this paper, we present a novel statistical approach to corpus-based speech synthesis. Classically, phonetic information is defined and considered as acoustic reference to be respected. In this way, many studies were elaborated for acoustical unit classification. This type of classification allows separating units according to their symbolic characteristics. Indeed, target cost and concatenation cost were classically defined for unit selection. In Corpus-Based Speech Synthesis System, when using large text corpora, cost functions were limited to a juxtaposition of symbolic criteria and the acoustic information of units is not exploited in the definition of the target cost. In this manuscript, we token in our consideration the unit phonetic information corresponding to acoustic information. This would be realized by defining a probabilistic linguistic Bi-grams model basically used for unit selection. The selected units would be extracted from the English TIMIT corpora.

Keywords: Unit selection, Corpus-based Speech Synthesis, Bigram model

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1400
401 Comparison between Turbo Code and Convolutional Product Code (CPC) for Mobile WiMAX

Authors: Ahmed Ebian, Mona Shokair, Kamal Awadalla

Abstract:

Mobile WiMAX is a broadband wireless solution that enables convergence of mobile and fixed broadband networks through a common wide area broadband radio access technology and flexible network architecture. It adopts Orthogonal Frequency Division Multiple Access (OFDMA) for improved multi-path performance in Non-Line-Of-Sight (NLOS) environments. Scalable OFDMA (SOFDMA) is introduced in the IEEE 802e[1]. WIMAX system uses one of different types of channel coding but The mandatory channel coding scheme is based on binary nonrecursive Convolutional Coding (CC). There are other several optional channel coding schemes such as block turbo codes, convolutional turbo codes, and low density parity check (LDPC). In this paper a comparison between the performance of WIMAX using turbo code and using convolutional product code (CPC) [2] is made. Also a combination between them had been done. The CPC gives good results at different SNR values compared to both the turbo system, and the combination between them. For example, at BER equal to 10-2 for 128 subcarriers, the amount of improvement in SNR equals approximately 3 dB higher than turbo code and equals approximately 2dB higher than the combination respectively. Several results are obtained at different modulating schemes (16QAM and 64QAM) and different numbers of sub-carriers (128 and 512).

Keywords: Turbo Code, Convolutional Product Code (CPC), Convolutional Product Code (CPC).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3345
400 Reversible, Embedded and Highly Scalable Image Compression System

Authors: Federico Pérez González, Iñaki Goirizelaia Ordorika, Pedro Iriondo Bengoa

Abstract:

In this work a new method for low complexity image coding is presented, that permits different settings and great scalability in the generation of the final bit stream. This coding presents a continuous-tone still image compression system that groups loss and lossless compression making use of finite arithmetic reversible transforms. Both transformation in the space of color and wavelet transformation are reversible. The transformed coefficients are coded by means of a coding system in depending on a subdivision into smaller components (CFDS) similar to the bit importance codification. The subcomponents so obtained are reordered by means of a highly configure alignment system depending on the application that makes possible the re-configure of the elements of the image and obtaining different importance levels from which the bit stream will be generated. The subcomponents of each importance level are coded using a variable length entropy coding system (VBLm) that permits the generation of an embedded bit stream. This bit stream supposes itself a bit stream that codes a compressed still image. However, the use of a packing system on the bit stream after the VBLm allows the realization of a final highly scalable bit stream from a basic image level and one or several improvement levels.

Keywords: Image compression, wavelet transform, highly scalable, reversible transform, embedded, subcomponents.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1254
399 Puff Noise Detection and Cancellation for Robust Speech Recognition

Authors: Sangjun Park, Jungpyo Hong, Byung-Ok Kang, Yun-keun Lee, Minsoo Hahn

Abstract:

In this paper, an algorithm for detecting and attenuating puff noises frequently generated under the mobile environment is proposed. As a baseline system, puff detection system is designed based on Gaussian Mixture Model (GMM), and 39th Mel Frequency Cepstral Coefficient (MFCC) is extracted as feature parameters. To improve the detection performance, effective acoustic features for puff detection are proposed. In addition, detected puff intervals are attenuated by high-pass filtering. The speech recognition rate was measured for evaluation and confusion matrix and ROC curve are used to confirm the validity of the proposed system.

Keywords: Gaussian mixture model, puff detection and cancellation, speech enhancement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2184
398 A New H.264-Based Rate Control Algorithm for Stereoscopic Video Coding

Authors: Yi Liao, Wencheng Yang, Gangyi Jiang

Abstract:

According to investigating impact of complexity of stereoscopic frame pairs on stereoscopic video coding and transmission, a new rate control algorithm is presented. The proposed rate control algorithm is performed on three levels: stereoscopic group of pictures (SGOP) level, stereoscopic frame (SFrame) level and frame level. A temporal-spatial frame complexity model is firstly established, in the bits allocation stage, the frame complexity, position significance and reference property between the left and right frames are taken into account. Meanwhile, the target buffer is set according to the frame complexity. Experimental results show that the proposed method can efficiently control the bitrates, and it outperforms the fixed quantization parameter method from the rate distortion perspective, and average PSNR gain between rate-distortion curves (BDPSNR) is 0.21dB.

Keywords: Stereoscopic video coding, rate control, stereoscopic group of pictures, complexity of stereoscopic frame pairs.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1676