Search results for: speech signal analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 28501

Search results for: speech signal analysis

28411 On Overcoming Common Oral Speech Problems through Authentic Films

Authors: Tamara Matevosyan

Abstract:

The present paper discusses the main problems that students face while developing oral skills through authentic films. It states that special attention should be paid not only to the study of verbal speech but also to non-verbal communication. Authentic films serve as an important tool to understand both native speaker’s gestures and their culture of pausing while speaking. Various phonetic difficulties causing phonetic interference in actual speech are covered in the paper emphasizing the role of authentic films in overcoming them.

Keywords: compressive speech, filled pauses, unfilled pauses, pausing culture

Procedia PDF Downloads 322
28410 The Communicative Nature of Linguistic Interference in Learning and Teaching of Slavic Languages

Authors: Kseniia Fedorova

Abstract:

The article is devoted to interlinguistic homonymy and enantiosemy analysis. These phenomena belong to the process of linguistic interference, which leads to violation of the communicative utterances integrity and causes misunderstanding between foreign interlocutors - native speakers of different Slavic languages. More attention is paid to investigation of non-typical speech situations, which occurred spontaneously or created by somebody intentionally being based on described phenomenon mechanism. The classification of typical students' mistakes connected with the paradox of interference is being represented in the article. The survey contributes to speech act theory, contemporary linguodidactics, translation science and comparative lexicology of Slavonic languages.

Keywords: adherent enantiosemy, interference, interslavonic homonymy, speech act

Procedia PDF Downloads 223
28409 Analysis of Injection-Lock in Oscillators versus Phase Variation of Injected Signal

Authors: M. Yousefi, N. Nasirzadeh

Abstract:

In this paper, behavior of an oscillator under injection of another signal has been investigated. Also, variation of output signal amplitude versus injected signal phase variation, the effect of varying the amplitude of injected signal and quality factor of the oscillator has been investigated. The results show that the locking time depends on phase and the best locking time happens at 180-degrees phase. Also, the effect of injected lock has been discussed. Simulations show that the locking time decreases with signal injection to bulk. Locking time has been investigated versus various phase differences. The effect of phase and amplitude changes on locking time of a typical LC oscillator in 180 nm technology has been investigated.

Keywords: analysis, oscillator, injection-lock oscillator, phase modulation

Procedia PDF Downloads 329
28408 Development of a Tesla Music Coil from Signal Processing

Authors: Samaniego Campoverde José Enrique, Rosero Muñoz Jorge Enrique, Luzcando Narea Lorena Elizabeth

Abstract:

This paper presents a practical and theoretical model for the operation of the Tesla coil using digital signal processing. The research is based on the analysis of ten scientific papers exploring the development and operation of the Tesla coil. Starting from the Testa coil, several modifications were carried out on the Tesla coil, with the aim of amplifying the digital signal by making use of digital signal processing. To achieve this, an amplifier with a transistor and digital filters provided by MATLAB software were used, which were chosen according to the characteristics of the signals in question.

Keywords: tesla coil, digital signal process, equalizer, graphical environment

Procedia PDF Downloads 89
28407 Morpheme Based Parts of Speech Tagger for Kannada Language

Authors: M. C. Padma, R. J. Prathibha

Abstract:

Parts of speech tagging is the process of assigning appropriate parts of speech tags to the words in a given text. The critical or crucial information needed for tagging a word come from its internal structure rather from its neighboring words. The internal structure of a word comprises of its morphological features and grammatical information. This paper presents a morpheme based parts of speech tagger for Kannada language. This proposed work uses hierarchical tag set for assigning tags. The system is tested on some Kannada words taken from EMILLE corpus. Experimental result shows that the performance of the proposed system is above 90%.

Keywords: hierarchical tag set, morphological analyzer, natural language processing, paradigms, parts of speech

Procedia PDF Downloads 269
28406 Comparison Study of Machine Learning Classifiers for Speech Emotion Recognition

Authors: Aishwarya Ravindra Fursule, Shruti Kshirsagar

Abstract:

In the intersection of artificial intelligence and human-centered computing, this paper delves into speech emotion recognition (SER). It presents a comparative analysis of machine learning models such as K-Nearest Neighbors (KNN),logistic regression, support vector machines (SVM), decision trees, ensemble classifiers, and random forests, applied to SER. The research employs four datasets: Crema D, SAVEE, TESS, and RAVDESS. It focuses on extracting salient audio signal features like Zero Crossing Rate (ZCR), Chroma_stft, Mel Frequency Cepstral Coefficients (MFCC), root mean square (RMS) value, and MelSpectogram. These features are used to train and evaluate the models’ ability to recognize eight types of emotions from speech: happy, sad, neutral, angry, calm, disgust, fear, and surprise. Among the models, the Random Forest algorithm demonstrated superior performance, achieving approximately 79% accuracy. This suggests its suitability for SER within the parameters of this study. The research contributes to SER by showcasing the effectiveness of various machine learning algorithms and feature extraction techniques. The findings hold promise for the development of more precise emotion recognition systems in the future. This abstract provides a succinct overview of the paper’s content, methods, and results.

Keywords: comparison, ML classifiers, KNN, decision tree, SVM, random forest, logistic regression, ensemble classifiers

Procedia PDF Downloads 22
28405 Performance Analysis of VoIP Coders for Different Modulations Under Pervasive Environment

Authors: Jasbinder Singh, Harjit Pal Singh, S. A. Khan

Abstract:

The work, in this paper, presents the comparison of encoded speech signals by different VoIP narrow-band and wide-band codecs for different modulation schemes. The simulation results indicate that codec has an impact on the speech quality and also effected by modulation schemes.

Keywords: VoIP, coders, modulations, BER, MOS

Procedia PDF Downloads 487
28404 Absence of Developmental Change in Epenthetic Vowel Duration in Japanese Speakers’ English

Authors: Takayuki Konishi, Kakeru Yazawa, Mariko Kondo

Abstract:

This study examines developmental change in the production of epenthetic vowels by Japanese learners of English in relation to acquisition of L2 English speech rhythm. Seventy-two Japanese learners of English in the J-AESOP corpus were divided into lower- and higher-level learners according to their proficiency score and the frequency of vowel epenthesis. Three learners were excluded because no vowel epenthesis was observed in their utterances. The analysis of their read English speech data showed no statistical difference between lower- and higher-level learners, implying the absence of any developmental change in durations of epenthetic vowels. This result, together with the findings of previous studies, will be discussed in relation to the transfer of L1 phonology and manifestation of L2 English rhythm.

Keywords: vowel epenthesis, Japanese learners of English, L2 speech corpus, speech rhythm

Procedia PDF Downloads 247
28403 Lip Localization Technique for Myanmar Consonants Recognition Based on Lip Movements

Authors: Thein Thein, Kalyar Myo San

Abstract:

Lip reading system is one of the different supportive technologies for hearing impaired, or elderly people or non-native speakers. For normal hearing persons in noisy environments or in conditions where the audio signal is not available, lip reading techniques can be used to increase their understanding of spoken language. Hearing impaired persons have used lip reading techniques as important tools to find out what was said by other people without hearing voice. Thus, visual speech information is important and become active research area. Using visual information from lip movements can improve the accuracy and robustness of a speech recognition system and the need for lip reading system is ever increasing for every language. However, the recognition of lip movement is a difficult task because of the region of interest (ROI) is nonlinear and noisy. Therefore, this paper proposes method to detect the accurate lips shape and to localize lip movement towards automatic lip tracking by using the combination of Otsu global thresholding technique and Moore Neighborhood Tracing Algorithm. Proposed method shows how accurate lip localization and tracking which is useful for speech recognition. In this work of study and experiments will be carried out the automatic lip localizing the lip shape for Myanmar consonants using the only visual information from lip movements which is useful for visual speech of Myanmar languages.

Keywords: lip reading, lip localization, lip tracking, Moore neighborhood tracing algorithm

Procedia PDF Downloads 332
28402 Tensor Deep Stacking Neural Networks and Bilinear Mapping Based Speech Emotion Classification Using Facial Electromyography

Authors: P. S. Jagadeesh Kumar, Yang Yung, Wenli Hu

Abstract:

Speech emotion classification is a dominant research field in finding a sturdy and profligate classifier appropriate for different real-life applications. This effort accentuates on classifying different emotions from speech signal quarried from the features related to pitch, formants, energy contours, jitter, shimmer, spectral, perceptual and temporal features. Tensor deep stacking neural networks were supported to examine the factors that influence the classification success rate. Facial electromyography signals were composed of several forms of focuses in a controlled atmosphere by means of audio-visual stimuli. Proficient facial electromyography signals were pre-processed using moving average filter, and a set of arithmetical features were excavated. Extracted features were mapped into consistent emotions using bilinear mapping. With facial electromyography signals, a database comprising diverse emotions will be exposed with a suitable fine-tuning of features and training data. A success rate of 92% can be attained deprived of increasing the system connivance and the computation time for sorting diverse emotional states.

Keywords: speech emotion classification, tensor deep stacking neural networks, facial electromyography, bilinear mapping, audio-visual stimuli

Procedia PDF Downloads 224
28401 Italian Speech Vowels Landmark Detection through the Legacy Tool 'xkl' with Integration of Combined CNNs and RNNs

Authors: Kaleem Kashif, Tayyaba Anam, Yizhi Wu

Abstract:

This paper introduces a methodology for advancing Italian speech vowels landmark detection within the distinctive feature-based speech recognition domain. Leveraging the legacy tool 'xkl' by integrating combined convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the study presents a comprehensive enhancement to the 'xkl' legacy software. This integration incorporates re-assigned spectrogram methodologies, enabling meticulous acoustic analysis. Simultaneously, our proposed model, integrating combined CNNs and RNNs, demonstrates unprecedented precision and robustness in landmark detection. The augmentation of re-assigned spectrogram fusion within the 'xkl' software signifies a meticulous advancement, particularly enhancing precision related to vowel formant estimation. This augmentation catalyzes unparalleled accuracy in landmark detection, resulting in a substantial performance leap compared to conventional methods. The proposed model emerges as a state-of-the-art solution in the distinctive feature-based speech recognition systems domain. In the realm of deep learning, a synergistic integration of combined CNNs and RNNs is introduced, endowed with specialized temporal embeddings, harnessing self-attention mechanisms, and positional embeddings. The proposed model allows it to excel in capturing intricate dependencies within Italian speech vowels, rendering it highly adaptable and sophisticated in the distinctive feature domain. Furthermore, our advanced temporal modeling approach employs Bayesian temporal encoding, refining the measurement of inter-landmark intervals. Comparative analysis against state-of-the-art models reveals a substantial improvement in accuracy, highlighting the robustness and efficacy of the proposed methodology. Upon rigorous testing on a database (LaMIT) speech recorded in a silent room by four Italian native speakers, the landmark detector demonstrates exceptional performance, achieving a 95% true detection rate and a 10% false detection rate. A majority of missed landmarks were observed in proximity to reduced vowels. These promising results underscore the robust identifiability of landmarks within the speech waveform, establishing the feasibility of employing a landmark detector as a front end in a speech recognition system. The synergistic integration of re-assigned spectrogram fusion, CNNs, RNNs, and Bayesian temporal encoding not only signifies a significant advancement in Italian speech vowels landmark detection but also positions the proposed model as a leader in the field. The model offers distinct advantages, including unparalleled accuracy, adaptability, and sophistication, marking a milestone in the intersection of deep learning and distinctive feature-based speech recognition. This work contributes to the broader scientific community by presenting a methodologically rigorous framework for enhancing landmark detection accuracy in Italian speech vowels. The integration of cutting-edge techniques establishes a foundation for future advancements in speech signal processing, emphasizing the potential of the proposed model in practical applications across various domains requiring robust speech recognition systems.

Keywords: landmark detection, acoustic analysis, convolutional neural network, recurrent neural network

Procedia PDF Downloads 33
28400 Correlation between Speech Emotion Recognition Deep Learning Models and Noises

Authors: Leah Lee

Abstract:

This paper examines the correlation between deep learning models and emotions with noises to see whether or not noises mask emotions. The deep learning models used are plain convolutional neural networks (CNN), auto-encoder, long short-term memory (LSTM), and Visual Geometry Group-16 (VGG-16). Emotion datasets used are Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), Crowd-sourced Emotional Multimodal Actors Dataset (CREMA-D), Toronto Emotional Speech Set (TESS), and Surrey Audio-Visual Expressed Emotion (SAVEE). To make it four times bigger, audio set files, stretch, and pitch augmentations are utilized. From the augmented datasets, five different features are extracted for inputs of the models. There are eight different emotions to be classified. Noise variations are white noise, dog barking, and cough sounds. The variation in the signal-to-noise ratio (SNR) is 0, 20, and 40. In summation, per a deep learning model, nine different sets with noise and SNR variations and just augmented audio files without any noises will be used in the experiment. To compare the results of the deep learning models, the accuracy and receiver operating characteristic (ROC) are checked.

Keywords: auto-encoder, convolutional neural networks, long short-term memory, speech emotion recognition, visual geometry group-16

Procedia PDF Downloads 50
28399 Grammatical and Lexical Cohesion in the Japan’s Prime Minister Shinzo Abe’s Speech Text ‘Nihon wa Modottekimashita’

Authors: Nadya Inda Syartanti

Abstract:

This research aims to identify, classify, and analyze descriptively the aspects of grammatical and lexical cohesion in the speech text of Japan’s Prime Minister Shinzo Abe entitled Nihon wa Modotte kimashita delivered in Washington DC, the United States on February 23, 2013, as a research data source. The method used is qualitative research, which uses descriptions through words that are applied by analyzing aspects of grammatical and lexical cohesion proposed by Halliday and Hasan (1976). The aspects of grammatical cohesion consist of references (personal, demonstrative, interrogative pronouns), substitution, ellipsis, and conjunction. In contrast, lexical cohesion consists of reiteration (repetition, synonym, antonym, hyponym, meronym) and collocation. Data classification is based on the 6 aspects of the cohesion. Through some aspects of cohesion, this research tries to find out the frequency of using grammatical and lexical cohesion in Shinzo Abe's speech text entitled Nihon wa Modotte kimashita. The results of this research are expected to help overcome the difficulty of understanding speech texts in Japanese. Therefore, this research can be a reference for learners, researchers, and anyone who is interested in the field of discourse analysis.

Keywords: cohesion, grammatical cohesion, lexical cohesion, speech text, Shinzo Abe

Procedia PDF Downloads 136
28398 Developing an Intonation Labeled Dataset for Hindi

Authors: Esha Banerjee, Atul Kumar Ojha, Girish Nath Jha

Abstract:

This study aims to develop an intonation labeled database for Hindi. Although no single standard for prosody labeling exists in Hindi, researchers in the past have employed perceptual and statistical methods in literature to draw inferences about the behavior of prosody patterns in Hindi. Based on such existing research and largely agreed upon intonational theories in Hindi, this study attempts to develop a manually annotated prosodic corpus of Hindi speech data, which can be used for training speech models for natural-sounding speech in the future. 100 sentences ( 500 words) each for declarative and interrogative types have been labeled using Praat.

Keywords: speech dataset, Hindi, intonation, labeled corpus

Procedia PDF Downloads 168
28397 Wavelet Based Residual Method of Detecting GSM Signal Strength Fading

Authors: Danladi Ali, Onah Festus Iloabuchi

Abstract:

In this paper, GSM signal strength was measured in order to detect the type of the signal fading phenomenon using one-dimensional multilevel wavelet residual method and neural network clustering to determine the average GSM signal strength received in the study area. The wavelet residual method predicted that the GSM signal experienced slow fading and attenuated with MSE of 3.875dB. The neural network clustering revealed that mostly -75dB, -85dB and -95dB were received. This means that the signal strength received in the study is a weak signal.

Keywords: one-dimensional multilevel wavelets, path loss, GSM signal strength, propagation, urban environment

Procedia PDF Downloads 319
28396 Automatic Assignment of Geminate and Epenthetic Vowel for Amharic Text-to-Speech System

Authors: Tadesse Anberbir, Bankole Felix, Tomio Takara

Abstract:

In the development of a text-to-speech synthesizer, automatic derivation of correct pronunciation from the grapheme form of a text is a central problem. Particularly deriving phonological features which are not shown in orthography is challenging. In the Amharic language, geminates and epenthetic vowels are very crucial for proper pronunciation, but neither is shown in orthography. In this paper, to proposed and integrated a morphological analyzer into an Amharic Text-to-Speech system, mainly to predict geminates and epenthetic vowel positions and prepared a duration modeling method. Amharic Text-to-Speech system (AmhTTS) is a parametric and rule-based system that adopts a cepstral method and uses a source filter model for speech production and a Log Magnitude Approximation (LMA) filter as the vocal tract filter. The naturalness of the system after employing the duration modeling was evaluated by sentence listening test, and we achieved an average Mean Opinion Score (MOS) 3.4 (68%), which is moderate. By modeling the duration of geminates and controlling the locations of epenthetic vowel, we are able to synthesize good quality speech. Our system is mainly suitable to be customized for other Ethiopian languages with limited resources.

Keywords: amharic, gemination, Speech synthesis, morphology, epenthesis

Procedia PDF Downloads 59
28395 Text-to-Speech in Azerbaijani Language via Transfer Learning in a Low Resource Environment

Authors: Dzhavidan Zeinalov, Bugra Sen, Firangiz Aslanova

Abstract:

Most text-to-speech models cannot operate well in low-resource languages and require a great amount of high-quality training data to be considered good enough. Yet, with the improvements made in ASR systems, it is now much easier than ever to collect data for the design of custom text-to-speech models. In this work, our work on using the ASR model to collect data to build a viable text-to-speech system for one of the leading financial institutions of Azerbaijan will be outlined. NVIDIA’s implementation of the Tacotron 2 model was utilized along with the HiFiGAN vocoder. As for the training, the model was first trained with high-quality audio data collected from the Internet, then fine-tuned on the bank’s single speaker call center data. The results were then evaluated by 50 different listeners and got a mean opinion score of 4.17, displaying that our method is indeed viable. With this, we have successfully designed the first text-to-speech model in Azerbaijani and publicly shared 12 hours of audiobook data for everyone to use.

Keywords: Azerbaijani language, HiFiGAN, Tacotron 2, text-to-speech, transfer learning, whisper

Procedia PDF Downloads 17
28394 Classification of Cochannel Signals Using Cyclostationary Signal Processing and Deep Learning

Authors: Bryan Crompton, Daniel Giger, Tanay Mehta, Apurva Mody

Abstract:

The task of classifying radio frequency (RF) signals has seen recent success in employing deep neural network models. In this work, we present a combined signal processing and machine learning approach to signal classification for cochannel anomalous signals. The power spectral density and cyclostationary signal processing features of a captured signal are computed and fed into a neural net to produce a classification decision. Our combined signal preprocessing and machine learning approach allows for simpler neural networks with fast training times and small computational resource requirements for inference with longer preprocessing time.

Keywords: signal processing, machine learning, cyclostationary signal processing, signal classification

Procedia PDF Downloads 82
28393 Hate Speech Detection Using Machine Learning: A Survey

Authors: Edemealem Desalegn Kingawa, Kafte Tasew Timkete, Mekashaw Girmaw Abebe, Terefe Feyisa, Abiyot Bitew Mihretie, Senait Teklemarkos Haile

Abstract:

Currently, hate speech is a growing challenge for society, individuals, policymakers, and researchers, as social media platforms make it easy to anonymously create and grow online friends and followers and provide an online forum for debate about specific issues of community life, culture, politics, and others. Despite this, research on identifying and detecting hate speech is not satisfactory performance, and this is why future research on this issue is constantly called for. This paper provides a systematic review of the literature in this field, with a focus on approaches like word embedding techniques, machine learning, deep learning technologies, hate speech terminology, and other state-of-the-art technologies with challenges. In this paper, we have made a systematic review of the last six years of literature from Research Gate and Google Scholar. Furthermore, limitations, along with algorithm selection and use challenges, data collection, and cleaning challenges, and future research directions, are discussed in detail.

Keywords: Amharic hate speech, deep learning approach, hate speech detection review, Afaan Oromo hate speech detection

Procedia PDF Downloads 147
28392 A Stylistic Analysis of the Short Story ‘The Escape’ by Qaisra Shahraz

Authors: Huma Javed

Abstract:

Stylistics is a broad term that is concerned with both literature and linguistics, due to which the significance of the stylistics increases. This research aims to analyze Qaisra Shahraz's short story ‘The Escape’ from the stylistic analysis viewpoint. The focus of this study is on three aspects grammar category, lexical category, and figure of speech of the short story. The research designs for this article are both explorative and descriptive. The analysis of the data shows that the writer has used more nouns in the story as compared to other lexical items, which suggests that story has a descriptive style rather than narrative.

Keywords: The Escape, stylistics, grammatical category, lexical category, figure of speech

Procedia PDF Downloads 199
28391 FPGA Implementation of a Marginalized Particle Filter for Delineation of P and T Waves of ECG Signal

Authors: Jugal Bhandari, K. Hari Priya

Abstract:

The ECG signal provides important clinical information which could be used to pretend the diseases related to heart. Accordingly, delineation of ECG signal is an important task. Whereas delineation of P and T waves is a complex task. This paper deals with the Study of ECG signal and analysis of signal by means of Verilog Design of efficient filters and MATLAB tool effectively. It includes generation and simulation of ECG signal, by means of real time ECG data, ECG signal filtering and processing by analysis of different algorithms and techniques. In this paper, we design a basic particle filter which generates a dynamic model depending on the present and past input samples and then produces the desired output. Afterwards, the output will be processed by MATLAB to get the actual shape and accurate values of the ranges of P-wave and T-wave of ECG signal. In this paper, Questasim is a tool of mentor graphics which is being used for simulation and functional verification. The same design is again verified using Xilinx ISE which will be also used for synthesis, mapping and bit file generation. Xilinx FPGA board will be used for implementation of system. The final results of FPGA shall be verified with ChipScope Pro where the output data can be observed.

Keywords: ECG, MATLAB, Bayesian filtering, particle filter, Verilog hardware descriptive language

Procedia PDF Downloads 345
28390 Voice Signal Processing and Coding in MATLAB Generating a Plasma Signal in a Tesla Coil for a Security System

Authors: Juan Jimenez, Erika Yambay, Dayana Pilco, Brayan Parra

Abstract:

This paper presents an investigation of voice signal processing and coding using MATLAB, with the objective of generating a plasma signal on a Tesla coil within a security system. The approach focuses on using advanced voice signal processing techniques to encode and modulate the audio signal, which is then amplified and applied to a Tesla coil. The result is the creation of a striking visual effect of voice-controlled plasma with specific applications in security systems. The article explores the technical aspects of voice signal processing, the generation of the plasma signal, and its relationship to security. The implications and creative potential of this technology are discussed, highlighting its relevance at the forefront of research in signal processing and visual effect generation in the field of security systems.

Keywords: voice signal processing, voice signal coding, MATLAB, plasma signal, Tesla coil, security system, visual effects, audiovisual interaction

Procedia PDF Downloads 60
28389 Analysis of the Impact of Refractivity on Ultra High Frequency Signal Strength over Gusau, North West, Nigeria

Authors: B. G. Ayantunji, B. Musa, H. Mai-Unguwa, L. A. Sunmonu, A. S. Adewumi, L. Sa'ad, A. Kado

Abstract:

For achieving reliable and efficient communication system, both terrestrial and satellite communication, surface refractivity is critical in planning and design of radio links. This study analyzed the impact of atmospheric parameters on Ultra High Frequency (UHF) signal strength over Gusau, North West, Nigeria. The analysis exploited meteorological data measured simultaneously with UHF signal strength for the month of June 2017 using a Davis Vantage Pro2 automatic weather station and UHF signal strength measuring devices respectively. The instruments were situated at the premise of Federal University, Gusau (6° 78' N, 12° 13' E). The refractivity values were computed using ITU-R model. The result shows that the refractivity value attained the highest value of 366.28 at 2200hr and a minimum value of 350.66 at 2100hr local time. The correlation between signal strength and refractivity is 0.350; Humidity is 0.532 and a negative correlation of -0.515 for temperature.

Keywords: refractivity, UHF (ultra high frequency) signal strength, free space, automatic weather station

Procedia PDF Downloads 172
28388 All Optical Wavelength Conversion Based On Four Wave Mixing in Optical Fiber

Authors: Surinder Singh, Gursewak Singh Lovkesh

Abstract:

We have designed wavelength conversion based on four wave mixing in an optical fiber at 10 Gb/s. The power of converted signal increases with increase in signal power. The converted signal power is investigated as a function of input signal power and pump power. On comparison of converted signal power at different value of input signal power, we observe that best converted signal power is obtained at -2 dBm input signal power for both up conversion as well as for down conversion. Further, FWM efficiency, quality factor is observed for increase in input signal power and optical fiber length.

Keywords: FWM, optical fiiber, wavelngth converter, quality

Procedia PDF Downloads 555
28387 Wavelet Based Signal Processing for Fault Location in Airplane Cable

Authors: Reza Rezaeipour Honarmandzad

Abstract:

Wavelet analysis is an exciting method for solving difficult problems in mathematics, physics, and engineering, with modern applications as diverse as wave propagation, data compression, signal processing, image processing, pattern recognition, etc. Wavelets allow complex information such as signals, images and patterns to be decomposed into elementary forms at different positions and scales and subsequently reconstructed with high precision. In this paper a wavelet-based signal processing algorithm for airplane cable fault location is proposed. An orthogonal discrete wavelet decomposition and reconstruction algorithm is used to eliminate the noise in the aircraft cable fault signal. The experiment result has shown that the character of emission pulse and reflect pulse used to test the aircraft cable fault point are reserved and the high-frequency noise are eliminated by means of the proposed algorithm in this paper.

Keywords: wavelet analysis, signal processing, orthogonal discrete wavelet, noise, aircraft cable fault signal

Procedia PDF Downloads 494
28386 Automatic Assignment of Geminate and Epenthetic Vowel for Amharic Text-to-Speech System

Authors: Tadesse Anberbir, Felix Bankole, Tomio Takara, Girma Mamo

Abstract:

In the development of a text-to-speech synthesizer, automatic derivation of correct pronunciation from the grapheme form of a text is a central problem. Particularly deriving phonological features which are not shown in orthography is challenging. In the Amharic language, geminates and epenthetic vowels are very crucial for proper pronunciation but neither is shown in orthography. In this paper, we proposed and integrated a morphological analyzer into an Amharic Text-to-Speech system, mainly to predict geminates and epenthetic vowel positions, and prepared a duration modeling method. Amharic Text-to-Speech system (AmhTTS) is a parametric and rule-based system that adopts a cepstral method and uses a source filter model for speech production and a Log Magnitude Approximation (LMA) filter as the vocal tract filter. The naturalness of the system after employing the duration modeling was evaluated by sentence listening test and we achieved an average Mean Opinion Score (MOS) 3.4 (68%) which is moderate. By modeling the duration of geminates and controlling the locations of epenthetic vowel, we are able to synthesize good quality speech. Our system is mainly suitable to be customized for other Ethiopian languages with limited resources.

Keywords: Amharic, gemination, speech synthesis, morphology, epenthesis

Procedia PDF Downloads 56
28385 An Automatic Speech Recognition Tool for the Filipino Language Using the HTK System

Authors: John Lorenzo Bautista, Yoon-Joong Kim

Abstract:

This paper presents the development of a Filipino speech recognition tool using the HTK System. The system was trained from a subset of the Filipino Speech Corpus developed by the DSP Laboratory of the University of the Philippines-Diliman. The speech corpus was both used in training and testing the system by estimating the parameters for phonetic HMM-based (Hidden-Markov Model) acoustic models. Experiments on different mixture-weights were incorporated in the study. The phoneme-level word-based recognition of a 5-state HMM resulted in an average accuracy rate of 80.13 for a single-Gaussian mixture model, 81.13 after implementing a phoneme-alignment, and 87.19 for the increased Gaussian-mixture weight model. The highest accuracy rate of 88.70% was obtained from a 5-state model with 6 Gaussian mixtures.

Keywords: Filipino language, Hidden Markov Model, HTK system, speech recognition

Procedia PDF Downloads 450
28384 Recognition by the Voice and Speech Features of the Emotional State of Children by Adults and Automatically

Authors: Elena E. Lyakso, Olga V. Frolova, Yuri N. Matveev, Aleksey S. Grigorev, Alexander S. Nikolaev, Viktor A. Gorodnyi

Abstract:

The study of the children’s emotional sphere depending on age and psychoneurological state is of great importance for the design of educational programs for children and their social adaptation. Atypical development may be accompanied by violations or specificities of the emotional sphere. To study characteristics of the emotional state reflection in the voice and speech features of children, the perceptual study with the participation of adults and the automatic recognition of speech were conducted. Speech of children with typical development (TD), with Down syndrome (DS), and with autism spectrum disorders (ASD) aged 6-12 years was recorded. To obtain emotional speech in children, model situations were created, including a dialogue between the child and the experimenter containing questions that can cause various emotional states in the child and playing with a standard set of toys. The questions and toys were selected, taking into account the child’s age, developmental characteristics, and speech skills. For the perceptual experiment by adults, test sequences containing speech material of 30 children: TD, DS, and ASD were created. The listeners were 100 adults (age 19.3 ± 2.3 years). The listeners were tasked with determining the children’s emotional state as “comfort – neutral – discomfort” while listening to the test material. Spectrographic analysis of speech signals was conducted. For automatic recognition of the emotional state, 6594 speech files containing speech material of children were prepared. Automatic recognition of three states, “comfort – neutral – discomfort,” was performed using automatically extracted from the set of acoustic features - the Geneva Minimalistic Acoustic Parameter Set (GeMAPS) and the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS). The results showed that the emotional state is worse determined by the speech of TD children (comfort – 58% of correct answers, discomfort – 56%). Listeners better recognized discomfort in children with ASD and DS (78% of answers) than comfort (70% and 67%, respectively, for children with DS and ASD). The neutral state is better recognized by the speech of children with ASD (67%) than by the speech of children with DS (52%) and TD children (54%). According to the automatic recognition data using the acoustic feature set GeMAPSv01b, the accuracy of automatic recognition of emotional states for children with ASD is 0.687; children with DS – 0.725; TD children – 0.641. When using the acoustic feature set eGeMAPSv01b, the accuracy of automatic recognition of emotional states for children with ASD is 0.671; children with DS – 0.717; TD children – 0.631. The use of different models showed similar results, with better recognition of emotional states by the speech of children with DS than by the speech of children with ASD. The state of comfort is automatically determined better by the speech of TD children (precision – 0.546) and children with ASD (0.523), discomfort – children with DS (0.504). The data on the specificities of recognition by adults of the children’s emotional state by their speech may be used in recruitment for working with children with atypical development. Automatic recognition data can be used to create alternative communication systems and automatic human-computer interfaces for social-emotional learning. Acknowledgment: This work was financially supported by the Russian Science Foundation (project 18-18-00063).

Keywords: autism spectrum disorders, automatic recognition of speech, child’s emotional speech, Down syndrome, perceptual experiment

Procedia PDF Downloads 166
28383 Detection of Phoneme [S] Mispronounciation for Sigmatism Diagnosis in Adults

Authors: Michal Krecichwost, Zauzanna Miodonska, Pawel Badura

Abstract:

The diagnosis of sigmatism is mostly based on the observation of articulatory organs. It is, however, not always possible to precisely observe the vocal apparatus, in particular in the oral cavity of the patient. Speech processing can allow to objectify the therapy and simplify the verification of its progress. In the described study the methodology for classification of incorrectly pronounced phoneme [s] is proposed. The recordings come from adults. They were registered with the speech recorder at the sampling rate of 44.1 kHz and the resolution of 16 bit. The database of pathological and normative speech has been collected for the study including reference assessments provided by the speech therapy experts. Ten adult subjects were asked to simulate a certain type of stigmatism under the speech therapy expert supervision. In the recordings, the analyzed phone [s] was surrounded by vowels, viz: ASA, ESE, ISI, SPA, USU, YSY. Thirteen MFCC (mel-frequency cepstral coefficients) and RMS (root mean square) values are calculated within each frame being a part of the analyzed phoneme. Additionally, 3 fricative formants along with corresponding amplitudes are determined for the entire segment. In order to aggregate the information within the segment, the average value of each MFCC coefficient is calculated. All features of other types are aggregated by means of their 75th percentile. The proposed method of features aggregation reduces the size of the feature vector used in the classification. Binary SVM (support vector machine) classifier is employed at the phoneme recognition stage. The first group consists of pathological phones, while the other of the normative ones. The proposed feature vector yields classification sensitivity and specificity measures above 90% level in case of individual logo phones. The employment of a fricative formants-based information improves the sole-MFCC classification results average of 5 percentage points. The study shows that the employment of specific parameters for the selected phones improves the efficiency of pathology detection referred to the traditional methods of speech signal parameterization.

Keywords: computer-aided pronunciation evaluation, sibilants, sigmatism diagnosis, speech processing

Procedia PDF Downloads 259
28382 Transient Signal Generator For Fault Indicator Testing

Authors: Mohamed Shaban, Ali Alfallah

Abstract:

This paper describes an application for testing of a fault indicator but it could be used for other network protection testing. The application is created in the LabVIEW environment and consists of three parts. The first part of the application is determined for transient phenomenon generation and imitates voltage and current transient signal at ground fault originate. The second part allows to set sequences of trend for each current and voltage output signal, up to six trends for each phase. The last part of the application generates harmonic signal with continuously controllable amplitude of current or voltage output signal and phase shift of each signal can be changed there. Further any sub-harmonics and upper harmonics can be added to selected current output signal

Keywords: signal generator-fault indicator, harmonic signal generator, voltage output

Procedia PDF Downloads 474