Search results for: voice activity detection (VAD)
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 9936

9936 A Simple Adaptive Atomic Decomposition Voice Activity Detector Implemented by Matching Pursuit

Authors: Thomas Bryan, Veton Kepuska, Ivica Kostanic

Abstract:

A simple adaptive voice activity detector (VAD) is implemented using Gabor and gammatone atomic decomposition of speech for high Gaussian noise environments. Matching pursuit is used for atomic decomposition and is shown to achieve optimal speech detection capability at high data compression rates for low signal-to-noise ratios. The most active dictionary elements found by matching pursuit are used for the signal reconstruction so that the algorithm adapts to the individual speaker's dominant time-frequency characteristics. Speech has a high peak-to-average ratio, which enables the greedy matching pursuit heuristic of selecting the highest inner products to isolate high-energy speech components in high-noise environments. Gabor and gammatone atoms are both investigated with identical logarithmically spaced center frequencies and similar bandwidths. The algorithm performs equally well for both Gabor and gammatone atoms, with no statistically significant differences. The algorithm achieves 70% accuracy at 0 dB SNR, 90% accuracy at 5 dB SNR, and 98% accuracy at 20 dB SNR, using 30 dB SNR as a reference for voice activity.

Keywords: atomic decomposition, gabor, gammatone, matching pursuit, voice activity detection
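
Illustrative note: the greedy selection step described in the abstract can be sketched as follows. This is a minimal sketch, not the authors' implementation. The dictionary layout, the atom width, the sampling rate, and the names `gabor_atom` and `matching_pursuit` are assumptions made for illustration, and the final speech/non-speech decision is not shown.

```python
import numpy as np

def gabor_atom(n, center, freq, width, fs):
    """Unit-norm real Gabor atom: a Gaussian window times a cosine (assumed form)."""
    t = (np.arange(n) - center) / fs
    atom = np.exp(-0.5 * (t / width) ** 2) * np.cos(2 * np.pi * freq * t)
    return atom / np.linalg.norm(atom)

def matching_pursuit(signal, dictionary, n_iter):
    """Greedy matching pursuit: pick the atom with the largest absolute inner
    product with the residual, subtract its projection, and repeat."""
    residual = np.asarray(signal, dtype=float).copy()
    picks = []
    for _ in range(n_iter):
        products = dictionary @ residual            # inner product with every atom
        k = int(np.argmax(np.abs(products)))
        picks.append((k, float(products[k])))
        residual -= products[k] * dictionary[k]     # valid because atoms are unit norm
    return picks, residual

# Hypothetical dictionary: logarithmically spaced center frequencies, as in the abstract.
fs, n = 8000, 512
freqs = np.geomspace(200, 3200, 8)
centers = range(64, n, 64)
dictionary = np.array([gabor_atom(n, c, f, 0.004, fs) for f in freqs for c in centers])

# Synthetic test signal: two atoms buried in Gaussian noise.
rng = np.random.default_rng(0)
noisy = 3.0 * dictionary[10] + 2.0 * dictionary[40] + 0.05 * rng.standard_normal(n)
picks, residual = matching_pursuit(noisy, dictionary, n_iter=2)
```

Because each step removes the projection onto a unit-norm atom, the residual energy never increases, which is what lets a few high-energy atoms stand out against broadband noise.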

Procedia PDF Downloads 290
9935 Graph Neural Networks and Rotary Position Embedding for Voice Activity Detection

Authors: YingWei Tan, XueFeng Ding

Abstract:

Attention-based voice activity detection models have gained significant attention in recent years due to their fast training speed and ability to capture a wide contextual range. The inclusion of multi-head style and position embedding in the attention architecture is crucial. Having multiple attention heads allows for differential focus on different parts of the sequence, while position embedding provides guidance for modeling dependencies between elements at various positions in the input sequence. In this work, we propose an approach that considers each head as a node, enabling the application of graph neural networks (GNN) to identify correlations among the different nodes. In addition, we adopt an implementation named rotary position embedding (RoPE), which encodes absolute positional information into the input sequence by a rotation matrix and naturally incorporates explicit relative position information into a self-attention module. We evaluate the effectiveness of our method on a synthetic dataset, and the results demonstrate its superiority over the baseline CRNN in scenarios with low signal-to-noise ratio and noise, while also exhibiting robustness across different noise types. In summary, our proposed framework effectively combines the strengths of CNN and RNN (LSTM) and further enhances detection performance through the integration of graph neural networks and rotary position embedding.

Keywords: voice activity detection, CRNN, graph neural networks, rotary position embedding
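
Illustrative note: the rotation idea behind rotary position embedding can be sketched in a few lines. This is a generic sketch of the technique, not the authors' model. The function name `rope`, the pairing of adjacent feature dimensions, and the base of 10000 are assumptions for illustration.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, dim), dim even.
    Each consecutive pair of features is rotated by an angle proportional to
    the absolute position of the token."""
    x = np.asarray(x, dtype=float)
    dim = x.shape[1]
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)    # one frequency per feature pair
    angles = np.outer(positions, inv_freq)              # shape (seq_len, dim / 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

def attention_score(q, k, m, n):
    """Dot product of a query at position m and a key at position n after RoPE."""
    return float(np.sum(rope(q, np.array([m])) * rope(k, np.array([n]))))
```

Although each vector is rotated by its absolute position, the dot product between a rotated query and a rotated key depends only on the offset between the two positions, which is how relative position enters self-attention.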

Procedia PDF Downloads 71
9934 Voice Liveness Detection Using Kolmogorov Arnold Networks

Authors: Arth J. Shah, Madhu R. Kamble

Abstract:

Voice biometric liveness detection is designed to certify that the voice data presented in an authentication process is genuine and not a recording or a synthetic voice. With the rise of deepfakes and other equally sophisticated spoofing techniques, it is becoming challenging to determine whether the person on the other end is a live speaker. A Voice Liveness Detection (VLD) system is a group of security measures that detect and prevent voice spoofing attacks. Motivated by the recent development of the Kolmogorov-Arnold Network (KAN), based on the Kolmogorov-Arnold theorem, we propose KAN for the VLD task. To date, multilayer perceptron (MLP) based classifiers have been used for the classification tasks. We aim to capture not only the compositional structure of the model but also to optimize the values of univariate functions. This study presents the mathematical as well as experimental analysis of KAN for VLD tasks, thereby opening a new perspective for scientists working on speech and signal processing-based tasks. This study combines traditional signal processing with new deep learning models, which proved to be a better combination for VLD tasks. The experiments are performed on the POCO and ASVSpoof 2017 V2 databases. We used Constant Q-transform (CQT), Mel, and short-time Fourier transform (STFT) based front-end features and used CNN, BiLSTM, and KAN as back-end classifiers. The best accuracy is 91.26% on the POCO database using STFT features with the KAN classifier. On the ASVSpoof 2017 V2 database, the lowest EER we obtained was 26.42%, using CQT features and KAN as a classifier.

Keywords: Kolmogorov Arnold networks, multilayer perceptron, pop noise, voice liveness detection
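
Illustrative note: the equal error rate (EER) reported above is the operating point at which the false-acceptance and false-rejection rates coincide. The sketch below shows one simple way to approximate it from classifier scores. The function name and the convention that higher scores mean "genuine" are assumptions, and it is not the evaluation script used in the paper.

```python
import numpy as np

def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate EER: sweep every observed score as a threshold and return the
    error rate where false acceptance and false rejection are closest."""
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(spoof >= t)     # spoofed trials accepted as genuine
        frr = np.mean(genuine < t)    # genuine trials rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)
```

With few trials the two rates rarely cross exactly, so the sketch averages them at the closest threshold; evaluation toolkits typically interpolate instead.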

Procedia PDF Downloads 39
9933 Effect of Helium and Sulfur Hexafluoride Gas Inhalation on Voice Resonances

Authors: Pallavi Marathe

Abstract:

Voice is considered to be a unique biometric property of human beings. Unlike other biometric evidence, such as fingerprints and retina scans, voice can be easily changed or mimicked. The present paper discusses how the inhalation of helium and sulfur hexafluoride (SF6) gas affects the voice formant frequencies, which are the resonant frequencies of the vocal tract. Helium is a low-density gas; hence, sound travels through it at a higher speed than through air. In SF6, by contrast, sound travels at a lower speed than in air because of the gas's higher density. This results in an increase of the resonant frequencies of the voice in helium and a decrease in SF6. Results are presented with the help of Praat software, which is used for voice analysis.

Keywords: voice formants, helium, sulfur hexafluoride, gas inhalation
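
Illustrative note: the direction of the formant shift follows from a textbook idealization in which the vocal tract is a uniform tube closed at the glottis and open at the lips, so that its resonances are F_n = (2n - 1) c / (4 L) and scale with the speed of sound c. The sketch below uses this idealization. The 17 cm tract length and the pure-gas sound speeds are approximate assumed values, and in practice the inhaled gas mixes with air, so real shifts are smaller.

```python
# Approximate speed of sound in each pure gas near room temperature, in m/s (assumed values).
SOUND_SPEED = {"air": 343.0, "helium": 1007.0, "sf6": 134.0}

def tube_formants(sound_speed, tract_length=0.17, count=3):
    """Resonances of an idealized quarter-wavelength tube: F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * sound_speed / (4 * tract_length) for n in range(1, count + 1)]

formants = {gas: tube_formants(c) for gas, c in SOUND_SPEED.items()}
```

Under these assumptions the first resonance in air is about 504 Hz; every resonance is higher in helium and lower in SF6, while the pitch set by the vocal folds is largely unaffected.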

Procedia PDF Downloads 125
9932 Comparing Sounds of the Singing Voice

Authors: Christel Elisabeth Bonin

Abstract:

This experiment aims to show that classical singing and belting each have a distinct singing quality, whereas singing with a speaking voice has no singing quality. For this purpose, a female singing voice was recorded at four different pitches, singing the vowel ‘a’ in three different ways: classically trained voice, belting voice, and speaking voice. The recordings were analyzed in the software Praat. The formants of each recorded tone were then compared to each other and related to the singer’s formant. The visible results are taken as an indicator of comparable sound qualities of a classically trained female voice and a belting female voice concerning the concentration of overtones in F1 to F5, and of a lack of sound quality in the speaking voice for singing purposes. The results also show that classical singing and belting are both valuable vocal techniques for singing due to their richness of overtones and that belting is not comparable to shouting or screaming. Singing with a speaking voice, in contrast, should not be called singing because of the lack of overtones, which means by definition that there is no musical tone.

Keywords: formants, overtone, singer’s formant, singing voice, belting, classical singing, singing with the speaking voice

Procedia PDF Downloads 328
9931 SLIITBOT: Design of a Socially Assistive Robot for SLIIT

Authors: Chandimal Jayawardena, Ridmal Mendis, Manoji Tennakoon, Theekshana Wijayathilaka, Randima Marasinghe

Abstract:

This research paper defines the research area of the implementation of the socially assistive robot (SLIITBOT). It consists of the overall process implemented within the robot’s system and limitations, along with a literature survey. This project considers developing a socially assistive robot called SLIITBOT that will interact using its voice outputs and graphical user interface with people within the university and benefit them with updates and tasks. The robot will be able to detect a person when he/she enters the room, navigate towards the position the human is standing, welcome and greet the particular person with a simple conversation using its voice, introduce the services through its voice, and provide the person with services through an electronic input via an app while guiding the person with voice outputs.

Keywords: application, detection, dialogue, navigation

Procedia PDF Downloads 169
9930 The Voice Rehabilitation Program Following Ileocolon Flap Transfer for Voice Reconstruction after Laryngectomy

Authors: Chi-Wen Huang, Hung-Chi Chen

Abstract:

Total laryngectomy affects swallowing, speech function, and quality of life in patients with head and neck cancer. Voice restoration plays an important role in social activities and communication. Several techniques have been developed for voice restoration and reported to improve quality of life. However, the rehabilitation program for voice reconstruction using the ileocolon flap is still unclear. A retrospective study was done, with data drawn from the medical records of patients who underwent voice reconstruction by ileocolon flap after laryngectomy between 2010 and 2016. All of them were trained to swallow first; then, voice rehabilitation was started. The voice outcome was evaluated after 6 months using a 4-point scoring scale. In our results, 9.8% of patients had a very clear voice that everyone could understand, 61% could be understood well by family and friends, 20.2% could only talk with family, and 9% had difficulty being understood. Moreover, 57% of patients did not need a second surgery, but in 43% of patients the voice was made clear by a second surgery. In this study, we demonstrated that the rehabilitation program after voice reconstruction with ileocolon flap for post-laryngectomy patients is important because the anatomical structure is different from the normal larynx.

Keywords: post-laryngectomy, ileocolon flap, rehabilitation, voice reconstruction

Procedia PDF Downloads 156
9929 The Effect of the Hemispheres of the Brain and the Tone of Voice on Persuasion

Authors: Rica Jell de Laza, Jose Alberto Fernandez, Andrea Marie Mendoza, Qristin Jeuel Regalado

Abstract:

This study investigates whether participants experience different levels of persuasion depending on the hemisphere of the brain and the tone of voice. The experiment was performed on 96 volunteer undergraduate students taking an introductory course in psychology. The participants took part in a 2 x 3 (Hemisphere: left, right x Tone of Voice: positive, neutral, negative) Mixed Factorial Design to measure how much a person was persuaded. Results showed that the hemisphere of the brain and the tone of voice used did not significantly affect the results individually. Furthermore, there was no interaction effect. Therefore, the hemispheres of the brain and the tone of voice employed play insignificant roles in persuading a person.

Keywords: dichotic listening, brain hemisphere, tone of voice, persuasion

Procedia PDF Downloads 306
9928 Experimental Study on the Heat Transfer Characteristics of the 200W Class Woofer Speaker

Authors: Hyung-Jin Kim, Dae-Wan Kim, Moo-Yeon Lee

Abstract:

The objective of this study is to experimentally investigate the heat transfer characteristics of 200 W class woofer speaker units with input voice signals. The temperature and heat transfer characteristics of the 200 W class woofer speaker unit were experimentally tested with several input voice signals: 1500 Hz, 2500 Hz, and 5000 Hz. From the experiments, it can be observed that the temperature of the woofer speaker unit, including the voice-coil part, increases as the frequency of the input voice signal decreases. The temperature difference between the measured points of the voice coil also increases as the input voice signal frequency decreases. In addition, the heat transfer characteristics of the woofer speaker with an input voice signal of 1500 Hz are 40% higher than those with an input voice signal of 5000 Hz at the measuring time of 200 seconds. It can be concluded from the experiments that initially the temperature of the voice signal increases rapidly with time, and after a certain period of time it increases exponentially. During this time-dependent temperature change, it can also be observed that a high-frequency voice signal is more stable than a low-frequency voice signal.

Keywords: heat transfer, temperature, voice coil, woofer speaker

Procedia PDF Downloads 360
9927 Advanced Mouse Cursor Control and Speech Recognition Module

Authors: Prasad Kalagura, B. Veeresh kumar

Abstract:

We constructed an interface system that would allow a paralyzed user to interact with a computer with almost full functional capability. A real-time tracking algorithm is implemented based on adaptive skin detection and motion analysis. The clicking of the mouse is activated by the user's eye blinking, detected through a sensor. The keyboard function is implemented by a voice recognition kit.

Keywords: embedded ARM7 processor, mouse pointer control, voice recognition

Procedia PDF Downloads 578
9926 Blind Speech Separation Using SRP-PHAT Localization and Optimal Beamformer in Two-Speaker Environments

Authors: Hai Quang Hong Dam, Hai Ho, Minh Hoang Le Ngo

Abstract:

This paper investigates the problem of blind speech separation from the speech mixture of two speakers. A voice activity detector employing the Steered Response Power - Phase Transform (SRP-PHAT) is presented for detecting the activity information of speech sources, and the desired speech signals are then extracted from the speech mixture by using an optimal beamformer. To evaluate the algorithm's effectiveness, a simulation using real speech recordings was performed in a double-talk situation where two speakers are active all the time. Evaluations show that the proposed blind speech separation algorithm offers a good interference suppression level whilst maintaining a low distortion level of the desired signal.

Keywords: blind speech separation, voice activity detector, SRP-PHAT, optimal beamformer
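
Illustrative note: SRP-PHAT builds on the phase-transform-weighted cross-correlation (GCC-PHAT) between microphone pairs, summed over candidate source positions. The sketch below shows only that pairwise building block for two channels. It is a generic sketch, not the authors' detector or beamformer, and the function name and parameters are assumptions.

```python
import numpy as np

def gcc_phat_delay(x, y, fs, max_delay=None):
    """Estimate the delay in seconds of y relative to x with GCC-PHAT."""
    n = len(x) + len(y)                          # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12               # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2 if max_delay is None else max(1, min(int(fs * max_delay), n // 2))
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (int(np.argmax(np.abs(cc))) - max_shift) / fs
```

A positive result means the signal reaches the second channel later than the first; with known microphone spacing, that delay maps to a direction of arrival.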

Procedia PDF Downloads 283
9925 Detection of Autistic Children's Voice Based on Artificial Neural Network

Authors: Royan Dawud Aldian, Endah Purwanti, Soegianto Soelistiono

Abstract:

In this research, we have developed an automatic method to classify children's voices as normal or autistic using modern computation technology based on artificial neural networks. The strength of this technology is its capability for processing and storing data. Digital voice features are obtained from the coefficients of linear predictive coding with the autocorrelation method and transformed into the frequency domain using the fast Fourier transform. These features are used as input to an artificial neural network trained with the back-propagation method, so that normal and autistic children can be distinguished automatically. The results of the back-propagation method show that the successful classification rate is 100% for the normal children's voice experiment data and 100% for the autistic children's voice experiment data. The success rate of the back-propagation classification system for the entire test data is 100%.

Keywords: autism, artificial neural network, backpropagation, linear predictive coding, fast Fourier transform
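
Illustrative note: linear predictive coding with the autocorrelation method is commonly solved with the Levinson-Durbin recursion. The sketch below shows that standard recursion. It is not the authors' feature pipeline; the function name is an assumption, and the sign convention is x[n] + a[1] x[n-1] + ... + a[p] x[n-p] = e[n].

```python
import numpy as np

def lpc_autocorrelation(frame, order):
    """Return (a, error): LPC coefficients a[0..order] with a[0] = 1 and the final
    prediction error energy, using the autocorrelation method and the
    Levinson-Durbin recursion."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / error                          # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]       # update lower-order coefficients
        a[i] = k
        error *= 1.0 - k * k
    return a, error
```

In practice the frame is usually windowed first (for example with a Hamming window), and the spectral envelope can then be obtained by evaluating the resulting filter on an FFT grid.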

Procedia PDF Downloads 461
9924 The Functions of the Student Voice and Student-Centred Teaching Practices in Classroom-Based Music Education

Authors: Sofia Douklia

Abstract:

The present context paper aims to present the important role of ‘student voice’ and the music teacher in the classroom, which contributes to more student-centered music education. The aim is to focus on the functions of the student voice through the music spectrum, which has been born in the music classroom, and the teacher’s methodologies and techniques used in the music classroom. The music curriculum, the principles of student-centered music education, and the role of students and teachers as music ambassadors have been considered the major music parameters of student voice. The student voice is a noteworthy aspect of student-centered education, and all teachers should consider and promote its existence in their classroom.

Keywords: student's voice, student-centered education, music ambassadors, music teachers

Procedia PDF Downloads 91
9923 Voice over IP Quality of Service Evaluation for Mobile Ad Hoc Network in an Indoor Environment for Different Voice Codecs

Authors: Lina Abou Haibeh, Nadir Hakem, Ousama Abu Safia

Abstract:

In this paper, the performance and quality of Voice over IP (VoIP) calls carried over a Mobile Ad Hoc Network (MANET) with a number of SIP nodes registered on a SIP proxy are analyzed. The testing campaigns are carried out in an indoor corridor structure with well-defined channel characteristics and a well-defined channel model for the different voice codecs, G.711, G.727 and G.723.1. These voice codecs are commonly used in VoIP technology. The call quality is evaluated using four Quality of Service (QoS) metrics, namely, mean opinion score (MOS), jitter, delay, and packet loss. The relationship between the wireless channel’s parameters and the optimum codec is well-established. According to the experimental results, the voice codec G.711 has the best performance for the proposed MANET topology.

Keywords: wireless channel modelling, VoIP, MANET, session initiation protocol (SIP), QoS
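
Illustrative note: the QoS metrics named in the abstract have widely used definitions, sketched below. The jitter estimator follows the running formula of RFC 3550, and the MOS mapping is the R-factor conversion from the ITU-T G.107 E-model. How the paper itself measured these quantities is not stated here, so the function names and the choice of formulas are assumptions.

```python
def interarrival_jitter(send_times, recv_times):
    """Running interarrival jitter estimate as in RFC 3550: J += (|D| - J) / 16,
    where D is the change in transit time between consecutive packets."""
    jitter = 0.0
    for i in range(1, len(send_times)):
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter

def packet_loss_ratio(expected, received):
    """Fraction of expected packets that never arrived."""
    return (expected - received) / expected if expected else 0.0

def mos_from_r_factor(r):
    """Map an E-model R-factor to an estimated MOS (ITU-T G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)
```

For example, with timestamps in milliseconds, a stream whose third packet arrives 5 ms later than its spacing implies yields a jitter estimate of 5/16 ms after that packet.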

Procedia PDF Downloads 228
9922 On Voice in English: An Awareness Raising Attempt on Passive Voice

Authors: Meral Melek Unver

Abstract:

This paper aims to explore ways to help English as a Foreign Language (EFL) learners notice and revise voice in English and raise their awareness of when and how to use active and passive voice to convey meaning in their written and spoken work. Because passive voice is commonly preferred in certain genres such as academic essays and news reports, despite the current trends promoting active voice, it is essential for learners to be fully aware of the meaning, use and form of passive voice to better communicate. The participants in the study are 22 EFL learners taking a one-year intensive English course at a university, who will receive English medium education (EMI) in their departmental studies in the following academic year. Data from students’ written and oral work was collected over a four-week period, and the misuse or inaccurate use of passive voice was identified. The analysis of the data showed that they failed to make sensible decisions about when and how to use passive voice, partly because of the differences between their mother tongue and English and partly because they were not aware that active and passive voice do not alternate all the time. To overcome this, a Test-Teach-Test shape lesson, as opposed to a Present-Practice-Produce shape lesson, was designed and implemented to raise their awareness of the decisions they needed to make in choosing the voice and to help them notice the meaning and use of passive voice through concept checking questions. The results first suggested that awareness raising activities on the meaning and use of voice in English would be beneficial in achieving accurate and meaningful outcomes from students. Also, helping students notice and renotice passive voice through carefully designed activities would help them internalize its use and form. As a result of the study, a number of activities are suggested to revise and notice passive voice, as well as a short questionnaire to help EFL teachers self-reflect on their teaching.

Keywords: voice in English, test-teach-test, passive voice, English language teaching

Procedia PDF Downloads 221
9921 Phone Number Spoofing Attack in VoLTE 4G

Authors: Joo-Hyung Oh

Abstract:

The number of service users of 4G VoLTE (voice over LTE) using LTE data networks is rapidly growing. VoLTE based on all-IP network enables clearer and higher-quality voice calls than 3G. It does, however, pose new challenges; a voice call through IP networks makes it vulnerable to security threats such as wiretapping and forged or falsified information. And in particular, stealing other users’ phone numbers and forging or falsifying call request messages from outgoing voice calls within VoLTE result in considerable losses that include user billing and voice phishing to acquaintances. This paper focuses on the threats of caller phone number spoofing in the VoLTE and countermeasure technology as safety measures for mobile communication networks.

Keywords: LTE, 4G, VoLTE, phone number spoofing

Procedia PDF Downloads 432
9920 Speaker Recognition Using LIRA Neural Networks

Authors: Nestor A. Garcia Fragoso, Tetyana Baydyk, Ernst Kussul

Abstract:

This article contains information from our investigation in the field of voice recognition. For this purpose, we created a voice database that contains different phrases in two languages, English and Spanish, for men and women. As a classifier, the LIRA (Limited Receptive Area) grayscale neural classifier was selected. The LIRA grayscale neural classifier was developed for image recognition tasks and demonstrated good results. Therefore, we decided to develop a recognition system using this classifier for voice recognition. From a specific set of speakers, we can recognize the speaker’s voice. For this purpose, the system uses spectrograms of the voice signals as input, extracts the characteristics, and identifies the speaker. The results are described and analyzed in this article. The classifier can be used for speaker identification in security systems or smart buildings for different types of intelligent devices.

Keywords: extreme learning, LIRA neural classifier, speaker identification, voice recognition

Procedia PDF Downloads 177
9919 Voice Signal Processing and Coding in MATLAB Generating a Plasma Signal in a Tesla Coil for a Security System

Authors: Juan Jimenez, Erika Yambay, Dayana Pilco, Brayan Parra

Abstract:

This paper presents an investigation of voice signal processing and coding using MATLAB, with the objective of generating a plasma signal on a Tesla coil within a security system. The approach focuses on using advanced voice signal processing techniques to encode and modulate the audio signal, which is then amplified and applied to a Tesla coil. The result is the creation of a striking visual effect of voice-controlled plasma with specific applications in security systems. The article explores the technical aspects of voice signal processing, the generation of the plasma signal, and its relationship to security. The implications and creative potential of this technology are discussed, highlighting its relevance at the forefront of research in signal processing and visual effect generation in the field of security systems.

Keywords: voice signal processing, voice signal coding, MATLAB, plasma signal, Tesla coil, security system, visual effects, audiovisual interaction

Procedia PDF Downloads 93
9918 Phone Number Spoofing Attack in VoLTE

Authors: Joo-Hyung Oh, Sekwon Kim, Myoungsun Noh, Chaetae Im

Abstract:

The number of service users of 4G VoLTE (voice over LTE) using LTE data networks is rapidly growing. VoLTE based on All-IP network enables clearer and higher-quality voice calls than 3G. It does, however, pose new challenges; a voice call through IP networks makes it vulnerable to security threats such as wiretapping and forged or falsified information. Moreover, in particular, stealing other users’ phone numbers and forging or falsifying call request messages from outgoing voice calls within VoLTE result in considerable losses that include user billing and voice phishing to acquaintances. This paper focuses on the threats of caller phone number spoofing in the VoLTE and countermeasure technology as safety measures for mobile communication networks.

Keywords: LTE, 4G, VoLTE, phone number spoofing

Procedia PDF Downloads 522
9917 Integrated Gesture and Voice-Activated Mouse Control System

Authors: Dev Pratap Singh, Harshika Hasija, Ashwini S.

Abstract:

The project aims to provide a touchless, intuitive interface for human-computer interaction, enabling users to control their computers using hand gestures and voice commands. The system leverages advanced computer vision techniques using the MediaPipe framework and OpenCV to detect and interpret real-time hand gestures, transforming them into mouse actions such as clicking, dragging, and scrolling. Additionally, the integration of a voice assistant powered by the speech recognition library allows for seamless execution of tasks like web searches, location navigation, and gesture control in the system through voice commands.

Keywords: gesture recognition, hand tracking, machine learning, convolutional neural networks, natural language processing, voice assistant

Procedia PDF Downloads 10
9916 Attack Redirection and Detection using Honeypots

Authors: Chowduru Ramachandra Sharma, Shatunjay Rawat

Abstract:

A false positive state is when the IDS/IPS identifies an activity as an attack, but the activity is acceptable behavior in the system. False positives in a Network Intrusion Detection System (NIDS) are an issue because they desensitize the administrator. They waste computational power and valuable resources when rules are not tuned properly, which is the main issue with anomaly NIDS. Furthermore, most false positive reduction techniques are not performed in real time during attempted intrusions; instead, they are applied afterward to collected traffic data and generated alerts. Of course, false positive detection in ‘offline mode’ is tremendously valuable. Nevertheless, there is room for improvement here; automated techniques are still needed to reduce false positives in real time. This paper uses the Snort signature detection model to redirect the alerted attacks to honeypots and verify attacks.

Keywords: honeypot, TPOT, snort, NIDS, honeybird, iptables, netfilter, redirection, attack detection, docker, snare, tanner

Procedia PDF Downloads 156
9915 HRV Analysis Based Arrhythmic Beat Detection Using kNN Classifier

Authors: Onder Yakut, Oguzhan Timus, Emine Dogru Bolat

Abstract:

Health diseases are of vital significance, affecting human life and quality of life. Sudden death events can be prevented owing to early diagnosis and treatment methods. The electrical signal taken from the human body using non-invasive methods and showing the heart activity is called the electrocardiogram (ECG). The ECG signal is used by clinicians for following the daily activity of the heart. Heart Rate Variability (HRV) is a physiological parameter giving the variation between heartbeats. ECG data taken from the MIT-BIH Arrhythmia Database is used in the model employed in this study. The study aims to detect arrhythmic heartbeats using features extracted from the HRV time-domain parameters. The developed model provides a satisfactory performance with ~89% accuracy, 91.7% sensitivity, and 85% specificity for the detection of arrhythmic beats.

Keywords: arrhythmic beat detection, ECG, HRV, kNN classifier
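
Illustrative note: commonly used HRV time-domain parameters and the reported sensitivity and specificity can be computed as sketched below. Which parameters the authors actually extracted is not stated in the abstract, so the feature set (mean RR, SDNN, RMSSD, pNN50) and the function names are assumptions. The kNN classification step is not shown.

```python
import numpy as np

def hrv_time_features(rr_ms):
    """Common HRV time-domain features from a sequence of RR intervals in milliseconds."""
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)                                         # successive RR differences
    return {
        "mean_rr": float(rr.mean()),
        "sdnn": float(rr.std(ddof=1)),                         # standard deviation of RR intervals
        "rmssd": float(np.sqrt(np.mean(diff ** 2))),           # RMS of successive differences
        "pnn50": float(100.0 * np.mean(np.abs(diff) > 50.0)),  # % of differences above 50 ms
    }

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP) for labels 1 (arrhythmic) and 0 (normal)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return float(tp / (tp + fn)), float(tn / (tn + fp))
```

Feature vectors of this kind, computed over a window of beats, would then be passed to a classifier such as kNN.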

Procedia PDF Downloads 352
9914 Detection of Telomerase Activity as Cancer Biomarker Using Nanogap-Rich Au Nanowire SERS Sensor

Authors: G. Eom, H. Kim, A. Hwang, T. Kang, B. Kim

Abstract:

Telomerase activity is overexpressed in over 85% of human cancers while suppressed in normal somatic cells. Telomerase has therefore attracted attention as a universal cancer biomarker, and the development of effective telomerase activity detection methods is urgently needed in cancer diagnosis and therapy. Herein, we report a nanogap-rich Au nanowire (NW) surface-enhanced Raman scattering (SERS) sensor for detection of human telomerase activity. The nanogap-rich Au NW SERS sensors were prepared simply by uniformly depositing nanoparticles (NPs) on single-crystalline Au NWs. We measured SERS spectra of methylene blue (MB) from 60 different nanogap-rich Au NWs and obtained a relative standard deviation (RSD) of 4.80%, confirming the superb reproducibility of nanogap-rich Au NW SERS sensors. The nanogap-rich Au NW SERS sensors enable us to detect telomerase activity in 0.2 cancer cells/mL. Furthermore, telomerase activity is detectable in 7 different cancer cell lines whereas it is undetectable in normal cell lines, which suggests the potential applicability of the nanogap-rich Au NW SERS sensor in cancer diagnosis. We expect that the present nanogap-rich Au NW SERS sensor can be useful in biomedical applications, including diverse biomarker sensing.

Keywords: cancer biomarker, nanowires, surface-enhanced Raman scattering, telomerase

Procedia PDF Downloads 349
9913 Interaction between Breathiness and Nasality: An Acoustic Analysis

Authors: Pamir Gogoi, Ratree Wayland

Abstract:

This study investigates the acoustic measures of breathiness when coarticulated with nasality. The acoustic correlates of breathiness and nasality have already been well established after years of empirical research. Some of these acoustic parameters, such as low-frequency peaks and wider bandwidths, are common to both nasal and breathy voice. Therefore, it is likely that these parameters interact when a sound is coarticulated with breathiness and nasality. This leads to the hypothesis that the acoustic parameters which usually act as robust cues in differentiating between breathy and modal voice might not be reliable cues when breathiness is coarticulated with nasality. The effect of nasality on the perception of breathiness has been explored in earlier studies using synthesized speech. The results showed that, perceptually, nasality and breathiness do interact. The current study investigates whether a similar pattern is observed in natural speech. The study is conducted on Marathi, an Indo-Aryan language which has a three-way contrast between nasality and breathiness. That is, there is a phonemic distinction between nasals, breathy voice and breathy-nasals. Voice quality parameters including H1-H2 (difference between the amplitudes of the first and second harmonics), H1-A3 (difference between the amplitudes of the first harmonic and the third formant), CPP (Cepstral Peak Prominence), HNR (Harmonics-to-Noise Ratio) and B1 (bandwidth of the first formant) were extracted. Statistical models such as linear mixed effects regression and Random Forest classifiers show that measures that capture the noise component in the signal, such as CPP and HNR, can distinguish breathy voice from modal voice better than spectral measures when breathy voice is coarticulated with nasality.

Keywords: breathiness, marathi, nasality, voice quality

Procedia PDF Downloads 95
9912 Assessing the Preparedness of Teachers for Their Role in an Inclusive Classroom: Photo-Voice as a Reflexive Tool

Authors: Nan Stevens

Abstract:

Photo-voice is a participatory method through which participants identify and represent their lived experiences and contexts through the use of photo imagery. Photo-voice is a qualitative research method that explores individuals’ lived experiences. This method is known as a creative art form to help researchers listen to the 'voice' of a certain population. A teacher educator at Thompson Rivers University, responsible for preparing new teachers for the demands of the profession in an ever-changing demographic, utilized the Photo-voice method to enable a self-study of emerging teachers’ readiness for the inclusive classroom. Coding analysis was applied to 96 Photo-voice portfolios, which were created over two years with the Inclusive Education course work, in a Bachelor of Education program (Elementary). Coding utilized students’ written associations to their visual images, anecdotes attached to visual metaphors, and personal narratives that illustrated the professional development process in which they were engaged. Thematic findings include: 1) becoming an inclusive educator is a process; 2) one must be open to identifying and exploring their fear and biases, and 3) an attitudinal shift enables relevant skill acquisition and readiness for working with diverse student needs.

Keywords: teacher education, inclusive education, professional development, Photo-voice

Procedia PDF Downloads 135
9911 Voice Quality in Italian-Speaking Children with Autism

Authors: Patrizia Bonaventura, Magda Di Renzo

Abstract:

This project aims to measure and assess the voice quality in children with autism. Few previous studies have analyzed the voice quality of individuals with autism; those that exist found abnormal voice characteristics such as a high pitch, a wide pitch range, and a sing-song quality. Existing studies did not focus specifically on Italian-speaking children’s voices and analyzed only a few acoustic parameters. The present study aimed to gather more data and to perform acoustic analysis of the voice of children with autism in order to identify patterns of abnormal voice features that might shed some light on the causes of the dysphonia and possibly be used to create a pediatric assessment tool for early identification of autism. The participants were five native Italian-speaking boys with autism between the ages of 4 and 10 years (mean 6.8 ± SD 1.4). The children had a diagnosis of autism, were verbal, and had no other comorbid conditions (like Down syndrome or ADHD). The voices of the autistic children were recorded in the production of sustained vowels [ah] and [ih] and of sentences from the Italian version of the CAPE-V voice assessment test. The following voice parameters, representative of normal quality, were analyzed by acoustic spectrography through Praat: Speaking Fundamental Frequency, F0 range, average intensity, and dynamic range. The results showed that the pitch parameters (Speaking Fundamental Frequency and F0 range), as well as the intensity parameters (average intensity and dynamic range), were significantly different from the respective normal reference thresholds. Variability among children was also found, confirming a tendency revealed in previous studies of individual variation in these aspects of voice quality. The results indicate a general pattern of abnormal voice quality characterized by a high pitch and large variations in pitch and intensity. These acoustic voice characteristics found in Italian-speaking autistic children match those found in children speaking other languages, indicating that autism symptoms affecting voice quality might be independent of the native language of the children.

Keywords: autism, voice disorders, speech science, acoustic analysis of voice

Procedia PDF Downloads 71
9910 Comparing Emotion Recognition from Voice and Facial Data Using Time Invariant Features

Authors: Vesna Kirandziska, Nevena Ackovska, Ana Madevska Bogdanova

Abstract:

Emotion recognition is a challenging problem that remains open from the perspective of both intelligent systems and psychology. In this paper, both voice features and facial features are used for building an emotion recognition system. Support Vector Machine classifiers are built using raw data from video recordings. The results obtained for emotion recognition are given, and a discussion about the validity and the expressiveness of different emotions is presented. A comparison is made between the classifiers built from facial data only, from voice data only, and from the combination of both. The need for a better combination of the information from facial expression and voice data is argued.

Keywords: emotion recognition, facial recognition, signal processing, machine learning

Procedia PDF Downloads 316
9909 EEG Diagnosis Based on Phase Space with Wavelet Transforms for Epilepsy Detection

Authors: Mohmmad A. Obeidat, Amjed Al Fahoum, Ayman M. Mansour

Abstract:

The recognition of abnormal activity in brain functionality is a vital issue. To determine the type of abnormal activity, either a brain image or a brain signal is usually considered. Imaging localizes the defect within the brain area and relates this area to some body functionalities. However, some functions may be disturbed without affecting the brain, as in epilepsy. In this case, imaging may not reveal the symptoms of the problem. A cheaper yet efficient approach that can be utilized to detect abnormal activity is the measurement and analysis of electroencephalogram (EEG) signals. The main goal of this work is to come up with a new method to facilitate the classification of abnormal and disordered activities within the brain directly using EEG signal processing, which makes it possible to apply the method in an on-line monitoring system.

Keywords: EEG, wavelet, epilepsy, detection

Procedia PDF Downloads 538
9908 Prophylactic Replacement of Voice Prosthesis: A Study to Predict Prosthesis Lifetime

Authors: Anne Heirman, Vincent van der Noort, Rob van Son, Marije Petersen, Lisette van der Molen, Gyorgy Halmos, Richard Dirven, Michiel van den Brekel

Abstract:

Objective: Voice prosthesis leakage significantly impacts laryngectomized patients' quality of life, causing insecurity, frequent unplanned hospital visits, and costs. In this study, the concept of prophylactic voice prosthesis replacement was explored to prevent leakages. Study Design: A retrospective cohort study. Setting: Tertiary hospital. Methods: Device lifetimes and voice prosthesis replacements of a retrospective cohort, including all patients with laryngectomies between 2000 and 2012 in the Netherlands Cancer Institute, were used to calculate the number of needed voice prostheses per patient per year when preventing 70% of the leakages by prophylactic replacement. Various strategies for the timing of prophylactic replacement were considered: adaptive strategies based on the individual patient’s history of replacement and fixed strategies based on the results of patients with similar voice prosthesis or treatment characteristics. Results: Patients used a median of 3.4 voice prostheses per year (range 0.1-48.1). We found a high inter- and intrapatient variability in device lifetime. When applying prophylactic replacement, this would become a median of 9.4 voice prostheses per year, which means replacement every 38 days, implying more than six additional voice prostheses per patient per year. The individual adaptive model showed that preventing 70% of the leakages was impossible for most patients, and only a median of 25% can be prevented. Monte-Carlo simulations showed that prophylactic replacement is not feasible due to the high Coefficient of Variation (Standard Deviation/Mean) in device lifetime. Conclusion: Based on our simulations, prophylactic replacement of voice prostheses is not feasible due to high inter- and intrapatient variation in device lifetime.
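
The arithmetic behind the reported replacement interval and the Coefficient of Variation can be sketched as follows. This is an illustration only, not the authors' simulation code: the function names are hypothetical, a 365-day year is assumed, the sample standard deviation is an assumed choice, and the example lifetimes are made up purely to show the formula.

```python
import statistics

def replacement_interval_days(prostheses_per_year, days_per_year=365.0):
    """Average number of days between replacements at a given yearly usage."""
    return days_per_year / prostheses_per_year

def coefficient_of_variation(lifetimes_days):
    """Standard Deviation / Mean of device lifetimes (sample SD assumed)."""
    return statistics.stdev(lifetimes_days) / statistics.mean(lifetimes_days)

# 9.4 prostheses per year corresponds to one replacement roughly every 38 days.
print(int(replacement_interval_days(9.4)))  # 38

# Made-up lifetimes (in days), used only to demonstrate the ratio.
print(coefficient_of_variation([10, 20, 30]))  # 0.5
```

A high coefficient of variation means lifetimes are widely spread relative to their mean, so a fixed replacement date tends to be either too early for long-lived devices or too late for short-lived ones.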

Keywords: voice prosthesis, voice rehabilitation, total laryngectomy, prosthetic leakage, device lifetime

Procedia PDF Downloads 129
9907 An Intelligent Text Independent Speaker Identification Using VQ-GMM Model Based Multiple Classifier System

Authors: Ben Soltane Cheima, Ittansa Yonas Kelbesa

Abstract:

Speaker Identification (SI) is the task of establishing the identity of an individual based on his/her voice characteristics. The SI task is typically achieved by two-stage signal processing: training and testing. The training process calculates speaker-specific feature parameters from the speech and generates speaker models accordingly. In the testing phase, speech samples from unknown speakers are compared with the models and classified. Even though the performance of speaker identification systems has improved due to recent advances in speech processing techniques, there is still a need for improvement. In this paper, a Closed-Set Text-Independent Speaker Identification System (CISI) based on a Multiple Classifier System (MCS) is proposed, using Mel Frequency Cepstrum Coefficients (MFCC) for feature extraction and a suitable combination of vector quantization (VQ) and Gaussian Mixture Model (GMM) together with the Expectation Maximization (EM) algorithm for speaker modeling. The use of a Voice Activity Detector (VAD) with a hybrid approach based on Short Time Energy (STE) and Statistical Modeling of Background Noise in the pre-processing step of the feature extraction yields a better and more robust automatic speaker identification system. In addition, using the Linde-Buzo-Gray (LBG) clustering algorithm to initialize the GMM before estimating the underlying parameters in the EM step improved the convergence rate and the system's performance. The system also uses a relative index as a confidence measure when the GMM and VQ identification results contradict each other. Simulation results carried out on the voxforge.org speech database using MATLAB highlight the efficacy of the proposed method compared to earlier work.
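
The idea of combining Short Time Energy with a statistical model of the background noise can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' MATLAB implementation: the first few frames are assumed to contain only noise, the noise is summarized by the mean and standard deviation of its frame energies, and the names `short_time_energy`, `energy_vad`, `noise_frames` and `k` are hypothetical.

```python
import statistics

def short_time_energy(samples, frame_len, hop):
    """Energy (sum of squared samples) of each frame of the signal."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies

def energy_vad(samples, frame_len=160, hop=80, noise_frames=5, k=3.0):
    """Flag a frame as speech when its energy exceeds mean + k * std of the
    energies of the leading frames, which are assumed to be noise only."""
    energies = short_time_energy(samples, frame_len, hop)
    noise = energies[:noise_frames]
    threshold = statistics.mean(noise) + k * statistics.pstdev(noise)
    return [e > threshold for e in energies]

# Toy signal (assumed): a quiet stretch followed by a loud stretch.
quiet = [0.01 if i % 2 == 0 else -0.01 for i in range(1600)]
loud = [1.0 if i % 2 == 0 else -1.0 for i in range(800)]
flags = energy_vad(quiet + loud)
print(flags.count(True), "of", len(flags), "frames flagged as speech")
```

The multiplier `k` trades missed speech against false alarms; a practical detector would typically also update the noise statistics over time and smooth the frame decisions.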

Keywords: feature extraction, speaker modeling, feature matching, Mel frequency cepstrum coefficient (MFCC), Gaussian mixture model (GMM), vector quantization (VQ), Linde-Buzo-Gray (LBG), expectation maximization (EM), pre-processing, voice activity detection (VAD), short time energy (STE), background noise statistical modeling, closed-set text-independent speaker identification system (CISI)

Procedia PDF Downloads 309