Search results for: collecting speech emotion dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2809

Search results for: collecting speech emotion dataset

2629 Affective Robots: Evaluation of Automatic Emotion Recognition Approaches on a Humanoid Robot towards Emotionally Intelligent Machines

Authors: Silvia Santano Guillén, Luigi Lo Iacono, Christian Meder

Abstract:

One of the main aims of current social robotic research is to improve the robots’ abilities to interact with humans. In order to achieve an interaction similar to that among humans, robots should be able to communicate in an intuitive and natural way and appropriately interpret human affects during social interactions. Similarly to how humans are able to recognize emotions in other humans, machines are capable of extracting information from the various ways humans convey emotions—including facial expression, speech, gesture or text—and using this information for improved human computer interaction. This can be described as Affective Computing, an interdisciplinary field that expands into otherwise unrelated fields like psychology and cognitive science and involves the research and development of systems that can recognize and interpret human affects. To leverage these emotional capabilities by embedding them in humanoid robots is the foundation of the concept Affective Robots, which has the objective of making robots capable of sensing the user’s current mood and personality traits and adapt their behavior in the most appropriate manner based on that. In this paper, the emotion recognition capabilities of the humanoid robot Pepper are experimentally explored, based on the facial expressions for the so-called basic emotions, as well as how it performs in contrast to other state-of-the-art approaches with both expression databases compiled in academic environments and real subjects showing posed expressions as well as spontaneous emotional reactions. The experiments’ results show that the detection accuracy amongst the evaluated approaches differs substantially. The introduced experiments offer a general structure and approach for conducting such experimental evaluations. The paper further suggests that the most meaningful results are obtained by conducting experiments with real subjects expressing the emotions as spontaneous reactions.

Keywords: affective computing, emotion recognition, humanoid robot, human-robot-interaction (HRI), social robots

Procedia PDF Downloads 209
2628 Intrinsic Motivational Factor of Students in Learning Mathematics and Science Based on Electroencephalogram Signals

Authors: Norzaliza Md. Nor, Sh-Hussain Salleh, Mahyar Hamedi, Hadrina Hussain, Wahab Abdul Rahman

Abstract:

Motivational factor is mainly the students’ desire to involve in learning process. However, it also depends on the goal towards their involvement or non-involvement in academic activity. Even though, the students’ motivation might be in the same level, but the basis of their motivation may differ. In this study, it focuses on the intrinsic motivational factor which student enjoy learning or feeling of accomplishment the activity or study for its own sake. The intrinsic motivational factor of students in learning mathematics and science has found as difficult to be achieved because it depends on students’ interest. In the Program for International Student Assessment (PISA) for mathematics and science, Malaysia is ranked as third lowest. The main problem in Malaysian educational system, students tend to have extrinsic motivation which they have to score in exam in order to achieve a good result and enrolled as university students. The use of electroencephalogram (EEG) signals has found to be scarce especially to identify the students’ intrinsic motivational factor in learning science and mathematics. In this research study, we are identifying the correlation between precursor emotion and its dynamic emotion to verify the intrinsic motivational factor of students in learning mathematics and science. The 2-D Affective Space Model (ASM) was used in this research in order to identify the relationship of precursor emotion and its dynamic emotion based on the four basic emotions, happy, calm, fear and sad. These four basic emotions are required to be used as reference stimuli. Then, in order to capture the brain waves, EEG device was used, while Mel Frequency Cepstral Coefficient (MFCC) was adopted to be used for extracting the features before it will be feed to Multilayer Perceptron (MLP) to classify the valence and arousal axes for the ASM. The results show that the precursor emotion had an influence the dynamic emotions and it identifies that most students have no interest in mathematics and science according to the negative emotion (sad and fear) appear in the EEG signals. We hope that these results can help us further relate the behavior and intrinsic motivational factor of students towards learning of mathematics and science.

Keywords: EEG, MLP, MFCC, intrinsic motivational factor

Procedia PDF Downloads 342
2627 A Cross-Dialect Statistical Analysis of Final Declarative Intonation in Tuvinian

Authors: D. Beziakina, E. Bulgakova

Abstract:

This study continues the research on Tuvinian intonation and presents a general cross-dialect analysis of intonation of Tuvinian declarative utterances, specifically the character of the tone movement in order to test the hypothesis about the prevalence of level tone in some Tuvinian dialects. The results of the analysis of basic pitch characteristics of Tuvinian speech (in general and in comparison with two other Turkic languages - Uzbek and Azerbaijani) are also given in this paper. The goal of our work was to obtain the ranges of pitch parameter values typical for Tuvinian speech. Such language-specific values can be used in speaker identification systems in order to get more accurate results of ethnic speech analysis. We also present the results of a cross-dialect analysis of declarative intonation in the poorly studied Tuvinian language.

Keywords: speech analysis, statistical analysis, speaker recognition, identification of person

Procedia PDF Downloads 444
2626 A Profile of the Patients at the Hearing and Speech Clinic at the University of Jordan: A Retrospective Study

Authors: Maisa Haj-Tas, Jehad Alaraifi

Abstract:

The significance of the study: This retrospective study examined the speech and language profiles of patients who received clinical services at the University of Jordan Hearing and Speech Clinic (UJ-HSC) from 2009 to 2014. The UJ-HSC clinic is located in the capital Amman and was established in the late 1990s. It is the first hearing and speech clinic in Jordan and one of first speech and hearing clinics in the Middle East. This clinic provides services to an annual average of 2000 patients who are diagnosed with different communication disorders. Examining the speech and language profiles of patients in this clinic could provide an insight about the most common disorders seen in patients who attend similar clinics in Jordan. It could also provide information about community awareness of the role of speech therapists in the management of speech and language disorders. Methodology: The researchers examined the clinical records of 1140 patients (797 males and 343 females) who received clinical services at the UJ-HSC between the years 2009 and 2014 for the purpose of data analysis for this study. The main variables examined in the study were disorder type and gender. Participants were divided into four age groups: children, adolescents, adults, and older adults. The examined disorders were classified as either speech disorders, language disorders, or dysphagia (i.e., swallowing problems). The disorders were further classified as childhood language impairments, articulation disorders, stuttering, cluttering, voice disorders, aphasia, and dysphagia. Results: The results indicated that the prevalence for language disorders was the highest (50.7%) followed by speech disorders (48.3%), and dysphagia (0.9%). The majority of patients who were seen at the JU-HSC were diagnosed with childhood language impairments (47.3%) followed consecutively by articulation disorders (21.1%), stuttering (16.3%), voice disorders (12.1%), aphasia (2.2%), dysphagia (0.9%), and cluttering (0.2%). As for gender, the majority of patients seen at the clinic were males in all disorders except for voice disorders and cluttering. Discussion: The results of the present study indicate that the majority of examined patients were diagnosed with childhood language impairments. Based on this result, the researchers suggest that there seems to be a high prevalence of childhood language impairments among children in Jordan compared to other types of speech and language disorders. The researchers also suggest that there is a need for further examination of the actual prevalence data on speech and language disorders in Jordan. The fact that many of the children seen at the UJ-HSC were brought to the clinic either as a result of parental concern or teacher referral indicates that there seems to an increased awareness among parents and teachers about the services speech pathologists can provide about assessment and treatment of childhood speech and language disorders. The small percentage of other disorders (i.e., stuttering, cluttering, dysphasia, aphasia, and voice disorders) seen at the UJ-HSC may indicate a little awareness by the local community about the role of speech pathologists in the assessment and treatment of these disorders.

Keywords: clinic, disorders, language, profile, speech

Procedia PDF Downloads 293
2625 Measuring Emotion Dynamics on Facebook: Associations between Variability in Expressed Emotion and Psychological Functioning

Authors: Elizabeth M. Seabrook, Nikki S. Rickard

Abstract:

Examining time-dependent measures of emotion such as variability, instability, and inertia, provide critical and complementary insights into mental health status. Observing changes in the pattern of emotional expression over time could act as a tool to identify meaningful shifts between psychological well- and ill-being. From a practical standpoint, however, examining emotion dynamics day-to-day is likely to be burdensome and invasive. Utilizing social media data as a facet of lived experience can provide real-world, temporally specific access to emotional expression. Emotional language on social media may provide accurate and sensitive insights into individual and community mental health and well-being, particularly with focus placed on the within-person dynamics of online emotion expression. The objective of the current study was to examine the dynamics of emotional expression on the social network platform Facebook for active users and their relationship with psychological well- and ill-being. It was expected that greater positive and negative emotion variability, instability, and inertia would be associated with poorer psychological well-being and greater depression symptoms. Data were collected using a smartphone app, MoodPrism, which delivered demographic questionnaires, psychological inventories assessing depression symptoms and psychological well-being, and collected the Status Updates of consenting participants. MoodPrism also delivered an experience sampling methodology where participants completed items assessing positive affect, negative affect, and arousal, daily for a 30-day period. The number of positive and negative words in posts was extracted and automatically collated by MoodPrism. The relative proportion of positive and negative words from the total words written in posts was then calculated. Preliminary analyses have been conducted with the data of 9 participants. While these analyses are underpowered due to sample size, they have revealed trends that greater variability in the emotion valence expressed in posts is positively associated with greater depression symptoms (r(9) = .56, p = .12), as is greater instability in emotion valence (r(9) = .58, p = .099). Full data analysis utilizing time-series techniques to explore the Facebook data set will be presented at the conference. Identifying the features of emotion dynamics (variability, instability, inertia) that are relevant to mental health in social media emotional expression is a fundamental step in creating automated screening tools for mental health that are temporally sensitive, unobtrusive, and accurate. The current findings show how monitoring basic social network characteristics over time can provide greater depth in predicting risk and changes in depression and positive well-being.

Keywords: emotion, experience sampling methods, mental health, social media

Procedia PDF Downloads 220
2624 Environmentally Adaptive Acoustic Echo Suppression for Barge-in Speech Recognition

Authors: Jong Han Joo, Jung Hoon Lee, Young Sun Kim, Jae Young Kang, Seung Ho Choi

Abstract:

In this study, we propose a novel technique for acoustic echo suppression (AES) during speech recognition under barge-in conditions. Conventional AES methods based on spectral subtraction apply fixed weights to the estimated echo path transfer function (EPTF) at the current signal segment and to the EPTF estimated until the previous time interval. We propose a new approach that adaptively updates weight parameters in response to abrupt changes in the acoustic environment due to background noises or double-talk. Furthermore, we devised a voice activity detector and an initial time-delay estimator for barge-in speech recognition in communication networks. The initial time delay is estimated using log-spectral distance measure, as well as cross-correlation coefficients. The experimental results show that the developed techniques can be successfully applied in barge-in speech recognition systems.

Keywords: acoustic echo suppression, barge-in, speech recognition, echo path transfer function, initial delay estimator, voice activity detector

Procedia PDF Downloads 347
2623 The Clustering of Multiple Sclerosis Subgroups through L2 Norm Multifractal Denoising Technique

Authors: Yeliz Karaca, Rana Karabudak

Abstract:

Multifractal Denoising techniques are used in the identification of significant attributes by removing the noise of the dataset. Magnetic resonance (MR) image technique is the most sensitive method so as to identify chronic disorders of the nervous system such as Multiple Sclerosis. MRI and Expanded Disability Status Scale (EDSS) data belonging to 120 individuals who have one of the subgroups of MS (Relapsing Remitting MS (RRMS), Secondary Progressive MS (SPMS), Primary Progressive MS (PPMS)) as well as 19 healthy individuals in the control group have been used in this study. The study is comprised of the following stages: (i) L2 Norm Multifractal Denoising technique, one of the multifractal technique, has been used with the application on the MS data (MRI and EDSS). In this way, the new dataset has been obtained. (ii) The new MS dataset obtained from the MS dataset and L2 Multifractal Denoising technique has been applied to the K-Means and Fuzzy C Means clustering algorithms which are among the unsupervised methods. Thus, the clustering performances have been compared. (iii) In the identification of significant attributes in the MS dataset through the Multifractal denoising (L2 Norm) technique using K-Means and FCM algorithms on the MS subgroups and control group of healthy individuals, excellent performance outcome has been yielded. According to the clustering results based on the MS subgroups obtained in the study, successful clustering results have been obtained in the K-Means and FCM algorithms by applying the L2 norm of multifractal denoising technique for the MS dataset. Clustering performance has been more successful with the MS Dataset (L2_Norm MS Data Set) K-Means and FCM in which significant attributes are obtained by applying L2 Norm Denoising technique.

Keywords: clinical decision support, clustering algorithms, multiple sclerosis, multifractal techniques

Procedia PDF Downloads 141
2622 Role of Speech Articulation in English Language Learning

Authors: Khadija Rafi, Neha Jamil, Laiba Khalid, Meerub Nawaz, Mahwish Farooq

Abstract:

Speech articulation is a complex process to produce intelligible sounds with the help of precise movements of various structures within the vocal tract. All these structures in the vocal tract are named as articulators, which comprise lips, teeth, tongue, and palate. These articulators work together to produce a range of distinct phonemes, which happen to be the basis of language. It starts with the airstream from the lungs passing through the trachea and into oral and nasal cavities. When the air passes through the mouth, the tongue and the muscles around it form such coordination it creates certain sounds. It can be seen when the tongue is placed in different positions- sometimes near the alveolar ridge, soft palate, roof of the mouth or the back of the teeth which end up creating unique qualities of each phoneme. We can articulate vowels with open vocal tracts, but the height and position of the tongue is different every time depending upon each vowel, while consonants can be pronounced when we create obstructions in the airflow. For instance, the alphabet ‘b’ is a plosive and can be produced only by briefly closing the lips. Articulation disorders can not only affect communication but can also be a hurdle in speech production. To improve articulation skills for such individuals, doctors often recommend speech therapy, which involves various kinds of exercises like jaw exercises and tongue twisters. However, this disorder is more common in children who are going through developmental articulation issues right after birth, but in adults, it can be caused by injury, neurological conditions, or other speech-related disorders. In short, speech articulation is an essential aspect of productive communication, which also includes coordination of the specific articulators to produce different intelligible sounds, which are a vital part of spoken language.

Keywords: linguistics, speech articulation, speech therapy, language learning

Procedia PDF Downloads 37
2621 Hate Speech in Selected Nigerian Newspapers

Authors: Laurel Chikwado Madumere, Kevin O. Ugorji

Abstract:

A speech is said to be full of hate when it appropriates disparaging and vituperative locutions and/or appellations, which are riddled with prejudices and misconceptions about an antagonizing party on the grounds of gender, race, political orientation, religious affiliations, tribe, etc. Due largely to the dichotomies and polarities that exist in Nigeria across political ideological spectrum, tribal affiliations, and gender contradistinctions, there are possibilities for the existence of socioeconomic, religious and political conditions that would induce, provoke and catalyze hate speeches in Nigeria’s mainstream media. Therefore the aim of this paper is to investigate, using select daily newspapers in Nigeria, the extent and complexity of those likely hate speeches that emanate from the pluralism in Nigeria and to set in to relief, the discrepancies and contrariety in the interpretation of those hate words. To achieve the above, the paper shall be qualitative in orientation as it shall be using the Speech Act Theory of J. L. Austin and J. R. Searle to interpret and evaluate the hate speeches in the select Nigerian daily newspapers. Also this paper shall help to elucidate the conditions that generate hate, and inform the government and NGOs how best to approach those conditions and put an end to the possible violence and extremism that emanate from extreme cases of hate.

Keywords: extremism, gender, hate speech, pluralism, prejudice, speech act theory

Procedia PDF Downloads 124
2620 A Comparative Asessment of Some Algorithms for Modeling and Forecasting Horizontal Displacement of Ialy Dam, Vietnam

Authors: Kien-Trinh Thi Bui, Cuong Manh Nguyen

Abstract:

In order to simulate and reproduce the operational characteristics of a dam visually, it is necessary to capture the displacement at different measurement points and analyze the observed movement data promptly to forecast the dam safety. The accuracy of forecasts is further improved by applying machine learning methods to data analysis progress. In this study, the horizontal displacement monitoring data of the Ialy hydroelectric dam was applied to machine learning algorithms: Gaussian processes, multi-layer perceptron neural networks, and the M5-rules algorithm for modelling and forecasting of horizontal displacement of the Ialy hydropower dam (Vietnam), respectively, for analysing. The database which used in this research was built by collecting time series of data from 2006 to 2021 and divided into two parts: training dataset and validating dataset. The final results show all three algorithms have high performance for both training and model validation, but the MLPs is the best model. The usability of them are further investigated by comparison with a benchmark models created by multi-linear regression. The result show the performance which obtained from all the GP model, the MLPs model and the M5-Rules model are much better, therefore these three models should be used to analyze and predict the horizontal displacement of the dam.

Keywords: Gaussian processes, horizontal displacement, hydropower dam, Ialy dam, M5-Rules, multi-layer perception neural networks

Procedia PDF Downloads 177
2619 Dancing with Perfectionism and Emotional Inhibition on the Ground of Disordered Eating Behaviors: Investigating Emotion Regulation Difficulties as Mediating Factor

Authors: Merve Denizci Nazligul

Abstract:

Dancers seem to have much higher risk levels for the development of eating disorders, compared to non-dancing counterparts. In a remarkably competitive nature of dance environment, perfectionism and emotion regulation difficulties become inevitable risk factors. Moreover, early maladaptive schemas are associated with various eating disorders. In the current study, it was aimed to investigate the mediating role of difficulties with emotion regulation on the relationship between perfectionism and disordered eating behaviors, as well as on the relationship between early maladaptive schemas and disordered eating behaviors. A total of 70 volunteer dancers (n = 47 women, n = 23 men) were recruited in the study (M age = 25.91, SD = 8.9, range 19–63) from the university teams or private clubs in Turkey. The sample included various types of dancers (n = 26 ballets or ballerinas, n =32 Latin, n = 10 tango, n = 2 hiphop). The mean dancing hour per week was 11.09 (SD = 7.09) within a range of 1-30 hours. The participants filled a questionnaire set including demographic information form, Dutch Eating Behavior Questionnaire, Multidimensional Perfectionism Scale, three subscales (Emotional Inhibition, Unrelenting Standards-Hypercriticalness, Approval Seeking-Recognition Seeking) from Young Schema Questionnaire-Short Form-3 and Difficulties in Emotion Regulation Scale. The mediation hypotheses were tested using the PROCESS macro in SPSS. The findings revealed that emotion regulation difficulties significantly mediated the relationship between three distinct subtypes of perfectionism and emotional eating. The results of the Sobel test suggested that there were significant indirect effects of self-oriented perfectionism (b = .06, 95% CI = .0084, .1739), other-oriented perfectionism (b = .15, 95% CI = .0136, .4185), and socially prescribed perfectionism (b = .09, 95% CI = .0104, .2344) on emotional eating through difficulties with emotion regulation. Moreover, emotion regulation difficulties significantly mediated the relationship between emotional inhibition and emotional eating (F(1,68) = 4.67, R2 = .06, p < .05). These results seem to provide some evidence that perfectionism might become a risk factor for disordered eating behaviors when dancers are not able to regulate their emotions. Further, gaining an understanding of how inhibition of emotions leads to inverse effects on eating behavior may be important to develop intervention strategies to manage their disordered eating patterns in risk groups. The present study may also support the importance of using unified protocols for transdiagnostic approaches which focus on identifying, accepting, prompting to express maladaptive emotions and appraisals.

Keywords: dancers, disordered eating, emotion regulation difficulties, perfectionism

Procedia PDF Downloads 120
2618 Absence of Developmental Change in Epenthetic Vowel Duration in Japanese Speakers’ English

Authors: Takayuki Konishi, Kakeru Yazawa, Mariko Kondo

Abstract:

This study examines developmental change in the production of epenthetic vowels by Japanese learners of English in relation to acquisition of L2 English speech rhythm. Seventy-two Japanese learners of English in the J-AESOP corpus were divided into lower- and higher-level learners according to their proficiency score and the frequency of vowel epenthesis. Three learners were excluded because no vowel epenthesis was observed in their utterances. The analysis of their read English speech data showed no statistical difference between lower- and higher-level learners, implying the absence of any developmental change in durations of epenthetic vowels. This result, together with the findings of previous studies, will be discussed in relation to the transfer of L1 phonology and manifestation of L2 English rhythm.

Keywords: vowel epenthesis, Japanese learners of English, L2 speech corpus, speech rhythm

Procedia PDF Downloads 246
2617 The Role of Emotions in the Consumer: Theoretical Review and Analysis of Components

Authors: Mikel Alonso López

Abstract:

The early eighties saw the rise of a new research trend in several prestigious journals, mainly articles that related emotions with the decision-making processes of the consumer, and stopped treating them as external elements. That is why we ask questions such as: what are emotions? Are there different types of emotions? What components do they have? Which theories exist about them? In this study, we will review the main theories and components of emotion analysing the cognitive factor and the different emotional states that are generally recognizable with a focus in the classic debate as to whether they occur before the cognitive process or the affective process.

Keywords: emotion, consumer behaviour, feelings, decision making

Procedia PDF Downloads 321
2616 Grammatical and Lexical Cohesion in the Japan’s Prime Minister Shinzo Abe’s Speech Text ‘Nihon wa Modottekimashita’

Authors: Nadya Inda Syartanti

Abstract:

This research aims to identify, classify, and analyze descriptively the aspects of grammatical and lexical cohesion in the speech text of Japan’s Prime Minister Shinzo Abe entitled Nihon wa Modotte kimashita delivered in Washington DC, the United States on February 23, 2013, as a research data source. The method used is qualitative research, which uses descriptions through words that are applied by analyzing aspects of grammatical and lexical cohesion proposed by Halliday and Hasan (1976). The aspects of grammatical cohesion consist of references (personal, demonstrative, interrogative pronouns), substitution, ellipsis, and conjunction. In contrast, lexical cohesion consists of reiteration (repetition, synonym, antonym, hyponym, meronym) and collocation. Data classification is based on the 6 aspects of the cohesion. Through some aspects of cohesion, this research tries to find out the frequency of using grammatical and lexical cohesion in Shinzo Abe's speech text entitled Nihon wa Modotte kimashita. The results of this research are expected to help overcome the difficulty of understanding speech texts in Japanese. Therefore, this research can be a reference for learners, researchers, and anyone who is interested in the field of discourse analysis.

Keywords: cohesion, grammatical cohesion, lexical cohesion, speech text, Shinzo Abe

Procedia PDF Downloads 135
2615 Speech and Swallowing Function after Tonsillo-Lingual Sulcus Resection with PMMC Flap Reconstruction: A Case Study

Authors: K. Rhea Devaiah, B. S. Premalatha

Abstract:

Background: Tonsillar Lingual sulcus is the area between the tonsils and the base of the tongue. The surgical resection of the lesions in the head and neck results in changes in speech and swallowing functions. The severity of the speech and swallowing problem depends upon the site and extent of the lesion, types and extent of surgery and also the flexibility of the remaining structures. Need of the study: This paper focuses on the importance of speech and swallowing rehabilitation in an individual with the lesion in the Tonsillar Lingual Sulcus and post-operative functions. Aim: Evaluating the speech and swallow functions post-intensive speech and swallowing rehabilitation. The objectives are to evaluate the speech intelligibility and swallowing functions after intensive therapy and assess the quality of life. Method: The present study describes a report of an individual aged 47years male, with the diagnosis of basaloid squamous cell carcinoma, left tonsillar lingual sulcus (pT2n2M0) and underwent wide local excision with left radical neck dissection with PMMC flap reconstruction. Post-surgery the patient came with a complaint of reduced speech intelligibility, and difficulty in opening the mouth and swallowing. Detailed evaluation of the speech and swallowing functions were carried out such as OPME, articulation test, speech intelligibility, different phases of swallowing and trismus evaluation. Self-reported questionnaires such as SHI-E(Speech handicap Index- Indian English), DHI (Dysphagia handicap Index) and SESEQ -K (Self Evaluation of Swallowing Efficiency in Kannada) were also administered to know what the patient feels about his problem. Based on the evaluation, the patient was diagnosed with pharyngeal phase dysphagia associated with trismus and reduced speech intelligibility. Intensive speech and swallowing therapy was advised weekly twice for the duration of 1 hour. Results: Totally the patient attended 10 intensive speech and swallowing therapy sessions. Results indicated misarticulation of speech sounds such as lingua-palatal sounds. Mouth opening was restricted to one finger width with difficulty chewing, masticating, and swallowing the bolus. Intervention strategies included Oro motor exercise, Indirect swallowing therapy, usage of a trismus device to facilitate mouth opening, and change in the food consistency to help to swallow. A practice session was held with articulation drills to improve the production of speech sounds and also improve speech intelligibility. Significant changes in articulatory production and speech intelligibility and swallowing abilities were observed. The self-rated quality of life measures such as DHI, SHI and SESE Q-K revealed no speech handicap and near-normal swallowing ability indicating the improved QOL after the intensive speech and swallowing therapy. Conclusion: Speech and swallowing therapy post carcinoma in the tonsillar lingual sulcus is crucial as the tongue plays an important role in both speech and swallowing. The role of Speech-language and swallowing therapists in oral cancer should be highlighted in treating these patients and improving the overall quality of life. With intensive speech-language and swallowing therapy post-surgery for oral cancer, there can be a significant change in the speech outcome and swallowing functions depending on the site and extent of lesions which will thereby improve the individual’s QOL.

Keywords: oral cancer, speech and swallowing therapy, speech intelligibility, trismus, quality of life

Procedia PDF Downloads 83
2614 Classification of Land Cover Usage from Satellite Images Using Deep Learning Algorithms

Authors: Shaik Ayesha Fathima, Shaik Noor Jahan, Duvvada Rajeswara Rao

Abstract:

Earth's environment and its evolution can be seen through satellite images in near real-time. Through satellite imagery, remote sensing data provide crucial information that can be used for a variety of applications, including image fusion, change detection, land cover classification, agriculture, mining, disaster mitigation, and monitoring climate change. The objective of this project is to propose a method for classifying satellite images according to multiple predefined land cover classes. The proposed approach involves collecting data in image format. The data is then pre-processed using data pre-processing techniques. The processed data is fed into the proposed algorithm and the obtained result is analyzed. Some of the algorithms used in satellite imagery classification are U-Net, Random Forest, Deep Labv3, CNN, ANN, Resnet etc. In this project, we are using the DeepLabv3 (Atrous convolution) algorithm for land cover classification. The dataset used is the deep globe land cover classification dataset. DeepLabv3 is a semantic segmentation system that uses atrous convolution to capture multi-scale context by adopting multiple atrous rates in cascade or in parallel to determine the scale of segments.

Keywords: area calculation, atrous convolution, deep globe land cover classification, deepLabv3, land cover classification, resnet 50

Procedia PDF Downloads 120
2613 The Communicative Nature of Linguistic Interference in Learning and Teaching of Slavic Languages

Authors: Kseniia Fedorova

Abstract:

The article is devoted to interlinguistic homonymy and enantiosemy analysis. These phenomena belong to the process of linguistic interference, which leads to violation of the communicative utterances integrity and causes misunderstanding between foreign interlocutors - native speakers of different Slavic languages. More attention is paid to investigation of non-typical speech situations, which occurred spontaneously or created by somebody intentionally being based on described phenomenon mechanism. The classification of typical students' mistakes connected with the paradox of interference is being represented in the article. The survey contributes to speech act theory, contemporary linguodidactics, translation science and comparative lexicology of Slavonic languages.

Keywords: adherent enantiosemy, interference, interslavonic homonymy, speech act

Procedia PDF Downloads 218
2612 Effects of Oxytocin on Neural Response to Facial Emotion Recognition in Schizophrenia

Authors: Avyarthana Dey, Naren P. Rao, Arpitha Jacob, Chaitra V. Hiremath, Shivarama Varambally, Ganesan Venkatasubramanian, Rose Dawn Bharath, Bangalore N. Gangadhar

Abstract:

Objective: Impaired facial emotion recognition is widely reported in schizophrenia. Neuropeptide oxytocin is known to modulate brain regions involved in facial emotion recognition, namely amygdala, in healthy volunteers. However, its effect on facial emotion recognition deficits seen in schizophrenia is not well explored. In this study, we examined the effect of intranasal OXT on processing facial emotions and its neural correlates in patients with schizophrenia. Method: 12 male patients (age= 31.08±7.61 years, education= 14.50±2.20 years) participated in this single-blind, counterbalanced functional magnetic resonance imaging (fMRI) study. All participants underwent three fMRI scans; one at baseline, one each after single dose 24IU intranasal OXT and intranasal placebo. The order of administration of OXT and placebo were counterbalanced and subject was blind to the drug administered. Participants performed a facial emotion recognition task presented in a block design with six alternating blocks of faces and shapes. The faces depicted happy, angry or fearful emotions. The images were preprocessed and analyzed using SPM 12. First level contrasts comparing recognition of emotions and shapes were modelled at individual subject level. A group level analysis was performed using the contrasts generated at the first level to compare the effects of intranasal OXT and placebo. The results were thresholded at uncorrected p < 0.001 with a cluster size of 6 voxels. Neuropeptide oxytocin is known to modulate brain regions involved in facial emotion recognition, namely amygdala, in healthy volunteers. Results: Compared to placebo, intranasal OXT attenuated activity in inferior temporal, fusiform and parahippocampal gyri (BA 20), premotor cortex (BA 6), middle frontal gyrus (BA 10) and anterior cingulate gyrus (BA 24) and enhanced activity in the middle occipital gyrus (BA 18), inferior occipital gyrus (BA 19), and superior temporal gyrus (BA 22). There were no significant differences between the conditions on the accuracy scores of emotion recognition between baseline (77.3±18.38), oxytocin (82.63 ± 10.92) or Placebo (76.62 ± 22.67). Conclusion: Our results provide further evidence to the modulatory effect of oxytocin in patients with schizophrenia. Single dose oxytocin resulted in significant changes in activity of brain regions involved in emotion processing. Future studies need to examine the effectiveness of long-term treatment with OXT for emotion recognition deficits in patients with schizophrenia.

Keywords: recognition, functional connectivity, oxytocin, schizophrenia, social cognition

Procedia PDF Downloads 188
2611 Dyadic Video Evidence on How Emotions in Parent Verbal Bids Affect Child Compliance in a British Sample

Authors: Iris Sirirada Pattara-Angkoon, Rory Devine, Anja Lindberg, Wendy Browne, Sarah Foley, Gabrielle McHarg, Claire Hughes

Abstract:

Introduction: The “Terrible Twos” is a phrase used to describe toddlers 18-30 months old. It characterizes a transition from high dependency to their caregivers in infancy to more autonomy and mastery of the body and environment. Toddlers at this age may also show more willfulness and stubbornness that could predict a future trajectory leading to conduct disorders. Thus, an important goal for this age group is to promote responsiveness to their caregivers (i.e., compliance). Existing literature tends to focus on praise to increase desirable child behavior. However, this relationship is not always straightforward as some studies have found no or negative association between praise and child compliance. Research suggests positive emotions and affection showed through body language (e.g., smiles) and actions (e.g., hugs, kisses) along with positive parent-child relationship can strengthen the praise and child compliance association. Nonetheless, few studies have examined the influences of positive emotionality within the speech. This is important as implementing verbal positive emotionality is easier than physical adjustments. The literature also tends not to include fathers in the study sample as mothers were traditionally the primary caregiver. However, as child-caring duties are increasing shared equally between mothers and fathers, it is important to include fathers within the study as studies have frequently found differences between female and male caregiver characteristics. Thus, the study will address the literary gap in two ways: 1. explore the influences of positive emotionality in parental speech and 2. include an equal sample of mothers and fathers. Positive emotionality is expected to positively correlate with and predict child compliance. Methodology: This study analyzed toddlers (18-24 months) in their dyadic interactions with mothers and fathers. A Duplo (block) task was used where parents had to work with their children to build the Duplo according to the given photo for four minutes. Then, they would be told to clean up the blocks. Parental positive emotionality in different speech types (e.g., bids, praises, affirmations) and child compliance were measured. Results: The study found that mothers (M = 28.92, SD = 12.01) were significantly more likely than fathers (M = 23.01, SD = 12.28) to use positive verbal emotionality in their speech, t(105) = 4.35, p< .001. High positive emotionality in bids during Duplo task and Clean Up was positively correlated with more child compliance in each task, r(273) = .35, p< .001 and r(264) = .58, p< .001, respectively. Overall, parental positive emotionality in speech significantly predicted child compliance, F(6, 218) = 13.33, p< .001, R² = .27) with emotionality in verbal bids (t = 6.20, p< .001) and affirmations (t = 3.12, p = .002) being significant predictors. Conclusion: Positive verbal emotions may be useful for increasing compliance in toddlers. This can be beneficial for compliance interventions as well as to the parent-child relationship quality through reduction of conflict and child defiance. As this study is correlational in nature, it will be important for future research to test the directional influence of positive emotionality within speech.

Keywords: child temperament, compliance, positive emotion, toddler, verbal bids

Procedia PDF Downloads 150
2610 The Effects of Emotional Working Memory Training on Trait Anxiety

Authors: Gabrielle Veloso, Welison Ty

Abstract:

Trait anxiety is a pervasive tendency to attend to and experience fears and worries to a disproportionate degree, across various situations. This study sought to determine if participants who undergo emotional working memory training will have significantly lower scores on the trait anxiety scales post-intervention. The study also sought to determine if emotional regulation mediated the relationship between working memory training and trait anxiety. Forty-nine participants underwent 20 days of computerized emotional working memory training called Emotional Dual n-back, which involves viewing a continuous stream of emotional content on a grid, and then remembering the location and color of items presented on the grid. Participants of the treatment group had significantly lower trait anxiety compared to controls post-intervention. Mediation analysis determined that working memory training had no significant relationship to anxiety as measured by the Beck’s Anxiety Inventory-Trait (BAIT), but was significantly related to anxiety as measured by form Y2 of the Spielberger State-Trait Anxiety Inventory (STAI-Y2). Emotion regulation, as measured by the Emotional Regulation Questionnaire (ERQ), was found not to mediate between working memory training and trait anxiety reduction. Results suggest that working memory training may be useful in reducing psychoemotional symptoms rather than somatic symptoms of trait anxiety. Moreover, it proposes for future research to further look into the mediating role of emotion regulation via neuroimaging and the development of more comprehensive measures of emotion regulation.

Keywords: anxiety, emotion regulation, working-memory, working-memory training

Procedia PDF Downloads 118
2609 A Novel Method for Face Detection

Authors: H. Abas Nejad, A. R. Teymoori

Abstract:

Facial expression recognition is one of the open problems in computer vision. Robust neutral face recognition in real time is a major challenge for various supervised learning based facial expression recognition methods. This is due to the fact that supervised methods cannot accommodate all appearance variability across the faces with respect to race, pose, lighting, facial biases, etc. in the limited amount of training data. Moreover, processing each and every frame to classify emotions is not required, as the user stays neutral for the majority of the time in usual applications like video chat or photo album/web browsing. Detecting neutral state at an early stage, thereby bypassing those frames from emotion classification would save the computational power. In this work, we propose a light-weight neutral vs. emotion classification engine, which acts as a preprocessor to the traditional supervised emotion classification approaches. It dynamically learns neutral appearance at Key Emotion (KE) points using a textural statistical model, constructed by a set of reference neutral frames for each user. The proposed method is made robust to various types of user head motions by accounting for affine distortions based on a textural statistical model. Robustness to dynamic shift of KE points is achieved by evaluating the similarities on a subset of neighborhood patches around each KE point using the prior information regarding the directionality of specific facial action units acting on the respective KE point. The proposed method, as a result, improves ER accuracy and simultaneously reduces the computational complexity of ER system, as validated on multiple databases.

Keywords: neutral vs. emotion classification, Constrained Local Model, procrustes analysis, Local Binary Pattern Histogram, statistical model

Procedia PDF Downloads 320
2608 School Refusal Behaviours: The Roles of Adolescent and Parental Factors

Authors: Junwen Chen, Celina Feleppa, Tingyue Sun, Satoko Sasagawa, Michael Smithson

Abstract:

School refusal behaviours refer to behaviours to avoid school attendance, chronic lateness in arriving at school, or regular early dismissal. Poor attendance in schools is highly correlated with anxiety, depression, suicide attempts, delinquency, violence, and substance use and abuse. Poor attendance is also a strong indicator of lower achievement in school, as well as problematic social-emotional development. Long-term consequences of school refusal behaviours include fewer opportunities for higher education, employment, and social difficulties, and high risks of later psychiatric illness. Given its negative impacts on youth educational outcomes and well-being, a thorough understanding of factors that are involved in the development of this phenomenon is warranted for developing effective management approaches. This study investigated parental and adolescent factors that may contribute to school refusal behaviours by specifically focusing on the role of parental and adolescents’ anxiety and depression, emotion dysregulation, and parental rearing style. Findings are expected to inform the identification of both parental and adolescents’ factors that may contribute to school refusal behaviours. This knowledge will enable novel and effective approaches that incorporate these factors to managing school refusal behaviours in adolescents, which in turn improve their school and daily functioning. Results are important for an integrative understanding of school refusal behaviours. Furthermore, findings will also provide information for policymakers to weigh the benefits of interventions targeting school refusal behaviours in adolescents. One-hundred-and-six adolescents aged 12-18 years (mean age = 14.79 years old, SD = 1.78, males = 44) and their parents (mean age = 47.49 years old, SD = 5.61, males = 27) completed an online questionnaire measuring both parental and adolescents’ anxiety, depression, emotion dysregulation, parental rearing styles, and adolescents’ school refusal behaviours. Adolescents with school refusal behaviours reported greater anxiety and depression, with their parents showing greater emotion dysregulation. Parental emotion dysregulation and adolescents’ anxiety and depression predicted school refusal behaviours independently. To date, only limited studies have investigated the interplay between parental and youth factors in relation to youth school refusal behaviours. Although parental emotion dysregulation has been investigated in relation to youth emotion dysregulation, little is known about its role in the context of school refusal. This study is one of the very few that investigated both parental and adolescent factors in relation to school refusal behaviours in adolescents. The findings support the theoretical models that emphasise the role of youth and parental psychopathology in school refusal behaviours. Future management of school refusal behaviours should target adolescents’ anxiety and depression while incorporating training for parental emotion regulation skills.

Keywords: adolescents, school refusal behaviors, parental factors, anxiety and depression, emotion dysregulation

Procedia PDF Downloads 96
2607 Color-Based Emotion Regulation Model: An Affective E-Learning Environment

Authors: Sabahat Nadeem, Farman Ali Khan

Abstract:

Emotions are considered as a vital factor affecting the process of information handling, level of attention, memory capacity and decision making. Latest e-Learning systems are therefore taking into consideration the effective state of learners to make the learning process more effective and enjoyable. One such use of user’s affective information is in the systems that tend to regulate users’ emotions to a state optimally desirable for learning. So for, this objective has been tried to be achieved with the help of teaching strategies, background music, guided imagery, video clips and odors. Nevertheless, we know that colors can affect human emotions. Relationship between color and emotions has a strong influence on how we perceive our environment. Similarly, the colors of the interface can also affect the user positively as well as negatively. This affective behavior of color and its use as emotion regulation agent is not yet exploited. Therefore, this research proposes a Color-based Emotion Regulation Model (CERM), a new framework that can automatically adapt its colors according to user’s emotional state and her personality type and can help in producing a desirable emotional effect, aiming at providing an unobtrusive emotional support to the users of e-learning environment. The evaluation of CERM is carried out by comparing it with classical non-adaptive, static colored learning management system. Results indicate that colors of the interface, when carefully selected has significant positive impact on learner’s emotions.

Keywords: effective learning, e-learning, emotion regulation, emotional design

Procedia PDF Downloads 284
2606 Investigating the Online Effect of Language on Gesture in Advanced Bilinguals of Two Structurally Different Languages in Comparison to L1 Native Speakers of L2 and Explores Whether Bilinguals Will Follow Target L2 Patterns in Speech and Co-speech

Authors: Armita Ghobadi, Samantha Emerson, Seyda Ozcaliskan

Abstract:

Being a bilingual involves mastery of both speech and gesture patterns in a second language (L2). We know from earlier work in first language (L1) production contexts that speech and co-speech gesture form a tightly integrated system: co-speech gesture mirrors the patterns observed in speech, suggesting an online effect of language on nonverbal representation of events in gesture during the act of speaking (i.e., “thinking for speaking”). Relatively less is known about the online effect of language on gesture in bilinguals speaking structurally different languages. The few existing studies—mostly with small sample sizes—suggests inconclusive findings: some show greater achievement of L2 patterns in gesture with more advanced L2 speech production, while others show preferences for L1 gesture patterns even in advanced bilinguals. In this study, we focus on advanced bilingual speakers of two structurally different languages (Spanish L1 with English L2) in comparison to L1 English speakers. We ask whether bilingual speakers will follow target L2 patterns not only in speech but also in gesture, or alternatively, follow L2 patterns in speech but resort to L1 patterns in gesture. We examined this question by studying speech and gestures produced by 23 advanced adult Spanish (L1)-English (L2) bilinguals (Mage=22; SD=7) and 23 monolingual English speakers (Mage=20; SD=2). Participants were shown 16 animated motion event scenes that included distinct manner and path components (e.g., "run over the bridge"). We recorded and transcribed all participant responses for speech and segmented it into sentence units that included at least one motion verb and its associated arguments. We also coded all gestures that accompanied each sentence unit. We focused on motion event descriptions as it shows strong crosslinguistic differences in the packaging of motion elements in speech and co-speech gesture in first language production contexts. English speakers synthesize manner and path into a single clause or gesture (he runs over the bridge; running fingers forward), while Spanish speakers express each component separately (manner-only: el corre=he is running; circle arms next to body conveying running; path-only: el cruza el puente=he crosses the bridge; trace finger forward conveying trajectory). We tallied all responses by group and packaging type, separately for speech and co-speech gesture. Our preliminary results (n=4/group) showed that productions in English L1 and Spanish L1 differed, with greater preference for conflated packaging in L1 English and separated packaging in L1 Spanish—a pattern that was also largely evident in co-speech gesture. Bilinguals’ production in L2 English, however, followed the patterns of the target language in speech—with greater preference for conflated packaging—but not in gesture. Bilinguals used separated and conflated strategies in gesture in roughly similar rates in their L2 English, showing an effect of both L1 and L2 on co-speech gesture. Our results suggest that online production of L2 language has more limited effects on L2 gestures and that mastery of native-like patterns in L2 gesture might take longer than native-like L2 speech patterns.

Keywords: bilingualism, cross-linguistic variation, gesture, second language acquisition, thinking for speaking hypothesis

Procedia PDF Downloads 49
2605 Cognitive Semantics Study of Conceptual and Metonymical Expressions in Johnson's Speeches about COVID-19

Authors: Hussain Hameed Mayuuf

Abstract:

The study is an attempt to investigate the conceptual metonymies is used in political discourse about COVID-19. Thus, this study tries to analyze and investigate how the conceptual metonymies in Johnson's speech about coronavirus are constructed. This study aims at: Identifying how are metonymies relevant to understand the messages in Boris Johnson speeches and to find out how can conceptual blending theory help people to understand the messages in the political speech about COVID-19. Lastly, it tries to Point out the kinds of integration networks are common in political speech. The study is based on the hypotheses that conceptual blending theory is a powerful tool for investigating the intended messages in Johnson's speech and there are different processes of blending networks and conceptual mapping that enable the listeners to identify the messages in political speech. This study presents a qualitative and quantitative analysis of four speeches about COVID-19; they are said by Boris Johnson. The selected data have been tackled from the cognitive-semantic perspective by adopting Conceptual Blending Theory as a model for the analysis. It concludes that CBT is applicable to the analysis of metonymies in political discourse. Its mechanisms enable listeners to analyze and understand these speeches. Also the listener can identify and understand the hidden messages in Biden and Johnson's discourse about COVID-19 by using different conceptual networks. Finally, it is concluded that the double scope networks are the most common types of blending of metonymies in the political speech.

Keywords: cognitive, semantics, conceptual, metonymical, Covid-19

Procedia PDF Downloads 88
2604 Development of a Social Assistive Robot for Elderly Care

Authors: Edwin Foo, Woei Wen, Lui, Meijun Zhao, Shigeru Kuchii, Chin Sai Wong, Chung Sern Goh, Yi Hao He

Abstract:

This presentation presents an elderly care and assistive social robot development work. We named this robot JOS and he is restricted to table top operation. JOS is designed to have a maximum volume of 3600 cm3 with its base restricted to 250 mm and his mission is to provide companion, assist and help the elderly. In order for JOS to accomplish his mission, he will be equipped with perception, reaction and cognition capability. His appearance will be not human like but more towards cute and approachable type. JOS will also be designed to be neutral gender. However, the robot will still have eyes, eyelid and a mouth. For his eyes and eyelids, they will be built entirely with Robotis Dynamixel AX18 motor. To realize this complex task, JOS will be also be equipped with micro-phone array, vision camera and Intel i5 NUC computer and a powered by a 12 V lithium battery that will be self-charging. His face is constructed using 1 motor each for the eyelid, 2 motors for the eyeballs, 3 motors for the neck mechanism and 1 motor for the lips movement. The vision senor will be house on JOS forehead and the microphone array will be somewhere below the mouth. For the vision system, Omron latest OKAO vision sensor is used. It is a compact and versatile sensor that is only 60mm by 40mm in size and operates with only 5V supply. In addition, OKAO vision sensor is capable of identifying the user and recognizing the expression of the user. With these functions, JOS is able to track and identify the user. If he cannot recognize the user, JOS will ask the user if he would want him to remember the user. If yes, JOS will store the user information together with the capture face image into a database. This will allow JOS to recognize the user the next time the user is with JOS. In addition, JOS is also able to interpret the mood of the user through the facial expression of the user. This will allow the robot to understand the user mood and behavior and react according. Machine learning will be later incorporated to learn the behavior of the user so as to understand the mood of the user and requirement better. For the speech system, Microsoft speech and grammar engine is used for the speech recognition. In order to use the speech engine, we need to build up a speech grammar database that captures the commonly used words by the elderly. This database is built from research journals and literature on elderly speech and also interviewing elderly what do they want to robot to assist them with. Using the result from the interview and research from journal, we are able to derive a set of common words the elderly frequently used to request for the help. It is from this set that we build up our grammar database. In situation where there is more than one person near JOS, he is able to identify the person who is talking to him through an in-house developed microphone array structure. In order to make the robot more interacting, we have also included the capability for the robot to express his emotion to the user through the facial expressions by changing the position and movement of the eyelids and mouth. All robot emotions will be in response to the user mood and request. Lastly, we are expecting to complete this phase of project and test it with elderly and also delirium patient by Feb 2015.

Keywords: social robot, vision, elderly care, machine learning

Procedia PDF Downloads 420
2603 Cognitive Emotion Regulation Strategies in 9–14-Year-Old Hungarian Children with Neurotypical Development in the Light of the Hungarian Version of Cognitive Emotion Regulation Questionnaire for Children

Authors: Dorottya Horváth, Andras Lang, Diana Varro-Horvath

Abstract:

This research activity and study is part of a major research effort to gain an integrative, neuropsychological, and personality psychological understanding of Attention Deficit Hyperactivity Disorder (ADHD) and thus improve the specification of diagnostic and therapeutic care. In the past, the neuropsychology section has investigated working memory, executive function, attention, and behavioural manifestations in children. Currently, we are looking for personality psychological protective factors for ADHD and its symptomatic exacerbation. We hypothesise that secure attachment, adaptive emotion regulation, and high resilience are protective factors. The aim of this study is to measure and report the results of a Hungarian sample of the Cognitive Emotion Regulation Questionnaire for Children (CERQ-k) because before studying groups with different developmental differences, it is essential to know the average scores of groups with neurotypical devel-opment. Until now, there was no Hungarian version of the above test, so we used our own translation. This questionnaire has been developed to assess children's thoughts after experiencing negative life events. It consists of 4-4 items per subscale, for a total of 36 items. The response categories for each item range from 1 (almost never) to 5 (almost always). The subscales were self-blame, blaming others, acceptance, planning, positive refocusing, rumination or thought-focusing, positive reappraisal, putting into perspective, and catastrophizing. The data for this study were collected from 120 children aged 9-14 years. It was analysed using descriptive statistical analysis, where the mean and standard deviation values for each age group, as well as the Cronbach's alpha value, were significant in testing the reliability of the questionnaire. The results showed that the questionnaire is a reliable and valid measuring instrument also on a Hungarian sample. These developments and results will allow the use of a version of the Cognitive Emotion Regulation Questionnaire for children in Hungarian and pave the way for the study of different developmental groups such as children with learning disabilities and/or with ADHD.

Keywords: neurotypical development, emotion regulation, negative life events, CERQ-k, Hungarian average scores

Procedia PDF Downloads 45
2602 Data Augmentation for Automatic Graphical User Interface Generation Based on Generative Adversarial Network

Authors: Xulu Yao, Moi Hoon Yap, Yanlong Zhang

Abstract:

As a branch of artificial neural network, deep learning is widely used in the field of image recognition, but the lack of its dataset leads to imperfect model learning. By analysing the data scale requirements of deep learning and aiming at the application in GUI generation, it is found that the collection of GUI dataset is a time-consuming and labor-consuming project, which is difficult to meet the needs of current deep learning network. To solve this problem, this paper proposes a semi-supervised deep learning model that relies on the original small-scale datasets to produce a large number of reliable data sets. By combining the cyclic neural network with the generated countermeasure network, the cyclic neural network can learn the sequence relationship and characteristics of data, make the generated countermeasure network generate reasonable data, and then expand the Rico dataset. Relying on the network structure, the characteristics of collected data can be well analysed, and a large number of reasonable data can be generated according to these characteristics. After data processing, a reliable dataset for model training can be formed, which alleviates the problem of dataset shortage in deep learning.

Keywords: GUI, deep learning, GAN, data augmentation

Procedia PDF Downloads 154
2601 Bidirectional Dynamic Time Warping Algorithm for the Recognition of Isolated Words Impacted by Transient Noise Pulses

Authors: G. Tamulevičius, A. Serackis, T. Sledevič, D. Navakauskas

Abstract:

We consider the biggest challenge in speech recognition – noise reduction. Traditionally detected transient noise pulses are removed with the corrupted speech using pulse models. In this paper we propose to cope with the problem directly in Dynamic Time Warping domain. Bidirectional Dynamic Time Warping algorithm for the recognition of isolated words impacted by transient noise pulses is proposed. It uses simple transient noise pulse detector, employs bidirectional computation of dynamic time warping and directly manipulates with warping results. Experimental investigation with several alternative solutions confirms effectiveness of the proposed algorithm in the reduction of impact of noise on recognition process – 3.9% increase of the noisy speech recognition is achieved.

Keywords: transient noise pulses, noise reduction, dynamic time warping, speech recognition

Procedia PDF Downloads 529
2600 The Combination of the Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), JITTER and SHIMMER Coefficients for the Improvement of Automatic Recognition System for Dysarthric Speech

Authors: Brahim-Fares Zaidi, Malika Boudraa, Sid-Ahmed Selouani

Abstract:

Our work aims to improve our Automatic Recognition System for Dysarthria Speech (ARSDS) based on the Hidden Models of Markov (HMM) and the Hidden Markov Model Toolkit (HTK) to help people who are sick. With pronunciation problems, we applied two techniques of speech parameterization based on Mel Frequency Cepstral Coefficients (MFCC's) and Perceptual Linear Prediction (PLP's) and concatenated them with JITTER and SHIMMER coefficients in order to increase the recognition rate of a dysarthria speech. For our tests, we used the NEMOURS database that represents speakers with dysarthria and normal speakers.

Keywords: hidden Markov model toolkit (HTK), hidden models of Markov (HMM), Mel-frequency cepstral coefficients (MFCC), perceptual linear prediction (PLP’s)

Procedia PDF Downloads 135