Search results for: collecting speech emotion dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2791

2551 The Effect of Speech-Shaped Noise and Speaker’s Voice Quality on First-Grade Children’s Speech Perception and Listening Comprehension

Authors: I. Schiller, D. Morsomme, A. Remacle

Abstract:

Children’s ability to process spoken language develops until the late teenage years. At school, where efficient spoken language processing is key to academic achievement, listening conditions are often unfavorable. High background noise and poor teacher’s voice represent typical sources of interference. It can be assumed that these factors particularly affect primary school children, because their language and literacy skills are still low. While it is generally accepted that background noise and impaired voice impede spoken language processing, there is an increasing need for analyzing impacts within specific linguistic areas. Against this background, the aim of the study was to investigate the effect of speech-shaped noise and imitated dysphonic voice on first-grade primary school children’s speech perception and sentence comprehension. Via headphones, 5 to 6-year-old children, recruited within the French-speaking community of Belgium, listened to and performed a minimal-pair discrimination task and a sentence-picture matching task. Stimuli were randomly presented according to four experimental conditions: (1) normal voice / no noise, (2) normal voice / noise, (3) impaired voice / no noise, and (4) impaired voice / noise. The primary outcome measure was task score. How did performance vary with respect to listening condition? Preliminary results will be presented with respect to speech perception and sentence comprehension and carefully interpreted in the light of past findings. This study helps to support our understanding of children’s language processing skills under adverse conditions. Results shall serve as a starting point for probing new measures to optimize children’s learning environment.

Keywords: impaired voice, sentence comprehension, speech perception, speech-shaped noise, spoken language processing

Procedia PDF Downloads 165
2550 Programmed Speech to Text Summarization Using Graph-Based Algorithm

Authors: Hamsini Pulugurtha, P. V. S. L. Jagadamba

Abstract:

Automatic speech-to-text conversion and text summarization using graph-based algorithms can be used in meetings to obtain a short summary of the meeting for future reference. The system performs signature verification with a Siamese neural network to confirm the identity of the user, and converts the user-provided English audio recording into English text using a speech recognition package available in Python. Sometimes only a summary of the meeting is required, and text summarization is the solution. The text is therefore summarized using natural language processing approaches such as unsupervised extractive text summarization algorithms.

Keywords: Siamese neural network, English speech, English text, natural language processing, unsupervised extractive text summarization

Procedia PDF Downloads 184
2549 Irrelevant Angry Faces, Compared to Happy Faces, Facilitate the Response Inhibition

Authors: Rashmi Gupta

Abstract:

It is unclear whether arousal or valence modulates the response inhibition process. It has been suggested that irrelevant positive emotional information (e.g., happy faces) and negative emotional information (e.g., angry faces) interact with attention differently. In the present study, we used arousal-matched irrelevant happy and angry faces. These faces were used as stop-signals in the stop-signal paradigm. There were two kinds of trials: go-trials and stop-trials. Participants were required to discriminate between the letter X or O by pressing the corresponding keys on go-trials. However, a stop signal was occasionally presented on stop trials, where participants were required to withhold their motor response. A significant main effect of emotion on response inhibition was observed. It indicated that the valence of a stop signal modulates inhibitory control. We found that stop-signal reaction time was faster in response to irrelevant angry faces than happy faces, indicating that irrelevant angry faces facilitate the response inhibition process compared to happy faces. These results shed light on the interaction of emotion with cognitive control functions.

Keywords: attention, emotion, response inhibition, inhibitory control

Procedia PDF Downloads 77
2548 Experimental Study on a Solar Heat Concentrating Steam Generator

Authors: Qiangqiang Xu, Xu Ji, Jingyang Han, Changchun Yang, Ming Li

Abstract:

As a replacement for a complex solar concentrating unit, this paper designs a solar heat-concentrating medium-temperature steam-generating system. Solar radiation is collected by a large solar collecting and heat concentrating plate and is converged onto the metal evaporating pipe with highly efficient heat transfer. Meanwhile, heat loss is reduced by employing a double-glazed cover and other heat insulating structures. Thus, a high temperature is reached in the metal evaporating pipe. The influences of the system's structure parameters on system performance are analyzed. The steam production rate and the steam production under different solar irradiance, solar collecting and heat concentrating plate area, solar collecting and heat concentrating plate temperature and heat loss are obtained. The results show that when solar irradiance is higher than 600 W/m², the effective heat collecting area is 7.6 m² and the double-glazed cover is adopted, the system heat loss amount is lower than the solar irradiance value. Stable steam is produced in the metal evaporating pipe at 100 ℃, 110 ℃, and 120 ℃, respectively. When the average solar irradiance is about 896 W/m² and the cumulative steaming time is about 5 hours, the daily steam production of the system is about 6.174 kg. Within a single day, the solar irradiance is highest around noon, so the steam production rate is highest at that time. Before 9:00 and after 16:00, the solar irradiance is lower, and the steam production rate is almost 0.

Keywords: heat concentrating, heat loss, medium temperature, solar steam production

Procedia PDF Downloads 158
2547 Static and Dynamic Hand Gesture Recognition Using Convolutional Neural Network Models

Authors: Keyi Wang

Abstract:

Similar to the touchscreen, hand gesture-based human-computer interaction (HCI) is a technology that could allow people to perform a variety of tasks faster and more conveniently. This paper proposes a method for training an image-based recognition system for hand gesture images and video clips using a CNN (Convolutional Neural Network). A dataset containing 6 hand gesture images is used to train a 2D CNN model, achieving ~98% accuracy. Furthermore, a 3D CNN model is trained on a dataset containing 4 hand gesture video clips, resulting in ~83% accuracy. It is demonstrated that a Cozmo robot loaded with the pre-trained models is able to recognize static and dynamic hand gestures.

Keywords: deep learning, hand gesture recognition, computer vision, image processing

Procedia PDF Downloads 113
2546 Reconstructed Phase Space Features for Estimating Post Traumatic Stress Disorder

Authors: Andre Wittenborn, Jarek Krajewski

Abstract:

Trauma-related sadness in speech can alter the voice in several ways. The generation of non-linear aerodynamic phenomena within the vocal tract is crucial when analyzing trauma-influenced speech production. These phenomena include non-laminar flow and the formation of jets rather than well-behaved laminar flow. In particular, state-space reconstruction methods based on chaotic dynamics and fractal theory have been suggested to describe these turbulence-related aerodynamic phenomena of the speech production system. To extract the non-linear properties of the speech signal, we used the time delay embedding method to reconstruct the phase space from a scalar time series (reconstructed phase space, RPS). This approach results in the extraction of 7238 features per .wav file (N = 47; 32 m, 15 f). The speech material was elicited by asking speakers to recount sadness-inducing autobiographical experiences (sampling rate 16 kHz, 8-bit resolution). After combining these features in a support vector machine based machine learning approach (leave-one-sample-out validation), we achieved a correlation of r = .41 with the well-established, self-report ground truth measure (RATS) of post-traumatic stress disorder (PTSD).

Keywords: non-linear dynamics features, post traumatic stress disorder, reconstructed phase space, support vector machine

Procedia PDF Downloads 82
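
The time delay embedding mentioned in entry 2546 above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name delay_embed and the parameters dim (embedding dimension) and tau (delay in samples) are assumptions chosen for illustration, and the abstract does not state which values were used.

```python
def delay_embed(series, dim, tau):
    """Build delay vectors (x[t], x[t+tau], ..., x[t+(dim-1)*tau]).

    `dim` and `tau` are illustrative parameters, not values from the paper.
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    n_vectors = len(series) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this dim and tau")
    # Each point of the reconstructed phase space is one delay vector.
    return [
        tuple(series[t + k * tau] for k in range(dim))
        for t in range(n_vectors)
    ]


samples = [0, 1, 2, 3, 4, 5]  # stand-in for a scalar speech signal
points = delay_embed(samples, dim=3, tau=2)
print(points)  # [(0, 2, 4), (1, 3, 5)]
```

Features such as those described in the abstract would then be computed from the resulting point cloud; which features were used is not specified in the listing.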
2545 Data Mining Approach: Classification Model Evaluation

Authors: Lubabatu Sada Sodangi

Abstract:

The rapid growth in the exchange and accessibility of information via the internet leads many organisations to acquire data on their own operations. The aim of data mining is to analyse the different behaviours of a dataset through observation. However, the subset of the dataset being analysed may not display all the behaviours and relationships of the entire data and, therefore, may not represent other parts of the dataset. A range of techniques is used in data mining to uncover hidden or unknown information in datasets. In this paper, the performance of two algorithms, Chi-Square Automatic Interaction Detection (CHAID) and multilayer perceptron (MLP), is compared using the Adult dataset to find the percentage of adults who earn > 50k and those who earn <= 50k per year. The two algorithms were studied and compared using IBM SPSS Statistics software. The result for CHAID shows that the most important predictors are relationship and education. The algorithm shows that those who are married (husband), hold a Bachelor, Masters, Doctorate or Prof-school qualification, and are aged > 41 and < 57 earn > 50k. The multilayer perceptron identifies marital status and capital gain as the most important predictors of income. It also shows that individuals whose capital gain is less than 6,849 and who are single, separated or widowed earn <= 50k, whereas individuals whose capital gain is > 6,849, who work > 35 hrs/wk, and who are older than 27 years earn > 50k. Comparing the two algorithms, both are reliable, but CHAID shows stronger reliability and clearly indicates that relationship and education contribute to the prediction, as displayed in the data visualisation.

Keywords: data mining, CHAID, multi-layer perceptron, SPSS, Adult dataset

Procedia PDF Downloads 359
2544 Speech Perception by Video Hosting Services Actors: Urban Planning Conflicts

Authors: M. Pilgun

Abstract:

The report presents the results of a study of the specifics of speech perception by actors of video hosting services on the material of urban planning conflicts. To analyze the content, the multimodal approach using neural network technologies is employed. Analysis of word associations and associative networks of relevant stimulus revealed the evaluative reactions of the actors. Analysis of the data identified key topics that generated negative and positive perceptions from the participants. The calculation of social stress and social well-being indices based on user-generated content made it possible to build a rating of road transport construction objects according to the degree of negative and positive perception by actors.

Keywords: social media, speech perception, video hosting, networks

Procedia PDF Downloads 122
2543 Functions and Pragmatic Aspects of English Nonsense

Authors: Natalia V. Ursul

Abstract:

In linguistic studies, the question of nonsense is attracting increasing interest. Nonsense is usually defined as spoken or written words that have no meaning. However, this definition is likely to be outdated as any speech act is generated due to the speaker’s pragmatic reasons, thus it cannot be purely illogical or meaningless. In the current paper a new working definition of nonsense as a linguistic medium will be formulated; moreover, the pragmatic peculiarities of newly coined linguistic patterns and possible ways of their interpretation will be discussed.

Keywords: nonsense, nonse verse, pragmatics, speech act

Procedia PDF Downloads 490
2542 Preliminary Study of the Phonological Development in Three and Four Year Old Bulgarian Children

Authors: Tsvetomira Braynova, Miglena Simonska

Abstract:

The article presents the results of research on phonological processes in three- and four-year-old children. For the purpose of the study, an original test was developed by the authors and administered to 120 children. The study covered three areas: the word level (96 words), the sentence repetition level (10 sentences), and the level of generating own speech from a picture (15 pictures). The test also provides additional information about the articulation errors of the assessed children. The main purpose of the testing is to analyze all phonological processes that occur at this age in Bulgarian children and to identify which are typical and which are atypical for this age. The results show that the most common phonological errors that children make are sound substitution, elision of a sound, metathesis of a sound, elision of a syllable, and elision of consonants clustered in a syllable. All examined children were identified as having an articulation disorder of the bilabial lambdacism type. Measuring the correlation between the average length of repeated speech and the average length of generated speech, the analysis shows that the more words a child can repeat in the “repeated speech” part, the more words they can be expected to generate in the “generating sentence” part. The results of this study show that the task of naming a word provides sufficient and representative information to assess the child's phonology.

Keywords: assessment, phonology, articulation, speech-language development

Procedia PDF Downloads 151
2541 Video Object Segmentation for Automatic Image Annotation of Ethernet Connectors with Environment Mapping and 3D Projection

Authors: Marrone Silverio Melo Dantas, Pedro Henrique Dreyer, Gabriel Fonseca Reis de Souza, Daniel Bezerra, Ricardo Souza, Silvia Lins, Judith Kelner, Djamel Fawzi Hadj Sadok

Abstract:

The creation of a dataset is time-consuming and often discourages researchers from pursuing their goals. To overcome this problem, we present and discuss two solutions adopted for the automation of this process. Both optimize valuable user time and resources and support video object segmentation with object tracking and 3D projection. In our scenario, we acquire images from a moving robotic arm and, for each approach, generate distinct annotated datasets. We evaluated the precision of the annotations by comparing these with a manually annotated dataset, as well as the efficiency in the context of detection and classification problems. For detection support, we used YOLO and obtained for the projection dataset an F1-Score, accuracy, and mAP values of 0.846, 0.924, and 0.875, respectively. Concerning the tracking dataset, we achieved an F1-Score of 0.861, an accuracy of 0.932, whereas mAP reached 0.894. In order to evaluate the quality of the annotated images used for classification problems, we employed deep learning architectures. We adopted metrics accuracy and F1-Score, for VGG, DenseNet, MobileNet, Inception, and ResNet. The VGG architecture outperformed the others for both projection and tracking datasets. It reached an accuracy and F1-score of 0.997 and 0.993, respectively. Similarly, for the tracking dataset, it achieved an accuracy of 0.991 and an F1-Score of 0.981.

Keywords: RJ45, automatic annotation, object tracking, 3D projection

Procedia PDF Downloads 136
2540 Factors That Affect the Mental Health Status of Syrian Refugee Girls in Post-Resettlement Context

Authors: Vivian Khamis

Abstract:

Exposure to war and forced migration have been widely linked to children's subsequent adaptation. What remains sparse is research spanning multiple risk and protective factors and examining their unique and relative implications for difficulties in mental health among refugee girls. This study investigated the mechanisms through which posttraumatic stress disorder (PTSD), emotion dysregulation, neuroticism, and behavioral and emotional disorders in Syrian refugee girls are impacted by exposure to war traumas, age, and other risk and protective factors such as coping styles, family relationships, and school environment. The sample consisted of 539 Syrian refugee girls who ranged in age from 7 to 18 years and attended public schools in various governorates in Lebanon and Jordan. Two school counselors carried out the interviews with children at school. Results indicated that war trauma, older age, and a combination of negative coping style and conflict in the family could lead to an overall state of emotion dysregulation, neuroticism, behavioral and emotional disorders, and PTSD in refugee girls. On the other hand, lapse of time since resettlement in the host country, positive coping style, and cohesion and expressiveness in the family would lead to a more positive mental health status, including lower levels of emotion dysregulation, neuroticism, behavioral and emotional disorders, and PTSD. Enhanced understanding of the mechanistic role of risk and protective factors in contributing to difficulties in mental health in refugee girls may contribute to the development of effective interventions to target the psychological effects of the refugee experience.

Keywords: refugee girls, PTSD, emotion dysregulation, neuroticism, behavioral and emotional disorders

Procedia PDF Downloads 48
2539 Effects of Therapeutic Horseback Riding in Speech and Communication Skills of Children with Autism

Authors: Aristi Alopoudi, Sofia Beloka, Vassiliki Pliogou

Abstract:

Autism is a complex neuro-developmental disorder with a variety of difficulties in many aspects such as social interaction, communication skills and verbal communication (speech). The aim of this study was to examine the impact of therapeutic horseback riding in improving the verbal and communication skills of children diagnosed with autism during 16 sessions. The researcher examined whether the expression of speech, the use of vocabulary, semantics, pragmatics, echolalia and communication skills were influenced by therapeutic horseback riding when the frequency of the sessions was increased. The researcher observed two primary-school-aged subjects with autism, in a two-case observation design, during 16 therapeutic horseback riding sessions (one riding session per week). Compared to baseline, at the end of the 16th therapeutic session, therapeutic horseback riding increased both verbal skills such as vocabulary, semantics, pragmatics and formation of sentences, and communication skills such as eye contact, greeting, participation in dialogue and spontaneous speech. It was noticeable that echolalia remained stable. Increased frequency of therapeutic horseback riding was beneficial for significant improvement in verbal and communication skills. More specifically, from the first to the last riding session there was a great increase in vocabulary, semantics, and formation of sentences. Pragmatics reached a lower level than semantics but the same as the right usage of the first person (for example, I make a hug) and echolalia used for that. A great increase in spontaneous speech was noticed. Eye contact was present at a lower level, and there was a slow but important rise in greeting as well as in participation in dialogue. Last but not least, this is the first study of therapeutic horseback riding to examine verbal communication and communication skills in autistic children. According to the references, therapeutic horseback riding is a therapy with a variety of benefits; thus, this research makes clear that the benefits of this therapy should include the improvement of speech and communication.

Keywords: Autism, communication skills, speech, therapeutic horseback riding

Procedia PDF Downloads 244
2538 REFLEX: A Randomized Controlled Trial to Test the Efficacy of an Emotion Regulation Flexibility Program with Daily Measures

Authors: Carla Nardelli, Jérome Holtzmann, Céline Baeyens, Catherine Bortolon

Abstract:

Background. Emotion regulation (ER) is a process associated with difficulties in mental health. Given its transdiagnostic features, its improvement could facilitate recovery from various psychological issues. A limitation of current studies is the lack of knowledge regarding whether available interventions improve ER flexibility (i.e., the ability to implement ER strategies in line with contextual demands), even though this capacity has been associated with better mental health and well-being. Therefore, the aim of the study is to test the efficacy of a 9-week ER group program (the Affect Regulation Training, ART), using the most appropriate measures (i.e., experience sampling method) in a student population. In addition, the study aims to explore the potential mediating role of ER flexibility in mental health improvement. Method. This Randomized Controlled Trial will compare the ER program group to an active control group (a relaxation program) in 100 participants. To test the mediating role of ER flexibility in mental health, daily measures will be used before, during, and after the interventions to evaluate the extent to which participants are flexible in their ER. Expected outcomes. Using multilevel analyses, we expect an improvement in anxious-depressive symptomatology for both groups. However, we expect the ART group to improve specifically on ER flexibility, and the latter to be a mediating variable for mental health. Conclusion. This study will enhance knowledge on interventions for students and the impact of interventions on ER flexibility. Also, this research will improve knowledge on ecological measures for assessing the effect of interventions. Overall, this project represents new opportunities to improve ER skills in order to improve mental health in undergraduate students.

Keywords: emotion regulation flexibility, experience sampling method, psychological intervention, emotion regulation skills

Procedia PDF Downloads 111
2537 Co-Design of Accessible Speech Recognition for Users with Dysarthric Speech

Authors: Elizabeth Howarth, Dawn Green, Sean Connolly, Geena Vabulas, Sara Smolley

Abstract:

Through the EU Horizon 2020 Nuvoic Project, the project team recruited 70 individuals in the UK and Ireland to test the Voiceitt speech recognition app and provide user feedback to developers. The app is designed for people with dysarthric speech, to support communication with unfamiliar people and access to speech-driven technologies such as smart home equipment and smart assistants. Participants with atypical speech, due to a range of conditions such as cerebral palsy, acquired brain injury, Down syndrome, stroke and hearing impairment, were recruited, primarily through organisations supporting disabled people. Most had physical or learning disabilities in addition to dysarthric speech. The project team worked with individuals, their families and local support teams, to provide access to the app, including through additional assistive technologies where needed. Testing was user-led, with participants asked to identify and test use cases most relevant to their daily lives over a period of three months or more. Ongoing technical support and training were provided remotely and in-person throughout the testing period. Structured interviews were used to collect feedback on users' experiences, with delivery adapted to individuals' needs and preferences. Informal feedback was collected through ongoing contact between participants, their families and support teams and the project team. Focus groups were held to collect feedback on specific design proposals. User feedback shared with developers has led to improvements to the user interface and functionality, including faster voice training, simplified navigation, the introduction of gamification elements and of switch access as an alternative to touchscreen access, with other feature requests from users still in development. This work offers a case-study in successful and inclusive co-design with the disabled community.

Keywords: co-design, assistive technology, dysarthria, inclusive speech recognition

Procedia PDF Downloads 81
2536 Low-Income African-American Fathers' Gendered Relationships with Their Children: A Study Examining the Impact of Child Gender on Father-Child Interactions

Authors: M. Lim Haslip

Abstract:

This quantitative study explores the correlation between child gender and father-child interactions. The author analyzes data from videotaped interactions between African-American fathers and their boy or girl toddler to explain how African-American fathers and toddlers interact with each other and whether these interactions differ by child gender. The purpose of this study is to investigate the research question: 'How, if at all, do fathers’ speech and gestures differ when interacting with their two-year-old sons versus daughters during free play?' The objectives of this study are to describe how child gender impacts African-American fathers’ verbal communication, examine how fathers gesture and speak to their toddler by gender, and to guide interventions for low-income African-American families and their children in early language development. This study involves a sample of 41 low-income African-American fathers and their 24-month-old toddlers. The videotape data will be used to observe 10-minute father-child interactions during free play. This study uses the already transcribed and coded data provided by Dr. Meredith Rowe, who did her study on the impact of African-American fathers’ verbal input on their children’s language development. The Child Language Data Exchange System (CHILDES program), created to study conversational interactions, was used for transcription and coding of the videotape data. The findings focus on the quantity of speech, diversity of speech, complexity of speech, and the quantity of gesture to inform the vocabulary usage, number of spoken words, length of speech, and the number of object pointings observed during father-toddler interactions in a free play setting. This study will help intervention and prevention scientists understand early language development in the African-American population. It will contribute to knowledge of the role of African-American fathers’ interactions on their children’s language development. 
It will guide interventions for the early language development of African-American children.

Keywords: parental engagement, early language development, African-American families, quantity of speech, diversity of speech, complexity of speech and the quantity of gesture

Procedia PDF Downloads 86
2535 Influence of Loudness Compression on Hearing with Bone Anchored Hearing Implants

Authors: Anja Kurz, Marc Flynn, Tobias Good, Marco Caversaccio, Martin Kompis

Abstract:

Bone Anchored Hearing Implants (BAHI) are routinely used in patients with conductive or mixed hearing loss, e.g. if conventional air conduction hearing aids cannot be used. New sound processors and new fitting software now allow parameters such as loudness compression ratios or maximum power output to be adjusted separately. It is currently unclear how the choice of these parameters influences aided speech understanding in BAHI users. In this prospective experimental study, the effect of varying the compression ratio and lowering the maximum power output in a BAHI was investigated. Twelve experienced adult subjects with a mixed hearing loss participated in this study. Four different compression ratios (1.0; 1.3; 1.6; 2.0) were tested along with two different maximum power output settings, resulting in a total of eight different programs. Each participant tested each program for two weeks. A blinded Latin square design was used to minimize bias. For each of the eight programs, speech understanding in quiet and in noise was assessed. For speech in quiet, the Freiburg number test and the Freiburg monosyllabic word test at 50, 65, and 80 dB SPL were used. For speech in noise, the Oldenburg sentence test was administered. Speech understanding in quiet and in noise was improved significantly in the aided condition with every program when compared to the unaided condition. However, no significant differences were found between any of the eight programs. In contrast, on a subjective level there was a significant preference for medium compression ratios of 1.3 to 1.6 and higher maximum power output.

Keywords: Bone Anchored Hearing Implant, baha, compression, maximum power output, speech understanding

Procedia PDF Downloads 357
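
The Latin square design mentioned in entry 2535 above balances the order in which each participant tries the programs. The sketch below builds a simple cyclic Latin square for eight programs. It is only an illustration: the abstract does not say how the square was constructed or how the twelve participants were mapped onto it, and the function names are hypothetical.

```python
def cyclic_latin_square(n):
    """Row i is the sequence 0..n-1 rotated by i positions."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]


def is_latin_square(square):
    """Check that every symbol appears exactly once per row and per column."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(
        {square[r][c] for r in range(n)} == symbols for c in range(n)
    )
    return rows_ok and cols_ok


# Eight programs (4 compression ratios x 2 power settings), numbered 0..7.
square = cyclic_latin_square(8)
for sequence_id, order in enumerate(square, start=1):
    print(sequence_id, order)
```

Each row is one testing order, and each program appears once in every position across the eight rows, which is the property that limits order effects.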
2534 The Role of Cognitive Control and Social Camouflage Associated with Social Anxiety Autism Spectrum Conditions

Authors: Siqing Guan, Fumiyo Oshima, Eiji Shimizu, Nozomi Tomita, Toru Takahashi, Hiroaki Kumano

Abstract:

Risk factors for social anxiety in autism spectrum conditions involve executive attention, emotion regulation, and thought regulation as processes of cognitive dysregulation. Social camouflaging behaviors as strategies used to mask and/or compensate for autism characteristics during social interactions in autism spectrum conditions have also been emphasized. However, the role of cognitive dysregulation and social camouflaging related to social anxiety in autism spectrum conditions has not been clarified. Whether these factors are specific to social anxiety in autism spectrum conditions or common to social anxiety independent of autism spectrum conditions needs to be clarified. Here, we explored risk factors specific to social anxiety in autism spectrum conditions and general risk factors for social anxiety independent of autism spectrum conditions. From the Japanese participants in early adulthood (age = 18~39) of the online survey in Japan, those who exceeded the Japanese version Autism-Spectrum Quotient cutoff (33 points or more) were assigned to the autism spectrum conditions group (ASC; N = 255, mean age = 32.08, SD age = 5.16) and those who did not exceed the cutoff were assigned to the non-autism spectrum conditions group (Non-ASC; N = 255, mean age = 31.70, SD age = 5.09). Using the Japanese versions of the Social Phobia Scale, the Social Interaction Anxiety Scale, and the Short Fear of Negative Evaluation Scale, a composite score for social anxiety was calculated using a method of principal. We also measured emotional control difficulties using the Difficulties in Emotion Regulation Scale, executive attention using the Effortful Control Scale for Adults, rumination using the Rumination-Reflection Questionnaire, and worry using the Penn State Worry Questionnaire. This study passed the review of the Ethics Committee. No conflicts of interest.
Multiple regression analysis with the forced entry method was used to predict social anxiety in the ASC and non-ASC groups separately, based on executive attention, emotion dysregulation, worry, rumination, and social camouflage. In the ASC group, emotion dysregulation (β = .277, p < .001), worry (β = .162, p < .05), assimilation (β = .308, p < .001) and masking (β = .275, p < .001) were significant predictors of social anxiety (F(7,247) = 45.791, p < .001, R² = .565). In the non-ASC group, emotion dysregulation (β = .171, p < .05), worry (β = .344, p < .001), assimilation (β = .366, p < .001) and executive attention (β = -.132, p < .05) were significant predictors of social anxiety (F(7,207) = 47.333, p < .001, R² = .615). The findings suggest that masking is a risk factor for social anxiety specific to autism spectrum conditions, while emotion dysregulation, worry, and assimilation are common risk factors for social anxiety, regardless of autism spectrum conditions. In addition, executive attention is a risk factor for social anxiety without autism spectrum conditions.

Keywords: autism spectrum, cognitive control, social anxiety, social camouflaging

Procedia PDF Downloads 186
2533 Mobile Crowdsensing Scheme by Predicting Vehicle Mobility Using Deep Learning Algorithm

Authors: Monojit Manna, Arpan Adhikary

Abstract:

Mobile crowdsensing is an emerging paradigm in which users are selected to carry out sensing tasks. In today's cities, vehicles are used for data sensing and data collection because of their ubiquity and mobility. In this work, we focus on selecting the mobile nodes that collect the maximum amount of data from urban areas and meet the data requirements of a future period within a couple of minutes. We formulate vehicle recruitment as a data maximization problem under a budget constraint. The application is set up as a realistic online platform in which vehicles move continuously in real time and the data center can select a set of vehicles immediately. A deep learning-based scheme with the help of mobile vehicles (DLMV) is proposed to collect sensing data from the urban environment. To account for the future period, this work proposes a deep learning-based offline algorithm to predict mobility. We then propose a greedy online algorithm that selects a subset of vehicles for this NP-complete problem under a limited budget. Extensive experimental evaluations are conducted on a real mobility dataset from Rome. The results demonstrate the efficiency of the proposed solution, support the validity of DLMV, and show that it increases the quantity of sensing data collected compared with other algorithms.
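The abstract does not spell out the greedy step, so the following is only a minimal sketch of one common way to select vehicles under a budget: repeatedly pick the affordable vehicle with the best marginal coverage per unit cost. The function name `greedy_select`, the cost values, and the representation of predicted coverage as sets of grid cells are all illustrative assumptions, not the DLMV algorithm itself.

```python
def greedy_select(vehicles, budget):
    """Greedy budgeted selection (illustrative, not the paper's algorithm).

    vehicles: dict mapping a vehicle id to (cost, set of cells it is
              predicted to cover); costs are assumed to be positive.
    budget:   total recruitment budget.
    Returns the chosen vehicle ids (in pick order) and the covered cells.
    """
    chosen, covered, remaining = [], set(), budget
    candidates = dict(vehicles)
    while candidates:
        best_id, best_ratio = None, 0.0
        for vid, (cost, cells) in candidates.items():
            if cost > remaining:
                continue  # cannot afford this vehicle
            ratio = len(cells - covered) / cost  # new cells per unit cost
            if ratio > best_ratio:
                best_id, best_ratio = vid, ratio
        if best_id is None:
            break  # nothing affordable adds new coverage
        cost, cells = candidates.pop(best_id)
        chosen.append(best_id)
        covered |= cells
        remaining -= cost
    return chosen, covered


# Hypothetical predicted coverage for three vehicles.
fleet = {
    "v1": (2, {1, 2, 3}),
    "v2": (1, {3, 4}),
    "v3": (3, {5, 6, 7, 8}),
}
picked, cells = greedy_select(fleet, budget=4)
print(picked, sorted(cells))
```

In this sketch the predicted cells would come from the offline mobility predictor; here they are hard-coded for illustration.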

Keywords: mobile crowdsensing, deep learning, vehicle recruitment, sensing coverage, data collection

Procedia PDF Downloads 53
2532 Forensic Speaker Verification in Noisy Environmental by Enhancing the Speech Signal Using ICA Approach

Authors: Ahmed Kamil Hasan Al-Ali, Bouchra Senadji, Ganesh Naik

Abstract:

We propose a system that addresses real environmental noise and channel mismatch in forensic speaker verification. The method suppresses various types of real environmental noise using the independent component analysis (ICA) algorithm. Mel frequency cepstral coefficients (MFCC) or MFCC feature warping are then applied to the enhanced speech signal to extract its essential characteristics. Channel effects are reduced using an intermediate vector (i-vector) and probabilistic linear discriminant analysis (PLDA) approach for classification. The proposed algorithm is evaluated on an Australian forensic voice comparison database, combined with car, street, and home noises from QUT-NOISE at signal-to-noise ratios (SNR) ranging from -10 dB to 10 dB. Experimental results indicate that MFCC feature warping with ICA reduces the equal error rate by about 48.22%, 44.66%, and 50.07% over MFCC feature warping alone when the test speech signals are corrupted with random sessions of street, car, and home noises at -10 dB SNR.
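The evaluation mixes clean speech with noise at fixed SNR levels. The abstract does not describe the mixing procedure, so the following is a minimal sketch of the standard approach: scale the noise so that the speech-to-noise power ratio matches the target SNR before adding it. The function names and the toy signals are assumptions for illustration.

```python
import math


def mean_power(signal):
    """Average power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` to the target SNR (in dB) and add it to `speech`.

    Returns (mixture, scaled_noise). Both inputs must have equal length.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    # Target noise power is speech power divided by 10^(SNR/10).
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


def measured_snr_db(speech, noise):
    """SNR in dB between a speech signal and a noise signal."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


# Toy signals standing in for real recordings.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixture, scaled = mix_at_snr(speech, noise, snr_db=-10)
print(round(measured_snr_db(speech, scaled), 6))
```

At -10 dB the scaled noise carries ten times the power of the speech, which is the hardest condition reported in the abstract.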

Keywords: noisy forensic speaker verification, ICA algorithm, MFCC, MFCC feature warping

Procedia PDF Downloads 384
2531 Engagement Analysis Using DAiSEE Dataset

Authors: Naman Solanki, Souraj Mondal

Abstract:

With the world moving towards online communication, the volume of stored video has exploded in the past few years. Consequently, it has become crucial to analyse participants’ engagement levels in online communication videos. Engagement prediction of people in videos can be useful in many domains, such as education, client meetings, and dating. Video-level or frame-level prediction of engagement for a user involves the development of robust models that can capture facial micro-emotions efficiently. To develop an engagement prediction model, it is necessary to have a widely accepted standard dataset for engagement analysis. DAiSEE is one such dataset: it consists of in-the-wild data and has a gold standard annotation for engagement prediction. Earlier research using the DAiSEE dataset involved training and testing standard models such as CNN-based models, but the results were not satisfactory by industry standards. In this paper, a multi-level classification approach is introduced to create a more robust model for engagement analysis using the DAiSEE dataset. This approach recorded testing accuracies of 0.638, 0.7728, 0.8195, and 0.866 for predicting boredom level, engagement level, confusion level, and frustration level, respectively.

Keywords: computer vision, engagement prediction, deep learning, multi-level classification

Procedia PDF Downloads 95
2530 Speech Recognition Performance by Adults: A Proposal for a Battery for Marathi

Authors: S. B. Rathna Kumar, Pranjali A Ujwane, Panchanan Mohanty

Abstract:

The present study aimed to develop a battery for assessing speech recognition performance by adults in Marathi. A total of four word lists were developed by considering word frequency, word familiarity, words in common use, and phonemic balance. Each word list consists of 25 words (15 monosyllabic words in CVC structure and 10 monosyllabic words in CVCV structure). Equivalence analysis and performance-intensity function testing were carried out using the four word lists on a total of 150 native speakers of Marathi from different regions of Maharashtra (Vidarbha, Marathwada, Khandesh and Northern Maharashtra, Pune, and Konkan). The subjects were divided equally into five groups based on the above-mentioned regions. There was no significant difference (p > 0.05) in speech recognition performance between groups for each word list or between word lists for each group. Hence, the four word lists were equally difficult for all the groups and can be used interchangeably. The performance-intensity (PI) function curve showed a semi-linear function, and the groups’ mean slope of the linear portions of the curve indicated an average increase in word recognition score of 4.64%, 4.73%, 4.68%, and 4.85% per dB for list 1, list 2, list 3, and list 4, respectively. Although no data are available on speech recognition tests for adults in Marathi, most of the findings of the study are in line with research reports on other languages. The four word lists thus developed were found to have sufficient reliability and validity in assessing speech recognition performance by adults in Marathi.
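The slope of the linear portion of a PI function is the change in word recognition score per dB. The abstract does not say how the slopes were fitted, so the following is a minimal sketch that assumes an ordinary least-squares line through the points of the linear portion. The presentation levels and scores below are invented for illustration and are not data from the study.

```python
def linear_slope(levels_db, scores_pct):
    """Least-squares slope of score (%) against presentation level (dB)."""
    n = len(levels_db)
    mean_x = sum(levels_db) / n
    mean_y = sum(scores_pct) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(levels_db, scores_pct))
    sxx = sum((x - mean_x) ** 2 for x in levels_db)
    return sxy / sxx


# Hypothetical points from the linear portion of a PI curve.
levels = [10, 15, 20, 25]   # presentation level in dB
scores = [20, 44, 68, 92]   # word recognition score in %
print(linear_slope(levels, scores))  # slope in % per dB
```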

Keywords: speech recognition performance, phonemic balance, equivalence analysis, performance-intensity function testing, reliability, validity

Procedia PDF Downloads 331
2529 Emotional Processing Difficulties in Recovered Anorexia Nervosa Patients: State or Trait

Authors: Telma Fontao de Castro, Kylee Miller, Maria Xavier Araújo, Isabel Brandao, Sandra Torres

Abstract:

Objective: There is a dearth of research investigating the long-term emotional functioning of individuals recovered from anorexia nervosa (AN). This 15-year longitudinal study aimed to examine whether difficulties in the cognitive processing of emotions persist after long-term AN recovery and how they are linked to anxiety and depression. Method: Twenty-four females, who were tested longitudinally during their acute and recovered AN phases, and 24 healthy control (HC) women were screened for anxiety, depression, alexithymia, and emotion regulation difficulties (ER; only assessed in the recovery phase). Results: Anxiety, depression, and alexithymia levels decreased significantly with AN recovery. However, scores on anxiety and difficulty in identifying feelings (an alexithymia factor) remained high when compared to the HC group. Scores on emotion regulation difficulties were also lower in the HC group. The abovementioned differences between the AN recovered group and the HC group in difficulties in identifying and accepting feelings and in lack of emotional clarity were no longer present when the effect of anxiety and depression was controlled. Conclusions: Findings suggest that emotional dysfunction tends to decrease in the AN recovered phase. However, using an HC group as a reference, we conclude that several emotional difficulties remain elevated after long-term AN recovery, in particular limited access to emotion regulation strategies and difficulty controlling impulses and engaging in goal-directed behavior, which suggests a trait vulnerability. In turn, competencies related to emotional clarity and acceptance of emotional responses seem to be state-dependent phenomena linked to anxiety and depression. In sum, managing emotions remains a challenge for individuals recovered from AN. Under these circumstances, maladaptive eating behavior can serve an affect regulatory function, increasing the risk of relapse.
Emotional education and stabilization of depressive and anxious symptomatology after recovery emerge as an important avenue to protect from long-term AN relapse.

Keywords: alexithymia, anorexia nervosa, emotion recognition, emotion regulation

Procedia PDF Downloads 98
2528 Interpersonal Emotion Regulation in Adolescence: An Enhanced Critical Incident Study

Authors: Setareh Shayanfar

Abstract:

Given the increasing importance of peer relationships during adolescence, the present study aimed to examine peer interactions that facilitate or hinder adolescents’ regulation of negative emotions. Using the Enhanced Critical Incident Technique, 1-hour semi-structured interviews were conducted with 16 junior high school adolescents. Participants were asked to recall situations when they experienced strong negative emotions during the past school year, indicate the peer interactions that helped or hindered their emotion regulation, and identify prospective interactions with the potential to help regulate their emotions. Data analysis extracted 182 critical incidents, including 109 helping incidents, 45 hindering incidents, and 28 wish list items, which generated 10 categories nested within four overarching themes: Positive Personal Support included (a) supportive presence, (b) expressing concern, (c) empathizing, and (d) encouraging and cheering up; while Strategy Transmission included (e) sharing perspective, and (f) giving advice; Activated Support included (g) taking action, and (h) distracting; while Negative Personal Interactions included (i) withdrawing and (j) punishing. Implications for mental health and service providers, as well as recommendations for future research, are presented.

Keywords: adolescence, emotion regulation, enhanced critical incident technique, peers

Procedia PDF Downloads 115
2527 Emotion Detection in a General Human-Robot Interaction System Optimized for Embedded Platforms

Authors: Julio Vega

Abstract:

Expression recognition is a field of Artificial Intelligence whose main objectives are to recognize basic forms of affective expression that appear on people’s faces and to contribute to behavioral studies. In this work, a ROS node has been developed that, based on Deep Learning techniques, is capable of detecting the facial expressions of the people who appear in the image. These algorithms were optimized so that they can be executed in real time on an embedded platform. The experiments were carried out on a PC with a USB camera and on a Raspberry Pi 4 with a PiCamera. The final results show a viable system that is capable of working in real time even on an embedded platform.

Keywords: python, low-cost, raspberry pi, emotion detection, human-robot interaction, ROS node

Procedia PDF Downloads 100
2526 A Comparative Study on Vowel Articulation in Malayalam Speaking Children Using Cochlear Implant

Authors: Deepthy Ann Joy, N. Sreedevi

Abstract:

Hearing impairment (HI) identified at an early age, before the onset of language development, allows its negative effect on children's speech and language development to be reduced. Early rehabilitation is very important for improving speech production in children with HI. Besides conventional hearing aids, cochlear implants are used in the rehabilitation of children with HI. However, delays in the acquisition of speech and language milestones persist in children with a cochlear implant (CI). Delays in speech milestones are reflected in speech sound errors, which in turn reflect the temporal and spectral characteristics of speech. Hence, acoustic analysis of speech sounds provides a better representation of speech production skills in children with CI. The present study aimed to investigate the acoustic characteristics of vowels in Malayalam-speaking children with a cochlear implant. The participants were 20 Malayalam-speaking children between four and seven years of age. The experimental group consisted of 10 children with CI, and the control group consisted of 10 typically developing children. Acoustic analysis was carried out for 5 short vowels (/a/, /i/, /u/, /e/, /o/) and 5 long vowels (/a:/, /i:/, /u:/, /e:/, /o:/) in word-initial position. The responses were recorded and analyzed for acoustic parameters such as vowel duration, the ratio of the duration of a short and long vowel, formant frequencies (F₁ and F₂), and the Formant Centralization Ratio (FCR), computed using the formula (F₂u+F₂a+F₁i+F₁u)/(F₂i+F₁a). The findings indicated that vowel duration was higher in the experimental group than in the control group for all vowels except /u/. The ratio of the duration of short and long vowels was also higher in the experimental group than in the control group, except for /i/. Further, F₁ for all vowels was higher in the experimental group, with variability noticed in F₂ values.
FCR was found to be higher in the experimental group, indicating vowel centralization. Further, the results of an independent t-test revealed no significant difference between the two groups on any of the parameters. The spectral and temporal measures in children with CI were found to move towards the normal range. The results emphasize the significance of early rehabilitation in children with hearing impairment. The role of rehabilitation-related aspects is also discussed in detail; these can be incorporated clinically to improve speech therapy services for children with CI.
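The FCR formula quoted in the abstract, (F₂u+F₂a+F₁i+F₁u)/(F₂i+F₁a), can be written as a short function. The formant values in the example are invented placeholders, not measurements from the study, and the function name is an assumption.

```python
def formant_centralization_ratio(f1, f2):
    """FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a).

    f1, f2: dicts keyed by vowel ('a', 'i', 'u') holding the first and
            second formant frequencies in Hz.
    """
    numerator = f2["u"] + f2["a"] + f1["i"] + f1["u"]
    denominator = f2["i"] + f1["a"]
    return numerator / denominator


# Hypothetical formant values in Hz for the corner vowels.
f1 = {"a": 800, "i": 300, "u": 350}
f2 = {"a": 1300, "i": 2300, "u": 900}
print(round(formant_centralization_ratio(f1, f2), 3))
```

A higher FCR corresponds to a more centralized vowel space, which is how the abstract interprets the higher values in the experimental group.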

Keywords: acoustics, cochlear implant, Malayalam, vowels

Procedia PDF Downloads 119
2525 Physiology of Temporal Lobe and Limbic System

Authors: Khaled A. Abdel-Sater

Abstract:

There are four areas of the temporal lobe. The primary auditory area (areas 41 and 42) is for the perception of auditory impulses. The auditory association area comprises areas 22, 21, and 20: areas 21 and 20 are for the understanding and interpretation of auditory sensation, recognition of language, and long-term memories. Area 22, also called Wernicke’s area, is a sensory speech centre. It is for the interpretation of auditory and visual information, the formation of thoughts in the mind, and the choice of words to be used. Ideas and thoughts originate in it. The limbic system is a set of cortical and subcortical structures forming a ring around the brainstem. The cortical structures are the orbitofrontal area, subcallosal gyrus, cingulate gyrus, parahippocampal gyrus, and uncus. The subcortical structures are the hypothalamus, hippocampus, amygdala, septum, paraolfactory area, anterior nucleus of the thalamus, and portions of the basal ganglia. The limbic system has several physiological functions, including regulation of behavior, motivation, and emotion.

Keywords: limbic system, motivation, emotions, temporal lobe

Procedia PDF Downloads 170
2524 Exploring Pre-Trained Automatic Speech Recognition Model HuBERT for Early Alzheimer’s Disease and Mild Cognitive Impairment Detection in Speech

Authors: Monica Gonzalez Machorro

Abstract:

Dementia is hard to diagnose because of the lack of early physical symptoms. Early dementia recognition is key to improving the living conditions of patients. Speech technology is considered a valuable biomarker for this challenge. Recent works have used conventional acoustic features and machine learning methods to detect dementia in speech. BERT-like classifiers have reported the most promising performance. One constraint, nonetheless, is that these studies are based either on human transcripts or on transcripts produced by automatic speech recognition (ASR) systems. The contribution of this research is to explore a method that does not require transcriptions to detect early Alzheimer’s disease (AD) and mild cognitive impairment (MCI). This is achieved by fine-tuning a pre-trained ASR model for the downstream early AD and MCI tasks. To do so, a subset of the thoroughly studied Pitt Corpus is customized. The subset is balanced for class, age, and gender. Data processing also involves cropping the samples into 10-second segments. For comparison purposes, a baseline model is defined by training and testing a Random Forest with 20 acoustic features extracted using the librosa library in Python. These are: zero-crossing rate, MFCCs, spectral bandwidth, spectral centroid, root mean square, and short-time Fourier transform. The baseline model achieved 58% accuracy. To fine-tune HuBERT as a classifier, an average pooling strategy is employed to merge the 3D representations from audio into 2D representations, and a linear layer is added. The pre-trained model used is ‘hubert-large-ls960-ft’. Empirically, the number of epochs selected is 5, and the batch size is 1. Experiments show that the proposed method reaches 69% balanced accuracy. This suggests that the linguistic and speech information encoded in the self-supervised ASR-based model is able to capture acoustic cues of AD and MCI.
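The pooling step described above can be illustrated without any deep learning framework. The following is a minimal sketch, assuming the 3D representation has shape (batch, time frames, features) and that average pooling is taken over the time axis, which yields a 2D (batch, features) array for the linear layer. The tiny sizes, weights, and function names are illustrative assumptions and do not reproduce the actual HuBERT dimensions or the author's implementation.

```python
def mean_pool(hidden):
    """Average over the time axis: (batch, time, features) -> (batch, features)."""
    pooled = []
    for sequence in hidden:
        n_frames = len(sequence)
        n_features = len(sequence[0])
        pooled.append([
            sum(frame[j] for frame in sequence) / n_frames
            for j in range(n_features)
        ])
    return pooled


def linear_layer(pooled, weights, bias):
    """Apply logits = W x + b to each pooled vector.

    weights: (classes, features) nested list; bias: list of length classes.
    """
    return [
        [sum(w * x for w, x in zip(row, vector)) + b
         for row, b in zip(weights, bias)]
        for vector in pooled
    ]


# One utterance, two time frames, two features (toy numbers).
hidden_states = [[[1.0, 2.0], [3.0, 4.0]]]
pooled = mean_pool(hidden_states)
logits = linear_layer(pooled, weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.5, -0.5])
print(pooled, logits)
```

In a real fine-tuning setup the hidden states would come from the pre-trained model and the linear weights would be learned; both are hard-coded here only to show the shapes involved.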

Keywords: automatic speech recognition, early Alzheimer’s recognition, mild cognitive impairment, speech impairment

Procedia PDF Downloads 100
2523 Face Recognition Using Body-Worn Camera: Dataset and Baseline Algorithms

Authors: Ali Almadan, Anoop Krishnan, Ajita Rattani

Abstract:

Facial recognition is a widely adopted technology in surveillance, border control, healthcare, banking services, and, lately, mobile user authentication, with Apple introducing the “Face ID” moniker with the iPhone X. A lot of research has been conducted on face recognition using datasets captured by surveillance cameras, DSLRs, and mobile devices. Recently, face recognition technology has also been deployed on body-worn cameras to keep officers safe, enable situational awareness, and provide evidence for trial. However, limited academic research has been conducted on this topic so far, and no publicly available dataset with a sufficient sample size exists. This paper aims to advance research in the area of face recognition using body-worn cameras. To this end, the contribution of this work is two-fold: (1) collection of a dataset consisting of a total of 136,939 facial images of 102 subjects captured using body-worn cameras in indoor and daylight conditions and (2) evaluation of various deep-learning architectures for face identification on the collected dataset. Experimental results suggest a maximum True Positive Rate (TPR) of 99.86% at a False Positive Rate (FPR) of 0.000, obtained by a SphereFace-based deep learning architecture in the daylight condition. The collected dataset and the baseline algorithms will promote further research and development. A download link for the dataset and the algorithms is available by contacting the authors.
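A result such as "TPR at a given FPR" is read off the score distributions of matching and non-matching comparisons. The abstract does not describe the evaluation code, so the following is a minimal sketch under the assumption that higher scores mean a more likely match and that the threshold is chosen among the observed scores. The function name and the scores are invented for illustration.

```python
def tpr_at_fpr(genuine, impostor, max_fpr):
    """Best TPR over thresholds whose FPR does not exceed `max_fpr`.

    genuine:  similarity scores of matching (same-identity) comparisons.
    impostor: similarity scores of non-matching comparisons.
    A comparison is accepted when its score is >= the threshold.
    """
    best_tpr = 0.0
    for threshold in sorted(set(genuine) | set(impostor)):
        fpr = sum(s >= threshold for s in impostor) / len(impostor)
        if fpr <= max_fpr:
            tpr = sum(s >= threshold for s in genuine) / len(genuine)
            best_tpr = max(best_tpr, tpr)
    return best_tpr


# Hypothetical similarity scores.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.5, 0.3, 0.2, 0.1]
print(tpr_at_fpr(genuine_scores, impostor_scores, max_fpr=0.0))
print(tpr_at_fpr(genuine_scores, impostor_scores, max_fpr=0.25))
```

Requiring an FPR of zero forces the threshold above every impostor score, so any genuine comparison scoring below the strongest impostor is rejected.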

Keywords: face recognition, body-worn cameras, deep learning, person identification

Procedia PDF Downloads 141
2522 Audience Members' Perspective-Taking Predicts Accurate Identification of Musically Expressed Emotion in a Live Improvised Jazz Performance

Authors: Omer Leshem, Michael F. Schober

Abstract:

This paper introduces a new method for assessing how audience members and performers feel and think during live concerts, and how audience members' recognized and felt emotions are related. Two hypotheses were tested in a live concert setting: (1) that audience members’ cognitive perspective taking ability predicts their accuracy in identifying an emotion that a jazz improviser intended to express during a performance, and (2) that audience members' affective empathy predicts their likelihood of feeling the same emotions as the performer. The aim was to stage a concert with audience members who regularly attend live jazz performances, and to measure their cognitive and affective reactions during the performance as non-intrusively as possible. Pianist and Grammy nominee Andy Milne agreed, without knowing details of the method or hypotheses, to perform a full-length solo improvised concert that would include an ‘unusual’ piece. Jazz fans were recruited through typical advertising for New York City jazz performances. The event was held at the New School’s Glass Box Theater, the home of leading NYC jazz venue ‘The Stone.’ Audience members were charged typical NYC jazz club admission prices; advertisements informed them that anyone who chose to participate in the study would be reimbursed their ticket price after the concert. The concert, held in April 2018, had 30 attendees, 23 of whom participated in the study. Twenty-two minutes into the concert, the performer was handed a paper note with the instruction: ‘Perform a 3-5-minute improvised piece with the intention of conveying sadness.’ (Sadness was chosen based on previous music cognition lab studies, where solo listeners were less likely to select sadness as the musically-expressed emotion accurately from a list of basic emotions, and more likely to misinterpret sadness as tenderness). Then, audience members and the performer were invited to respond to a questionnaire from a first envelope under their seat. 
Participants used their own words to describe the emotion the performer had intended to express and then selected the intended emotion from a list. They also reported the emotions they had felt while listening using Izard’s differential emotions scale. The concert then continued as usual. At the end, participants answered demographic questions and completed Davis’ interpersonal reactivity index (IRI), a 28-item scale designed to assess both cognitive and affective empathy. Hypothesis 1 was supported: audience members with greater cognitive empathy were more likely to accurately identify sadness as the expressed emotion. Moreover, audience members who accurately selected ‘sadness’ reported feeling marginally sadder than people who did not select sadness. Hypothesis 2 was not supported: audience members with greater affective empathy were not more likely to feel the same emotions as the performer. If anything, members with lower cognitive perspective-taking ability had marginally greater emotional overlap with the performer, which makes sense given that these participants were less likely to identify the music as sad, which corresponded with the performer’s actual feelings. Results replicate findings from solo lab studies in a concert setting and demonstrate the viability of exploring empathy and collective cognition in improvised live performance.

Keywords: audience, cognition, collective cognition, emotion, empathy, expressed emotion, felt emotion, improvisation, live performance, recognized emotion

Procedia PDF Downloads 109