Search results for: collecting speech emotion dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2814

2604 General Architecture for Automation of Machine Learning Practices

Authors: U. Borasi, Amit Kr. Jain, Rakesh, Piyush Jain

Abstract:

Data collection, data preparation, model training, model evaluation, and deployment are all processes in a typical machine learning workflow. Training data needs to be gathered and organised. This often entails collecting a sizable dataset and cleaning it to remove or correct any inaccurate or missing information. Preparing the data for use in the machine learning model requires pre-processing it after it has been acquired. This often entails actions like scaling or normalising the data, handling outliers, selecting appropriate features, and reducing dimensionality. This pre-processed data is then used to train a model with a machine learning algorithm. After the model has been trained, it needs to be assessed on a test dataset using metrics like accuracy, precision, and recall. Every time a new model is built, both data pre-processing and model training, two crucial processes in the machine learning (ML) workflow, must be carried out. Various ML algorithms can be employed for every single approach to data pre-processing, generating a large set of combinations to choose from. For example, for every method of handling missing values (dropping records, replacing with the mean, etc.), for every scaling technique, and for every combination of selected features, a different algorithm can be used. As a result, in order to get the optimum outcomes, these tasks are frequently repeated in different combinations. This paper suggests a simple architecture for organizing this large “combination set of pre-processing steps and algorithms” into an automated workflow that simplifies the task of carrying out all possibilities.
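
The size of the "combination set" described above can be illustrated with a short sketch. The operator names below are hypothetical placeholders, not taken from the paper, and the sketch only enumerates configurations; it does not reproduce the paper's scheduler or operator pool.

```python
from itertools import product

# Hypothetical operator pools; the names are illustrative only.
MISSING_VALUE_STRATEGIES = ["drop_records", "fill_mean"]
SCALERS = ["none", "min_max", "standardise"]
ALGORITHMS = ["logistic_regression", "decision_tree", "knn"]


def build_configurations(*pools):
    """Return every combination that takes exactly one option from each pool."""
    return [tuple(choice) for choice in product(*pools)]


configurations = build_configurations(
    MISSING_VALUE_STRATEGIES, SCALERS, ALGORITHMS
)

# 2 missing-value strategies x 3 scalers x 3 algorithms = 18 candidate pipelines.
print(len(configurations))
for config in configurations[:3]:
    print(config)
```

Adding one more option to any pool multiplies the number of runs, which is why an automated workflow becomes attractive as the pools grow.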

Keywords: machine learning, automation, AUTOML, architecture, operator pool, configuration, scheduler

Procedia PDF Downloads 33
2603 Cultural-Creative Design with Language Figures of Speech

Authors: Wei Chen Chang, Ming Yu Hsiao

Abstract:

A commodity acts as a kind of sign. How designers construct and interpret that sign, how users engage with it, and how it effectively conveys a message have always been important issues in design education. Cultural-creative design refers to signifying cultural heritage for product design. In terms of Peirce’s Semiotic Triangle (signifying elements, object, interpretant), the signifying elements are the outcomes of design, the object is cultural heritage, and the interpretant is the positioning and description of product design. How to elaborate the positioning, design, and development of a product is a narrative issue of the interpretant, and how to shape the signifying elements of a product by modifying and adapting styles is a matter of rhetoric. This study investigated the rhetoric of elements signifying products to develop a rhetoric model with cultural style. Figures of speech are a rhetorical method in narrative. By adapting figures of speech to the interpretant, this study developed the rhetorical context of cultural context by narrative means. In this two-phase study, phase I defines figures of speech and phase II analyzes existing cultural-creative products in terms of figures of speech to develop a rhetoric of style model. We expect the model to serve as a reference for the future development of cultural-creative design.

Keywords: cultural-creative design, cultural-creative products, figures of speech, Peirce’s semiotic triangle, rhetoric of style model

Procedia PDF Downloads 348
2602 A Literature Review of Emotional Labor and Non-Task Behavior

Authors: Yeong-Gyeong Choi, Kyoung-Seok Kim

Abstract:

This study, literature review research, intends to deal with the problem of conceptual ambiguity among research on emotional labor, and to look into the evolutionary trends and changing aspects of defining the concept of emotional labor. In addition, in existing studies, deep acting and surface acting are highly related to a positive outcome variable and a negative outcome variable, respectively. It was confirmed that for employees performing emotional labor, deep acting and surface acting are highly related to OCB and CWB, respectively. While positive emotion that employees come to experience during job performance process can easily trigger a positive non-task behavior such as OCB, negative emotion that employees experience through excessive workload or unfair treatment can easily induce a negative behavior like CWB. The two management behaviors of emotional labor, surface acting and deep acting, can have either a positive or negative effect on non-task behavior of employees, depending on which one they would choose. Thus, the purpose of this review paper is to clarify the relationship between emotional labor and non-task behavior more specifically.

Keywords: emotion labor, non-task behavior, OCB, CWB

Procedia PDF Downloads 324
2601 Understanding Mental Constructs of Language and Emotion

Authors: Sakshi Ghai

Abstract:

The word ‘emotion’ has been microscopically studied through psychological, anthropological and biological lenses and has indubitably been one of the most researched concepts because, in all situations and reactions that constitute human life, emotions form the very niche of our mutual existence. While understanding the social aspects of cognition, one can realize that emotions are deeply interwoven with language and are thereby pivotal in inducing human actions and behavior. The society, or the outward social structure, is the result of the inward psychological structure of our human relationships, for the individual is the result of the total experience, knowledge and conduct of man. The aim of this paper is threefold: first, to establish the relation between mental representations of emotions and their neuropsychological connection with language on a conscious and sub-conscious level; second, to describe how innate, basic and higher cognitive emotions affect the constantly changing state of an agent and to examine their role in determining the moral compass within all beings; and third, to explore the concept of the architecture of mind, considering how it has developed an ability to display adaptive emotional states and responses that are in sync with the language of thought. Because every response to the social environment is so deeply determined by the very social milieu in which one is situated, language has a fundamental role in constructing emotions and articulating behavior. Being linguistic beings, we tend to associate emotion, feelings and other aspects of inward mental states intrinsically with the language we use. This paper aims to devise a discursive approach to understand how emotions are fabricated and intertwined with mental constructs that are further expressed and communicated through the various units of language.

Keywords: mental representation, emotion, language, psychology

Procedia PDF Downloads 264
2600 Data Gathering and Analysis for Arabic Historical Documents

Authors: Ali Dulla

Abstract:

This paper introduces a new dataset (and the methodology used to generate it) based on a wide range of historical Arabic documents containing clean data and simple, homogeneous page layouts. The experiments are implemented on printed and handwritten documents obtained from important libraries such as the Qatar Digital Library, the British Library and the Library of Congress. We have gathered and annotated 150 archival document images from different locations and time periods, drawn from documents of the 17th to 19th centuries. The dataset comprises differing page layouts and degradations that challenge text line segmentation methods. Ground truth is produced using the Aletheia tool by PRImA and stored in an XML representation, in the PAGE (Page Analysis and Ground truth Elements) format. The dataset presented will be easily available to researchers worldwide for research into the obstacles facing historical Arabic documents, such as geometric correction.
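
As a rough illustration of how text-line ground truth stored in PAGE XML might be read, the sketch below parses a small hand-written sample. The sample document, its identifiers and the namespace version are assumptions made for illustration; they are not taken from the dataset, and real files may use a different schema version or coordinate encoding.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample in the style of a PAGE file.
# The namespace version shown here is an assumption, not the dataset's.
SAMPLE_PAGE_XML = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.png" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <TextLine id="l1"><Coords points="10,10 900,10 900,60 10,60"/></TextLine>
      <TextLine id="l2"><Coords points="10,70 900,70 900,120 10,120"/></TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_text_lines(xml_text):
    """Map each TextLine id to its polygon as a list of (x, y) tuples."""
    root = ET.fromstring(xml_text)
    lines = {}
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        for child in element:
            if local_name(child.tag) == "Coords":
                lines[element.get("id")] = [
                    tuple(int(value) for value in pair.split(","))
                    for pair in child.get("points", "").split()
                ]
    return lines


text_lines = read_text_lines(SAMPLE_PAGE_XML)
for line_id, polygon in text_lines.items():
    print(line_id, polygon)
```

Matching on the local tag name keeps the sketch independent of the exact schema version declared in a file.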

Keywords: dataset production, ground truth production, historical documents, arbitrary warping, geometric correction

Procedia PDF Downloads 148
2599 Personality Moderates the Relation Between Mother's Emotional Intelligence and Young Children's Emotion Situation Knowledge

Authors: Natalia Alonso-Alberca, Ana I. Vergara

Abstract:

From the very first years of their life, children are confronted with situations in which they need to deal with emotions. The family provides the first emotional experiences, and it is in the family context that children usually take their first steps towards acquiring emotion knowledge. Parents play a key role in this important task, helping their children develop emotional skills that they will need in challenging situations throughout their lives. Specifically, mothers are models imitated by their children. They create specific spatial and temporal contexts in which children learn about emotions, their causes, consequences, and complexity. This occurs not only through what mothers say or do directly to the child. Rather, it occurs, to a large extent, through the example that they set using their own emotional skills. The aim of the current study was to analyze how maternal abilities to perceive and to manage emotions influence children’s emotion knowledge, specifically, their emotion situation knowledge, taking into account the role played by the mother’s personality and the time spent together, and controlling for the effect of age, sex and the child’s verbal abilities. Participants were 153 children from 4 schools in Spain, and their mothers. The children (41.8% girls) ranged in age from 35 to 72 months. The mothers (N = 140) had a mean age of 38.7 years (range 27-49). Twelve mothers had more than one child participating in the study. The main variables were the child's emotion situation knowledge (ESK), measured by the Emotion Matching Task (EMT), and receptive language, measured using the Picture Vocabulary Test. The mothers' Emotional Intelligence (EI), measured with the Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT), and personality, measured with the Big Five Inventory, were also analyzed.
The results showed that the predictive power of maternal emotional skills on ESK was moderated by the mother’s personality, affecting both the direction and size of the relationships detected: low neuroticism and low openness to experience led to a positive influence of maternal EI on children’s ESK, while high levels in these personality dimensions resulted in a negative influence on the child's ESK. The time that the mother and the child spend together was revealed as a positive predictor of ESK, while it did not moderate the influence of the mother's EI on the child’s ESK. In light of the results, we can infer that maternal EI is linked to children’s emotional skills, though a high level of maternal EI does not necessarily predict a greater degree of emotion knowledge in children, which seems rather to depend on specific personality profiles. The results of the current study indicate that a good level of maternal EI does not guarantee that children will learn the emotional skills that foster prosocial adaptation. Rather, EI must be accompanied by certain psychological characteristics (personality traits in this case).

Keywords: emotional intelligence, emotion situation knowledge, mothers, personality, young children

Procedia PDF Downloads 103
2598 Exploratory Analysis of A Review of Nonexistence Polarity in Native Speech

Authors: Deawan Rakin Ahamed Remal, Sinthia Chowdhury, Sharun Akter Khushbu, Sheak Rashed Haider Noori

Abstract:

Native speech-to-text synthesis offers its own benefits to society. Speaking in different accents is common, but communication between people with different accents can be quite difficult. The problem stems from misperceiving the intended meaning of what is said. Many automatic speech recognition systems have therefore been deployed to detect text. This paper reviews NSTTR (Native Speech Text to Text Recognition) synthesis in comparison with text-to-text recognition. The review shows that many text-to-text recognition systems are at a very early stage with respect to native speech recognition. The discussion covers the progression of chatbots, linguistic theory, and rule-based approaches. In recent years, deep learning has become a dominant approach to text-to-text learning for detecting the nature of language. To the best of our knowledge, a huge number of people in the subcontinent speak Bangla, but with different accents in different regions; the study therefore discusses the conflicting achievements of existing works and identifies future needs regarding acoustic accents in Bangla.

Keywords: TTR, NSTTR, text to text recognition, deep learning, natural language processing

Procedia PDF Downloads 105
2597 Hand Gesture Interpretation Using Sensing Glove Integrated with Machine Learning Algorithms

Authors: Aqsa Ali, Aleem Mushtaq, Attaullah Memon, Monna

Abstract:

In this paper, we present a low-cost design for a smart glove that can perform sign language recognition to assist speech-impaired people. Specifically, we have designed and developed an Assistive Hand Gesture Interpreter that recognizes hand movements relevant to the American Sign Language (ASL) and translates them into text for display on a Thin-Film-Transistor Liquid Crystal Display (TFT LCD) screen as well as synthetic speech. Linear Bayes Classifiers and Multilayer Neural Networks have been used to classify 11 feature vectors obtained from the sensors on the glove into one of the 27 ASL alphabets and a predefined gesture for space. Three types of features are used: bending, using six bend sensors; orientation in three dimensions, using accelerometers; and contacts at vital points, using contact sensors. To gauge the performance of the presented design, the training database was prepared using five volunteers. The accuracy of the current version on the prepared dataset was found to be up to 99.3% for the target user. The solution combines electronics, e-textile technology, sensor technology, embedded systems and machine learning techniques to build a low-cost wearable glove that is scrupulous, elegant and portable.
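
To make the classification step concrete, here is a minimal sketch of a nearest-class-mean rule on 11-dimensional feature vectors. This rule is the Bayes decision rule only under the assumption of Gaussian classes with equal priors and a shared isotropic covariance, and it yields linear decision boundaries; it is a stand-in for illustration, not the authors' classifier. The three gesture labels and all sensor values are synthetic.

```python
import random

NUM_FEATURES = 11  # matches the 11 glove features mentioned in the abstract


def make_samples(rng, centre, count, noise=0.05):
    """Generate synthetic feature vectors scattered around one centre value."""
    return [
        [centre + rng.gauss(0.0, noise) for _ in range(NUM_FEATURES)]
        for _ in range(count)
    ]


def fit_class_means(samples_by_label):
    """Estimate one mean vector per gesture label."""
    return {
        label: [sum(column) / len(samples) for column in zip(*samples)]
        for label, samples in samples_by_label.items()
    }


def predict(class_means, features):
    """Return the label whose mean is closest in squared Euclidean distance."""
    def squared_distance(mean):
        return sum((x - m) ** 2 for x, m in zip(features, mean))

    return min(class_means, key=lambda label: squared_distance(class_means[label]))


rng = random.Random(0)  # fixed seed so the synthetic data is reproducible
training = {
    "A": make_samples(rng, 0.2, 20),
    "B": make_samples(rng, 0.8, 20),
    "space": make_samples(rng, 0.5, 20),
}
class_means = fit_class_means(training)
print(predict(class_means, [0.21] * NUM_FEATURES))
```

A full system would need one class per gesture and real sensor readings; the synthetic classes here are deliberately well separated.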

Keywords: American sign language, assistive hand gesture interpreter, human-machine interface, machine learning, sensing glove

Procedia PDF Downloads 267
2596 Conversational Assistive Technology of Visually Impaired Person for Social Interaction

Authors: Komal Ghafoor, Tauqir Ahmad, Murtaza Hanif, Hira Zaheer

Abstract:

Assistive technology has been developed to support visually impaired people in their social interactions. Conversation assistive technology is designed to enhance communication skills, facilitate social interaction, and improve the quality of life of visually impaired individuals. This technology includes speech recognition, text-to-speech features, and other communication devices that enable users to communicate with others in real time. The technology uses natural language processing and machine learning algorithms to analyze spoken language and provide appropriate responses. It also includes features such as voice commands and audio feedback to provide users with a more immersive experience. These technologies have been shown to increase the confidence and independence of visually impaired individuals in social situations and have the potential to improve their social skills and relationships with others. Overall, conversation-assistive technology is a promising tool for empowering visually impaired people and improving their social interactions. One of the key benefits of conversation-assistive technology is that it allows visually impaired individuals to overcome communication barriers that they may face in social situations. It can help them to communicate more effectively with friends, family, and colleagues, as well as strangers in public spaces. By providing a more seamless and natural way to communicate, this technology can help to reduce feelings of isolation and improve overall quality of life. The main objective of this research is to give blind users the capability to move around in unfamiliar environments through a user-friendly device with a face, object, and activity recognition system. The model is evaluated on the accuracy of activity recognition. The device captures the view in front of the blind user with a front-facing camera, detects objects, recognizes activities, and answers the user's queries.
A local dataset was collected that includes different first-person human activities. The results obtained are the identification of the activities that the VGG-16 model was trained on, such as hugging, shaking hands, talking, walking, and waving.

Keywords: dataset, visually impaired person, natural language process, human activity recognition

Procedia PDF Downloads 35
2595 Quantum Cum Synaptic-Neuronal Paradigm and Schema for Human Speech Output and Autism

Authors: Gobinathan Devathasan, Kezia Devathasan

Abstract:

Objective: To improve the current modified Broca-Wernicke-Lichtheim-Kussmaul speech schema and provide insight into autism. Methods: We reviewed the pertinent literature. Current findings, involving Brodmann areas 22, 46, 9, 44, 45, 6 and 4, are based on neuropathology and functional MRI studies. However, in primary autism, there is no lucid explanation, and the changes described, whether from neuropathology or functional MRI, appear consequential. Findings: We put forward an enhanced model which may explain the enigma related to autism. Vowel output is subcortical and does need cortical representation whereas consonant speech is cortical in origin. Left lateralization is needed to commence the circuitry spin, as life has evolved with L-amino acids and left spin of electrons. A fundamental species difference is that we are capable of three syllable-consonants and bi-syllable expression whereas cetaceans and songbirds are confined to single or dual consonants. The 4 key sites for speech are the superior auditory cortex, Broca’s two areas, and the supplementary motor cortex. Using the Argand diagram and Riemann’s projection, we theorize that the Euclidean three-dimensional synaptic neuronal circuits of speech are quantized to coherent waves, and then decoherence takes place at area 6 (spherical representation). In this quantum state complex, 3-consonant languages are instantaneously integrated and multiple languages can be learned, verbalized and differentiated. Conclusion: We postulate that evolutionary human speech is elevated to quantum interaction, unlike that of cetaceans and birds, to achieve the three consonants/bi-syllable speech. In classical primary autism, the sudden speech switches off and on noted in several cases could now be explained not by any anatomical lesion but by a failure of coherence. Area 6 projects directly into the prefrontal saccadic area (8), and this further explains the second primary feature in autism: lack of eye contact.
The third feature, repetitive finger gestures located adjacent to the speech/motor areas, represents actual attempts to communicate with the autistic child, akin to sign language for the deaf.

Keywords: quantum neuronal paradigm, cetaceans and human speech, autism and rapid magnetic stimulation, coherence and decoherence of speech

Procedia PDF Downloads 169
2594 Enhancing Fault Detection in Rotating Machinery Using Wiener-CNN Method

Authors: Mohamad R. Moshtagh, Ahmad Bagheri

Abstract:

Accurate fault detection in rotating machinery is of utmost importance to ensure optimal performance and prevent costly downtime in industrial applications. This study presents a robust fault detection system based on vibration data collected from rotating gears under various operating conditions. The considered scenarios include: (1) both gears being healthy, (2) one healthy gear and one faulty gear, and (3) introducing an imbalanced condition to a healthy gear. Vibration data was acquired using a Hantek 1008 device and stored in a CSV file. Python code implemented in the Spyder environment was used for data preprocessing and analysis. Winner features were extracted using the Wiener feature selection method. These features were then employed in multiple machine learning algorithms, including Convolutional Neural Networks (CNN), Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), and Random Forest, to evaluate their performance in detecting and classifying faults in both the training and validation datasets. The comparative analysis of the methods revealed the superior performance of the Wiener-CNN approach. The Wiener-CNN method achieved a remarkable accuracy of 100% for both the two-class (healthy gear and faulty gear) and three-class (healthy gear, faulty gear, and imbalanced) scenarios in the training and validation datasets. In contrast, the other methods exhibited varying levels of accuracy. The Wiener-MLP method attained 100% accuracy for the two-class training dataset and 100% for the validation dataset. For the three-class scenario, the Wiener-MLP method demonstrated 100% accuracy in the training dataset and 95.3% accuracy in the validation dataset. The Wiener-KNN method yielded 96.3% accuracy for the two-class training dataset and 94.5% for the validation dataset. In the three-class scenario, it achieved 85.3% accuracy in the training dataset and 77.2% in the validation dataset.
The Wiener-Random Forest method achieved 100% accuracy for the two-class training dataset and 85% for the validation dataset, while in the three-class training dataset, it attained 100% accuracy and 90.8% accuracy for the validation dataset. The exceptional accuracy demonstrated by the Wiener-CNN method underscores its effectiveness in accurately identifying and classifying fault conditions in rotating machinery. The proposed fault detection system utilizes vibration data analysis and advanced machine learning techniques to improve operational reliability and productivity. By adopting the Wiener-CNN method, industrial systems can benefit from enhanced fault detection capabilities, facilitating proactive maintenance and reducing equipment downtime.
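
The reported figures are easier to compare when tabulated. The sketch below uses only the accuracies stated in the abstract and computes the gap between training and validation accuracy for each method and scenario; the gap is an added convenience measure, not one the authors report.

```python
# Accuracies in percent, exactly as reported in the abstract:
# (training accuracy, validation accuracy).
REPORTED = {
    ("Wiener-CNN", "two-class"): (100.0, 100.0),
    ("Wiener-CNN", "three-class"): (100.0, 100.0),
    ("Wiener-MLP", "two-class"): (100.0, 100.0),
    ("Wiener-MLP", "three-class"): (100.0, 95.3),
    ("Wiener-KNN", "two-class"): (96.3, 94.5),
    ("Wiener-KNN", "three-class"): (85.3, 77.2),
    ("Wiener-Random Forest", "two-class"): (100.0, 85.0),
    ("Wiener-Random Forest", "three-class"): (100.0, 90.8),
}


def generalisation_gaps(results):
    """Return training accuracy minus validation accuracy, in percentage points."""
    return {
        key: round(train - validation, 1)
        for key, (train, validation) in results.items()
    }


gaps = generalisation_gaps(REPORTED)
# Print the largest gaps first.
for (method, scenario), gap in sorted(gaps.items(), key=lambda item: -item[1]):
    print(f"{method:22s} {scenario:12s} gap = {gap:.1f}")
```

A large gap, such as the one for Random Forest in the two-class scenario, is commonly read as a sign of overfitting to the training data.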

Keywords: fault detection, gearbox, machine learning, wiener method

Procedia PDF Downloads 53
2593 Ahmad Sabzi Balkhkanloo, Motahareh Sadat Hashemi, Seyede Marzieh Hosseini, Saeedeh Shojaee-Aliabadi, Leila Mirmoghtadaie

Authors: Elyria Kemp, Kelly Cowart, My Bui

Abstract:

According to the National Institute of Mental Health, an estimated 31.9% of adolescents have had an anxiety disorder. Several environmental factors may contribute to high levels of anxiety and depression in young people (i.e., Generation Z, Millennials). In particular, as young people negotiate life on social media, they may begin to evaluate themselves using excessively high standards and adopt self-perfectionism tendencies. Broadly defined, self-perfectionism involves very critical evaluations of the self. Perfectionism may also come from others and may manifest as socially prescribed perfectionism, and young adults are reporting higher levels of socially prescribed perfectionism than previous generations. This rising perfectionism is also associated with anxiety, greater physiological reactivity, and a sense of social disconnection. However, theories from psychology suggest that improvement in emotion regulation can contribute to enhanced psychological and emotional well-being. Emotion regulation refers to the ways people manage how and when they experience and express their emotions. Cognitive reappraisal and expressive suppression are common emotion regulation strategies. Cognitive reappraisal involves changing the meaning of a stimulus by construing a potentially emotion-eliciting situation in a way that changes its emotional impact. By contrast, expressive suppression involves inhibiting the behavioral expression of emotion. The purpose of this research is to examine the efficacy of social marketing initiatives which promote emotion regulation strategies to help young adults regulate their emotions. In Study 1, a single-factor (emotion regulation strategy: cognitive reappraisal, expressive suppression, control) between-subjects design was conducted using an online, non-student consumer panel (n=96). Sixty-eight percent of participants were male, and 32% were female.
Study participants belonged to the Millennial and Gen Z cohorts, ranging in age from 22 to 35 (M=27). Participants were first told to spend at least three minutes writing about a public speaking appearance which made them anxious. The purpose of this exercise was to induce anxiety. Next, participants viewed one of three advertisements (randomly assigned), each of which promoted cognitive reappraisal, promoted expressive suppression, or was non-emotional in nature. After being exposed to one of the ads, participants responded to a measure composed of two items to assess their emotional state and the efficacy of the messages in fostering emotion management. Findings indicated that individuals in the cognitive reappraisal condition (M=3.91) exhibited the most positive feelings and more effective emotion regulation than the expressive suppression (M=3.39) and control conditions (M=3.72, F(1,92) = 3.3, p<.05). Results from this research can be used by institutions (e.g., schools) in taking a leadership role in attacking anxiety and other mental health issues. Social stigmas regarding mental health can be removed and a more proactive stance can be taken in promoting healthy coping behaviors and strategies to manage negative emotions.

Keywords: emotion regulation, anxiety, social marketing, generation z

Procedia PDF Downloads 180
2592 Performance Analysis of VoIP Coders for Different Modulations Under Pervasive Environment

Authors: Jasbinder Singh, Harjit Pal Singh, S. A. Khan

Abstract:

This paper compares speech signals encoded by different narrow-band and wide-band VoIP codecs under different modulation schemes. The simulation results indicate that the codec has an impact on speech quality, which is also affected by the modulation scheme.

Keywords: VoIP, coders, modulations, BER, MOS

Procedia PDF Downloads 485
2591 Audio-Visual Co-Data Processing Pipeline

Authors: Rita Chattopadhyay, Vivek Anand Thoutam

Abstract:

Speech is the most acceptable means of communication where we can quickly exchange our feelings and thoughts. Quite often, people can communicate orally but cannot interact or work with computers or devices. It is easier and quicker to give speech commands than to type commands into a computer. In the same way, it is easier to listen to audio played from a device than to extract output from computers or devices. Especially with robotics being an emerging market with applications in warehouses, the hospitality industry, consumer electronics, assistive technology, etc., speech-based human-machine interaction is emerging as a lucrative feature for robot manufacturers. Considering this factor, the objective of this paper is to design the “Audio-Visual Co-Data Processing Pipeline.” This pipeline integrates automatic speech recognition, a natural language model for text understanding, object detection, and text-to-speech modules. There are many deep learning models for each type of module mentioned above, but OpenVINO Model Zoo models are used because the OpenVINO toolkit covers both computer vision and non-computer vision workloads across Intel hardware, maximizes performance, and accelerates application development. A speech command is given as input that has information about the target objects to be detected and the start and end times used to extract the required interval from the video. Speech is converted to text using the QuartzNet automatic speech recognition model. A summary is extracted from the text using a natural language model, Generative Pre-Trained Transformer-3 (GPT-3). Based on the summary, essential frames are extracted from the video, and the You Only Look Once (YOLO) object detection model detects objects in these extracted frames. Frame numbers that have target objects (the objects specified in the speech command) are saved as text.
Finally, this text (the frame numbers) is converted to speech using a text-to-speech model and played from the device. This project is developed for 80 YOLO labels, and the user can extract frames based on only one or two target labels. The pipeline can easily be extended to more than two target labels by making appropriate changes in the object detection module. The project supports four different speech command formats by including sample examples in the prompt used by the GPT-3 model. Based on user preference, one can come up with a new speech command format by including some examples of the respective format in the prompt used by the GPT-3 model. This pipeline can be used in many projects, such as human-machine interfaces, human-robot interaction, and surveillance through speech commands. All object detection projects can be upgraded using this pipeline so that one can give speech commands and the output is played from the device.
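
The frame-selection logic in the middle of the pipeline can be sketched without any of the models. In the sketch below the speech recognition, GPT-3, YOLO and text-to-speech stages are replaced by plain Python values: the detector output is a hypothetical dictionary, and all function names are invented for illustration rather than taken from the paper or from OpenVINO.

```python
def frame_range(start_seconds, end_seconds, fps):
    """Convert a time interval into an inclusive range of frame numbers."""
    first = int(start_seconds * fps)
    last = int(end_seconds * fps)
    return range(first, last + 1)


def frames_with_targets(detections_by_frame, frames, target_labels):
    """Keep the frames whose detected labels include any target label."""
    targets = set(target_labels)
    return [
        frame
        for frame in frames
        if targets & set(detections_by_frame.get(frame, ()))
    ]


def frames_to_sentence(frame_numbers):
    """Build the text that a text-to-speech module would read out."""
    if not frame_numbers:
        return "No frames contain the requested objects."
    listed = ", ".join(str(number) for number in frame_numbers)
    return f"Target objects appear in frames {listed}."


# Hypothetical detector output: frame number -> labels found in that frame.
detections = {
    30: ["person"],
    31: ["person", "dog"],
    45: ["car"],
    60: ["dog"],
    90: ["dog"],
}

# Stand-in for a parsed command: "find dogs between second 1 and second 2".
selected = frames_with_targets(detections, frame_range(1.0, 2.0, fps=30), ["dog"])
print(frames_to_sentence(selected))
```

Frame 90 also contains a dog but lies outside the requested interval, so it is not reported.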

Keywords: OpenVINO, automatic speech recognition, natural language processing, object detection, text to speech

Procedia PDF Downloads 55
2590 A Method for the Extraction of the Character's Tendency from Korean Novels

Authors: Min-Ha Hong, Kee-Won Kim, Seung-Hoon Kim

Abstract:

The character in story-based content, such as novels and movies, is one of the core elements for understanding the story. In particular, the character’s tendency is an important factor in analyzing story-based content, because it has a significant influence on the storyline. If readers know the tendency of the characters before reading a novel, it will help them understand the structure of conflicts, episodes and relationships between characters in the novel. It may therefore help readers to select a novel that they want to read. In this paper, we propose a method of extracting the tendency of the characters from a novel written in Korean. In advance, we build a dictionary with pairs of emotion words in Korean and English, since the emotion words in the novel’s sentences express the characters’ feelings. We rate the degree of polarity (positive or negative) of words in our emotion word dictionary based on SenticNet. Then we extract characters and emotion words from sentences in a novel. Since the polarity of a word grows stronger or weaker due to sentence features such as quotations and modifiers, our proposed method considers them when calculating the polarity of characters. The extracted character polarity can be used in a book search service or a book recommendation service.
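
A minimal sketch of the idea, under stated assumptions: the polarity scores, modifier weights and quotation weight below are invented for illustration (the paper derives its scores from SenticNet), English tokens stand in for Korean ones, and each sentence is assumed to be already attributed to a character.

```python
# Hypothetical polarity scores in [-1, 1]; not taken from SenticNet.
EMOTION_WORDS = {"happy": 0.8, "kind": 0.6, "sad": -0.6, "angry": -0.7}
# Hypothetical modifier weights that strengthen or weaken the next emotion word.
MODIFIERS = {"very": 1.5, "slightly": 0.5}
# Hypothetical weight applied when the sentence is a direct quotation.
QUOTATION_WEIGHT = 1.2


def sentence_polarity(tokens, quoted=False):
    """Sum emotion-word scores, scaling each by the modifier just before it."""
    total = 0.0
    weight = 1.0
    for token in tokens:
        if token in MODIFIERS:
            weight = MODIFIERS[token]
            continue
        if token in EMOTION_WORDS:
            total += weight * EMOTION_WORDS[token]
        weight = 1.0  # a modifier only affects the token right after it
    return total * QUOTATION_WEIGHT if quoted else total


def character_polarity(sentences):
    """Accumulate sentence polarity per character."""
    scores = {}
    for character, tokens, quoted in sentences:
        scores[character] = scores.get(character, 0.0) + sentence_polarity(tokens, quoted)
    return scores


sentences = [
    ("Mina", ["mina", "was", "very", "happy"], False),
    ("Joon", ["i", "am", "angry"], True),
    ("Mina", ["mina", "felt", "slightly", "sad"], False),
]
scores = character_polarity(sentences)
for character, score in scores.items():
    print(character, round(score, 2))
```

Here Mina ends with a positive overall score and Joon with a negative one, which is the kind of per-character signal a recommendation service could index.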

Keywords: character tendency, data mining, emotion word, Korean novel

Procedia PDF Downloads 315
2589 Emotions in Human-Machine Interaction

Authors: Joanna Maj

Abstract:

Awe inspiring is the idea that emotions could be present in human-machine interactions, both on the human side as well as the machine side. Human factors present intriguing components and are examined in detail while discussing this controversial topic. Mood, attention, memory, performance, assessment, causes of emotion, and neurological responses are analyzed as components of the interaction. Problems in computer-based technology, revenge of the system on its users and design, and applications comprise a major part of all descriptions and examples throughout this paper. It also allows for critical thinking while challenging intriguing questions regarding future directions in research, dealing with emotion in human-machine interactions.

Keywords: biocomputing, biomedical engineering, emotions, human-machine interaction, interfaces

Procedia PDF Downloads 110
2588 Multimodal Data Fusion Techniques in Audiovisual Speech Recognition

Authors: Hadeer M. Sayed, Hesham E. El Deeb, Shereen A. Taie

Abstract:

In the big data era, we are facing a diversity of datasets from different sources in different domains that describe a single life event. These datasets consist of multiple modalities, each of which has a different representation, distribution, scale, and density. Multimodal fusion is the concept of integrating information from multiple modalities in a joint representation with the goal of predicting an outcome through a classification task or regression task. In this paper, multimodal fusion techniques are classified into two main classes: model-agnostic techniques and model-based approaches. It provides a comprehensive study of recent research in each class and outlines the benefits and limitations of each of them. Furthermore, the audiovisual speech recognition task is expressed as a case study of multimodal data fusion approaches, and the open issues through the limitations of the current studies are presented. This paper can be considered a powerful guide for interested researchers in the field of multimodal data fusion and audiovisual speech recognition particularly.

Keywords: multimodal data, data fusion, audio-visual speech recognition, neural networks

Procedia PDF Downloads 85
2587 Analysis of Linguistic Disfluencies in Bilingual Children’s Discourse

Authors: Sheena Christabel Pravin, M. Palanivelan

Abstract:

Speech disfluencies are common in spontaneous speech. The primary purpose of this study was to distinguish linguistic disfluencies from stuttering disfluencies in bilingual Tamil–English (TE) speaking children. The secondary purpose was to determine whether their disfluencies are mediated by native language dominance and/or by an early onset of developmental stuttering in childhood. A detailed study was carried out to identify the prosodic and acoustic features that uniquely represent the disfluent regions of speech. This paper focuses on statistical modeling of repetitions, prolongations, pauses and interjections in a speech corpus encompassing bilingual spontaneous utterances, in English and Tamil, from school-going children. Two classifiers, Hidden Markov Models (HMM) and the Multilayer Perceptron (MLP), which is a class of feed-forward artificial neural network, were compared in the classification of disfluencies. The results of the classifiers document the patterns of disfluency in spontaneous speech samples of school-aged children to distinguish between Children Who Stutter (CWS) and Children with Language Impairment (CLI). The ability of the models to classify the disfluencies was measured in terms of F-measure, Recall, and Precision.
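
For readers unfamiliar with the three metrics named at the end of the abstract, the following sketch computes them for one class from paired label lists. The label names and the helper `precision_recall_f1` are illustrative assumptions, not part of the study.

```python
def precision_recall_f1(true_labels, predicted_labels, positive):
    """Return (precision, recall, F-measure) for the class `positive`."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical disfluency labels for five speech segments.
truth = ["repetition", "repetition", "pause", "pause", "repetition"]
guess = ["repetition", "pause", "pause", "repetition", "repetition"]
p, r, f = precision_recall_f1(truth, guess, positive="repetition")
```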

Keywords: bi-lingual, children who stutter, children with language impairment, hidden markov models, multi-layer perceptron, linguistic disfluencies, stuttering disfluencies

Procedia PDF Downloads 195
2586 Emotion Regulation and Executive Functioning Scale for Children and Adolescents (REMEX): Scale Development

Authors: Cristina Costescu, Carmen David, Adrian Roșan

Abstract:

Executive functions (EF) and emotion regulation strategies are processes that allow individuals to function in an adaptive way and to be goal-oriented, which is essential for success in daily living activities, at school, or in social contexts. The Emotion Regulation and Executive Functioning Scale for Children and Adolescents (REMEX) represents an empirically based tool (based on the model of EF developed by Diamond) for evaluating significant dimensions of child and adolescent EFs and emotion regulation strategies, mainly in school contexts. The instrument measures the following dimensions: working memory, inhibition, cognitive flexibility, executive attention, planning, emotional control, and emotion regulation strategies. Building the instrument involved not only a top-down process, as we selected the content in accordance with prominent models of EF, but also a bottom-up one, as we were able to identify valid contexts in which EF and ER are put to use. For the construction of the instrument, we implemented three focus groups with teachers and other professionals since the aim was to develop an accurate, objective, and ecological instrument. We used the focus group method in order to address each dimension and to yield a bank of items to be further tested. Each dimension is addressed through a task that the examiner will apply and through several items derived from the main task. For the validation of the instrument, we plan to use item response theory (IRT), also known as the latent response theory, which attempts to explain the relationship between latent traits (unobservable cognitive processes) and their manifestations (i.e., observed outcomes, responses, or performance). REMEX represents an ecological scale that integrates a current scientific understanding of emotion regulation and EF and is directly applicable to school contexts, and it can be very useful for developing intervention protocols.
We plan to test its convergent validity with the Childhood Executive Functioning Inventory (CHEXI) and Emotion Dysregulation Inventory (EDI) and divergent validity between a group of typically developing children and children with neurodevelopmental disorders, aged between 6 and 9 years old. In a previous pilot study, we enrolled a sample of 40 children with autism spectrum disorders and attention-deficit/hyperactivity disorder aged 6 to 12 years old, and we applied the above-mentioned scales (CHEXI and EDI). Our results showed that deficits in planning, behavior regulation, inhibition, and working memory predict high levels of emotional reactivity, leading to emotional and behavioural problems. Considering previous results, we expect our findings to provide support for the validity and reliability of the REMEX version as an ecological instrument for assessing emotion regulation and EF in children and for key features of its uses in intervention protocols.

Keywords: executive functions, emotion regulation, children, item response theory, focus group

Procedia PDF Downloads 75
2585 Emotional and Physiological Reaction While Listening the Speech of Adults Who Stutter

Authors: Xharavina V., Gallopeni F., Ahmeti K.

Abstract:

Stuttered speech is filled with intermittent sound prolongations and/or rapid part-word repetitions. Oftentimes, these aberrant acoustic behaviors are associated with intermittent physical tension and struggle behaviors such as head jerks, arm jerks, finger tapping, excessive eye-blinks, etc. Additionally, the jarring nature of the acoustic and physical manifestations that often accompany moderate-severe stuttering may induce negative emotional responses in listeners, which alters communication between the person who stutters and their listeners. However, research on the influence of negative emotions on communication and on physical reactions is limited. Therefore, it is necessary to compare the psycho-physiological responses of fluent adults while listening to the speech of adults who speak fluently and adults who stutter. This study uses an experimental method, with a total of 104 participants (average age 20 years, SD = 2.1), divided into 3 groups. All participants self-reported no impairments in speech, language, or hearing. To explore the participants’ responses, two speech recordings were used: one of a voice that speaks fluently and one of a voice that stutters. Heartbeats and pulse were measured by a digital blood pressure monitor called 'Tensoval', as a physiological response to the fluent and stuttering samples. Meanwhile, the emotional responses of participants were measured by a self-report questionnaire (Steenbarger, 2001). Results showed an increase in heartbeats during the stuttering speech compared with the fluent sample (p < 0.5). The listeners also self-reported themselves as more alive, unhappy, nervous, repulsive, sad, tense, distracted and upset when listening to the stuttering words versus the words of the fluent adult (for which they reported experiencing positive emotions). These data support the notion that speech with stuttering can bring a psycho-physical reaction to the listeners.
Speech pathologists should be aware that listeners show intolerable physiological reactions to stuttering that remain visible over time.

Keywords: emotional, physiological, stuttering, fluent speech

Procedia PDF Downloads 122
2584 Evaluating Models Through Feature Selection Methods Using Data Driven Approach

Authors: Shital Patil, Surendra Bhosale

Abstract:

Cardiac diseases are the leading causes of mortality and morbidity in the world; in recent decades they have accounted for a large number of deaths and have emerged as the most life-threatening disorder globally. Machine learning and artificial intelligence have been playing a key role in predicting heart disease. A relevant set of features can be very helpful in predicting the disease accurately. In this study, we propose a comparative analysis of 4 different feature selection methods and evaluate their performance with both raw (unbalanced) and sampled (balanced) datasets. The publicly available Z-Alizadeh Sani dataset has been used for this study. Four feature selection methods are used: Data Analysis, minimum Redundancy maximum Relevance (mRMR), Recursive Feature Elimination (RFE), and Chi-squared. These methods are tested with 8 different classification models to get the best accuracy possible. Using balanced and unbalanced datasets, the study shows promising results in terms of various performance metrics in accurately predicting heart disease. With the raw data, the proposed method obtains a maximum AUC of 100%, maximum F1 score of 94%, maximum Recall of 98%, and maximum Precision of 93%. With the balanced dataset, the results are a maximum AUC of 100%, F1-score of 95%, maximum Recall of 95%, and maximum Precision of 97%.

Keywords: cardio vascular diseases, machine learning, feature selection, SMOTE

Procedia PDF Downloads 90
2583 Effect of Signal Acquisition Procedure on Imagined Speech Classification Accuracy

Authors: M.R Asghari Bejestani, Gh. R. Mohammad Khani, V.R. Nafisi

Abstract:

Imagined speech recognition is one of the most interesting approaches to BCI development, and a lot of work has been done in this area. Many different experiments have been designed, and hundreds of combinations of feature extraction methods and classifiers have been examined. Reported classification accuracies range from the chance level to more than 90%. Based on the non-stationary nature of brain signals, we have introduced 3 classification modes according to the time difference between inter- and intra-class samples. The modes can explain the diversity of reported results and predict the range of expected classification accuracies from the brain signal acquisition procedure. In this paper, a few samples are illustrated by inspecting the results of some previous works.

Keywords: brain computer interface, silent talk, imagined speech, classification, signal processing

Procedia PDF Downloads 128
2582 Translation and Sociolinguistics of Classical Books

Authors: Laura de Almeida

Abstract:

This paper aims to present research involving the translation of classical books originally in English and translated into the Portuguese language. The objective is to analyze the linguistic varieties evident and how they appear in the other language the work was translated into. We based our study on sociolinguistic theory, more specifically, the study of the Black English Vernacular. Our methodology is built on collecting data from the speech of characters who use the Black English Vernacular in books such as The Adventures of Huckleberry Finn by Mark Twain. In doing so, we compare the two versions of a book and how they reflect the linguistic variety. Our purpose is to show that some translators do not worry when dealing with linguistic variety. In other words, they just translate the story without taking into account some important linguistic aspects which need attention, such as language variation.

Keywords: classical books, linguistic variation, sociolinguistics, translation

Procedia PDF Downloads 371
2581 The Importance of the Historical Approach in the Linguistic Research

Authors: Zoran Spasovski

Abstract:

The paper briefly discusses the significance and the benefits of the historical approach in the research of languages by presenting examples of it in the fields of phonetics and phonology, lexicology, morphology, syntax, and even in onomastics (toponymy and anthroponymy). The examples from the field of phonetics/phonology include insights into animal speech and its evolution into human speech, the evolution of the sounds of human speech from vocals to glides and consonants and from velar consonants to palatal, etc., on well-known examples of former researchers. Those from the field of lexicology briefly show the formation of the lexemes and their evolution; the morphology and syntax are explained by examples of the development of grammar and syntax forms, and the importance of the historical approach in the research of place-names and personal names is briefly outlined through examples of place-names and personal names and surnames, and the conclusions that come from it, in different languages.

Keywords: animal speech, glotogenesis, grammar forms, lexicology, place-names, personal names, surnames, syntax categories

Procedia PDF Downloads 49
2580 Emotion Classification Using Recurrent Neural Network and Scalable Pattern Mining

Authors: Jaishree Ranganathan, MuthuPriya Shanmugakani Velsamy, Shamika Kulkarni, Angelina Tzacheva

Abstract:

Emotions play an important role in everyday life. Analyzing these emotions or feelings from social media platforms like Twitter, Facebook, blogs, and forums based on user comments and reviews plays an important role in various areas, including brand monitoring, marketing strategies, reputation, and competitor analysis. The opinions or sentiments mined from such data help in understanding the current state of the user. However, they do not directly provide intuitive insights on what actions should be taken to benefit the end user or business. The Actionable Pattern Mining method provides suggestions or actionable recommendations on what changes or actions need to be taken in order to benefit the end user. In this paper, we propose automatic classification of emotions in Twitter data using a Recurrent Neural Network - Gated Recurrent Unit. We achieve a training accuracy of 87.58% and a validation accuracy of 86.16%. Also, we extract action rules with respect to the user emotion that help to provide actionable suggestions.

Keywords: emotion mining, twitter, recurrent neural network, gated recurrent unit, actionable pattern mining

Procedia PDF Downloads 142
2579 An Automatic Speech Recognition of Conversational Telephone Speech in Malay Language

Authors: M. Draman, S. Z. Muhamad Yassin, M. S. Alias, Z. Lambak, M. I. Zulkifli, S. N. Padhi, K. N. Baharim, F. Maskuriy, A. I. A. Rahim

Abstract:

The performance of a Malay automatic speech recognition (ASR) system for the call centre environment is presented. The system utilizes the Kaldi toolkit as the platform for the entire library and algorithms used in performing the ASR task. The acoustic model implemented in this system uses a deep neural network (DNN) method to model the acoustic signal and the standard (n-gram) model for language modelling. With 80 hours of training data from the call centre recordings, the ASR system achieves 72% accuracy, which corresponds to a 28% word error rate (WER). The testing was done using 20 hours of audio data. Despite the implementation of DNN, the system shows low accuracy owing to the variety of noises, accents and dialects that typically occur in the Malaysian call centre environment. This significant variation between speakers is reflected by the large standard deviation of the average word error rate (WERav) (i.e., ~ 10%). It is observed that the lowest WER (13.8%) was obtained from a recording sample of a native speaker with a standard Malay dialect (central Malaysia), as compared to 49% for the sample with the highest WER, which contains the conversation of a speaker who uses a non-standard Malay dialect.
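
The abstract reports accuracy as the complement of the word error rate (72% and 28%). As a minimal sketch of how WER is commonly computed, the function below takes the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis and divides it by the number of reference words. The function name and the toy transcripts are assumptions for illustration; this is not the scoring tool used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
wer = word_error_rate("a b c d", "a x c")
accuracy = 1.0 - wer
```

Because insertions count as errors, WER can exceed 100%, in which case an accuracy defined as 1 - WER becomes negative.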

Keywords: conversational speech recognition, deep neural network, Malay language, speech recognition

Procedia PDF Downloads 298
2578 The Effect of Emotion Self-Confidence and Perceived Social Support on Hong Kong Higher-Education Students' Suicide-Related Emotional Experiences

Authors: K. C. Ching

Abstract:

There is growing public concern over the increasing prevalence of student suicide in Hong Kong. Some identify the problem with insufficient social support, while some attribute it to the vast fluctuations in emotional experience and the hindrances to emotion-regulation, both typical of adolescence and emerging adulthood. This study is thus designed to explore the respective effects of perceived social support and emotional self-confidence on positive emotions and negative emotions. Fifty-seven Hong Kong higher-education students (17 males, 40 females) aged between 18 and 25 (M = 21.78) responded to an online questionnaire consisting of self-reported measures of perceived social support, emotional self-confidence, positive emotions, and negative emotions. Hierarchical regression analysis revealed that emotional self-confidence was positively associated with positive emotions and negatively associated with negative emotions, while perceived social support was positively associated with positive emotions but was not related to negative emotions. Perceived social support and emotional self-confidence both predicted positive emotions, but did not interact to predict any emotional outcome. It is concluded that students’ positive and negative emotional experiences are closely related to their emotion-regulation process. But for social support, its effect is merely protective, meaning that although perceived social support generally promotes positive emotions, it alone does not suffice to alleviate students’ negative emotions. These conclusions carry profound implications for suicide prevention practices, including that most existing suicide prevention campaigns should advance from merely fostering mutual support to directly promoting adaptive coping with emotional negativity.

Keywords: emerging adulthood, emotional self-confidence, hong kong, perceived social support, suicide prevention

Procedia PDF Downloads 110
2577 A Mixing Matrix Estimation Algorithm for Speech Signals under the Under-Determined Blind Source Separation Model

Authors: Jing Wu, Wei Lv, Yibing Li, Yuanfan You

Abstract:

The separation of speech signals has become a research hotspot in the field of signal processing in recent years. It has many applications in teleconferencing, hearing aids, machine speech recognition and so on. The sounds received are usually noisy. Identifying the sounds of interest and obtaining clear sounds in such an environment is a problem worth exploring, namely the problem of blind source separation. This paper focuses on under-determined blind source separation (UBSS). Sparse component analysis is generally used for the problem of under-determined blind source separation. The method is mainly divided into two parts. Firstly, a clustering algorithm is used to estimate the mixing matrix from the observed signals. Then the signals are separated based on the known mixing matrix. In this paper, the problem of mixing matrix estimation is studied. This paper proposes an improved algorithm to estimate the mixing matrix for speech signals in the UBSS model. The traditional potential algorithm is not accurate for mixing matrix estimation, especially at low signal-to-noise ratio (SNR). In response to this problem, this paper considers the idea of an improved potential function method to estimate the mixing matrix. The algorithm not only avoids the influence of insufficient prior information in traditional clustering algorithms, but also improves the estimation accuracy of the mixing matrix. This paper takes the mixing of four speech signals into two channels as an example. The results of simulations show that the approach in this paper not only improves the accuracy of estimation, but also applies to any mixing matrix.
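
The first stage of sparse component analysis rests on a simple geometric fact: when only one source is active at a given instant, the two-channel observation points along the corresponding column of the mixing matrix. The sketch below illustrates that idea for four sources mixed into two channels. It is a toy under strong assumptions (noise-free data, exactly one active source per sample, hypothetical column angles) and it replaces the paper's improved potential function and any real clustering with plain rounding of the observation angle.

```python
import math


def mix(mixing_matrix, sources):
    """Compute one observation x = A s for a 2 x n mixing matrix."""
    return [
        sum(row[j] * sources[j] for j in range(len(sources)))
        for row in mixing_matrix
    ]


def estimate_directions(observations, decimals=3):
    """Collect the distinct directions (angles in [0, pi)) of the samples.

    Rounding stands in for the clustering step; it only works on
    noise-free, perfectly sparse data such as this toy example.
    """
    angles = set()
    for x1, x2 in observations:
        if x1 == 0 and x2 == 0:
            continue  # silent samples carry no direction information
        # Fold opposite directions together: a source with negative
        # amplitude points along the same column of A.
        theta = math.atan2(x2, x1) % math.pi
        angles.add(round(theta, decimals))
    return sorted(angles)


# Hypothetical unit-norm columns at four chosen angles (radians).
column_angles = [0.3, 0.8, 1.4, 2.2]
A = [
    [math.cos(t) for t in column_angles],
    [math.sin(t) for t in column_angles],
]

observations = []
for active in range(4):
    for amplitude in (1.0, -0.5, 2.0):
        s = [0.0] * 4
        s[active] = amplitude  # only one source active per sample
        observations.append(tuple(mix(A, s)))

recovered = estimate_directions(observations)
```

With noisy or only approximately sparse signals the angles spread into clouds around each column direction, which is why a density-based or potential-function estimate is needed in practice.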

Keywords: DBSCAN, potential function, speech signal, the UBSS model

Procedia PDF Downloads 108
2576 Propagation of the Effects of Certain Types of Military Psychological Operations in a Networked Population

Authors: Colette Faucher

Abstract:

In modern asymmetric conflicts, the Armed Forces generally have to intervene in countries where the internal peace is in danger. They must make the local population an ally in order to be able to deploy the necessary military actions with its support. For this purpose, psychological operations (PSYOPs) are used to shape people’s behaviors and emotions by the modification of their attitudes in acting on their perceptions. PSYOPs aim at elaborating and spreading a message that must be read, listened to and/or looked at, then understood by the info-targets in order to get from them the desired behavior. A message can generate in the info-targets, reasoned thoughts, spontaneous emotions or reflex behaviors, this effect partly depending on the means of conveyance used to spread this message. In this paper, we focus on psychological operations that generate emotions. We present a method based on the Intergroup Emotion Theory, that determines, from the characteristics of the conveyed message and of the people from the population directly reached by the means of conveyance (direct info-targets), the emotion likely to be triggered in them and we simulate the propagation of the effects of such a message on indirect info-targets that are connected to them through the social networks that structure the population.

Keywords: military psychological operations, social identity, social network, emotion propagation

Procedia PDF Downloads 386
2575 A Comprehensive Methodology for Voice Segmentation of Large Sets of Speech Files Recorded in Naturalistic Environments

Authors: Ana Londral, Burcu Demiray, Marcus Cheetham

Abstract:

Speech recording is a methodology used in many different studies related to cognitive and behaviour research. Modern advances in digital equipment brought the possibility of continuously recording hours of speech in naturalistic environments and building rich sets of sound files. Speech analysis can then extract from these files multiple features for different scopes of research in Language and Communication. However, tools for analysing a large set of sound files and automatically extracting relevant features from these files are often inaccessible to researchers who are not familiar with programming languages. Manual analysis is a common alternative, with a high time and efficiency cost. In the analysis of long sound files, the first step is voice segmentation, i.e. detecting and labelling segments containing speech. We present a comprehensive methodology aiming to support researchers in voice segmentation, as the first step for data analysis of a big set of sound files. Praat, an open source software, is suggested as a tool to run a voice detection algorithm, label segments and files, and extract other quantitative features on a structure of folders containing a large number of sound files. We present the validation of our methodology with a set of 5000 sound files that were collected in the daily life of a group of voluntary participants aged over 65. A smartphone device was used to collect sound using the Electronically Activated Recorder (EAR): an app programmed to record 30-second sound samples that were randomly distributed throughout the day. Results demonstrated that automatic segmentation and labelling of files containing speech segments was 74% faster when compared to a manual analysis performed with two independent coders. Furthermore, the methodology presented allows manual adjustments of voiced segments with visualisation of the sound signal and the automatic extraction of quantitative information on speech.
In conclusion, we propose a comprehensive methodology for voice segmentation, to be used by researchers that have to work with large sets of sound files and are not familiar with programming tools.
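
To make the idea of voice segmentation concrete, here is a minimal sketch of one simple approach: thresholding short-time energy frame by frame and merging consecutive active frames into labelled intervals. This is not Praat's algorithm and not the authors' pipeline; the function name, the frame length, and the threshold are assumptions, and real recordings from naturalistic environments would need a more robust detector.

```python
def detect_voiced_segments(samples, sample_rate, frame_ms=20, threshold=0.01):
    """Return (start_s, end_s) intervals whose mean frame energy exceeds threshold.

    samples: sequence of floats in [-1, 1]
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame_len
    segments = []
    start = None
    for k in range(n_frames):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        frame_time = k * frame_len / sample_rate
        if energy > threshold and start is None:
            start = frame_time  # a sounding interval begins
        elif energy <= threshold and start is not None:
            segments.append((start, frame_time))  # the interval ends
            start = None
    if start is not None:
        # The signal ended while still active; close the last interval.
        segments.append((start, n_frames * frame_len / sample_rate))
    return segments


# Synthetic signal: 0.1 s silence, 0.2 s constant "sound", 0.1 s silence.
rate = 1000
signal = [0.0] * 100 + [0.5] * 200 + [0.0] * 100
segments = detect_voiced_segments(signal, rate)
```

Each returned interval can then be written out as a label, which is the role that interval tiers play in a tool such as Praat.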

Keywords: automatic speech analysis, behavior analysis, naturalistic environments, voice segmentation

Procedia PDF Downloads 260