Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 2504

Search results for: speech emotion recognition

2414 Annexation (Al-Iḍāfah) in Thariq bin Ziyad’s Speech

Abstract:

Annexation is a typical construction that commonly used in Arabic language. The use of the construction appears in Arabic speech such as the speech of Thariq bin Ziyad. The speech as one of the most famous speeches in the history of Islam uses many annexations. This qualitative research paper uses the secondary data by library method. Based on the data, this paper concludes that the speech has two basic structures with some variations and has some grammatical relationship. Different from the other researches that identify the speech in sociology field, the speech in this paper will be analyzed in linguistic field to take a look at the structure of its annexation as well as the grammatical relationship.

Keywords: annexation, Thariq bin Ziyad, grammatical relationship, Arabic syntax

Procedia PDF Downloads 279

2413 An Exploratory Survey Questionnaire to Understand What Emotions Are Important and Difficult to Communicate for People with Dysarthria and Their Methodology of Communicating

Authors: Lubna Alhinti, Heidi Christensen, Stuart Cunningham

Abstract:

People with speech disorders may rely on augmentative and alternative communication (AAC) technologies to help them communicate. However, the limitations of the current AAC technologies act as barriers to the optimal use of these technologies in daily communication settings. The ability to communicate effectively relies on a number of factors that are not limited to the intelligibility of the spoken words. In fact, non-verbal cues play a critical role in the correct comprehension of messages and having to rely on verbal communication only, as is the case with current AAC technology, may contribute to problems in communication. This is especially true for people’s ability to express their feelings and emotions, which are communicated to a large part through non-verbal cues. This paper focuses on understanding more about the non-verbal communication ability of people with dysarthria, with the overarching aim of this research being to improve AAC technology by allowing people with dysarthria to better communicate emotions. Preliminary survey results are presented that gives an understanding of how people with dysarthria convey emotions, what emotions that are important for them to get across, what emotions that are difficult for them to convey, and whether there is a difference in communicating emotions when speaking to familiar versus unfamiliar people.

Keywords: alternative and augmentative communication technology, dysarthria, speech emotion recognition, VIVOCA

Procedia PDF Downloads 119

2412 Blind Speech Separation Using SRP-PHAT Localization and Optimal Beamformer in Two-Speaker Environments

Authors: Hai Quang Hong Dam, Hai Ho, Minh Hoang Le Ngo

Abstract:

This paper investigates the problem of blind speech separation from the speech mixture of two speakers. A voice activity detector employing the Steered Response Power - Phase Transform (SRP-PHAT) is presented for detecting the activity information of speech sources and then the desired speech signals are extracted from the speech mixture by using an optimal beamformer. For evaluation, the algorithm effectiveness, a simulation using real speech recordings had been performed in a double-talk situation where two speakers are active all the time. Evaluations show that the proposed blind speech separation algorithm offers a good interference suppression level whilst maintaining a low distortion level of the desired signal.

Keywords: blind speech separation, voice activity detector, SRP-PHAT, optimal beamformer

Procedia PDF Downloads 251

2411 Job Characteristics, Emotion Regulation and University Teachers' Well-Being: A Job Demands-Resources Analysis

Authors: Jiying Han

Abstract:

Teaching is widely known to be an emotional endeavor, and teachers’ ability to regulate their emotions is important for their well-being and the effectiveness of their classroom management. Considering that teachers’ emotion regulation is an underexplored issue in the field of educational research, some studies have attempted to explore the role of emotion regulation in teachers’ work and to explore the links between teachers’ emotion regulation, job characteristics, and well-being, based on the Job Demands-Resources (JD-R) model. However, those studies targeted primary or secondary teachers. So far, very little is known about the relationships between university teachers’ emotion regulation and its antecedents and effects on teacher well-being. Based on the job demands-resources model and emotion regulation theory, this study examined the relationships between job characteristics of university teaching (i.e., emotional job demands and teaching support), emotion regulation strategies (i.e., reappraisal and suppression), and university teachers’ well-being. Data collected from a questionnaire survey of 643 university teachers in China were analysed. The results indicated that (1) both emotional job demands and teaching support had desirable effects on university teachers’ well-being; (2) both emotional job demands and teaching support facilitated university teachers’ use of reappraisal strategies; and (3) reappraisal was beneficial to university teachers’ well-being, whereas suppression was harmful. These findings support the applicability of the job demands-resources model to the contexts of higher education and highlight the mediating role of emotion regulation.

Keywords: emotional job demands, teaching support, emotion regulation strategies, the job demands-resources model

Procedia PDF Downloads 123

2410 A Cross-Dialect Statistical Analysis of Final Declarative Intonation in Tuvinian

Authors: D. Beziakina, E. Bulgakova

Abstract:

This study continues the research on Tuvinian intonation and presents a general cross-dialect analysis of intonation of Tuvinian declarative utterances, specifically the character of the tone movement in order to test the hypothesis about the prevalence of level tone in some Tuvinian dialects. The results of the analysis of basic pitch characteristics of Tuvinian speech (in general and in comparison with two other Turkic languages - Uzbek and Azerbaijani) are also given in this paper. The goal of our work was to obtain the ranges of pitch parameter values typical for Tuvinian speech. Such language-specific values can be used in speaker identification systems in order to get more accurate results of ethnic speech analysis. We also present the results of a cross-dialect analysis of declarative intonation in the poorly studied Tuvinian language.

Keywords: speech analysis, statistical analysis, speaker recognition, identification of person

Procedia PDF Downloads 439

2409 The Relationships among Learning Emotion, Major Satisfaction, Learning Flow, and Academic Achievement in Medical School Students

Authors: S. J. Yune, S. Y. Lee, S. J. Im, B. S. Kam, S. Y. Baek

Abstract:

This study explored whether academic emotion, major satisfaction, and learning flow are associated with academic achievement in medical school. We know that emotion and affective factors are important factors in students' learning and performance. Emotion has taken the stage in much of contemporary educational psychology literature, no longer relegated to secondary status behind traditionally studied cognitive constructs. Medical school students (n=164) completed academic emotion, major satisfaction, and learning flow online survey. Academic performance was operationalized as students' average grade on two semester exams. For data analysis, correlation analysis, multiple regression analysis, hierarchical multiple regression analyses and ANOVA were conducted. The results largely confirmed the hypothesized relations among academic emotion, major satisfaction, learning flow and academic achievement. Positive academic emotion had a correlation with academic achievement (β=.191). Positive emotion had 8.5% explanatory power for academic achievement. Especially, sense of accomplishment had a significant impact on learning performance (β=.265). On the other hand, negative emotion, major satisfaction, and learning flow did not affect academic performance. Also, there were differences in sense of great (F=5.446, p=.001) and interest (F=2.78, p=.043) among positive emotion, boredom (F=3.55, p=.016), anger (F=4.346, p=.006), and petulance (F=3.779, p=.012) among negative emotion by grade. This study suggested that medical students' positive emotion was an important contributor to their academic achievement. At the same time, it is important to consider that some negative emotions can act to increase one’s motivation. Of particular importance is the notion that instructors can and should create learning environment that foster positive emotion for students. In doing so, instructors improve their chances of positively impacting students’ achievement emotions, as well as their subsequent motivation, learning, and performance. This result had an implication for medical educators striving to understand the personal emotional factors that influence learning and performance in medical training.

Keywords: academic achievement, learning emotion, learning flow, major satisfaction

Procedia PDF Downloads 237

2408 Exploring Subjective Simultaneous Mixed Emotion Experiences in Middle Childhood

Authors: Esther Burkitt

Abstract:

Background: Evidence is mounting that mixed emotions can be experienced simultaneously in different ways across the lifespan. Four types of patterns of simultaneously mixed emotions (sequential, prevalent, highly parallel, and inverse types) have been identified in middle childhood and adolescence. Moreover, the recognition of these experiences tends to develop firstly when children consider peers rather than the self. This evidence from children and adolescents is based on examining the presence of experiences specified in adulthood. The present study, therefore, applied an exhaustive coding scheme to investigate whether children experience types of previously unidentified simultaneous mixed emotional experiences. Methodology: One hundred and twenty children (60 girls) aged 7 years 1 month - 9 years 2 months (X=8 years 1 month; SD = 10 months) were recruited from mainstream schools across the UK. Two age groups were formed (youngest, n = 61, 7 years 1 month- 8 years 1 months: oldest, n = 59, 8 years 2 months – 9 years 2 months) and allocated to one of two conditions hearing vignettes describing happy and sad mixed emotion events in age and gender-matched protagonist or themselves. Results: Loglinear analyses identified new types of flexuous, vertical, and other experiences along with established sequential, prevalent, highly parallel, and inverse types of experience. Older children recognised more complex experiences other than the self-condition. Conclusion: Several additional types of simultaneously mixed emotions are recognised in middle childhood. The theoretical relevance of simultaneous mixed emotion processing in childhood is considered, and the potential utility of the findings in emotion assessments is discussed.

Keywords: emotion, childhood, self, other

Procedia PDF Downloads 44

2407 A Systematic Review Emotion Regulation through Music in Children, Adults, and Elderly

Authors: Fabiana Ribeiro, Ana Moreno, Antonio Oliveira, Patricia Oliveira-Silva

Abstract:

Music is present in our daily lives, and to our knowledge music is often used to change the emotions in the listeners. For this reason, the objective of this study was to explore and synthesize results examining the use and effects of music on emotion regulation in children, adults, and elderly, and clarify if the music is effective across ages to promote emotion regulation. A literature search was conducted using ISI Web of Knowledge, Pubmed, PsycINFO, and Scopus, inclusion criteria comprised children, adolescents, young, and old adults, including health population. Articles applying musical intervention, specifically musical listening, and assessing the emotion regulation directly through reports or neurophysiological measures were included in this review. Results showed age differences in the function of musical listening; initially, adolescents revealed age increments in emotional listening compared to children, and young adults in comparison to older adults, in which the first use music aiming to emotion regulation and social connection, while older adults also utilize music as emotion regulation searching for personal growth. Moreover, some of the studies showed that personal characteristics also would determine the efficiency of the emotion regulation strategy. In conclusion, it was observed that music could beneficiate all ages investigated, however, this review detected a necessity to develop adequate paradigms to explore the use of music for emotion regulation.

Keywords: music, emotion, regulation, musical listening

Procedia PDF Downloads 142

2406 Optimized Deep Learning-Based Facial Emotion Recognition System

Authors: Erick C. Valverde, Wansu Lim

Abstract:

Facial emotion recognition (FER) system has been recently developed for more advanced computer vision applications. The ability to identify human emotions would enable smart healthcare facility to diagnose mental health illnesses (e.g., depression and stress) as well as better human social interactions with smart technologies. The FER system involves two steps: 1) face detection task and 2) facial emotion recognition task. It classifies the human expression in various categories such as angry, disgust, fear, happy, sad, surprise, and neutral. This system requires intensive research to address issues with human diversity, various unique human expressions, and variety of human facial features due to age differences. These issues generally affect the ability of the FER system to detect human emotions with high accuracy. Early stage of FER systems used simple supervised classification task algorithms like K-nearest neighbors (KNN) and artificial neural networks (ANN). These conventional FER systems have issues with low accuracy due to its inefficiency to extract significant features of several human emotions. To increase the accuracy of FER systems, deep learning (DL)-based methods, like convolutional neural networks (CNN), are proposed. These methods can find more complex features in the human face by means of the deeper connections within its architectures. However, the inference speed and computational costs of a DL-based FER system is often disregarded in exchange for higher accuracy results. To cope with this drawback, an optimized DL-based FER system is proposed in this study.An extreme version of Inception V3, known as Xception model, is leveraged by applying different network optimization methods. Specifically, network pruning and quantization are used to enable lower computational costs and reduce memory usage, respectively. To support low resource requirements, a 68-landmark face detector from Dlib is used in the early step of the FER system.Furthermore, a DL compiler is utilized to incorporate advanced optimization techniques to the Xception model to improve the inference speed of the FER system. In comparison to VGG-Net and ResNet50, the proposed optimized DL-based FER system experimentally demonstrates the objectives of the network optimization methods used. As a result, the proposed approach can be used to create an efficient and real-time FER system.

Keywords: deep learning, face detection, facial emotion recognition, network optimization methods

Procedia PDF Downloads 75

2405 Virtual Reality Based 3D Video Games and Speech-Lip Synchronization Superseding Algebraic Code Excited Linear Prediction

Authors: P. S. Jagadeesh Kumar, S. Meenakshi Sundaram, Wenli Hu, Yang Yung

Abstract:

In 3D video games, the dominance of production is unceasingly growing with a protruding level of affordability in terms of budget. Afterward, the automation of speech-lip synchronization technique is customarily onerous and has advanced a critical research subject in virtual reality based 3D video games. This paper presents one of these automatic tools, precisely riveted on the synchronization of the speech and the lip movement of the game characters. A robust and precise speech recognition segment that systematized with Algebraic Code Excited Linear Prediction method is developed which unconventionally delivers lip sync results. The Algebraic Code Excited Linear Prediction algorithm is constructed on that used in code-excited linear prediction, but Algebraic Code Excited Linear Prediction codebooks have an explicit algebraic structure levied upon them. This affords a quicker substitute to the software enactments of lip sync algorithms and thus advances the superiority of service factors abridged production cost.

Keywords: algebraic code excited linear prediction, speech-lip synchronization, video games, virtual reality

Procedia PDF Downloads 439

2404 Speech Impact Realization via Manipulative Argumentation Techniques in Modern American Political Discourse

Authors: Zarine Avetisyan

Abstract:

Paper presents the discussion of scholars concerning speech impact, peculiarities of its realization, speech strategies, and techniques. Departing from the viewpoints of many prominent linguists, the paper suggests manipulative argumentation be viewed as a most pervasive speech strategy with a certain set of techniques which are to be found in modern American political discourse. The precedence of their occurrence allows us to regard them as pragmatic patterns of speech impact realization in effective public speaking.

Keywords: speech impact, manipulative argumentation, political discourse, technique

Procedia PDF Downloads 470

2403 Speech Enhancement Using Kalman Filter in Communication

Authors: Eng. Alaa K. Satti Salih

Abstract:

Revolutions Applications such as telecommunications, hands-free communications, recording, etc. which need at least one microphone, the signal is usually infected by noise and echo. The important application is the speech enhancement, which is done to remove suppressed noises and echoes taken by a microphone, beside preferred speech. Accordingly, the microphone signal has to be cleaned using digital signal processing DSP tools before it is played out, transmitted, or stored. Engineers have so far tried different approaches to improving the speech by get back the desired speech signal from the noisy observations. Especially Mobile communication, so in this paper will do reconstruction of the speech signal, observed in additive background noise, using the Kalman filter technique to estimate the parameters of the Autoregressive Process (AR) in the state space model and the output speech signal obtained by the MATLAB. The accurate estimation by Kalman filter on speech would enhance and reduce the noise then compare and discuss the results between actual values and estimated values which produce the reconstructed signals.

Keywords: autoregressive process, Kalman filter, Matlab, noise speech

Procedia PDF Downloads 312

2402 Effect of Signal Acquisition Procedure on Imagined Speech Classification Accuracy

Authors: M.R Asghari Bejestani, Gh. R. Mohammad Khani, V.R. Nafisi

Abstract:

Imagined speech recognition is one of the most interesting approaches to BCI development and a lot of works have been done in this area. Many different experiments have been designed and hundreds of combinations of feature extraction methods and classifiers have been examined. Reported classification accuracies range from the chance level to more than 90%. Based on non-stationary nature of brain signals, we have introduced 3 classification modes according to time difference in inter and intra-class samples. The modes can explain the diversity of reported results and predict the range of expected classification accuracies from the brain signal accusation procedure. In this paper, a few samples are illustrated by inspecting results of some previous works.

Keywords: brain computer interface, silent talk, imagined speech, classification, signal processing

Procedia PDF Downloads 119

2401 Makhraj Recognition Using Convolutional Neural Network

Authors: Zan Azma Nasruddin, Irwan Mazlin, Nor Aziah Daud, Fauziah Redzuan, Fariza Hanis Abdul Razak

Abstract:

This paper focuses on a machine learning that learn the correct pronunciation of Makhraj Huroofs. Usually, people need to find an expert to pronounce the Huroof accurately. In this study, the researchers have developed a system that is able to learn the selected Huroofs which are ha, tsa, zho, and dza using the Convolutional Neural Network. The researchers present the chosen type of the CNN architecture to make the system that is able to learn the data (Huroofs) as quick as possible and produces high accuracy during the prediction. The researchers have experimented the system to measure the accuracy and the cross entropy in the training process.

Keywords: convolutional neural network, Makhraj recognition, speech recognition, signal processing, tensorflow

Procedia PDF Downloads 301

2400 Hand Gesture Detection via EmguCV Canny Pruning

Authors: N. N. Mosola, S. J. Molete, L. S. Masoebe, M. Letsae

Abstract:

Hand gesture recognition is a technique used to locate, detect, and recognize a hand gesture. Detection and recognition are concepts of Artificial Intelligence (AI). AI concepts are applicable in Human Computer Interaction (HCI), Expert systems (ES), etc. Hand gesture recognition can be used in sign language interpretation. Sign language is a visual communication tool. This tool is used mostly by deaf societies and those with speech disorder. Communication barriers exist when societies with speech disorder interact with others. This research aims to build a hand recognition system for Lesotho’s Sesotho and English language interpretation. The system will help to bridge the communication problems encountered by the mentioned societies. The system has various processing modules. The modules consist of a hand detection engine, image processing engine, feature extraction, and sign recognition. Detection is a process of identifying an object. The proposed system uses Canny pruning Haar and Haarcascade detection algorithms. Canny pruning implements the Canny edge detection. This is an optimal image processing algorithm. It is used to detect edges of an object. The system employs a skin detection algorithm. The skin detection performs background subtraction, computes the convex hull, and the centroid to assist in the detection process. Recognition is a process of gesture classification. Template matching classifies each hand gesture in real-time. The system was tested using various experiments. The results obtained show that time, distance, and light are factors that affect the rate of detection and ultimately recognition. Detection rate is directly proportional to the distance of the hand from the camera. Different lighting conditions were considered. The more the light intensity, the faster the detection rate. Based on the results obtained from this research, the applied methodologies are efficient and provide a plausible solution towards a light-weight, inexpensive system which can be used for sign language interpretation.

Keywords: canny pruning, hand recognition, machine learning, skin tracking

Procedia PDF Downloads 154

2399 Emotion Regulation Mediates the Relationship between Affective Disposition and Depression

Authors: Valentina Colonnello, Paolo Maria Russo

Abstract:

Studies indicate a link between individual differences in affective disposition and depression, as well as between emotion dysregulation and depression. However, the specific role of emotion dysregulation domains in mediating the relationship between affective disposition and depression remains largely unexplored. In three cross-sectional quantitative studies (total n = 1350), we explored the extent to which specific emotion regulation difficulties mediate the relationship between personal distress disposition (Study 1), separation distress as a primary emotional trait (Study 2), and an insecure, anxious attachment style (Study 3) and depression. Across all studies, we found that the relationship between affective disposition and depression was mediated by difficulties in accessing adaptive emotion regulation strategies. These findings underscore the potential for modifiable abilities that could be targeted through preventive interventions.

Keywords: emotions, mental health, individual traits, personality

Procedia PDF Downloads 20

2398 Freedom of Speech and Involvement in Hatred Speech on Social Media Networks

Authors: Sara Chinnasamy, Michelle Gun, M. Adnan Hashim

Abstract:

Federal Constitution guarantees Malaysians the right to free speech and expression; yet hatred speech can be commonly found on social media platforms such as Facebook, Twitter, and Instagram. In Malaysia social media sphere, most hatred speech involves religion, race and politics. Recent cases of racial attacks on social media have created social tensions among Malaysians. Many Malaysians always argue on their rights to freedom of speech. However, there are laws that limit their expression to the public and protecting social media users from being a victim of hate speech. This paper aims to explore the attitude and involvement of Malaysian netizens towards freedom of speech and hatred speech on social media. It also examines the relationship between involvement in hatred speech among Malaysian netizens and attitude towards freedom of speech. For most Malaysians, practicing total freedom of speech in the open is unthinkable. As a result, the best channel to articulate their feelings and opinions liberally is the internet. With the advent of the internet medium, more and more Malaysians are conveying their viewpoints using the various internet channels although sensitivity of the audience is seldom taken into account. Consequently, this situation has led to pockets of social disharmony among the citizens. Although this unhealthy activity is denounced by the authority, netizens are generally of the view that they have the right to write anything they want. Using the quantitative method, survey was conducted among Malaysians aged between 18 and 50 years who are active social media users. Results from the survey reveal that despite a weak relationship level between hatred speech involvement on social media and attitude towards freedom of speech, the association is still considerably significant. As such, it can be safely presumed that hatred speech on social media occurs due to the freedom of speech that exists by way of social media channels.

Keywords: freedom of speech, hatred speech, social media, Malaysia, netizens

Procedia PDF Downloads 415

2397 The Effectiveness of Dialectical Behavior Therapy in Developing Emotion Regulation Skill for Adolescent with Intellectual Disability

Authors: Shahnaz Safitri, Rose Mini Agoes Salim, Pratiwi Widyasari

Abstract:

Intellectual disability is characterized by significant limitations in intellectual functioning and adaptive behavior that appears before the age of 18 years old. The prominent impacts of intellectual disability in adolescents are failure to establish interpersonal relationships as socially expected and lower academic achievement. Meanwhile, it is known that emotion regulation skills have a role in supporting the functioning of individual, either by nourishing the development of social skills as well as by facilitating the process of learning and adaptation in school. This study aims to look for the effectiveness of Dialectical Behavior Therapy (DBT) in developing emotion regulation skills for adolescents with intellectual disability. DBT's special consideration toward clients’ social environment and their biological condition is foreseen to be the key for developing emotion regulation capacity for subjects with intellectual disability. Through observations on client's behavior, conducted before and after the completion of DBT intervention program, it was found that there is an improvement in client's knowledge and attitudes related to the mastery of emotion regulation skills. In addition, client's consistency to actually practice emotion regulation techniques over time is largely influenced by the support received from the client's social circles.

Keywords: adolescent, dialectical behavior therapy, emotion regulation, intellectual disability

Procedia PDF Downloads 268

2396 Minimum Data of a Speech Signal as Special Indicators of Identification in Phonoscopy

Authors: Nazaket Gazieva

Abstract:

Voice biometric data associated with physiological, psychological and other factors are widely used in forensic phonoscopy. There are various methods for identifying and verifying a person by voice. This article explores the minimum speech signal data as individual parameters of a speech signal. Monozygotic twins are believed to be genetically identical. Using the minimum data of the speech signal, we came to the conclusion that the voice imprint of monozygotic twins is individual. According to the conclusion of the experiment, we can conclude that the minimum indicators of the speech signal are more stable and reliable for phonoscopic examinations.

Keywords: phonogram, speech signal, temporal characteristics, fundamental frequency, biometric fingerprints

Procedia PDF Downloads 105

2395 Emotion Detection in a General Human-Robot Interaction System Optimized for Embedded Platforms

Authors: Julio Vega

Abstract:

Expression recognition is a field of Artificial Intelligence whose main objectives are to recognize basic forms of affective expression that appear on people’s faces and contributing to behavioral studies. In this work, a ROS node has been developed that, based on Deep Learning techniques, is capable of detecting the facial expressions of the people that appear in the image. These algorithms were optimized so that they can be executed in real time on an embedded platform. The experiments were carried out in a PC with a USB camera and in a Raspberry Pi 4 with a PiCamera. The final results shows a plausible system, which is capable to work in real time even in an embedded platform.

Keywords: python, low-cost, raspberry pi, emotion detection, human-robot interaction, ROS node

Procedia PDF Downloads 95

2394 Italian Speech Vowels Landmark Detection through the Legacy Tool 'xkl' with Integration of Combined CNNs and RNNs

Authors: Kaleem Kashif, Tayyaba Anam, Yizhi Wu

Abstract:

This paper introduces a methodology for advancing Italian speech vowels landmark detection within the distinctive feature-based speech recognition domain. Leveraging the legacy tool 'xkl' by integrating combined convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the study presents a comprehensive enhancement to the 'xkl' legacy software. This integration incorporates re-assigned spectrogram methodologies, enabling meticulous acoustic analysis. Simultaneously, our proposed model, integrating combined CNNs and RNNs, demonstrates unprecedented precision and robustness in landmark detection. The augmentation of re-assigned spectrogram fusion within the 'xkl' software signifies a meticulous advancement, particularly enhancing precision related to vowel formant estimation. This augmentation catalyzes unparalleled accuracy in landmark detection, resulting in a substantial performance leap compared to conventional methods. The proposed model emerges as a state-of-the-art solution in the distinctive feature-based speech recognition systems domain. In the realm of deep learning, a synergistic integration of combined CNNs and RNNs is introduced, endowed with specialized temporal embeddings, harnessing self-attention mechanisms, and positional embeddings. The proposed model allows it to excel in capturing intricate dependencies within Italian speech vowels, rendering it highly adaptable and sophisticated in the distinctive feature domain. Furthermore, our advanced temporal modeling approach employs Bayesian temporal encoding, refining the measurement of inter-landmark intervals. Comparative analysis against state-of-the-art models reveals a substantial improvement in accuracy, highlighting the robustness and efficacy of the proposed methodology. Upon rigorous testing on a database (LaMIT) speech recorded in a silent room by four Italian native speakers, the landmark detector demonstrates exceptional performance, achieving a 95% true detection rate and a 10% false detection rate. A majority of missed landmarks were observed in proximity to reduced vowels. These promising results underscore the robust identifiability of landmarks within the speech waveform, establishing the feasibility of employing a landmark detector as a front end in a speech recognition system. The synergistic integration of re-assigned spectrogram fusion, CNNs, RNNs, and Bayesian temporal encoding not only signifies a significant advancement in Italian speech vowels landmark detection but also positions the proposed model as a leader in the field. The model offers distinct advantages, including unparalleled accuracy, adaptability, and sophistication, marking a milestone in the intersection of deep learning and distinctive feature-based speech recognition. This work contributes to the broader scientific community by presenting a methodologically rigorous framework for enhancing landmark detection accuracy in Italian speech vowels. The integration of cutting-edge techniques establishes a foundation for future advancements in speech signal processing, emphasizing the potential of the proposed model in practical applications across various domains requiring robust speech recognition systems.

Keywords: landmark detection, acoustic analysis, convolutional neural network, recurrent neural network

Procedia PDF Downloads 12

2393 Biosignal Recognition for Personal Identification

Authors: Hadri Hussain, M.Nasir Ibrahim, Chee-Ming Ting, Mariani Idroas, Fuad Numan, Alias Mohd Noor

Abstract:

A biometric security system has become an important application in client identification and verification system. A conventional biometric system is normally based on unimodal biometric that depends on either behavioural or physiological information for authentication purposes. The behavioural biometric depends on human body biometric signal (such as speech) and biosignal biometric (such as electrocardiogram (ECG) and phonocardiogram or heart sound (HS)). The speech signal is commonly used in a recognition system in biometric, while the ECG and the HS have been used to identify a person’s diseases uniquely related to its cluster. However, the conventional biometric system is liable to spoof attack that will affect the performance of the system. Therefore, a multimodal biometric security system is developed, which is based on biometric signal of ECG, HS, and speech. The biosignal data involved in the biometric system is initially segmented, with each segment Mel Frequency Cepstral Coefficients (MFCC) method is exploited for extracting the feature. The Hidden Markov Model (HMM) is used to model the client and to classify the unknown input with respect to the modal. The recognition system involved training and testing session that is known as client identification (CID). In this project, twenty clients are tested with the developed system. The best overall performance at 44 kHz was 93.92% for ECG and the worst overall performance was ECG at 88.47%. The results were compared to the best overall performance at 44 kHz for (20clients) to increment of clients, which was 90.00% for HS and the worst overall performance falls at ECG at 79.91%. It can be concluded that the difference multimodal biometric has a substantial effect on performance of the biometric system and with the increment of data, even with higher frequency sampling, the performance still decreased slightly as predicted.

Keywords: electrocardiogram, phonocardiogram, hidden markov model, mel frequency cepstral coeffiecients, client identification

Procedia PDF Downloads 250

2392 Handwriting Recognition of Gurmukhi Script: A Survey of Online and Offline Techniques

Authors: Ravneet Kaur

Abstract:

Character recognition is a very interesting area of pattern recognition. From past few decades, an intensive research on character recognition for Roman, Chinese, and Japanese and Indian scripts have been reported. In this paper, a review of Handwritten Character Recognition work on Indian Script Gurmukhi is being highlighted. Most of the published papers were summarized, various methodologies were analysed and their results are reported.

Keywords: Gurmukhi character recognition, online, offline, HCR survey

Procedia PDF Downloads 395

2391 Intervention of Self-Limiting L1 Inner Speech during L2 Presentations: A Study of Bangla-English Bilinguals

Authors: Abdul Wahid

Abstract:

Inner speech, also known as verbal thinking, self-talk or private speech, is characterized by the subjective language experience in the absence of overt or audible speech. It is a psychological form of verbal activity which is being rehearsed without the articulation of any sound wave. In Psychology, self-limiting speech means the type of speech which contains information that inhibits the development of the self. People, in most cases, experience inner speech in their first language. It is very frequent in Bangladesh where the Bangla (L1) speaking students lose track of speech during their presentations in English (L2). This paper investigates into the long pauses (more than 0.4 seconds long) in English (L2) presentations by Bangla speaking students (18-21 year old) and finds the intervention of Bangla (L1) inner speech as one of its causes. The overt speeches of the presenters are placed on Audacity Audio Editing software where the length of pauses are measured in milliseconds. Varieties of inner speech questionnaire (VISQ) have been conducted randomly amongst the participants out of whom 20 were selected who have similar phenomenology of inner speech. They have been interviewed to describe the type and content of the voices that went on in their head during the long pauses. The qualitative interview data are then codified and converted into quantitative data. It was observed that in more than 80% cases students experience self-limiting inner speech/self-talk during their unwanted pauses in L2 presentations.

Keywords: Bangla-English Bilinguals, inner speech, L1 intervention in bilingualism, motor schema, pauses, phonological loop, phonological store, working memory

Procedia PDF Downloads 124

2390 Various Perspectives for the Concept of the Emotion Labor

Authors: Jae Soo Do, Kyoung-Seok Kim

Abstract:

Radical changes in the industrial environment, and spectacular developments of IT have changed the current of managements from people-centered to technology- or IT-centered. Interpersonal emotion exchanges have long become insipid and interactive services have also come as mechanical reactions. This study offers various concepts for the emotional labor based on traditional studies on emotional labor. Especially the present day, on which human emotions are subject to being served as machinized thing, is the time when the study on human emotions comes momentous. Precedent researches on emotional labors commonly and basically dealt with the relationship between the active group who performs actions and the passive group who is done with the action. This study focuses on the passive group and tries to offer a new perspective of 'liquid emotion' as a defence mechanism for the passive group from the external environment. Especially, this addresses a concrete discussion on directions of following studies on the liquid labor as a newly suggested perspective.

Keywords: emotion labor, surface acting, deep acting, liquid emotion

Procedia PDF Downloads 317

2389 Emotion Expression of the Leader and Collective Efficacy: Pride and Guilt

Authors: Hsiu-Tsu Cho

Abstract:

Collective efficacy refers to a group’s sense of its capacity to complete a task successfully or to reach objectives. Little effort has been expended on investigating the relationship between the emotion expression of a leader and collective efficacy. In this study, we examined the impact of the different emotions and emotion expression of a group leader on collective efficacy and explored whether the emotion–expressive effects differed under conditions of negative and positive emotions. A total of 240 undergraduate and graduate students recruited using Facebook and posters at a university participated in this research. The participants were separated randomly into 80 groups of four persons consisting of three participants and a confederate. They were randomly assigned to one of five conditions in a 2 (pride vs. guilt) × 2 (emotion expression of group leader vs. no emotion expression of group leader) factorial design and a control condition. Each four-person group was instructed to get the reward in a group competition of solving the five-disk Tower of Hanoi puzzle and making decisions on an investment case. We surveyed the participants by employing the emotional measure revised from previous researchers and collective efficacy questionnaire on a 5-point scale. To induce an emotion of pride (or guilt), the experimenter announced whether the group performance was good enough to have a chance of getting the reward (ranking the top or bottom 20% among all groups) after group task. The leader (confederate) could either express or not express a feeling of pride (or guilt) following the instruction according to the assigned condition. To check manipulation of emotion, we added a control condition under which the experimenter revealed no results regarding group performance in maintaining a neutral emotion. One-way ANOVAs and post hoc pairwise comparisons among the three emotion conditions (pride, guilt, and control condition) involved assigning pride and guilt scores (pride: F(1,75) = 32.41, p < .001; guilt: F(1,75) = 6.75, p < .05). The results indicated that manipulations of emotion were successful. A two-way between-measures ANOVA was conducted to examine the predictions of the main effects of emotion types and emotion expression as well as the interaction effect of these two variables on collective efficacy. The experimental findings suggest that pride did not affect collective efficacy (F(1,60) = 1.90, ns.) more than guilt did and that the group leader did not motivate collective efficacy regardless of whether he or she expressed emotion (F(1,60) = .89, ns.). However, the interaction effect of emotion types and emotion expression was statistically significant (F(1,60) = 4.27, p < .05, ω2 = .066); the effects accounted for 6.6% of the variance. Additional results revealed that, under the pride condition, the leader enhanced group efficacy when expressing emotion, whereas, under the guilt condition, an expression of emotion could reduce collective efficacy. Overall, these findings challenge the assumption that the effect of expression emotion are the same on all emotions and suggest that a leader should be cautious when expressing negative emotions toward a group to avoid reducing group effectiveness.

Keywords: collective efficacy, group leader, emotion expression, pride, guilty

Procedia PDF Downloads 302

2388 Intertextuality in Choreography: Investigation of Text and Movements in Making Choreography

Authors: Muhammad Fairul Azreen Mohd Zahid

Abstract:

Speech, text, and movement intensify aspects of creating choreography by connecting with emotional entanglements, tradition, literature, and other texts. This research focuses on the practice as research that will prioritise the choreography process as an inquiry approach. With the driven context, the study intervenes in critical conjunctions of choreographic theory, bringing together new reflections on the moving body, spaces of action, as well as intertextuality between text and movements in making choreography. Throughout the process, the researcher will introduce the level of deliberation from speech through movements and text to express emotion within a narrative context of an “illocutionary act.” This practice as research will produce a different meaning from the “utterance text” to “utterance movements” in the perspective of speech acts theory by J.L Austin based on fragmented text from “pidato adat” which has been used as opening speech in Randai. Looking at the theory of deconstruction by Jacque Derrida also will give a different meaning from the text. Nevertheless, the process of creating the choreography will also help to lay the basic normative structure implicit in “constative” (statement text/movement) and “performative” (command text/movement). Through this process, the researcher will also look at several methods of using text from two works by Joseph Gonzales, “Becoming King-The Pakyung Revisited” and Crystal Pite's “The Statement,” as references to produce different methods in making choreography. The perspective from the semiotic foundation will support how occurrences within dance discourses as texts through a semiotic lens. The method used in this research is qualitative, which includes an interview and simulation of the concept to get an outcome.

Keywords: intertextuality, choreography, speech act, performative, deconstruction

Procedia PDF Downloads 64

2387 OCR/ICR Text Recognition Using ABBYY FineReader as an Example Text

Authors: A. R. Bagirzade, A. Sh. Najafova, S. M. Yessirkepova, E. S. Albert

Abstract:

This article describes a text recognition method based on Optical Character Recognition (OCR). The features of the OCR method were examined using the ABBYY FineReader program. It describes automatic text recognition in images. OCR is necessary because optical input devices can only transmit raster graphics as a result. Text recognition describes the task of recognizing letters shown as such, to identify and assign them an assigned numerical value in accordance with the usual text encoding (ASCII, Unicode). The peculiarity of this study conducted by the authors using the example of the ABBYY FineReader, was confirmed and shown in practice, the improvement of digital text recognition platforms developed by Electronic Publication.

Keywords: ABBYY FineReader system, algorithm symbol recognition, OCR/ICR techniques, recognition technologies

Procedia PDF Downloads 134

2386 Performance Evaluation of Acoustic-Spectrographic Voice Identification Method in Native and Non-Native Speech

Authors: E. Krasnova, E. Bulgakova, V. Shchemelinin

Abstract:

The paper deals with acoustic-spectrographic voice identification method in terms of its performance in non-native language speech. Performance evaluation is conducted by comparing the result of the analysis of recordings containing native language speech with recordings that contain foreign language speech. Our research is based on Tajik and Russian speech of Tajik native speakers due to the character of the criminal situation with drug trafficking. We propose a pilot experiment that represents a primary attempt enter the field.

Keywords: speaker identification, acoustic-spectrographic method, non-native speech, performance evaluation

Procedia PDF Downloads 413

2385 Automatic Segmentation of the Clean Speech Signal

Authors: M. A. Ben Messaoud, A. Bouzid, N. Ellouze

Abstract:

Speech Segmentation is the measure of the change point detection for partitioning an input speech signal into regions each of which accords to only one speaker. In this paper, we apply two features based on multi-scale product (MP) of the clean speech, namely the spectral centroid of MP, and the zero crossings rate of MP. We focus on multi-scale product analysis as an important tool for segmentation extraction. The multi-scale product is based on making the product of the speech wavelet transform coefficients at three successive dyadic scales. We have evaluated our method on the Keele database. Experimental results show the effectiveness of our method presenting a good performance. It shows that the two simple features can find word boundaries, and extracted the segments of the clean speech.

Keywords: multiscale product, spectral centroid, speech segmentation, zero crossings rate

Procedia PDF Downloads 466