Search results for: character recognition
2136 Global Based Histogram for 3D Object Recognition
Authors: Somar Boubou, Tatsuo Narikiyo, Michihiro Kawanishi
Abstract:
In this work, we address the problem of 3D object recognition with depth sensors such as Kinect or Structure sensor. Compared with traditional approaches based on local descriptors, which depends on local information around the object key points, we propose a global features based descriptor. Proposed descriptor, which we name as Differential Histogram of Normal Vectors (DHONV), is designed particularly to capture the surface geometric characteristics of the 3D objects represented by depth images. We describe the 3D surface of an object in each frame using a 2D spatial histogram capturing the normalized distribution of differential angles of the surface normal vectors. The object recognition experiments on the benchmark RGB-D object dataset and a self-collected dataset show that our proposed descriptor outperforms two others descriptors based on spin-images and histogram of normal vectors with linear-SVM classifier.Keywords: vision in control, robotics, histogram, differential histogram of normal vectors
Procedia PDF Downloads 2782135 Speech Emotion Recognition with Bi-GRU and Self-Attention based Feature Representation
Authors: Bubai Maji, Monorama Swain
Abstract:
Speech is considered an essential and most natural medium for the interaction between machines and humans. However, extracting effective features for speech emotion recognition (SER) is remains challenging. The present studies show that the temporal information captured but high-level temporal-feature learning is yet to be investigated. In this paper, we present an efficient novel method using the Self-attention (SA) mechanism in a combination of Convolutional Neural Network (CNN) and Bi-directional Gated Recurrent Unit (Bi-GRU) network to learn high-level temporal-feature. In order to further enhance the representation of the high-level temporal-feature, we integrate a Bi-GRU output with learnable weights features by SA, and improve the performance. We evaluate our proposed method on our created SITB-OSED and IEMOCAP databases. We report that the experimental results of our proposed method achieve state-of-the-art performance on both databases.Keywords: Bi-GRU, 1D-CNNs, self-attention, speech emotion recognition
Procedia PDF Downloads 1122134 Wavelet Based Signal Processing for Fault Location in Airplane Cable
Authors: Reza Rezaeipour Honarmandzad
Abstract:
Wavelet analysis is an exciting method for solving difficult problems in mathematics, physics, and engineering, with modern applications as diverse as wave propagation, data compression, signal processing, image processing, pattern recognition, etc. Wavelets allow complex information such as signals, images and patterns to be decomposed into elementary forms at different positions and scales and subsequently reconstructed with high precision. In this paper a wavelet-based signal processing algorithm for airplane cable fault location is proposed. An orthogonal discrete wavelet decomposition and reconstruction algorithm is used to eliminate the noise in the aircraft cable fault signal. The experiment result has shown that the character of emission pulse and reflect pulse used to test the aircraft cable fault point are reserved and the high-frequency noise are eliminated by means of the proposed algorithm in this paper.Keywords: wavelet analysis, signal processing, orthogonal discrete wavelet, noise, aircraft cable fault signal
Procedia PDF Downloads 5232133 Morphology of Cartographic Words: A Perspective from Chinese Characters
Authors: Xinyu Gong, Zhilin Li, Xintao Liu
Abstract:
Maps are a means of communication. Cartographic language involves established theories of natural language for understanding maps. “Cartographic words’, or “map symbols”, are crucial elements of cartographic language. Personalized mapping is increasingly popular, with growing demands for customized map-making by the general public. Automated symbol-making and customization play a key role in personalized mapping. However, formal representations for the automated construction of map symbols are still lacking. In natural language, the process of word and sentence construction can be formalized. Through the analogy between natural language and graphical language, formal representations of natural language construction can be used as a reference for constructing cartographic language. We selected Chinese character structures (i.e., SKeywords: personalized mapping, Chinese character, cartographic language, map symbols
Procedia PDF Downloads 1752132 Naïve Bayes: A Classical Approach for the Epileptic Seizures Recognition
Authors: Bhaveek Maini, Sanjay Dhanka, Surita Maini
Abstract:
Electroencephalography (EEG) is used to classify several epileptic seizures worldwide. It is a very crucial task for the neurologist to identify the epileptic seizure with manual EEG analysis, as it takes lots of effort and time. Human error is always at high risk in EEG, as acquiring signals needs manual intervention. Disease diagnosis using machine learning (ML) has continuously been explored since its inception. Moreover, where a large number of datasets have to be analyzed, ML is acting as a boon for doctors. In this research paper, authors proposed two different ML models, i.e., logistic regression (LR) and Naïve Bayes (NB), to predict epileptic seizures based on general parameters. These two techniques are applied to the epileptic seizures recognition dataset, available on the UCI ML repository. The algorithms are implemented on an 80:20 train test ratio (80% for training and 20% for testing), and the performance of the model was validated by 10-fold cross-validation. The proposed study has claimed accuracy of 81.87% and 95.49% for LR and NB, respectively.Keywords: epileptic seizure recognition, logistic regression, Naïve Bayes, machine learning
Procedia PDF Downloads 582131 Recognition of Noisy Words Using the Time Delay Neural Networks Approach
Authors: Khenfer-Koummich Fatima, Mesbahi Larbi, Hendel Fatiha
Abstract:
This paper presents a recognition system for isolated words like robot commands. It’s carried out by Time Delay Neural Networks; TDNN. To teleoperate a robot for specific tasks as turn, close, etc… In industrial environment and taking into account the noise coming from the machine. The choice of TDNN is based on its generalization in terms of accuracy, in more it acts as a filter that allows the passage of certain desirable frequency characteristics of speech; the goal is to determine the parameters of this filter for making an adaptable system to the variability of speech signal and to noise especially, for this the back propagation technique was used in learning phase. The approach was applied on commands pronounced in two languages separately: The French and Arabic. The results for two test bases of 300 spoken words for each one are 87%, 97.6% in neutral environment and 77.67%, 92.67% when the white Gaussian noisy was added with a SNR of 35 dB.Keywords: TDNN, neural networks, noise, speech recognition
Procedia PDF Downloads 2882130 Usability Testing on Information Design through Single-Lens Wearable Device
Authors: Jae-Hyun Choi, Sung-Soo Bae, Sangyoung Yoon, Hong-Ku Yun, Jiyoung Kwahk
Abstract:
This study was conducted to investigate the effect of ocular dominance on recognition performance using a single-lens smart display designed for cycling. A total of 36 bicycle riders who have been cycling consistently were recruited and participated in the experiment. The participants were asked to perform tasks riding a bicycle on a stationary stand for safety reasons. Independent variables of interest include ocular dominance, bike usage, age group, and information layout. Recognition time (i.e., the time required to identify specific information measured with an eye-tracker), error rate (i.e. false answer or failure to identify the information in 5 seconds), and user preference scores were measured and statistical tests were conducted to identify significant results. Recognition time and error ratio showed significant difference by ocular dominance factor, while the preference score did not. Recognition time was faster when the single-lens see-through display on the dominant eye (average 1.12sec) than on the non-dominant eye (average 1.38sec). Error ratio of the information recognition task was significantly lower when the see-through display was worn on the dominant eye (average 4.86%) than on the non-dominant eye (average 14.04%). The interaction effect of ocular dominance and age group was significant with respect to recognition time and error ratio. The recognition time of the users in their 40s was significantly longer than the other age groups when the display was placed on the non-dominant eye, while no difference was observed on the dominant eye. Error ratio also showed the same pattern. Although no difference was observed for the main effect of ocular dominance and bike usage, the interaction effect between the two variables was significant with respect to preference score. Preference score of daily bike users was higher when the display was placed on the dominant eye, whereas participants who use bikes for leisure purposes showed the opposite preference patterns. It was found more effective and efficient to wear a see-through display on the dominant eye than on the non-dominant eye, although user preference was not affected by ocular dominance. It is recommended to wear a see-through display on the dominant eye since it is safer by helping the user recognize the presented information faster and more accurately, even if the user may not notice the difference.Keywords: eye tracking, information recognition, ocular dominance, smart headware, wearable device
Procedia PDF Downloads 2712129 Robust Recognition of Locomotion Patterns via Data-Driven Machine Learning in the Cloud Environment
Authors: Shinoy Vengaramkode Bhaskaran, Kaushik Sathupadi, Sandesh Achar
Abstract:
Human locomotion recognition is important in a variety of sectors, such as robotics, security, healthcare, fitness tracking and cloud computing. With the increasing pervasiveness of peripheral devices, particularly Inertial Measurement Units (IMUs) sensors, researchers have attempted to exploit these advancements in order to precisely and efficiently identify and categorize human activities. This research paper introduces a state-of-the-art methodology for the recognition of human locomotion patterns in a cloud environment. The methodology is based on a publicly available benchmark dataset. The investigation implements a denoising and windowing strategy to deal with the unprocessed data. Next, feature extraction is adopted to abstract the main cues from the data. The SelectKBest strategy is used to abstract optimal features from the data. Furthermore, state-of-the-art ML classifiers are used to evaluate the performance of the system, including logistic regression, random forest, gradient boosting and SVM have been investigated to accomplish precise locomotion classification. Finally, a detailed comparative analysis of results is presented to reveal the performance of recognition models.Keywords: artificial intelligence, cloud computing, IoT, human locomotion, gradient boosting, random forest, neural networks, body-worn sensors
Procedia PDF Downloads 92128 Effects of Oxytocin on Neural Response to Facial Emotion Recognition in Schizophrenia
Authors: Avyarthana Dey, Naren P. Rao, Arpitha Jacob, Chaitra V. Hiremath, Shivarama Varambally, Ganesan Venkatasubramanian, Rose Dawn Bharath, Bangalore N. Gangadhar
Abstract:
Objective: Impaired facial emotion recognition is widely reported in schizophrenia. Neuropeptide oxytocin is known to modulate brain regions involved in facial emotion recognition, namely amygdala, in healthy volunteers. However, its effect on facial emotion recognition deficits seen in schizophrenia is not well explored. In this study, we examined the effect of intranasal OXT on processing facial emotions and its neural correlates in patients with schizophrenia. Method: 12 male patients (age= 31.08±7.61 years, education= 14.50±2.20 years) participated in this single-blind, counterbalanced functional magnetic resonance imaging (fMRI) study. All participants underwent three fMRI scans; one at baseline, one each after single dose 24IU intranasal OXT and intranasal placebo. The order of administration of OXT and placebo were counterbalanced and subject was blind to the drug administered. Participants performed a facial emotion recognition task presented in a block design with six alternating blocks of faces and shapes. The faces depicted happy, angry or fearful emotions. The images were preprocessed and analyzed using SPM 12. First level contrasts comparing recognition of emotions and shapes were modelled at individual subject level. A group level analysis was performed using the contrasts generated at the first level to compare the effects of intranasal OXT and placebo. The results were thresholded at uncorrected p < 0.001 with a cluster size of 6 voxels. Neuropeptide oxytocin is known to modulate brain regions involved in facial emotion recognition, namely amygdala, in healthy volunteers. Results: Compared to placebo, intranasal OXT attenuated activity in inferior temporal, fusiform and parahippocampal gyri (BA 20), premotor cortex (BA 6), middle frontal gyrus (BA 10) and anterior cingulate gyrus (BA 24) and enhanced activity in the middle occipital gyrus (BA 18), inferior occipital gyrus (BA 19), and superior temporal gyrus (BA 22). There were no significant differences between the conditions on the accuracy scores of emotion recognition between baseline (77.3±18.38), oxytocin (82.63 ± 10.92) or Placebo (76.62 ± 22.67). Conclusion: Our results provide further evidence to the modulatory effect of oxytocin in patients with schizophrenia. Single dose oxytocin resulted in significant changes in activity of brain regions involved in emotion processing. Future studies need to examine the effectiveness of long-term treatment with OXT for emotion recognition deficits in patients with schizophrenia.Keywords: recognition, functional connectivity, oxytocin, schizophrenia, social cognition
Procedia PDF Downloads 2192127 A Smart Visitors’ Notification System with Automatic Secure Door Lock Using Mobile Communication Technology
Authors: Rabail Shafique Satti, Sidra Ejaz, Madiha Arshad, Marwa Khalid, Sadia Majeed
Abstract:
The paper presents the development of an automated security system to automate the entry of visitors, providing more flexibility of managing their record and securing homes or workplaces. Face recognition is part of this system to authenticate the visitors. A cost effective and SMS based door security module has been developed and integrated with the GSM network and made part of this system to allow communication between system and owner. This system functions in real time as when the visitor’s arrived it will detect and recognizes his face and on the result of face recognition process it will open the door for authorized visitors or notifies and allows the owner’s to take further action in case of unauthorized visitor. The proposed system is developed and it is successfully ensuring security, managing records and operating gate without physical interaction of owner.Keywords: SMS, e-mail, GSM modem, authenticate, face recognition, authorized
Procedia PDF Downloads 7882126 Supporting a Moral Growth Mindset Among College Students
Authors: Kate Allman, Heather Maranges, Elise Dykhuis
Abstract:
Moral Growth Mindset (MGM) is the belief that one has the capacity to become a more moral person, as opposed to a fixed conception of one’s moral ability and capacity (Han et al., 2018). Building from Dweck’s work in incremental implicit theories of intelligence (2008), Moral Growth Mindset (Han et al., 2020) extends growth mindsets into the moral dimension. The concept of MGM has the potential to help researchers understand how both mindsets and interventions can impact character development, and it has even been shown to have connections to voluntary service engagement (Han et al., 2018). Understanding the contexts in which MGM might be cultivated could help to promote the further cultivation of character, in addition to prosocial behaviors like service engagement, which may, in turn, promote larger scale engagement in social justice-oriented thoughts, feelings, and behaviors. In particular, college may be a place to intentionally cultivate a growth mindset toward moral capacities, given the unique developmental and maturational components of the college experience, including contextual opportunity (Lapsley & Narvaez, 2006) and independence requiring the constant consideration, revision, and internalization of personal values (Lapsley & Woodbury, 2016). In a semester-long, quasi-experimental study, we examined the impact of a pedagogical approach designed to cultivate college student character development on participants’ MGM. With an intervention (n=69) and a control group (n=97; Pre-course: 27% Men; 66% Women; 68% White; 18% Asian; 2% Black; <1% Hispanic/Latino), we investigated whether college courses that intentionally incorporate character education pedagogy (Lamb, Brant, Brooks, 2021) affect a variety of psychosocial variables associated with moral thoughts, feelings, identity, and behavior (e.g. moral growth mindset, honesty, compassion, etc.). The intervention group consisted of 69 undergraduate students (Pre-course: 40% Men; 52% Women; 68% White; 10.5% Black; 7.4% Asian; 4.2% Hispanic/Latino) that voluntarily enrolled in five undergraduate courses that encouraged students to engage with key concepts and methods of character development through the application of research-based strategies and personal reflection on goals and experiences. Moral Growth Mindset was measured using the four-item Moral Growth Mindset scale (Han et al., 2020), with items such as You can improve your basic morals and character considerably on a six-point Likert scale from 1 (strongly disagree) to 6 (strongly agree). Higher scores of MGM indicate a stronger belief that one can become a more moral person with personal effort. Reliability at Time 1 was Cronbach’s ɑ= .833, and at Time 2 Cronbach’s ɑ= .772. An Analysis of Covariance (ANCOVA) was conducted to explore whether post-course MGM scores were different between the intervention and control when controlling for pre-course MGM scores. The ANCOVA indicated significant differences in MGM between groups post-course, F(1,163) = 8.073, p = .005, R² = .11, where descriptive statistics indicate that intervention scores were higher than the control group at post-course. Results indicate that intentional character development pedagogy can be leveraged to support the development of Moral Growth Mindset and related capacities in undergraduate settings.Keywords: moral personality, character education, incremental theories of personality, growth mindset
Procedia PDF Downloads 1452125 The Influence of Job Recognition and Job Motivation on Organizational Commitment in Public Sector: The Mediation Role of Employee Engagement
Authors: Muhammad Tayyab, Saba Saira
Abstract:
It is an established fact that organizations across the globe consider employees as their assets and try to advance their well-being. However, the local firms of developing countries are mostly profit oriented and do not have much concern about their employees’ engagement or commitment. Like other developing countries, the local organizations of Pakistan are also less concerned about the well-being of their employees. Especially public sector organizations lack concern regarding engagement, satisfaction or commitment of the employees. Therefore, this study aimed at investigating the impact of job recognition and job motivation on organizational commitment in the mediation role of employee engagement. The data were collected from land record officers of board of revenue, Punjab, Pakistan. Structured questionnaire was used to collect data through physically visiting land record officers and also through the internet. A total of 318 land record officers’ responses were finalized to perform data analysis. The data were analyzed through confirmatory factor analysis and structural equation modeling technique. The findings revealed that job recognition and job motivation have direct as well as indirect positive and significant impact on organizational commitment. The limitations, practical implications and future research indications are also explained.Keywords: job motivation, job recognition, employee engagement, employee commitment, public sector, land record officers
Procedia PDF Downloads 1322124 Digital Transformation as the Subject of the Knowledge Model of the Discursive Space
Authors: Rafal Maciag
Abstract:
Due to the development of the current civilization, one must create suitable models of its pervasive massive phenomena. Such a phenomenon is the digital transformation, which has a substantial number of disciplined, methodical interpretations forming the diversified reflection. This reflection could be understood pragmatically as the current temporal, a local differential state of knowledge. The model of the discursive space is proposed as a model for the analysis and description of this knowledge. Discursive space is understood as an autonomous multidimensional space where separate discourses traverse specific trajectories of what can be presented in multidimensional parallel coordinate system. Discursive space built on the world of facts preserves the complex character of that world. Digital transformation as a discursive space has a relativistic character that means that at the same time, it is created by the dynamic discourses and these discourses are molded by the shape of this space.Keywords: complexity, digital transformation, discourse, discursive space, knowledge
Procedia PDF Downloads 1912123 An Automatic Speech Recognition of Conversational Telephone Speech in Malay Language
Authors: M. Draman, S. Z. Muhamad Yassin, M. S. Alias, Z. Lambak, M. I. Zulkifli, S. N. Padhi, K. N. Baharim, F. Maskuriy, A. I. A. Rahim
Abstract:
The performance of Malay automatic speech recognition (ASR) system for the call centre environment is presented. The system utilizes Kaldi toolkit as the platform to the entire library and algorithm used in performing the ASR task. The acoustic model implemented in this system uses a deep neural network (DNN) method to model the acoustic signal and the standard (n-gram) model for language modelling. With 80 hours of training data from the call centre recordings, the ASR system can achieve 72% of accuracy that corresponds to 28% of word error rate (WER). The testing was done using 20 hours of audio data. Despite the implementation of DNN, the system shows a low accuracy owing to the varieties of noises, accent and dialect that typically occurs in Malaysian call centre environment. This significant variation of speakers is reflected by the large standard deviation of the average word error rate (WERav) (i.e., ~ 10%). It is observed that the lowest WER (13.8%) was obtained from recording sample with a standard Malay dialect (central Malaysia) of native speaker as compared to 49% of the sample with the highest WER that contains conversation of the speaker that uses non-standard Malay dialect.Keywords: conversational speech recognition, deep neural network, Malay language, speech recognition
Procedia PDF Downloads 3212122 Local Image Features Emerging from Brain Inspired Multi-Layer Neural Network
Authors: Hui Wei, Zheng Dong
Abstract:
Object recognition has long been a challenging task in computer vision. Yet the human brain, with the ability to rapidly and accurately recognize visual stimuli, manages this task effortlessly. In the past decades, advances in neuroscience have revealed some neural mechanisms underlying visual processing. In this paper, we present a novel model inspired by the visual pathway in primate brains. This multi-layer neural network model imitates the hierarchical convergent processing mechanism in the visual pathway. We show that local image features generated by this model exhibit robust discrimination and even better generalization ability compared with some existing image descriptors. We also demonstrate the application of this model in an object recognition task on image data sets. The result provides strong support for the potential of this model.Keywords: biological model, feature extraction, multi-layer neural network, object recognition
Procedia PDF Downloads 5412121 Development of a Sequential Multimodal Biometric System for Web-Based Physical Access Control into a Security Safe
Authors: Babatunde Olumide Olawale, Oyebode Olumide Oyediran
Abstract:
The security safe is a place or building where classified document and precious items are kept. To prevent unauthorised persons from gaining access to this safe a lot of technologies had been used. But frequent reports of an unauthorised person gaining access into security safes with the aim of removing document and items from the safes are pointers to the fact that there is still security gap in the recent technologies used as access control for the security safe. In this paper we try to solve this problem by developing a multimodal biometric system for physical access control into a security safe using face and voice recognition. The safe is accessed by the combination of face and speech pattern recognition and also in that sequential order. User authentication is achieved through the use of camera/sensor unit and a microphone unit both attached to the door of the safe. The user face was captured by the camera/sensor while the speech was captured by the use of the microphone unit. The Scale Invariance Feature Transform (SIFT) algorithm was used to train images to form templates for the face recognition system while the Mel-Frequency Cepitral Coefficients (MFCC) algorithm was used to train the speech recognition system to recognise authorise user’s speech. Both algorithms were hosted in two separate web based servers and for automatic analysis of our work; our developed system was simulated in a MATLAB environment. The results obtained shows that the developed system was able to give access to authorise users while declining unauthorised person access to the security safe.Keywords: access control, multimodal biometrics, pattern recognition, security safe
Procedia PDF Downloads 3322120 The Reflexive Interaction in Group Formal Practices: The Question of Criteria and Instruments for the Character-Skills Evaluation
Authors: Sara Nosari
Abstract:
In the research field on adult education, the learning development project followed different itineraries: recently it has promoted adult transformation by practices focused on the reflexive oriented interaction. This perspective, that connects life stories and life-based methods, characterizes a transformative space between formal and informal education. Within this framework, in the Nursing Degree Courses of Turin University, it has been discussed and realized a formal reflexive path on the care work professional identity through group practices. This path compared the future care professionals with possible experiences staged by texts used with the function of a pre-tests: these texts, setting up real or believable professional situations, had the task to start a reflection on the different 'elements' of care work professional life (relationship, educational character of relationship, relationship between different care roles; or even human identity, aims and ultimate aim of care, …). The learning transformative aspect of this kind of experience-test is that it is impossible to anticipate the process or the conclusion of reflexion because they depend on two main conditions: the personal sensitivity and the specific situation. The narrated experience is not a device, it does not include any tricks to understand the answering advance; the text is not aimed at deepening the knowledge, but at being an active and creative force which takes the group to compare with problematic figures. In fact, the experience-text does not have the purpose to explain but to problematize: it creates a space of suspension to live for questioning, for discussing, for researching, for deciding. It creates a space 'open' and 'in connection' where each one, in comparing with others, has the possibility to build his/her position. In this space, everyone has to possibility to expose his/her own argumentations and to be aware of the others emerged points of view, aiming to research and find the own personal position. However, to define his/her position, it is necessary to learn to exercise character skills (conscientiousness, motivation, creativity, critical thinking, …): if these not-cognitive skills have an undisputed evidence, less evident is how to value them. The paper will reflect on the epistemological limits and possibility to 'measure' character skills, suggesting some evaluation criteria.Keywords: transformative learning, educational role, formal/informal education, character-skills
Procedia PDF Downloads 1932119 Getting Back Out There Looking like That: A Visual Critique of Rebecca Welton’s Costuming in Reference to Female Representation in Television
Authors: Abigail R. Gardner
Abstract:
With the rise of big budget television comes a demand for more nuanced characters. However, female characters are often underdeveloped, especially those who do not fit neatly into societal norms. This study examines how Ted Lasso’s Rebecca Welton challenges this idea by using her on-screen fashion to mirror her motivations and character development. Through detailed analysis, this research explores how Rebecca’s wardrobe adds depth to her character, contrasting traditional strategies of costuming female characters in mainstream movies and television. While women, especially older women, are getting more screen time, very few have been given a wardrobe to reflect their dynamic characters. Rebecca’s costumes represent a form of visual storytelling typically reserved for film, but with the rise of single-camera television, there is an opportunity to redefine the relationship between women and fashion on screen.Keywords: costume design, gender and media, visual storytelling, women in television
Procedia PDF Downloads 162118 The Combination of the Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), JITTER and SHIMMER Coefficients for the Improvement of Automatic Recognition System for Dysarthric Speech
Authors: Brahim-Fares Zaidi, Malika Boudraa, Sid-Ahmed Selouani
Abstract:
Our work aims to improve our Automatic Recognition System for Dysarthria Speech (ARSDS) based on the Hidden Models of Markov (HMM) and the Hidden Markov Model Toolkit (HTK) to help people who are sick. With pronunciation problems, we applied two techniques of speech parameterization based on Mel Frequency Cepstral Coefficients (MFCC's) and Perceptual Linear Prediction (PLP's) and concatenated them with JITTER and SHIMMER coefficients in order to increase the recognition rate of a dysarthria speech. For our tests, we used the NEMOURS database that represents speakers with dysarthria and normal speakers.Keywords: hidden Markov model toolkit (HTK), hidden models of Markov (HMM), Mel-frequency cepstral coefficients (MFCC), perceptual linear prediction (PLP’s)
Procedia PDF Downloads 1602117 A Two-Stage Adaptation towards Automatic Speech Recognition System for Malay-Speaking Children
Authors: Mumtaz Begum Mustafa, Siti Salwah Salim, Feizal Dani Rahman
Abstract:
Recently, Automatic Speech Recognition (ASR) systems were used to assist children in language acquisition as it has the ability to detect human speech signal. Despite the benefits offered by the ASR system, there is a lack of ASR systems for Malay-speaking children. One of the contributing factors for this is the lack of continuous speech database for the target users. Though cross-lingual adaptation is a common solution for developing ASR systems for under-resourced language, it is not viable for children as there are very limited speech databases as a source model. In this research, we propose a two-stage adaptation for the development of ASR system for Malay-speaking children using a very limited database. The two stage adaptation comprises the cross-lingual adaptation (first stage) and cross-age adaptation. For the first stage, a well-known speech database that is phonetically rich and balanced, is adapted to the medium-sized Malay adults using supervised MLLR. The second stage adaptation uses the speech acoustic model generated from the first adaptation, and the target database is a small-sized database of the target users. We have measured the performance of the proposed technique using word error rate, and then compare them with the conventional benchmark adaptation. The two stage adaptation proposed in this research has better recognition accuracy as compared to the benchmark adaptation in recognizing children’s speech.Keywords: Automatic Speech Recognition System, children speech, adaptation, Malay
Procedia PDF Downloads 3962116 Facial Expression Phoenix (FePh): An Annotated Sequenced Dataset for Facial and Emotion-Specified Expressions in Sign Language
Authors: Marie Alaghband, Niloofar Yousefi, Ivan Garibay
Abstract:
Facial expressions are important parts of both gesture and sign language recognition systems. Despite the recent advances in both fields, annotated facial expression datasets in the context of sign language are still scarce resources. In this manuscript, we introduce an annotated sequenced facial expression dataset in the context of sign language, comprising over 3000 facial images extracted from the daily news and weather forecast of the public tv-station PHOENIX. Unlike the majority of currently existing facial expression datasets, FePh provides sequenced semi-blurry facial images with different head poses, orientations, and movements. In addition, in the majority of images, identities are mouthing the words, which makes the data more challenging. To annotate this dataset we consider primary, secondary, and tertiary dyads of seven basic emotions of "sad", "surprise", "fear", "angry", "neutral", "disgust", and "happy". We also considered the "None" class if the image’s facial expression could not be described by any of the aforementioned emotions. Although we provide FePh as a facial expression dataset of signers in sign language, it has a wider application in gesture recognition and Human Computer Interaction (HCI) systems.Keywords: annotated facial expression dataset, gesture recognition, sequenced facial expression dataset, sign language recognition
Procedia PDF Downloads 1572115 Lip Localization Technique for Myanmar Consonants Recognition Based on Lip Movements
Authors: Thein Thein, Kalyar Myo San
Abstract:
Lip reading system is one of the different supportive technologies for hearing impaired, or elderly people or non-native speakers. For normal hearing persons in noisy environments or in conditions where the audio signal is not available, lip reading techniques can be used to increase their understanding of spoken language. Hearing impaired persons have used lip reading techniques as important tools to find out what was said by other people without hearing voice. Thus, visual speech information is important and become active research area. Using visual information from lip movements can improve the accuracy and robustness of a speech recognition system and the need for lip reading system is ever increasing for every language. However, the recognition of lip movement is a difficult task because of the region of interest (ROI) is nonlinear and noisy. Therefore, this paper proposes method to detect the accurate lips shape and to localize lip movement towards automatic lip tracking by using the combination of Otsu global thresholding technique and Moore Neighborhood Tracing Algorithm. Proposed method shows how accurate lip localization and tracking which is useful for speech recognition. In this work of study and experiments will be carried out the automatic lip localizing the lip shape for Myanmar consonants using the only visual information from lip movements which is useful for visual speech of Myanmar languages.Keywords: lip reading, lip localization, lip tracking, Moore neighborhood tracing algorithm
Procedia PDF Downloads 3522114 Using AI Based Software as an Assessment Aid for University Engineering Assignments
Authors: Waleed Al-Nuaimy, Luke Anastassiou, Manjinder Kainth
Abstract:
As the process of teaching has evolved with the advent of new technologies over the ages, so has the process of learning. Educators have perpetually found themselves on the lookout for new technology-enhanced methods of teaching in order to increase learning efficiency and decrease ever expanding workloads. Shortly after the invention of the internet, web-based learning started to pick up in the late 1990s and educators quickly found that the process of providing learning material and marking assignments could change thanks to the connectivity offered by the internet. With the creation of early web-based virtual learning environments (VLEs) such as SPIDER and Blackboard, it soon became apparent that VLEs resulted in higher reported computer self-efficacy among students, but at the cost of students being less satisfied with the learning process . It may be argued that the impersonal nature of VLEs, and their limited functionality may have been the leading factors contributing to this reported dissatisfaction. To this day, often faced with the prospects of assigning colossal engineering cohorts their homework and assessments, educators may frequently choose optimally curated assessment formats, such as multiple-choice quizzes and numerical answer input boxes, so that automated grading software embedded in the VLEs can save time and mark student submissions instantaneously. A crucial skill that is meant to be learnt during most science and engineering undergraduate degrees is gaining the confidence in using, solving and deriving mathematical equations. Equations underpin a significant portion of the topics taught in many STEM subjects, and it is in homework assignments and assessments that this understanding is tested. It is not hard to see that this can become challenging if the majority of assignment formats students are engaging with are multiple-choice questions, and educators end up with a reduced perspective of their students’ ability to manipulate equations. Artificial intelligence (AI) has in recent times been shown to be an important consideration for many technologies. In our paper, we explore the use of new AI based software designed to work in conjunction with current VLEs. Using our experience with the software, we discuss its potential to solve a selection of problems ranging from impersonality to the reduction of educator workloads by speeding up the marking process. We examine the software’s potential to increase learning efficiency through its features which claim to allow more customized and higher-quality feedback. We investigate the usability of features allowing students to input equation derivations in a range of different forms, and discuss relevant observations associated with these input methods. Furthermore, we make ethical considerations and discuss potential drawbacks to the software, including the extent to which optical character recognition (OCR) could play a part in the perpetuation of errors and create disagreements between student intent and their submitted assignment answers. It is the intention of the authors that this study will be useful as an example of the implementation of AI in a practical assessment scenario insofar as serving as a springboard for further considerations and studies that utilise AI in the setting and marking of science and engineering assignments.Keywords: engineering education, assessment, artificial intelligence, optical character recognition (OCR)
Procedia PDF Downloads 1222113 Fusion of Finger Inner Knuckle Print and Hand Geometry Features to Enhance the Performance of Biometric Verification System
Authors: M. L. Anitha, K. A. Radhakrishna Rao
Abstract:
With the advent of modern computing technology, there is an increased demand for developing recognition systems that have the capability of verifying the identity of individuals. Recognition systems are required by several civilian and commercial applications for providing access to secured resources. Traditional recognition systems which are based on physical identities are not sufficiently reliable to satisfy the security requirements due to the use of several advances of forgery and identity impersonation methods. Recognizing individuals based on his/her unique physiological characteristics known as biometric traits is a reliable technique, since these traits are not transferable and they cannot be stolen or lost. Since the performance of biometric based recognition system depends on the particular trait that is utilized, the present work proposes a fusion approach which combines Inner knuckle print (IKP) trait of the middle, ring and index fingers with the geometrical features of hand. The hand image captured from a digital camera is preprocessed to find finger IKP as region of interest (ROI) and hand geometry features. Geometrical features are represented as the distances between different key points and IKP features are extracted by applying local binary pattern descriptor on the IKP ROI. The decision level AND fusion was adopted, which has shown improvement in performance of the combined scheme. The proposed approach is tested on the database collected at our institute. Proposed approach is of significance since both hand geometry and IKP features can be extracted from the palm region of the hand. The fusion of these features yields a false acceptance rate of 0.75%, false rejection rate of 0.86% for verification tests conducted, which is less when compared to the results obtained using individual traits. The results obtained confirm the usefulness of proposed approach and suitability of the selected features for developing biometric based recognition system based on features from palmar region of hand.Keywords: biometrics, hand geometry features, inner knuckle print, recognition
Procedia PDF Downloads 2192112 Hand Symbol Recognition Using Canny Edge Algorithm and Convolutional Neural Network
Authors: Harshit Mittal, Neeraj Garg
Abstract:
Hand symbol recognition is a pivotal component in the domain of computer vision, with far-reaching applications spanning sign language interpretation, human-computer interaction, and accessibility. This research paper discusses the approach with the integration of the Canny Edge algorithm and convolutional neural network. The significance of this study lies in its potential to enhance communication and accessibility for individuals with hearing impairments or those engaged in gesture-based interactions with technology. In the experiment mentioned, the data is manually collected by the authors from the webcam using Python codes, to increase the dataset augmentation, is applied to original images, which makes the model more compatible and advanced. Further, the dataset of about 6000 coloured images distributed equally in 5 classes (i.e., 1, 2, 3, 4, 5) are pre-processed first to gray images and then by the Canny Edge algorithm with threshold 1 and 2 as 150 each. After successful data building, this data is trained on the Convolutional Neural Network model, giving accuracy: 0.97834, precision: 0.97841, recall: 0.9783, and F1 score: 0.97832. For user purposes, a block of codes is built in Python to enable a window for hand symbol recognition. This research, at its core, seeks to advance the field of computer vision by providing an advanced perspective on hand sign recognition. By leveraging the capabilities of the Canny Edge algorithm and convolutional neural network, this study contributes to the ongoing efforts to create more accurate, efficient, and accessible solutions for individuals with diverse communication needs.Keywords: hand symbol recognition, computer vision, Canny edge algorithm, convolutional neural network
Procedia PDF Downloads 632111 The Relationship between Friedrich Nietzsche’s Dream and Intoxication: Through Analyzing the “Steppenwolf” by Hermann Hesse
Authors: Mengjie Liu
Abstract:
This essay mainly analyses the representation of the Apollo and Dionysus spirits in Hermann Hesse’s novel “Steppenwolf.” This analysis adopts a theoretical approach based on Fredrich Nietzsche’s theory of the two separate art worlds, dream and intoxication, which corresponds to the two art deities, Apollo and Dionysus. The essay will discuss Friedrich Nietzsche’s art and aesthetic theory of dream and intoxication in the first part. Then the essay will elaborate on the representation of the Apollo spirit and dream in “Steppenwolf” in the second section from two aspects: (1) Harry Haller’s (the main character) self-recognition and semblance with Hermina. (2) The realization of Hermina’s prophecy of the dream. Then the essay will analyze the representation of the Dionysus spirit and the intoxication in the third part by demonstrating Harry Haller’s self-forgetting and melting into the crowd. The essay will combine the two spirits in the fourth section and discuss the relationship between dream and intoxication as the stimulator (dream) and the realizing (intoxication). This essay takes Nietzsche’s theory as the basic foundation while also drawing sources from psychological analysis theories and other literature sources.Keywords: dream, intoxication, Nietzsche, Steppenwolf
Procedia PDF Downloads 1472110 Multimodal Database of Emotional Speech, Video and Gestures
Authors: Tomasz Sapiński, Dorota Kamińska, Adam Pelikant, Egils Avots, Cagri Ozcinar, Gholamreza Anbarjafari
Abstract:
People express emotions through different modalities. Integration of verbal and non-verbal communication channels creates a system in which the message is easier to understand. Expanding the focus to several expression forms can facilitate research on emotion recognition as well as human-machine interaction. In this article, the authors present a Polish emotional database composed of three modalities: facial expressions, body movement and gestures, and speech. The corpora contains recordings registered in studio conditions, acted out by 16 professional actors (8 male and 8 female). The data is labeled with six basic emotions categories, according to Ekman’s emotion categories. To check the quality of performance, all recordings are evaluated by experts and volunteers. The database is available to academic community and might be useful in the study on audio-visual emotion recognition.Keywords: body movement, emotion recognition, emotional corpus, facial expressions, gestures, multimodal database, speech
Procedia PDF Downloads 3482109 Two Brazilian Medeas: The Cases of Mata Teu Pai and Medeia Negra
Authors: Jaqueline Bohn Donada
Abstract:
The significance of Euripides’ Medea for contemporary literature is noticeable. Even if the bulk of Classical Reception studies does not tend to look carefully and consistently to the literature produced outside the Anglophone world, Brazilian literature offers abundant materials for such studies. Indeed, a certain Classical background can be observed in Brazilian literature at least since 1975 when Gota d’Água [The Final Straw, in English], a play that recreates the story of Medea and sets it in a favela in Rio de Janeiro. Also worthy of notice is Ivo Bender’s Trilogia Perversa [Perverse Trilogy, in English], a series of three historical plays set in Southern Brazil and based on Aeschylus’ Oresteia and on Euripides’ Iphigenia in Aulis published in the 1980s. Since then, a number of works directly inspired by the plays of Aeschylus, Sophocles and Euripides have been published, not to mention several adaptations of Homer’s two epic poems. This paper proposes a comparative analysis of two such works: Grace Passô’s 2017 play Mata teu Pai [Kill your father, in English] and Marcia Lima’s 2019 play Medeia Negra [Black Medea, in English] from the perspective of Classical Reception Studies in an intersection with feminist literary criticism. The paper intends to look at the endurance of Euripides’ character in contemporary Brazilian literature with a focus on how the character seems to have acquired special relevance to the treatment of pressing issues of the twenty-first century. Whereas Grace Passô’s play sets Medea at the center of a group of immigrant women, Marcia Limma has the character enact the dilemmas of incarcerated women in Brazil. The hypothesis that this research aims at testing is that both artists preserve the pathos of Euripides’s original character at the same time that they recreate his Medea in concrete circumstances of Brazilian contemporary social reality. At the end, the research aims at stating the significance of the Medea theme to contemporary Brazilian literature.Keywords: Euripides, Medea, Grace Passô, Marcia Limma, Brazilian literature
Procedia PDF Downloads 1302108 An Approach for Vocal Register Recognition Based on Spectral Analysis of Singing
Authors: Aleksandra Zysk, Pawel Badura
Abstract:
Recognizing and controlling vocal registers during singing is a difficult task for beginner vocalist. It requires among others identifying which part of natural resonators is being used when a sound propagates through the body. Thus, an application has been designed allowing for sound recording, automatic vocal register recognition (VRR), and a graphical user interface providing real-time visualization of the signal and recognition results. Six spectral features are determined for each time frame and passed to the support vector machine classifier yielding a binary decision on the head or chest register assignment of the segment. The classification training and testing data have been recorded by ten professional female singers (soprano, aged 19-29) performing sounds for both chest and head register. The classification accuracy exceeded 93% in each of various validation schemes. Apart from a hard two-class clustering, the support vector classifier returns also information on the distance between particular feature vector and the discrimination hyperplane in a feature space. Such an information reflects the level of certainty of the vocal register classification in a fuzzy way. Thus, the designed recognition and training application is able to assess and visualize the continuous trend in singing in a user-friendly graphical mode providing an easy way to control the vocal emission.Keywords: classification, singing, spectral analysis, vocal emission, vocal register
Procedia PDF Downloads 3002107 ViraPart: A Text Refinement Framework for Automatic Speech Recognition and Natural Language Processing Tasks in Persian
Authors: Narges Farokhshad, Milad Molazadeh, Saman Jamalabbasi, Hamed Babaei Giglou, Saeed Bibak
Abstract:
The Persian language is an inflectional subject-object-verb language. This fact makes Persian a more uncertain language. However, using techniques such as Zero-Width Non-Joiner (ZWNJ) recognition, punctuation restoration, and Persian Ezafe construction will lead us to a more understandable and precise language. In most of the works in Persian, these techniques are addressed individually. Despite that, we believe that for text refinement in Persian, all of these tasks are necessary. In this work, we proposed a ViraPart framework that uses embedded ParsBERT in its core for text clarifications. First, used the BERT variant for Persian followed by a classifier layer for classification procedures. Next, we combined models outputs to output cleartext. In the end, the proposed model for ZWNJ recognition, punctuation restoration, and Persian Ezafe construction performs the averaged F1 macro scores of 96.90%, 92.13%, and 98.50%, respectively. Experimental results show that our proposed approach is very effective in text refinement for the Persian language.Keywords: Persian Ezafe, punctuation, ZWNJ, NLP, ParsBERT, transformers
Procedia PDF Downloads 213