Search results for: voice recognition
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2033

1913 Investigation of New Gait Representations for Improving Gait Recognition

Authors: Chirawat Wattanapanich, Hong Wei

Abstract:

This study presents new gait representations for improving gait recognition accuracy across gait appearances such as normal walking, wearing a coat and carrying a bag. Based on the Gait Energy Image (GEI), two ideas are implemented to generate new gait representations. One is to append lower-knee regions to the original GEI, and the other is to apply convolutional operations to the GEI and its variants. A set of new gait representations is created and used for training multi-class Support Vector Machines (SVMs). Tests are conducted on CASIA Dataset B. Various combinations of the gait representations are examined, with different convolutional kernel sizes and different numbers of kernels in the convolutional processes. Both entire images as features and features reduced in dimension by Principal Component Analysis (PCA) are tested in gait recognition. Interestingly, both new techniques contribute significantly to the improvement in gait recognition performance: appending the lower-knee regions to the original GEI, and the convolutional GEI. The experimental results show that the average recognition rate can be improved from 75.65% to 87.50%.

Keywords: convolutional image, lower knee, gait
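
The sketch below shows the two ideas in minimal Python. It assumes the silhouettes are already aligned, equal-sized binary images stored as nested lists. The GEI is the pixel-wise mean of the silhouettes. `append_lower_knee` and its `knee_row` cut-off are hypothetical stand-ins, because the abstract does not say how the lower-knee region is located. The kernel values are arbitrary.

```python
from typing import List

Image = List[List[float]]


def gait_energy_image(silhouettes: List[Image]) -> Image:
    """Pixel-wise mean of aligned, equal-sized binary silhouettes (the GEI)."""
    n = len(silhouettes)
    h, w = len(silhouettes[0]), len(silhouettes[0][0])
    return [[sum(s[y][x] for s in silhouettes) / n for x in range(w)] for y in range(h)]


def append_lower_knee(gei: Image, knee_row: int) -> Image:
    """Hypothetical: stack a copy of the rows below `knee_row` under the GEI."""
    return [row[:] for row in gei] + [row[:] for row in gei[knee_row:]]


def convolve2d_valid(img: Image, kernel: Image) -> Image:
    """2-D convolution with no padding ('valid' output size); the kernel is flipped."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for y in range(len(img) - kh + 1):
        out.append([
            sum(flipped[i][j] * img[y + i][x + j] for i in range(kh) for j in range(kw))
            for x in range(len(img[0]) - kw + 1)
        ])
    return out
```

Each convolved map (one per kernel) would then be flattened, optionally reduced with PCA, and passed to the multi-class SVM.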

1912 Offline Signature Verification in Punjabi Based On SURF Features and Critical Point Matching Using HMM

Authors: Rajpal Kaur, Pooja Choudhary

Abstract:

Biometrics, which refers to identifying an individual based on his or her physiological or behavioral characteristics, can reliably distinguish between an authorized person and an impostor. Signature recognition systems can be categorized as offline (static) or online (dynamic). This paper presents a SURF-feature-based offline signature recognition system that is trained with low-resolution scanned signature images. A person's signature is an important biometric attribute that can be used to authenticate human identity. Because a signature can be handled as an image, it can be recognized using computer vision and HMM techniques. With modern computers, there is a need to develop fast algorithms for signature recognition. Multiple techniques have been defined for signature recognition, and there is still considerable scope for research. In this paper, offline (static) signature recognition and verification using SURF features with an HMM is proposed, where the signature is captured and presented to the user in image format. Signatures are verified based on parameters extracted from the signature using various image processing techniques. The offline signature verification and recognition system is implemented on the MATLAB platform. The system has been tested and found suitable for its intended purpose. The proposed method performs better than other recently proposed methods.

Keywords: offline signature verification, offline signature recognition, signatures, SURF features, HMM
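
The abstract does not say how SURF critical points are turned into HMM observations. This sketch therefore assumes each signature has already been quantized into a sequence of discrete codebook symbols; that step is hypothetical and not shown. It scores the sequence against a writer's HMM with the scaled forward algorithm. The acceptance threshold is also a hypothetical parameter.

```python
import math
from typing import List, Sequence, Tuple

# (initial probs, transition matrix, emission matrix) of a discrete-output HMM
HMM = Tuple[List[float], List[List[float]], List[List[float]]]


def forward_log_likelihood(obs: Sequence[int], start: List[float],
                           trans: List[List[float]], emit: List[List[float]]) -> float:
    """Scaled forward algorithm: returns log P(obs | model)."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # propagate one step, then weight by the emission of symbol obs[t]
            alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs[t]]
                     for s in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_lik += math.log(scale)          # log P accumulates the scaling factors
        alpha = [a / scale for a in alpha]  # renormalise to avoid underflow
    return log_lik


def verify(obs: Sequence[int], model: HMM, threshold: float) -> bool:
    """Accept if the per-symbol log-likelihood under the writer's model clears a threshold."""
    return forward_log_likelihood(obs, *model) / len(obs) >= threshold
```

Normalizing by sequence length keeps the threshold comparable across signatures that yield different numbers of symbols.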

1911 The “Prologue” in Tommy Orange’s There, There: Reinventing the Introductory Section

Authors: Kristin Murray

Abstract:

This paper examines prologues in 20th- and 21st-century American literature in order to show how Native American writer Tommy Orange’s Prologue in his 2018 novel There, There is different. In an interview about the novel, Orange explains that he feels “a kind of burden to catch the general reader up with what really happened, because history has got it so wrong and still continue to” (Laubernds). Orange, thus, includes a “Prologue” in his novel to do this work, catching readers up on Native Americans and their history. Prologues are usually in the narrator’s voice, a character’s voice, or even that of a fictionalized version of the author. The tone of Orange’s “Prologue”, however, is that of a non-fictional first-person essayist. Examining prologues in American literature places Orange’s prologue outside the norm. This paper also examines other introductory sections, the preface in particular. The research and examination reveal that Orange adds his personal voice in the Prologue to the multiple narrators of the novel. His is the voice of a writer who knows that his audience comes to his novel with a plethora of misinformation. The truths he tells are horrifying and hopeful. He tells of Thanksgiving as a “land deal” and a “successful massacre,” but he also tells readers how urban Indians have found a sense of the land, even through concrete. Native American writers contributed and still contribute to the genre of autobiography in ways that have changed our understanding of this genre. This examination of Orange’s Prologue reveals a new and unexpected way to view this often under-examined introductory section, the prologue.

Keywords: native american literature, prologues, prefaces, 20th century american literature

1910 Convolutional Neural Networks-Optimized Text Recognition with Binary Embeddings for Arabic Expiry Date Recognition

Authors: Mohamed Lotfy, Ghada Soliman

Abstract:

Recognizing Arabic dot-matrix digits is a challenging problem due to the unique characteristics of dot-matrix fonts, such as irregular dot spacing and varying dot sizes. This paper presents an approach for recognizing Arabic digits printed in dot-matrix format. The proposed model is based on Convolutional Neural Networks (CNNs) that take the dot-matrix image as input. The CNNs generate embeddings that are rounded to produce binary representations of the digits. The binary embeddings are then used to perform Optical Character Recognition (OCR) on the digit images. Few dotted Arabic expiration date images are available, so we developed a TrueType Font (TTF) for generating synthetic images of Arabic dot-matrix characters. The model was trained on a synthetic dataset of 3,287 images and tested on 658 synthetic images. The images represent realistic expiration dates from 2019 to 2027 in yyyy/mm/dd format. Our model achieved an accuracy of 98.94% on expiry date recognition in Arabic dot-matrix format. It used fewer parameters and fewer computational resources than traditional CNN-based models. By investigating and presenting our findings comprehensively, we aim to contribute substantially to the field of OCR and pave the way for advancements in Arabic dot-matrix character recognition. Our proposed approach is not limited to Arabic dot-matrix digit recognition; it can also be extended to text recognition tasks such as text classification and sentiment analysis.

Keywords: computer vision, pattern recognition, optical character recognition, deep learning
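
The abstract says the CNN embeddings are rounded to binary representations, but it does not say how bits are mapped back to digits. This is a minimal sketch of one plausible decoding step. It thresholds each component at 0.5 and picks the digit whose code word is nearest in Hamming distance. Both the 4-bit `CODEBOOK` and the nearest-code rule are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical 4-bit code words for the Arabic-Indic digits ٠..٩ (binary of 0..9)
CODEBOOK: Dict[str, List[int]] = {
    d: [int(b) for b in format(i, "04b")] for i, d in enumerate("٠١٢٣٤٥٦٧٨٩")
}


def binarize(embedding: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Round each embedding component (assumed to lie in [0, 1]) to a bit."""
    return [1 if v >= threshold else 0 for v in embedding]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def decode_digit(embedding: Sequence[float], codebook: Dict[str, List[int]]) -> str:
    """Return the digit whose code word is closest in Hamming distance."""
    bits = binarize(embedding)
    return min(codebook, key=lambda d: hamming(bits, codebook[d]))
```

A full date would be read by decoding each digit crop in turn, e.g. `"".join(decode_digit(e, CODEBOOK) for e in embeddings)`. The Hamming fallback tolerates a few flipped bits.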

1909 The Impact of Vocal and Physical Attractiveness on the Employment Interview

Authors: Alexandra Roy

Abstract:

This research examines how physical and vocal attractiveness affect impressions of an applicant and whether these impressions are affected by gender or job type. Findings based on two samples indicate that individuals with less attractive voices and physical appearance were viewed as less suitable job applicants. They were also seen as possessing more negative characteristics than others. These negative impressions were pervasive and unaffected by either applicant gender or job type. Specifically, we found that job candidates with a less attractive voice or physique were perceived as more extroverted, less agreeable, less conscientious, less trustworthy, less competent, less sociable and less recruitable. Results are robust to various sensitivity checks.

Keywords: discrimination, nonverbal, hiring, attractiveness

1908 Adaptation and Validation of Voice Handicap Index in Telugu Language

Authors: B. S. Premalatha, Kausalya Sahani

Abstract:

Background: Voice is multidimensional, conveying emotion, feelings, and communication. Voice disorders have an adverse effect on the physical, emotional and functional domains of an individual. Self-rating by clients of their voice problem helps clinicians plan intervention strategies. The Voice Handicap Index (VHI) is one such self-rating scale. It contains 30 questions that quantify the functional, physical and emotional impacts of a voice disorder on a patient’s quality of life, with 10 questions in each subsection. Adapted and validated versions of the VHI are available in other Indian languages but not in Telugu. Telugu is a Dravidian language native to India, mainly spoken in Andhra Pradesh and neighbouring states in southern India. Objectives: To adapt and validate the English version of the VHI into Telugu and evaluate its internal consistency and clinical validity in the Telugu-speaking population. Materials: The study was carried out in three stages. The first stage was forward translation. The English version of the VHI was translated into Telugu by ten experts proficient in reading and writing Telugu and by five speech-language pathologists. The second stage was backward translation. The translated Telugu version was given to a different group of ten experts (proficient in reading and writing Telugu) and five speech-language pathologists. These pathologists were native Telugu speakers with good proficiency in Telugu and English. The third stage was administration of the translated Telugu version to the target population. In total, 40 clinical subjects and 40 normal controls served as participants. Each group comprised 26 males and 14 females aged 20 to 60 years. The clinical group comprised laryngectomees with tracheoesophageal puncture (n=18) and individuals with laryngitis (n=11), vocal nodules (n=7) and vocal fold palsy (n=4).
Participants were asked to rate each item on a 5-point equal-appearing interval scale (0=never, 1=almost never, 2=sometimes, 3=almost always, 4=always), giving a maximum total score of 120. Results: Statistical analysis was performed using SPSS (version 22.0.0). Mean, standard deviation and percentage (%) were calculated for all participants in both groups. Internal consistency of the Telugu VHI was found to be excellent. Consistency scores were 0.742, 0.934 and 0.938 for the physical, emotional and functional domains, respectively. For validity, scores differed significantly between the clinical and control groups on the physical, emotional and functional domains and on the total score (p < 0.001). A negative correlation was found between age and gender and the physical, emotional, functional and total scores in both the dysphonic and control groups. Conclusion: The present study indicates that the Telugu VHI can discriminate participants with voice pathology from normal populations. This makes it a valid tool for collecting information from participants about their voice.

Keywords: adaptation, Telugu Version, translation, Voice Handicap Index (VHI)
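
The scoring described above is a simple sum. There are 30 items rated 0 to 4 in three 10-item domains, giving a maximum of 120. Internal consistency of this kind is commonly reported as Cronbach's alpha. The abstract does not name the coefficient, so that is an assumption here. The sketch below computes both. `ITEM_DOMAINS` is a hypothetical item order (items 1 to 10 functional, 11 to 20 physical, 21 to 30 emotional); the real questionnaire's ordering may differ.

```python
from statistics import pvariance
from typing import Dict, List

# Hypothetical item order; map each of the 30 items to its domain
ITEM_DOMAINS: List[str] = ["functional"] * 10 + ["physical"] * 10 + ["emotional"] * 10


def vhi_scores(responses: List[int], item_domains: List[str]) -> Dict[str, int]:
    """Sum 30 ratings (0-4) into three 10-item subscales plus a total (max 120)."""
    if len(responses) != 30 or any(not 0 <= r <= 4 for r in responses):
        raise ValueError("expected 30 ratings on the 0-4 scale")
    scores = {"functional": 0, "physical": 0, "emotional": 0}
    for rating, domain in zip(responses, item_domains):
        scores[domain] += rating
    scores["total"] = sum(responses)
    return scores


def cronbach_alpha(item_scores: List[List[int]]) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals).

    item_scores[p][i] is participant p's rating on item i.
    """
    k = len(item_scores[0])
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Alpha would be computed separately over the 10 items of each domain to obtain per-domain coefficients like those reported.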

1907 Recognition of Grocery Products in Images Captured by Cellular Phones

Authors: Farshideh Einsele, Hassan Foroosh

Abstract:

In this paper, we present a robust algorithm to recognize text extracted from grocery product images captured by mobile phone cameras. Recognition of such text is challenging. Text in grocery product images varies in size, orientation, style and illumination, and can suffer from perspective distortion. Pre-processing is performed to make the characters scale- and rotation-invariant. Text degradations cannot be appropriately modeled by well-known geometric transformations such as translation, rotation, affine transformation and shearing. We therefore use all of a character's black pixels as our feature vector. Classification is performed with a minimum distance classifier using the maximum likelihood criterion, which delivers a promising character recognition rate (CRR) of 89%. We achieve a considerably higher word recognition rate (WRR) of 99% when lower-level linguistic knowledge about product words is used during recognition.

Keywords: camera-based OCR, feature extraction, document, image processing, grocery products
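
A minimum distance classifier coincides with the maximum likelihood decision when classes are Gaussian with equal priors and a shared isotropic covariance. That standard assumption is not stated in the abstract. Below is a minimal sketch, assuming each character is already a normalized, flattened black-pixel vector. `correct_word` is a hypothetical illustration of word-level lexicon knowledge, since the abstract does not describe that step.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def class_means(samples: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Mean feature vector (e.g. flattened black-pixel map) for each character class."""
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in samples.items()}


def classify(x: Vector, means: Dict[str, List[float]]) -> str:
    """Assign x to the class whose mean is nearest in Euclidean distance."""
    return min(means, key=lambda c: math.dist(x, means[c]))


def correct_word(word: str, lexicon: List[str]) -> str:
    """Hypothetical word-level step: snap to the same-length lexicon entry with the
    most matching characters, or return the raw string if none has that length."""
    same_len = [w for w in lexicon if len(w) == len(word)]
    if not same_len:
        return word
    return max(same_len, key=lambda w: sum(a == b for a, b in zip(w, word)))
```

The word-level step is where a small product vocabulary can repair isolated character errors. That mechanism is consistent with the gap between the reported CRR and WRR, though the paper's exact method may differ.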

1906 A Comprehensive Methodology for Voice Segmentation of Large Sets of Speech Files Recorded in Naturalistic Environments

Authors: Ana Londral, Burcu Demiray, Marcus Cheetham

Abstract:

Speech recording is a methodology used in many different studies related to cognitive and behavioural research. Modern advances in digital equipment have made it possible to continuously record hours of speech in naturalistic environments and build rich sets of sound files. Speech analysis can then extract multiple features from these files for different scopes of research in language and communication. However, tools for analysing a large set of sound files and automatically extracting relevant features are often inaccessible to researchers who are not familiar with programming languages. Manual analysis is a common alternative, but at a high cost in time and efficiency. In the analysis of long sound files, the first step is voice segmentation, i.e., detecting and labelling segments containing speech. We present a comprehensive methodology to support researchers in voice segmentation as the first step of data analysis for a large set of sound files. Praat, an open-source program, is suggested as a tool for working on a folder structure that contains a large number of sound files. It can run a voice detection algorithm, label segments and files, and extract other quantitative features. We validate our methodology with a set of 5,000 sound files collected in the daily life of a group of volunteer participants aged over 65. A smartphone was used to collect sound with the Electronically Activated Recorder (EAR), an app programmed to record 30-second sound samples randomly distributed throughout the day. Results demonstrated that automatic segmentation and labelling of files containing speech segments was 74% faster than manual analysis performed by two independent coders. Furthermore, the methodology allows manual adjustment of voiced segments with visualisation of the sound signal, and the automatic extraction of quantitative information on speech.
In conclusion, we propose a comprehensive methodology for voice segmentation for researchers who work with large sets of sound files and are not familiar with programming tools.

Keywords: automatic speech analysis, behavior analysis, naturalistic environments, voice segmentation
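
The methodology runs its voice detection inside Praat. The sketch below illustrates what such a detector does; it is not Praat's own algorithm. It is a minimal energy-threshold segmenter that works on mono samples already read into floats. The 20 ms frame, the -25 dB threshold relative to the loudest frame, and the 0.1 s minimum segment length are hypothetical defaults.

```python
import math
from typing import List, Sequence, Tuple


def detect_speech(samples: Sequence[float], rate: int, frame_s: float = 0.02,
                  threshold_db: float = -25.0, min_speech_s: float = 0.1
                  ) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of segments whose frame RMS level is
    within `threshold_db` of the loudest frame, dropping very short segments."""
    n = max(1, int(rate * frame_s))
    levels = []
    for i in range(0, len(samples) - n + 1, n):
        frame = samples[i:i + n]
        rms = math.sqrt(sum(v * v for v in frame) / n)
        levels.append(20 * math.log10(rms) if rms > 0 else -math.inf)
    if not levels:
        return []
    peak = max(levels)
    segments, start = [], None
    # a trailing -inf sentinel closes any segment still open at the end
    for idx, level in enumerate(levels + [-math.inf]):
        voiced = level > -math.inf and level >= peak + threshold_db
        if voiced and start is None:
            start = idx
        elif not voiced and start is not None:
            t0, t1 = start * n / rate, idx * n / rate
            if t1 - t0 >= min_speech_s:
                segments.append((t0, t1))
            start = None
    return segments
```

Applied to each 30-second EAR clip in a folder, a file could be labelled as containing speech whenever the returned list is non-empty. The segments would then be reviewed manually as the methodology describes.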

1905 Vision-Based Hand Segmentation Techniques for Human-Computer Interaction

Authors: M. Jebali, M. Jemni

Abstract:

This work is part of a vision-based hand gesture recognition system for natural human-computer interfaces. Hand tracking and segmentation are the primary steps of any hand gesture recognition system. This paper aims to develop a robust and efficient hand segmentation algorithm whose output serves as input to another system. That system attempts to bring HCI performance closer to human-human interaction by modeling an intelligent sign language recognition system. The recognizer is based on prediction in the context of a dialogue between the system (an avatar) and the interlocutor. For hand segmentation, an approach that overcomes occlusion is proposed to improve the detection of the hand in an image.

Keywords: HCI, sign language recognition, object tracking, hand segmentation

1904 An Erudite Technique for Face Detection and Recognition Using Curvature Analysis

Authors: S. Jagadeesh Kumar

Abstract:

Face detection and recognition is an important technology for image database management, video surveillance, and human-computer interfaces (HCI). Face recognition is a rapidly emerging method that has been extensively deployed in forensics, for example in criminal identification, secure access, and custodial security. This paper proposes a technique using curvature analysis (CA). The technique has a lower false-positive rate and operates under different lighting conditions. It also removes the artifacts introduced during image acquisition using the ring correction in polar coordinates (RCP) method. The technique uses mean and median filtering to remove the artifacts, but works in polar coordinates during image acquisition. Experimental results for face detection and recognition confirm good performance even under diagonal orientation and pose variation.

Keywords: curvature analysis, ring correction in polar coordinate method, face detection, face recognition, human computer interaction

1903 The Effects of Culture and Language on Social Impression Formation from Voice Pleasantness: A Study with French and Iranian People

Authors: L. Bruckert, A. Mansourzadeh

Abstract:

The voice has a major influence on interpersonal communication in everyday life via the perception of pleasantness. The evolutionary perspective postulates that the mechanisms underlying pleasantness judgments are universal adaptations. On this view, they evolved in the service of choosing a mate, through the process of sexual selection. The preferred voices would then be those with more marked sexually dimorphic characteristics, for example lower pitch in men, with pitch as the main criterion. Alternatively, the mechanisms involved may be gradually established from childhood through exposure to the environment. In that case, prosodic elements could take precedence in everyday communication. Prosody conveys information about the speaker's attitude, such as willingness to communicate and interest in the interlocutors. Our study focuses on voice pleasantness and its relationship with social impression formation, exploring both spectral aspects (pitch, timbre) and prosodic ones. We recorded two vocal corpora (five vowels and a read text) from 25 French males speaking French and 25 Iranian males speaking Farsi. French listeners (40 male/40 female) listened to the French voices. They judged either the voice's pleasantness or the speaker's intelligence, honesty and sociability. Regression analyses of our acoustic measures showed that prosodic elements, for example intonation and speech rate, are the most important criteria for pleasantness. This held whatever the corpus or the listener's gender. Moreover, correlation analyses showed that the speakers whose voices were judged the most pleasant were considered the most intelligent, sociable, and honest.
The Farsi voices were judged by 80 other French listeners (40 male/40 female). With the «vowel» corpus, we found the same effect of intonation on pleasantness judgments. With the «text» corpus, pitch was more important than prosody. This may suggest that voice perception contains some elements that are invariant across cultures and languages, whereas others are influenced by the listener's cultural and linguistic background. In the near future, Iranian listeners will be asked to produce the same judgments as the French listeners. Half of them will hear the French voices and the other half the Farsi voices. This experimental design could make it possible to distinguish what is linked to culture from what is linked to language when differences in voice perception are found.

Keywords: cross-cultural psychology, impression formation, pleasantness, voice perception

1902 Android-Based Wireless Electronic Stethoscope

Authors: Aw Adi Arryansyah

Abstract:

Using an electronic stethoscope to detect heart and breath sounds is an effective way to investigate cardiovascular diseases. At the same time, technology is moving towards mobile. Almost everyone has a smartphone, smartphones run on many platforms, and creating mobile applications has become easier. HTML5 technology can also be used to create mobile apps. Android is the most widely used platform, which motivated us to build a wireless electronic stethoscope based on Android. The Android-based wireless electronic stethoscope has a simple design. A sound sensor mounted on a membrane is connected to a Bluetooth module, which sends the heart auscultation audio to an Android device. On the software side, the Android app reads the audio input, renders it as a visualization, and plays the audio output at an adjustable volume. The heartbeat sound can also be converted into beats per minute (BPM). The BPM value can then be analyzed to identify a normal rate, bradycardia or tachycardia.

Keywords: wireless, HTML 5, auscultation, bradycardia, tachycardia
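
The BPM step reduces to simple arithmetic on beat onset times. This sketch assumes beat onsets (e.g. S1 sounds) have already been detected in the audio; that detection is not shown. It applies the conventional adult resting thresholds: below 60 BPM is bradycardia and above 100 BPM is tachycardia. The abstract does not state which thresholds the app uses.

```python
from typing import Sequence


def bpm_from_beats(beat_times_s: Sequence[float]) -> float:
    """Average heart rate from successive beat onset times given in seconds."""
    if len(beat_times_s) < 2:
        raise ValueError("need at least two beats")
    intervals = [b - a for a, b in zip(beat_times_s, beat_times_s[1:])]
    return 60.0 / (sum(intervals) / len(intervals))


def classify_rate(bpm: float, low: float = 60.0, high: float = 100.0) -> str:
    """Label a resting heart rate using conventional adult thresholds."""
    if bpm < low:
        return "bradycardia"
    if bpm > high:
        return "tachycardia"
    return "normal"
```

For example, beats every 0.5 s give 60 / 0.5 = 120 BPM, which this rule labels as tachycardia.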

1901 An Evaluation of Neural Network Efficacies for Image Recognition on Edge-AI Computer Vision Platform

Authors: Jie Zhao, Meng Su

Abstract:

Image recognition is one of the most critical technologies in computer vision. It helps machines such as robots understand a scene and, if deployed appropriately, could revolutionize remote sensing and industrial automation. With the development of AI, many sophisticated neural networks have been developed for image recognition. The hardware platforms that run these networks are as crucial as the networks themselves, yet they need closer attention as research subjects. The choice of platform largely determines how well each neural network performs at recognition. In this paper, three computer vision platforms are explored: a Jetson Nano (4 GB), a standalone laptop (RTX 3000s, using CUDA), and Google Colab (web-based, using a GPU). Four prominent neural network architectures are investigated: AlexNet, VGG (16/19), GoogleNet, and ResNet (18/34/50). Each pairing of platform and network is evaluated on recognition accuracy and time efficiency. Our case study uses public ImageNet data. The findings provide a nuanced perspective on optimizing image recognition tasks across Edge-AI platforms. They offer guidance on selecting appropriate neural network structures to maximize performance under hardware constraints.

Keywords: AlexNet, VGG, GoogleNet, ResNet, Jetson Nano, CUDA, COCO-NET, CIFAR-10, ImageNet Large Scale Visual Recognition Challenge (ILSVRC), Google Colab
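
The pairwise evaluation can be organized around a small harness that reports top-1 accuracy and median per-sample latency for one platform/network pair. This is a minimal sketch. `predict` is a hypothetical wrapper around whichever framework runs on each platform, and `data` is a list of (input, label) pairs. On GPU platforms, real measurements would also need the framework's device synchronization before the clock is read, which this standard-library sketch does not do.

```python
import time
from statistics import median
from typing import Callable, Dict, Sequence, Tuple, TypeVar

X = TypeVar("X")


def benchmark(predict: Callable[[X], int], data: Sequence[Tuple[X, int]],
              warmup: int = 3) -> Dict[str, float]:
    """Top-1 accuracy and median per-sample latency for one platform/model pair."""
    for x, _ in data[:warmup]:
        predict(x)  # warm-up runs absorb one-off costs such as lazy initialisation
    latencies, correct = [], 0
    for x, label in data:
        t0 = time.perf_counter()
        pred = predict(x)
        latencies.append(time.perf_counter() - t0)
        correct += int(pred == label)
    return {"accuracy": correct / len(data), "median_latency_s": median(latencies)}
```

Running the same `data` through `benchmark` on each platform for each network yields the accuracy/time grid the study compares. The median is less sensitive than the mean to occasional scheduling stalls.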

1900 Deep Learning Based Unsupervised Sport Scene Recognition and Highlights Generation

Authors: Ksenia Meshkova

Abstract:

With the increasing amount of multimedia data, it is very important to automate and speed up the process of obtaining metadata. This means recognizing entire scenes rather than separate frames, not just an object or its movement, with timeline segmentation as the final result. Labeling datasets is time-consuming, and attributing characteristics to particular scenes is inherently difficult. In this article, we will consider the application of autoencoders to unsupervised scene recognition and clustering based on interpretable features. We will then focus on the particular types of autoencoders that are relevant to our study. We will look at the specifics of deep learning related to information theory and rate-distortion theory. We will also describe solutions that address the poor interpretability of deep learning in media content processing. Finally, we will present results from a custom autoencoder-based framework capable of scene recognition, as studied in depth above, which generates highlights from this recognition. We will not give a detailed mathematical description of how neural networks work. Instead, we will clarify the necessary concepts and pay attention to important nuances.

Keywords: neural networks, computer vision, representation learning, autoencoders
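
The step from unsupervised recognition to a segmented timeline can be illustrated by clustering per-frame embeddings and collapsing runs of identical cluster labels. This sketch assumes the embeddings come from an autoencoder's encoder (not shown). Plain k-means, initialized with the first k points, is an assumed choice, because the abstract does not name its clustering method.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(points: List[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Plain k-means initialised with the first k points; returns one label per point."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid in Euclidean distance
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # update step: move each non-empty cluster's centroid to its members' mean
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def timeline_segments(labels: List[int], fps: float) -> List[Tuple[float, float, int]]:
    """Collapse runs of identical frame labels into (start_s, end_s, scene) segments."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start / fps, i / fps, labels[start]))
            start = i
    return segments
```

Highlights could then be selected from the segments whose cluster corresponds to an interesting scene type. How clusters map to scene types is left open here.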

1899 A Weighted Approach to Unconstrained Iris Recognition

Authors: Yao-Hong Tsai

Abstract:

This paper presents a weighted approach to unconstrained iris recognition. Commercial systems are usually characterized by strong acquisition constraints based on the subject’s cooperation. However, such cooperation is not always achievable in real scenarios in daily life. Researchers have focused on reducing these constraints while maintaining system performance through new techniques. To cope with large variation in the environment, the proposed iris recognition system introduces two main improvements. First, statistics-based illumination normalization is applied to the eye region to handle extremely uneven lighting. This increases the accuracy of the iris features. The iris is detected using the AdaBoost algorithm. Second, weights are defined by Gaussian functions of the distance to the center of the iris. A local binary pattern (LBP) histogram with these weights is then used for texture classification. Experiments showed that the proposed system gives users a more flexible and feasible way to interact with the verification system through iris recognition.

Keywords: authentication, iris recognition, adaboost, local binary pattern
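
The weighting can be sketched as a 256-bin LBP histogram in which each pixel's code is weighted by exp(-d²/(2σ²)), where d is the pixel's distance to the iris center. The abstract says only that Gaussian functions of that distance are used. A single isotropic Gaussian, the value of `sigma`, and the basic 8-neighbour LBP are therefore assumptions. Illumination normalization and AdaBoost detection are omitted.

```python
import math
from typing import List


def weighted_lbp_histogram(img: List[List[float]], cx: float, cy: float,
                           sigma: float) -> List[float]:
    """Normalised 256-bin LBP histogram, each pixel weighted by a Gaussian of its
    distance to the iris centre (cx, cy). Border pixels are skipped."""
    # 8 neighbours clockwise from top-left; bit i is set if that neighbour >= centre
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0.0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            c = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if img[y + dy][x + dx] >= c:
                    code |= 1 << bit
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            hist[code] += math.exp(-d2 / (2 * sigma ** 2))
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist
```

Pixels near the center dominate the histogram, while those toward the noisy periphery contribute less. Histograms from two irises would then be compared with a distance such as chi-square.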

1898 Efficient Feature Fusion for Noise Iris in Unconstrained Environment

Authors: Yao-Hong Tsai

Abstract:

This paper presents an efficient fusion algorithm for iris images that generates stable features for recognition in unconstrained environments. Recent iris recognition systems focus on real scenarios in daily life without the subject’s cooperation. Given large variation in the environment, the objective of this paper is to combine information from multiple images of the same iris. Image fusion produces a new image that is more stable for iris recognition than any of the original noisy iris images. A wavelet-based approach to multi-resolution image fusion is applied in the fusion process. The iris is detected using the AdaBoost algorithm. A local binary pattern (LBP) histogram with a weighting scheme is then used for texture classification. Experiments showed that the features generated by the proposed fusion algorithm can improve the performance of the iris verification system.

Keywords: image fusion, iris recognition, local binary pattern, wavelet
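
Below is a minimal sketch of wavelet-based fusion. It uses a single-level averaging 2-D Haar transform and a common fusion rule: average the approximation band across images and keep the largest-magnitude detail coefficient. The paper's wavelet family, decomposition depth and fusion rule are not given in the abstract, so all three are assumptions. Images are assumed to be registered, equal-sized and of even height and width.

```python
from typing import List, Tuple

Image = List[List[float]]


def haar2d(img: Image) -> Tuple[Image, Image, Image, Image]:
    """One level of an averaging 2-D Haar transform over 2x2 blocks."""
    ll, lh, hl, hh = [], [], [], []
    for y in range(0, len(img), 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for x in range(0, len(img[0]), 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            r_ll.append((a + b + c + d) / 4)  # approximation
            r_lh.append((a + b - c - d) / 4)  # vertical difference
            r_hl.append((a - b + c - d) / 4)  # horizontal difference
            r_hh.append((a - b - c + d) / 4)  # diagonal difference
        ll.append(r_ll); lh.append(r_lh); hl.append(r_hl); hh.append(r_hh)
    return ll, lh, hl, hh


def ihaar2d(ll: Image, lh: Image, hl: Image, hh: Image) -> Image:
    """Exact inverse of haar2d."""
    out = []
    for y in range(len(ll)):
        top, bottom = [], []
        for x in range(len(ll[0])):
            s, v, h, g = ll[y][x], lh[y][x], hl[y][x], hh[y][x]
            top += [s + v + h + g, s + v - h - g]
            bottom += [s - v + h - g, s - v - h + g]
        out += [top, bottom]
    return out


def fuse(images: List[Image]) -> Image:
    """Average the approximation bands; keep the strongest detail coefficient."""
    bands = [haar2d(im) for im in images]
    rows, cols = len(bands[0][0]), len(bands[0][0][0])
    ll = [[sum(b[0][y][x] for b in bands) / len(bands) for x in range(cols)]
          for y in range(rows)]
    details = [[[max((b[k][y][x] for b in bands), key=abs) for x in range(cols)]
                for y in range(rows)] for k in (1, 2, 3)]
    return ihaar2d(ll, *details)
```

Averaging the approximation band suppresses noise that differs between captures. Keeping the strongest details preserves iris texture that is sharp in at least one image.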

1897 Online Handwritten Character Recognition for South Indian Scripts Using Support Vector Machines

Authors: Steffy Maria Joseph, Abdu Rahiman V, Abdul Hameed K. M.

Abstract:

Online handwritten character recognition is a challenging field in artificial intelligence. The classification success rate of current techniques drops when the dataset contains similar and complex stroke styles, varying numbers of strokes and varied stroke characteristics. Malayalam is a complex South Indian language spoken by about 35 million people, especially in Kerala and the Lakshadweep islands. In this paper, we consider the extraction of significant features for the similar stroke styles of Malayalam. The extracted feature set is also suitable for recognizing other handwritten South Indian languages such as Tamil, Telugu and Kannada. A classification scheme based on support vector machines (SVMs) is proposed to improve the classification and recognition accuracy for online Malayalam handwritten characters. SVM classifiers are well suited to real-world applications. The contribution of various features to recognition accuracy is analysed, and the performance of different SVM kernels is also studied. A graphical user interface has been developed for reading and displaying the characters. Different writing styles are collected for each of the 44 characters. Various features are extracted from the input data samples after preprocessing and used for classification. The highest recognition accuracy of 97% is obtained experimentally with the best feature combination and a polynomial SVM kernel.

Keywords: SVM, MATLAB, Malayalam, South Indian scripts, online handwritten character recognition
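
The best result above uses a polynomial kernel. This sketch shows how a trained binary SVM with that kernel scores a feature vector: f(x) = Σᵢ αᵢ yᵢ K(sᵢ, x) + b, with K(x, z) = (γ⟨x, z⟩ + c)^d. The support vectors, coefficients and kernel parameters here are hypothetical, and training is not shown. A 44-class problem would combine many such binary machines; under a one-vs-one scheme, that is 44 × 43 / 2 = 946 classifiers.

```python
from typing import List, Sequence, Tuple


def poly_kernel(x: Sequence[float], z: Sequence[float], degree: int = 3,
                gamma: float = 1.0, coef0: float = 1.0) -> float:
    """K(x, z) = (gamma * <x, z> + coef0) ** degree."""
    return (gamma * sum(a * b for a, b in zip(x, z)) + coef0) ** degree


def svm_decision(x: Sequence[float],
                 support: List[Tuple[Sequence[float], int, float]],
                 bias: float, **kernel_args) -> float:
    """f(x) = sum_i alpha_i * y_i * K(s_i, x) + b for (s_i, y_i, alpha_i) triples;
    the sign of f(x) gives the predicted class (+1 or -1)."""
    return sum(alpha * y * poly_kernel(s, x, **kernel_args)
               for s, y, alpha in support) + bias
```

Changing `degree`, `gamma` and `coef0` is how different polynomial kernels would be compared. Swapping `poly_kernel` for another kernel function would compare kernel families.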

1896 Gender Recognition with Deep Belief Networks

Authors: Xiaoqi Jia, Qing Zhu, Hao Zhang, Su Yang

Abstract:

A gender recognition system determines the gender of a given person from a few frontal facial images. An effective gender recognition approach can improve the performance of many other applications, including security monitoring, human-computer interaction, and image or video retrieval. In this paper, we present an effective method for gender classification of frontal facial images based on deep belief networks (DBNs). DBNs allow the model to be pre-trained, which slightly improves accuracy. Our experiments show that pre-training with DBNs for gender classification is feasible and achieves a small improvement in accuracy on the FERET and CAS-PEAL-R1 facial datasets.

Keywords: gender recognition, deep belief networks, semi-supervised learning, greedy layer-wise RBMs
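
DBN pre-training stacks restricted Boltzmann machines (RBMs), each trained greedily on the hidden activations of the one below. The usual recipe is one-step contrastive divergence (CD-1). This sketch uses only the standard library. Layer sizes, learning rate, epochs and seeds are hypothetical. The supervised fine-tuning of the stack for the gender label is not shown.

```python
import math
import random
from typing import List


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


class RBM:
    """Binary restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible: int, n_hidden: int, seed: int = 0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v: List[float]) -> List[float]:
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h: List[float]) -> List[float]:
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1(self, v0: List[float], lr: float = 0.1) -> None:
        ph0 = self.hidden_probs(v0)
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]  # sample hiddens
        v1 = self.visible_probs(h0)                                 # mean-field reconstruction
        ph1 = self.hidden_probs(v1)
        # positive phase minus negative phase
        for i in range(len(v0)):
            for j in range(len(ph0)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        for i in range(len(v0)):
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(ph0)):
            self.c[j] += lr * (ph0[j] - ph1[j])


def pretrain_stack(data: List[List[float]], layer_sizes: List[int],
                   epochs: int = 10) -> List[RBM]:
    """Greedy layer-wise pre-training: each RBM learns from the layer below."""
    rbms, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(inputs[0]), n_hidden, seed=len(rbms))
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1(v)
        rbms.append(rbm)
        inputs = [rbm.hidden_probs(v) for v in inputs]
    return rbms
```

The pre-trained weights would initialize a feed-forward network that is then fine-tuned on the labelled face images.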

1895 Emotion Recognition Using Artificial Intelligence

Authors: Rahul Mohite, Lahcen Ouarbya

Abstract:

This paper focuses on the interplay between humans and computer systems and on the ability of these systems to understand and respond to human emotions, including non-verbal communication. Current emotion recognition systems are based solely on either facial or verbal expressions. A limitation of these systems is that they require large training data sets. The paper proposes a system for recognizing human emotions that combines facial expression recognition and speech recognition. The system uses techniques such as deep learning and image recognition to identify facial expressions and comprehend emotions. The results show that the proposed system, based on the combination of facial expression and speech, outperforms existing systems based solely on either facial or verbal expressions. It detects human emotion with an accuracy of 86%, whereas existing systems achieve 70% using verbal expression only and 76% using facial expression only. The paper also discusses the increasing significance of, and demand for, facial recognition technology in emotion recognition.

Keywords: facial recognition, expression recognition, deep learning, image recognition, facial technology, signal processing, image classification
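
The abstract does not say how the facial and speech outputs are combined. One simple option is decision-level (late) fusion: a weighted average of each modality's per-emotion probabilities, followed by taking the most probable emotion. This sketch and its `face_weight` parameter are hypothetical illustrations, not the paper's method.

```python
from typing import Dict


def late_fusion(face_probs: Dict[str, float], speech_probs: Dict[str, float],
                face_weight: float = 0.5) -> str:
    """Weighted average of per-emotion probabilities from the two modalities;
    only emotions scored by both models are considered."""
    labels = face_probs.keys() & speech_probs.keys()
    fused = {e: face_weight * face_probs[e] + (1 - face_weight) * speech_probs[e]
             for e in labels}
    return max(fused, key=fused.get)
```

The weight could be tuned on validation data. That would let the stronger modality (facial expression, in the reported single-modality results) count for more.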

1894 Improving Activity Recognition Classification of Repetitious Beginner Swimming Using a 2-Step Peak/Valley Segmentation Method with Smoothing and Resampling for Machine Learning

Authors: Larry Powell, Seth Polsley, Drew Casey, Tracy Hammond

Abstract:

Human activity recognition (HAR) systems have performed well at recognizing repetitive activities like walking, running, and sleeping. Water-based activities are a relatively new area for activity recognition. However, water-based activity recognition has largely focused on supporting the elite and competitive swimming population, which already has excellent coordination and proper form. Beginner swimmers are not perfect, so activity recognition needs to support their individual motions to help them. Activity recognition algorithms are traditionally built around short segments of timed sensor data. Using a time window as input can cause performance issues in the machine learning model. The window can be too small or too large, which requires careful tuning and precise data segmentation. In this work, we present a method that uses a time window as the initial segmentation and then separates the data based on changes in the sensor value. Our system uses a multi-phase segmentation method that extracts all peaks and valleys for each axis of an accelerometer placed on the swimmer’s lower back. We evaluated the method with leave-one-subject-out validation in a study with 20 beginner swimmers, and it achieved high recognition performance. The model optimized on our final dataset reaches an F-score of 0.95.

Keywords: time window, peak/valley segmentation, feature extraction, beginner swimming, activity recognition

Procedia PDF Downloads 86
1893 Myanmar Consonants Recognition System Based on Lip Movements Using Active Contour Model

Authors: T. Thein, S. Kalyar Myo

Abstract:

Humans use visual information to understand speech content in noisy conditions or in situations where the audio signal is not available. The primary advantage of visual information is that it is not affected by acoustic noise or by cross talk among speakers. Using visual information from lip movements can improve the accuracy and robustness of automatic speech recognition. However, a major challenge for most automatic lip reading systems is finding a robust and efficient method for extracting the linguistically relevant speech information from a lip image sequence. This is difficult because of variation caused by different speakers, illumination, and camera settings. It is also difficult because of the inherently low luminance and chrominance contrast between the lip and non-lip regions. Several researchers have been developing methods to overcome these problems in lip reading. It is also well known that visual information about speech obtained through lip reading is very useful for human speech recognition. Lip reading is the technique of understanding speech by processing the movement of the lips. A lip reading system is therefore one of several supportive technologies for hearing-impaired or elderly people, and it is an active research area. The need for such systems is increasing for every language. This research aims to develop a visual teaching system that shows hearing-impaired persons in Myanmar how to pronounce words precisely by identifying the features of lip movement. The proposed research will develop a lip reading system for Myanmar consonants, covering both one-syllable consonants (င (Nga)၊ ည (Nya)၊ မ (Ma)၊ လ (La)၊ ၀ (Wa)၊ သ (Tha)၊ ဟ (Ha)၊ အ (Ah)) and two-syllable consonants (က (Ka Gyi)၊ ခ (Kha Gway)၊ ဂ (Ga Nge)၊ ဃ (Ga Gyi)၊ စ (Sa Lone)၊ ဆ (Sa Lain)၊ ဇ (Za Gwe)၊ ဒ (Da Dway)၊ ဏ (Na Gyi)၊ န (Na Nge)၊ ပ (Pa Saug)၊ ဘ (Ba Gone)၊ ရ (Ya Gaug)၊ ဠ (La Gyi)).
The proposed system has three subsystems:

- The lip localization system localizes the lips in the digital input.
- The feature extraction system extracts lip movement features suitable for visual speech recognition.
- The classification system assigns each input to a consonant class.

In the proposed research, the Two Dimensional Discrete Cosine Transform (2D-DCT) and Linear Discriminant Analysis (LDA) with an Active Contour Model (ACM) will be used to extract lip movement features. A Support Vector Machine (SVM) classifier is used to find the class parameters and class numbers for the training and testing sets. Experiments will then measure the recognition accuracy for Myanmar consonants using only the visual information from lip movements. The results will show the effectiveness of lip movement recognition for Myanmar consonants. Hearing-impaired persons will be able to use the system as a language learning application. It can also help people with normal hearing in noisy environments, or in other conditions, to find out what others said without hearing their voices.
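
The abstract names 2D-DCT as one of the feature extractors but gives no settings. The following minimal sketch makes these assumptions:

- The lip region of interest (ROI) has already been localized and cropped to a small grayscale matrix.
- The transform is the orthonormal 2D DCT-II.
- The lowest-frequency top-left k x k coefficients, rather than a zigzag selection, form the feature vector.

The value of k, the top-left selection, and the toy ROI are all assumptions. The LDA, ACM, and SVM stages are not shown.

```python
import math
from typing import List, Sequence

def dct2(block: Sequence[Sequence[float]]) -> List[List[float]]:
    """Orthonormal 2D DCT-II of an M x N grayscale block (direct O(M^2 N^2) form)."""
    m, n = len(block), len(block[0])

    def scale(k: int, size: int) -> float:
        return math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)

    out = [[0.0] * n for _ in range(m)]
    for u in range(m):
        for v in range(n):
            total = 0.0
            for x in range(m):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos(math.pi * (2 * x + 1) * u / (2 * m))
                              * math.cos(math.pi * (2 * y + 1) * v / (2 * n)))
            out[u][v] = scale(u, m) * scale(v, n) * total
    return out

def low_freq_features(roi: Sequence[Sequence[float]], k: int) -> List[float]:
    """Flatten the top-left k x k (lowest-frequency) DCT coefficients row by row."""
    coeffs = dct2(roi)
    return [coeffs[u][v] for u in range(k) for v in range(k)]

# Toy 6 x 8 "lip ROI": a dark horizontal band (open mouth) on a bright background.
roi = [[200.0] * 8 for _ in range(6)]
for col in range(1, 7):
    roi[2][col] = roi[3][col] = 40.0
features = low_freq_features(roi, k=3)
print(len(features))  # 9
```

In practice, a library transform such as SciPy's `scipy.fft.dctn(..., norm="ortho")` would replace the quadruple loop. The direct form is kept here so the formula stays visible.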

Keywords: feature extraction, lip reading, lip localization, Active Contour Model (ACM), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), Two Dimensional Discrete Cosine Transform (2D-DCT)

Procedia PDF Downloads 261
1892 A Framework for Chinese Domain-Specific Distant Supervised Named Entity Recognition

Authors: Qin Long, Li Xiaoge

Abstract:

Knowledge graphs have become a new form of knowledge representation. However, there is no consensus on a plausible definition of entities and relationships in domain-specific knowledge graphs. Existing approaches to recognizing domain-specific entities and relationships also have several limitations and deficiencies and are far from perfect. In particular, named entity recognition (NER) in Chinese domains is a critical task for natural language processing applications. A bottleneck for Chinese NER in new domains is the lack of annotated data. To address this challenge, a domain-specific, distantly supervised NER framework is proposed. The framework has two stages:

1. A distantly supervised corpus is generated using an entity linking model based on a graph attention neural network.
2. The generated corpus is used as input to train a distantly supervised NER model that extracts named entities.

The linking model is evaluated on the CCKS2019 entity linking corpus, where its F1 score is 2% higher than that of the benchmark method. Adding a re-pre-trained BERT language model to the benchmark method shows that it is better suited to distantly supervised NER tasks. Finally, the framework is applied to the computer domain, and the results show that it can extract domain-specific named entities.

Keywords: distant named entity recognition, entity linking, knowledge graph, graph attention neural network

Procedia PDF Downloads 68
1891 Third Language Perception of English Initial Plosives by Mandarin-Japanese Bilinguals

Authors: Rika Aoki

Abstract:

The aim of this paper is to investigate whether being bilingual facilitates or impedes the perception of a third language. The present study conducted a perception experiment in which Mandarin-Japanese bilinguals categorized a Voice-Onset-Time (VOT) continuum as English /b/ or /p/. The results show that early bilinguals were influenced by both Mandarin and Japanese, while late bilinguals behaved similarly to Mandarin monolinguals. Thus, it can be concluded that in the present study, having two languages did not help bilinguals perceive the L3 stop contrast in a native-like way.
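
Listener groups on a VOT continuum are often compared by locating each group's category boundary: the VOT at which /p/ responses reach 50%. The abstract does not say how boundaries were computed, and many studies fit a logistic function instead. This minimal sketch uses simple linear interpolation between adjacent continuum steps, and its sample data are hypothetical, not from the study.

```python
from typing import Optional, Sequence

def category_boundary(vot_ms: Sequence[float],
                      p_responses: Sequence[float],
                      criterion: float = 0.5) -> Optional[float]:
    """VOT (ms) at which the proportion of /p/ responses first rises past `criterion`.

    Uses linear interpolation between adjacent continuum steps; returns None
    if the responses never cross the criterion from below.
    """
    points = list(zip(vot_ms, p_responses))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < criterion <= y1:
            return x0 + (criterion - y0) / (y1 - y0) * (x1 - x0)
    return None

# Hypothetical proportions of /p/ responses at five VOT steps.
print(category_boundary([0, 10, 20, 30, 40], [0.0, 0.1, 0.4, 0.8, 1.0]))
```

A boundary shifted toward longer VOT would indicate that a group needs more aspiration before it hears /p/.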

Keywords: bilinguals, perception, third language acquisition, voice-onset-time

Procedia PDF Downloads 263
1890 Hear My Voice: The Educational Experiences of Disabled Students

Authors: Karl Baker-Green, Ian Woolsey

Abstract:

Historically, a variety of methods have been used to access the student voice within higher education, including module evaluations and informal classroom feedback. Currently, however, the views articulated in student-staff committee meetings carry the most weight and can therefore have the most significant impact on departmental policy. Arguably, these forums are exclusionary. Several students, including those who experience severe anxiety, might feel unable to participate in large, face-to-face group activities of this kind. In addition, students who declare a disability but do not have a learning contract are more likely to withdraw from their studies than those whose additional needs have been formally recognised. It is also worth noting that, whilst the number of disabled students in higher education has increased in recent years, the percentage of those who have been issued a learning contract has decreased. These issues foreground the need to explore the educational experiences of students with and without a learning contract. Doing so would identify their respective aspirations and needs and thereby help shape education policy. This is in keeping with the ‘Nothing about us without us’ agenda, which recognises that disabled individuals are best placed to understand their own requirements and the most effective strategies to meet them.

Keywords: education, student voice, student experience, student retention

Procedia PDF Downloads 75
1889 Make Up Flash: Web Application for the Improvement of Physical Appearance in Images Based on Recognition Methods

Authors: Stefania Arguelles Reyes, Octavio José Salcedo Parra, Alberto Acosta López

Abstract:

This paper presents a web application for improving images through recognition methods. The application is based on an analysis of picture-based recognition methods that can improve the physical appearance of people who post on social networks. The approach relies on a study of tools that can correct or improve certain facial features. A large collection of user images serves as a reference for building a facial profile. Automatic facial profiling can be achieved through a deeper study of the Object Detection Library. The initial images were improved using MATLAB and its filtering functions. Users can interact directly with the program and manually adjust their preferences.
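
The abstract credits MATLAB filtering functions without naming them. As an illustration only, written in Python rather than MATLAB, the sketch below applies a 3 x 3 mean filter to a grayscale image. It then blends the result with the original using a hypothetical `strength` parameter, which stands in for the manual preference adjustment. The filter choice and the parameter are assumptions, not the application's actual processing.

```python
from typing import List, Sequence

def box_blur(img: Sequence[Sequence[float]]) -> List[List[float]]:
    """3 x 3 mean filter; border pixels average only the neighbours inside the image."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [img[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = sum(window) / len(window)
    return out

def smooth_with_strength(img: Sequence[Sequence[float]], strength: float) -> List[List[float]]:
    """Blend original and blurred image; `strength` (hypothetical user slider) is in [0, 1]."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    blurred = box_blur(img)
    return [[(1.0 - strength) * o + strength * b for o, b in zip(row_o, row_b)]
            for row_o, row_b in zip(img, blurred)]

# A single bright "blemish" pixel is spread out when strength is 1.0.
print(smooth_with_strength([[0, 0, 0], [0, 9, 0], [0, 0, 0]], 1.0)[1][1])  # 1.0
```

In a full application, a face detector would restrict the blur to facial regions. That step is outside this sketch.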

Keywords: Matlab, make up, recognition methods, web application

Procedia PDF Downloads 112
1888 Hand Gesture Recognition for Sign Language: A New Higher Order Fuzzy HMM Approach

Authors: Saad M. Darwish, Magda M. Madbouly, Murad B. Khorsheed

Abstract:

Sign languages (SL) are the most accomplished forms of gestural communication. Their automatic analysis is therefore a real challenge, one that extends to their lexical and syntactic levels of organization. Hidden Markov models (HMMs) have been used prominently and successfully in speech recognition and, more recently, in handwriting recognition. Consequently, they seem well suited to the visual recognition of complex, structured hand gestures such as those found in sign language. This paper presents several results on static hand gesture recognition using an algorithm based on a Type-2 Fuzzy HMM (T2FHMM). The features used as observables in both the training and recognition phases are based on Singular Value Decomposition (SVD). SVD extends eigendecomposition to non-square matrices and is used here to reduce multi-attribute hand gesture data to feature vectors, because it optimally exposes the geometric structure of a matrix. In our approach, we replace the basic HMM arithmetic operators with suitable Type-2 fuzzy operators, which allows us to relax the additivity constraint of probability measures. T2FHMMs are therefore able to handle both the random and the fuzzy uncertainties that exist universally in sequential data. Experimental results show that T2FHMMs can effectively handle noise and dialect uncertainties in hand signals and achieve better classification performance than classical HMMs. The recognition rate of the proposed system is 100% for uniform hand images and 86.21% for cluttered hand images.
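
The key idea in the abstract is swapping the HMM's arithmetic operators for fuzzy ones. The sketch below illustrates only that idea: a forward algorithm whose "sum" and "product" are parameters.

- With `+` and `*`, it computes the classical sequence likelihood.
- With `max` and `min`, a type-1 fuzzy t-conorm/t-norm pair chosen here as a simple assumption, it computes a possibility-style score that need not satisfy probabilistic additivity.

This is not the paper's Type-2 operator set, which the abstract does not specify.

```python
import operator
from functools import reduce
from typing import Callable, List, Sequence

Op = Callable[[float, float], float]

def forward(obs: Sequence[int],
            start: List[float],
            trans: List[List[float]],
            emit: List[List[float]],
            plus: Op = operator.add,
            times: Op = operator.mul) -> float:
    """Forward pass of an HMM with pluggable combination operators.

    start[i]    : initial weight of state i
    trans[i][j] : weight of moving from state i to state j
    emit[j][o]  : weight of state j emitting observation symbol o
    """
    n = len(start)
    # Initialisation: combine the start weight with the first emission.
    alpha = [times(start[j], emit[j][obs[0]]) for j in range(n)]
    # Induction: aggregate over predecessor states with `plus`, then apply the emission.
    for o in obs[1:]:
        alpha = [
            times(reduce(plus, (times(alpha[i], trans[i][j]) for i in range(n))),
                  emit[j][o])
            for j in range(n)
        ]
    # Termination: aggregate over final states.
    return reduce(plus, alpha)

start = [1.0, 0.0]
trans = [[1.0, 0.0], [0.0, 1.0]]
emit = [[0.5, 0.5], [0.5, 0.5]]
print(forward([0, 1], start, trans, emit))                       # 0.25
print(forward([0, 1], start, trans, emit, plus=max, times=min))  # 0.5
```

The same model scores 0.25 under probability arithmetic and 0.5 under max-min. This shows that the choice of operators, not just the parameters, shapes how evidence accumulates along a sequence.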

Keywords: hand gesture recognition, hand detection, type-2 fuzzy logic, hidden Markov Model

Procedia PDF Downloads 431
1887 Developing an AI-Driven Application for Real-Time Emotion Recognition from Human Vocal Patterns

Authors: Sayor Ajfar Aaron, Mushfiqur Rahman, Sajjat Hossain Abir, Ashif Newaz

Abstract:

This study develops an artificial intelligence application for real-time emotion recognition from human vocal patterns. Using machine learning algorithms, including deep learning and neural networks, the paper highlights both the technical challenges of, and the opportunities for, accurately interpreting emotional cues from speech. Key findings demonstrate the critical role of diverse training datasets and the impact of ambient noise on recognition accuracy. They also suggest future directions for improving robustness and applicability in real-world scenarios.

Keywords: artificial intelligence, convolutional neural network, emotion recognition, vocal pattern

Procedia PDF Downloads 8
1886 Fine Grained Action Recognition of Skateboarding Tricks

Authors: Frederik Calsius, Mirela Popa, Alexia Briassouli

Abstract:

In machine learning, it is common practice to use benchmark datasets to demonstrate that a method works. Action recognition in videos often uses datasets such as Kinetics, Something-Something, UCF-101, and HMDB-51 to report results. However, none of these datasets focuses solely on very short clips (2 to 3 seconds) of highly similar, fine-grained actions within one specific domain. This paper investigates how current state-of-the-art action recognition methods perform on a dataset of highly similar, fine-grained actions. To this end, a dataset of skateboarding tricks was created. The analysis highlights both the benefits and the limitations of state-of-the-art methods and proposes future research directions in the activity recognition domain. The best results are obtained by fusing RGB data with OpenPose data for the Temporal Shift Module (TSM).
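
The Temporal Shift Module lets a 2D CNN exchange information across frames by shifting part of its feature channels along the time axis, without adding parameters. The following minimal sketch of that shift makes two assumptions. First, per-frame features are given as a T x C list of lists, whereas the real module operates on N x T x C x H x W tensors inside a network. Second, it uses the commonly cited default of shifting 1/8 of the channels in each direction (`fold_div=8`). The RGB/OpenPose fusion reported in the paper is not shown.

```python
from typing import List

def temporal_shift(x: List[List[float]], fold_div: int = 8) -> List[List[float]]:
    """Bidirectional temporal shift of a fraction of channels (offline TSM style).

    x[t][c] is the feature of channel c at frame t.
    Channels [0, fold)       take the value from the next frame (zero at the end).
    Channels [fold, 2*fold)  take the value from the previous frame (zero at the start).
    All remaining channels are copied unchanged.
    """
    t_len, c_len = len(x), len(x[0])
    fold = c_len // fold_div
    out = [row[:] for row in x]  # copy, so unshifted channels keep their values
    for t in range(t_len):
        for c in range(fold):
            out[t][c] = x[t + 1][c] if t + 1 < t_len else 0.0
        for c in range(fold, 2 * fold):
            out[t][c] = x[t - 1][c] if t > 0 else 0.0
    return out

# 3 frames, 4 channels; with fold_div=4, channel 0 shifts back in time and channel 1 forward.
print(temporal_shift([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], fold_div=4))
```

Shifting only a small fraction of channels preserves most of each frame's spatial features while still mixing in temporal context. This trade-off is one reason the module suits short, fine-grained clips like skateboarding tricks.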

Keywords: activity recognition, fused deep representations, fine-grained dataset, temporal modeling

Procedia PDF Downloads 201
1885 Myanmar Character Recognition Using Eight Direction Chain Code Frequency Features

Authors: Kyi Pyar Zaw, Zin Mar Kyu

Abstract:

Character recognition is the process of converting a text image file into an editable and searchable text file. Feature extraction is the heart of any character recognition system, and the recognition rate may be low or high depending on the extracted features. In this paper, 25 features per character are used. Character recognition involves three steps: character segmentation, feature extraction, and classification. In the segmentation step, a horizontal cropping method is used for line segmentation and a vertical cropping method for character segmentation. In the feature extraction step, features are extracted in two ways:

1. 8 features are extracted from the entire input character using eight-direction chain code frequency extraction.
2. The input character is divided into 16 blocks. Eight-direction chain code frequency extraction yields 8 values for each block, but the sum of these 8 values is used as a single feature for that block, giving 16 features from the 16 blocks.

The number of holes is used as a further feature to cluster similar characters. With these features, most common Myanmar characters can be recognized across various font sizes. All 25 features (8 global, 16 block, and the hole count) are used in both the training and testing parts. In the classification step, characters are classified by matching all features of the input character against the already trained features of the characters.
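
Eight-direction chain code frequency can be illustrated with a short sketch. It assumes the character boundary has already been traced into an ordered list of 8-connected (row, column) pixel coordinates; the tracing step is not shown. Each step is encoded with the usual Freeman convention: 0 is east, numbers increase counter-clockwise, and rows grow downward. The sketch then counts how often each of the 8 directions occurs, and normalizing the counts gives frequencies. Summing a block's 8 counts is one reading of the per-block feature. The direction numbering is a common convention, not necessarily the paper's.

```python
from typing import List, Sequence, Tuple

# Freeman directions as (row step, column step); rows grow downward,
# so "north" is a step of -1 in the row index.
DIRECTIONS = {
    (0, 1): 0,    # east
    (-1, 1): 1,   # north-east
    (-1, 0): 2,   # north
    (-1, -1): 3,  # north-west
    (0, -1): 4,   # west
    (1, -1): 5,   # south-west
    (1, 0): 6,    # south
    (1, 1): 7,    # south-east
}

def chain_code(boundary: Sequence[Tuple[int, int]]) -> List[int]:
    """Encode consecutive 8-connected boundary points as direction codes 0..7."""
    codes = []
    for (r0, c0), (r1, c1) in zip(boundary, boundary[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(f"points {(r0, c0)} and {(r1, c1)} are not 8-connected neighbours")
        codes.append(DIRECTIONS[step])
    return codes

def direction_histogram(codes: Sequence[int]) -> List[int]:
    """Count how often each of the 8 directions occurs."""
    counts = [0] * 8
    for code in codes:
        counts[code] += 1
    return counts

def direction_frequencies(codes: Sequence[int]) -> List[float]:
    """Normalize the counts to frequencies (all zeros for an empty code)."""
    counts = direction_histogram(codes)
    total = sum(counts)
    return [n / total if total else 0.0 for n in counts]

# A closed 2 x 2 square traced from the top-left pixel.
print(chain_code([(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]))  # [0, 6, 4, 2]
```
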

Keywords: chain code frequency, character recognition, feature extraction, features matching, segmentation

Procedia PDF Downloads 292
1884 Intelligent Human Pose Recognition Based on EMG Signal Analysis and Machine 3D Model

Authors: Si Chen, Quanhong Jiang

Abstract:

As posture recognition technology matures, human movement information is widely used in sports rehabilitation, human-computer interaction, medical health, human posture assessment, and other fields. This project takes an original approach that proceeds in four steps:

1. Acquisition equipment collects myoelectric (EMG) data.
2. The data are processed so that changes in muscle posture are reflected on a degree of freedom.
3. The data and a three-dimensional muscle model are adjusted jointly.
4. Basic pose recognition is achieved.

On this basis, bionic aids or medical rehabilitation equipment can be developed further with the help of robotic arms and other cutting-edge technology. This area has considerable room for future development.
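
The abstract does not describe how EMG data are mapped onto a degree of freedom. One common, simple pipeline is sketched here under stated assumptions:

- A sliding-window root-mean-square (RMS) envelope is computed from the raw signal.
- The envelope is normalized by a maximum voluntary contraction value (`mvc`, assumed to be measured beforehand) and clamped to [0, 1].
- The result is mapped linearly onto a joint-angle range for one degree of freedom.

The linear mapping, the window length, and the angle limits are all hypothetical. A real system would calibrate them per muscle and per user.

```python
import math
from typing import List, Sequence

def rms_envelope(signal: Sequence[float], window: int) -> List[float]:
    """Root-mean-square over each full sliding window (one value per window position)."""
    if window <= 0 or window > len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    return [
        math.sqrt(sum(s * s for s in signal[i:i + window]) / window)
        for i in range(len(signal) - window + 1)
    ]

def envelope_to_angle(level: float, mvc: float,
                      min_angle: float = 0.0, max_angle: float = 120.0) -> float:
    """Map an EMG envelope value onto a joint angle for one degree of freedom (assumed linear)."""
    activation = min(max(level / mvc, 0.0), 1.0)  # normalize by MVC, clamp to [0, 1]
    return min_angle + activation * (max_angle - min_angle)

raw = [1.0, -1.0] * 10  # toy signal with constant amplitude 1
env = rms_envelope(raw, window=4)
print(envelope_to_angle(env[-1], mvc=2.0))  # 60.0
```
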

Keywords: pose recognition, 3D animation, electromyography, machine learning, bionics

Procedia PDF Downloads 48