Search results for: automatic speech recognition
2768 Frequency of Consonant Production Errors in Children with Speech Sound Disorder: A Retrospective-Descriptive Study
Authors: Amulya P. Rao, Prathima S., Sreedevi N.
Abstract:
Speech sound disorders (SSD) encompass the major concern in younger population of India with highest prevalence rate among the speech disorders. Children with SSD if not identified and rehabilitated at the earliest, are at risk for academic difficulties. This necessitates early identification using screening tools assessing the frequently misarticulated speech sounds. The literature on frequently misarticulated speech sounds is ample in English and other western languages targeting individuals with various communication disorders. Articulation is language specific, and there are limited studies reporting the same in Kannada, a Dravidian Language. Hence, the present study aimed to identify the frequently misarticulated consonants in Kannada and also to examine the error type. A retrospective, descriptive study was carried out using secondary data analysis of 41 participants (34-phonetic type and 7-phonemic type) with SSD in the age range 3-to 12-years. All the consonants of Kannada were analyzed by considering three words for each speech sound from the Kannada Diagnostic Photo Articulation test (KDPAT). Picture naming task was carried out, and responses were audio recorded. The recorded data were transcribed using IPA 2018 broad transcription. A criterion of 2/3 or 3/3 error productions was set to consider the speech sound to be an error. Number of error productions was calculated for each consonant in each participant. Then, the percentage of participants meeting the criteria were documented for each consonant to identify the frequently misarticulated speech sound. Overall results indicated that velar /k/ (48.78%) and /g/ (43.90%) were frequently misarticulated followed by voiced retroflex /ɖ/ (36.58%) and trill /r/ (36.58%). The lateral retroflex /ɭ/ was misarticulated by 31.70% of the children with SSD. Dentals (/t/, /n/), bilabials (/p/, /b/, /m/) and labiodental /v/ were produced correctly by all the participants. The highly misarticulated velars /k/ and /g/ were frequently substituted by dentals /t/ and /d/ respectively or omitted. Participants with SSD-phonemic type had multiple substitutions for one speech sound whereas, SSD-phonetic type had consistent single sound substitutions. Intra- and inter-judge reliability for 10% of the data using Cronbach’s Alpha revealed good reliability (0.8 ≤ α < 0.9). Analyzing a larger sample by replicating such studies will validate the present study results.Keywords: consonant, frequently misarticulated, Kannada, SSD
Procedia PDF Downloads 1342767 The Effect of Speech-Shaped Noise and Speaker’s Voice Quality on First-Grade Children’s Speech Perception and Listening Comprehension
Authors: I. Schiller, D. Morsomme, A. Remacle
Abstract:
Children’s ability to process spoken language develops until the late teenage years. At school, where efficient spoken language processing is key to academic achievement, listening conditions are often unfavorable. High background noise and poor teacher’s voice represent typical sources of interference. It can be assumed that these factors particularly affect primary school children, because their language and literacy skills are still low. While it is generally accepted that background noise and impaired voice impede spoken language processing, there is an increasing need for analyzing impacts within specific linguistic areas. Against this background, the aim of the study was to investigate the effect of speech-shaped noise and imitated dysphonic voice on first-grade primary school children’s speech perception and sentence comprehension. Via headphones, 5 to 6-year-old children, recruited within the French-speaking community of Belgium, listened to and performed a minimal-pair discrimination task and a sentence-picture matching task. Stimuli were randomly presented according to four experimental conditions: (1) normal voice / no noise, (2) normal voice / noise, (3) impaired voice / no noise, and (4) impaired voice / noise. The primary outcome measure was task score. How did performance vary with respect to listening condition? Preliminary results will be presented with respect to speech perception and sentence comprehension and carefully interpreted in the light of past findings. This study helps to support our understanding of children’s language processing skills under adverse conditions. Results shall serve as a starting point for probing new measures to optimize children’s learning environment.Keywords: impaired voice, sentence comprehension, speech perception, speech-shaped noise, spoken language processing
Procedia PDF Downloads 1922766 Programmed Speech to Text Summarization Using Graph-Based Algorithm
Authors: Hamsini Pulugurtha, P. V. S. L. Jagadamba
Abstract:
Programmed Speech to Text and Text Summarization Using Graph-based Algorithms can be utilized in gatherings to get the short depiction of the gathering for future reference. This gives signature check utilizing Siamese neural organization to confirm the personality of the client and convert the client gave sound record which is in English into English text utilizing the discourse acknowledgment bundle given in python. At times just the outline of the gathering is required, the answer for this text rundown. Thus, the record is then summed up utilizing the regular language preparing approaches, for example, solo extractive text outline calculationsKeywords: Siamese neural network, English speech, English text, natural language processing, unsupervised extractive text summarization
Procedia PDF Downloads 2172765 A Framework for Chinese Domain-Specific Distant Supervised Named Entity Recognition
Abstract:
The Knowledge Graphs have now become a new form of knowledge representation. However, there is no consensus in regard to a plausible and definition of entities and relationships in the domain-specific knowledge graph. Further, in conjunction with several limitations and deficiencies, various domain-specific entities and relationships recognition approaches are far from perfect. Specifically, named entity recognition in Chinese domain is a critical task for the natural language process applications. However, a bottleneck problem with Chinese named entity recognition in new domains is the lack of annotated data. To address this challenge, a domain distant supervised named entity recognition framework is proposed. The framework is divided into two stages: first, the distant supervised corpus is generated based on the entity linking model of graph attention neural network; secondly, the generated corpus is trained as the input of the distant supervised named entity recognition model to train to obtain named entities. The link model is verified in the ccks2019 entity link corpus, and the F1 value is 2% higher than that of the benchmark method. The re-pre-trained BERT language model is added to the benchmark method, and the results show that it is more suitable for distant supervised named entity recognition tasks. Finally, it is applied in the computer field, and the results show that this framework can obtain domain named entities.Keywords: distant named entity recognition, entity linking, knowledge graph, graph attention neural network
Procedia PDF Downloads 932764 Toward Automatic Chest CT Image Segmentation
Authors: Angely Sim Jia Wun, Sasa Arsovski
Abstract:
Numerous studies have been conducted on the segmentation of medical images. Segmenting the lungs is one of the common research topics in those studies. Our research stemmed from the lack of solutions for automatic bone, airway, and vessel segmentation, despite the existence of multiple lung segmentation techniques. Consequently, currently, available software tools used for medical image segmentation do not provide automatic lung, bone, airway, and vessel segmentation. This paper presents segmentation techniques along with an interactive software tool architecture for segmenting bone, lung, airway, and vessel tissues. Additionally, we propose a method for creating binary masks from automatically generated segments. The key contribution of our approach is the technique for automatic image thresholding using adjustable Hounsfield values and binary mask extraction. Generated binary masks can be successfully used as a training dataset for deep-learning solutions in medical image segmentation. In this paper, we also examine the current software tools used for medical image segmentation, discuss our approach, and identify its advantages.Keywords: lung segmentation, binary masks, U-Net, medical software tools
Procedia PDF Downloads 982763 Scar Removal Stretegy for Fingerprint Using Diffusion
Authors: Mohammad A. U. Khan, Tariq M. Khan, Yinan Kong
Abstract:
Fingerprint image enhancement is one of the most important step in an automatic fingerprint identification recognition (AFIS) system which directly affects the overall efficiency of AFIS. The conventional fingerprint enhancement like Gabor and Anisotropic filters do fill the gaps in ridge lines but they fail to tackle scar lines. To deal with this problem we are proposing a method for enhancing the ridges and valleys with scar so that true minutia points can be extracted with accuracy. Our results have shown an improved performance in terms of enhancement.Keywords: fingerprint image enhancement, removing noise, coherence, enhanced diffusion
Procedia PDF Downloads 5152762 Design and Performance Optimization of Isostatic Pressing Working Cylinder Automatic Exhaust Valve
Authors: Wei-Zhao, Yannian-Bao, Xing-Fan, Lei-Cao
Abstract:
An isostatic pressing working cylinder automatic exhaust valve is designed. The finite element models of valve core and valve body under ultra-high pressure work environment are built to study the influence of interact of valve core and valve body to sealing performance. The contact stresses of metal sealing surface with different sizes are calculated and the automatic exhaust valve is optimized. The result of simulation and experiment shows that the sealing of optimized exhaust valve is more reliable and the service life is greatly improved. The optimized exhaust valve has been used in the warm isostatic pressing equipment.Keywords: exhaust valve, sealing, ultra-high pressure, isostatic pressing
Procedia PDF Downloads 3072761 Reconstructed Phase Space Features for Estimating Post Traumatic Stress Disorder
Authors: Andre Wittenborn, Jarek Krajewski
Abstract:
Trauma-related sadness in speech can alter the voice in several ways. The generation of non-linear aerodynamic phenomena within the vocal tract is crucial when analyzing trauma-influenced speech production. They include non-laminar flow and formation of jets rather than well-behaved laminar flow aspects. Especially state-space reconstruction methods based on chaotic dynamics and fractal theory have been suggested to describe these aerodynamic turbulence-related phenomena of the speech production system. To extract the non-linear properties of the speech signal, we used the time delay embedding method to reconstruct from a scalar time series (reconstructed phase space, RPS). This approach results in the extraction of 7238 Features per .wav file (N= 47, 32 m, 15 f). The speech material was prompted by telling about autobiographical related sadness-inducing experiences (sampling rate 16 kHz, 8-bit resolution). After combining these features in a support vector machine based machine learning approach (leave-one-sample out validation), we achieved a correlation of r = .41 with the well-established, self-report ground truth measure (RATS) of post-traumatic stress disorder (PTSD).Keywords: non-linear dynamics features, post traumatic stress disorder, reconstructed phase space, support vector machine
Procedia PDF Downloads 1022760 On the Implementation of The Pulse Coupled Neural Network (PCNN) in the Vision of Cognitive Systems
Authors: Hala Zaghloul, Taymoor Nazmy
Abstract:
One of the great challenges of the 21st century is to build a robot that can perceive and act within its environment and communicate with people, while also exhibiting the cognitive capabilities that lead to performance like that of people. The Pulse Coupled Neural Network, PCNN, is a relative new ANN model that derived from a neural mammal model with a great potential in the area of image processing as well as target recognition, feature extraction, speech recognition, combinatorial optimization, compressed encoding. PCNN has unique feature among other types of neural network, which make it a candid to be an important approach for perceiving in cognitive systems. This work show and emphasis on the potentials of PCNN to perform different tasks related to image processing. The main drawback or the obstacle that prevent the direct implementation of such technique, is the need to find away to control the PCNN parameters toward perform a specific task. This paper will evaluate the performance of PCNN standard model for processing images with different properties, and select the important parameters that give a significant result, also, the approaches towards find a way for the adaptation of the PCNN parameters to perform a specific task.Keywords: cognitive system, image processing, segmentation, PCNN kernels
Procedia PDF Downloads 2802759 Fine Grained Action Recognition of Skateboarding Tricks
Authors: Frederik Calsius, Mirela Popa, Alexia Briassouli
Abstract:
In the field of machine learning, it is common practice to use benchmark datasets to prove the working of a method. The domain of action recognition in videos often uses datasets like Kinet-ics, Something-Something, UCF-101 and HMDB-51 to report results. Considering the properties of the datasets, there are no datasets that focus solely on very short clips (2 to 3 seconds), and on highly-similar fine-grained actions within one specific domain. This paper researches how current state-of-the-art action recognition methods perform on a dataset that consists of highly similar, fine-grained actions. To do so, a dataset of skateboarding tricks was created. The performed analysis highlights both benefits and limitations of state-of-the-art methods, while proposing future research directions in the activity recognition domain. The conducted research shows that the best results are obtained by fusing RGB data with OpenPose data for the Temporal Shift Module.Keywords: activity recognition, fused deep representations, fine-grained dataset, temporal modeling
Procedia PDF Downloads 2312758 Speech Perception by Video Hosting Services Actors: Urban Planning Conflicts
Authors: M. Pilgun
Abstract:
The report presents the results of a study of the specifics of speech perception by actors of video hosting services on the material of urban planning conflicts. To analyze the content, the multimodal approach using neural network technologies is employed. Analysis of word associations and associative networks of relevant stimulus revealed the evaluative reactions of the actors. Analysis of the data identified key topics that generated negative and positive perceptions from the participants. The calculation of social stress and social well-being indices based on user-generated content made it possible to build a rating of road transport construction objects according to the degree of negative and positive perception by actors.Keywords: social media, speech perception, video hosting, networks
Procedia PDF Downloads 1472757 Functions and Pragmatic Aspects of English Nonsense
Authors: Natalia V. Ursul
Abstract:
In linguistic studies, the question of nonsense is attracting increasing interest. Nonsense is usually defined as spoken or written words that have no meaning. However, this definition is likely to be outdated as any speech act is generated due to the speaker’s pragmatic reasons, thus it cannot be purely illogical or meaningless. In the current paper a new working definition of nonsense as a linguistic medium will be formulated; moreover, the pragmatic peculiarities of newly coined linguistic patterns and possible ways of their interpretation will be discussed.Keywords: nonsense, nonse verse, pragmatics, speech act
Procedia PDF Downloads 5192756 Preliminary Study of the Phonological Development in Three and Four Year Old Bulgarian Children
Authors: Tsvetomira Braynova, Miglena Simonska
Abstract:
The article presents the results of research on phonological processes in three and four-year-old children. For the purpose of the study, an author's test was developed and conducted among 120 children. The study included three areas of research - at the level of words (96 words), at the level of sentence repetition (10 sentences) and at the level of generating own speech from a picture (15 pictures). The test also gives us additional information about the articulation errors of the assessed children. The main purpose of the icing is to analyze all phonological processes that occur at this age in Bulgarian children and to identify which are typical and atypical for this age. The results show that the most common phonology errors that children make are: sound substitution, an elision of sound, metathesis of sound, elision of a syllable, and elision of consonants clustered in a syllable. All examined children were identified with the articulatory disorder from type bilabial lambdacism. Measuring the correlation between the average length of repeated speech and the average length of generated speech, the analysis proves that the more words a child can repeat in part “repeated speech,” the more words they can be expected to generate in part “generating sentence.” The results of this study show that the task of naming a word provides sufficient and representative information to assess the child's phonology.Keywords: assessment, phonology, articulation, speech-language development
Procedia PDF Downloads 1862755 Semi-Automatic Segmentation of Mitochondria on Transmission Electron Microscopy Images Using Live-Wire and Surface Dragging Methods
Authors: Mahdieh Farzin Asanjan, Erkan Unal Mumcuoglu
Abstract:
Mitochondria are cytoplasmic organelles of the cell, which have a significant role in the variety of cellular metabolic functions. Mitochondria act as the power plants of the cell and are surrounded by two membranes. Significant morphological alterations are often due to changes in mitochondrial functions. A powerful technique in order to study the three-dimensional (3D) structure of mitochondria and its alterations in disease states is Electron microscope tomography. Detection of mitochondria in electron microscopy images due to the presence of various subcellular structures and imaging artifacts is a challenging problem. Another challenge is that each image typically contains more than one mitochondrion. Hand segmentation of mitochondria is tedious and time-consuming and also special knowledge about the mitochondria is needed. Fully automatic segmentation methods lead to over-segmentation and mitochondria are not segmented properly. Therefore, semi-automatic segmentation methods with minimum manual effort are required to edit the results of fully automatic segmentation methods. Here two editing tools were implemented by applying spline surface dragging and interactive live-wire segmentation tools. These editing tools were applied separately to the results of fully automatic segmentation. 3D extension of these tools was also studied and tested. Dice coefficients of 2D and 3D for surface dragging using splines were 0.93 and 0.92. This metric for 2D and 3D for live-wire method were 0.94 and 0.91 respectively. The root mean square symmetric surface distance values of 2D and 3D for surface dragging was measured as 0.69, 0.93. The same metrics for live-wire tool were 0.60 and 2.11. Comparing the results of these editing tools with the results of automatic segmentation method, it shows that these editing tools, led to better results and these results were more similar to ground truth image but the required time was higher than hand-segmentation timeKeywords: medical image segmentation, semi-automatic methods, transmission electron microscopy, surface dragging using splines, live-wire
Procedia PDF Downloads 1692754 Myanmar Character Recognition Using Eight Direction Chain Code Frequency Features
Authors: Kyi Pyar Zaw, Zin Mar Kyu
Abstract:
Character recognition is the process of converting a text image file into editable and searchable text file. Feature Extraction is the heart of any character recognition system. The character recognition rate may be low or high depending on the extracted features. In the proposed paper, 25 features for one character are used in character recognition. Basically, there are three steps of character recognition such as character segmentation, feature extraction and classification. In segmentation step, horizontal cropping method is used for line segmentation and vertical cropping method is used for character segmentation. In the Feature extraction step, features are extracted in two ways. The first way is that the 8 features are extracted from the entire input character using eight direction chain code frequency extraction. The second way is that the input character is divided into 16 blocks. For each block, although 8 feature values are obtained through eight-direction chain code frequency extraction method, we define the sum of these 8 feature values as a feature for one block. Therefore, 16 features are extracted from that 16 blocks in the second way. We use the number of holes feature to cluster the similar characters. We can recognize the almost Myanmar common characters with various font sizes by using these features. All these 25 features are used in both training part and testing part. In the classification step, the characters are classified by matching the all features of input character with already trained features of characters.Keywords: chain code frequency, character recognition, feature extraction, features matching, segmentation
Procedia PDF Downloads 3202753 Intelligent Human Pose Recognition Based on EMG Signal Analysis and Machine 3D Model
Authors: Si Chen, Quanhong Jiang
Abstract:
In the increasingly mature posture recognition technology, human movement information is widely used in sports rehabilitation, human-computer interaction, medical health, human posture assessment, and other fields today; this project uses the most original ideas; it is proposed to use the collection equipment for the collection of myoelectric data, reflect the muscle posture change on a degree of freedom through data processing, carry out data-muscle three-dimensional model joint adjustment, and realize basic pose recognition. Based on this, bionic aids or medical rehabilitation equipment can be further developed with the help of robotic arms and cutting-edge technology, which has a bright future and unlimited development space.Keywords: pose recognition, 3D animation, electromyography, machine learning, bionics
Procedia PDF Downloads 792752 Effects of Therapeutic Horseback Riding in Speech and Communication Skills of Children with Autism
Authors: Aristi Alopoudi, Sofia Beloka, Vassiliki Pliogou
Abstract:
Autism is a complex neuro-developmental disorder with a variety of difficulties in many aspects such as social interaction, communication skills and verbal communication (speech). The aim of this study was to examine the impact of therapeutic horseback riding in improving the verbal and communication skills of children diagnosed with autism during 16 sessions. The researcher examined whether the expression of speech, the use of vocabulary, semantics, pragmatics, echolalia and communication skills were influenced by the therapeutic horseback riding when we increase the frequency of the sessions. The researcher observed two subjects of primary-school aged, in a two case observation design, with autism during 16 therapeutic horseback riding sessions (one riding session per week). Compared to baseline, at the end of the 16th therapeutic session, therapeutic horseback riding increased both verbal skills such as vocabulary, semantics, pragmatics, formation of sentences and communication skills such as eye contact, greeting, participation in dialogue and spontaneous speech. It was noticeable that echolalia remained stable. Increased frequency of therapeutic horseback riding was beneficial for significant improvement in verbal and communication skills. More specifically, from the first to the last riding session there was a great increase of vocabulary, semantics, and formation of sentences. Pragmatics reached a lower level than semantics but the same as the right usage of the first person (for example, I make a hug) and echolalia used for that. A great increase of spontaneous speech was noticed. The eye contact was presented in a lower level, and there was a slow but important raise at the greeting as well as the participation in dialogue. Last but not least; this is a first study conducted in therapeutic horseback riding studying the verbal communication and communication skills in autistic children. According to the references, therapeutic horseback riding is a therapy with a variety of benefits, thus; this research made clear that in the benefits of this therapy there should be included the improvement of verbal speech and communication.Keywords: Autism, communication skills, speech, therapeutic horseback riding
Procedia PDF Downloads 2742751 Smartphone-Based Human Activity Recognition by Machine Learning Methods
Authors: Yanting Cao, Kazumitsu Nawata
Abstract:
As smartphones upgrading, their software and hardware are getting smarter, so the smartphone-based human activity recognition will be described as more refined, complex, and detailed. In this context, we analyzed a set of experimental data obtained by observing and measuring 30 volunteers with six activities of daily living (ADL). Due to the large sample size, especially a 561-feature vector with time and frequency domain variables, cleaning these intractable features and training a proper model becomes extremely challenging. After a series of feature selection and parameters adjustment, a well-performed SVM classifier has been trained.Keywords: smart sensors, human activity recognition, artificial intelligence, SVM
Procedia PDF Downloads 1432750 Low-Income African-American Fathers' Gendered Relationships with Their Children: A Study Examining the Impact of Child Gender on Father-Child Interactions
Authors: M. Lim Haslip
Abstract:
This quantitative study explores the correlation between child gender and father-child interactions. The author analyzes data from videotaped interactions between African-American fathers and their boy or girl toddler to explain how African-American fathers and toddlers interact with each other and whether these interactions differ by child gender. The purpose of this study is to investigate the research question: 'How, if at all, do fathers’ speech and gestures differ when interacting with their two-year-old sons versus daughters during free play?' The objectives of this study are to describe how child gender impacts African-American fathers’ verbal communication, examine how fathers gesture and speak to their toddler by gender, and to guide interventions for low-income African-American families and their children in early language development. This study involves a sample of 41 low-income African-American fathers and their 24-month-old toddlers. The videotape data will be used to observe 10-minute father-child interactions during free play. This study uses the already transcribed and coded data provided by Dr. Meredith Rowe, who did her study on the impact of African-American fathers’ verbal input on their children’s language development. The Child Language Data Exchange System (CHILDES program), created to study conversational interactions, was used for transcription and coding of the videotape data. The findings focus on the quantity of speech, diversity of speech, complexity of speech, and the quantity of gesture to inform the vocabulary usage, number of spoken words, length of speech, and the number of object pointings observed during father-toddler interactions in a free play setting. This study will help intervention and prevention scientists understand early language development in the African-American population. It will contribute to knowledge of the role of African-American fathers’ interactions on their children’s language development. It will guide interventions for the early language development of African-American children.Keywords: parental engagement, early language development, African-American families, quantity of speech, diversity of speech, complexity of speech and the quantity of gesture
Procedia PDF Downloads 1052749 Influence of Loudness Compression on Hearing with Bone Anchored Hearing Implants
Authors: Anja Kurz, Marc Flynn, Tobias Good, Marco Caversaccio, Martin Kompis
Abstract:
Bone Anchored Hearing Implants (BAHI) are routinely used in patients with conductive or mixed hearing loss, e.g. if conventional air conduction hearing aids cannot be used. New sound processors and new fitting software now allow the adjustment of parameters such as loudness compression ratios or maximum power output separately. Today it is unclear, how the choice of these parameters influences aided speech understanding in BAHI users. In this prospective experimental study, the effect of varying the compression ratio and lowering the maximum power output in a BAHI were investigated. Twelve experienced adult subjects with a mixed hearing loss participated in this study. Four different compression ratios (1.0; 1.3; 1.6; 2.0) were tested along with two different maximum power output settings, resulting in a total of eight different programs. Each participant tested each program during two weeks. A blinded Latin square design was used to minimize bias. For each of the eight programs, speech understanding in quiet and in noise was assessed. For speech in quiet, the Freiburg number test and the Freiburg monosyllabic word test at 50, 65, and 80 dB SPL were used. For speech in noise, the Oldenburg sentence test was administered. Speech understanding in quiet and in noise was improved significantly in the aided condition in any program, when compared to the unaided condition. However, no significant differences were found between any of the eight programs. In contrast, on a subjective level there was a significant preference for medium compression ratios of 1.3 to 1.6 and higher maximum power output.Keywords: Bone Anchored Hearing Implant, baha, compression, maximum power output, speech understanding
Procedia PDF Downloads 3872748 Hate Speech Detection in Tunisian Dialect
Authors: Helmi Baazaoui, Mounir Zrigui
Abstract:
This study addresses the challenge of hate speech detection in Tunisian Arabic text, a critical issue for online safety and moderation. Leveraging the strengths of the AraBERT model, we fine-tuned and evaluated its performance against the Bi-LSTM model across four distinct datasets: T-HSAB, TNHS, TUNIZI-Dataset, and a newly compiled dataset with diverse labels such as Offensive Language, Racism, and Religious Intolerance. Our experimental results demonstrate that AraBERT significantly outperforms Bi-LSTM in terms of Recall, Precision, F1-Score, and Accuracy across all datasets. The findings underline the robustness of AraBERT in capturing the nuanced features of Tunisian Arabic and its superior capability in classification tasks. This research not only advances the technology for hate speech detection but also provides practical implications for social media moderation and policy-making in Tunisia. Future work will focus on expanding the datasets and exploring more sophisticated architectures to further enhance detection accuracy, thus promoting safer online interactions.Keywords: hate speech detection, Tunisian Arabic, AraBERT, Bi-LSTM, Gemini annotation tool, social media moderation
Procedia PDF Downloads 112747 Hand Gesture Interpretation Using Sensing Glove Integrated with Machine Learning Algorithms
Authors: Aqsa Ali, Aleem Mushtaq, Attaullah Memon, Monna
Abstract:
In this paper, we present a low cost design for a smart glove that can perform sign language recognition to assist the speech impaired people. Specifically, we have designed and developed an Assistive Hand Gesture Interpreter that recognizes hand movements relevant to the American Sign Language (ASL) and translates them into text for display on a Thin-Film-Transistor Liquid Crystal Display (TFT LCD) screen as well as synthetic speech. Linear Bayes Classifiers and Multilayer Neural Networks have been used to classify 11 feature vectors obtained from the sensors on the glove into one of the 27 ASL alphabets and a predefined gesture for space. Three types of features are used; bending using six bend sensors, orientation in three dimensions using accelerometers and contacts at vital points using contact sensors. To gauge the performance of the presented design, the training database was prepared using five volunteers. The accuracy of the current version on the prepared dataset was found to be up to 99.3% for target user. The solution combines electronics, e-textile technology, sensor technology, embedded system and machine learning techniques to build a low cost wearable glove that is scrupulous, elegant and portable.Keywords: American sign language, assistive hand gesture interpreter, human-machine interface, machine learning, sensing glove
Procedia PDF Downloads 3012746 Performance Evaluation of Contemporary Classifiers for Automatic Detection of Epileptic EEG
Authors: K. E. Ch. Vidyasagar, M. Moghavvemi, T. S. S. T. Prabhat
Abstract:
Epilepsy is a global problem, and with seizures eluding even the smartest of diagnoses a requirement for automatic detection of the same using electroencephalogram (EEG) would have a huge impact in diagnosis of the disorder. Among a multitude of methods for automatic epilepsy detection, one should find the best method out, based on accuracy, for classification. This paper reasons out, and rationalizes, the best methods for classification. Accuracy is based on the classifier, and thus this paper discusses classifiers like quadratic discriminant analysis (QDA), classification and regression tree (CART), support vector machine (SVM), naive Bayes classifier (NBC), linear discriminant analysis (LDA), K-nearest neighbor (KNN) and artificial neural networks (ANN). Results show that ANN is the most accurate of all the above stated classifiers with 97.7% accuracy, 97.25% specificity and 98.28% sensitivity in its merit. This is followed closely by SVM with 1% variation in result. These results would certainly help researchers choose the best classifier for detection of epilepsy.Keywords: classification, seizure, KNN, SVM, LDA, ANN, epilepsy
Procedia PDF Downloads 5202745 Multimodal Employee Attendance Management System
Authors: Khaled Mohammed
Abstract:
This paper presents novel face recognition and identification approaches for the real-time attendance management problem in large companies/factories and government institutions. The proposed uses the Minimum Ratio (MR) approach for employee identification. Capturing the authentic face variability from a sequence of video frames has been considered for the recognition of faces and resulted in system robustness against the variability of facial features. Experimental results indicated an improvement in the performance of the proposed system compared to the Previous approaches at a rate between 2% to 5%. In addition, it decreased the time two times if compared with the Previous techniques, such as Extreme Learning Machine (ELM) & Multi-Scale Structural Similarity index (MS-SSIM). Finally, it achieved an accuracy of 99%.Keywords: attendance management system, face detection and recognition, live face recognition, minimum ratio
Procedia PDF Downloads 1552744 Human Gait Recognition Using Moment with Fuzzy
Authors: Jyoti Bharti, Navneet Manjhi, M. K.Gupta, Bimi Jain
Abstract:
A reliable gait features are required to extract the gait sequences from an images. In this paper suggested a simple method for gait identification which is based on moments. Moment values are extracted on different number of frames of gray scale and silhouette images of CASIA database. These moment values are considered as feature values. Fuzzy logic and nearest neighbour classifier are used for classification. Both achieved higher recognition.Keywords: gait, fuzzy logic, nearest neighbour, recognition rate, moments
Procedia PDF Downloads 7572743 A Conglomerate of Multiple Optical Character Recognition Table Detection and Extraction
Authors: Smita Pallavi, Raj Ratn Pranesh, Sumit Kumar
Abstract:
Information representation as tables is compact and concise method that eases searching, indexing, and storage requirements. Extracting and cloning tables from parsable documents is easier and widely used; however, industry still faces challenges in detecting and extracting tables from OCR (Optical Character Recognition) documents or images. This paper proposes an algorithm that detects and extracts multiple tables from OCR document. The algorithm uses a combination of image processing techniques, text recognition, and procedural coding to identify distinct tables in the same image and map the text to appropriate the corresponding cell in dataframe, which can be stored as comma-separated values, database, excel, and multiple other usable formats.Keywords: table extraction, optical character recognition, image processing, text extraction, morphological transformation
Procedia PDF Downloads 1432742 Image Recognition and Anomaly Detection Powered by GANs: A Systematic Review
Authors: Agastya Pratap Singh
Abstract:
Generative Adversarial Networks (GANs) have emerged as powerful tools in the fields of image recognition and anomaly detection due to their ability to model complex data distributions and generate realistic images. This systematic review explores recent advancements and applications of GANs in both image recognition and anomaly detection tasks. We discuss various GAN architectures, such as DCGAN, CycleGAN, and StyleGAN, which have been tailored to improve accuracy, robustness, and efficiency in visual data analysis. In image recognition, GANs have been used to enhance data augmentation, improve classification models, and generate high-quality synthetic images. In anomaly detection, GANs have proven effective in identifying rare and subtle abnormalities across various domains, including medical imaging, cybersecurity, and industrial inspection. The review also highlights the challenges and limitations associated with GAN-based methods, such as instability during training and mode collapse, and suggests future research directions to overcome these issues. Through this review, we aim to provide researchers with a comprehensive understanding of the capabilities and potential of GANs in transforming image recognition and anomaly detection practices.Keywords: generative adversarial networks, image recognition, anomaly detection, DCGAN, CycleGAN, StyleGAN, data augmentation
Procedia PDF Downloads 202741 Recognition of Cursive Arabic Handwritten Text Using Embedded Training Based on Hidden Markov Models (HMMs)
Authors: Rabi Mouhcine, Amrouch Mustapha, Mahani Zouhir, Mammass Driss
Abstract:
In this paper, we present a system for offline recognition cursive Arabic handwritten text based on Hidden Markov Models (HMMs). The system is analytical without explicit segmentation used embedded training to perform and enhance the character models. Extraction features preceded by baseline estimation are statistical and geometric to integrate both the peculiarities of the text and the pixel distribution characteristics in the word image. These features are modelled using hidden Markov models and trained by embedded training. The experiments on images of the benchmark IFN/ENIT database show that the proposed system improves recognition.Keywords: recognition, handwriting, Arabic text, HMMs, embedded training
Procedia PDF Downloads 3542740 An Adaptive CFAR Algorithm Based on Automatic Censoring in Heterogeneous Environments
Authors: Naime Boudemagh
Abstract:
In this work, we aim to improve the detection performances of radar systems. To this end, we propose and analyze a novel censoring technique of undesirable samples, of priori unknown positions, that may be present in the environment under investigation. Therefore, we consider heterogeneous backgrounds characterized by the presence of some irregularities such that clutter edge transitions and/or interfering targets. The proposed detector, termed automatic censoring constant false alarm (AC-CFAR), operates exclusively in a Gaussian background. It is built to allow the segmentation of the environment to regions and switch automatically to the appropriate detector; namely, the cell averaging CFAR (CA-CFAR), the censored mean level CFAR (CMLD-CFAR) or the order statistic CFAR (OS-CFAR). Monte Carlo simulations show that the AC-CFAR detector performs like the CA-CFAR in a homogeneous background. Moreover, the proposed processor exhibits considerable robustness in a heterogeneous background.Keywords: CFAR, automatic censoring, heterogeneous environments, radar systems
Procedia PDF Downloads 6022739 Forensic Speaker Verification in Noisy Environmental by Enhancing the Speech Signal Using ICA Approach
Authors: Ahmed Kamil Hasan Al-Ali, Bouchra Senadji, Ganesh Naik
Abstract:
We propose a system to real environmental noise and channel mismatch for forensic speaker verification systems. This method is based on suppressing various types of real environmental noise by using independent component analysis (ICA) algorithm. The enhanced speech signal is applied to mel frequency cepstral coefficients (MFCC) or MFCC feature warping to extract the essential characteristics of the speech signal. Channel effects are reduced using an intermediate vector (i-vector) and probabilistic linear discriminant analysis (PLDA) approach for classification. The proposed algorithm is evaluated by using an Australian forensic voice comparison database, combined with car, street and home noises from QUT-NOISE at a signal to noise ratio (SNR) ranging from -10 dB to 10 dB. Experimental results indicate that the MFCC feature warping-ICA achieves a reduction in equal error rate about (48.22%, 44.66%, and 50.07%) over using MFCC feature warping when the test speech signals are corrupted with random sessions of street, car, and home noises at -10 dB SNR.Keywords: noisy forensic speaker verification, ICA algorithm, MFCC, MFCC feature warping
Procedia PDF Downloads 408