Search results for: speech dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1827

Search results for: speech dataset

1557 Studies of Rule Induction by STRIM from the Decision Table with Contaminated Attribute Values from Missing Data and Noise — in the Case of Critical Dataset Size —

Authors: Tetsuro Saeki, Yuichi Kato, Shoutarou Mizuno

Abstract:

STRIM (Statistical Test Rule Induction Method) has been proposed as a method to effectively induct if-then rules from the decision table which is considered as a sample set obtained from the population of interest. Its usefulness has been confirmed by simulation experiments specifying rules in advance, and by comparison with conventional methods. However, scope for future development remains before STRIM can be applied to the analysis of real-world data sets. The first requirement is to determine the size of the dataset needed for inducting true rules, since finding statistically significant rules is the core of the method. The second is to examine the capacity of rule induction from datasets with contaminated attribute values created by missing data and noise, since real-world datasets usually contain such contaminated data. This paper examines the first problem theoretically, in connection with the rule length. The second problem is then examined in a simulation experiment, utilizing the critical size of dataset derived from the first step. The experimental results show that STRIM is highly robust in the analysis of datasets with contaminated attribute values, and hence is applicable to realworld data.

Keywords: rule induction, decision table, missing data, noise

Procedia PDF Downloads 375
1556 Sentiment Classification Using Enhanced Contextual Valence Shifters

Authors: Vo Ngoc Phu, Phan Thi Tuoi

Abstract:

We have explored different methods of improving the accuracy of sentiment classification. The sentiment orientation of a document can be positive (+), negative (-), or neutral (0). We combine five dictionaries from [2, 3, 4, 5, 6] into the new one with 21137 entries. The new dictionary has many verbs, adverbs, phrases and idioms, that are not in five ones before. The paper shows that our proposed method based on the combination of Term-Counting method and Enhanced Contextual Valence Shifters method has improved the accuracy of sentiment classification. The combined method has accuracy 68.984% on the testing dataset, and 69.224% on the training dataset. All of these methods are implemented to classify the reviews based on our new dictionary and the Internet Movie data set.

Keywords: sentiment classification, sentiment orientation, valence shifters, contextual, valence shifters, term counting

Procedia PDF Downloads 487
1555 Using the Smith-Waterman Algorithm to Extract Features in the Classification of Obesity Status

Authors: Rosa Figueroa, Christopher Flores

Abstract:

Text categorization is the problem of assigning a new document to a set of predetermined categories, on the basis of a training set of free-text data that contains documents whose category membership is known. To train a classification model, it is necessary to extract characteristics in the form of tokens that facilitate the learning and classification process. In text categorization, the feature extraction process involves the use of word sequences also known as N-grams. In general, it is expected that documents belonging to the same category share similar features. The Smith-Waterman (SW) algorithm is a dynamic programming algorithm that performs a local sequence alignment in order to determine similar regions between two strings or protein sequences. This work explores the use of SW algorithm as an alternative to feature extraction in text categorization. The dataset used for this purpose, contains 2,610 annotated documents with the classes Obese/Non-Obese. This dataset was represented in a matrix form using the Bag of Word approach. The score selected to represent the occurrence of the tokens in each document was the term frequency-inverse document frequency (TF-IDF). In order to extract features for classification, four experiments were conducted: the first experiment used SW to extract features, the second one used unigrams (single word), the third one used bigrams (two word sequence) and the last experiment used a combination of unigrams and bigrams to extract features for classification. To test the effectiveness of the extracted feature set for the four experiments, a Support Vector Machine (SVM) classifier was tuned using 20% of the dataset. The remaining 80% of the dataset together with 5-Fold Cross Validation were used to evaluate and compare the performance of the four experiments of feature extraction. Results from the tuning process suggest that SW performs better than the N-gram based feature extraction. These results were confirmed by using the remaining 80% of the dataset, where SW performed the best (accuracy = 97.10%, weighted average F-measure = 97.07%). The second best was obtained by the combination of unigrams-bigrams (accuracy = 96.04, weighted average F-measure = 95.97) closely followed by the bigrams (accuracy = 94.56%, weighted average F-measure = 94.46%) and finally unigrams (accuracy = 92.96%, weighted average F-measure = 92.90%).

Keywords: comorbidities, machine learning, obesity, Smith-Waterman algorithm

Procedia PDF Downloads 278
1554 An Application-Driven Procedure for Optimal Signal Digitization of Automotive-Grade Ultrasonic Sensors

Authors: Mohamed Shawki Elamir, Heinrich Gotzig, Raoul Zoellner, Patrick Maeder

Abstract:

In this work, a methodology is presented for identifying the optimal digitization parameters for the analog signal of ultrasonic sensors. These digitization parameters are the resolution of the analog to digital conversion and the sampling rate. This is accomplished through the derivation of characteristic curves based on Fano inequality and the calculation of the mutual information content over a given dataset. The mutual information is calculated between the examples in the dataset and the corresponding variation in the feature that needs to be estimated. The optimal parameters are identified in a manner that ensures optimal estimation performance while preventing inefficiency in using unnecessarily powerful analog to digital converters.

Keywords: analog to digital conversion, digitization, sampling rate, ultrasonic

Procedia PDF Downloads 187
1553 Auditory Function in MP3 Users and Association with Hidden Hearing Loss

Authors: Nana Saralidze, Nino Sharashenidze, Zurab Kevanishvili

Abstract:

Hidden hearing loss may occur in humans exposed to prolonged high-level sound. It is the loss of ability to hear high-level background noise while having normal hearing in quiet. We compared the hearing of people who regularly listen 3 hours and more to personal music players and those who do not. Forty participants aged 18-30 years were divided into two groups: regular users of music players and people who had never used them. And the third group – elders aged 50-55 years, had 15 participants. Pure-tone audiometry (125-16000 Hz), auditory brainstem response (ABR) (70dB SPL), and ability to identify speech in noise (4-talker babble with a 65-dB signal-to-noise ratio at 80 dB) were measured in all participants. All participants had normal pure-tone audiometry (all thresholds < 25 dB HL). A significant difference between groups was observed in that regular users of personal audio systems correctly identified 53% of words, whereas the non-users identified 74% and the elder group – 63%. This contributes evidence supporting the presence of a hidden hearing loss in humans and demonstrates that speech-in-noise audiometry is an effective method and can be considered as the GOLD standard for detecting hidden hearing loss.

Keywords: mp3 player, hidden hearing loss, speech audiometry, pure tone audiometry

Procedia PDF Downloads 55
1552 An Exploratory Study of the Effects of Head Movement on Engagement within a Telepresence Environment

Authors: B. S. Bamoallem, A. J. Wodehouse, G. M. Mair

Abstract:

Communication takes place not only through speech, but also by means of gestures such as facial expressions, gaze, head movements, hand movements and body posture, and though there has been rapid development, communication platforms still lack this type of behavior. We believe communication platforms need to fully achieve this verbal and non-verbal behavior in order to make interactions more engaging and more efficient. In this study we decided to focus our research on the head rather than any other body part as it is a rich source of information for speech-related movement Thus we aim to investigate the value of incorporating head movements into the use of telepresence robots as communication platforms; this will be done by investigating a system that reproduces head movement manually as closely as possible.

Keywords: engagement, nonverbal behaviours, head movements, face-to-face interaction, telepresence robot

Procedia PDF Downloads 442
1551 Efficacy of Phonological Awareness Intervention for People with Language Impairment

Authors: I. Wardana Ketut, I. Suparwa Nyoman

Abstract:

This study investigated the form and characteristic of speech sound produced by three Balinese subjects who have recovered from aphasia as well as intervened their language impairment on side of linguistic and neuronal aspects of views. The failure of judging the speech sound was caused by impairment of motor cortex that indicated there were lesions in left hemispheric language zone. Sound articulation phenomena were in the forms of phonemes deletion, replacement or assimilation in individual words and meaning building for anomic aphasia. Therefore, the Balinese sound patterns were stimulated by showing pictures to the subjects and recorded to recognize what individual consonants or vowels they unclearly produced and to find out how the sound disorder occurred. The physiology of sound production by subject’s speech organs could not only show the accuracy of articulation but also any level of severity the lesion they suffered from. The subjects’ speech sounds were investigated, classified and analyzed to know how poor the lingual units were and observed to clarify weaknesses of sound characters occurred either for place or manner of articulation. Many fricative and stopped consonants were replaced by glottal or palatal sounds because the cranial nerve, such as facial, trigeminal, and hypoglossal underwent impairment after the stroke. The phonological intervention was applied through a technique called phonemic articulation drill and the examination was conducted to know any change has been obtained. The finding informed that some weak articulation turned into clearer sound and simple meaning of language has been conveyed. The hierarchy of functional parts of brain played important role of language formulation and processing. From this finding, it can be clearly emphasized that this study supports the role of right hemisphere in recovery from aphasia is associated with functional brain reorganization.

Keywords: aphasia, intervention, phonology, stroke

Procedia PDF Downloads 182
1550 Testifying in Court as a Victim of Crime for Persons with Little or No Functional Speech: Vocabulary Implications

Authors: Robyn White, Juan Bornman, Ensa Johnson

Abstract:

People with disabilities are at a high risk of becoming victims of crime. Individuals with little or no functional speech (LNFS) face an even higher risk. One way of reducing the risk of remaining a victim of crime is to face the alleged perpetrator in court as a witness – therefore it is important for a person with LNFS who has been a victim of crime to have the required vocabulary to testify in court. The aim of this study was to identify and describe the core and fringe legal vocabulary required by illiterate victims of crime, who have little or no functional speech, to testify in court as witnesses. A mixed-method, the exploratory sequential design consisting of two distinct phases was used to address the aim of the research. The first phase was of a qualitative nature and included two different data sources, namely in-depth semi-structured interviews and focus group discussions. The overall aim of this phase was to identify and describe core and fringe legal vocabulary and to develop a measurement instrument based on these results. Results from Phase 1 were used in Phase 2, the quantitative phase, during which the measurement instrument (a custom-designed questionnaire) was socially validated. The results produced six distinct vocabulary categories that represent the legal core vocabulary and 99 words that represent the legal fringe vocabulary. The findings suggested that communication boards should be individualised to the individual and the specific crime. It is believed that the vocabulary lists developed in this study act as a valid and reliable springboard from which communication boards can be developed. Recommendations were therefore made to develop an Alternative and Augmentative Communication Resource Tool Kit to assist the legal justice system.

Keywords: augmentative and alternative communication, person with little or no functional speech, sexual crimes, testifying in court, victim of crime, witness competency

Procedia PDF Downloads 460
1549 Multi-Objective Optimal Threshold Selection for Similarity Functions in Siamese Networks for Semantic Textual Similarity Tasks

Authors: Kriuk Boris, Kriuk Fedor

Abstract:

This paper presents a comparative study of fundamental similarity functions for Siamese networks in semantic textual similarity (STS) tasks. We evaluate various similarity functions using the STS Benchmark dataset, analyzing their performance and stability. Additionally, we introduce a multi-objective approach for optimal threshold selection. Our findings provide insights into the effectiveness of different similarity functions and offer a straightforward method for threshold selection optimization, contributing to the advancement of Siamese network architectures in STS applications.

Keywords: siamese networks, semantic textual similarity, similarity functions, STS benchmark dataset, threshold selection

Procedia PDF Downloads 10
1548 Tongue Image Retrieval Based Using Machine Learning

Authors: Ahmad FAROOQ, Xinfeng Zhang, Fahad Sabah, Raheem Sarwar

Abstract:

In Traditional Chinese Medicine, tongue diagnosis is a vital inspection tool (TCM). In this study, we explore the potential of machine learning in tongue diagnosis. It begins with the cataloguing of the various classifications and characteristics of the human tongue. We infer 24 kinds of tongues from the material and coating of the tongue, and we identify 21 attributes of the tongue. The next step is to apply machine learning methods to the tongue dataset. We use the Weka machine learning platform to conduct the experiment for performance analysis. The 457 instances of the tongue dataset are used to test the performance of five different machine learning methods, including SVM, Random Forests, Decision Trees, and Naive Bayes. Based on accuracy and Area under the ROC Curve, the Support Vector Machine algorithm was shown to be the most effective for tongue diagnosis (AUC).

Keywords: medical imaging, image retrieval, machine learning, tongue

Procedia PDF Downloads 52
1547 Hybridization of Manually Extracted and Convolutional Features for Classification of Chest X-Ray of COVID-19

Authors: M. Bilal Ishfaq, Adnan N. Qureshi

Abstract:

COVID-19 is the most infectious disease these days, it was first reported in Wuhan, the capital city of Hubei in China then it spread rapidly throughout the whole world. Later on 11 March 2020, the World Health Organisation (WHO) declared it a pandemic. Since COVID-19 is highly contagious, it has affected approximately 219M people worldwide and caused 4.55M deaths. It has brought the importance of accurate diagnosis of respiratory diseases such as pneumonia and COVID-19 to the forefront. In this paper, we propose a hybrid approach for the automated detection of COVID-19 using medical imaging. We have presented the hybridization of manually extracted and convolutional features. Our approach combines Haralick texture features and convolutional features extracted from chest X-rays and CT scans. We also employ a minimum redundancy maximum relevance (MRMR) feature selection algorithm to reduce computational complexity and enhance classification performance. The proposed model is evaluated on four publicly available datasets, including Chest X-ray Pneumonia, COVID-19 Pneumonia, COVID-19 CTMaster, and VinBig data. The results demonstrate high accuracy and effectiveness, with 0.9925 on the Chest X-ray pneumonia dataset, 0.9895 on the COVID-19, Pneumonia and Normal Chest X-ray dataset, 0.9806 on the Covid CTMaster dataset, and 0.9398 on the VinBig dataset. We further evaluate the effectiveness of the proposed model using ROC curves, where the AUC for the best-performing model reaches 0.96. Our proposed model provides a promising tool for the early detection and accurate diagnosis of COVID-19, which can assist healthcare professionals in making informed treatment decisions and improving patient outcomes. The results of the proposed model are quite plausible and the system can be deployed in a clinical or research setting to assist in the diagnosis of COVID-19.

Keywords: COVID-19, feature engineering, artificial neural networks, radiology images

Procedia PDF Downloads 62
1546 ARABEX: Automated Dotted Arabic Expiration Date Extraction using Optimized Convolutional Autoencoder and Custom Convolutional Recurrent Neural Network

Authors: Hozaifa Zaki, Ghada Soliman

Abstract:

In this paper, we introduced an approach for Automated Dotted Arabic Expiration Date Extraction using Optimized Convolutional Autoencoder (ARABEX) with bidirectional LSTM. This approach is used for translating the Arabic dot-matrix expiration dates into their corresponding filled-in dates. A custom lightweight Convolutional Recurrent Neural Network (CRNN) model is then employed to extract the expiration dates. Due to the lack of available dataset images for the Arabic dot-matrix expiration date, we generated synthetic images by creating an Arabic dot-matrix True Type Font (TTF) matrix to address this limitation. Our model was trained on a realistic synthetic dataset of 3287 images, covering the period from 2019 to 2027, represented in the format of yyyy/mm/dd. We then trained our custom CRNN model using the generated synthetic images to assess the performance of our model (ARABEX) by extracting expiration dates from the translated images. Our proposed approach achieved an accuracy of 99.4% on the test dataset of 658 images, while also achieving a Structural Similarity Index (SSIM) of 0.46 for image translation on our dataset. The ARABEX approach demonstrates its ability to be applied to various downstream learning tasks, including image translation and reconstruction. Moreover, this pipeline (ARABEX+CRNN) can be seamlessly integrated into automated sorting systems to extract expiry dates and sort products accordingly during the manufacturing stage. By eliminating the need for manual entry of expiration dates, which can be time-consuming and inefficient for merchants, our approach offers significant results in terms of efficiency and accuracy for Arabic dot-matrix expiration date recognition.

Keywords: computer vision, deep learning, image processing, character recognition

Procedia PDF Downloads 59
1545 Speech and LanguageTherapists’ Advices for Multilingual Children with Developmental Language Disorders

Authors: Rudinë Fetahaj, Flaka Isufi, Kristina Hansson

Abstract:

While evidence shows that in most European countries’ multilingualism is rising, unfortunately, the focus of Speech and Language Therapy (SLT) is still monolingualism. Furthermore, there is sparse information on how the needs of multilingual children with language disorders such as Developmental Language Disorder (DLD) are being met and which factors affect the intervention approach of SLTs when treating DLD. This study aims to examine the relationship and correlation between the number of languages SLTs speak, years of experience, and length of education with the advice they give to parents of multilingual children with DLD regarding which language to be spoken. This is a cross-sectional study where a survey was completed online by 2608 SLTs across Europe and data has been used from a 2017 COST-action project. IBM-SPSS-28 was used where descriptive analysis, correlation and Kruskal-Wallis test were performed.SLTs mainly advise the parents of multilingual children with DLD to speak their native language at home. Besides years of experience, language status and the level of education showed to have no association with the type of advice SLTs give. Results showed a non-significant moderate positive correlation between SLTs years of experience and their advice regarding the native language, whereas language status and length of education showed no correlation with the advice SLTs give to parents.

Keywords: quantitative study, developmental language disorders, multilingualism, speech and language therapy, children, European context

Procedia PDF Downloads 67
1544 Refusal Speech Acts in French Learners of Mandarin Chinese

Authors: Jui-Hsueh Hu

Abstract:

This study investigated various models of refusal speech acts among three target groups: French learners of Mandarin Chinese (FM), Taiwanese native Mandarin speakers (TM), and native French speakers (NF). The refusal responses were analyzed in terms of their options, frequencies, and sequences and the contents of their semantic formulas. This study also examined differences in refusal strategies, as determined by social status and social distance, among the three groups. The difficulties of refusal speech acts encountered by FM were then generalized. The results indicated that Mandarin instructors of NF should focus on the different reasons for the pragmatic failure of French learners and should assist these learners in mastering refusal speech acts that rely on abundant cultural information. In this study, refusal policies were mainly classified according to the research of Beebe et al. (1990). Discourse completion questionnaires were collected from TM, FM, and NF, and their responses were compared to determine how refusal policies differed among the groups. This study not only emphasized the dissimilarities of refusal strategies between native Mandarin speakers and second-language Mandarin learners but also used NF as a control group. The results of this study demonstrated that regarding overall strategies, FM were biased toward NF in terms of strategy choice, order, and content, resulting in pragmatic transfer under the influence of social factors such as 'social status' and 'social distance,' strategy choices of FM were still closer to those of NF, and the phenomenon of pragmatic transfer of FM was revealed. Regarding the refusal difficulties among the three groups, the F-test in the analysis of variance revealed statistical significance was achieved for Role Playing Items 13 and 14 (P < 0.05). A difference was observed in the average number of refusal difficulties between the participants. However, after multiple comparisons, it was found that item 13 (unrecognized heterosexual junior colleague requesting contacts) was significantly more difficult for NF than for TM and FM; item 14 (contacts requested by an unrecognized classmate of the opposite sex) was significantly more difficult to refuse for NF than for TM. This study summarized the pragmatic language errors that most FM often perform, including the misuse or absence of modal words, hedging expressions, and empty words at the end of sentences, as the reasons for pragmatic failures. The common social pragmatic failures of FM include inaccurately applying the level of directness and formality.

Keywords: French Mandarin, interlanguage refusal, pragmatic transfer, speech acts

Procedia PDF Downloads 239
1543 Comparison Study of Machine Learning Classifiers for Speech Emotion Recognition

Authors: Aishwarya Ravindra Fursule, Shruti Kshirsagar

Abstract:

In the intersection of artificial intelligence and human-centered computing, this paper delves into speech emotion recognition (SER). It presents a comparative analysis of machine learning models such as K-Nearest Neighbors (KNN),logistic regression, support vector machines (SVM), decision trees, ensemble classifiers, and random forests, applied to SER. The research employs four datasets: Crema D, SAVEE, TESS, and RAVDESS. It focuses on extracting salient audio signal features like Zero Crossing Rate (ZCR), Chroma_stft, Mel Frequency Cepstral Coefficients (MFCC), root mean square (RMS) value, and MelSpectogram. These features are used to train and evaluate the models’ ability to recognize eight types of emotions from speech: happy, sad, neutral, angry, calm, disgust, fear, and surprise. Among the models, the Random Forest algorithm demonstrated superior performance, achieving approximately 79% accuracy. This suggests its suitability for SER within the parameters of this study. The research contributes to SER by showcasing the effectiveness of various machine learning algorithms and feature extraction techniques. The findings hold promise for the development of more precise emotion recognition systems in the future. This abstract provides a succinct overview of the paper’s content, methods, and results.

Keywords: comparison, ML classifiers, KNN, decision tree, SVM, random forest, logistic regression, ensemble classifiers

Procedia PDF Downloads 26
1542 The Mirage of Progress? a Longitudinal Study of Japanese Students’ L2 Oral Grammar

Authors: Robert Long, Hiroaki Watanabe

Abstract:

This longitudinal study examines the grammatical errors of Japanese university students’ dialogues with a native speaker over an academic year. The L2 interactions of 15 Japanese speakers were taken from the JUSFC2018 corpus (April/May 2018) and the JUSFC2019 corpus (January/February). The corpora were based on a self-introduction monologue and a three-question dialogue; however, this study examines the grammatical accuracy found in the dialogues. Research questions focused on a possible significant difference in grammatical accuracy from the first interview session in 2018 and the second one the following year, specifically regarding errors in clauses per 100 words, global errors and local errors, and with specific errors related to parts of speech. The investigation also focused on which forms showed the least improvement or had worsened? Descriptive statistics showed that error-free clauses/errors per 100 words decreased slightly while clauses with errors/100 words increased by one clause. Global errors showed a significant decline, while local errors increased from 97 to 158 errors. For errors related to parts of speech, a t-test confirmed there was a significant difference between the two speech corpora with more error frequency occurring in the 2019 corpus. This data highlights the difficulty in having students self-edit themselves.

Keywords: clause analysis, global vs. local errors, grammatical accuracy, L2 output, longitudinal study

Procedia PDF Downloads 113
1541 Features Reduction Using Bat Algorithm for Identification and Recognition of Parkinson Disease

Authors: P. Shrivastava, A. Shukla, K. Verma, S. Rungta

Abstract:

Parkinson's disease is a chronic neurological disorder that directly affects human gait. It leads to slowness of movement, causes muscle rigidity and tremors. Gait serve as a primary outcome measure for studies aiming at early recognition of disease. Using gait techniques, this paper implements efficient binary bat algorithm for an early detection of Parkinson's disease by selecting optimal features required for classification of affected patients from others. The data of 166 people, both fit and affected is collected and optimal feature selection is done using PSO and Bat algorithm. The reduced dataset is then classified using neural network. The experiments indicate that binary bat algorithm outperforms traditional PSO and genetic algorithm and gives a fairly good recognition rate even with the reduced dataset.

Keywords: parkinson, gait, feature selection, bat algorithm

Procedia PDF Downloads 523
1540 Functional Outcome of Speech, Voice and Swallowing Following Excision of Glomus Jugulare Tumor

Authors: B. S. Premalatha, Kausalya Sahani

Abstract:

Background: Glomus jugulare tumors arise within the jugular foramen and are commonly seen in females particularly on the left side. Surgical excision of the tumor may cause lower cranial nerve deficits. Cranial nerve involvement produces hoarseness of voice, slurred speech, and dysphagia along with other physical symptoms, thereby affecting the quality of life of individuals. Though oncological clearance is mainly emphasized on while treating these individuals, little importance is given to their communication, voice and swallowing problems, which play a crucial part in daily functioning. Objective: To examine the functions of voice, speech and swallowing outcomes of the subjects, following excision of glomus jugulare tumor. Methods: Two female subjects aged 56 and 62 years had come with a complaint of change in voice, inability to swallow and reduced clarity of speech following surgery for left glomus jugulare tumor were participants of the study. Their surgical information revealed multiple cranial nerve palsies involving the left facial, left superior and recurrent branches of the vagus nerve, left pharyngeal, left soft palate, left hypoglossal and vestibular nerves. Functional outcomes of voice, speech and swallowing were evaluated by perceptual and objective assessment procedures. Assessment included the examination of oral structures and functions, dysarthria by Frenchey dysarthria assessment, cranial nerve functions and swallowing functions. MDVP and Dr. Speech software were used to evaluate acoustic parameters of voice and quality of voice respectively. Results: The study revealed that both the subjects, subsequent to excision of glomus jugulare tumor, showed a varied picture of affected oral structure and functions, articulation, voice and swallowing functions. The cranial nerve assessment showed impairment of the vagus, hypoglossal, facial and glossopharyngeal nerves. Voice examination indicated vocal cord paralysis associated with breathy quality of voice, weak voluntary cough, reduced pitch and loudness range, and poor respiratory support. Perturbation parameters as jitter, shimmer were affected along with s/z ratio indicative of voice fold pathology. Reduced MPD(Maximum Phonation Duration) of vowels indicated that disturbed coordination between respiratory and laryngeal systems. Hypernasality was found to be a prominent feature which reduced speech intelligibility. Imprecise articulation was seen in both the subjects as the hypoglossal nerve was affected following surgery. Injury to vagus, hypoglossal, gloss pharyngeal and facial nerves disturbed the function of swallowing. All the phases of swallow were affected. Aspiration was observed before and during the swallow, confirming the oropharyngeal dysphagia. All the subsystems were affected as per Frenchey Dysarthria Assessment signifying the diagnosis of flaccid dysarthria. Conclusion: There is an observable communication and swallowing difficulty seen following excision of glomus jugulare tumor. Even with complete resection, extensive rehabilitation may be necessary due to significant lower cranial nerve dysfunction. The finding of the present study stresses the need for involvement of as speech and swallowing therapist for pre-operative counseling and assessment of functional outcomes.

Keywords: functional outcome, glomus jugulare tumor excision, multiple cranial nerve impairment, speech and swallowing

Procedia PDF Downloads 232
1539 An Early Attempt of Artificial Intelligence-Assisted Language Oral Practice and Assessment

Authors: Paul Lam, Kevin Wong, Chi Him Chan

Abstract:

Constant practicing and accurate, immediate feedback are the keys to improving students’ speaking skills. However, traditional oral examination often fails to provide such opportunities to students. The traditional, face-to-face oral assessment is often time consuming – attending the oral needs of one student often leads to the negligence of others. Hence, teachers can only provide limited opportunities and feedback to students. Moreover, students’ incentive to practice is also reduced by their anxiety and shyness in speaking the new language. A mobile app was developed to use artificial intelligence (AI) to provide immediate feedback to students’ speaking performance as an attempt to solve the above-mentioned problems. Firstly, it was thought that online exercises would greatly increase the learning opportunities of students as they can now practice more without the needs of teachers’ presence. Secondly, the automatic feedback provided by the AI would enhance students’ motivation to practice as there is an instant evaluation of their performance. Lastly, students should feel less anxious and shy compared to directly practicing oral in front of teachers. Technically, the program made use of speech-to-text functions to generate feedback to students. To be specific, the software analyzes students’ oral input through certain speech-to-text AI engine and then cleans up the results further to the point that can be compared with the targeted text. The mobile app has invited English teachers for the pilot use and asked for their feedback. Preliminary trials indicated that the approach has limitations. Many of the users’ pronunciation were automatically corrected by the speech recognition function as wise guessing is already integrated into many of such systems. Nevertheless, teachers have confidence that the app can be further improved for accuracy. It has the potential to significantly improve oral drilling by giving students more chances to practice. Moreover, they believe that the success of this mobile app confirms the potential to extend the AI-assisted assessment to other language skills, such as writing, reading, and listening.

Keywords: artificial Intelligence, mobile learning, oral assessment, oral practice, speech-to-text function

Procedia PDF Downloads 88
1538 A Transformer-Based Question Answering Framework for Software Contract Risk Assessment

Authors: Qisheng Hu, Jianglei Han, Yue Yang, My Hoa Ha

Abstract:

When a company is considering purchasing software for commercial use, contract risk assessment is critical to identify risks to mitigate the potential adverse business impact, e.g., security, financial and regulatory risks. Contract risk assessment requires reviewers with specialized knowledge and time to evaluate the legal documents manually. Specifically, validating contracts for a software vendor requires the following steps: manual screening, interpreting legal documents, and extracting risk-prone segments. To automate the process, we proposed a framework to assist legal contract document risk identification, leveraging pre-trained deep learning models and natural language processing techniques. Given a set of pre-defined risk evaluation problems, our framework utilizes the pre-trained transformer-based models for question-answering to identify risk-prone sections in a contract. Furthermore, the question-answering model encodes the concatenated question-contract text and predicts the start and end position for clause extraction. Due to the limited labelled dataset for training, we leveraged transfer learning by fine-tuning the models with the CUAD dataset to enhance the model. On a dataset comprising 287 contract documents and 2000 labelled samples, our best model achieved an F1 score of 0.687.

Keywords: contract risk assessment, NLP, transfer learning, question answering

Procedia PDF Downloads 112
1537 Problems in Computational Phylogenetics: The Germano-Italo-Celtic Clade

Authors: Laura Mclean

Abstract:

A recurring point of interest in computational phylogenetic analysis of Indo-European family trees is the inference of a Germano-Italo-Celtic clade in some versions of the trees produced. The presence of this clade in the models is intriguing as there is little evidence for innovations shared among Germanic, Italic, and Celtic, the evidence generally used in the traditional method to construct a subgroup. One source of this unexpected outcome could be the input to the models. The datasets in the various models used so far, for the most part, take as their basis the Swadesh list, a list compiled by Morris Swadesh and then revised several times, containing up to 207 words that he believed were resistant to change among languages. The judgments made by Swadesh for this list, however, were subjective and based on his intuition rather than rigorous analysis. Some scholars used the Swadesh 200 list as the basis for their Indo-European dataset and made cognacy judgements for each of the words on the list. Another dataset is largely based on the Swadesh 207 list as well although the authors include additional lexical and non-lexical data, and they implement ‘split coding’ to deal with cases of polymorphic characters. A different team of scholars uses a different dataset, IECoR, which combines several different lists, one of which is the Swadesh 200 list. In fact, the Swadesh list is used in some form in every study surveyed and each dataset has three words that, when they are coded as cognates, seemingly contribute to the inference of a Germano-Italo-Celtic clade which could happen due to these clades sharing three words among only themselves. These three words are ‘fish’, ‘flower’, and ‘man’ (in the case of ‘man’, one dataset includes Lithuanian in the cognacy coding and removes the word ‘man’ from the screened data). This collection of cognates shared among Germanic, Italic, and Celtic that were deemed important enough to be included on the Swadesh list, without the ability to account for possible reasons for shared cognates that are not shared innovations, gives an impression of affinity between the Germanic, Celtic, and Italic branches without adequate methodological support. However, by changing how cognacy is defined (ie. root cognates, borrowings vs inherited cognates etc.), we will be able to identify whether these three cognates are significant enough to infer a clade for Germanic, Celtic, and Italic. This paper examines the question of what definition of cognacy should be used for phylogenetic datasets by examining the Germano-Italo-Celtic clade as a case study and offers insights into the reconstruction of a Germano-Italo-Celtic clade.

Keywords: historical, computational, Italo-Celtic, Germanic

Procedia PDF Downloads 34
1536 Improving Chest X-Ray Disease Detection with Enhanced Data Augmentation Using Novel Approach of Diverse Conditional Wasserstein Generative Adversarial Networks

Authors: Malik Muhammad Arslan, Muneeb Ullah, Dai Shihan, Daniyal Haider, Xiaodong Yang

Abstract:

Chest X-rays are instrumental in the detection and monitoring of a wide array of diseases, including viral infections such as COVID-19, tuberculosis, pneumonia, lung cancer, and various cardiac and pulmonary conditions. To enhance the accuracy of diagnosis, artificial intelligence (AI) algorithms, particularly deep learning models like Convolutional Neural Networks (CNNs), are employed. However, these deep learning models demand a substantial and varied dataset to attain optimal precision. Generative Adversarial Networks (GANs) can be employed to create new data, thereby supplementing the existing dataset and enhancing the accuracy of deep learning models. Nevertheless, GANs have their limitations, such as issues related to stability, convergence, and the ability to distinguish between authentic and fabricated data. In order to overcome these challenges and advance the detection and classification of CXR normal and abnormal images, this study introduces a distinctive technique known as DCWGAN (Diverse Conditional Wasserstein GAN) for generating synthetic chest X-ray (CXR) images. The study evaluates the effectiveness of this Idiosyncratic DCWGAN technique using the ResNet50 model and compares its results with those obtained using the traditional GAN approach. The findings reveal that the ResNet50 model trained on the DCWGAN-generated dataset outperformed the model trained on the classic GAN-generated dataset. Specifically, the ResNet50 model utilizing DCWGAN synthetic images achieved impressive performance metrics with an accuracy of 0.961, precision of 0.955, recall of 0.970, and F1-Measure of 0.963. These results indicate the promising potential for the early detection of diseases in CXR images using this Inimitable approach.

Keywords: CNN, classification, deep learning, GAN, Resnet50

Procedia PDF Downloads 60
1535 Machine Learning Methods for Network Intrusion Detection

Authors: Mouhammad Alkasassbeh, Mohammad Almseidin

Abstract:

Network security engineers work to keep services available all the time by handling intruder attacks. Intrusion Detection System (IDS) is one of the obtainable mechanisms that is used to sense and classify any abnormal actions. Therefore, the IDS must be always up to date with the latest intruder attacks signatures to preserve confidentiality, integrity, and availability of the services. The speed of the IDS is a very important issue as well learning the new attacks. This research work illustrates how the Knowledge Discovery and Data Mining (or Knowledge Discovery in Databases) KDD dataset is very handy for testing and evaluating different Machine Learning Techniques. It mainly focuses on the KDD preprocess part in order to prepare a decent and fair experimental data set. The J48, MLP, and Bayes Network classifiers have been chosen for this study. It has been proven that the J48 classifier has achieved the highest accuracy rate for detecting and classifying all KDD dataset attacks, which are of type DOS, R2L, U2R, and PROBE.

Keywords: IDS, DDoS, MLP, KDD

Procedia PDF Downloads 222
1534 Auditory and Language Skills Development after Cochlear Implantation in Children with Multiple Disabilities

Authors: Tamer Mesallam, Medhat Yousef, Ayna Almasaad

Abstract:

BACKGROUND: Cochlear implantation (CI) in children with additional disabilities can be a fundamental and supportive intervention. Although, there may be some positive impacts of CI on children with multiple disabilities such as better outcomes of communication skills, development, and quality of life, the families of those children complain from the post-implant habilitation efforts that considered as a burden. OBJECTIVE: To investigate the outcomes of CI children with different co-disabilities through using the Meaningful Auditory Integration Scale (MAIS) and the Meaningful Use of Speech Scale (MUSS) as outcome measurement tools. METHODS: The study sample comprised 25 hearing-impaired children with co-disability who received cochlear implantation. Age and gender-matched control group of 25 cochlear-implanted children without any other disability has been also included. The participants' auditory skills and speech outcomes were assessed using MAIS and MUSS tests. RESULTS: There was a statistically significant difference in the different outcomes measure between the two groups. However, the outcomes of some multiple disabilities subgroups were comparable to the control group. Around 40% of the participants with co-disabilities experienced advancement in their methods of communication from behavior to oral mode. CONCLUSION: Cochlear-implanted children with multiple disabilities showed variable degrees of auditory and speech outcomes. The degree of benefits depends on the type of the co-disability. Long-term follow-up is recommended for those children.

Keywords: children with disabilities, Cochlear implants, hearing impairment, language development

Procedia PDF Downloads 102
1533 Haiti and Power Symbolic: An Analysis Understanding of the Impact of the Presidential Political Speeches

Authors: Marc Arthur Bien Aimé, Julio da Silveira Moreira

Abstract:

This study examines the political speech in Haiti over the course of the decade 2011-2021, focusing on the speeches of the presidents Michel J. Martelly and Jovenel Moïse and their impacts on their awareness collective. In using a qualitative approach, we have analyzed the speech of the president pronounced in response to the political instability of countries, as well as interviews with a group of 20 Haitians living in Port- Au-Prince. Our results put in evidence their complex relationship between politics, awareness collective, and the influence of the powers imperialists. We show that the situation in Haiti's disastrous social and political situation is driven by personal political interests and the absence of a state political project. Moreover, the speeches of the president’s analysis are meaningless, transforming concepts such as social progress and justice in simple words. This political rhetoric contributes to the domination symbolic of the population of Haitian. This study is also linked to the theme “Constitutions, processes democratic and critical of the state in Latin America,” emphasizing the importance of analysis of political speech to understand the complexities of the democratic process and criticism of the State in their Latin American region. We suggest future research to deepen our understanding of these political dynamics and their impact on public policies and developments of the constitutions throughout Latin America.

Keywords: political discourse, conscience collective, inequality social, democratic processes, constitutions, Haiti

Procedia PDF Downloads 37
1532 Phonological Variation in the Speech of Grade 1 Teachers in Select Public Elementary Schools in the Philippines

Authors: M. Leonora D. Guerrero

Abstract:

The study attempted to uncover the most and least frequent phonological variation evident in the speech patterns of grade 1 teachers in select public elementary schools in the Philippines. It also determined the lectal description of the participants based on Tayao’s consonant charts for American and Philippine English. Descriptive method was utilized. A total of 24 grade 1 teachers participated in the study. The instrument used was word list. Each column in the word list is represented by words with the target consonant phonemes: labiodental fricatives f/ and /v/ and lingua-alveolar fricative /z/. These phonemes were in the initial, medial, and final positions, respectively. Findings of the study revealed that the most frequent variation happened when the participants read words with /z/ in the final position while the least frequent variation happened when the participants read words with /z/ in the initial position. The study likewise proved that the grade 1 teachers exhibited the segmental features of both the mesolect and basilect. Based on these results, it is suggested that teachers of English in the Philippines must aspire to manifest the features of the mesolect, if not, the acrolect since it is expected of the academicians not to be displaying the phonological features of the acrolects since this variety is only used by the 'uneducated.' This is especially so with grade 1 teachers who are often mimicked by their students who classify their speech as the 'standard.'

Keywords: consonant phonemes, lectal description, Philippine English, phonological variation

Procedia PDF Downloads 191
1531 Improving Lane Detection for Autonomous Vehicles Using Deep Transfer Learning

Authors: Richard O’Riordan, Saritha Unnikrishnan

Abstract:

Autonomous Vehicles (AVs) are incorporating an increasing number of ADAS features, including automated lane-keeping systems. In recent years, many research papers into lane detection algorithms have been published, varying from computer vision techniques to deep learning methods. The transition from lower levels of autonomy defined in the SAE framework and the progression to higher autonomy levels requires increasingly complex models and algorithms that must be highly reliable in their operation and functionality capacities. Furthermore, these algorithms have no room for error when operating at high levels of autonomy. Although the current research details existing computer vision and deep learning algorithms and their methodologies and individual results, the research also details challenges faced by the algorithms and the resources needed to operate, along with shortcomings experienced during their detection of lanes in certain weather and lighting conditions. This paper will explore these shortcomings and attempt to implement a lane detection algorithm that could be used to achieve improvements in AV lane detection systems. This paper uses a pre-trained LaneNet model to detect lane or non-lane pixels using binary segmentation as the base detection method using an existing dataset BDD100k followed by a custom dataset generated locally. The selected roads will be modern well-laid roads with up-to-date infrastructure and lane markings, while the second road network will be an older road with infrastructure and lane markings reflecting the road network's age. The performance of the proposed method will be evaluated on the custom dataset to compare its performance to the BDD100k dataset. In summary, this paper will use Transfer Learning to provide a fast and robust lane detection algorithm that can handle various road conditions and provide accurate lane detection.

Keywords: ADAS, autonomous vehicles, deep learning, LaneNet, lane detection

Procedia PDF Downloads 83
1530 Speech Emotion Recognition: A DNN and LSTM Comparison in Single and Multiple Feature Application

Authors: Thiago Spilborghs Bueno Meyer, Plinio Thomaz Aquino Junior

Abstract:

Through speech, which privileges the functional and interactive nature of the text, it is possible to ascertain the spatiotemporal circumstances, the conditions of production and reception of the discourse, the explicit purposes such as informing, explaining, convincing, etc. These conditions allow bringing the interaction between humans closer to the human-robot interaction, making it natural and sensitive to information. However, it is not enough to understand what is said; it is necessary to recognize emotions for the desired interaction. The validity of the use of neural networks for feature selection and emotion recognition was verified. For this purpose, it is proposed the use of neural networks and comparison of models, such as recurrent neural networks and deep neural networks, in order to carry out the classification of emotions through speech signals to verify the quality of recognition. It is expected to enable the implementation of robots in a domestic environment, such as the HERA robot from the RoboFEI@Home team, which focuses on autonomous service robots for the domestic environment. Tests were performed using only the Mel-Frequency Cepstral Coefficients, as well as tests with several characteristics of Delta-MFCC, spectral contrast, and the Mel spectrogram. To carry out the training, validation and testing of the neural networks, the eNTERFACE’05 database was used, which has 42 speakers from 14 different nationalities speaking the English language. The data from the chosen database are videos that, for use in neural networks, were converted into audios. It was found as a result, a classification of 51,969% of correct answers when using the deep neural network, when the use of the recurrent neural network was verified, with the classification with accuracy equal to 44.09%. The results are more accurate when only the Mel-Frequency Cepstral Coefficients are used for the classification, using the classifier with the deep neural network, and in only one case, it is possible to observe a greater accuracy by the recurrent neural network, which occurs in the use of various features and setting 73 for batch size and 100 training epochs.

Keywords: emotion recognition, speech, deep learning, human-robot interaction, neural networks

Procedia PDF Downloads 142
1529 Pragmatic Competence of Jordanian EFL Learners

Authors: Dina Mahmoud Hammouri

Abstract:

The study investigates the Jordanian EFL learners’ pragmatic competence through their production of the speech acts of responding to requests, making suggestions, making threats and expressing farewells. The sample of the study consists of 130 Jordanian EFL learners and native speakers. 2600 responses were collected through a Discourse Completion Test (DCT). The findings of the study revealed that the tested students showed similarities and differences in performing the strategies of four speech acts. Differences in the students’ performances led to pragmatic failure instances. The pragmatic failure committed by students refers to a lack of linguistic competence (i.e., pragmalinguistic failure), sociocultural differences and pragmatic transfer (i.e., sociopragmatic failure). EFL learners employed many mechanisms to maintain their communicative competence; the analysis of the test on speech acts showed learners’ tendency towards using particular strategies, resorting to modify strategies and relating them to their grammatical competence, prefabrication, performing long forms, buffing and transfer. The results were also suggestive of the learners’ lack of pragmalinguistic and sociopragmatic knowledge. The implications of this study are for language teachers to teach interlanguage pragmatics explicitly in EFL contexts to draw learners’ attention to both pragmalinguistic and sociopragmatic features, pay more attention to these areas and allocate more time and practice to solve learners’ problems in these areas. The implication of this study is also for pedagogical material designers to provide sufficient and well-organized pragmatic input.

Keywords: pragmatic failure, Jordanian EFL learner, sociopragmatic competence, pragmalinguistic competence

Procedia PDF Downloads 55
1528 Understanding Cognitive Fatigue From FMRI Scans With Self-supervised Learning

Authors: Ashish Jaiswal, Ashwin Ramesh Babu, Mohammad Zaki Zadeh, Fillia Makedon, Glenn Wylie

Abstract:

Functional magnetic resonance imaging (fMRI) is a neuroimaging technique that records neural activations in the brain by capturing the blood oxygen level in different regions based on the task performed by a subject. Given fMRI data, the problem of predicting the state of cognitive fatigue in a person has not been investigated to its full extent. This paper proposes tackling this issue as a multi-class classification problem by dividing the state of cognitive fatigue into six different levels, ranging from no-fatigue to extreme fatigue conditions. We built a spatio-temporal model that uses convolutional neural networks (CNN) for spatial feature extraction and a long short-term memory (LSTM) network for temporal modeling of 4D fMRI scans. We also applied a self-supervised method called MoCo (Momentum Contrast) to pre-train our model on a public dataset BOLD5000 and fine-tuned it on our labeled dataset to predict cognitive fatigue. Our novel dataset contains fMRI scans from Traumatic Brain Injury (TBI) patients and healthy controls (HCs) while performing a series of N-back cognitive tasks. This method establishes a state-of-the-art technique to analyze cognitive fatigue from fMRI data and beats previous approaches to solve this problem.

Keywords: fMRI, brain imaging, deep learning, self-supervised learning, contrastive learning, cognitive fatigue

Procedia PDF Downloads 166