Search results for: speech dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1868

1028 Wireless Sensor Anomaly Detection Using Soft Computing

Authors: Mouhammd Alkasassbeh, Alaa Lasasmeh

Abstract:

We live in an era of rapid development as a result of significant scientific growth. Like other technologies, wireless sensor networks (WSNs) are playing one of the main roles. Based on WSNs, ZigBee adds many features to devices, such as minimum cost and power consumption, and increased range and connectivity of sensor nodes. ZigBee technology has come to be used in various fields, including science, engineering, and networks, and even in medicinal aspects of intelligence building. In this work, we generated two main datasets, the first based on tree topology and the second on star topology. The datasets were evaluated by three machine learning (ML) algorithms: J48, meta.j48 and multilayer perceptron (MLP). The traffic in each topology was classified as normal or abnormal (attack). The datasets used in our work contained simulated data from Network Simulator 2 (NS2). For each dataset, the Bayesian network meta.j48 classifier achieved the highest accuracy among the classifiers: 99.7% and 99.2%, respectively.

Keywords: IDS, Machine learning, WSN, ZigBee technology

Procedia PDF Downloads 542
1027 Naïve Bayes: A Classical Approach for the Epileptic Seizures Recognition

Authors: Bhaveek Maini, Sanjay Dhanka, Surita Maini

Abstract:

Electroencephalography (EEG) is used worldwide to classify epileptic seizures. Identifying an epileptic seizure through manual EEG analysis is a crucial task for the neurologist, but it takes a great deal of effort and time. The risk of human error is always high in EEG, as acquiring signals needs manual intervention. Disease diagnosis using machine learning (ML) has been explored continuously since its inception. Moreover, where a large number of datasets have to be analyzed, ML is a boon for doctors. In this research paper, the authors propose two ML models, logistic regression (LR) and Naïve Bayes (NB), to predict epileptic seizures based on general parameters. Both techniques are applied to the epileptic seizures recognition dataset, available on the UCI ML repository. The algorithms are trained and tested with an 80:20 train-test split (80% for training and 20% for testing), and the performance of the models was validated by 10-fold cross-validation. The study reports accuracies of 81.87% and 95.49% for LR and NB, respectively.

Keywords: epileptic seizure recognition, logistic regression, Naïve Bayes, machine learning

Procedia PDF Downloads 53
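
Illustrative sketch for entry 1027 above: the abstract describes an 80:20 train-test split validated by 10-fold cross-validation. The Python snippet below shows one way such index splits could be produced with the standard library only. The function names, the sample count of 1000, and the choice to run cross-validation on the training portion are assumptions made for illustration; the abstract does not specify them.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    parts = [indices[i::k] for i in range(k)]  # k roughly equal parts
    for i in range(k):
        validation = parts[i]
        train = [idx for j, part in enumerate(parts) if j != i for idx in part]
        yield train, validation


# Hypothetical dataset size, chosen only to make the split sizes easy to read.
train_idx, test_idx = train_test_split_indices(1000, test_fraction=0.2)
cv_folds = list(k_fold_indices(train_idx, k=10))
print(len(train_idx), len(test_idx), len(cv_folds))
```

Each fold holds out a different tenth of the training indices for validation, so every training sample is used for validation exactly once.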
1026 Kitchenary Metaphors in Hindi-Urdu: A Cognitive Analysis

Authors: Bairam Khan, Premlata Vaishnava

Abstract:

The ability to conceptualize one entity in terms of another allows us to communicate through metaphors. This central feature of human cognition has evolved with the development of language, and the processing of metaphors is without any conscious appraisal and is quite effortless. South Asians, like other speech communities, have been using the kitchenary [culinary] metaphor in a very simple yet interesting way and are known for bringing into new and unique constellations wherever they are. This composite feature of our language is used to communicate in a precise and compact manner and maneuvers the expression. The present study explores the role of kitchenary metaphors in the making and shaping of idioms by applying Cognitive Metaphor Theories. Drawing on examples from a corpus of adverts, print, and electronic media, the study looks at the metaphorical language used by real people in real situations. The overarching theme throughout the course is that kitchenary metaphors are powerful tools of expression in Hindi-Urdu.

Keywords: cognitive metaphor theory, source domain, target domain, signifier-signified, kitchenary, ethnocultural elements of South Asia and Hindi-Urdu language

Procedia PDF Downloads 76
1025 Early Stage Suicide Ideation Detection Using Supervised Machine Learning and Neural Network Classifier

Authors: Devendra Kr Tayal, Vrinda Gupta, Aastha Bansal, Khushi Singh, Sristi Sharma, Hunny Gaur

Abstract:

In today's world, suicide is a serious problem. In order to save lives, early detection and prevention of suicide attempts should be addressed. A good number of at-risk people use social media platforms to talk about their issues or seek related information. Twitter and Reddit are two of the most common platforms used for expressing oneself. Extensive research has already been done in this field. Using supervised classification techniques such as Naïve Bayes, Bernoulli Naïve Bayes, and Multilayer Perceptron on a Reddit dataset, we demonstrate the early recognition of suicidal ideation. We also performed a comparative analysis of these approaches, using accuracy, recall, F1 score, and precision for the analysis.

Keywords: machine learning, suicide ideation detection, supervised classification, natural language processing

Procedia PDF Downloads 87
1024 Breast Cancer Prediction Using Score-Level Fusion of Machine Learning and Deep Learning Models

Authors: Sam Khozama, Ali M. Mayya

Abstract:

Breast cancer is one of the most common types of cancer in women. Early prediction of breast cancer helps physicians detect cancer in its early stages. Big cancer data needs a very powerful tool to analyze and extract predictions. Machine learning and deep learning are two of the most efficient tools for predicting cancer based on textual data. In this study, we developed a fusion model that combines a machine learning model and a deep learning model. Long Short-Term Memory (LSTM) and ensemble learning with hyperparameter optimization are used, and their outputs are combined by score-level fusion to obtain the final prediction. Experiments are done on the Breast Cancer Surveillance Consortium (BCSC) dataset after balancing and grouping the class categories. Five different training scenarios are used, and the tests show that the designed fusion model improved the performance by 3.3% compared to the individual models.

Keywords: machine learning, deep learning, cancer prediction, breast cancer, LSTM, fusion

Procedia PDF Downloads 157
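
Illustrative sketch for entry 1024 above: score-level fusion combines the per-sample scores of several models before a decision is made. The abstract does not state the fusion rule, so the weighted average below is only one common choice, and the function names, weights, threshold, and example scores are hypothetical.

```python
def fuse_scores(scores_a, scores_b, weight_a=0.5):
    """Combine two models' positive-class scores by a weighted average."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must score the same samples")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(scores_a, scores_b)]


def predict_labels(fused_scores, threshold=0.5):
    """Turn fused scores into binary labels (1 = positive class)."""
    return [1 if score >= threshold else 0 for score in fused_scores]


# Made-up scores standing in for an LSTM and an ensemble model.
lstm_scores = [0.9, 0.4, 0.2]
ensemble_scores = [0.7, 0.7, 0.1]

fused = fuse_scores(lstm_scores, ensemble_scores, weight_a=0.5)
labels = predict_labels(fused)
print(fused, labels)
```

The second sample shows why fusion can matter: one model scores it below the threshold and the other above, and the fused score decides.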
1023 Testing the Capital Structure Behavior of Malaysian Firms: Shariah vs. Non-Shariah Compliant

Authors: Asyraf Abdul Halim, Mohd Edil Abd Sukor, Obiyathulla Ismath Bacha

Abstract:

This paper attempts to investigate the capital structure behavior of Shariah-compliant firms of various levels as well as those firms that are consistently Shariah non-compliant in Malaysia. The paper utilizes a unique dataset of firms with heterogeneous levels of Shariah-compliance status over a 20-year period from 1997 to 2016. The paper focuses on the effects of dynamic forces behind capital structure variation, such as the optimal capital structure behavior based on the trade-off, pecking order, market timing and firm fixed effect models of capital structure. This study documents significant evidence in support of the trade-off theory with a high speed of adjustment (SOA) as well as for the time-invariant firm fixed effects across all Shariah compliance groups.

Keywords: capital structure, market timing, trade-off theory, equity risk premium, Shariah-compliant firms

Procedia PDF Downloads 306
1022 The Social Origin Pay Gap in the UK Household Longitudinal Study

Authors: Michael Vallely

Abstract:

This paper uses data from waves 1 to 10 (2009-2019) of the UK Household Longitudinal Study to examine the social origin pay gap in the UK labour market. We find that regardless of how we proxy social origin, whether it be using the dominance approach, total parental occupation, parental education, total parental education, or the higher parental occupation and higher parental education, the results have one thing in common; in all cases, we observe a significant social origin pay gap for those from the lower social origins with the largest pay gap observed for those from the ‘lowest’ social origin. The results may indicate that when we consider the occupational status and education of both parents, previous estimates of social origin pay gaps and the number of individuals affected may have been underestimated. We also observe social origin pay gaps within educational attainment groups, such as degree holders, and within professional and managerial occupations. Therefore, this paper makes a valuable contribution to the social origin pay gap literature as it provides empirical evidence of a social origin pay gap using a large-scale UK dataset and challenges the argument that education is the great ‘social leveller’.

Keywords: social class, social origin, pay gaps, wage inequality

Procedia PDF Downloads 141
1021 OPEN-EmoRec-II-A Multimodal Corpus of Human-Computer Interaction

Authors: Stefanie Rukavina, Sascha Gruss, Steffen Walter, Holger Hoffmann, Harald C. Traue

Abstract:

OPEN-EmoRecII is an open multimodal corpus with experimentally induced emotions. In the first half of the experiment, emotions were induced with standardized picture material and in the second half during a human-computer interaction (HCI), realized with a wizard-of-oz design. The induced emotions are based on the dimensional theory of emotions (valence, arousal and dominance). With these emotional sequences, recorded as multimodal data (mimic reactions, speech, audio and physiological reactions) in a naturalistic-like HCI environment, one can improve classification methods on a multimodal level. This database is the result of an HCI experiment, for which 30 subjects in total agreed to the publication of their data, including the video material, for research purposes. The now available open corpus contains the following sensor signals: video, audio, physiology (SCL, respiration, BVP, EMG Corrugator supercilii, EMG Zygomaticus Major) and mimic annotations.

Keywords: open multimodal emotion corpus, annotated labels, intelligent interaction

Procedia PDF Downloads 413
1020 Artificial Intelligence Methods in Estimating the Minimum Miscibility Pressure Required for Gas Flooding

Authors: Emad A. Mohammed

Abstract:

Utilizing the capabilities of Data Mining and Artificial Intelligence, in the form of Fuzzy Logic models and Artificial Neural Network models, helps to predict accurately the minimum miscibility pressure (MMP) required for multi-contact miscible (MCM) displacement of reservoir petroleum by hydrocarbon gas flooding. The factors affecting the MMP, as shown in the literature and in the dataset, are as follows: XC2-6: Intermediate composition in the oil containing C2-6, CO2 and H2S (mole %); XC1: Amount of methane in the oil (%); T: Temperature (°C); MwC7+: Molecular weight of C7+ (g/mol); YC2+: Mole percent of C2+ composition in injected gas (%); MwC2+: Molecular weight of C2+ in injected gas. Fuzzy Logic and Neural Networks have been used widely in prediction and classification, with relatively high accuracy, in different fields of study. It is well known that a Fuzzy Inference system can handle uncertainty within the inputs, as in our case. The results of this work showed that our proposed models perform better, with higher performance indices, than other empirical correlations.

Keywords: MMP, gas flooding, artificial intelligence, correlation

Procedia PDF Downloads 139
1019 An Interdisciplinary Approach to Investigating Style: A Case Study of a Chinese Translation of Gilbert’s (2006) Eat Pray Love

Authors: Elaine Y. L. Ng

Abstract:

Elizabeth Gilbert’s (2006) biography Eat, Pray, Love describes her travels to Italy, India, and Indonesia after a painful divorce. The author’s experiences with love, loss, search for happiness, and meaning have resonated with a huge readership. As regards the translation of Gilbert’s (2006) Eat, Pray, Love into Chinese, it was first translated by a Taiwanese translator He Pei-Hua and published in Taiwan in 2007 by Make Boluo Wenhua Chubanshe with the fairly catching title “Enjoy! Traveling Alone.” The same translation was translocated to China, republished in simplified Chinese characters by Shanxi Shifan Daxue Chubanshe in 2008 and renamed in China, entitled “To Be a Girl for the Whole Life.” Later on, the same translation in simplified Chinese characters was reprinted by Hunan Wenyi Chubanshe in 2013. This study employs Munday’s (2002) systemic model for descriptive translation studies to investigate the translation of Gilbert’s (2006) Eat, Pray, Love into Chinese by the Taiwanese translator Hu Pei-Hua. It employs an interdisciplinary approach, combining systemic functional linguistics and corpus stylistics with sociohistorical research within a descriptive framework to study the translator’s discursive presence in the text. The research consists of three phases. The first phase is to locate the target text within its socio-cultural context. The target-text context concerning the para-texts, readers’ responses, and the publishers’ orientation will be explored. The second phase is to compare the source text and the target text for the categorization of translation shifts by using the methodological tools of systemic functional linguistics and corpus stylistics. The investigation concerns the rendering of mental clauses and speech and thought presentation. The final phase is an explanation of the causes of translation shifts. 
The linguistic findings are related to the extra-textual information collected in an effort to ascertain the motivations behind the translator’s choices. There exist sets of possible factors that may have contributed to shaping the textual features of the given translation within a specific socio-cultural context. The study finds that the translator generally reproduces the mental clauses and speech and thought presentation closely according to the original. Nevertheless, the language of the translation has been widely criticized as unidiomatic and stiff, losing the elegance of the original. In addition, the several Chinese translations of the given text produced by one Taiwanese and two Chinese publishers are basically the same. They are repackaged slightly differently, mainly with a change of the book cover and its captions for each version. By relating the textual findings to the extra-textual data of the study, it is argued that the popularity of the Chinese translation of Gilbert’s (2006) Eat, Pray, Love may not be attributable to the quality of the translation. Instead, it may have to do with the way the work is promoted strategically by the social media manipulated by the four e-bookstores promoting and selling the book online in China.

Keywords: chinese translation of eat pray love, corpus stylistics, motivations for translation shifts, systemic approach to translation studies

Procedia PDF Downloads 173
1018 Investigation and Analysis of Vortex-Induced Vibrations in Sliding Gate Valves Using Computational Fluid Dynamics

Authors: Kianoosh Ahadi, Mustafa Ergil

Abstract:

In this study, the vibrations caused by vortexes and the distribution of the hydrodynamic forces that vortexes induce on sliding gate valves have been investigated. For this purpose, a sliding valve was simulated in two dimensions (2D) with computational fluid dynamics (CFD) software, and the flow and turbulence equations were solved for three different valve opening models (full, half, and 16.7%). The variety of vortexes formed in the vicinity of the valve structure was investigated over time, and the trend of the fluctuations and the regions where they occur were detected. From the solution dataset gathered from the numerical simulations, the pressure coefficient (CP), the lift force coefficient (CL), the drag force coefficient (CD), and the momentum coefficient due to hydrodynamic forces (CM) were examined, and relevant figures were generated; from these results, the vortex-induced vibrations were analyzed.

Keywords: induced vibrations, computational fluid dynamics, sliding gate valves, vortexes

Procedia PDF Downloads 113
1017 A Mutually Exclusive Task Generation Method Based on Data Augmentation

Authors: Haojie Wang, Xun Li, Rui Yin

Abstract:

In order to solve the memorization overfitting in the meta-learning MAML algorithm, a method of generating mutually exclusive tasks based on data augmentation is proposed. This method generates a mutex task by mapping one feature of the data to multiple labels, so that the generated mutex task is inconsistent with the data distribution in the initial dataset. Because generating mutex tasks for all data will produce a large amount of invalid data and, in the worst case, lead to exponential growth of computation, this paper also proposes a key data extraction method that extracts only part of the data to generate the mutex task. The experiments show that the method of generating mutually exclusive tasks can effectively solve the memorization overfitting in the meta-learning MAML algorithm.

Keywords: data augmentation, mutex task generation, meta-learning, text classification.

Procedia PDF Downloads 89
1016 Deep Reinforcement Learning with Leonard-Ornstein Processes Based Recommender System

Authors: Khalil Bachiri, Ali Yahyaouy, Nicoleta Rogovschi

Abstract:

Improved user experience is a goal of contemporary recommender systems. Recommender systems are starting to incorporate reinforcement learning since it easily satisfies this goal of increasing a user’s reward every session. In this paper, we examine the most effective Reinforcement Learning agent tactics on the Movielens (1M) dataset, balancing precision and variety of recommendations. The absence of variability in final predictions makes simplistic techniques, although able to optimize ranking quality criteria, worthless for consumers of the recommendation system. Utilizing the stochasticity of Leonard-Ornstein processes, our suggested strategy encourages the agent to investigate its surroundings. Research demonstrates that raising the NDCG (Normalized Discounted Cumulative Gain) and HR (Hit Rate) criteria without lowering the Ornstein-Uhlenbeck process drift coefficient enhances the diversity of suggestions.

Keywords: recommender systems, reinforcement learning, deep learning, DDPG, Leonard-Ornstein process

Procedia PDF Downloads 136
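
Illustrative sketch for entry 1016 above: the abstract's keywords mention DDPG, and its last sentence names the Ornstein-Uhlenbeck process and its drift coefficient. Temporally correlated Ornstein-Uhlenbeck noise is a common way to drive exploration in DDPG, so the snippet below sketches a discretized version of that process. The class name, default parameter values, and the scalar action are assumptions for illustration, not details taken from the paper.

```python
import math
import random


class OrnsteinUhlenbeckNoise:
    """Discretized process: x += theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)."""

    def __init__(self, theta=0.15, mu=0.0, sigma=0.2, dt=1.0, x0=None, seed=None):
        self.theta = theta  # drift coefficient: pull strength toward mu
        self.mu = mu        # long-run mean of the noise
        self.sigma = sigma  # scale of the random perturbation
        self.dt = dt        # time step of the discretization
        self.x0 = mu if x0 is None else x0
        self.rng = random.Random(seed)
        self.state = self.x0

    def reset(self):
        """Restart the process, for example at the start of a new session."""
        self.state = self.x0

    def sample(self):
        """Advance the process by one step and return the new noise value."""
        drift = self.theta * (self.mu - self.state) * self.dt
        diffusion = self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.state += drift + diffusion
        return self.state


noise = OrnsteinUhlenbeckNoise(seed=42)
action = 0.3  # hypothetical action value produced by the agent's policy
noisy_actions = [action + noise.sample() for _ in range(5)]
print(noisy_actions)
```

A larger `theta` pulls the noise back toward `mu` faster, which shortens exploratory excursions; a larger `sigma` widens them.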
1015 Language and Power Relations in Selected Political Crisis Speeches in Nigeria: A Critical Discourse Analysis

Authors: Isaiah Ifeanyichukwu Agbo

Abstract:

Human speech is capable of serving many purposes. Power and control are not always exercised overtly by linguistic acts, but may be enacted and exercised in the myriad of taken-for-granted actions of everyday life. Domination, power control, discrimination and mind control exist in human speech and may lead to asymmetrical power relations. In discourse, there are persuasive and manipulative linguistic acts that serve to establish solidarity and identification with the 'we group' and polarize with the 'they group'. Political discourse is crafted to defend and promote the problematic narrative of outright controversial events in a nation’s history, thereby sustaining domination, marginalization, manipulation, inequalities and injustices, often without the dominated and marginalized group being aware of them. They are designed and positioned to serve the political and social needs of the producers. Political crisis speeches in Nigeria, just like in other countries, concentrate on positive self-image, de-legitimization of political opponents, reframing accusation to one’s advantage, redefining problematic terms and adopting reversal strategy. In most cases, the people are ignorant of the hidden ideological positions encoded in the text. Few studies have been conducted adopting the frameworks of critical discourse analysis and systemic functional linguistics to investigate this situation in the political crisis speeches in Nigeria. In this paper, we focus attention on the analyses of the linguistic, semantic, and ideological elements in selected political crisis speeches in Nigeria to investigate if they create and sustain unequal power relations and manipulative tendencies from the perspectives of Critical Discourse Analysis (CDA) and Systemic Functional Linguistics (SFL). Critical Discourse Analysis unpacks both opaque and transparent structural relationships of power dominance, power relations and control as manifested in language.
Critical discourse analysis emerged from a critical theory of language study which sees the use of language as a form of social practice where social relations are reproduced or contested and different interests are served. Systemic functional linguistics relates the structure of texts to their function. Fairclough’s model of CDA and Halliday’s systemic functional approach to language study are adopted in this paper. This paper probes into language use that perpetuates inequalities. This study demystifies the hidden implicature of the selected political crisis speeches and reveals the existence of information that is not made explicit in what the political actors actually say. The analysis further reveals the ideological configurations present in the texts. These ideological standpoints are the basis for naturalizing implicit ideologies and hegemonic influence in the texts. The analyses of the texts further uncovered the linguistic and discursive strategies deployed by text producers to manipulate the unsuspecting members of the public both mentally and conceptually in order to enact, sustain and maintain unhealthy power relations at times of crisis in Nigerian political history.

Keywords: critical discourse analysis, language, political crisis, power relations, systemic functional linguistics

Procedia PDF Downloads 338
1014 Cervical Cell Classification Using Random Forests

Authors: Dalwinder Singh, Amandeep Verma, Manpreet Kaur, Birmohan Singh

Abstract:

The detection of pre-cancerous changes using a Pap smear test of cervical cells is an important step in the early diagnosis of cervical cancer. The Pap smear test consists of a sample of human cells taken from the cervix, which are analysed to detect cancerous and pre-cancerous stages in the given subject. The manual analysis of these cells is a labor-intensive and time-consuming process that relies on expert cytotechnologists. In this paper, a computer-assisted system for the automated analysis of cervical cells is proposed. We propose a morphology-based approach to nucleus detection and segmentation of the cytoplasmic region of a given single cell or multiple overlapped cells. Further, various texture and region-based features are calculated from these cells to classify them as normal or abnormal. Experimental results on a publicly available dataset show that our system achieves a satisfactory success rate.

Keywords: cervical cancer, cervical tissue, mathematical morphology, texture features

Procedia PDF Downloads 521
1013 Topic-to-Essay Generation with Event Element Constraints

Authors: Yufen Qin

Abstract:

Topic-to-essay generation is a challenging task in natural language processing, which aims to generate novel, diverse, and topic-related text based on user input. Previous research has overlooked the generation of articles under the constraints of event elements, resulting in issues such as incomplete event elements and logical inconsistencies in the generated results. To fill this gap, this paper proposes an event-constrained approach to topic-to-essay generation that enforces the completeness of event elements during the generation process. Additionally, a language model is employed to verify the logical consistency of the generated results. Experimental results demonstrate that the proposed model achieves a better BLEU-2 score and performs better than the baseline in terms of subjective evaluation on a real dataset, indicating its capability to generate higher-quality topic-related text.

Keywords: event element, language model, natural language processing, topic-to-essay generation.

Procedia PDF Downloads 232
1012 Key Findings on Rapid Syntax Screening Test for Children

Authors: Shyamani Hettiarachchi, Thilini Lokubalasuriya, Shakeela Saleem, Dinusha Nonis, Isuru Dharmaratne, Lakshika Udugama

Abstract:

Introduction: Late identification of language difficulties in children could result in long-term negative consequences for communication, literacy and self-esteem. This highlights the need for early identification and intervention for speech, language and communication difficulties. Speech and language therapy is a relatively new profession in Sri Lanka and at present, there are no formal standardized screening tools to assess language skills in Sinhala-speaking children. The development and validation of a short, accurate screening tool to enable the identification of children with syntactic difficulties in Sinhala is a current need. Aims: 1) To develop test items for a Sinhala Syntactic Structures (S3 Short Form) test on children aged between 3;0 to 5;0 years 2) To validate the test of Sinhala Syntactic Structures (S3 Short Form) on children aged between 3; 0 to 5; 0 years Methods: The Sinhala Syntactic Structures (S3 Short Form) was devised based on the Renfrew Action Picture Test. As Sinhala contains post-positions in contrast to English, the principles of the Renfrew Action Picture Test were followed to gain an information score and a grammar score but the test devised reflected the linguistic-specificity and complexity of Sinhala and the pictures were in keeping with the culture of the country. This included the dative case marker ‘to give something to her’ (/ejɑ:ʈə/ meaning ‘to her’), the instrumental case marker ‘to get something from’ (/ejɑ:gən/ meaning ‘from him’ or /gɑhən/ meaning ‘from the tree’), possessive noun (/ɑmmɑge:/ meaning ‘mother’s’ or /gɑhe:/ meaning ‘of the tree’ or /male:/ meaning ‘of the flower’) and plural markers (/bɑllɑ:/ bɑllo:/ meaning ‘dog/dogs’, /mɑlə/mɑl/ meaning ‘flower/flowers’, /gɑsə/gɑs/ meaning ‘tree/trees’ and /wɑlɑ:kulə/wɑlɑ:kulu/ meaning ‘cloud/clouds’). The picture targets included socio-culturally appropriate scenes of the Sri Lankan New Year celebration, elephant procession and the Buddhist ‘Wesak’ ceremony. 
The test was piloted with a group of 60 participants and necessary changes were made. In phase 1, the test was administered to 100 Sinhala-speaking children aged between 3;0 and 5;0 years in one district. In phase 2, reported in this presentation, the test was administered to another 100 Sinhala-speaking children aged between 3;0 and 5;0 years in three districts. In phase 2, the selection of the test items was assessed via measures of content validity, test-retest reliability and inter-rater reliability. The age of acquisition of each syntactic structure was determined using content and grammar scores, which were statistically analysed using t-tests and one-way ANOVAs. Results: High percentage agreement was found on test-retest reliability on content validity and Pearson correlation measures and on inter-rater reliability. As predicted, there was a statistically significant influence of age on the production of syntactic structures at p<0.05. Conclusions: As the included target test items generated the expected information and syntactic structures, the test could be used as a quick syntactic screening tool with preschool children.

Keywords: Sinhala, screening, syntax, language

Procedia PDF Downloads 337
1011 A Context-Sensitive Algorithm for Media Similarity Search

Authors: Guang-Ho Cha

Abstract:

This paper presents a context-sensitive media similarity search algorithm. One of the central problems regarding media search is the semantic gap between the low-level features computed automatically from media data and the human interpretation of them. This is because the notion of similarity is usually based on high-level abstraction, but the low-level features sometimes do not reflect human perception. Many media search algorithms have used the Minkowski metric to measure similarity between image pairs. However, such functions cannot adequately capture the characteristics of the human visual system or the nonlinear relationships in the contextual information given by images in a collection. Our search algorithm tackles this problem by employing a similarity measure and a ranking strategy that reflect the nonlinearity of human perception and contextual information in a dataset. Similarity search in an image database based on this contextual information shows encouraging experimental results.

Keywords: context-sensitive search, image search, similarity ranking, similarity search

Procedia PDF Downloads 360
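
Illustrative sketch for entry 1011 above: the abstract names the Minkowski metric as the baseline that many media search algorithms use. The snippet below computes that baseline distance between two feature vectors; it is not the context-sensitive measure the paper proposes, and the function name and example vectors are hypothetical.

```python
def minkowski_distance(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length feature vectors.

    p = 1 gives the Manhattan distance and p = 2 the Euclidean distance.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1 for a valid metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Made-up low-level feature vectors for two images.
features_a = [0.0, 0.0]
features_b = [3.0, 4.0]

print(minkowski_distance(features_a, features_b, p=1))
print(minkowski_distance(features_a, features_b, p=2))
```

Because the distance depends only on the two vectors being compared, it ignores the rest of the collection, which is the limitation the abstract points to.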
1010 Automated Prediction of HIV-associated Cervical Cancer Patients Using Data Mining Techniques for Survival Analysis

Authors: O. J. Akinsola, Yinan Zheng, Rose Anorlu, F. T. Ogunsola, Lifang Hou, Robert Leo-Murphy

Abstract:

Cervical Cancer (CC) is the 2nd most common cancer among women living in low and middle-income countries, with no associated symptoms during formative periods. With advances in innovative medical research, numerous preventive measures are being utilized, but the incidence of cervical cancer cannot be reduced by screening tests alone. The mortality associated with invasive cervical cancer can be nipped in the bud through early-stage detection. This study selected an array of top feature-selection techniques with the aim of developing a model that could validly diagnose the risk factors of cervical cancer. A retrospective clinic-based cohort study was conducted on 178 HIV-associated cervical cancer patients in Lagos University Teaching Hospital, Nigeria (U54 data repository) in April 2022. The outcome measure was the automated prediction of the HIV-associated cervical cancer cases, while the predictor variables include demographic information, reproductive history, birth control, sexual history, and cervical cancer screening history for invasive cervical cancer. The proposed technique was assessed with R and Python programming software to produce the model by utilizing classification algorithms for the detection and diagnosis of cervical cancer disease. Four machine learning classification algorithms were used: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), and K-Nearest Neighbor (KNN). The dataset was split into training and testing sets in an 80:20 ratio. The numerical features were standardized, and hyperparameter tuning was carried out on the machine learning models used to train and test the data.
Some fitting features were selected for the detection and diagnosis of cervical cancer from selected characteristics in the dataset, using the contribution of various selection methods for the classification of cervical cancer into healthy or diseased status. The mean age of patients was 49.7±12.1 years, mean age at pregnancy was 23.3±5.5 years, mean age at first sexual experience was 19.4±3.2 years, while the mean BMI was 27.1±5.6 kg/m2. A larger percentage of the patients are married (62.9%), while most of them have at least two sexual partners (72.5%). Age of patients (OR=1.065, p<0.001**), marital status (OR=0.375, p=0.011**), number of pregnancy live-births (OR=1.317, p=0.007**), and use of birth control pills (OR=0.291, p=0.015**) were found to be significantly associated with HIV-associated cervical cancer. On the top 10 features (variables) considered in the analysis, RF is reported as the best-performing model overall, with an accuracy of 72.0%, a precision of 84.6%, a recall of 84.6% and an F1-score of 74.0%, while LR has an accuracy of 74.0%, a precision of 70.0%, a recall of 70.0% and an F1-score of 70.0%. The RF model identified 10 features predictive of developing cervical cancer. The age of patients was considered the most important risk factor, followed by the number of pregnancy live-births, marital status, and use of birth control pills. The study shows that data mining techniques could be used to identify women living with HIV at high risk of developing cervical cancer in Nigeria and other sub-Saharan African countries.

Keywords: associated cervical cancer, data mining, random forest, logistic regression

Procedia PDF Downloads 82
1009 FLEX: A Backdoor Detection and Elimination Method in Federated Scenario

Authors: Shuqi Zhang

Abstract:

Federated learning allows users to participate in collaborative model training without sending data to third-party servers, reducing the risk of user data privacy leakage, and is widely used in smart finance and smart healthcare. However, the distributed architecture of federated learning itself and the existence of secure aggregation protocols make it inherently vulnerable to backdoor attacks. To solve this problem, the federated learning backdoor defense framework FLEX, based on group aggregation, cluster analysis, and neuron pruning, is proposed, and compatibility with secure aggregation protocols is achieved. The good performance of FLEX is verified by building a horizontal federated learning framework on the CIFAR-10 dataset for experiments, in which it achieves a 98% success rate of backdoor detection and reduces the success rate of backdoor tasks to between 0% and 10%.

Keywords: federated learning, secure aggregation, backdoor attack, cluster analysis, neuron pruning

Procedia PDF Downloads 90
1008 Generating Music with More Refined Emotions

Authors: Shao-Di Feng, Von-Wun Soo

Abstract:

Generating symbolic music with specific emotions is a challenging task because symbolic music datasets with emotion labels are scarce and incomplete. This research aims to generate more refined emotions based on training datasets that are labeled only with the four quadrants of Russell's 2D emotion model. We build on the theory of Music FaderNets, map arousal and valence to low-level attributes, and build a symbolic music generation model by combining a transformer and a GM-VAE. We adopt an in-attention mechanism for the model and improve it by allowing modulation by conditional information. We show that the music generation model can control the generation of music according to the emotions specified by users in terms of high-level linguistic expressions and by manipulating the corresponding low-level musical attributes. Finally, we evaluate the model performance using a pre-trained emotion classifier on a pop piano MIDI dataset called EMOPIA, and through subjective listening evaluation we demonstrate that the model can correctly generate music with more refined emotions.

Keywords: music generation, music emotion controlling, deep learning, semi-supervised learning

Procedia PDF Downloads 82
1007 Determining Antecedents of Employee Turnover: A Study on Blue Collar vs White Collar Workers on Macro Level

Authors: Evy Rombaut, Marie-Anne Guerry

Abstract:

Predicting voluntary turnover of employees is an important topic of study, both in academia and industry. Researchers try to uncover determinants for a broader understanding and possible prevention of turnover. In the current study, we use a dataset-based approach to reveal determinants of turnover that differ for blue and white collar workers. This approach made it possible to study actual turnover for more than 500,000 employees in 15,692 Belgian corporations. We use logistic regression to calculate individual turnover probabilities and test the goodness of our model with the AUC (area under the ROC curve) method. The results of the study confirm the relationship of known determinants to employee turnover, such as age, seniority, pay, and work distance. In addition, the study uncovers unknown differences and verifies known differences between blue and white collar workers. It shows opposite relationships to turnover for gender, marital status, the number of children, nationality, and pay.
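
The AUC used here to test model goodness can be read as the probability that a randomly chosen leaver receives a higher predicted turnover probability than a randomly chosen stayer. The sketch below computes it by direct pairwise comparison. It is an illustration only: the labels and scores are made-up values, `roc_auc` is a hypothetical helper name, and the quadratic loop would be replaced by a rank-based computation on a dataset of 500,000 employees.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 outcomes (1 = employee left).
    scores: predicted probabilities in the same order.
    Tied scores count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical outcomes and predicted turnover probabilities.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

In this toy example, 7 of the 9 leaver/stayer pairs are ordered correctly, so the AUC is 7/9. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.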

Keywords: employee turnover, blue collar, white collar, dataset analysis

Procedia PDF Downloads 284
1006 A Mutually Exclusive Task Generation Method Based on Data Augmentation

Authors: Haojie Wang, Xun Li, Rui Yin

Abstract:

To address memorization overfitting in the model-agnostic meta-learning (MAML) algorithm, a method of generating mutually exclusive tasks based on data augmentation is proposed. This method generates a mutex task by mapping one feature of the data to multiple labels, so that the generated mutex task is inconsistent with the data distribution in the initial dataset. Because generating mutex tasks for all data will produce a large amount of invalid data and, in the worst case, lead to exponential growth in computation, this paper also proposes a key data extraction method that extracts only part of the data to generate the mutex task. The experiments show that the method of generating mutually exclusive tasks can effectively address memorization overfitting in the MAML algorithm.
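
One common way to make tasks mutually exclusive is to give the same inputs a different label assignment in each task, so that no single fixed input-to-label mapping fits all tasks. The sketch below does this with random label permutations. It is an assumption about the general idea, not the authors' procedure: `make_mutex_tasks` is a hypothetical helper, the samples are placeholders, and the key data extraction step described in the abstract is not modeled.

```python
import random


def make_mutex_tasks(samples, num_classes, num_tasks, seed=0):
    """Build tasks in which the same inputs receive permuted labels.

    samples: list of (feature, class_index) pairs.
    Each task applies its own random permutation of the class indices,
    so one feature corresponds to several labels across tasks.
    """
    rng = random.Random(seed)
    tasks = []
    for _ in range(num_tasks):
        permutation = list(range(num_classes))
        rng.shuffle(permutation)
        tasks.append([(x, permutation[y]) for x, y in samples])
    return tasks


# Placeholder text samples with three classes (illustrative only).
samples = [("text a", 0), ("text b", 1), ("text c", 2)]
tasks = make_mutex_tasks(samples, num_classes=3, num_tasks=4)
for task in tasks:
    print(task)
```

Because the label of a given input changes from task to task, a meta-learner has to use each task's support set rather than memorize the mapping.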

Keywords: mutex task generation, data augmentation, meta-learning, text classification

Procedia PDF Downloads 134
1005 Evolution under Length Constraints for Convolutional Neural Networks Architecture Design

Authors: Ousmane Youme, Jean Marie Dembele, Eugene Ezin, Christophe Cambier

Abstract:

In recent years, convolutional neural network (CNN) architectures designed by evolution algorithms have proven to be competitive with handcrafted architectures designed by experts. However, these algorithms need a lot of computational power, which is beyond the capabilities of most researchers and engineers. To overcome this problem, we propose an evolution architecture under length constraints. It consists of two algorithms: a search length strategy to find an optimal space and a search architecture strategy based on a genetic algorithm to find the best individual in the optimal space. Our algorithms drastically reduce resource costs while maintaining good performance. On the CIFAR-10 dataset, our framework achieves an error rate of 5.12% and needs only 4.6 GPU days to converge to the optimal individual, 22 GPU days fewer than the lowest-cost automatic evolutionary algorithm in the peer competition.

Keywords: CNN architecture, genetic algorithm, evolution algorithm, length constraints

Procedia PDF Downloads 125
1004 Printed Thai Character Recognition Using Particle Swarm Optimization Algorithm

Authors: Phawin Sangsuvan, Chutimet Srinilta

Abstract:

This paper presents the application of the Particle Swarm Optimization (PSO) method to Thai optical character recognition (OCR). OCR consists of pre-processing, character recognition, and post-processing. Before entering the recognition process, the character must be "prepped" by pre-processing. PSO is an optimization method that belongs to the swarm intelligence family and is based on the imitation of social behavior patterns of animals. The route of each particle is determined by its individual data and that of neighboring particles. The interaction of the particles with their neighbors is what allows a particle swarm to determine the best solution. PSO has therefore attracted many researchers working on difficult problems, including character recognition. As in previous work, this research used a projection histogram to extract features of printed digits and defined a simple fitness function for PSO. The results reveal that PSO gives 67.73% on the testing dataset. Future work could explore improving the performance of PSO by improving the fitness function.
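
A projection histogram, the feature extractor named in this abstract, counts the ink pixels along each row and each column of a binarized character image. The sketch below shows the idea on a tiny hand-made glyph. The 4x4 bitmap and the function name `projection_histograms` are illustrative assumptions, and the PSO fitness function used by the authors is not reproduced.

```python
def projection_histograms(image):
    """Return (horizontal, vertical) ink counts for a binary image.

    image: list of equal-length rows containing 0 (background) or 1 (ink).
    horizontal[i] is the ink count of row i; vertical[j] of column j.
    """
    horizontal = [sum(row) for row in image]
    vertical = [sum(column) for column in zip(*image)]
    return horizontal, vertical


# Tiny made-up glyph resembling the letter "A".
glyph = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]
h, v = projection_histograms(glyph)
features = h + v  # concatenated profile used as a feature vector
print(features)
```

For this glyph the row profile is [2, 2, 4, 2] and the column profile is [3, 2, 2, 3]. A recognizer can then compare such vectors against class templates.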

Keywords: character recognition, histogram projection, particle swarm optimization, pattern recognition techniques

Procedia PDF Downloads 469
1003 A Comparative Assessment Method For Map Alignment Techniques

Authors: Rema Daher, Theodor Chakhachiro, Daniel Asmar

Abstract:

In the era of autonomous robot mapping, assessing the goodness of the generated maps is important and is usually performed by aligning them to ground truth. Map alignment is difficult for two reasons: first, the query maps can be significantly distorted from ground truth, and second, establishing what constitutes ground truth for different settings is challenging. Most map alignment techniques to date have addressed the first problem while paying too little attention to the second. In this paper, we propose a benchmark dataset, which consists of synthetically transformed maps with their corresponding displacement fields. Furthermore, we propose a new system for comparison, where the displacement field of any map alignment technique can be computed and compared to the ground truth using statistical measures. The local information in displacement fields renders the evaluation system applicable to any alignment technique, whether it is linear or not. In our experiments, the proposed method was applied to different alignment methods from the literature, allowing for a comparative assessment between them all.
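
Comparing an estimated displacement field with the ground-truth field can be sketched as a per-point error followed by summary statistics. The abstract does not say which statistical measures the authors use, so the mean, RMSE, and maximum below are assumptions chosen for illustration, as are the toy fields and the helper names `endpoint_errors` and `summarize`.

```python
import math


def endpoint_errors(estimated, ground_truth):
    """Euclidean distance between matching displacement vectors.

    Both arguments are lists of (dx, dy) tuples sampled at the same points.
    """
    if len(estimated) != len(ground_truth):
        raise ValueError("fields must be sampled at the same points")
    return [
        math.hypot(ex - gx, ey - gy)
        for (ex, ey), (gx, gy) in zip(estimated, ground_truth)
    ]


def summarize(errors):
    """Summary statistics over the per-point errors."""
    count = len(errors)
    return {
        "mean": sum(errors) / count,
        "rmse": math.sqrt(sum(e * e for e in errors) / count),
        "max": max(errors),
    }


# Toy fields: the estimate is wrong at the second point only.
truth = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
estimate = [(1.0, 0.0), (3.0, 6.0), (3.0, 4.0)]
stats = summarize(endpoint_errors(estimate, truth))
print(stats)
```

Because the errors are computed point by point, the same summary applies whether the alignment under test is a global linear transform or a local non-linear warp.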

Keywords: assessment methods, benchmark, image deformation, map alignment, robot mapping, robot motion

Procedia PDF Downloads 114
1002 Data-Centric Anomaly Detection with Diffusion Models

Authors: Sheldon Liu, Gordon Wang, Lei Liu, Xuefeng Liu

Abstract:

Anomaly detection, also referred to as one-class classification, plays a crucial role in identifying product images that deviate from the expected distribution. This study introduces Data-centric Anomaly Detection with Diffusion Models (DCADDM), presenting a systematic strategy for data collection and further diversifying the data with image generation via diffusion models. The algorithm addresses data collection challenges in real-world scenarios and points toward data augmentation with the integration of generative AI capabilities. The paper explores the generation of normal images using diffusion models. The experiments demonstrate that with 30% of the original normal image size, modeling in an unsupervised setting with state-of-the-art approaches can achieve equivalent performance. With the addition of images generated via diffusion models (equivalent to 10% of the original dataset size), the proposed algorithm achieves better or equivalent anomaly localization performance.

Keywords: diffusion models, anomaly detection, data-centric, generative AI

Procedia PDF Downloads 79
1001 Diversity and Phylogenetic Placement of Seven Inocybe (Inocybaceae, Fungi) from Benin

Authors: Hyppolite Aignon, Souleymane Yorou, Martin Ryberg, Anneli Svanholm

Abstract:

Climate change and human actions cause the extinction of wild mushrooms. In Benin, the diversity of fungi is large and may still include species new to science, but the inventory effort remains low and focuses particularly on edible species (Russula, Lactarius, Lactifluus, and also Amanita). In addition, inventories started only recently and some groups of fungi are not sufficiently sampled; however, the degradation of fungal habitat continues to increase and some species are already disappearing (Yorou and De Kesel, 2011), while others may disappear without being known. The overlooked genus Inocybe has a worldwide distribution and includes more than 700 species, with many undiscovered or poorly known species worldwide and particularly in tropical Africa. It is therefore important to orient the inventory toward other important genera or families such as Inocybe (Fungi, Agaricales) in order to highlight their diversity and to determine their phylogenetic positions with a combined approach of gene regions. This study aims to evaluate the species richness and phylogenetic position of Inocybe species and affiliated taxa in West Africa. Thus, in North Benin, we visited the Forest Reserve of Ouémé Supérieur, the Okpara forest and the Alibori Supérieur Forest Reserve. In the center, we targeted the Forest Reserve of Toui-Kilibo. The surveys were carried out during the rainy season in the study area, from June to October. A total of 24 taxa were collected, photographed and described. DNA was extracted, and the polymerase chain reaction was carried out using primers (ITS1-F, ITS4-B) for the internal transcribed spacer (ITS), (LROR, LWRB, LR7, LR5) for the nuclear ribosomal large subunit (LSU), and (RPB2-f5F, RPB2-b6F, RPB2-b6R2, RPB2-b7R) for the RNA polymerase II gene (RPB2), and the products were sequenced.
The ITS sequences of the 24 collections of Inocybaceae were edited in Staden, and all the sequences were aligned and edited with AliView v1.17. The sequences were examined by eye for sufficient similarity to be considered the same species. Thirteen different species were present in the collections. In addition, sequences similar to the ITS sequences of the thirteen final species were searched for using BLAST. The nLSU and RPB2 markers for these species were inserted into a complete alignment in which species from all major Inocybaceae clades and from all continents except Antarctica are present. Our new sequences for nLSU and RPB2 were manually aligned in this dataset. Phylogenetic analysis was performed using the maximum likelihood software RAxML v7.2.6. Bootstrap replications were set to 100 and no partitioning of the dataset was performed. The resulting tree was viewed and edited with FigTree v1.4.3. The preliminary maximum likelihood tree shows that the species from Benin are highly diverse and are distributed in four clades (Inosperma, Inocybe, Mallocybe and Pseudosperma) of the seven clades of Inocybaceae, but the phylogenetic position of only 7 is currently known. This study documents the diversity of Inocybe in Benin; investigations will continue and a protection plan will be developed in the coming years.

Keywords: Benin, diversity, Inocybe, phylogeny placement

Procedia PDF Downloads 138
1000 Investigating the Impacts of Climate Change on Soil Erosion: A Case Study of Kasilian Watershed, Northern Iran

Authors: Mohammad Zare, Mahbubeh Sheikh

Abstract:

Many of the impacts of climate change will materialize through changes in soil erosion, which have rarely been addressed in Iran. This paper presents an investigation of the impacts of climate change on soil erosion for the Kasilian basin. LARS-WG5 was used to downscale the IPCM4 and GFCM21 predictions of the A2 scenarios for the projected periods of 1985-2030 and 2080-2099. This analysis was carried out by means of the dataset of the International Centre for Theoretical Physics (ICTP) of Trieste. Soil loss was modeled using the Revised Universal Soil Loss Equation (RUSLE). Results indicate that soil erosion increases or decreases depending on which climate scenario is considered. The potential for climate change to increase the soil loss rate in future periods was established, whereas considerable decreases in erosion are projected when land use is increased from baseline periods.
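
RUSLE estimates average annual soil loss as the product A = R × K × LS × C × P, where R is rainfall erosivity, K is soil erodibility, LS is the slope length and steepness factor, C is the cover-management factor, and P is the support practice factor. The sketch below evaluates that product for two rainfall scenarios. The factor values are invented for illustration and are not taken from the Kasilian watershed, and `rusle_soil_loss` is a hypothetical helper name.

```python
import math


def rusle_soil_loss(r, k, ls, c, p):
    """Average annual soil loss A = R * K * LS * C * P.

    r:  rainfall erosivity (MJ mm ha^-1 h^-1 yr^-1)
    k:  soil erodibility (t ha h ha^-1 MJ^-1 mm^-1)
    ls: slope length and steepness factor (dimensionless)
    c:  cover-management factor (dimensionless)
    p:  support practice factor (dimensionless)
    Returns soil loss in t ha^-1 yr^-1.
    """
    factors = {"r": r, "k": k, "ls": ls, "c": c, "p": p}
    for name, value in factors.items():
        if value < 0:
            raise ValueError(f"factor {name} must be non-negative")
    return r * k * ls * c * p


# Invented factor values; only the rainfall erosivity differs between scenarios.
baseline = rusle_soil_loss(r=200.0, k=0.25, ls=2.0, c=0.1, p=1.0)
wetter = rusle_soil_loss(r=250.0, k=0.25, ls=2.0, c=0.1, p=1.0)
print(baseline, wetter)
```

Since the model is multiplicative, a 25% rise in R alone raises the estimated loss by 25%, while a change in the cover factor C can offset it. This is consistent with the abstract's point that the projected direction depends on the scenario and on land use.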

Keywords: Kasilian watershed, climatic change, soil erosion, LARS-WG5 Model, RUSLE

Procedia PDF Downloads 501
999 Clinical Validation of an Automated Natural Language Processing Algorithm for Finding COVID-19 Symptoms and Complications in Patient Notes

Authors: Karolina Wieczorek, Sophie Wiliams

Abstract:

Introduction: Patient data is often collected in Electronic Health Record (EHR) systems for purposes such as providing care and reporting data. This information can be re-used to validate data models in clinical trials or in epidemiological studies. Manual validation of automated tools is vital to pick up errors in processing and to provide confidence in the output. Mentioning a disease in a discharge letter does not necessarily mean that a patient suffers from this disease. Many letters discuss a diagnostic process, different tests, or whether a patient has a certain disease. The COVID-19 dataset in this study was produced using natural language processing (NLP), an automated algorithm that extracts information related to COVID-19 symptoms, complications, and medications prescribed within the hospital. Free-text clinical patient notes are rich sources of information that contain patient data not captured in a structured form, hence the use of named entity recognition (NER) to capture additional information. Methods: Patient data (discharge summary letters) were exported and screened by an algorithm to pick up relevant terms related to COVID-19. A list of 124 Systematized Nomenclature of Medicine (SNOMED) Clinical Terms was provided in Excel with corresponding IDs. Two independent medical student researchers were provided with a dictionary of the SNOMED terms to refer to when screening the notes. They worked on two separate datasets called "A" and "B", respectively. Notes were screened to check whether the correct term had been picked up by the algorithm and to ensure that negated terms were not picked up. Results: Implementation of the algorithm in the hospital began on March 31, 2020, and the first EHR-derived extract was generated for use in an audit study on June 04, 2020.
The dataset has contributed to large, priority clinical trials (including the International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC), by bulk upload to REDCap research databases) and to local research and audit studies. Successful sharing of EHR-extracted datasets requires communicating their provenance and quality, including the completeness and accuracy of the data. The validation of the algorithm gave the following results: precision 0.907, recall 0.416, and F-score 0.570. The percentage enhancement from NLP-extracted terms compared with regular data extraction alone was low (0.3%) for relatively well-documented data such as previous medical history, but higher (16.6%, 29.53%, 30.3%, and 45.1%) for complications, presenting illness, chronic procedures, and acute procedures, respectively. Conclusions: This automated NLP algorithm is shown to be useful in facilitating patient data analysis and has the potential to be used in more large-scale clinical trials to assess potential study exclusion criteria for participants in the development of vaccines.
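
The reported F-score follows from the reported precision and recall if it is taken to be the balanced F1 measure, the harmonic mean of the two. That reading is an assumption, since the abstract does not state the weighting. The short check below uses a hypothetical helper `f_score`.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Values reported in the abstract.
f1 = f_score(0.907, 0.416)
print(round(f1, 3))
```

The result rounds to 0.570, matching the abstract. The high precision and low recall indicate that the terms the algorithm extracts are usually correct, but that it misses more than half of the terms found by manual screening.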

Keywords: automated, algorithm, NLP, COVID-19

Procedia PDF Downloads 97