Search results for: machine learning techniques
13994 Identification of Biological Pathways Causative for Breast Cancer Using Unsupervised Machine Learning
Authors: Karthik Mittal
Abstract:
This study performs an unsupervised machine learning analysis to find clusters of related SNPs which highlight biological pathways that are important for the biological mechanisms of breast cancer. Studying genetic variations in isolation is illogical because these genetic variations are known to modulate protein production and function; the downstream effects of these modifications on biological outcomes are highly interconnected. After extracting the SNPs and their effect on different types of breast cancer using the MRBase library, two unsupervised machine learning clustering algorithms were implemented on the genetic variants: a k-means clustering algorithm and a hierarchical clustering algorithm; furthermore, principal component analysis was executed to visually represent the data. These algorithms specifically used the SNP’s beta value on the three different types of breast cancer tested in this project (estrogen-receptor positive breast cancer, estrogen-receptor negative breast cancer, and breast cancer in general) to perform this clustering. Two significant genetic pathways validated the clustering produced by this project: the MAPK signaling pathway and the connection between the BRCA2 gene and the ESR1 gene. This study provides the first proof of concept showing the importance of unsupervised machine learning in interpreting GWAS summary statistics.Keywords: breast cancer, computational biology, unsupervised machine learning, k-means, PCA
Procedia PDF Downloads 14613993 Review of Different Machine Learning Algorithms
Authors: Syed Romat Ali Shah, Bilal Shoaib, Saleem Akhtar, Munib Ahmad, Shahan Sadiqui
Abstract:
Classification is a data mining technique, which is recognizedon Machine Learning (ML) algorithm. It is used to classifythe individual articlein a knownofinformation into a set of predefinemodules or group. Web mining is also a portion of that sympathetic of data mining methods. The main purpose of this paper to analysis and compare the performance of Naïve Bayse Algorithm, Decision Tree, K-Nearest Neighbor (KNN), Artificial Neural Network (ANN)and Support Vector Machine (SVM). This paper consists of different ML algorithm and their advantages and disadvantages and also define research issues.Keywords: Data Mining, Web Mining, classification, ML Algorithms
Procedia PDF Downloads 30313992 Intrusion Detection Based on Graph Oriented Big Data Analytics
Authors: Ahlem Abid, Farah Jemili
Abstract:
Intrusion detection has been the subject of numerous studies in industry and academia, but cyber security analysts always want greater precision and global threat analysis to secure their systems in cyberspace. To improve intrusion detection system, the visualisation of the security events in form of graphs and diagrams is important to improve the accuracy of alerts. In this paper, we propose an approach of an IDS based on cloud computing, big data technique and using a machine learning graph algorithm which can detect in real time different attacks as early as possible. We use the MAWILab intrusion detection dataset . We choose Microsoft Azure as a unified cloud environment to load our dataset on. We implement the k2 algorithm which is a graphical machine learning algorithm to classify attacks. Our system showed a good performance due to the graphical machine learning algorithm and spark structured streaming engine.Keywords: Apache Spark Streaming, Graph, Intrusion detection, k2 algorithm, Machine Learning, MAWILab, Microsoft Azure Cloud
Procedia PDF Downloads 14713991 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine
Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li
Abstract:
Machine Learning and Data Mining are the two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a wildly used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries especially China, Air Quality Classification has become one of the most important topics in air quality research and modelling. This study aims at introducing a hybrid classification model based on information theory and Support Vector Machine (SVM) using the air quality data of four cities in China namely Beijing, Guangzhou, Shanghai and Tianjin from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified the daily air quality into 6 levels namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent based on their respective Air Quality Index (AQI) values. Using the information theory, information gain (IG) is calculated and feature selection is done for both categorical features and continuous numeric features. Then SVM Machine Learning algorithm is implemented on the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.Keywords: machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation
Procedia PDF Downloads 23513990 Application of Deep Neural Networks to Assess Corporate Credit Rating
Authors: Parisa Golbayani, Dan Wang, Ionut¸ Florescu
Abstract:
In this work we implement machine learning techniques to financial statement reports in order to asses company’s credit rating. Specifically, the work analyzes the performance of four neural network architectures (MLP, CNN, CNN2D, LSTM) in predicting corporate credit rating as issued by Standard and Poor’s. The paper focuses on companies from the energy, financial, and healthcare sectors in the US. The goal of this analysis is to improve application of machine learning algorithms to credit assessment. To accomplish this, the study investigates three questions. First, we investigate if the algorithms perform better when using a selected subset of important features or whether better performance is obtained by allowing the algorithms to select features themselves. Second, we address the temporal aspect inherent in financial data and study whether it is important for the results obtained by a machine learning algorithm. Third, we aim to answer if one of the four particular neural network architectures considered consistently outperforms the others, and if so under which conditions. This work frames the problem as several case studies to answer these questions and analyze the results using ANOVA and multiple comparison testing procedures.Keywords: convolutional neural network, long short term memory, multilayer perceptron, credit rating
Procedia PDF Downloads 23513989 Software Defect Analysis- Eclipse Dataset
Authors: Amrane Meriem, Oukid Salyha
Abstract:
The presence of defects or bugs in software can lead to costly setbacks, operational inefficiencies, and compromised user experiences. The integration of Machine Learning(ML) techniques has emerged to predict and preemptively address software defects. ML represents a proactive strategy aimed at identifying potential anomalies, errors, or vulnerabilities within code before they manifest as operational issues. By analyzing historical data, such as code changes, feature im- plementations, and defect occurrences. This en- ables development teams to anticipate and mitigate these issues, thus enhancing software quality, reducing maintenance costs, and ensuring smoother user interactions. In this work, we used a recommendation system to improve the performance of ML models in terms of predicting the code severity and effort estimation.Keywords: software engineering, machine learning, bugs detection, effort estimation
Procedia PDF Downloads 8613988 Insider Theft Detection in Organizations Using Keylogger and Machine Learning
Authors: Shamatha Shetty, Sakshi Dhabadi, Prerana M., Indushree B.
Abstract:
About 66% of firms claim that insider attacks are more likely to happen. The frequency of insider incidents has increased by 47% in the last two years. The goal of this work is to prevent dangerous employee behavior by using keyloggers and the Machine Learning (ML) model. Every keystroke that the user enters is recorded by the keylogging program, also known as keystroke logging. Keyloggers are used to stop improper use of the system. This enables us to collect all textual data, save it in a CSV file, and analyze it using an ML algorithm and the VirusTotal API. Many large companies use it to methodically monitor how their employees use computers, the internet, and email. We are utilizing the SVM algorithm and the VirusTotal API to improve overall efficiency and accuracy in identifying specific patterns and words to automate and offer the report for improved monitoring.Keywords: cyber security, machine learning, cyclic process, email notification
Procedia PDF Downloads 5713987 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach
Authors: Rajvir Kaur, Jeewani Anupama Ginige
Abstract:
With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is at the stage of its transition from clinician oriented to technology oriented. Many people around the world die of cancer because the diagnosis of disease was not done at an early stage. Nowadays, the computational methods in the form of Machine Learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper aims to carry out the comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is carried out based on standard evaluation metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results based on the evaluation metrics show that ANN showed the highest-level accuracy (99.4%) when tested with breast cancer dataset. On the other hand, when these ML classifiers are tested with the cervical cancer dataset, Ensemble (Bagged Tree) technique gave better accuracy (93.1%) in comparison to other classifiers.Keywords: artificial neural networks, breast cancer, classifiers, cervical cancer, f-score, machine learning, precision, recall
Procedia PDF Downloads 27713986 Automatic Method for Classification of Informative and Noninformative Images in Colonoscopy Video
Authors: Nidhal K. Azawi, John M. Gauch
Abstract:
Colorectal cancer is one of the leading causes of cancer death in the US and the world, which is why millions of colonoscopy examinations are performed annually. Unfortunately, noise, specular highlights, and motion artifacts corrupt many images in a typical colonoscopy exam. The goal of our research is to produce automated techniques to detect and correct or remove these noninformative images from colonoscopy videos, so physicians can focus their attention on informative images. In this research, we first automatically extract features from images. Then we use machine learning and deep neural network to classify colonoscopy images as either informative or noninformative. Our results show that we achieve image classification accuracy between 92-98%. We also show how the removal of noninformative images together with image alignment can aid in the creation of image panoramas and other visualizations of colonoscopy images.Keywords: colonoscopy classification, feature extraction, image alignment, machine learning
Procedia PDF Downloads 25313985 Principal Component Analysis Combined Machine Learning Techniques on Pharmaceutical Samples by Laser Induced Breakdown Spectroscopy
Authors: Kemal Efe Eseller, Göktuğ Yazici
Abstract:
Laser-induced breakdown spectroscopy (LIBS) is a rapid optical atomic emission spectroscopy which is used for material identification and analysis with the advantages of in-situ analysis, elimination of intensive sample preparation, and micro-destructive properties for the material to be tested. LIBS delivers short pulses of laser beams onto the material in order to create plasma by excitation of the material to a certain threshold. The plasma characteristics, which consist of wavelength value and intensity amplitude, depends on the material and the experiment’s environment. In the present work, medicine samples’ spectrum profiles were obtained via LIBS. Medicine samples’ datasets include two different concentrations for both paracetamol based medicines, namely Aferin and Parafon. The spectrum data of the samples were preprocessed via filling outliers based on quartiles, smoothing spectra to eliminate noise and normalizing both wavelength and intensity axis. Statistical information was obtained and principal component analysis (PCA) was incorporated to both the preprocessed and raw datasets. The machine learning models were set based on two different train-test splits, which were 70% training – 30% test and 80% training – 20% test. Cross-validation was preferred to protect the models against overfitting; thus the sample amount is small. The machine learning results of preprocessed and raw datasets were subjected to comparison for both splits. This is the first time that all supervised machine learning classification algorithms; consisting of Decision Trees, Discriminant, naïve Bayes, Support Vector Machines (SVM), k-NN(k-Nearest Neighbor) Ensemble Learning and Neural Network algorithms; were incorporated to LIBS data of paracetamol based pharmaceutical samples, and their different concentrations on preprocessed and raw dataset in order to observe the effect of preprocessing.Keywords: machine learning, laser-induced breakdown spectroscopy, medicines, principal component analysis, preprocessing
Procedia PDF Downloads 8713984 Review on Implementation of Artificial Intelligence and Machine Learning for Controlling Traffic and Avoiding Accidents
Authors: Neha Singh, Shristi Singh
Abstract:
Accidents involving motor vehicles are more likely to cause serious injuries and fatalities. It also has a host of other perpetual issues, such as the regular loss of life and goods in accidents. To solve these issues, appropriate measures must be implemented, such as establishing an autonomous incident detection system that makes use of machine learning and artificial intelligence. In order to reduce traffic accidents, this article examines the overview of artificial intelligence and machine learning in autonomous event detection systems. The paper explores the major issues, prospective solutions, and use of artificial intelligence and machine learning in road transportation systems for minimising traffic accidents. There is a lot of discussion on additional, fresh, and developing approaches that less frequent accidents in the transportation industry. The study structured the following subtopics specifically: traffic management using machine learning and artificial intelligence and an incident detector with these two technologies. The internet of vehicles and vehicle ad hoc networks, as well as the use of wireless communication technologies like 5G wireless networks and the use of machine learning and artificial intelligence for the planning of road transportation systems, are elaborated. In addition, safety is the primary concern of road transportation. Route optimization, cargo volume forecasting, predictive fleet maintenance, real-time vehicle tracking, and traffic management, according to the review's key conclusions, are essential for ensuring the safety of road transportation networks. In addition to highlighting research trends, unanswered problems, and key research conclusions, the study also discusses the difficulties in applying artificial intelligence to road transport systems. Planning and managing the road transportation system might use the work as a resource.Keywords: artificial intelligence, machine learning, incident detector, road transport systems, traffic management, automatic incident detection, deep learning
Procedia PDF Downloads 11313983 Enhancing Fall Detection Accuracy with a Transfer Learning-Aided Transformer Model Using Computer Vision
Authors: Sheldon McCall, Miao Yu, Liyun Gong, Shigang Yue, Stefanos Kollias
Abstract:
Falls are a significant health concern for older adults globally, and prompt identification is critical to providing necessary healthcare support. Our study proposes a new fall detection method using computer vision based on modern deep learning techniques. Our approach involves training a trans- former model on a large 2D pose dataset for general action recognition, followed by transfer learning. Specifically, we freeze the first few layers of the trained transformer model and train only the last two layers for fall detection. Our experimental results demonstrate that our proposed method outperforms both classical machine learning and deep learning approaches in fall/non-fall classification. Overall, our study suggests that our proposed methodology could be a valuable tool for identifying falls.Keywords: healthcare, fall detection, transformer, transfer learning
Procedia PDF Downloads 14813982 Stack Overflow Detection and Prevention on Operating Systems Using Machine Learning and Control-Flow Enforcement Technology
Authors: Cao Jiayu, Lan Ximing, Huang Jingjia, Burra Venkata Durga Kumar
Abstract:
The first virus to attack personal computers was born in early 1986, called C-Brain, written by a pair of Pakistani brothers. In those days, people still used dos systems, manipulating computers with the most basic command lines. In the 21st century today, computer performance has grown geometrically. But computer viruses are also evolving and escalating. We never stop fighting against security problems. Stack overflow is one of the most common security vulnerabilities in operating systems. It may result in serious security issues for an operating system if a program in it has a vulnerability with administrator privileges. Certain viruses change the value of specific memory through a stack overflow, allowing computers to run harmful programs. This study developed a mechanism to detect and respond to time whenever a stack overflow occurs. We demonstrate the effectiveness of standard machine learning algorithms and control flow enforcement techniques in predicting computer OS security using generating suspicious vulnerability functions (SVFS) and associated suspect areas (SAS). The method can minimize the possibility of stack overflow attacks occurring.Keywords: operating system, security, stack overflow, buffer overflow, machine learning, control-flow enforcement technology
Procedia PDF Downloads 11513981 Machine Learning Prediction of Compressive Damage and Energy Absorption in Carbon Fiber-Reinforced Polymer Tubular Structures
Authors: Milad Abbasi
Abstract:
Carbon fiber-reinforced polymer (CFRP) composite structures are increasingly being utilized in the automotive industry due to their lightweight and specific energy absorption capabilities. Although it is impossible to predict composite mechanical properties directly using theoretical methods, various research has been conducted so far in the literature for accurate simulation of CFRP structures' energy-absorbing behavior. In this research, axial compression experiments were carried out on hand lay-up unidirectional CFRP composite tubes. The fabrication method allowed the authors to extract the material properties of the CFRPs using ASTM D3039, D3410, and D3518 standards. A neural network machine learning algorithm was then utilized to build a robust prediction model to forecast the axial compressive properties of CFRP tubes while reducing high-cost experimental efforts. The predicted results have been compared with the experimental outcomes in terms of load-carrying capacity and energy absorption capability. The results showed high accuracy and precision in the prediction of the energy-absorption capacity of the CFRP tubes. This research also demonstrates the effectiveness and challenges of machine learning techniques in the robust simulation of composites' energy-absorption behavior. Interestingly, the proposed method considerably condensed numerical and experimental efforts in the simulation and calibration of CFRP composite tubes subjected to compressive loading.Keywords: CFRP composite tubes, energy absorption, crushing behavior, machine learning, neural network
Procedia PDF Downloads 15313980 Deep Reinforcement Learning and Generative Adversarial Networks Approach to Thwart Intrusions and Adversarial Attacks
Authors: Fabrice Setephin Atedjio, Jean-Pierre Lienou, Frederica F. Nelson, Sachin S. Shetty, Charles A. Kamhoua
Abstract:
Malicious users exploit vulnerabilities in computer systems, significantly disrupting their performance and revealing the inadequacies of existing protective solutions. Even machine learning-based approaches, designed to ensure reliability, can be compromised by adversarial attacks that undermine their robustness. This paper addresses two critical aspects of enhancing model reliability. First, we focus on improving model performance and robustness against adversarial threats. To achieve this, we propose a strategy by harnessing deep reinforcement learning. Second, we introduce an approach leveraging generative adversarial networks to counter adversarial attacks effectively. Our results demonstrate substantial improvements over previous works in the literature, with classifiers exhibiting enhanced accuracy in classification tasks, even in the presence of adversarial perturbations. These findings underscore the efficacy of the proposed model in mitigating intrusions and adversarial attacks within the machine-learning landscape.Keywords: machine learning, reliability, adversarial attacks, deep-reinforcement learning, robustness
Procedia PDF Downloads 913979 Machine Learning Techniques for Estimating Ground Motion Parameters
Authors: Farid Khosravikia, Patricia Clayton
Abstract:
The main objective of this study is to evaluate the advantages and disadvantages of various machine learning techniques in forecasting ground-motion intensity measures given source characteristics, source-to-site distance, and local site condition. Intensity measures such as peak ground acceleration and velocity (PGA and PGV, respectively) as well as 5% damped elastic pseudospectral accelerations at different periods (PSA), are indicators of the strength of shaking at the ground surface. Estimating these variables for future earthquake events is a key step in seismic hazard assessment and potentially subsequent risk assessment of different types of structures. Typically, linear regression-based models, with pre-defined equations and coefficients, are used in ground motion prediction. However, due to the restrictions of the linear regression methods, such models may not capture more complex nonlinear behaviors that exist in the data. Thus, this study comparatively investigates potential benefits from employing other machine learning techniques as a statistical method in ground motion prediction such as Artificial Neural Network, Random Forest, and Support Vector Machine. The algorithms are adjusted to quantify event-to-event and site-to-site variability of the ground motions by implementing them as random effects in the proposed models to reduce the aleatory uncertainty. All the algorithms are trained using a selected database of 4,528 ground-motions, including 376 seismic events with magnitude 3 to 5.8, recorded over the hypocentral distance range of 4 to 500 km in Oklahoma, Kansas, and Texas since 2005. The main reason of the considered database stems from the recent increase in the seismicity rate of these states attributed to petroleum production and wastewater disposal activities, which necessities further investigation in the ground motion models developed for these states. Accuracy of the models in predicting intensity measures, generalization capability of the models for future data, as well as usability of the models are discussed in the evaluation process. The results indicate the algorithms satisfy some physically sound characteristics such as magnitude scaling distance dependency without requiring pre-defined equations or coefficients. Moreover, it is shown that, when sufficient data is available, all the alternative algorithms tend to provide more accurate estimates compared to the conventional linear regression-based method, and particularly, Random Forest outperforms the other algorithms. However, the conventional method is a better tool when limited data is available.Keywords: artificial neural network, ground-motion models, machine learning, random forest, support vector machine
Procedia PDF Downloads 12213978 Advanced Machine Learning Algorithm for Credit Card Fraud Detection
Authors: Manpreet Kaur
Abstract:
When legitimate credit card users are mistakenly labelled as fraudulent in numerous financial delated applications, there are numerous ethical problems. The innovative machine learning approach we have suggested in this research outperforms the current models and shows how to model a data set for credit card fraud detection while minimizing false positives. As a result, we advise using random forests as the best machine learning method for predicting and identifying credit card transaction fraud. The majority of victims of these fraudulent transactions were discovered to be credit card users over the age of 60, with a higher percentage of fraudulent transactions taking place between the specific hours.Keywords: automated fraud detection, isolation forest method, local outlier factor, ML algorithm, credit card
Procedia PDF Downloads 11313977 Machine Learning Analysis of Student Success in Introductory Calculus Based Physics I Course
Authors: Chandra Prayaga, Aaron Wade, Lakshmi Prayaga, Gopi Shankar Mallu
Abstract:
This paper presents the use of machine learning algorithms to predict the success of students in an introductory physics course. Data having 140 rows pertaining to the performance of two batches of students was used. The lack of sufficient data to train robust machine learning models was compensated for by generating synthetic data similar to the real data. CTGAN and CTGAN with Gaussian Copula (Gaussian) were used to generate synthetic data, with the real data as input. To check the similarity between the real data and each synthetic dataset, pair plots were made. The synthetic data was used to train machine learning models using the PyCaret package. For the CTGAN data, the Ada Boost Classifier (ADA) was found to be the ML model with the best fit, whereas the CTGAN with Gaussian Copula yielded Logistic Regression (LR) as the best model. Both models were then tested for accuracy with the real data. ROC-AUC analysis was performed for all the ten classes of the target variable (Grades A, A-, B+, B, B-, C+, C, C-, D, F). The ADA model with CTGAN data showed a mean AUC score of 0.4377, but the LR model with the Gaussian data showed a mean AUC score of 0.6149. ROC-AUC plots were obtained for each Grade value separately. The LR model with Gaussian data showed consistently better AUC scores compared to the ADA model with CTGAN data, except in two cases of the Grade value, C- and A-.Keywords: machine learning, student success, physics course, grades, synthetic data, CTGAN, gaussian copula CTGAN
Procedia PDF Downloads 4413976 Cardiokey: A Binary and Multi-Class Machine Learning Approach to Identify Individuals Using Electrocardiographic Signals on Wearable Devices
Authors: S. Chami, J. Chauvin, T. Demarest, Stan Ng, M. Straus, W. Jahner
Abstract:
Biometrics tools such as fingerprint and iris are widely used in industry to protect critical assets. However, their vulnerability and lack of robustness raise several worries about the protection of highly critical assets. Biometrics based on Electrocardiographic (ECG) signals is a robust identification tool. However, most of the state-of-the-art techniques have worked on clinical signals, which are of high quality and less noisy, extracted from wearable devices like a smartwatch. In this paper, we are presenting a complete machine learning pipeline that identifies people using ECG extracted from an off-person device. An off-person device is a wearable device that is not used in a medical context such as a smartwatch. In addition, one of the main challenges of ECG biometrics is the variability of the ECG of different persons and different situations. To solve this issue, we proposed two different approaches: per person classifier, and one-for-all classifier. The first approach suggests making binary classifier to distinguish one person from others. The second approach suggests a multi-classifier that distinguishes the selected set of individuals from non-selected individuals (others). The preliminary results, the binary classifier obtained a performance 90% in terms of accuracy within a balanced data. The second approach has reported a log loss of 0.05 as a multi-class score.Keywords: biometrics, electrocardiographic, machine learning, signals processing
Procedia PDF Downloads 14213975 Improve Student Performance Prediction Using Majority Vote Ensemble Model for Higher Education
Authors: Wade Ghribi, Abdelmoty M. Ahmed, Ahmed Said Badawy, Belgacem Bouallegue
Abstract:
In higher education institutions, the most pressing priority is to improve student performance and retention. Large volumes of student data are used in Educational Data Mining techniques to find new hidden information from students' learning behavior, particularly to uncover the early symptom of at-risk pupils. On the other hand, data with noise, outliers, and irrelevant information may provide incorrect conclusions. By identifying features of students' data that have the potential to improve performance prediction results, comparing and identifying the most appropriate ensemble learning technique after preprocessing the data, and optimizing the hyperparameters, this paper aims to develop a reliable students' performance prediction model for Higher Education Institutions. Data was gathered from two different systems: a student information system and an e-learning system for undergraduate students in the College of Computer Science of a Saudi Arabian State University. The cases of 4413 students were used in this article. The process includes data collection, data integration, data preprocessing (such as cleaning, normalization, and transformation), feature selection, pattern extraction, and, finally, model optimization and assessment. Random Forest, Bagging, Stacking, Majority Vote, and two types of Boosting techniques, AdaBoost and XGBoost, are ensemble learning approaches, whereas Decision Tree, Support Vector Machine, and Artificial Neural Network are supervised learning techniques. Hyperparameters for ensemble learning systems will be fine-tuned to provide enhanced performance and optimal output. The findings imply that combining features of students' behavior from e-learning and students' information systems using Majority Vote produced better outcomes than the other ensemble techniques.Keywords: educational data mining, student performance prediction, e-learning, classification, ensemble learning, higher education
Procedia PDF Downloads 10813974 Enhancing Precision Agriculture through Object Detection Algorithms: A Study of YOLOv5 and YOLOv8 in Detecting Armillaria spp.
Authors: Christos Chaschatzis, Chrysoula Karaiskou, Pantelis Angelidis, Sotirios K. Goudos, Igor Kotsiuba, Panagiotis Sarigiannidis
Abstract:
Over the past few decades, the rapid growth of the global population has led to the need to increase agricultural production and improve the quality of agricultural goods. There is a growing focus on environmentally eco-friendly solutions, sustainable production, and biologically minimally fertilized products in contemporary society. Precision agriculture has the potential to incorporate a wide range of innovative solutions with the development of machine learning algorithms. YOLOv5 and YOLOv8 are two of the most advanced object detection algorithms capable of accurately recognizing objects in real time. Detecting tree diseases is crucial for improving the food production rate and ensuring sustainability. This research aims to evaluate the efficacy of YOLOv5 and YOLOv8 in detecting the symptoms of Armillaria spp. in sweet cherry trees and determining their health status, with the goal of enhancing the robustness of precision agriculture. Additionally, this study will explore Computer Vision (CV) techniques with machine learning algorithms to improve the detection process’s efficiency.Keywords: Armillaria spp., machine learning, precision agriculture, smart farming, sweet cherries trees, YOLOv5, YOLOv8
Procedia PDF Downloads 11313973 Injury Prediction for Soccer Players Using Machine Learning
Authors: Amiel Satvedi, Richard Pyne
Abstract:
Injuries in professional sports occur on a regular basis. Some may be minor, while others can cause huge impact on a player's career and earning potential. In soccer, there is a high risk of players picking up injuries during game time. This research work seeks to help soccer players reduce the risk of getting injured by predicting the likelihood of injury while playing in the near future and then providing recommendations for intervention. The injury prediction tool will use a soccer player's number of minutes played on the field, number of appearances, distance covered and performance data for the current and previous seasons as variables to conduct statistical analysis and provide injury predictive results using a machine learning linear regression model.Keywords: injury predictor, soccer injury prevention, machine learning in soccer, big data in soccer
Procedia PDF Downloads 18213972 Hate Speech Detection Using Machine Learning: A Survey
Authors: Edemealem Desalegn Kingawa, Kafte Tasew Timkete, Mekashaw Girmaw Abebe, Terefe Feyisa, Abiyot Bitew Mihretie, Senait Teklemarkos Haile
Abstract:
Currently, hate speech is a growing challenge for society, individuals, policymakers, and researchers, as social media platforms make it easy to anonymously create and grow online friends and followers and provide an online forum for debate about specific issues of community life, culture, politics, and others. Despite this, research on identifying and detecting hate speech is not satisfactory performance, and this is why future research on this issue is constantly called for. This paper provides a systematic review of the literature in this field, with a focus on approaches like word embedding techniques, machine learning, deep learning technologies, hate speech terminology, and other state-of-the-art technologies with challenges. In this paper, we have made a systematic review of the last six years of literature from Research Gate and Google Scholar. Furthermore, limitations, along with algorithm selection and use challenges, data collection, and cleaning challenges, and future research directions, are discussed in detail.Keywords: Amharic hate speech, deep learning approach, hate speech detection review, Afaan Oromo hate speech detection
Procedia PDF Downloads 17813971 Thick Data Analytics for Learning Cataract Severity: A Triplet Loss Siamese Neural Network Model
Authors: Jinan Fiaidhi, Sabah Mohammed
Abstract:
Diagnosing cataract severity is an important factor in deciding to undertake surgery. It is usually conducted by an ophthalmologist or through taking a variety of fundus photography that needs to be examined by the ophthalmologist. This paper carries out an investigation using a Siamese neural net that can be trained with small anchor samples to score cataract severity. The model used in this paper is based on a triplet loss function that takes the ophthalmologist best experience in rating positive and negative anchors to a specific cataract scaling system. This approach that takes the heuristics of the ophthalmologist is generally called the thick data approach, which is a kind of machine learning approach that learn from a few shots. Clinical Relevance: The lens of the eye is mostly made up of water and proteins. A cataract occurs when these proteins at the eye lens start to clump together and block lights causing impair vision. This research aims at employing thick data machine learning techniques to rate the severity of the cataract using Siamese neural network.Keywords: thick data analytics, siamese neural network, triplet-loss model, few shot learning
Procedia PDF Downloads 11113970 Mobile Learning: Toward Better Understanding of Compression Techniques
Authors: Farouk Lawan Gambo
Abstract:
Data compression shrinks files into fewer bits then their original presentation. It has more advantage on internet because the smaller a file, the faster it can be transferred but learning most of the concepts in data compression are abstract in nature therefore making them difficult to digest by some students (Engineers in particular). To determine the best approach toward learning data compression technique, this paper first study the learning preference of engineering students who tend to have strong active, sensing, visual and sequential learning preferences, the paper also study the advantage that mobility of learning have experienced; Learning at the point of interest, efficiency, connection, and many more. A survey is carried out with some reasonable number of students, through random sampling to see whether considering the learning preference and advantages in mobility of learning will give a promising improvement over the traditional way of learning. Evidence from data analysis using Ms-Excel as a point of concern for error-free findings shows that there is significance different in the students after using learning content provided on smart phone, also the result of the findings presented in, bar charts and pie charts interpret that mobile learning has to be promising feature of learning.Keywords: data analysis, compression techniques, learning content, traditional learning approach
Procedia PDF Downloads 34713969 Assessment of DNA Sequence Encoding Techniques for Machine Learning Algorithms Using a Universal Bacterial Marker
Authors: Diego Santibañez Oyarce, Fernanda Bravo Cornejo, Camilo Cerda Sarabia, Belén Díaz Díaz, Esteban Gómez Terán, Hugo Osses Prado, Raúl Caulier-Cisterna, Jorge Vergara-Quezada, Ana Moya-Beltrán
Abstract:
The advent of high-throughput sequencing technologies has revolutionized genomics, generating vast amounts of genetic data that challenge traditional bioinformatics methods. Machine learning addresses these challenges by leveraging computational power to identify patterns and extract information from large datasets. However, biological sequence data, being symbolic and non-numeric, must be converted into numerical formats for machine learning algorithms to process effectively. So far, some encoding methods, such as one-hot encoding or k-mers, have been explored. This work proposes additional approaches for encoding DNA sequences in order to compare them with existing techniques and determine if they can provide improvements or if current methods offer superior results. Data from the 16S rRNA gene, a universal marker, was used to analyze eight bacterial groups that are significant in the pulmonary environment and have clinical implications. The bacterial genes included in this analysis are Prevotella, Abiotrophia, Acidovorax, Streptococcus, Neisseria, Veillonella, Mycobacterium, and Megasphaera. These data were downloaded from the NCBI database in Genbank file format, followed by a syntactic analysis to selectively extract relevant information from each file. For data encoding, a sequence normalization process was carried out as the first step. From approximately 22,000 initial data points, a subset was generated for testing purposes. Specifically, 55 sequences from each bacterial group met the length criteria, resulting in an initial sample of approximately 440 sequences. The sequences were encoded using different methods, including one-hot encoding, k-mers, Fourier transform, and Wavelet transform. Various machine learning algorithms, such as support vector machines, random forests, and neural networks, were trained to evaluate these encoding methods. The performance of these models was assessed using multiple metrics, including the confusion matrix, ROC curve, and F1 Score, providing a comprehensive evaluation of their classification capabilities. The results show that accuracies between encoding methods vary by up to approximately 15%, with the Fourier transform obtaining the best results for the evaluated machine learning algorithms. These findings, supported by the detailed analysis using the confusion matrix, ROC curve, and F1 Score, provide valuable insights into the effectiveness of different encoding methods and machine learning algorithms for genomic data analysis, potentially improving the accuracy and efficiency of bacterial classification and related genomic studies.Keywords: DNA encoding, machine learning, Fourier transform, Fourier transformation
Procedia PDF Downloads 2313968 Prediction of MicroRNA-Target Gene by Machine Learning Algorithms in Lung Cancer Study
Authors: Nilubon Kurubanjerdjit, Nattakarn Iam-On, Ka-Lok Ng
Abstract:
MicroRNAs are small non-coding RNA found in many different species. They play crucial roles in cancer such as biological processes of apoptosis and proliferation. The identification of microRNA-target genes can be an essential first step towards to reveal the role of microRNA in various cancer types. In this paper, we predict miRNA-target genes for lung cancer by integrating prediction scores from miRanda and PITA algorithms used as a feature vector of miRNA-target interaction. Then, machine-learning algorithms were implemented for making a final prediction. The approach developed in this study should be of value for future studies into understanding the role of miRNAs in molecular mechanisms enabling lung cancer formation.Keywords: microRNA, miRNAs, lung cancer, machine learning, Naïve Bayes, SVM
Procedia PDF Downloads 39913967 A Case-Based Reasoning-Decision Tree Hybrid System for Stock Selection
Authors: Yaojun Wang, Yaoqing Wang
Abstract:
Stock selection is an important decision-making problem. Many machine learning and data mining technologies are employed to build automatic stock-selection system. A profitable stock-selection system should consider the stock’s investment value and the market timing. In this paper, we present a hybrid system including both engage for stock selection. This system uses a case-based reasoning (CBR) model to execute the stock classification, uses a decision-tree model to help with market timing and stock selection. The experiments show that the performance of this hybrid system is better than that of other techniques regarding to the classification accuracy, the average return and the Sharpe ratio.Keywords: case-based reasoning, decision tree, stock selection, machine learning
Procedia PDF Downloads 42013966 Development of Evolutionary Algorithm by Combining Optimization and Imitation Approach for Machine Learning in Gaming
Authors: Rohit Mittal, Bright Keswani, Amit Mithal
Abstract:
This paper provides a sense about the application of computational intelligence techniques used to develop computer games, especially car racing. For the deep sense and knowledge of artificial intelligence, this paper is divided into various sections that is optimization, imitation, innovation and combining approach of optimization and imitation. This paper is mainly concerned with combining approach which tells different aspects of using fitness measures and supervised learning techniques used to imitate aspects of behavior. The main achievement of this paper is based on modelling player behaviour and evolving new game content such as racing tracks as single car racing on single track.Keywords: evolution algorithm, genetic, optimization, imitation, racing, innovation, gaming
Procedia PDF Downloads 64613965 Comparative Study of Learning Achievement via Jigsaw I and IV Techniques
Authors: Phongkon Weerpiput
Abstract:
This research study aimed to compare learning achievement between Jigsaw I and jigsaw IV techniques. The target group was 70 Thai major sophomores enrolled in a course entitled Foreign Language in Thai at the Faculty of Education, Suan Sunandha Rajabhat University. The research methodology was quasi-experimental design. A control group was given the Jigsaw I technique while an experimental group experienced the Jigsaw IV technique. The treatment content focused on Khmer loanwords in Thai language executed for a period of 3 hours per week for total of 3 weeks. The instruments included learning management plans and multiple-choice test items. The result yields no significant difference at level .05 between learning achievement of both techniques.Keywords: Jigsaw I technique, Jigsaw IV technique, learning achievement, major sophomores
Procedia PDF Downloads 288