Search results for: machine learning applications
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 13685

Search results for: machine learning applications

13565 Machine Learning Automatic Detection on Twitter Cyberbullying

Authors: Raghad A. Altowairgi

Abstract:

With the wide spread of social media platforms, young people tend to use them extensively as the first means of communication due to their ease and modernity. But these platforms often create a fertile ground for bullies to practice their aggressive behavior against their victims. Platform usage cannot be reduced, but intelligent mechanisms can be implemented to reduce the abuse. This is where machine learning comes in. Understanding and classifying text can be helpful in order to minimize the act of cyberbullying. Artificial intelligence techniques have expanded to formulate an applied tool to address the phenomenon of cyberbullying. In this research, machine learning models are built to classify text into two classes; cyberbullying and non-cyberbullying. After preprocessing the data in 4 stages; removing characters that do not provide meaningful information to the models, tokenization, removing stop words, and lowering text. BoW and TF-IDF are used as the main features for the five classifiers, which are; logistic regression, Naïve Bayes, Random Forest, XGboost, and Catboost classifiers. Each of them scores 92%, 90%, 92%, 91%, 86% respectively.

Keywords: cyberbullying, machine learning, Bag-of-Words, term frequency-inverse document frequency, natural language processing, Catboost

Procedia PDF Downloads 106
13564 Application of Supervised Deep Learning-based Machine Learning to Manage Smart Homes

Authors: Ahmed Al-Adaileh

Abstract:

Renewable energy sources, domestic storage systems, controllable loads and machine learning technologies will be key components of future smart homes management systems. An energy management scheme that uses a Deep Learning (DL) approach to support the smart home management systems, which consist of a standalone photovoltaic system, storage unit, heating ventilation air-conditioning system and a set of conventional and smart appliances, is presented. The objective of the proposed scheme is to apply DL-based machine learning to predict various running parameters within a smart home's environment to achieve maximum comfort levels for occupants, reduced electricity bills, and less dependency on the public grid. The problem is using Reinforcement learning, where decisions are taken based on applying the Continuous-time Markov Decision Process. The main contribution of this research is the proposed framework that applies DL to enhance the system's supervised dataset to offer unlimited chances to effectively support smart home systems. A case study involving a set of conventional and smart appliances with dedicated processing units in an inhabited building can demonstrate the validity of the proposed framework. A visualization graph can show "before" and "after" results.

Keywords: smart homes systems, machine learning, deep learning, Markov Decision Process

Procedia PDF Downloads 173
13563 An Application for Risk of Crime Prediction Using Machine Learning

Authors: Luis Fonseca, Filipe Cabral Pinto, Susana Sargento

Abstract:

The increase of the world population, especially in large urban centers, has resulted in new challenges particularly with the control and optimization of public safety. Thus, in the present work, a solution is proposed for the prediction of criminal occurrences in a city based on historical data of incidents and demographic information. The entire research and implementation will be presented start with the data collection from its original source, the treatment and transformations applied to them, choice and the evaluation and implementation of the Machine Learning model up to the application layer. Classification models will be implemented to predict criminal risk for a given time interval and location. Machine Learning algorithms such as Random Forest, Neural Networks, K-Nearest Neighbors and Logistic Regression will be used to predict occurrences, and their performance will be compared according to the data processing and transformation used. The results show that the use of Machine Learning techniques helps to anticipate criminal occurrences, which contributed to the reinforcement of public security. Finally, the models were implemented on a platform that will provide an API to enable other entities to make requests for predictions in real-time. An application will also be presented where it is possible to show criminal predictions visually.

Keywords: crime prediction, machine learning, public safety, smart city

Procedia PDF Downloads 89
13562 Machine Learning in Agriculture: A Brief Review

Authors: Aishi Kundu, Elhan Raza

Abstract:

"Necessity is the mother of invention" - Rapid increase in the global human population has directed the agricultural domain toward machine learning. The basic need of human beings is considered to be food which can be satisfied through farming. Farming is one of the major revenue generators for the Indian economy. Agriculture is not only considered a source of employment but also fulfils humans’ basic needs. So, agriculture is considered to be the source of employment and a pillar of the economy in developing countries like India. This paper provides a brief review of the progress made in implementing Machine Learning in the agricultural sector. Accurate predictions are necessary at the right time to boost production and to aid the timely and systematic distribution of agricultural commodities to make their availability in the market faster and more effective. This paper includes a thorough analysis of various machine learning algorithms applied in different aspects of agriculture (crop management, soil management, water management, yield tracking, livestock management, etc.).Due to climate changes, crop production is affected. Machine learning can analyse the changing patterns and come up with a suitable approach to minimize loss and maximize yield. Machine Learning algorithms/ models (regression, support vector machines, bayesian models, artificial neural networks, decision trees, etc.) are used in smart agriculture to analyze and predict specific outcomes which can be vital in increasing the productivity of the Agricultural Food Industry. It is to demonstrate vividly agricultural works under machine learning to sensor data. Machine Learning is the ongoing technology benefitting farmers to improve gains in agriculture and minimize losses. This paper discusses how the irrigation and farming management systems evolve in real-time efficiently. Artificial Intelligence (AI) enabled programs to emerge with rich apprehension for the support of farmers with an immense examination of data.

Keywords: machine Learning, artificial intelligence, crop management, precision farming, smart farming, pre-harvesting, harvesting, post-harvesting

Procedia PDF Downloads 84
13561 Predicting Machine-Down of Woodworking Industrial Machines

Authors: Matteo Calabrese, Martin Cimmino, Dimos Kapetis, Martina Manfrin, Donato Concilio, Giuseppe Toscano, Giovanni Ciandrini, Giancarlo Paccapeli, Gianluca Giarratana, Marco Siciliano, Andrea Forlani, Alberto Carrotta

Abstract:

In this paper we describe a machine learning methodology for Predictive Maintenance (PdM) applied on woodworking industrial machines. PdM is a prominent strategy consisting of all the operational techniques and actions required to ensure machine availability and to prevent a machine-down failure. One of the challenges with PdM approach is to design and develop of an embedded smart system to enable the health status of the machine. The proposed approach allows screening simultaneously multiple connected machines, thus providing real-time monitoring that can be adopted with maintenance management. This is achieved by applying temporal feature engineering techniques and training an ensemble of classification algorithms to predict Remaining Useful Lifetime of woodworking machines. The effectiveness of the methodology is demonstrated by testing an independent sample of additional woodworking machines without presenting machine down event.

Keywords: predictive maintenance, machine learning, connected machines, artificial intelligence

Procedia PDF Downloads 200
13560 A Predictive Machine Learning Model of the Survival of Female-led and Co-Led Small and Medium Enterprises in the UK

Authors: Mais Khader, Xingjie Wei

Abstract:

This research sheds light on female entrepreneurs by providing new insights on the survival predictions of companies led by females in the UK. This study aims to build a predictive machine learning model of the survival of female-led & co-led small & medium enterprises (SMEs) in the UK over the period 2000-2020. The predictive model built utilised a combination of financial and non-financial features related to both companies and their directors to predict SMEs' survival. These features were studied in terms of their contribution to the resultant predictive model. Five machine learning models are used in the modelling: Decision tree, AdaBoost, Naïve Bayes, Logistic regression and SVM. The AdaBoost model had the highest performance of the five models, with an accuracy of 73% and an AUC of 80%. The results show high feature importance in predicting companies' survival for company size, management experience, financial performance, industry, region, and females' percentage in management.

Keywords: company survival, entrepreneurship, females, machine learning, SMEs

Procedia PDF Downloads 72
13559 Predictive Analytics of Student Performance Determinants

Authors: Mahtab Davari, Charles Edward Okon, Somayeh Aghanavesi

Abstract:

Every institute of learning is usually interested in the performance of enrolled students. The level of these performances determines the approach an institute of study may adopt in rendering academic services. The focus of this paper is to evaluate students' academic performance in given courses of study using machine learning methods. This study evaluated various supervised machine learning classification algorithms such as Logistic Regression (LR), Support Vector Machine, Random Forest, Decision Tree, K-Nearest Neighbors, Linear Discriminant Analysis, and Quadratic Discriminant Analysis, using selected features to predict study performance. The accuracy, precision, recall, and F1 score obtained from a 5-Fold Cross-Validation were used to determine the best classification algorithm to predict students’ performances. SVM (using a linear kernel), LDA, and LR were identified as the best-performing machine learning methods. Also, using the LR model, this study identified students' educational habits such as reading and paying attention in class as strong determinants for a student to have an above-average performance. Other important features include the academic history of the student and work. Demographic factors such as age, gender, high school graduation, etc., had no significant effect on a student's performance.

Keywords: student performance, supervised machine learning, classification, cross-validation, prediction

Procedia PDF Downloads 97
13558 Parkinson’s Disease Detection Analysis through Machine Learning Approaches

Authors: Muhtasim Shafi Kader, Fizar Ahmed, Annesha Acharjee

Abstract:

Machine learning and data mining are crucial in health care, as well as medical information and detection. Machine learning approaches are now being utilized to improve awareness of a variety of critical health issues, including diabetes detection, neuron cell tumor diagnosis, COVID 19 identification, and so on. Parkinson’s disease is basically a disease for our senior citizens in Bangladesh. Parkinson's Disease indications often seem progressive and get worst with time. People got affected trouble walking and communicating with the condition advances. Patients can also have psychological and social vagaries, nap problems, hopelessness, reminiscence loss, and weariness. Parkinson's disease can happen in both men and women. Though men are affected by the illness at a proportion that is around partial of them are women. In this research, we have to get out the accurate ML algorithm to find out the disease with a predictable dataset and the model of the following machine learning classifiers. Therefore, nine ML classifiers are secondhand to portion study to use machine learning approaches like as follows, Naive Bayes, Adaptive Boosting, Bagging Classifier, Decision Tree Classifier, Random Forest classifier, XBG Classifier, K Nearest Neighbor Classifier, Support Vector Machine Classifier, and Gradient Boosting Classifier are used.

Keywords: naive bayes, adaptive boosting, bagging classifier, decision tree classifier, random forest classifier, XBG classifier, k nearest neighbor classifier, support vector classifier, gradient boosting classifier

Procedia PDF Downloads 112
13557 A Machine Learning Approach for Performance Prediction Based on User Behavioral Factors in E-Learning Environments

Authors: Naduni Ranasinghe

Abstract:

E-learning environments are getting more popular than any other due to the impact of COVID19. Even though e-learning is one of the best solutions for the teaching-learning process in the academic process, it’s not without major challenges. Nowadays, machine learning approaches are utilized in the analysis of how behavioral factors lead to better adoption and how they related to better performance of the students in eLearning environments. During the pandemic, we realized the academic process in the eLearning approach had a major issue, especially for the performance of the students. Therefore, an approach that investigates student behaviors in eLearning environments using a data-intensive machine learning approach is appreciated. A hybrid approach was used to understand how each previously told variables are related to the other. A more quantitative approach was used referred to literature to understand the weights of each factor for adoption and in terms of performance. The data set was collected from previously done research to help the training and testing process in ML. Special attention was made to incorporating different dimensionality of the data to understand the dependency levels of each. Five independent variables out of twelve variables were chosen based on their impact on the dependent variable, and by considering the descriptive statistics, out of three models developed (Random Forest classifier, SVM, and Decision tree classifier), random forest Classifier (Accuracy – 0.8542) gave the highest value for accuracy. Overall, this work met its goals of improving student performance by identifying students who are at-risk and dropout, emphasizing the necessity of using both static and dynamic data.

Keywords: academic performance prediction, e learning, learning analytics, machine learning, predictive model

Procedia PDF Downloads 130
13556 A Survey of Feature Selection and Feature Extraction Techniques in Machine Learning

Authors: Samina Khalid, Shamila Nasreen

Abstract:

Dimensionality reduction as a preprocessing step to machine learning is effective in removing irrelevant and redundant data, increasing learning accuracy, and improving result comprehensibility. However, the recent increase of dimensionality of data poses a severe challenge to many existing feature selection and feature extraction methods with respect to efficiency and effectiveness. In the field of machine learning and pattern recognition, dimensionality reduction is important area, where many approaches have been proposed. In this paper, some widely used feature selection and feature extraction techniques have analyzed with the purpose of how effectively these techniques can be used to achieve high performance of learning algorithms that ultimately improves predictive accuracy of classifier. An endeavor to analyze dimensionality reduction techniques briefly with the purpose to investigate strengths and weaknesses of some widely used dimensionality reduction methods is presented.

Keywords: age related macular degeneration, feature selection feature subset selection feature extraction/transformation, FSA’s, relief, correlation based method, PCA, ICA

Procedia PDF Downloads 468
13555 Identification of Biological Pathways Causative for Breast Cancer Using Unsupervised Machine Learning

Authors: Karthik Mittal

Abstract:

This study performs an unsupervised machine learning analysis to find clusters of related SNPs which highlight biological pathways that are important for the biological mechanisms of breast cancer. Studying genetic variations in isolation is illogical because these genetic variations are known to modulate protein production and function; the downstream effects of these modifications on biological outcomes are highly interconnected. After extracting the SNPs and their effect on different types of breast cancer using the MRBase library, two unsupervised machine learning clustering algorithms were implemented on the genetic variants: a k-means clustering algorithm and a hierarchical clustering algorithm; furthermore, principal component analysis was executed to visually represent the data. These algorithms specifically used the SNP’s beta value on the three different types of breast cancer tested in this project (estrogen-receptor positive breast cancer, estrogen-receptor negative breast cancer, and breast cancer in general) to perform this clustering. Two significant genetic pathways validated the clustering produced by this project: the MAPK signaling pathway and the connection between the BRCA2 gene and the ESR1 gene. This study provides the first proof of concept showing the importance of unsupervised machine learning in interpreting GWAS summary statistics.

Keywords: breast cancer, computational biology, unsupervised machine learning, k-means, PCA

Procedia PDF Downloads 125
13554 Review of Different Machine Learning Algorithms

Authors: Syed Romat Ali Shah, Bilal Shoaib, Saleem Akhtar, Munib Ahmad, Shahan Sadiqui

Abstract:

Classification is a data mining technique, which is recognizedon Machine Learning (ML) algorithm. It is used to classifythe individual articlein a knownofinformation into a set of predefinemodules or group. Web mining is also a portion of that sympathetic of data mining methods. The main purpose of this paper to analysis and compare the performance of Naïve Bayse Algorithm, Decision Tree, K-Nearest Neighbor (KNN), Artificial Neural Network (ANN)and Support Vector Machine (SVM). This paper consists of different ML algorithm and their advantages and disadvantages and also define research issues.

Keywords: Data Mining, Web Mining, classification, ML Algorithms

Procedia PDF Downloads 266
13553 Neural Network and Support Vector Machine for Prediction of Foot Disorders Based on Foot Analysis

Authors: Monireh Ahmadi Bani, Adel Khorramrouz, Lalenoor Morvarid, Bagheri Mahtab

Abstract:

Background:- Foot disorders are common in musculoskeletal problems. Plantar pressure distribution measurement is one the most important part of foot disorders diagnosis for quantitative analysis. However, the association of plantar pressure and foot disorders is not clear. With the growth of dataset and machine learning methods, the relationship between foot disorders and plantar pressures can be detected. Significance of the study:- The purpose of this study was to predict the probability of common foot disorders based on peak plantar pressure distribution and center of pressure during walking. Methodologies:- 2323 participants were assessed in a foot therapy clinic between 2015 and 2021. Foot disorders were diagnosed by an experienced physician and then they were asked to walk on a force plate scanner. After the data preprocessing, due to the difference in walking time and foot size, we normalized the samples based on time and foot size. Some of force plate variables were selected as input to a deep neural network (DNN), and the probability of any each foot disorder was measured. In next step, we used support vector machine (SVM) and run dataset for each foot disorder (classification of yes or no). We compared DNN and SVM for foot disorders prediction based on plantar pressure distributions and center of pressure. Findings:- The results demonstrated that the accuracy of deep learning architecture is sufficient for most clinical and research applications in the study population. In addition, the SVM approach has more accuracy for predictions, enabling applications for foot disorders diagnosis. The detection accuracy was 71% by the deep learning algorithm and 78% by the SVM algorithm. Moreover, when we worked with peak plantar pressure distribution, it was more accurate than center of pressure dataset. Conclusion:- Both algorithms- deep learning and SVM will help therapist and patients to improve the data pool and enhance foot disorders prediction with less expense and error after removing some restrictions properly.

Keywords: deep neural network, foot disorder, plantar pressure, support vector machine

Procedia PDF Downloads 321
13552 Intrusion Detection Based on Graph Oriented Big Data Analytics

Authors: Ahlem Abid, Farah Jemili

Abstract:

Intrusion detection has been the subject of numerous studies in industry and academia, but cyber security analysts always want greater precision and global threat analysis to secure their systems in cyberspace. To improve intrusion detection system, the visualisation of the security events in form of graphs and diagrams is important to improve the accuracy of alerts. In this paper, we propose an approach of an IDS based on cloud computing, big data technique and using a machine learning graph algorithm which can detect in real time different attacks as early as possible. We use the MAWILab intrusion detection dataset . We choose Microsoft Azure as a unified cloud environment to load our dataset on. We implement the k2 algorithm which is a graphical machine learning algorithm to classify attacks. Our system showed a good performance due to the graphical machine learning algorithm and spark structured streaming engine.

Keywords: Apache Spark Streaming, Graph, Intrusion detection, k2 algorithm, Machine Learning, MAWILab, Microsoft Azure Cloud

Procedia PDF Downloads 121
13551 Heart Attack Prediction Using Several Machine Learning Methods

Authors: Suzan Anwar, Utkarsh Goyal

Abstract:

Heart rate (HR) is a predictor of cardiovascular, cerebrovascular, and all-cause mortality in the general population, as well as in patients with cardio and cerebrovascular diseases. Machine learning (ML) significantly improves the accuracy of cardiovascular risk prediction, increasing the number of patients identified who could benefit from preventive treatment while avoiding unnecessary treatment of others. This research examines relationship between the individual's various heart health inputs like age, sex, cp, trestbps, thalach, oldpeaketc, and the likelihood of developing heart disease. Machine learning techniques like logistic regression and decision tree, and Python are used. The results of testing and evaluating the model using the Heart Failure Prediction Dataset show the chance of a person having a heart disease with variable accuracy. Logistic regression has yielded an accuracy of 80.48% without data handling. With data handling (normalization, standardscaler), the logistic regression resulted in improved accuracy of 87.80%, decision tree 100%, random forest 100%, and SVM 100%.

Keywords: heart rate, machine learning, SVM, decision tree, logistic regression, random forest

Procedia PDF Downloads 122
13550 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine

Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li

Abstract:

Machine Learning and Data Mining are the two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a wildly used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries especially China, Air Quality Classification has become one of the most important topics in air quality research and modelling. This study aims at introducing a hybrid classification model based on information theory and Support Vector Machine (SVM) using the air quality data of four cities in China namely Beijing, Guangzhou, Shanghai and Tianjin from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified the daily air quality into 6 levels namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent based on their respective Air Quality Index (AQI) values. Using the information theory, information gain (IG) is calculated and feature selection is done for both categorical features and continuous numeric features. Then SVM Machine Learning algorithm is implemented on the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.

Keywords: machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation

Procedia PDF Downloads 214
13549 Insider Theft Detection in Organizations Using Keylogger and Machine Learning

Authors: Shamatha Shetty, Sakshi Dhabadi, Prerana M., Indushree B.

Abstract:

About 66% of firms claim that insider attacks are more likely to happen. The frequency of insider incidents has increased by 47% in the last two years. The goal of this work is to prevent dangerous employee behavior by using keyloggers and the Machine Learning (ML) model. Every keystroke that the user enters is recorded by the keylogging program, also known as keystroke logging. Keyloggers are used to stop improper use of the system. This enables us to collect all textual data, save it in a CSV file, and analyze it using an ML algorithm and the VirusTotal API. Many large companies use it to methodically monitor how their employees use computers, the internet, and email. We are utilizing the SVM algorithm and the VirusTotal API to improve overall efficiency and accuracy in identifying specific patterns and words to automate and offer the report for improved monitoring.

Keywords: cyber security, machine learning, cyclic process, email notification

Procedia PDF Downloads 34
13548 Big Data in Telecom Industry: Effective Predictive Techniques on Call Detail Records

Authors: Sara ElElimy, Samir Moustafa

Abstract:

Mobile network operators start to face many challenges in the digital era, especially with high demands from customers. Since mobile network operators are considered a source of big data, traditional techniques are not effective with new era of big data, Internet of things (IoT) and 5G; as a result, handling effectively different big datasets becomes a vital task for operators with the continuous growth of data and moving from long term evolution (LTE) to 5G. So, there is an urgent need for effective Big data analytics to predict future demands, traffic, and network performance to full fill the requirements of the fifth generation of mobile network technology. In this paper, we introduce data science techniques using machine learning and deep learning algorithms: the autoregressive integrated moving average (ARIMA), Bayesian-based curve fitting, and recurrent neural network (RNN) are employed for a data-driven application to mobile network operators. The main framework included in models are identification parameters of each model, estimation, prediction, and final data-driven application of this prediction from business and network performance applications. These models are applied to Telecom Italia Big Data challenge call detail records (CDRs) datasets. The performance of these models is found out using a specific well-known evaluation criteria shows that ARIMA (machine learning-based model) is more accurate as a predictive model in such a dataset than the RNN (deep learning model).

Keywords: big data analytics, machine learning, CDRs, 5G

Procedia PDF Downloads 120
13547 Review on Implementation of Artificial Intelligence and Machine Learning for Controlling Traffic and Avoiding Accidents

Authors: Neha Singh, Shristi Singh

Abstract:

Accidents involving motor vehicles are more likely to cause serious injuries and fatalities. It also has a host of other perpetual issues, such as the regular loss of life and goods in accidents. To solve these issues, appropriate measures must be implemented, such as establishing an autonomous incident detection system that makes use of machine learning and artificial intelligence. In order to reduce traffic accidents, this article examines the overview of artificial intelligence and machine learning in autonomous event detection systems. The paper explores the major issues, prospective solutions, and use of artificial intelligence and machine learning in road transportation systems for minimising traffic accidents. There is a lot of discussion on additional, fresh, and developing approaches that less frequent accidents in the transportation industry. The study structured the following subtopics specifically: traffic management using machine learning and artificial intelligence and an incident detector with these two technologies. The internet of vehicles and vehicle ad hoc networks, as well as the use of wireless communication technologies like 5G wireless networks and the use of machine learning and artificial intelligence for the planning of road transportation systems, are elaborated. In addition, safety is the primary concern of road transportation. Route optimization, cargo volume forecasting, predictive fleet maintenance, real-time vehicle tracking, and traffic management, according to the review's key conclusions, are essential for ensuring the safety of road transportation networks. In addition to highlighting research trends, unanswered problems, and key research conclusions, the study also discusses the difficulties in applying artificial intelligence to road transport systems. Planning and managing the road transportation system might use the work as a resource.

Keywords: artificial intelligence, machine learning, incident detector, road transport systems, traffic management, automatic incident detection, deep learning

Procedia PDF Downloads 80
13546 Predictive Modeling of Student Behavior in Virtual Reality: A Machine Learning Approach

Authors: Gayathri Sadanala, Shibam Pokhrel, Owen Murphy

Abstract:

In the ever-evolving landscape of education, Virtual Reality (VR) environments offer a promising avenue for enhancing student engagement and learning experiences. However, understanding and predicting student behavior within these immersive settings remain challenging tasks. This paper presents a comprehensive study on the predictive modeling of student behavior in VR using machine learning techniques. We introduce a rich data set capturing student interactions, movements, and progress within a VR orientation program. The dataset is divided into training and testing sets, allowing us to develop and evaluate predictive models for various aspects of student behavior, including engagement levels, task completion, and performance. Our machine learning approach leverages a combination of feature engineering and model selection to reveal hidden patterns in the data. We employ regression and classification models to predict student outcomes, and the results showcase promising accuracy in forecasting behavior within VR environments. Furthermore, we demonstrate the practical implications of our predictive models for personalized VR-based learning experiences and early intervention strategies. By uncovering the intricate relationship between student behavior and VR interactions, we provide valuable insights for educators, designers, and developers seeking to optimize virtual learning environments.

Keywords: interaction, machine learning, predictive modeling, virtual reality

Procedia PDF Downloads 101
13545 Machine Learning Analysis of Student Success in Introductory Calculus Based Physics I Course

Authors: Chandra Prayaga, Aaron Wade, Lakshmi Prayaga, Gopi Shankar Mallu

Abstract:

This paper presents the use of machine learning algorithms to predict the success of students in an introductory physics course. Data having 140 rows pertaining to the performance of two batches of students was used. The lack of sufficient data to train robust machine learning models was compensated for by generating synthetic data similar to the real data. CTGAN and CTGAN with Gaussian Copula (Gaussian) were used to generate synthetic data, with the real data as input. To check the similarity between the real data and each synthetic dataset, pair plots were made. The synthetic data was used to train machine learning models using the PyCaret package. For the CTGAN data, the Ada Boost Classifier (ADA) was found to be the ML model with the best fit, whereas the CTGAN with Gaussian Copula yielded Logistic Regression (LR) as the best model. Both models were then tested for accuracy with the real data. ROC-AUC analysis was performed for all the ten classes of the target variable (Grades A, A-, B+, B, B-, C+, C, C-, D, F). The ADA model with CTGAN data showed a mean AUC score of 0.4377, but the LR model with the Gaussian data showed a mean AUC score of 0.6149. ROC-AUC plots were obtained for each Grade value separately. The LR model with Gaussian data showed consistently better AUC scores compared to the ADA model with CTGAN data, except in two cases of the Grade value, C- and A-.

Keywords: machine learning, student success, physics course, grades, synthetic data, CTGAN, gaussian copula CTGAN

Procedia PDF Downloads 26
13544 Analyzing Tools and Techniques for Classification In Educational Data Mining: A Survey

Authors: D. I. George Amalarethinam, A. Emima

Abstract:

Educational Data Mining (EDM) is one of the newest topics to emerge in recent years, and it is concerned with developing methods for analyzing various types of data gathered from the educational circle. EDM methods and techniques with machine learning algorithms are used to extract meaningful and usable information from huge databases. For scientists and researchers, realistic applications of Machine Learning in the EDM sectors offer new frontiers and present new problems. One of the most important research areas in EDM is predicting student success. The prediction algorithms and techniques must be developed to forecast students' performance, which aids the tutor, institution to boost the level of student’s performance. This paper examines various classification techniques in prediction methods and data mining tools used in EDM.

Keywords: classification technique, data mining, EDM methods, prediction methods

Procedia PDF Downloads 102
13543 Non-Targeted Adversarial Image Classification Attack-Region Modification Methods

Authors: Bandar Alahmadi, Lethia Jackson

Abstract:

Machine Learning model is used today in many real-life applications. The safety and security of such model is important, so the results of the model are as accurate as possible. One challenge of machine learning model security is the adversarial examples attack. Adversarial examples are designed by the attacker to cause the machine learning model to misclassify the input. We propose a method to generate adversarial examples to attack image classifiers. We are modifying the successfully classified images, so a classifier misclassifies them after the modification. In our method, we do not update the whole image, but instead we detect the important region, modify it, place it back to the original image, and then run it through a classifier. The algorithm modifies the detected region using two methods. First, it will add abstract image matrix on back of the detected image matrix. Then, it will perform a rotation attack to rotate the detected region around its axes, and embed the trace of image in image background. Finally, the attacked region is placed in its original position, from where it was removed, and a smoothing filter is applied to smooth the background with foreground. We test our method in cascade classifier, and the algorithm is efficient, the classifier confident has dropped to almost zero. We also try it in CNN (Convolutional neural network) with higher setting and the algorithm was successfully worked.

Keywords: adversarial examples, attack, computer vision, image processing

Procedia PDF Downloads 319
13542 Hybrid Approach for Software Defect Prediction Using Machine Learning with Optimization Technique

Authors: C. Manjula, Lilly Florence

Abstract:

Software technology is developing rapidly which leads to the growth of various industries. Now-a-days, software-based applications have been adopted widely for business purposes. For any software industry, development of reliable software is becoming a challenging task because a faulty software module may be harmful for the growth of industry and business. Hence there is a need to develop techniques which can be used for early prediction of software defects. Due to complexities in manual prediction, automated software defect prediction techniques have been introduced. These techniques are based on the pattern learning from the previous software versions and finding the defects in the current version. These techniques have attracted researchers due to their significant impact on industrial growth by identifying the bugs in software. Based on this, several researches have been carried out but achieving desirable defect prediction performance is still a challenging task. To address this issue, here we present a machine learning based hybrid technique for software defect prediction. First of all, Genetic Algorithm (GA) is presented where an improved fitness function is used for better optimization of features in data sets. Later, these features are processed through Decision Tree (DT) classification model. Finally, an experimental study is presented where results from the proposed GA-DT based hybrid approach is compared with those from the DT classification technique. The results show that the proposed hybrid approach achieves better classification accuracy.

Keywords: decision tree, genetic algorithm, machine learning, software defect prediction

Procedia PDF Downloads 313
13541 Injury Prediction for Soccer Players Using Machine Learning

Authors: Amiel Satvedi, Richard Pyne

Abstract:

Injuries in professional sports occur on a regular basis. Some may be minor, while others can cause huge impact on a player's career and earning potential. In soccer, there is a high risk of players picking up injuries during game time. This research work seeks to help soccer players reduce the risk of getting injured by predicting the likelihood of injury while playing in the near future and then providing recommendations for intervention. The injury prediction tool will use a soccer player's number of minutes played on the field, number of appearances, distance covered and performance data for the current and previous seasons as variables to conduct statistical analysis and provide injury predictive results using a machine learning linear regression model.

Keywords: injury predictor, soccer injury prevention, machine learning in soccer, big data in soccer

Procedia PDF Downloads 151
13540 A Survey in Techniques for Imbalanced Intrusion Detection System Datasets

Authors: Najmeh Abedzadeh, Matthew Jacobs

Abstract:

An intrusion detection system (IDS) is a software application that monitors malicious activities and generates alerts if any are detected. However, most network activities in IDS datasets are normal, and the relatively few numbers of attacks make the available data imbalanced. Consequently, cyber-attacks can hide inside a large number of normal activities, and machine learning algorithms have difficulty learning and classifying the data correctly. In this paper, a comprehensive literature review is conducted on different types of algorithms for both implementing the IDS and methods in correcting the imbalanced IDS dataset. The most famous algorithms are machine learning (ML), deep learning (DL), synthetic minority over-sampling technique (SMOTE), and reinforcement learning (RL). Most of the research use the CSE-CIC-IDS2017, CSE-CIC-IDS2018, and NSL-KDD datasets for evaluating their algorithms.

Keywords: IDS, imbalanced datasets, sampling algorithms, big data

Procedia PDF Downloads 291
13539 Fake News Detection for Korean News Using Machine Learning Techniques

Authors: Tae-Uk Yun, Pullip Chung, Kee-Young Kwahk, Hyunchul Ahn

Abstract:

Fake news is defined as the news articles that are intentionally and verifiably false, and could mislead readers. Spread of fake news may provoke anxiety, chaos, fear, or irrational decisions of the public. Thus, detecting fake news and preventing its spread has become very important issue in our society. However, due to the huge amount of fake news produced every day, it is almost impossible to identify it by a human. Under this context, researchers have tried to develop automated fake news detection using machine learning techniques over the past years. But, there have been no prior studies proposed an automated fake news detection method for Korean news to our best knowledge. In this study, we aim to detect Korean fake news using text mining and machine learning techniques. Our proposed method consists of two steps. In the first step, the news contents to be analyzed is convert to quantified values using various text mining techniques (topic modeling, TF-IDF, and so on). After that, in step 2, classifiers are trained using the values produced in step 1. As the classifiers, machine learning techniques such as logistic regression, backpropagation network, support vector machine, and deep neural network can be applied. To validate the effectiveness of the proposed method, we collected about 200 short Korean news from Seoul National University’s FactCheck. which provides with detailed analysis reports from 20 media outlets and links to source documents for each case. Using this dataset, we will identify which text features are important as well as which classifiers are effective in detecting Korean fake news.

Keywords: fake news detection, Korean news, machine learning, text mining

Procedia PDF Downloads 248
13538 Comparison of Different Machine Learning Algorithms for Solubility Prediction

Authors: Muhammet Baldan, Emel Timuçin

Abstract:

Molecular solubility prediction plays a crucial role in various fields, such as drug discovery, environmental science, and material science. In this study, we compare the performance of five machine learning algorithms—linear regression, support vector machines (SVM), random forests, gradient boosting machines (GBM), and neural networks—for predicting molecular solubility using the AqSolDB dataset. The dataset consists of 9981 data points with their corresponding solubility values. MACCS keys (166 bits), RDKit properties (20 properties), and structural properties(3) features are extracted for every smile representation in the dataset. A total of 189 features were used for training and testing for every molecule. Each algorithm is trained on a subset of the dataset and evaluated using metrics accuracy scores. Additionally, computational time for training and testing is recorded to assess the efficiency of each algorithm. Our results demonstrate that random forest model outperformed other algorithms in terms of predictive accuracy, achieving an 0.93 accuracy score. Gradient boosting machines and neural networks also exhibit strong performance, closely followed by support vector machines. Linear regression, while simpler in nature, demonstrates competitive performance but with slightly higher errors compared to ensemble methods. Overall, this study provides valuable insights into the performance of machine learning algorithms for molecular solubility prediction, highlighting the importance of algorithm selection in achieving accurate and efficient predictions in practical applications.

Keywords: random forest, machine learning, comparison, feature extraction

Procedia PDF Downloads 14
13537 One-Class Support Vector Machine for Sentiment Analysis of Movie Review Documents

Authors: Chothmal, Basant Agarwal

Abstract:

Sentiment analysis means to classify a given review document into positive or negative polar document. Sentiment analysis research has been increased tremendously in recent times due to its large number of applications in the industry and academia. Sentiment analysis models can be used to determine the opinion of the user towards any entity or product. E-commerce companies can use sentiment analysis model to improve their products on the basis of users’ opinion. In this paper, we propose a new One-class Support Vector Machine (One-class SVM) based sentiment analysis model for movie review documents. In the proposed approach, we initially extract features from one class of documents, and further test the given documents with the one-class SVM model if a given new test document lies in the model or it is an outlier. Experimental results show the effectiveness of the proposed sentiment analysis model.

Keywords: feature selection methods, machine learning, NB, one-class SVM, sentiment analysis, support vector machine

Procedia PDF Downloads 489
13536 Organizational Innovations of the 20th Century as High Tech of the 21st: Evidence from Patent Data

Authors: Valery Yakubovich, Shuping wu

Abstract:

Organization theorists have long claimed that organizational innovations are nontechnological, in part because they are unpatentable. The claim rests on the assumption that organizational innovations are abstract ideas embodied in persons and contexts rather than in context-free practical tools. However, over the last three decades, organizational knowledge has been increasingly embodied in digital tools which, in principle, can be patented. To provide the first empirical evidence regarding the patentability of organizational innovations, we trained two machine learning algorithms to identify a population of 205,434 patent applications for organizational technologies (OrgTech) and, among them, 141,285 applications that use organizational innovations accumulated over the 20th century. Our event history analysis of the probability of patenting an OrgTech invention shows that ideas from organizational innovations decrease the probability of patent allowance unless they describe a practical tool. We conclude that the present-day digital transformation places organizational innovations in the realm of high tech and turns the debate about organizational technologies into the challenge of designing practical organizational tools that embody big ideas about organizing. We outline an agenda for patent-based research on OrgTech as an emerging phenomenon.

Keywords: organizational innovation, organizational technology, high tech, patents, machine learning

Procedia PDF Downloads 100