Search results for: gradient boosting machines (GBM)
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1576

Search results for: gradient boosting machines (GBM)

1576 Comparison between XGBoost, LightGBM and CatBoost Using a Home Credit Dataset

Authors: Essam Al Daoud

Abstract:

Gradient boosting methods have been proven to be a very important strategy. Many successful machine learning solutions were developed using the XGBoost and its derivatives. The aim of this study is to investigate and compare the efficiency of three gradient methods. Home credit dataset is used in this work which contains 219 features and 356251 records. However, new features are generated and several techniques are used to rank and select the best features. The implementation indicates that the LightGBM is faster and more accurate than CatBoost and XGBoost using variant number of features and records.

Keywords: gradient boosting, XGBoost, LightGBM, CatBoost, home credit

Procedia PDF Downloads 172
1575 Comparison of Different Machine Learning Algorithms for Solubility Prediction

Authors: Muhammet Baldan, Emel Timuçin

Abstract:

Molecular solubility prediction plays a crucial role in various fields, such as drug discovery, environmental science, and material science. In this study, we compare the performance of five machine learning algorithms—linear regression, support vector machines (SVM), random forests, gradient boosting machines (GBM), and neural networks—for predicting molecular solubility using the AqSolDB dataset. The dataset consists of 9981 data points with their corresponding solubility values. MACCS keys (166 bits), RDKit properties (20 properties), and structural properties(3) features are extracted for every smile representation in the dataset. A total of 189 features were used for training and testing for every molecule. Each algorithm is trained on a subset of the dataset and evaluated using metrics accuracy scores. Additionally, computational time for training and testing is recorded to assess the efficiency of each algorithm. Our results demonstrate that random forest model outperformed other algorithms in terms of predictive accuracy, achieving an 0.93 accuracy score. Gradient boosting machines and neural networks also exhibit strong performance, closely followed by support vector machines. Linear regression, while simpler in nature, demonstrates competitive performance but with slightly higher errors compared to ensemble methods. Overall, this study provides valuable insights into the performance of machine learning algorithms for molecular solubility prediction, highlighting the importance of algorithm selection in achieving accurate and efficient predictions in practical applications.

Keywords: random forest, machine learning, comparison, feature extraction

Procedia PDF Downloads 41
1574 The Use of Stochastic Gradient Boosting Method for Multi-Model Combination of Rainfall-Runoff Models

Authors: Phanida Phukoetphim, Asaad Y. Shamseldin

Abstract:

In this study, the novel Stochastic Gradient Boosting (SGB) combination method is addressed for producing daily river flows from four different rain-runoff models of Ohinemuri catchment, New Zealand. The selected rainfall-runoff models are two empirical black-box models: linear perturbation model and linear varying gain factor model, two conceptual models: soil moisture accounting and routing model and Nedbør-Afrstrømnings model. In this study, the simple average combination method and the weighted average combination method were used as a benchmark for comparing the results of the novel SGB combination method. The models and combination results are evaluated using statistical and graphical criteria. Overall results of this study show that the use of combination technique can certainly improve the simulated river flows of four selected models for Ohinemuri catchment, New Zealand. The results also indicate that the novel SGB combination method is capable of accurate prediction when used in a combination method of the simulated river flows in New Zealand.

Keywords: multi-model combination, rainfall-runoff modeling, stochastic gradient boosting, bioinformatics

Procedia PDF Downloads 339
1573 Design of Geochemical Maps of Industrial City Using Gradient Boosting and Geographic Information System

Authors: Ruslan Safarov, Zhanat Shomanova, Yuri Nossenko, Zhandos Mussayev, Ayana Baltabek

Abstract:

Geochemical maps of distribution of polluting elements V, Cr, Mn, Co, Ni, Cu, Zn, Mo, Cd, Pb on the territory of the Pavlodar city (Kazakhstan), which is an industrial hub were designed. The samples of soil were taken from 100 locations. Elemental analysis has been performed using XRF. The obtained data was used for training of the computational model with gradient boosting algorithm. The optimal parameters of model as well as the loss function were selected. The computational model was used for prediction of polluting elements concentration for 1000 evenly distributed points. Based on predicted data geochemical maps were created. Additionally, the total pollution index Zc was calculated for every from 1000 point. The spatial distribution of the Zc index was visualized using GIS (QGIS). It was calculated that the maximum coverage area of the territory of the Pavlodar city belongs to the moderately hazardous category (89.7%). The visualization of the obtained data allowed us to conclude that the main source of contamination goes from the industrial zones where the strategic metallurgical and refining plants are placed.

Keywords: Pavlodar, geochemical map, gradient boosting, CatBoost, QGIS, spatial distribution, heavy metals

Procedia PDF Downloads 83
1572 Machine Learning for Aiding Meningitis Diagnosis in Pediatric Patients

Authors: Karina Zaccari, Ernesto Cordeiro Marujo

Abstract:

This paper presents a Machine Learning (ML) approach to support Meningitis diagnosis in patients at a children’s hospital in Sao Paulo, Brazil. The aim is to use ML techniques to reduce the use of invasive procedures, such as cerebrospinal fluid (CSF) collection, as much as possible. In this study, we focus on predicting the probability of Meningitis given the results of a blood and urine laboratory tests, together with the analysis of pain or other complaints from the patient. We tested a number of different ML algorithms, including: Adaptative Boosting (AdaBoost), Decision Tree, Gradient Boosting, K-Nearest Neighbors (KNN), Logistic Regression, Random Forest and Support Vector Machines (SVM). Decision Tree algorithm performed best, with 94.56% and 96.18% accuracy for training and testing data, respectively. These results represent a significant aid to doctors in diagnosing Meningitis as early as possible and in preventing expensive and painful procedures on some children.

Keywords: machine learning, medical diagnosis, meningitis detection, pediatric research

Procedia PDF Downloads 151
1571 Parkinson’s Disease Detection Analysis through Machine Learning Approaches

Authors: Muhtasim Shafi Kader, Fizar Ahmed, Annesha Acharjee

Abstract:

Machine learning and data mining are crucial in health care, as well as medical information and detection. Machine learning approaches are now being utilized to improve awareness of a variety of critical health issues, including diabetes detection, neuron cell tumor diagnosis, COVID 19 identification, and so on. Parkinson’s disease is basically a disease for our senior citizens in Bangladesh. Parkinson's Disease indications often seem progressive and get worst with time. People got affected trouble walking and communicating with the condition advances. Patients can also have psychological and social vagaries, nap problems, hopelessness, reminiscence loss, and weariness. Parkinson's disease can happen in both men and women. Though men are affected by the illness at a proportion that is around partial of them are women. In this research, we have to get out the accurate ML algorithm to find out the disease with a predictable dataset and the model of the following machine learning classifiers. Therefore, nine ML classifiers are secondhand to portion study to use machine learning approaches like as follows, Naive Bayes, Adaptive Boosting, Bagging Classifier, Decision Tree Classifier, Random Forest classifier, XBG Classifier, K Nearest Neighbor Classifier, Support Vector Machine Classifier, and Gradient Boosting Classifier are used.

Keywords: naive bayes, adaptive boosting, bagging classifier, decision tree classifier, random forest classifier, XBG classifier, k nearest neighbor classifier, support vector classifier, gradient boosting classifier

Procedia PDF Downloads 129
1570 Stacking Ensemble Approach for Combining Different Methods in Real Estate Prediction

Authors: Sol Girouard, Zona Kostic

Abstract:

A home is often the largest and most expensive purchase a person makes. Whether the decision leads to a successful outcome will be determined by a combination of critical factors. In this paper, we propose a method that efficiently handles all the factors in residential real estate and performs predictions given a feature space with high dimensionality while controlling for overfitting. The proposed method was built on gradient descent and boosting algorithms and uses a mixed optimizing technique to improve the prediction power. Usually, a single model cannot handle all the cases thus our approach builds multiple models based on different subsets of the predictors. The algorithm was tested on 3 million homes across the U.S., and the experimental results demonstrate the efficiency of this approach by outperforming techniques currently used in forecasting prices. With everyday changes on the real estate market, our proposed algorithm capitalizes from new events allowing more efficient predictions.

Keywords: real estate prediction, gradient descent, boosting, ensemble methods, active learning, training

Procedia PDF Downloads 277
1569 Heart Ailment Prediction Using Machine Learning Methods

Authors: Abhigyan Hedau, Priya Shelke, Riddhi Mirajkar, Shreyash Chaple, Mrunali Gadekar, Himanshu Akula

Abstract:

The heart is the coordinating centre of the major endocrine glandular structure of the body, which produces hormones that profoundly affect the operations of the body, and diagnosing cardiovascular disease is a difficult but critical task. By extracting knowledge and information about the disease from patient data, data mining is a more practical technique to help doctors detect disorders. We use a variety of machine learning methods here, including logistic regression and support vector classifiers (SVC), K-nearest neighbours Classifiers (KNN), Decision Tree Classifiers, Random Forest classifiers and Gradient Boosting classifiers. These algorithms are applied to patient data containing 13 different factors to build a system that predicts heart disease in less time with more accuracy.

Keywords: logistic regression, support vector classifier, k-nearest neighbour, decision tree, random forest and gradient boosting

Procedia PDF Downloads 51
1568 Artificial Intelligence-Based Detection of Individuals Suffering from Vestibular Disorder

Authors: Dua Hişam, Serhat İkizoğlu

Abstract:

Identifying the problem behind balance disorder is one of the most interesting topics in the medical literature. This study has considerably enhanced the development of artificial intelligence (AI) algorithms applying multiple machine learning (ML) models to sensory data on gait collected from humans to classify between normal people and those suffering from Vestibular System (VS) problems. Although AI is widely utilized as a diagnostic tool in medicine, AI models have not been used to perform feature extraction and identify VS disorders through training on raw data. In this study, three machine learning (ML) models, the Random Forest Classifier (RF), Extreme Gradient Boosting (XGB), and K-Nearest Neighbor (KNN), have been trained to detect VS disorder, and the performance comparison of the algorithms has been made using accuracy, recall, precision, and f1-score. With an accuracy of 95.28 %, Random Forest Classifier (RF) was the most accurate model.

Keywords: vestibular disorder, machine learning, random forest classifier, k-nearest neighbor, extreme gradient boosting

Procedia PDF Downloads 69
1567 Application of Deep Learning and Ensemble Methods for Biomarker Discovery in Diabetic Nephropathy through Fibrosis and Propionate Metabolism Pathways

Authors: Oluwafunmibi Omotayo Fasanya, Augustine Kena Adjei

Abstract:

Diabetic nephropathy (DN) is a major complication of diabetes, with fibrosis and propionate metabolism playing critical roles in its progression. Identifying biomarkers linked to these pathways may provide novel insights into DN diagnosis and treatment. This study aims to identify biomarkers associated with fibrosis and propionate metabolism in DN. Analyze the biological pathways and regulatory mechanisms of these biomarkers. Develop a machine learning model to predict DN-related biomarkers and validate their functional roles. Publicly available transcriptome datasets related to DN (GSE96804 and GSE104948) were obtained from the GEO database (https://www.ncbi.nlm.nih.gov/gds), and 924 propionate metabolism-related genes (PMRGs) and 656 fibrosis-related genes (FRGs) were identified. The analysis began with the extraction of DN-differentially expressed genes (DN-DEGs) and propionate metabolism-related DEGs (PM-DEGs), followed by the intersection of these with fibrosis-related genes to identify key intersected genes. Instead of relying on traditional models, we employed a combination of deep neural networks (DNNs) and ensemble methods such as Gradient Boosting Machines (GBM) and XGBoost to enhance feature selection and biomarker discovery. Recursive feature elimination (RFE) was coupled with these advanced algorithms to refine the selection of the most critical biomarkers. Functional validation was conducted using convolutional neural networks (CNN) for gene set enrichment and immunoinfiltration analysis, revealing seven significant biomarkers—SLC37A4, ACOX2, GPD1, ACE2, SLC9A3, AGT, and PLG. These biomarkers are involved in critical biological processes such as fatty acid metabolism and glomerular development, providing a mechanistic link to DN progression. Furthermore, a TF–miRNA–mRNA regulatory network was constructed using natural language processing models to identify 8 transcription factors and 60 miRNAs that regulate these biomarkers, while a drug–gene interaction network revealed potential therapeutic targets such as UROKINASE–PLG and ATENOLOL–AGT. This integrative approach, leveraging deep learning and ensemble models, not only enhances the accuracy of biomarker discovery but also offers new perspectives on DN diagnosis and treatment, specifically targeting fibrosis and propionate metabolism pathways.

Keywords: diabetic nephropathy, deep neural networks, gradient boosting machines (GBM), XGBoost

Procedia PDF Downloads 10
1566 Machine Learning Model to Predict TB Bacteria-Resistant Drugs from TB Isolates

Authors: Rosa Tsegaye Aga, Xuan Jiang, Pavel Vazquez Faci, Siqing Liu, Simon Rayner, Endalkachew Alemu, Markos Abebe

Abstract:

Tuberculosis (TB) is a major cause of disease globally. In most cases, TB is treatable and curable, but only with the proper treatment. There is a time when drug-resistant TB occurs when bacteria become resistant to the drugs that are used to treat TB. Current strategies to identify drug-resistant TB bacteria are laboratory-based, and it takes a longer time to identify the drug-resistant bacteria and treat the patient accordingly. But machine learning (ML) and data science approaches can offer new approaches to the problem. In this study, we propose to develop an ML-based model to predict the antibiotic resistance phenotypes of TB isolates in minutes and give the right treatment to the patient immediately. The study has been using the whole genome sequence (WGS) of TB isolates as training data that have been extracted from the NCBI repository and contain different countries’ samples to build the ML models. The reason that different countries’ samples have been included is to generalize the large group of TB isolates from different regions in the world. This supports the model to train different behaviors of the TB bacteria and makes the model robust. The model training has been considering three pieces of information that have been extracted from the WGS data to train the model. These are all variants that have been found within the candidate genes (F1), predetermined resistance-associated variants (F2), and only resistance-associated gene information for the particular drug. Two major datasets have been constructed using these three information. F1 and F2 information have been considered as two independent datasets, and the third information is used as a class to label the two datasets. Five machine learning algorithms have been considered to train the model. These are Support Vector Machine (SVM), Random forest (RF), Logistic regression (LR), Gradient Boosting, and Ada boost algorithms. The models have been trained on the datasets F1, F2, and F1F2 that is the F1 and the F2 dataset merged. Additionally, an ensemble approach has been used to train the model. The ensemble approach has been considered to run F1 and F2 datasets on gradient boosting algorithm and use the output as one dataset that is called F1F2 ensemble dataset and train a model using this dataset on the five algorithms. As the experiment shows, the ensemble approach model that has been trained on the Gradient Boosting algorithm outperformed the rest of the models. In conclusion, this study suggests the ensemble approach, that is, the RF + Gradient boosting model, to predict the antibiotic resistance phenotypes of TB isolates by outperforming the rest of the models.

Keywords: machine learning, MTB, WGS, drug resistant TB

Procedia PDF Downloads 52
1565 Reduction of Peak Input Currents during Charge Pump Boosting in Monolithically Integrated High-Voltage Generators

Authors: Jan Doutreloigne

Abstract:

This paper describes two methods for the reduction of the peak input current during the boosting of Dickson charge pumps. Both methods are implemented in the fully integrated Dickson charge pumps of a high-voltage display driver chip for smart-card applications. Experimental results reveal good correspondence with Spice simulations and show a reduction of the peak input current by a factor of 6 during boosting

Keywords: bi-stable display driver, Dickson charge pump, high-voltage generator, peak current reduction, sub-pump boosting, variable frequency boosting

Procedia PDF Downloads 457
1564 Advancements in Predicting Diabetes Biomarkers: A Machine Learning Epigenetic Approach

Authors: James Ladzekpo

Abstract:

Background: The urgent need to identify new pharmacological targets for diabetes treatment and prevention has been amplified by the disease's extensive impact on individuals and healthcare systems. A deeper insight into the biological underpinnings of diabetes is crucial for the creation of therapeutic strategies aimed at these biological processes. Current predictive models based on genetic variations fall short of accurately forecasting diabetes. Objectives: Our study aims to pinpoint key epigenetic factors that predispose individuals to diabetes. These factors will inform the development of an advanced predictive model that estimates diabetes risk from genetic profiles, utilizing state-of-the-art statistical and data mining methods. Methodology: We have implemented a recursive feature elimination with cross-validation using the support vector machine (SVM) approach for refined feature selection. Building on this, we developed six machine learning models, including logistic regression, k-Nearest Neighbors (k-NN), Naive Bayes, Random Forest, Gradient Boosting, and Multilayer Perceptron Neural Network, to evaluate their performance. Findings: The Gradient Boosting Classifier excelled, achieving a median recall of 92.17% and outstanding metrics such as area under the receiver operating characteristics curve (AUC) with a median of 68%, alongside median accuracy and precision scores of 76%. Through our machine learning analysis, we identified 31 genes significantly associated with diabetes traits, highlighting their potential as biomarkers and targets for diabetes management strategies. Conclusion: Particularly noteworthy were the Gradient Boosting Classifier and Multilayer Perceptron Neural Network, which demonstrated potential in diabetes outcome prediction. We recommend future investigations to incorporate larger cohorts and a wider array of predictive variables to enhance the models' predictive capabilities.

Keywords: diabetes, machine learning, prediction, biomarkers

Procedia PDF Downloads 55
1563 Investigation of Extreme Gradient Boosting Model Prediction of Soil Strain-Shear Modulus

Authors: Ehsan Mehryaar, Reza Bushehri

Abstract:

One of the principal parameters defining the clay soil dynamic response is the strain-shear modulus relation. Predicting the strain and, subsequently, shear modulus reduction of the soil is essential for performance analysis of structures exposed to earthquake and dynamic loadings. Many soil properties affect soil’s dynamic behavior. In order to capture those effects, in this study, a database containing 1193 data points consists of maximum shear modulus, strain, moisture content, initial void ratio, plastic limit, liquid limit, initial confining pressure resulting from dynamic laboratory testing of 21 clays is collected for predicting the shear modulus vs. strain curve of soil. A model based on an extreme gradient boosting technique is proposed. A tree-structured parzan estimator hyper-parameter tuning algorithm is utilized simultaneously to find the best hyper-parameters for the model. The performance of the model is compared to the existing empirical equations using the coefficient of correlation and root mean square error.

Keywords: XGBoost, hyper-parameter tuning, soil shear modulus, dynamic response

Procedia PDF Downloads 201
1562 Customer Churn Prediction by Using Four Machine Learning Algorithms Integrating Features Selection and Normalization in the Telecom Sector

Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh

Abstract:

A crucial component of maintaining a customer-oriented business as in the telecom industry is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years. It has become more important to understand customers’ needs in this strong market of telecom industries, especially for those who are looking to turn over their service providers. So, predictive churn is now a mandatory requirement for retaining those customers. Machine learning can be utilized to accomplish this. Churn Prediction has become a very important topic in terms of machine learning classification in the telecommunications industry. Understanding the factors of customer churn and how they behave is very important to building an effective churn prediction model. This paper aims to predict churn and identify factors of customers’ churn based on their past service usage history. Aiming at this objective, the study makes use of feature selection, normalization, and feature engineering. Then, this study compared the performance of four different machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Evaluation of the performance was conducted by using the F1 score and ROC-AUC. Comparing the results of this study with existing models has proven to produce better results. The results showed the Gradients Boosting with feature selection technique outperformed in this study by achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.

Keywords: machine learning, gradient boosting, logistic regression, churn, random forest, decision tree, ROC, AUC, F1-score

Procedia PDF Downloads 134
1561 A Comprehensive Review of Axial Flux Machines and Its Applications

Authors: Shahbaz Amin, Sabir Hussain Shah, Sahib Khan

Abstract:

This paper presents a thorough review concerning the design types of axial flux permanent magnet machines (AFPM) in terms of different features such as construction, design, materials, and manufacturing. Particular emphasis is given on the design and performance analysis of AFPM machines. A comparison among different permanent magnet machines is also provided. First of all, early and modern axial flux machines are mentioned. Secondly, rotor construction of different axial flux machines is described, then different stator constructions are mentioned depending upon the presence of slots and stator back iron. Then according to the arrangement of the rotor stator structure the machines are classified into single, double and multi-stack arrangements. Advantages, disadvantages and applications of each type of rotor and stator are pointed out. Finally on the basis of the reviewed literature merits, demerits, features and application of different axial flux machines structures are explained and clarified. Thus, this paper provides connection between the machines that are currently being used in industry and the developments of AFPM throughout the years.

Keywords: axial flux machines, axial flux applications, coreless machines, PM machines

Procedia PDF Downloads 217
1560 Robust Recognition of Locomotion Patterns via Data-Driven Machine Learning in the Cloud Environment

Authors: Shinoy Vengaramkode Bhaskaran, Kaushik Sathupadi, Sandesh Achar

Abstract:

Human locomotion recognition is important in a variety of sectors, such as robotics, security, healthcare, fitness tracking and cloud computing. With the increasing pervasiveness of peripheral devices, particularly Inertial Measurement Units (IMUs) sensors, researchers have attempted to exploit these advancements in order to precisely and efficiently identify and categorize human activities. This research paper introduces a state-of-the-art methodology for the recognition of human locomotion patterns in a cloud environment. The methodology is based on a publicly available benchmark dataset. The investigation implements a denoising and windowing strategy to deal with the unprocessed data. Next, feature extraction is adopted to abstract the main cues from the data. The SelectKBest strategy is used to abstract optimal features from the data. Furthermore, state-of-the-art ML classifiers are used to evaluate the performance of the system, including logistic regression, random forest, gradient boosting and SVM have been investigated to accomplish precise locomotion classification. Finally, a detailed comparative analysis of results is presented to reveal the performance of recognition models.

Keywords: artificial intelligence, cloud computing, IoT, human locomotion, gradient boosting, random forest, neural networks, body-worn sensors

Procedia PDF Downloads 11
1559 Review on Quaternion Gradient Operator with Marginal and Vector Approaches for Colour Edge Detection

Authors: Nadia Ben Youssef, Aicha Bouzid

Abstract:

Gradient estimation is one of the most fundamental tasks in the field of image processing in general, and more particularly for color images since that the research in color image gradient remains limited. The widely used gradient method is Di Zenzo’s gradient operator, which is based on the measure of squared local contrast of color images. The proposed gradient mechanism, presented in this paper, is based on the principle of the Di Zenzo’s approach using quaternion representation. This edge detector is compared to a marginal approach based on multiscale product of wavelet transform and another vector approach based on quaternion convolution and vector gradient approach. The experimental results indicate that the proposed color gradient operator outperforms marginal approach, however, it is less efficient then the second vector approach.

Keywords: gradient, edge detection, color image, quaternion

Procedia PDF Downloads 234
1558 Mathematical Modeling of the Working Principle of Gravity Gradient Instrument

Authors: Danni Cong, Meiping Wu, Hua Mu, Xiaofeng He, Junxiang Lian, Juliang Cao, Shaokun Cai, Hao Qin

Abstract:

Gravity field is of great significance in geoscience, national economy and national security, and gravitational gradient measurement has been extensively studied due to its higher accuracy than gravity measurement. Gravity gradient sensor, being one of core devices of the gravity gradient instrument, plays a key role in measuring accuracy. Therefore, this paper starts from analyzing the working principle of the gravity gradient sensor by Newton’s law, and then considers the relative motion between inertial and non-inertial systems to build a relatively adequate mathematical model, laying a foundation for the measurement error calibration, measurement accuracy improvement.

Keywords: gravity gradient, gravity gradient sensor, accelerometer, single-axis rotation modulation

Procedia PDF Downloads 327
1557 MIMIC: A Multi Input Micro-Influencers Classifier

Authors: Simone Leonardi, Luca Ardito

Abstract:

Micro-influencers are effective elements in the marketing strategies of companies and institutions because of their capability to create an hyper-engaged audience around a specific topic of interest. In recent years, many scientific approaches and commercial tools have handled the task of detecting this type of social media users. These strategies adopt solutions ranging from rule based machine learning models to deep neural networks and graph analysis on text, images, and account information. This work compares the existing solutions and proposes an ensemble method to generalize them with different input data and social media platforms. The deployed solution combines deep learning models on unstructured data with statistical machine learning models on structured data. We retrieve both social media accounts information and multimedia posts on Twitter and Instagram. These data are mapped into feature vectors for an eXtreme Gradient Boosting (XGBoost) classifier. Sixty different topics have been analyzed to build a rule based gold standard dataset and to compare the performances of our approach against baseline classifiers. We prove the effectiveness of our work by comparing the accuracy, precision, recall, and f1 score of our model with different configurations and architectures. We obtained an accuracy of 0.91 with our best performing model.

Keywords: deep learning, gradient boosting, image processing, micro-influencers, NLP, social media

Procedia PDF Downloads 183
1556 A New Modification of Nonlinear Conjugate Gradient Coefficients with Global Convergence Properties

Authors: Ahmad Alhawarat, Mustafa Mamat, Mohd Rivaie, Ismail Mohd

Abstract:

Conjugate gradient method has been enormously used to solve large scale unconstrained optimization problems due to the number of iteration, memory, CPU time, and convergence property, in this paper we find a new class of nonlinear conjugate gradient coefficient with global convergence properties proved by exact line search. The numerical results for our new βK give a good result when it compared with well-known formulas.

Keywords: conjugate gradient method, conjugate gradient coefficient, global convergence

Procedia PDF Downloads 464
1555 Ensemble Methods in Machine Learning: An Algorithmic Approach to Derive Distinctive Behaviors of Criminal Activity Applied to the Poaching Domain

Authors: Zachary Blanks, Solomon Sonya

Abstract:

Poaching presents a serious threat to endangered animal species, environment conservations, and human life. Additionally, some poaching activity has even been linked to supplying funds to support terrorist networks elsewhere around the world. Consequently, agencies dedicated to protecting wildlife habitats have a near intractable task of adequately patrolling an entire area (spanning several thousand kilometers) given limited resources, funds, and personnel at their disposal. Thus, agencies need predictive tools that are both high-performing and easily implementable by the user to help in learning how the significant features (e.g. animal population densities, topography, behavior patterns of the criminals within the area, etc) interact with each other in hopes of abating poaching. This research develops a classification model using machine learning algorithms to aid in forecasting future attacks that is both easy to train and performs well when compared to other models. In this research, we demonstrate how data imputation methods (specifically predictive mean matching, gradient boosting, and random forest multiple imputation) can be applied to analyze data and create significant predictions across a varied data set. Specifically, we apply these methods to improve the accuracy of adopted prediction models (Logistic Regression, Support Vector Machine, etc). Finally, we assess the performance of the model and the accuracy of our data imputation methods by learning on a real-world data set constituting four years of imputed data and testing on one year of non-imputed data. This paper provides three main contributions. First, we extend work done by the Teamcore and CREATE (Center for Risk and Economic Analysis of Terrorism Events) research group at the University of Southern California (USC) working in conjunction with the Department of Homeland Security to apply game theory and machine learning algorithms to develop more efficient ways of reducing poaching. This research introduces ensemble methods (Random Forests and Stochastic Gradient Boosting) and applies it to real-world poaching data gathered from the Ugandan rain forest park rangers. Next, we consider the effect of data imputation on both the performance of various algorithms and the general accuracy of the method itself when applied to a dependent variable where a large number of observations are missing. Third, we provide an alternate approach to predict the probability of observing poaching both by season and by month. The results from this research are very promising. We conclude that by using Stochastic Gradient Boosting to predict observations for non-commercial poaching by season, we are able to produce statistically equivalent results while being orders of magnitude faster in computation time and complexity. Additionally, when predicting potential poaching incidents by individual month vice entire seasons, boosting techniques produce a mean area under the curve increase of approximately 3% relative to previous prediction schedules by entire seasons.

Keywords: ensemble methods, imputation, machine learning, random forests, statistical analysis, stochastic gradient boosting, wildlife protection

Procedia PDF Downloads 292
1554 Prediction Modeling of Alzheimer’s Disease and Its Prodromal Stages from Multimodal Data with Missing Values

Authors: M. Aghili, S. Tabarestani, C. Freytes, M. Shojaie, M. Cabrerizo, A. Barreto, N. Rishe, R. E. Curiel, D. Loewenstein, R. Duara, M. Adjouadi

Abstract:

A major challenge in medical studies, especially those that are longitudinal, is the problem of missing measurements which hinders the effective application of many machine learning algorithms. Furthermore, recent Alzheimer's Disease studies have focused on the delineation of Early Mild Cognitive Impairment (EMCI) and Late Mild Cognitive Impairment (LMCI) from cognitively normal controls (CN) which is essential for developing effective and early treatment methods. To address the aforementioned challenges, this paper explores the potential of using the eXtreme Gradient Boosting (XGBoost) algorithm in handling missing values in multiclass classification. We seek a generalized classification scheme where all prodromal stages of the disease are considered simultaneously in the classification and decision-making processes. Given the large number of subjects (1631) included in this study and in the presence of almost 28% missing values, we investigated the performance of XGBoost on the classification of the four classes of AD, NC, EMCI, and LMCI. Using 10-fold cross validation technique, XGBoost is shown to outperform other state-of-the-art classification algorithms by 3% in terms of accuracy and F-score. Our model achieved an accuracy of 80.52%, a precision of 80.62% and recall of 80.51%, supporting the more natural and promising multiclass classification.

Keywords: eXtreme gradient boosting, missing data, Alzheimer disease, early mild cognitive impairment, late mild cognitive impair, multiclass classification, ADNI, support vector machine, random forest

Procedia PDF Downloads 188
1553 Gradient Boosted Trees on Spark Platform for Supervised Learning in Health Care Big Data

Authors: Gayathri Nagarajan, L. D. Dhinesh Babu

Abstract:

Health care is one of the prominent industries that generate voluminous data thereby finding the need of machine learning techniques with big data solutions for efficient processing and prediction. Missing data, incomplete data, real time streaming data, sensitive data, privacy, heterogeneity are few of the common challenges to be addressed for efficient processing and mining of health care data. In comparison with other applications, accuracy and fast processing are of higher importance for health care applications as they are related to the human life directly. Though there are many machine learning techniques and big data solutions used for efficient processing and prediction in health care data, different techniques and different frameworks are proved to be effective for different applications largely depending on the characteristics of the datasets. In this paper, we present a framework that uses ensemble machine learning technique gradient boosted trees for data classification in health care big data. The framework is built on Spark platform which is fast in comparison with other traditional frameworks. Unlike other works that focus on a single technique, our work presents a comparison of six different machine learning techniques along with gradient boosted trees on datasets of different characteristics. Five benchmark health care datasets are considered for experimentation, and the results of different machine learning techniques are discussed in comparison with gradient boosted trees. The metric chosen for comparison is misclassification error rate and the run time of the algorithms. The goal of this paper is to i) Compare the performance of gradient boosted trees with other machine learning techniques in Spark platform specifically for health care big data and ii) Discuss the results from the experiments conducted on datasets of different characteristics thereby drawing inference and conclusion. The experimental results show that the accuracy is largely dependent on the characteristics of the datasets for other machine learning techniques whereas gradient boosting trees yields reasonably stable results in terms of accuracy without largely depending on the dataset characteristics.

Keywords: big data analytics, ensemble machine learning, gradient boosted trees, Spark platform

Procedia PDF Downloads 241
1552 Comparisons of Individual and Group Replacement Policies for a Series Connection System with Two Machines

Authors: Wen Liang Chang, Mei Wei Wang, Ruey Huei Yeh

Abstract:

This paper studies the comparisons of individual and group replacement policies for a series connection system with two machines. Suppose that manufacturer’s production system is a series connection system which is combined by two machines. For two machines, when machines fail within the operating time, minimal repair is performed for machines by the manufacturer. The manufacturer plans to a preventive replacement for machines at a pre-specified time to maintain system normal operation. Under these maintenance policies, the maintenance cost rate models of individual and group replacement for a series connection system with two machines is derived and further, optimal preventive replacement time is obtained such that the expected total maintenance cost rate is minimized. Finally, some numerical examples are given to illustrate the influences of individual and group replacement policies to the maintenance cost rate.

Keywords: individual replacement, group replacement, replacement time, two machines, series connection system

Procedia PDF Downloads 488
1551 Machine Learning Prediction of Diabetes Prevalence in the U.S. Using Demographic, Physical, and Lifestyle Indicators: A Study Based on NHANES 2009-2018

Authors: Oluwafunmibi Omotayo Fasanya, Augustine Kena Adjei

Abstract:

To develop a machine learning model to predict diabetes (DM) prevalence in the U.S. population using demographic characteristics, physical indicators, and lifestyle habits, and to analyze how these factors contribute to the likelihood of diabetes. We analyzed data from 23,546 participants aged 20 and older, who were non-pregnant, from the 2009-2018 National Health and Nutrition Examination Survey (NHANES). The dataset included key demographic (age, sex, ethnicity), physical (BMI, leg length, total cholesterol [TCHOL], fasting plasma glucose), and lifestyle indicators (smoking habits). A weighted sample was used to account for NHANES survey design features such as stratification and clustering. A classification machine learning model was trained to predict diabetes status. The target variable was binary (diabetes or non-diabetes) based on fasting plasma glucose measurements. The following models were evaluated: Logistic Regression (baseline), Random Forest Classifier, Gradient Boosting Machine (GBM), Support Vector Machine (SVM). Model performance was assessed using accuracy, F1-score, AUC-ROC, and precision-recall metrics. Feature importance was analyzed using SHAP values to interpret the contributions of variables such as age, BMI, ethnicity, and smoking status. The Gradient Boosting Machine (GBM) model outperformed other classifiers with an AUC-ROC score of 0.85. Feature importance analysis revealed the following key predictors: Age: The most significant predictor, with diabetes prevalence increasing with age, peaking around the 60s for males and 70s for females. BMI: Higher BMI was strongly associated with a higher risk of diabetes. Ethnicity: Black participants had the highest predicted prevalence of diabetes (14.6%), followed by Mexican-Americans (13.5%) and Whites (10.6%). TCHOL: Diabetics had lower total cholesterol levels, particularly among White participants (mean decline of 23.6 mg/dL). Smoking: Smoking showed a slight increase in diabetes risk among Whites (0.2%) but had a limited effect in other ethnic groups. Using machine learning models, we identified key demographic, physical, and lifestyle predictors of diabetes in the U.S. population. The results confirm that diabetes prevalence varies significantly across age, BMI, and ethnic groups, with lifestyle factors such as smoking contributing differently by ethnicity. These findings provide a basis for more targeted public health interventions and resource allocation for diabetes management.

Keywords: diabetes, NHANES, random forest, gradient boosting machine, support vector machine

Procedia PDF Downloads 10
1550 Linear Study of Electrostatic Ion Temperature Gradient Mode with Entropy Gradient Drift and Sheared Ion Flows

Authors: M. Yaqub Khan, Usman Shabbir

Abstract:

History of plasma reveals that continuous struggle of experimentalists and theorists are not fruitful for confinement up to now. It needs a change to bring the research through entropy. Approximately, all the quantities like number density, temperature, electrostatic potential, etc. are connected to entropy. Therefore, it is better to change the way of research. In ion temperature gradient mode with the help of Braginskii model, Boltzmannian electrons, effect of velocity shear is studied inculcating entropy in the magnetoplasma. New dispersion relation is derived for ion temperature gradient mode, and dependence on entropy gradient drift is seen. It is also seen velocity shear enhances the instability but in anomalous transport, its role is not seen significantly but entropy. This work will be helpful to the next step of tokamak and space plasmas.

Keywords: entropy, velocity shear, ion temperature gradient mode, drift

Procedia PDF Downloads 388
1549 Global Convergence of a Modified Three-Term Conjugate Gradient Algorithms

Authors: Belloufi Mohammed, Sellami Badreddine

Abstract:

This paper deals with a new nonlinear modified three-term conjugate gradient algorithm for solving large-scale unstrained optimization problems. The search direction of the algorithms from this class has three terms and is computed as modifications of the classical conjugate gradient algorithms to satisfy both the descent and the conjugacy conditions. An example of three-term conjugate gradient algorithm from this class, as modifications of the classical and well known Hestenes and Stiefel or of the CG_DESCENT by Hager and Zhang conjugate gradient algorithms, satisfying both the descent and the conjugacy conditions is presented. Under mild conditions, we prove that the modified three-term conjugate gradient algorithm with Wolfe type line search is globally convergent. Preliminary numerical results show the proposed method is very promising.

Keywords: unconstrained optimization, three-term conjugate gradient, sufficient descent property, line search

Procedia PDF Downloads 375
1548 Torsional Vibration of Carbon Nanotubes via Nonlocal Gradient Theories

Authors: Mustafa Arda, Metin Aydogdu

Abstract:

Carbon nanotubes (CNTs) have many possible application areas because of their superior physical properties. Nonlocal Theory, which unlike the classical theories, includes the size dependency. Nonlocal Stress and Strain Gradient approaches can be used in nanoscale static and dynamic analysis. In the present study, torsional vibration of CNTs was investigated according to nonlocal stress and strain gradient theories. Effects of the small scale parameters to the non-dimensional frequency were obtained. Results were compared with the Molecular Dynamics Simulation and Lattice Dynamics. Strain Gradient Theory has shown more weakening effect on CNT according to the Stress Gradient Theory. Combination of both theories gives more acceptable results rather than the classical and stress or strain gradient theory according to Lattice Dynamics.

Keywords: torsional vibration, carbon nanotubes, nonlocal gradient theory, stress, strain

Procedia PDF Downloads 389
1547 A New Family of Globally Convergent Conjugate Gradient Methods

Authors: B. Sellami, Y. Laskri, M. Belloufi

Abstract:

Conjugate gradient methods are an important class of methods for unconstrained optimization, especially for large-scale problems. Recently, they have been much studied. In this paper, a new family of conjugate gradient method is proposed for unconstrained optimization. This method includes the already existing two practical nonlinear conjugate gradient methods, which produces a descent search direction at every iteration and converges globally provided that the line search satisfies the Wolfe conditions. The numerical experiments are done to test the efficiency of the new method, which implies the new method is promising. In addition the methods related to this family are uniformly discussed.

Keywords: conjugate gradient method, global convergence, line search, unconstrained optimization

Procedia PDF Downloads 410