Search results for: light gradient boosting model (LGBM)
20554 Development and Adaptation of a LGBM Machine Learning Model, with a Suitable Concept Drift Detection and Adaptation Technique, for Barcelona Household Electric Load Forecasting During Covid-19 Pandemic Periods (Pre-Pandemic and Strict Lockdown)
Authors: Eric Pla Erra, Mariana Jimenez Martinez
Abstract:
While aggregated loads at a community level tend to be easier to predict, individual household load forecasting presents more challenges because of higher volatility and uncertainty. Furthermore, the drastic changes that our behavior patterns have undergone due to the COVID-19 pandemic have modified our daily electrical consumption curves and, therefore, further complicated the methods used for short-term electric load forecasting. Load forecasting is vital for the smooth and optimized planning and operation of our electric grids, but it also plays a crucial role for individual domestic consumers who rely on a HEMS (Home Energy Management System) to optimize their energy usage through self-generation, storage, or smart appliance management. Accurate forecasting, paired with a proper HEMS, leads to higher energy savings and better overall energy efficiency of the household. To study how COVID-19 has affected the accuracy of forecasting methods, the performance of a state-of-the-art LGBM (Light Gradient Boosting Model) will be evaluated for day-ahead electric load forecasting across the transition between the pre-pandemic and lockdown periods. LGBM improves on standard decision tree models in both speed and memory consumption while still offering high accuracy. In addition to its complex non-linear modelling capabilities, LGBM has proven to be a competitive method in challenging forecasting scenarios such as short series, heterogeneous series, or data patterns with minimal prior knowledge. An adaptation of the LGBM model, called “resilient LGBM”, will also be tested; it incorporates a concept drift detection technique for time series analysis, with the purpose of evaluating whether it improves the model’s accuracy during extreme events such as the COVID-19 lockdowns.
The results for the LGBM and resilient LGBM will be compared using the standard RMSE (Root Mean Squared Error) as the main performance metric. The models’ performance will be evaluated on a set of real households’ hourly electricity consumption data measured before and during the COVID-19 pandemic. All households are located in the city of Barcelona, Spain, and present different consumption profiles. This study is carried out under the ComMit-20 project, financed by AGAUR (Agència de Gestió d’Ajuts Universitaris), which aims to determine the short- and long-term impacts of the COVID-19 pandemic on building energy consumption, increasing the resilience of electrical systems through the use of tools such as HEMS and artificial intelligence.
Keywords: concept drift, forecasting, home energy management system (HEMS), light gradient boosting model (LGBM)
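The abstract does not say which concept drift detector the “resilient LGBM” uses. As a minimal sketch, assuming the drift signal is a sustained rise in forecast error, the Python below pairs the RMSE metric with a Page-Hinkley test, a classic change detector swapped in here purely for illustration; `PageHinkley`, `delta`, `threshold`, and the error series are hypothetical, not the authors’.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream.

    Illustrative stand-in only: `delta` (tolerated change per step) and
    `threshold` (alarm level) are hypothetical defaults.
    """

    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta
        self.threshold = threshold
        self.n = 0
        self.mean = 0.0      # running mean of the stream
        self.cum = 0.0       # cumulative deviation m_t
        self.min_cum = 0.0   # running minimum of m_t

    def update(self, x):
        """Feed one observation (e.g. an absolute forecast error); True means drift."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


# Hypothetical hourly absolute errors that jump, e.g. when a lockdown starts.
errors = [0.1] * 100 + [1.0] * 20
detector = PageHinkley()
alarm_at = next(i for i, e in enumerate(errors) if detector.update(e))
```

In a drift-aware setup the alarm would typically trigger retraining on recent data; that retraining policy is also an assumption, not something the abstract specifies.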
Procedia PDF Downloads 105
20553 The Use of Stochastic Gradient Boosting Method for Multi-Model Combination of Rainfall-Runoff Models
Authors: Phanida Phukoetphim, Asaad Y. Shamseldin
Abstract:
In this study, the novel Stochastic Gradient Boosting (SGB) combination method is used to combine the daily river flows simulated by four different rainfall-runoff models of the Ohinemuri catchment, New Zealand. The selected rainfall-runoff models are two empirical black-box models (the linear perturbation model and the linear varying gain factor model) and two conceptual models (the soil moisture accounting and routing model and the Nedbør-Afstrømnings model). The simple average and weighted average combination methods were used as benchmarks for the novel SGB combination method. The models and combination results are evaluated using statistical and graphical criteria. Overall, the results show that the combination technique can improve the simulated river flows of the four selected models for the Ohinemuri catchment. The results also indicate that the novel SGB combination method is capable of accurate prediction when used to combine simulated river flows in New Zealand.
Keywords: multi-model combination, rainfall-runoff modeling, stochastic gradient boosting, bioinformatics
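To make the combination idea concrete, here is a minimal stochastic gradient boosting sketch in plain Python under stated assumptions: squared-error loss, one-split regression stumps as base learners, and each rainfall-runoff model’s simulated flow used as an input feature. The random row subsample drawn (without replacement) at every round is what makes the boosting “stochastic”. The data, function names, and settings (`n_rounds`, `learning_rate`, `subsample`) are illustrative, not taken from the study.

```python
import math
import random


def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def fit_stump(X, r, idx):
    """Least-squares single-split regression stump fitted on rows `idx`."""
    best = None
    for j in range(len(X[0])):
        values = sorted({X[i][j] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[i] for i in idx if X[i][j] <= t]
            right = [r[i] for i in idx if X[i][j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split in the subsample: constant update
        m = sum(r[i] for i in idx) / len(idx)
        return 0, math.inf, m, m
    return best[1:]


def sgb_fit(X, y, n_rounds=100, learning_rate=0.1, subsample=0.5, seed=0):
    rng = random.Random(seed)
    f0 = sum(y) / len(y)                               # start from the mean flow
    pred = [f0] * len(y)
    stumps = []
    n_sub = max(2, int(subsample * len(y)))
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]   # negative gradient of squared loss
        idx = rng.sample(range(len(y)), n_sub)         # random subsample, no replacement
        j, t, lm, rm = fit_stump(X, resid, idx)
        stumps.append((j, t, lm, rm))
        for i, row in enumerate(X):                    # shrunken update on all rows
            pred[i] += learning_rate * (lm if row[j] <= t else rm)
    return f0, learning_rate, stumps


def sgb_predict(model, row):
    f0, lr, stumps = model
    return f0 + lr * sum(lm if row[j] <= t else rm for j, t, lm, rm in stumps)


# Toy "observed" flows and four biased model simulations (made-up numbers).
observed = [10 + 5 * math.sin(t / 5) for t in range(60)]
X = [[0.8 * q, 1.3 * q, q + 2, q - 1] for q in observed]
simple_avg = [sum(row) / 4 for row in X]               # benchmark combination
model = sgb_fit(X, observed)
combined = [sgb_predict(model, row) for row in X]
mean_only = [sum(observed) / len(observed)] * len(observed)
```

On real data the combination would be judged on held-out flows and against the simple and weighted average benchmarks, as the study did; this toy training fit says nothing about the paper’s results.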
Procedia PDF Downloads 339
20552 Design of Geochemical Maps of Industrial City Using Gradient Boosting and Geographic Information System
Authors: Ruslan Safarov, Zhanat Shomanova, Yuri Nossenko, Zhandos Mussayev, Ayana Baltabek
Abstract:
Geochemical maps of the distribution of the polluting elements V, Cr, Mn, Co, Ni, Cu, Zn, Mo, Cd, and Pb were designed for the territory of Pavlodar (Kazakhstan), an industrial hub. Soil samples were taken from 100 locations, and elemental analysis was performed using XRF. The obtained data were used to train a computational model with a gradient boosting algorithm; the optimal model parameters as well as the loss function were selected. The computational model was then used to predict the concentrations of the polluting elements at 1000 evenly distributed points, and geochemical maps were created from the predicted data. Additionally, the total pollution index Zc was calculated for each of the 1000 points, and its spatial distribution was visualized using GIS (QGIS). The largest share of the territory of Pavlodar (89.7%) falls into the moderately hazardous category. The visualization of the obtained data allowed us to conclude that the main sources of contamination are the industrial zones where the strategic metallurgical and refining plants are located.
Keywords: Pavlodar, geochemical map, gradient boosting, CatBoost, QGIS, spatial distribution, heavy metals
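The abstract does not spell out how Zc is computed. The sketch below assumes the widely used Saet formulation, Zc = ΣKc − (n − 1), where Kc = C / C_background is each element’s concentration coefficient and the sum runs over the n elements with Kc > 1, together with a commonly cited hazard scale (below 16 acceptable, 16 to 32 moderately hazardous, 32 to 128 hazardous, 128 and above extremely hazardous). Background values and concentrations are invented for illustration.

```python
def total_pollution_index(conc, background):
    """Saet-style total pollution index Zc for one sampling point.

    conc, background: dicts mapping element symbol -> concentration (same units).
    Only elements enriched above background (Kc > 1) contribute.
    """
    kc = [conc[el] / background[el] for el in conc]
    enriched = [k for k in kc if k > 1.0]
    return sum(enriched) - (len(enriched) - 1)


def hazard_category(zc):
    """Commonly cited Zc scale (assumed here; the study's thresholds may differ)."""
    if zc < 16:
        return "acceptable"
    if zc < 32:
        return "moderately hazardous"
    if zc < 128:
        return "hazardous"
    return "extremely hazardous"


background = {"Zn": 50.0, "Pb": 16.0, "Cu": 40.0}   # hypothetical, mg/kg
point_a = {"Zn": 300.0, "Pb": 64.0, "Cu": 30.0}     # Kc: 6, 4, 0.75 (Cu excluded)
point_b = {"Zn": 1000.0, "Pb": 80.0, "Cu": 40.0}    # Kc: 20, 5, 1 (Cu excluded)
zc_a = total_pollution_index(point_a, background)   # 6 + 4 - 1 = 9
zc_b = total_pollution_index(point_b, background)   # 20 + 5 - 1 = 24
```

In a workflow like the one described, `conc` would hold the gradient-boosting predictions for each of the 1000 grid points.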
Procedia PDF Downloads 82
20551 Comparison between XGBoost, LightGBM and CatBoost Using a Home Credit Dataset
Authors: Essam Al Daoud
Abstract:
Gradient boosting methods have proven to be a very important strategy in machine learning, and many successful machine learning solutions have been developed using XGBoost and its derivatives. The aim of this study is to investigate and compare the efficiency of three gradient boosting implementations. The Home Credit dataset, which contains 219 features and 356,251 records, is used in this work. In addition, new features are generated, and several techniques are used to rank and select the best features. The implementation indicates that LightGBM is faster and more accurate than CatBoost and XGBoost across varying numbers of features and records.
Keywords: gradient boosting, XGBoost, LightGBM, CatBoost, home credit
Procedia PDF Downloads 171
20550 Development of Non-Intrusive Speech Evaluation Measure Using S-Transform and Light-Gbm
Authors: Tusar Kanti Dash, Ganapati Panda
Abstract:
The evaluation of speech quality and intelligibility is critical to the overall effectiveness of speech enhancement algorithms. Several intrusive and non-intrusive measures are employed to calculate these parameters. Non-intrusive evaluation is the most challenging because, very often, the reference clean speech is not available. In this paper, a novel non-intrusive speech evaluation measure is proposed using audio features derived from the Stockwell transform (S-transform). These features are used with the Light Gradient Boosting Machine for effective prediction of speech quality and intelligibility. The proposed model is analyzed using noisy and reverberant speech from four databases, and the results are compared with standard intrusive evaluation measures. The comparative analysis shows that the proposed model performs better than the standard non-intrusive models.
Keywords: non-intrusive speech evaluation, S-transform, light GBM, speech quality, and intelligibility
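The abstract does not list which S-transform features are fed to LightGBM. As an illustration only, the sketch below computes the discrete Stockwell transform with its standard frequency-domain formulation (the spectrum is shifted to each frequency voice, multiplied by a Gaussian window whose width scales with that frequency, and inverse-transformed), using a naive O(N²) DFT for self-containment, and then reduces each voice to its mean magnitude, a hypothetical stand-in for the paper’s features. Real code would use an FFT library.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (unnormalised)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def s_transform(x):
    """Discrete Stockwell transform; returns rows for voices 0..N//2, each of length N."""
    n = len(x)
    spectrum = dft(x)
    rows = [[complex(sum(x) / n)] * n]          # voice 0 is the signal mean
    for f in range(1, n // 2 + 1):
        windowed = []
        for m in range(n):
            m_sym = m if m <= n // 2 else m - n   # symmetric frequency offset
            gauss = math.exp(-2 * math.pi ** 2 * m_sym ** 2 / f ** 2)
            windowed.append(spectrum[(m + f) % n] * gauss)
        # Inverse DFT of the shifted, windowed spectrum gives voice f over time.
        rows.append([sum(windowed[m] * cmath.exp(2j * math.pi * m * j / n)
                         for m in range(n)) / n
                     for j in range(n)])
    return rows


def voice_features(rows):
    """Hypothetical feature vector: mean magnitude of each frequency voice."""
    return [sum(abs(v) for v in row) / len(row) for row in rows]


# Toy 32-sample tone at 4 cycles per frame (stands in for a speech frame).
frame = [math.cos(2 * math.pi * 4 * t / 32) for t in range(32)]
features = voice_features(s_transform(frame))
```

For this pure tone the largest feature falls on voice 4, with a mean magnitude of about 0.5, which is the time-frequency localisation the S-transform is chosen for.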
Procedia PDF Downloads 260
20549 Machine Learning Model to Predict TB Bacteria-Resistant Drugs from TB Isolates
Authors: Rosa Tsegaye Aga, Xuan Jiang, Pavel Vazquez Faci, Siqing Liu, Simon Rayner, Endalkachew Alemu, Markos Abebe
Abstract:
Tuberculosis (TB) is a major cause of disease globally. In most cases, TB is treatable and curable, but only with the proper treatment. Drug-resistant TB occurs when the bacteria become resistant to the drugs used to treat TB. Current strategies for identifying drug-resistant TB bacteria are laboratory-based, and they take a long time to identify the drug-resistant bacteria so that the patient can be treated accordingly. Machine learning (ML) and data science can offer new approaches to this problem. In this study, we propose an ML-based model that predicts the antibiotic resistance phenotypes of TB isolates within minutes, so that the patient can be given the right treatment immediately. As training data, the study uses the whole genome sequences (WGS) of TB isolates extracted from the NCBI repository, which contain samples from different countries. Samples from different countries were included so that the models generalize across TB isolates from different regions of the world; this exposes the model to different behaviors of the TB bacteria and makes it more robust. Three pieces of information extracted from the WGS data are used to train the model: all variants found within the candidate genes (F1), predetermined resistance-associated variants (F2), and resistance-associated gene information for the particular drug only. Two major datasets were constructed from this information: F1 and F2 are treated as two independent datasets, and the third piece of information is used as the class label for both. Five machine learning algorithms were used to train the models: Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), Gradient Boosting, and AdaBoost.
The models were trained on the datasets F1, F2, and F1F2, which is the F1 and F2 datasets merged. Additionally, an ensemble approach was used: the F1 and F2 datasets were run through the gradient boosting algorithm, and the outputs were used as a single dataset, called the F1F2 ensemble dataset, on which models were trained with the five algorithms. In the experiments, the ensemble-approach model trained with the Gradient Boosting algorithm outperformed the rest of the models. In conclusion, this study recommends the ensemble approach, that is, the RF + Gradient Boosting model, for predicting the antibiotic resistance phenotypes of TB isolates, as it outperforms the rest of the models.
Keywords: machine learning, MTB, WGS, drug resistant TB
Procedia PDF Downloads 52
20548 Investigation of Extreme Gradient Boosting Model Prediction of Soil Strain-Shear Modulus
Authors: Ehsan Mehryaar, Reza Bushehri
Abstract:
One of the principal parameters defining the dynamic response of clay soils is the strain-shear modulus relation. Predicting the strain and, subsequently, the shear modulus reduction of the soil is essential for the performance analysis of structures exposed to earthquakes and dynamic loading. Many soil properties affect a soil’s dynamic behavior. To capture those effects, this study collects a database of 1193 data points resulting from dynamic laboratory testing of 21 clays, consisting of maximum shear modulus, strain, moisture content, initial void ratio, plastic limit, liquid limit, and initial confining pressure, in order to predict the shear modulus vs. strain curve of soil. A model based on the extreme gradient boosting technique is proposed, and a tree-structured Parzen estimator hyper-parameter tuning algorithm is used simultaneously to find the best hyper-parameters for the model. The performance of the model is compared with existing empirical equations using the coefficient of correlation and the root mean square error.
Keywords: XGBoost, hyper-parameter tuning, soil shear modulus, dynamic response
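The abstract compares the XGBoost model with “existing empirical equations” without naming them. One classic candidate is the Hardin-Drnevich hyperbolic model, G/Gmax = 1 / (1 + γ/γ_ref). The sketch below assumes that equation, an invented reference strain, and made-up observations, and scores it with the two metrics the abstract names (correlation coefficient and RMSE); the same two functions would score the XGBoost predictions.

```python
import math


def hyperbolic_ratio(strain, reference_strain):
    """Hardin-Drnevich hyperbolic model: G/Gmax = 1 / (1 + strain / reference_strain)."""
    return 1.0 / (1.0 + strain / reference_strain)


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


strains = [1e-5, 1e-4, 1e-3, 1e-2]        # shear strain as a decimal, hypothetical
measured = [0.98, 0.90, 0.52, 0.10]        # invented G/Gmax "observations"
empirical = [hyperbolic_ratio(s, 1e-3) for s in strains]   # reference strain assumed
r = pearson_r(measured, empirical)
err = rmse(measured, empirical)
```

Strain and reference strain must be in the same units (decimal or percent), otherwise the curve shifts by a factor of 100.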
Procedia PDF Downloads 201
20547 Parkinson’s Disease Detection Analysis through Machine Learning Approaches
Authors: Muhtasim Shafi Kader, Fizar Ahmed, Annesha Acharjee
Abstract:
Machine learning and data mining are crucial in health care, including medical informatics and disease detection. Machine learning approaches are now being used to improve awareness of a variety of critical health issues, including diabetes detection, neuron cell tumor diagnosis, COVID-19 identification, and so on. In Bangladesh, Parkinson’s disease mainly affects senior citizens. Parkinson's disease symptoms are often progressive and worsen over time. As the condition advances, affected people may have trouble walking and communicating. Patients may also have mental and behavioral changes, sleep problems, depression, memory loss, and fatigue. Parkinson's disease can occur in both men and women, though men are affected more often than women. In this research, we aim to find the most accurate ML algorithm for detecting the disease on a given dataset. To this end, nine ML classifiers are used: Naive Bayes, Adaptive Boosting, Bagging Classifier, Decision Tree Classifier, Random Forest Classifier, XGB Classifier, K Nearest Neighbor Classifier, Support Vector Machine Classifier, and Gradient Boosting Classifier.
Keywords: naive bayes, adaptive boosting, bagging classifier, decision tree classifier, random forest classifier, XGB classifier, k nearest neighbor classifier, support vector classifier, gradient boosting classifier
Procedia PDF Downloads 129
20546 Stacking Ensemble Approach for Combining Different Methods in Real Estate Prediction
Authors: Sol Girouard, Zona Kostic
Abstract:
A home is often the largest and most expensive purchase a person makes, and whether the decision leads to a successful outcome is determined by a combination of critical factors. In this paper, we propose a method that efficiently handles these factors in residential real estate and performs predictions in a high-dimensional feature space while controlling for overfitting. The proposed method is built on gradient descent and boosting algorithms and uses a mixed optimization technique to improve predictive power. Because a single model usually cannot handle all cases, our approach builds multiple models based on different subsets of the predictors. The algorithm was tested on 3 million homes across the U.S., and the experimental results demonstrate its efficiency by outperforming techniques currently used in forecasting prices. As the real estate market changes every day, our proposed algorithm capitalizes on new events, allowing more efficient predictions.
Keywords: real estate prediction, gradient descent, boosting, ensemble methods, active learning, training
Procedia PDF Downloads 277
20545 Artificial Intelligence-Based Detection of Individuals Suffering from Vestibular Disorder
Authors: Dua Hişam, Serhat İkizoğlu
Abstract:
Identifying the problem behind a balance disorder is one of the most interesting topics in the medical literature. This study develops artificial intelligence (AI) algorithms that apply multiple machine learning (ML) models to sensory gait data collected from humans in order to distinguish between healthy people and those suffering from vestibular system (VS) problems. Although AI is widely used as a diagnostic tool in medicine, AI models have not been used to perform feature extraction and identify VS disorders through training on raw data. In this study, three ML models, the Random Forest Classifier (RF), Extreme Gradient Boosting (XGB), and K-Nearest Neighbor (KNN), were trained to detect VS disorder, and their performance was compared using accuracy, recall, precision, and F1-score. With an accuracy of 95.28%, the Random Forest Classifier (RF) was the most accurate model.
Keywords: vestibular disorder, machine learning, random forest classifier, k-nearest neighbor, extreme gradient boosting
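The four metrics used to compare the classifiers follow directly from confusion-matrix counts. A minimal plain-Python sketch with hypothetical labels (1 marks a VS disorder) is below; libraries such as scikit-learn provide equivalent functions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary task (positive = VS disorder)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # hypothetical ground truth
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]   # hypothetical classifier output
scores = classification_metrics(y_true, y_pred)
```

Here TP = 3, FP = 2, FN = 1, and TN = 2, so accuracy is 0.625, precision 0.6, recall 0.75, and F1 about 0.667; reporting recall and F1 alongside accuracy matters when the two classes are unbalanced.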
Procedia PDF Downloads 69
20544 Predicting Wealth Status of Households Using Ensemble Machine Learning Algorithms
Authors: Habtamu Ayenew Asegie
Abstract:
Wealth, as opposed to income or consumption, implies a more stable and permanent status. Natural and human-made difficulties can diminish household economies and put household well-being at risk. Hence, governments and humanitarian agencies devote considerable resources to poverty and malnutrition reduction efforts. One key factor in the effectiveness of such efforts is the accuracy with which low-income or poor populations can be identified. This study therefore aims to predict a household’s wealth status using ensemble machine learning (ML) algorithms. The design science research methodology (DSRM) is employed, and four ML algorithms, Random Forest (RF), Adaptive Boosting (AdaBoost), Light Gradient Boosted Machine (LightGBM), and Extreme Gradient Boosting (XGBoost), were used to train models. The Ethiopian Demographic and Health Survey (EDHS) dataset was obtained for this purpose from the database of the Central Statistical Agency (CSA). Various data pre-processing techniques were employed, and model training was conducted using functions from the scikit-learn Python library. Models were evaluated using metrics such as accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC-ROC), as well as subjective evaluations by domain experts. An optimal subset of hyper-parameters for each algorithm was selected through the grid search function to obtain the best prediction. The RF model performed better than the other algorithms, achieving an accuracy of 96.06%, and is better suited as a solution model for our purpose. RF is followed by LightGBM, XGBoost, and AdaBoost, with accuracies of 91.53%, 88.44%, and 58.55%, respectively.
The findings suggest that features such as ‘Age of household head’, ‘Total children ever born’ in a family, the ‘Main roof material’ of the house, the ‘Region’ of residence, whether a household uses ‘Electricity’, and the ‘Type of toilet facility’ are determinant factors that economic policymakers should focus on. The determinant risk factors, extracted rules, and designed artifact achieved 82.28% in the domain experts’ evaluation. Overall, the study shows that ML techniques are effective in predicting the wealth status of households.
Keywords: ensemble machine learning, households wealth status, predictive model, wealth status prediction
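The grid search step mentioned in the abstract amounts to exhaustively scoring every combination of candidate hyper-parameter values and keeping the best one. A minimal sketch follows; `toy_score` is a hypothetical stand-in for a cross-validated accuracy (in scikit-learn this role is played by `GridSearchCV`), and the parameter names and values are illustrative only.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(p):
    """Hypothetical score peaking at n_estimators=200, max_depth=8."""
    return -(abs(p["n_estimators"] - 200) / 100 + abs(p["max_depth"] - 8))


grid = {"n_estimators": [100, 200, 300], "max_depth": [4, 8, 12]}
best_params, best_score = grid_search(grid, toy_score)
```

The cost grows multiplicatively with each added parameter (here 3 × 3 = 9 evaluations), which is why each evaluation is usually a cross-validation run on a modest grid.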
Procedia PDF Downloads 39
20543 Advancements in Predicting Diabetes Biomarkers: A Machine Learning Epigenetic Approach
Authors: James Ladzekpo
Abstract:
Background: The urgent need to identify new pharmacological targets for diabetes treatment and prevention has been amplified by the disease's extensive impact on individuals and healthcare systems. A deeper insight into the biological underpinnings of diabetes is crucial for creating therapeutic strategies aimed at these biological processes. Current predictive models based on genetic variations fall short of accurately forecasting diabetes. Objectives: Our study aims to pinpoint key epigenetic factors that predispose individuals to diabetes. These factors will inform the development of an advanced predictive model that estimates diabetes risk from genetic profiles, using state-of-the-art statistical and data mining methods. Methodology: We implemented recursive feature elimination with cross-validation, using a support vector machine (SVM), for refined feature selection. Building on this, we developed six machine learning models (logistic regression, k-Nearest Neighbors (k-NN), Naive Bayes, Random Forest, Gradient Boosting, and a Multilayer Perceptron Neural Network) and evaluated their performance. Findings: The Gradient Boosting Classifier performed best, achieving a median recall of 92.17%, a median area under the receiver operating characteristic curve (AUC) of 68%, and median accuracy and precision scores of 76%. Through our machine learning analysis, we identified 31 genes significantly associated with diabetes traits, highlighting their potential as biomarkers and targets for diabetes management strategies. Conclusion: The Gradient Boosting Classifier and the Multilayer Perceptron Neural Network were particularly noteworthy, demonstrating potential for predicting diabetes outcomes.
We recommend that future investigations incorporate larger cohorts and a wider array of predictive variables to enhance the models' predictive capabilities.
Keywords: diabetes, machine learning, prediction, biomarkers
Procedia PDF Downloads 55
20542 Customer Churn Prediction by Using Four Machine Learning Algorithms Integrating Features Selection and Normalization in the Telecom Sector
Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh
Abstract:
A crucial component of maintaining a customer-oriented business, as in the telecom industry, is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years, so it has become more important to understand customers’ needs in this highly competitive market, especially the needs of customers who are looking to switch service providers. Churn prediction is therefore now essential for retaining those customers, and machine learning can be used to accomplish it. Churn prediction has become a very important machine learning classification topic in the telecommunications industry, and understanding the factors behind customer churn and how churners behave is essential to building an effective churn prediction model. This paper aims to predict churn and identify the factors behind customer churn based on customers’ past service usage history. To this end, the study uses feature selection, normalization, and feature engineering, and then compares the performance of four machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Performance was evaluated using the F1 score and ROC-AUC. The results compare favorably with those of existing models: Gradient Boosting with the feature selection technique performed best, achieving a 99% F1-score and 99% AUC, and all other experiments also achieved good results.
Keywords: machine learning, gradient boosting, logistic regression, churn, random forest, decision tree, ROC, AUC, F1-score
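The normalization and evaluation steps can be illustrated with two small helpers: min-max scaling of a feature column, and ROC-AUC computed as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The abstract does not state which normalization was used, so min-max scaling is an assumption here, and the usage minutes and scores are invented.

```python
def min_max_scale(values):
    """Rescale a feature column to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def roc_auc(y_true, scores):
    """ROC-AUC as P(score of a random churner > score of a random non-churner).

    Ties count as half a win (Mann-Whitney formulation); O(n_pos * n_neg).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


minutes_scaled = min_max_scale([120, 300, 480])          # hypothetical usage minutes
churned = [1, 1, 1, 0, 0, 0]                              # hypothetical labels
churn_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]             # hypothetical model scores
auc = roc_auc(churned, churn_scores)
```

In this toy case 8 of the 9 churner/non-churner pairs are ranked correctly, giving an AUC of 8/9. Unlike F1, AUC does not depend on a decision threshold.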
Procedia PDF Downloads 134
20541 Heart Ailment Prediction Using Machine Learning Methods
Authors: Abhigyan Hedau, Priya Shelke, Riddhi Mirajkar, Shreyash Chaple, Mrunali Gadekar, Himanshu Akula
Abstract:
The heart is the coordinating centre of the major endocrine glandular structure of the body, which produces hormones that profoundly affect the operations of the body, and diagnosing cardiovascular disease is a difficult but critical task. By extracting knowledge and information about the disease from patient data, data mining offers a practical technique to help doctors detect disorders. We use a variety of machine learning methods, including logistic regression, support vector classifiers (SVC), K-nearest neighbours (KNN) classifiers, decision tree classifiers, random forest classifiers, and gradient boosting classifiers. These algorithms are applied to patient data containing 13 different factors to build a system that predicts heart disease in less time and with greater accuracy.
Keywords: logistic regression, support vector classifier, k-nearest neighbour, decision tree, random forest and gradient boosting
Procedia PDF Downloads 51
20540 MIMIC: A Multi Input Micro-Influencers Classifier
Authors: Simone Leonardi, Luca Ardito
Abstract:
Micro-influencers are effective elements in the marketing strategies of companies and institutions because of their capability to create a hyper-engaged audience around a specific topic of interest. In recent years, many scientific approaches and commercial tools have tackled the task of detecting this type of social media user. These strategies adopt solutions ranging from rule-based machine learning models to deep neural networks and graph analysis on text, images, and account information. This work compares the existing solutions and proposes an ensemble method that generalizes them across different input data and social media platforms. The deployed solution combines deep learning models on unstructured data with statistical machine learning models on structured data. We retrieve both account information and multimedia posts from Twitter and Instagram. These data are mapped into feature vectors for an eXtreme Gradient Boosting (XGBoost) classifier. Sixty different topics were analyzed to build a rule-based gold standard dataset and to compare the performance of our approach against baseline classifiers. We demonstrate the effectiveness of our work by comparing the accuracy, precision, recall, and F1 score of our model across different configurations and architectures. Our best-performing model obtained an accuracy of 0.91.
Keywords: deep learning, gradient boosting, image processing, micro-influencers, NLP, social media
Procedia PDF Downloads 183
20539 Green Function and Eshelby Tensor Based on Mindlin’s 2nd Gradient Model: An Explicit Study of Spherical Inclusion Case
Authors: A. Selmi, A. Bisharat
Abstract:
Using the Fourier transform and based on Mindlin's 2nd gradient model, which involves two length scale parameters, the Green's function, the Eshelby tensor, and the Eshelby-like tensor for a spherical inclusion are derived. It is proved that the Eshelby tensor consists of two parts: the classical Eshelby tensor and a gradient part including the length scale parameters, which enables the interpretation of the size effect. When the strain gradient is not taken into account, the obtained Green's function and Eshelby tensor reduce to their analogues in classical elasticity. The Eshelby tensor inside and outside the inclusion, the volume average of the gradient part, and the Eshelby-like tensor are obtained explicitly. Unlike the classical Eshelby tensor, the results show that the components of the new Eshelby tensor vary with the position and the inclusion dimensions. It is demonstrated that the contribution of the gradient part should not be neglected.
Keywords: Eshelby tensor, Eshelby-like tensor, Green’s function, Mindlin’s 2nd gradient model, spherical inclusion
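For reference, the two-part decomposition stated in the abstract can be written as follows for a point x inside the inclusion, with ν the Poisson's ratio and ℓ₁, ℓ₂ the two length scale parameters. The classical part is the standard uniform interior result for a spherical inclusion in an isotropic medium; the gradient part is left symbolic because the abstract does not give its closed form, and the only property assumed for it is the classical limit the abstract describes.

```latex
S_{ijkl}(\mathbf{x}) = S^{\mathrm{c}}_{ijkl} + S^{\mathrm{g}}_{ijkl}(\mathbf{x};\,\ell_1,\ell_2),
\qquad S^{\mathrm{g}}_{ijkl} \to 0 \ \text{as}\ \ell_1,\ell_2 \to 0,

S^{\mathrm{c}}_{ijkl} = \frac{5\nu-1}{15(1-\nu)}\,\delta_{ij}\delta_{kl}
  + \frac{4-5\nu}{15(1-\nu)}\left(\delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk}\right),

\text{so that}\quad
S^{\mathrm{c}}_{1111} = \frac{7-5\nu}{15(1-\nu)},\quad
S^{\mathrm{c}}_{1122} = \frac{5\nu-1}{15(1-\nu)},\quad
S^{\mathrm{c}}_{1212} = \frac{4-5\nu}{15(1-\nu)}.
```

Because the classical part is independent of position and inclusion radius, any dependence on either must come from the gradient part, which is how the size effect enters.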
Procedia PDF Downloads 270
20538 Mathematical Modeling of the Working Principle of Gravity Gradient Instrument
Authors: Danni Cong, Meiping Wu, Hua Mu, Xiaofeng He, Junxiang Lian, Juliang Cao, Shaokun Cai, Hao Qin
Abstract:
The gravity field is of great significance in geoscience, the national economy, and national security, and gravitational gradient measurement has been studied extensively because of its higher accuracy compared with gravity measurement. The gravity gradient sensor, one of the core devices of the gravity gradient instrument, plays a key role in measurement accuracy. This paper therefore starts by analyzing the working principle of the gravity gradient sensor using Newton’s laws, and then considers the relative motion between inertial and non-inertial frames to build an adequate mathematical model, laying a foundation for measurement error calibration and measurement accuracy improvement.
Keywords: gravity gradient, gravity gradient sensor, accelerometer, single-axis rotation modulation
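As a numerical illustration of the quantity such an instrument targets, the sketch below compares the analytic gravity gradient tensor of a point mass, Γᵢⱼ = ∂gᵢ/∂xⱼ = GM(3xᵢxⱼ − r²δᵢⱼ)/r⁵, with a differential estimate from two idealised accelerometers a baseline l apart, Γᵢⱼ ≈ [gᵢ(x + l eⱼ/2) − gᵢ(x − l eⱼ/2)]/l. It deliberately assumes sensors at rest in an inertial frame that read g directly; the paper’s model additionally accounts for the relative motion of the non-inertial (rotating) sensor frame, which this sketch omits. The mass, position, and baseline are invented.

```python
import math

G_CONST = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
EOTVOS = 1e-9         # 1 Eotvos (E) = 1e-9 s^-2, the usual gradient unit


def gravity(mass, pos):
    """Gravitational acceleration g = -G*M*r/|r|^3 of a point mass at the origin."""
    r = math.sqrt(sum(c * c for c in pos))
    return [-G_CONST * mass * c / r ** 3 for c in pos]


def gradient_tensor(mass, pos):
    """Analytic tensor Gamma_ij = dg_i/dx_j = G*M*(3*x_i*x_j - r^2*delta_ij) / r^5."""
    r2 = sum(c * c for c in pos)
    r5 = r2 ** 2.5
    return [[G_CONST * mass * (3 * pos[i] * pos[j] - (r2 if i == j else 0.0)) / r5
             for j in range(3)] for i in range(3)]


def differential_gradient(mass, pos, baseline):
    """Estimate Gamma_ij from an idealised accelerometer pair `baseline` apart along axis j."""
    tensor = []
    for i in range(3):
        row = []
        for j in range(3):
            plus, minus = list(pos), list(pos)
            plus[j] += baseline / 2
            minus[j] -= baseline / 2
            row.append((gravity(mass, plus)[i] - gravity(mass, minus)[i]) / baseline)
        tensor.append(row)
    return tensor


pos = [30.0, 40.0, 120.0]                    # metres from a hypothetical 1e10 kg mass
analytic = gradient_tensor(1e10, pos)
measured = differential_gradient(1e10, pos, baseline=0.5)
analytic_e = [[v / EOTVOS for v in row] for row in analytic]   # same tensor in Eotvos
```

Outside the source mass the tensor is symmetric and traceless (Laplace’s equation), a property real gradiometer data processing often uses as a consistency check.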
Procedia PDF Downloads 327
20537 Machine Learning Prediction of Diabetes Prevalence in the U.S. Using Demographic, Physical, and Lifestyle Indicators: A Study Based on NHANES 2009-2018
Authors: Oluwafunmibi Omotayo Fasanya, Augustine Kena Adjei
Abstract:
This study aims to develop a machine learning model that predicts diabetes (DM) prevalence in the U.S. population from demographic characteristics, physical indicators, and lifestyle habits, and to analyze how these factors contribute to the likelihood of diabetes. We analyzed data from 23,546 non-pregnant participants aged 20 and older from the 2009-2018 National Health and Nutrition Examination Survey (NHANES). The dataset included key demographic (age, sex, ethnicity), physical (BMI, leg length, total cholesterol [TCHOL], fasting plasma glucose), and lifestyle (smoking habits) indicators. A weighted sample was used to account for NHANES survey design features such as stratification and clustering. A classification model was trained to predict diabetes status, with a binary target variable (diabetes or non-diabetes) based on fasting plasma glucose measurements. The following models were evaluated: Logistic Regression (baseline), Random Forest Classifier, Gradient Boosting Machine (GBM), and Support Vector Machine (SVM). Model performance was assessed using accuracy, F1-score, AUC-ROC, and precision-recall metrics. Feature importance was analyzed using SHAP values to interpret the contributions of variables such as age, BMI, ethnicity, and smoking status. The Gradient Boosting Machine (GBM) model outperformed the other classifiers with an AUC-ROC score of 0.85. Feature importance analysis revealed the following key predictors: Age: The most significant predictor, with diabetes prevalence increasing with age and peaking around the 60s for males and the 70s for females. BMI: Higher BMI was strongly associated with a higher risk of diabetes. Ethnicity: Black participants had the highest predicted prevalence of diabetes (14.6%), followed by Mexican-Americans (13.5%) and Whites (10.6%). TCHOL: Diabetics had lower total cholesterol levels, particularly among White participants (mean decline of 23.6 mg/dL).
Smoking: Smoking was associated with a slight increase in diabetes risk among Whites (0.2%) but had a limited effect in other ethnic groups. Using machine learning models, we identified key demographic, physical, and lifestyle predictors of diabetes in the U.S. population. The results confirm that diabetes prevalence varies significantly across age, BMI, and ethnic groups, with lifestyle factors such as smoking contributing differently by ethnicity. These findings provide a basis for more targeted public health interventions and resource allocation for diabetes management.
Keywords: diabetes, NHANES, random forest, gradient boosting machine, support vector machine
Procedia PDF Downloads 8
20536 Identification of Wiener Model Using Iterative Schemes
Authors: Vikram Saini, Lillie Dewan
Abstract:
This paper presents iterative schemes based on least squares, hierarchical least squares, and the stochastic approximation gradient method for identifying a Wiener model with a parametric structure. A gradient method based on stochastic approximation is presented for estimating the parameters of the Wiener model under noisy conditions. Simulation results for Wiener model structures with different static non-linear elements in the presence of colored noise are presented to compare the iterative methods. The stochastic gradient method improves estimation performance and provides fast convergence of the parameter estimates.
Keywords: hard non-linearity, least square, parameter estimation, stochastic approximation gradient, Wiener model
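The abstract does not give the model orders or the form of the static nonlinearity. The sketch below assumes a hypothetical Wiener system with a two-tap FIR linear block (first coefficient fixed to 1 for identifiability) followed by a quadratic nonlinearity, and estimates the three remaining parameters with a plain constant-step stochastic gradient rule, a simpler relative of the stochastic approximation gradient scheme the paper describes. The data are noise-free, unlike the paper’s colored-noise simulations.

```python
import random


def simulate_wiener(u, b2, c1, c2):
    """Hypothetical Wiener system: linear block v(t) = u(t-1) + b2*u(t-2),
    then static nonlinearity y(t) = c1*v(t) + c2*v(t)**2. Returns y for t >= 2."""
    return [c1 * v + c2 * v * v
            for v in (u[t - 1] + b2 * u[t - 2] for t in range(2, len(u)))]


def sg_identify(u, y, step=0.05, init=(0.0, 0.5, 0.0)):
    """Constant-step stochastic gradient estimate of (b2, c1, c2), one sample per update."""
    b2, c1, c2 = init
    for t in range(2, len(u)):
        v = u[t - 1] + b2 * u[t - 2]
        err = y[t - 2] - (c1 * v + c2 * v * v)      # y[k] belongs to time k + 2
        # Gradients of 0.5 * err**2 (chain rule through the linear block for b2).
        g_b2 = -err * (c1 + 2 * c2 * v) * u[t - 2]
        g_c1 = -err * v
        g_c2 = -err * v * v
        b2, c1, c2 = b2 - step * g_b2, c1 - step * g_c1, c2 - step * g_c2
    return b2, c1, c2


rng = random.Random(1)
u = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
y = simulate_wiener(u, b2=0.4, c1=1.0, c2=0.5)   # noise-free toy data
estimate = sg_identify(u, y)
```

Fixing the first linear coefficient removes the scaling ambiguity between the two blocks (the linear gain could otherwise be traded against the nonlinearity’s gain); with colored noise, a constant step would leave the estimates fluctuating, which is one reason decreasing-gain stochastic approximation schemes are used instead.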
Procedia PDF Downloads 405
20535 Reduction of Peak Input Currents during Charge Pump Boosting in Monolithically Integrated High-Voltage Generators
Authors: Jan Doutreloigne
Abstract:
This paper describes two methods for reducing the peak input current during the boosting of Dickson charge pumps. Both methods are implemented in the fully integrated Dickson charge pumps of a high-voltage display driver chip for smart-card applications. Experimental results agree well with SPICE simulations and show a reduction of the peak input current by a factor of 6 during boosting.
Keywords: bi-stable display driver, Dickson charge pump, high-voltage generator, peak current reduction, sub-pump boosting, variable frequency boosting
Procedia PDF Downloads 457
20534 Ensemble Methods in Machine Learning: An Algorithmic Approach to Derive Distinctive Behaviors of Criminal Activity Applied to the Poaching Domain
Authors: Zachary Blanks, Solomon Sonya
Abstract:
Poaching presents a serious threat to endangered animal species, environmental conservation, and human life. Additionally, some poaching activity has even been linked to funding terrorist networks elsewhere in the world. Consequently, agencies dedicated to protecting wildlife habitats face the nearly intractable task of adequately patrolling an entire area (spanning several thousand kilometers) with the limited resources, funds, and personnel at their disposal. Thus, agencies need predictive tools that are both high-performing and easy for the user to implement, to help them learn how significant features (e.g., animal population densities, topography, and behavior patterns of the criminals within the area) interact with each other, in the hope of abating poaching. This research develops a classification model, using machine learning algorithms, to aid in forecasting future attacks; the model is both easy to train and performs well compared with other models. We demonstrate how data imputation methods (specifically predictive mean matching, gradient boosting, and random forest multiple imputation) can be applied to analyze data and create significant predictions across a varied data set. Specifically, we apply these methods to improve the accuracy of adopted prediction models (Logistic Regression, Support Vector Machine, etc.). Finally, we assess the performance of the model and the accuracy of our data imputation methods by training on a real-world data set constituting four years of imputed data and testing on one year of non-imputed data. This paper provides three main contributions. First, we extend work done by the Teamcore and CREATE (Center for Risk and Economic Analysis of Terrorism Events) research groups at the University of Southern California (USC), working in conjunction with the Department of Homeland Security, to apply game theory and machine learning algorithms to develop more efficient ways of reducing poaching.
This research introduces ensemble methods (Random Forests and Stochastic Gradient Boosting) and applies them to real-world poaching data gathered by the rangers of a Ugandan rain forest park. Second, we consider the effect of data imputation on both the performance of various algorithms and the general accuracy of the method itself when applied to a dependent variable with a large number of missing observations. Third, we provide an alternative approach to predicting the probability of observing poaching both by season and by month. The results from this research are very promising. We conclude that by using Stochastic Gradient Boosting to predict observations of non-commercial poaching by season, we are able to produce statistically equivalent results while requiring orders of magnitude less computation time and complexity. Additionally, when predicting potential poaching incidents by individual month rather than by entire season, boosting techniques produce a mean increase in area under the curve of approximately 3% relative to previous predictions made by entire season.
Keywords: ensemble methods, imputation, machine learning, random forests, statistical analysis, stochastic gradient boosting, wildlife protection
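To make "stochastic gradient boosting" concrete: each boosting round fits a small base learner to the negative gradient of the loss, using only a random subsample of the training rows. The sketch below is a from-scratch illustration for binary classification with log-loss and decision stumps; the stump learner, learning rate, subsample fraction, and the choice of mean residuals as leaf values (instead of a Newton step) are simplifying assumptions, and the code uses no poaching data and is not the authors' pipeline.

```python
import math
import random
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature, threshold, left value, right value)


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float],
              idx: Sequence[int]) -> Stump:
    """Least-squares regression stump fitted to residuals r on rows idx."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({X[i][j] for i in idx}):
            left = [r[i] for i in idx if X[i][j] <= thr]
            right = [r[i] for i in idx if X[i][j] > thr]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    if best is None:  # no valid split: constant stump
        mean = sum(r[i] for i in idx) / len(idx)
        return (0, math.inf, mean, mean)
    return best[1:]


def predict_stump(s: Stump, x: Sequence[float]) -> float:
    j, thr, lv, rv = s
    return lv if x[j] <= thr else rv


def sgb_fit(X, y, n_rounds=100, lr=0.1, subsample=0.5, seed=0):
    rng = random.Random(seed)
    n = len(y)
    p0 = sum(y) / n
    if not 0 < p0 < 1:
        raise ValueError("both classes must be present")
    f0 = math.log(p0 / (1 - p0))  # initial log-odds
    F = [f0] * n
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # Negative gradient of the log-loss with respect to the log-odds: y - p.
        r = [yi - 1 / (1 + math.exp(-fi)) for yi, fi in zip(y, F)]
        # Stochastic part: fit the stump on a random subsample of rows.
        idx = rng.sample(range(n), max(1, int(subsample * n)))
        s = fit_stump(X, r, idx)
        stumps.append(s)
        F = [fi + lr * predict_stump(s, xi) for fi, xi in zip(F, X)]
    return f0, lr, stumps


def sgb_predict_proba(model, x) -> float:
    f0, lr, stumps = model
    f = f0 + lr * sum(predict_stump(s, x) for s in stumps)
    return 1 / (1 + math.exp(-f))


def log_loss(model, X, y) -> float:
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(sgb_predict_proba(model, xi), eps), 1 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(y)
```

Subsampling makes each round cheaper and adds randomness that often reduces overfitting, which is consistent in spirit with the speed advantage the abstract reports, though the reported figures come from the authors' own implementation.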
Procedia PDF Downloads 292
20533 Linear Study of Electrostatic Ion Temperature Gradient Mode with Entropy Gradient Drift and Sheared Ion Flows
Authors: M. Yaqub Khan, Usman Shabbir
Abstract:
The history of plasma research shows that the continuous efforts of experimentalists and theorists have so far not succeeded in achieving confinement. A change of approach is needed, one that brings entropy into the research. Almost all quantities, such as number density, temperature, and electrostatic potential, are connected to entropy, so it is worthwhile to change the research approach. Using the Braginskii model with Boltzmann-distributed electrons, the effect of velocity shear on the ion temperature gradient mode is studied while incorporating entropy in the magnetoplasma. A new dispersion relation is derived for the ion temperature gradient mode, showing a dependence on the entropy gradient drift. Velocity shear is also found to enhance the instability; in anomalous transport, however, its role is not significant, whereas that of entropy is. This work may be helpful for the next stage of tokamak and space plasma research.
Keywords: entropy, velocity shear, ion temperature gradient mode, drift
Procedia PDF Downloads 388
20532 A New Car-Following Model with Consideration of the Brake Light
Authors: Zhiyuan Tang, Ju Zhang, Wenyuan Wu
Abstract:
In this research, a car-following model that takes the status of the brake light into account is proposed. The numerical results show that the stability of the traffic flow is improved. The ability of the brake light to reduce car accidents is also shown.
Keywords: brake light, car-following model, traffic flow, regional planning, transportation
Procedia PDF Downloads 579
20531 Stock Prediction and Portfolio Optimization Thesis
Authors: Deniz Peksen
Abstract:
This thesis aims to predict the trend direction of stock closing prices and to maximize portfolio returns by utilizing these predictions. In this context, the study aims to define a stock portfolio strategy from models created using Logistic Regression, Gradient Boosting, and Random Forest. Recently, predicting the trend of stock prices has gained significance in making buy and sell decisions and in generating returns with investment strategies based on machine learning decisions. There are plenty of studies in the literature on predicting stock prices in capital markets using machine learning methods, but most of them focus on closing prices rather than on the direction of the price trend. Our study differs from the literature in its target definition: ours is a classification problem focusing on the market trend over the next 20 trading days. To predict trend direction, fourteen years of data were used for training, the following three years for validation, and the last three years for testing. Training data span 2002-06-18 to 2016-12-30, validation data span 2017-01-02 to 2019-12-31, and testing data span 2020-01-02 to 2022-03-17. We chose a Hold Stock Portfolio, a Best Stock Portfolio, and the USD-TRY exchange rate as benchmarks to outperform, and compared the return of our machine-learning-based portfolio on the test data with their returns. We assessed model performance using the ROC-AUC score and lift charts. We used Logistic Regression, Gradient Boosting, and Random Forest with a grid search approach to fine-tune hyperparameters. In the empirical study, the models could not predict the uptrends and downtrends of the five stocks. When these predictions were used to define buy and sell decisions for a model-based portfolio, that portfolio failed on the test dataset.
We found that model-based buy and sell decisions generated a stock portfolio strategy whose returns could not outperform non-model portfolio strategies on the test dataset. Predicting a trend formulated on stock prices proved to be a challenge, and our results agree with the Random Walk Theory, which states that stock prices or price changes are unpredictable. Although we built several good models on the validation dataset, all of our model iterations failed on the test dataset. We implemented Random Forest, Gradient Boosting, and Logistic Regression and found that the more complex models provided no advantage or additional performance compared with Logistic Regression: more complexity did not lead to better performance, and a complex model is not the answer to the stock prediction problem. Our approach was to predict the trend instead of the price, which converted the problem into a classification task. However, this labeling approach neither solved the stock prediction problem nor refuted the Random Walk Theory for stock prices.
Keywords: stock prediction, portfolio optimization, data science, machine learning
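The target definition and the date-based split are the parts of this setup that are easiest to get subtly wrong. A minimal sketch, assuming the up-trend label is 1 when the close 20 trading days ahead is strictly above the current close (the abstract does not give the exact rule), with the split boundaries copied from the abstract. Function names are hypothetical.

```python
from datetime import date
from typing import Dict, List, Sequence, Tuple

HORIZON = 20  # trading days ahead, as stated in the abstract

# Inclusive split boundaries as stated in the abstract.
SPLITS: Dict[str, Tuple[date, date]] = {
    "train": (date(2002, 6, 18), date(2016, 12, 30)),
    "validation": (date(2017, 1, 2), date(2019, 12, 31)),
    "test": (date(2020, 1, 2), date(2022, 3, 17)),
}


def trend_labels(closes: Sequence[float], horizon: int = HORIZON) -> List[int]:
    """1 if the close `horizon` trading days later is higher than today's, else 0.

    The last `horizon` days have no future close and get no label.
    """
    return [int(closes[i + horizon] > closes[i])
            for i in range(len(closes) - horizon)]


def chronological_split(dates: Sequence[date], labels: Sequence[int]
                        ) -> Dict[str, List[Tuple[date, int]]]:
    """Assign each labelled trading day to the split whose date range holds it.

    zip() stops at the shorter input, so unlabelled trailing days are skipped;
    days outside every range are dropped.
    """
    out: Dict[str, List[Tuple[date, int]]] = {name: [] for name in SPLITS}
    for d, y in zip(dates, labels):
        for name, (start, end) in SPLITS.items():
            if start <= d <= end:
                out[name].append((d, y))
    return out
```

One design point worth checking in such a setup: labels for the last 20 trading days of each period are computed from closes that fall in the next period, so leaving a gap of HORIZON trading days between splits avoids leaking validation or test prices into training labels.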
Procedia PDF Downloads 80
20530 Prediction Modeling of Alzheimer’s Disease and Its Prodromal Stages from Multimodal Data with Missing Values
Authors: M. Aghili, S. Tabarestani, C. Freytes, M. Shojaie, M. Cabrerizo, A. Barreto, N. Rishe, R. E. Curiel, D. Loewenstein, R. Duara, M. Adjouadi
Abstract:
A major challenge in medical studies, especially longitudinal ones, is missing measurements, which hinder the effective application of many machine learning algorithms. Furthermore, recent Alzheimer's Disease studies have focused on distinguishing Early Mild Cognitive Impairment (EMCI) and Late Mild Cognitive Impairment (LMCI) from cognitively normal controls (CN), which is essential for developing effective early treatment methods. To address these challenges, this paper explores the potential of the eXtreme Gradient Boosting (XGBoost) algorithm for handling missing values in multiclass classification. We seek a generalized classification scheme in which all prodromal stages of the disease are considered simultaneously in the classification and decision-making processes. Given the large number of subjects (1631) included in this study and the presence of almost 28% missing values, we investigated the performance of XGBoost in classifying the four classes AD, CN, EMCI, and LMCI. Using 10-fold cross-validation, XGBoost is shown to outperform other state-of-the-art classification algorithms by 3% in terms of accuracy and F-score. Our model achieved an accuracy of 80.52%, a precision of 80.62%, and a recall of 80.51%, supporting the more natural and promising multiclass classification approach.
Keywords: eXtreme gradient boosting, missing data, Alzheimer disease, early mild cognitive impairment, late mild cognitive impairment, multiclass classification, ADNI, support vector machine, random forest
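XGBoost's handling of missing values comes from sparsity-aware split finding: for each candidate split, the rows with a missing feature value are tried on both sides, and the side with the larger gain becomes the learned "default direction". The sketch below shows that idea for a single feature, using the regularized gain ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) - G²/(H+λ)] with gradients g and Hessians h; the γ penalty, min_child_weight, and XGBoost's actual threshold selection are omitted, so this is an illustration rather than the library's implementation.

```python
import math
from typing import Optional, Sequence, Tuple


def split_gain(gl: float, hl: float, gr: float, hr: float, lam: float) -> float:
    """Regularized structure-score gain of a split (gamma penalty omitted)."""
    def score(g: float, h: float) -> float:
        return g * g / (h + lam)
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr))


def best_split_with_missing(x: Sequence[Optional[float]], g: Sequence[float],
                            h: Sequence[float], lam: float = 1.0
                            ) -> Tuple[float, Optional[float], bool]:
    """Return (gain, threshold, default_left) for one feature.

    Rows with x None are excluded from the threshold scan and sent as a
    block to whichever side yields the larger gain.
    """
    present = sorted((xi, gi, hi) for xi, gi, hi in zip(x, g, h) if xi is not None)
    g_miss = sum(gi for xi, gi in zip(x, g) if xi is None)
    h_miss = sum(hi for xi, hi in zip(x, h) if xi is None)
    g_tot, h_tot = sum(g), sum(h)
    best: Tuple[float, Optional[float], bool] = (-math.inf, None, True)
    gl = hl = 0.0
    for k in range(len(present) - 1):
        gl += present[k][1]
        hl += present[k][2]
        if present[k][0] == present[k + 1][0]:
            continue  # no split between equal values
        thr = 0.5 * (present[k][0] + present[k + 1][0])
        # Option 1: missing rows go left.
        gain_left = split_gain(gl + g_miss, hl + h_miss,
                               g_tot - gl - g_miss, h_tot - hl - h_miss, lam)
        # Option 2: missing rows go right.
        gain_right = split_gain(gl, hl, g_tot - gl, h_tot - hl, lam)
        if gain_left > best[0]:
            best = (gain_left, thr, True)
        if gain_right > best[0]:
            best = (gain_right, thr, False)
    return best
```

For squared-error loss with current predictions of 0, g = -y and h = 1. If the rows with missing values have targets resembling the high-x rows, the learned default direction sends them right, so the missing entries are routed by what the training data imply rather than imputed beforehand.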
Procedia PDF Downloads 188
20529 Robust Recognition of Locomotion Patterns via Data-Driven Machine Learning in the Cloud Environment
Authors: Shinoy Vengaramkode Bhaskaran, Kaushik Sathupadi, Sandesh Achar
Abstract:
Human locomotion recognition is important in a variety of sectors, such as robotics, security, healthcare, fitness tracking, and cloud computing. With the increasing pervasiveness of peripheral devices, particularly Inertial Measurement Unit (IMU) sensors, researchers have attempted to exploit these advances to identify and categorize human activities precisely and efficiently. This research paper introduces a state-of-the-art methodology for recognizing human locomotion patterns in a cloud environment, based on a publicly available benchmark dataset. The investigation implements a denoising and windowing strategy to deal with the unprocessed data. Next, feature extraction is adopted to abstract the main cues from the data, and the SelectKBest strategy is used to select the optimal features. Furthermore, state-of-the-art ML classifiers, including logistic regression, random forest, gradient boosting, and SVM, are investigated to evaluate the performance of the system and achieve precise locomotion classification. Finally, a detailed comparative analysis of the results is presented to reveal the performance of the recognition models.
Keywords: artificial intelligence, cloud computing, IoT, human locomotion, gradient boosting, random forest, neural networks, body-worn sensors
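The windowing, feature-extraction, and SelectKBest steps can be sketched as follows. This assumes fixed-size overlapping windows over one IMU channel, four summary statistics per window, and feature ranking by the one-way ANOVA F statistic, which is what scikit-learn's `SelectKBest` uses by default through `f_classif`. The window size, step, and feature set are hypothetical, since the abstract does not state them, and the pure-Python scorer only mirrors the idea rather than reproducing scikit-learn's edge-case behavior.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def sliding_windows(signal: Sequence[float], size: int, step: int) -> List[List[float]]:
    """Fixed-size windows; step < size gives overlapping windows."""
    return [list(signal[i:i + size]) for i in range(0, len(signal) - size + 1, step)]


def window_features(window: Sequence[float]) -> List[float]:
    """Simple per-window statistics: mean, population std, min, max."""
    return [mean(window), pstdev(window), min(window), max(window)]


def anova_f(values: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F statistic of one feature across class labels."""
    groups: Dict[int, List[float]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf")  # perfectly separated groups (simplification)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[int]:
    """Indices of the k features with the highest F score."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```

In a pipeline, each window of each sensor channel becomes one row of features, the window's activity label becomes its class, and only the selected columns are passed to the classifiers.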
Procedia PDF Downloads 11
20528 Comparison of Different Machine Learning Algorithms for Solubility Prediction
Authors: Muhammet Baldan, Emel Timuçin
Abstract:
Molecular solubility prediction plays a crucial role in various fields, such as drug discovery, environmental science, and materials science. In this study, we compare the performance of five machine learning algorithms, namely linear regression, support vector machines (SVM), random forests, gradient boosting machines (GBM), and neural networks, for predicting molecular solubility using the AqSolDB dataset. The dataset consists of 9981 data points with their corresponding solubility values. MACCS keys (166 bits), RDKit properties (20 properties), and structural properties (3) are extracted as features for every SMILES representation in the dataset, giving a total of 189 features per molecule for training and testing. Each algorithm is trained on a subset of the dataset and evaluated using accuracy scores. Additionally, the computational time for training and testing is recorded to assess the efficiency of each algorithm. Our results demonstrate that the random forest model outperformed the other algorithms in terms of predictive accuracy, achieving an accuracy score of 0.93. Gradient boosting machines and neural networks also exhibit strong performance, closely followed by support vector machines. Linear regression, while simpler in nature, demonstrates competitive performance but with slightly higher errors than the ensemble methods. Overall, this study provides valuable insights into the performance of machine learning algorithms for molecular solubility prediction, highlighting the importance of algorithm selection for achieving accurate and efficient predictions in practical applications.
Keywords: random forest, machine learning, comparison, feature extraction
Procedia PDF Downloads 40
20527 Review on Quaternion Gradient Operator with Marginal and Vector Approaches for Colour Edge Detection
Authors: Nadia Ben Youssef, Aicha Bouzid
Abstract:
Gradient estimation is one of the most fundamental tasks in image processing in general, and particularly for color images, since research on color image gradients remains limited. The most widely used gradient method is Di Zenzo’s gradient operator, which is based on the measure of the squared local contrast of color images. The gradient mechanism proposed in this paper is based on the principle of Di Zenzo’s approach, using a quaternion representation. This edge detector is compared with a marginal approach based on the multiscale product of the wavelet transform and with another vector approach based on quaternion convolution and a vector gradient. The experimental results indicate that the proposed color gradient operator outperforms the marginal approach; however, it is less efficient than the second vector approach.
Keywords: gradient, edge detection, color image, quaternion
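For reference, Di Zenzo's operator combines the per-channel derivatives into a 2×2 structure tensor with entries g_xx = Σ(∂C/∂x)², g_yy = Σ(∂C/∂y)², g_xy = Σ(∂C/∂x)(∂C/∂y), summed over the color channels C. The edge direction is θ = ½·atan2(2g_xy, g_xx - g_yy), and the squared gradient magnitude is the tensor's largest eigenvalue. The sketch below computes this at one interior pixel of an RGB image using central differences; it shows the classical operator only, not the quaternion variant proposed in the paper, and the image layout is an assumption.

```python
import math
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B)
Image = List[List[Pixel]]           # image[row][column]


def di_zenzo(img: Image, r: int, c: int) -> Tuple[float, float]:
    """Di Zenzo colour gradient at interior pixel (r, c).

    Returns (magnitude, direction in radians); the magnitude is the square
    root of the largest eigenvalue of the 2x2 colour structure tensor.
    """
    gxx = gyy = gxy = 0.0
    for ch in range(3):
        # Central differences along columns (x) and rows (y).
        dx = 0.5 * (img[r][c + 1][ch] - img[r][c - 1][ch])
        dy = 0.5 * (img[r + 1][c][ch] - img[r - 1][c][ch])
        gxx += dx * dx
        gyy += dy * dy
        gxy += dx * dy
    theta = 0.5 * math.atan2(2.0 * gxy, gxx - gyy)
    lam_max = 0.5 * (gxx + gyy + math.sqrt((gxx - gyy) ** 2 + 4.0 * gxy * gxy))
    return math.sqrt(lam_max), theta


if __name__ == "__main__":
    # Red ramp from left to right: a horizontal colour gradient.
    ramp = [[(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)] for _ in range(3)]
    print(di_zenzo(ramp, 1, 1))  # prints (0.5, 0.0)
```

Unlike a marginal approach that computes one gradient per channel and merges them afterwards, the tensor sums channel contributions before taking the eigenvalue, so gradients that point in opposite directions in different channels do not cancel.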
Procedia PDF Downloads 234
20526 A Refined Nonlocal Strain Gradient Theory for Assessing Scaling-Dependent Vibration Behavior of Microbeams
Authors: Xiaobai Li, Li Li, Yujin Hu, Weiming Deng, Zhe Ding
Abstract:
A size-dependent Euler–Bernoulli beam model, which accounts for the nonlocal stress field, the strain gradient field, and a higher-order inertia force field, is derived from the nonlocal strain gradient theory while considering the velocity gradient effect. The governing equations and boundary conditions are derived in both dimensional and dimensionless form using Hamilton's principle. The analytical solutions based on different continuum theories are compared. The effect of the higher-order inertia terms is extremely significant in the high-frequency range. An asymptotic frequency is found to exist for the proposed beam model, whereas the solutions of the nonlocal strain gradient theory diverge. The effect of the strain gradient field in the thickness direction is significant in the low-frequency domain and cannot be neglected when the material strain length scale parameter is comparable to the beam thickness. The influence of each of the three size-effect parameters on the natural frequencies is investigated. The natural frequencies increase with an increasing material strain gradient length scale parameter, or with a decreasing velocity gradient length scale parameter or nonlocal parameter.
Keywords: Euler-Bernoulli Beams, free vibration, higher order inertia, Nonlocal Strain Gradient Theory, velocity gradient
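The abstract does not give the refined model's extra velocity-gradient and thickness-direction terms, so the sketch below encodes only the standard nonlocal strain gradient baseline it is compared with. For a simply supported Euler–Bernoulli beam, the governing equation EI(∂⁴w/∂x⁴ - l²∂⁶w/∂x⁶) + ρA(1 - (e0a)²∂²/∂x²)∂²w/∂t² = 0 with w = sin(kx)e^{iωt} and k = nπ/L gives ω_n² = (EI/ρA)·k⁴(1 + l²k²)/(1 + (e0a)²k²). Here l is the strain gradient length scale and `ea` stands for the nonlocal parameter e0a; the function name is hypothetical.

```python
import math


def nsgt_frequency(n: int, length: float, EI: float, rhoA: float,
                   l: float = 0.0, ea: float = 0.0) -> float:
    """n-th natural frequency (rad/s) of a simply supported Euler-Bernoulli
    beam under the standard nonlocal strain gradient theory:

        omega_n^2 = (EI / rhoA) * k^4 * (1 + l^2 k^2) / (1 + ea^2 k^2),
        k = n * pi / length.

    l = ea = 0 (or l == ea) recovers the classical beam frequency.
    """
    k = n * math.pi / length
    ratio = (1.0 + (l * k) ** 2) / (1.0 + (ea * k) ** 2)
    return math.sqrt(EI / rhoA * k ** 4 * ratio)


if __name__ == "__main__":
    classical = nsgt_frequency(1, 1.0, 1.0, 1.0)
    print(classical)  # pi**2 for unit length, stiffness and mass per length
    print(nsgt_frequency(1, 1.0, 1.0, 1.0, l=0.1) > classical)   # True: stiffening
    print(nsgt_frequency(1, 1.0, 1.0, 1.0, ea=0.1) < classical)  # True: softening
```

This baseline reproduces two trends stated in the abstract: frequency rises with l and falls with the nonlocal parameter. Because the ratio tends to l²/(e0a)² as k grows, ω_n keeps growing like k² with mode number n. That is one way to see why the plain theory has no asymptotic frequency, which the refined model's higher-order inertia terms are said to supply.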
Procedia PDF Downloads 267
20525 A Cellular Automaton Model Examining the Effects of Oxygen, Hydrogen Ions, and Lactate on Early Tumour Growth
Authors: Maymona Al-Husari, Craig Murdoch, Steven Webb
Abstract:
Some tumours are known to exhibit an extracellular pH that is more acidic than the intracellular pH, creating a 'reversed pH gradient' across the cell membrane, and this has been shown to affect their invasive and metastatic potential. Tumour hypoxia also plays an important role in tumour development and has been directly linked to both tumour morphology and aggressiveness. In this paper, we present a hybrid mathematical model of intracellular pH regulation that examines the effect of oxygen and pH on tumour growth and morphology. In particular, we investigate the impact of pH regulatory mechanisms on the cellular pH gradient and tumour morphology. Analysis of the model shows that low activity of the Na+/H+ exchanger or a high rate of anaerobic glycolysis can give rise to a 'fingering' tumour morphology, and that high activity of the lactate/H+ symporter can result in a reversed transmembrane pH gradient across a large portion of the tumour mass. Also, the reversed pH gradient is spatially heterogeneous within the tumour, with a normal pH gradient observed within an intermediate growth layer, that is, the layer between the proliferative inner and outermost layers of the tumour.
Keywords: acidic pH, cellular automaton, ebola, tumour growth
Procedia PDF Downloads 331