Search results for: stochastic gradient boosting
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1286

Search results for: stochastic gradient boosting

1286 The Use of Stochastic Gradient Boosting Method for Multi-Model Combination of Rainfall-Runoff Models

Authors: Phanida Phukoetphim, Asaad Y. Shamseldin

Abstract:

In this study, the novel Stochastic Gradient Boosting (SGB) combination method is addressed for producing daily river flows from four different rain-runoff models of Ohinemuri catchment, New Zealand. The selected rainfall-runoff models are two empirical black-box models: linear perturbation model and linear varying gain factor model, two conceptual models: soil moisture accounting and routing model and Nedbør-Afrstrømnings model. In this study, the simple average combination method and the weighted average combination method were used as a benchmark for comparing the results of the novel SGB combination method. The models and combination results are evaluated using statistical and graphical criteria. Overall results of this study show that the use of combination technique can certainly improve the simulated river flows of four selected models for Ohinemuri catchment, New Zealand. The results also indicate that the novel SGB combination method is capable of accurate prediction when used in a combination method of the simulated river flows in New Zealand.

Keywords: multi-model combination, rainfall-runoff modeling, stochastic gradient boosting, bioinformatics

Procedia PDF Downloads 297
1285 Identification of Wiener Model Using Iterative Schemes

Authors: Vikram Saini, Lillie Dewan

Abstract:

This paper presents the iterative schemes based on Least square, Hierarchical Least Square and Stochastic Approximation Gradient method for the Identification of Wiener model with parametric structure. A gradient method is presented for the parameter estimation of wiener model with noise conditions based on the stochastic approximation. Simulation results are presented for the Wiener model structure with different static non-linear elements in the presence of colored noise to show the comparative analysis of the iterative methods. The stochastic gradient method shows improvement in the estimation performance and provides fast convergence of the parameters estimates.

Keywords: hard non-linearity, least square, parameter estimation, stochastic approximation gradient, Wiener model

Procedia PDF Downloads 360
1284 Comparison between XGBoost, LightGBM and CatBoost Using a Home Credit Dataset

Authors: Essam Al Daoud

Abstract:

Gradient boosting methods have been proven to be a very important strategy. Many successful machine learning solutions were developed using the XGBoost and its derivatives. The aim of this study is to investigate and compare the efficiency of three gradient methods. Home credit dataset is used in this work which contains 219 features and 356251 records. However, new features are generated and several techniques are used to rank and select the best features. The implementation indicates that the LightGBM is faster and more accurate than CatBoost and XGBoost using variant number of features and records.

Keywords: gradient boosting, XGBoost, LightGBM, CatBoost, home credit

Procedia PDF Downloads 120
1283 An Accelerated Stochastic Gradient Method with Momentum

Authors: Liang Liu, Xiaopeng Luo

Abstract:

In this paper, we propose an accelerated stochastic gradient method with momentum. The momentum term is the weighted average of generated gradients, and the weights decay inverse proportionally with the iteration times. Stochastic gradient descent with momentum (SGDM) uses weights that decay exponentially with the iteration times to generate the momentum term. Using exponential decay weights, variants of SGDM with inexplicable and complicated formats have been proposed to achieve better performance. However, the momentum update rules of our method are as simple as that of SGDM. We provide theoretical convergence analyses, which show both the exponential decay weights and our inverse proportional decay weights can limit the variance of the parameter moving directly to a region. Experimental results show that our method works well with many practical problems and outperforms SGDM.

Keywords: exponential decay rate weight, gradient descent, inverse proportional decay rate weight, momentum

Procedia PDF Downloads 123
1282 Design of Geochemical Maps of Industrial City Using Gradient Boosting and Geographic Information System

Authors: Ruslan Safarov, Zhanat Shomanova, Yuri Nossenko, Zhandos Mussayev, Ayana Baltabek

Abstract:

Geochemical maps of distribution of polluting elements V, Cr, Mn, Co, Ni, Cu, Zn, Mo, Cd, Pb on the territory of the Pavlodar city (Kazakhstan), which is an industrial hub were designed. The samples of soil were taken from 100 locations. Elemental analysis has been performed using XRF. The obtained data was used for training of the computational model with gradient boosting algorithm. The optimal parameters of model as well as the loss function were selected. The computational model was used for prediction of polluting elements concentration for 1000 evenly distributed points. Based on predicted data geochemical maps were created. Additionally, the total pollution index Zc was calculated for every from 1000 point. The spatial distribution of the Zc index was visualized using GIS (QGIS). It was calculated that the maximum coverage area of the territory of the Pavlodar city belongs to the moderately hazardous category (89.7%). The visualization of the obtained data allowed us to conclude that the main source of contamination goes from the industrial zones where the strategic metallurgical and refining plants are placed.

Keywords: Pavlodar, geochemical map, gradient boosting, CatBoost, QGIS, spatial distribution, heavy metals

Procedia PDF Downloads 46
1281 Ensemble Methods in Machine Learning: An Algorithmic Approach to Derive Distinctive Behaviors of Criminal Activity Applied to the Poaching Domain

Authors: Zachary Blanks, Solomon Sonya

Abstract:

Poaching presents a serious threat to endangered animal species, environment conservations, and human life. Additionally, some poaching activity has even been linked to supplying funds to support terrorist networks elsewhere around the world. Consequently, agencies dedicated to protecting wildlife habitats have a near intractable task of adequately patrolling an entire area (spanning several thousand kilometers) given limited resources, funds, and personnel at their disposal. Thus, agencies need predictive tools that are both high-performing and easily implementable by the user to help in learning how the significant features (e.g. animal population densities, topography, behavior patterns of the criminals within the area, etc) interact with each other in hopes of abating poaching. This research develops a classification model using machine learning algorithms to aid in forecasting future attacks that is both easy to train and performs well when compared to other models. In this research, we demonstrate how data imputation methods (specifically predictive mean matching, gradient boosting, and random forest multiple imputation) can be applied to analyze data and create significant predictions across a varied data set. Specifically, we apply these methods to improve the accuracy of adopted prediction models (Logistic Regression, Support Vector Machine, etc). Finally, we assess the performance of the model and the accuracy of our data imputation methods by learning on a real-world data set constituting four years of imputed data and testing on one year of non-imputed data. This paper provides three main contributions. First, we extend work done by the Teamcore and CREATE (Center for Risk and Economic Analysis of Terrorism Events) research group at the University of Southern California (USC) working in conjunction with the Department of Homeland Security to apply game theory and machine learning algorithms to develop more efficient ways of reducing poaching. This research introduces ensemble methods (Random Forests and Stochastic Gradient Boosting) and applies it to real-world poaching data gathered from the Ugandan rain forest park rangers. Next, we consider the effect of data imputation on both the performance of various algorithms and the general accuracy of the method itself when applied to a dependent variable where a large number of observations are missing. Third, we provide an alternate approach to predict the probability of observing poaching both by season and by month. The results from this research are very promising. We conclude that by using Stochastic Gradient Boosting to predict observations for non-commercial poaching by season, we are able to produce statistically equivalent results while being orders of magnitude faster in computation time and complexity. Additionally, when predicting potential poaching incidents by individual month vice entire seasons, boosting techniques produce a mean area under the curve increase of approximately 3% relative to previous prediction schedules by entire seasons.

Keywords: ensemble methods, imputation, machine learning, random forests, statistical analysis, stochastic gradient boosting, wildlife protection

Procedia PDF Downloads 259
1280 Parkinson’s Disease Detection Analysis through Machine Learning Approaches

Authors: Muhtasim Shafi Kader, Fizar Ahmed, Annesha Acharjee

Abstract:

Machine learning and data mining are crucial in health care, as well as medical information and detection. Machine learning approaches are now being utilized to improve awareness of a variety of critical health issues, including diabetes detection, neuron cell tumor diagnosis, COVID 19 identification, and so on. Parkinson’s disease is basically a disease for our senior citizens in Bangladesh. Parkinson's Disease indications often seem progressive and get worst with time. People got affected trouble walking and communicating with the condition advances. Patients can also have psychological and social vagaries, nap problems, hopelessness, reminiscence loss, and weariness. Parkinson's disease can happen in both men and women. Though men are affected by the illness at a proportion that is around partial of them are women. In this research, we have to get out the accurate ML algorithm to find out the disease with a predictable dataset and the model of the following machine learning classifiers. Therefore, nine ML classifiers are secondhand to portion study to use machine learning approaches like as follows, Naive Bayes, Adaptive Boosting, Bagging Classifier, Decision Tree Classifier, Random Forest classifier, XBG Classifier, K Nearest Neighbor Classifier, Support Vector Machine Classifier, and Gradient Boosting Classifier are used.

Keywords: naive bayes, adaptive boosting, bagging classifier, decision tree classifier, random forest classifier, XBG classifier, k nearest neighbor classifier, support vector classifier, gradient boosting classifier

Procedia PDF Downloads 95
1279 Best Resource Recommendation for a Stochastic Process

Authors: Likewin Thomas, M. V. Manoj Kumar, B. Annappa

Abstract:

The aim of this study was to develop an Artificial Neural Network0 s recommendation model for an online process using the complexity of load, performance, and average servicing time of the resources. Here, the proposed model investigates the resource performance using stochastic gradient decent method for learning ranking function. A probabilistic cost function is implemented to identify the optimal θ values (load) on each resource. Based on this result the recommendation of resource suitable for performing the currently executing task is made. The test result of CoSeLoG project is presented with an accuracy of 72.856%.

Keywords: ADALINE, neural network, gradient decent, process mining, resource behaviour, polynomial regression model

Procedia PDF Downloads 349
1278 Stacking Ensemble Approach for Combining Different Methods in Real Estate Prediction

Authors: Sol Girouard, Zona Kostic

Abstract:

A home is often the largest and most expensive purchase a person makes. Whether the decision leads to a successful outcome will be determined by a combination of critical factors. In this paper, we propose a method that efficiently handles all the factors in residential real estate and performs predictions given a feature space with high dimensionality while controlling for overfitting. The proposed method was built on gradient descent and boosting algorithms and uses a mixed optimizing technique to improve the prediction power. Usually, a single model cannot handle all the cases thus our approach builds multiple models based on different subsets of the predictors. The algorithm was tested on 3 million homes across the U.S., and the experimental results demonstrate the efficiency of this approach by outperforming techniques currently used in forecasting prices. With everyday changes on the real estate market, our proposed algorithm capitalizes from new events allowing more efficient predictions.

Keywords: real estate prediction, gradient descent, boosting, ensemble methods, active learning, training

Procedia PDF Downloads 241
1277 Heart Ailment Prediction Using Machine Learning Methods

Authors: Abhigyan Hedau, Priya Shelke, Riddhi Mirajkar, Shreyash Chaple, Mrunali Gadekar, Himanshu Akula

Abstract:

The heart is the coordinating centre of the major endocrine glandular structure of the body, which produces hormones that profoundly affect the operations of the body, and diagnosing cardiovascular disease is a difficult but critical task. By extracting knowledge and information about the disease from patient data, data mining is a more practical technique to help doctors detect disorders. We use a variety of machine learning methods here, including logistic regression and support vector classifiers (SVC), K-nearest neighbours Classifiers (KNN), Decision Tree Classifiers, Random Forest classifiers and Gradient Boosting classifiers. These algorithms are applied to patient data containing 13 different factors to build a system that predicts heart disease in less time with more accuracy.

Keywords: logistic regression, support vector classifier, k-nearest neighbour, decision tree, random forest and gradient boosting

Procedia PDF Downloads 15
1276 Artificial Intelligence-Based Detection of Individuals Suffering from Vestibular Disorder

Authors: Dua Hişam, Serhat İkizoğlu

Abstract:

Identifying the problem behind balance disorder is one of the most interesting topics in the medical literature. This study has considerably enhanced the development of artificial intelligence (AI) algorithms applying multiple machine learning (ML) models to sensory data on gait collected from humans to classify between normal people and those suffering from Vestibular System (VS) problems. Although AI is widely utilized as a diagnostic tool in medicine, AI models have not been used to perform feature extraction and identify VS disorders through training on raw data. In this study, three machine learning (ML) models, the Random Forest Classifier (RF), Extreme Gradient Boosting (XGB), and K-Nearest Neighbor (KNN), have been trained to detect VS disorder, and the performance comparison of the algorithms has been made using accuracy, recall, precision, and f1-score. With an accuracy of 95.28 %, Random Forest Classifier (RF) was the most accurate model.

Keywords: vestibular disorder, machine learning, random forest classifier, k-nearest neighbor, extreme gradient boosting

Procedia PDF Downloads 36
1275 Machine Learning Model to Predict TB Bacteria-Resistant Drugs from TB Isolates

Authors: Rosa Tsegaye Aga, Xuan Jiang, Pavel Vazquez Faci, Siqing Liu, Simon Rayner, Endalkachew Alemu, Markos Abebe

Abstract:

Tuberculosis (TB) is a major cause of disease globally. In most cases, TB is treatable and curable, but only with the proper treatment. There is a time when drug-resistant TB occurs when bacteria become resistant to the drugs that are used to treat TB. Current strategies to identify drug-resistant TB bacteria are laboratory-based, and it takes a longer time to identify the drug-resistant bacteria and treat the patient accordingly. But machine learning (ML) and data science approaches can offer new approaches to the problem. In this study, we propose to develop an ML-based model to predict the antibiotic resistance phenotypes of TB isolates in minutes and give the right treatment to the patient immediately. The study has been using the whole genome sequence (WGS) of TB isolates as training data that have been extracted from the NCBI repository and contain different countries’ samples to build the ML models. The reason that different countries’ samples have been included is to generalize the large group of TB isolates from different regions in the world. This supports the model to train different behaviors of the TB bacteria and makes the model robust. The model training has been considering three pieces of information that have been extracted from the WGS data to train the model. These are all variants that have been found within the candidate genes (F1), predetermined resistance-associated variants (F2), and only resistance-associated gene information for the particular drug. Two major datasets have been constructed using these three information. F1 and F2 information have been considered as two independent datasets, and the third information is used as a class to label the two datasets. Five machine learning algorithms have been considered to train the model. These are Support Vector Machine (SVM), Random forest (RF), Logistic regression (LR), Gradient Boosting, and Ada boost algorithms. The models have been trained on the datasets F1, F2, and F1F2 that is the F1 and the F2 dataset merged. Additionally, an ensemble approach has been used to train the model. The ensemble approach has been considered to run F1 and F2 datasets on gradient boosting algorithm and use the output as one dataset that is called F1F2 ensemble dataset and train a model using this dataset on the five algorithms. As the experiment shows, the ensemble approach model that has been trained on the Gradient Boosting algorithm outperformed the rest of the models. In conclusion, this study suggests the ensemble approach, that is, the RF + Gradient boosting model, to predict the antibiotic resistance phenotypes of TB isolates by outperforming the rest of the models.

Keywords: machine learning, MTB, WGS, drug resistant TB

Procedia PDF Downloads 16
1274 Reduction of Peak Input Currents during Charge Pump Boosting in Monolithically Integrated High-Voltage Generators

Authors: Jan Doutreloigne

Abstract:

This paper describes two methods for the reduction of the peak input current during the boosting of Dickson charge pumps. Both methods are implemented in the fully integrated Dickson charge pumps of a high-voltage display driver chip for smart-card applications. Experimental results reveal good correspondence with Spice simulations and show a reduction of the peak input current by a factor of 6 during boosting

Keywords: bi-stable display driver, Dickson charge pump, high-voltage generator, peak current reduction, sub-pump boosting, variable frequency boosting

Procedia PDF Downloads 411
1273 Advancements in Predicting Diabetes Biomarkers: A Machine Learning Epigenetic Approach

Authors: James Ladzekpo

Abstract:

Background: The urgent need to identify new pharmacological targets for diabetes treatment and prevention has been amplified by the disease's extensive impact on individuals and healthcare systems. A deeper insight into the biological underpinnings of diabetes is crucial for the creation of therapeutic strategies aimed at these biological processes. Current predictive models based on genetic variations fall short of accurately forecasting diabetes. Objectives: Our study aims to pinpoint key epigenetic factors that predispose individuals to diabetes. These factors will inform the development of an advanced predictive model that estimates diabetes risk from genetic profiles, utilizing state-of-the-art statistical and data mining methods. Methodology: We have implemented a recursive feature elimination with cross-validation using the support vector machine (SVM) approach for refined feature selection. Building on this, we developed six machine learning models, including logistic regression, k-Nearest Neighbors (k-NN), Naive Bayes, Random Forest, Gradient Boosting, and Multilayer Perceptron Neural Network, to evaluate their performance. Findings: The Gradient Boosting Classifier excelled, achieving a median recall of 92.17% and outstanding metrics such as area under the receiver operating characteristics curve (AUC) with a median of 68%, alongside median accuracy and precision scores of 76%. Through our machine learning analysis, we identified 31 genes significantly associated with diabetes traits, highlighting their potential as biomarkers and targets for diabetes management strategies. Conclusion: Particularly noteworthy were the Gradient Boosting Classifier and Multilayer Perceptron Neural Network, which demonstrated potential in diabetes outcome prediction. We recommend future investigations to incorporate larger cohorts and a wider array of predictive variables to enhance the models' predictive capabilities.

Keywords: diabetes, machine learning, prediction, biomarkers

Procedia PDF Downloads 11
1272 Investigation of Extreme Gradient Boosting Model Prediction of Soil Strain-Shear Modulus

Authors: Ehsan Mehryaar, Reza Bushehri

Abstract:

One of the principal parameters defining the clay soil dynamic response is the strain-shear modulus relation. Predicting the strain and, subsequently, shear modulus reduction of the soil is essential for performance analysis of structures exposed to earthquake and dynamic loadings. Many soil properties affect soil’s dynamic behavior. In order to capture those effects, in this study, a database containing 1193 data points consists of maximum shear modulus, strain, moisture content, initial void ratio, plastic limit, liquid limit, initial confining pressure resulting from dynamic laboratory testing of 21 clays is collected for predicting the shear modulus vs. strain curve of soil. A model based on an extreme gradient boosting technique is proposed. A tree-structured parzan estimator hyper-parameter tuning algorithm is utilized simultaneously to find the best hyper-parameters for the model. The performance of the model is compared to the existing empirical equations using the coefficient of correlation and root mean square error.

Keywords: XGBoost, hyper-parameter tuning, soil shear modulus, dynamic response

Procedia PDF Downloads 161
1271 Customer Churn Prediction by Using Four Machine Learning Algorithms Integrating Features Selection and Normalization in the Telecom Sector

Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh

Abstract:

A crucial component of maintaining a customer-oriented business as in the telecom industry is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years. It has become more important to understand customers’ needs in this strong market of telecom industries, especially for those who are looking to turn over their service providers. So, predictive churn is now a mandatory requirement for retaining those customers. Machine learning can be utilized to accomplish this. Churn Prediction has become a very important topic in terms of machine learning classification in the telecommunications industry. Understanding the factors of customer churn and how they behave is very important to building an effective churn prediction model. This paper aims to predict churn and identify factors of customers’ churn based on their past service usage history. Aiming at this objective, the study makes use of feature selection, normalization, and feature engineering. Then, this study compared the performance of four different machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Evaluation of the performance was conducted by using the F1 score and ROC-AUC. Comparing the results of this study with existing models has proven to produce better results. The results showed the Gradients Boosting with feature selection technique outperformed in this study by achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.

Keywords: machine learning, gradient boosting, logistic regression, churn, random forest, decision tree, ROC, AUC, F1-score

Procedia PDF Downloads 99
1270 Review on Quaternion Gradient Operator with Marginal and Vector Approaches for Colour Edge Detection

Authors: Nadia Ben Youssef, Aicha Bouzid

Abstract:

Gradient estimation is one of the most fundamental tasks in the field of image processing in general, and more particularly for color images since that the research in color image gradient remains limited. The widely used gradient method is Di Zenzo’s gradient operator, which is based on the measure of squared local contrast of color images. The proposed gradient mechanism, presented in this paper, is based on the principle of the Di Zenzo’s approach using quaternion representation. This edge detector is compared to a marginal approach based on multiscale product of wavelet transform and another vector approach based on quaternion convolution and vector gradient approach. The experimental results indicate that the proposed color gradient operator outperforms marginal approach, however, it is less efficient then the second vector approach.

Keywords: gradient, edge detection, color image, quaternion

Procedia PDF Downloads 195
1269 Stochastic Age-Structured Population Models

Authors: Arcady Ponosov

Abstract:

Many well-known age-structured population models are derived from the celebrated McKendrick-von Foerster equation (MFE), also called the biological conservation law. A similar technique is suggested for the stochastically perturbed MFE. This technique is shown to produce stochastic versions of the deterministic population models, which appear to be very different from those one can construct by simply appending additive stochasticity to deterministic equations. In particular, it is shown that stochastic Nicholson’s blowflies model should contain both additive and multiplicative stochastic noises. The suggested transformation technique is similar to that used in the deterministic case. The difference is hidden in the formulas for the exact solutions of the simplified boundary value problem for the stochastically perturbed MFE. The analysis is also based on the theory of stochastic delay differential equations.

Keywords: boundary value problems, population models, stochastic delay differential equations, stochastic partial differential equation

Procedia PDF Downloads 209
1268 Mathematical Modeling of the Working Principle of Gravity Gradient Instrument

Authors: Danni Cong, Meiping Wu, Hua Mu, Xiaofeng He, Junxiang Lian, Juliang Cao, Shaokun Cai, Hao Qin

Abstract:

Gravity field is of great significance in geoscience, national economy and national security, and gravitational gradient measurement has been extensively studied due to its higher accuracy than gravity measurement. Gravity gradient sensor, being one of core devices of the gravity gradient instrument, plays a key role in measuring accuracy. Therefore, this paper starts from analyzing the working principle of the gravity gradient sensor by Newton’s law, and then considers the relative motion between inertial and non-inertial systems to build a relatively adequate mathematical model, laying a foundation for the measurement error calibration, measurement accuracy improvement.

Keywords: gravity gradient, gravity gradient sensor, accelerometer, single-axis rotation modulation

Procedia PDF Downloads 283
1267 Least Squares Solution for Linear Quadratic Gaussian Problem with Stochastic Approximation Approach

Authors: Sie Long Kek, Wah June Leong, Kok Lay Teo

Abstract:

Linear quadratic Gaussian model is a standard mathematical model for the stochastic optimal control problem. The combination of the linear quadratic estimation and the linear quadratic regulator allows the state estimation and the optimal control policy to be designed separately. This is known as the separation principle. In this paper, an efficient computational method is proposed to solve the linear quadratic Gaussian problem. In our approach, the Hamiltonian function is defined, and the necessary conditions are derived. In addition to this, the output error is defined and the least-square optimization problem is introduced. By determining the first-order necessary condition, the gradient of the sum squares of output error is established. On this point of view, the stochastic approximation approach is employed such that the optimal control policy is updated. Within a given tolerance, the iteration procedure would be stopped and the optimal solution of the linear-quadratic Gaussian problem is obtained. For illustration, an example of the linear-quadratic Gaussian problem is studied. The result shows the efficiency of the approach proposed. In conclusion, the applicability of the approach proposed for solving the linear quadratic Gaussian problem is highly demonstrated.

Keywords: iteration procedure, least squares solution, linear quadratic Gaussian, output error, stochastic approximation

Procedia PDF Downloads 129
1266 Machine Learning for Aiding Meningitis Diagnosis in Pediatric Patients

Authors: Karina Zaccari, Ernesto Cordeiro Marujo

Abstract:

This paper presents a Machine Learning (ML) approach to support Meningitis diagnosis in patients at a children’s hospital in Sao Paulo, Brazil. The aim is to use ML techniques to reduce the use of invasive procedures, such as cerebrospinal fluid (CSF) collection, as much as possible. In this study, we focus on predicting the probability of Meningitis given the results of a blood and urine laboratory tests, together with the analysis of pain or other complaints from the patient. We tested a number of different ML algorithms, including: Adaptative Boosting (AdaBoost), Decision Tree, Gradient Boosting, K-Nearest Neighbors (KNN), Logistic Regression, Random Forest and Support Vector Machines (SVM). Decision Tree algorithm performed best, with 94.56% and 96.18% accuracy for training and testing data, respectively. These results represent a significant aid to doctors in diagnosing Meningitis as early as possible and in preventing expensive and painful procedures on some children.

Keywords: machine learning, medical diagnosis, meningitis detection, pediatric research

Procedia PDF Downloads 116
1265 MIMIC: A Multi Input Micro-Influencers Classifier

Authors: Simone Leonardi, Luca Ardito

Abstract:

Micro-influencers are effective elements in the marketing strategies of companies and institutions because of their capability to create an hyper-engaged audience around a specific topic of interest. In recent years, many scientific approaches and commercial tools have handled the task of detecting this type of social media users. These strategies adopt solutions ranging from rule based machine learning models to deep neural networks and graph analysis on text, images, and account information. This work compares the existing solutions and proposes an ensemble method to generalize them with different input data and social media platforms. The deployed solution combines deep learning models on unstructured data with statistical machine learning models on structured data. We retrieve both social media accounts information and multimedia posts on Twitter and Instagram. These data are mapped into feature vectors for an eXtreme Gradient Boosting (XGBoost) classifier. Sixty different topics have been analyzed to build a rule based gold standard dataset and to compare the performances of our approach against baseline classifiers. We prove the effectiveness of our work by comparing the accuracy, precision, recall, and f1 score of our model with different configurations and architectures. We obtained an accuracy of 0.91 with our best performing model.

Keywords: deep learning, gradient boosting, image processing, micro-influencers, NLP, social media

Procedia PDF Downloads 132
1264 Scoring System for the Prognosis of Sepsis Patients in Intensive Care Units

Authors: Javier E. García-Gallo, Nelson J. Fonseca-Ruiz, John F. Duitama-Munoz

Abstract:

Sepsis is a syndrome that occurs with physiological and biochemical abnormalities induced by severe infection and carries a high mortality and morbidity, therefore the severity of its condition must be interpreted quickly. After patient admission in an intensive care unit (ICU), it is necessary to synthesize the large volume of information that is collected from patients in a value that represents the severity of their condition. Traditional severity of illness scores seeks to be applicable to all patient populations, and usually assess in-hospital mortality. However, the use of machine learning techniques and the data of a population that shares a common characteristic could lead to the development of customized mortality prediction scores with better performance. This study presents the development of a score for the one-year mortality prediction of the patients that are admitted to an ICU with a sepsis diagnosis. 5650 ICU admissions extracted from the MIMICIII database were evaluated, divided into two groups: 70% to develop the score and 30% to validate it. Comorbidities, demographics and clinical information of the first 24 hours after the ICU admission were used to develop a mortality prediction score. LASSO (least absolute shrinkage and selection operator) and SGB (Stochastic Gradient Boosting) variable importance methodologies were used to select the set of variables that make up the developed score; each of this variables was dichotomized and a cut-off point that divides the population into two groups with different mean mortalities was found; if the patient is in the group that presents a higher mortality a one is assigned to the particular variable, otherwise a zero is assigned. These binary variables are used in a logistic regression (LR) model, and its coefficients were rounded to the nearest integer. The resulting integers are the point values that make up the score when multiplied with each binary variables and summed. The one-year mortality probability was estimated using the score as the only variable in a LR model. Predictive power of the score, was evaluated using the 1695 admissions of the validation subset obtaining an area under the receiver operating characteristic curve of 0.7528, which outperforms the results obtained with Sequential Organ Failure Assessment (SOFA), Oxford Acute Severity of Illness Score (OASIS) and Simplified Acute Physiology Score II (SAPSII) scores on the same validation subset. Observed and predicted mortality rates within estimated probabilities deciles were compared graphically and found to be similar, indicating that the risk estimate obtained with the score is close to the observed mortality, it is also observed that the number of events (deaths) is indeed increasing as the outcome go from the decile with the lowest probabilities to the decile with the highest probabilities. Sepsis is a syndrome that carries a high mortality, 43.3% for the patients included in this study; therefore, tools that help clinicians to quickly and accurately predict a worse prognosis are needed. This work demonstrates the importance of customization of mortality prediction scores since the developed score provides better performance than traditional scoring systems.

Keywords: intensive care, logistic regression model, mortality prediction, sepsis, severity of illness, stochastic gradient boosting

Procedia PDF Downloads 180
1263 Solving SPDEs by Least Squares Method

Authors: Hassan Manouzi

Abstract:

We present in this paper a useful strategy to solve stochastic partial differential equations (SPDEs) involving stochastic coefficients. Using the Wick-product of higher order and the Wiener-Itˆo chaos expansion, the SPDEs is reformulated as a large system of deterministic partial differential equations. To reduce the computational complexity of this system, we shall use a decomposition-coordination method. To obtain the chaos coefficients in the corresponding deterministic equations, we use a least square formulation. Once this approximation is performed, the statistics of the numerical solution can be easily evaluated.

Keywords: least squares, wick product, SPDEs, finite element, wiener chaos expansion, gradient method

Procedia PDF Downloads 380
1262 Non-Stationary Stochastic Optimization of an Oscillating Water Column

Authors: María L. Jalón, Feargal Brennan

Abstract:

A non-stationary stochastic optimization methodology is applied to an OWC (oscillating water column) to find the design that maximizes the wave energy extraction. Different temporal cycles are considered to represent the long-term variability of the wave climate at the site in the optimization problem. The results of the non-stationary stochastic optimization problem are compared against those obtained by a stationary stochastic optimization problem. The comparative analysis reveals that the proposed non-stationary optimization provides designs with a better fit to reality. However, the stationarity assumption can be adequate when looking at averaged system response.

Keywords: non-stationary stochastic optimization, oscillating water, temporal variability, wave energy

Procedia PDF Downloads 330
1261 A New Modification of Nonlinear Conjugate Gradient Coefficients with Global Convergence Properties

Authors: Ahmad Alhawarat, Mustafa Mamat, Mohd Rivaie, Ismail Mohd

Abstract:

Conjugate gradient method has been enormously used to solve large scale unconstrained optimization problems due to the number of iteration, memory, CPU time, and convergence property, in this paper we find a new class of nonlinear conjugate gradient coefficient with global convergence properties proved by exact line search. The numerical results for our new βK give a good result when it compared with well-known formulas.

Keywords: conjugate gradient method, conjugate gradient coefficient, global convergence

Procedia PDF Downloads 407
1260 Weak Solutions Of Stochastic Fractional Differential Equations

Authors: Lev Idels, Arcady Ponosov

Abstract:

Stochastic fractional differential equations have recently attracted considerable attention, as they have been used to model real-world processes, which are subject to natural memory effects and measurement uncertainties. Compared to conventional hereditary differential equations, one of the advantages of fractional differential equations is related to more realistic geometric properties of their trajectories that do not intersect in the phase space. In this report, a Peano-like existence theorem for nonlinear stochastic fractional differential equations is proven under very general hypotheses. Several specific classes of equations are checked to satisfy these hypotheses, including delay equations driven by the fractional Brownian motion, stochastic fractional neutral equations and many others.

Keywords: delay equations, operator methods, stochastic noise, weak solutions

Procedia PDF Downloads 163
1259 Lyapunov and Input-to-State Stability of Stochastic Differential Equations

Authors: Arcady Ponosov, Ramazan Kadiev

Abstract:

Input-to-State Stability (ISS) is widely used in deterministic control theory but less known in the stochastic case. Roughly speaking, the theory explains when small perturbations of the right-hand sides of the system on the entire semiaxis cause only small changes in the solutions of the system, again on the entire semiaxis. This property is crucial in many applications. In the report, we explain how to define and study ISS for systems of linear stochastic differential equations with or without delays. The central result connects ISS with the property of Lyapunov stability. This relationship is well-known in the deterministic setting, but its stochastic version is new. As an application, a method of studying asymptotic Lyapunov stability for stochastic delay equations is described and justified. Several examples are provided that confirm the efficiency and simplicity of the framework.

Keywords: asymptotic stability, delay equations, operator methods, stochastic perturbations

Procedia PDF Downloads 143
1258 Finite-Sum Optimization: Adaptivity to Smoothness and Loopless Variance Reduction

Authors: Bastien Batardière, Joon Kwon

Abstract:

For finite-sum optimization, variance-reduced gradient methods (VR) compute at each iteration the gradient of a single function (or of a mini-batch), and yet achieve faster convergence than SGD thanks to a carefully crafted lower-variance stochastic gradient estimator that reuses past gradients. Another important line of research of the past decade in continuous optimization is the adaptive algorithms such as AdaGrad, that dynamically adjust the (possibly coordinate-wise) learning rate to past gradients and thereby adapt to the geometry of the objective function. Variants such as RMSprop and Adam demonstrate outstanding practical performance that have contributed to the success of deep learning. In this work, we present AdaLVR, which combines the AdaGrad algorithm with loopless variance-reduced gradient estimators such as SAGA or L-SVRG that benefits from a straightforward construction and a streamlined analysis. We assess that AdaLVR inherits both good convergence properties from VR methods and the adaptive nature of AdaGrad: in the case of L-smooth convex functions we establish a gradient complexity of O(n + (L + √ nL)/ε) without prior knowledge of L. Numerical experiments demonstrate the superiority of AdaLVR over state-of-the-art methods. Moreover, we empirically show that the RMSprop and Adam algorithm combined with variance-reduced gradients estimators achieve even faster convergence.

Keywords: convex optimization, variance reduction, adaptive algorithms, loopless

Procedia PDF Downloads 16
1257 Prediction Modeling of Alzheimer’s Disease and Its Prodromal Stages from Multimodal Data with Missing Values

Authors: M. Aghili, S. Tabarestani, C. Freytes, M. Shojaie, M. Cabrerizo, A. Barreto, N. Rishe, R. E. Curiel, D. Loewenstein, R. Duara, M. Adjouadi

Abstract:

A major challenge in medical studies, especially those that are longitudinal, is the problem of missing measurements which hinders the effective application of many machine learning algorithms. Furthermore, recent Alzheimer's Disease studies have focused on the delineation of Early Mild Cognitive Impairment (EMCI) and Late Mild Cognitive Impairment (LMCI) from cognitively normal controls (CN) which is essential for developing effective and early treatment methods. To address the aforementioned challenges, this paper explores the potential of using the eXtreme Gradient Boosting (XGBoost) algorithm in handling missing values in multiclass classification. We seek a generalized classification scheme where all prodromal stages of the disease are considered simultaneously in the classification and decision-making processes. Given the large number of subjects (1631) included in this study and in the presence of almost 28% missing values, we investigated the performance of XGBoost on the classification of the four classes of AD, NC, EMCI, and LMCI. Using 10-fold cross validation technique, XGBoost is shown to outperform other state-of-the-art classification algorithms by 3% in terms of accuracy and F-score. Our model achieved an accuracy of 80.52%, a precision of 80.62% and recall of 80.51%, supporting the more natural and promising multiclass classification.

Keywords: eXtreme gradient boosting, missing data, Alzheimer disease, early mild cognitive impairment, late mild cognitive impair, multiclass classification, ADNI, support vector machine, random forest

Procedia PDF Downloads 156