Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3

stochastic gradient boosting Related Abstracts

3 The Use of Stochastic Gradient Boosting Method for Multi-Model Combination of Rainfall-Runoff Models

Authors: Phanida Phukoetphim, Asaad Y. Shamseldin

Abstract:

In this study, the novel Stochastic Gradient Boosting (SGB) combination method is addressed for producing daily river flows from four different rain-runoff models of Ohinemuri catchment, New Zealand. The selected rainfall-runoff models are two empirical black-box models: linear perturbation model and linear varying gain factor model, two conceptual models: soil moisture accounting and routing model and Nedbør-Afrstrømnings model. In this study, the simple average combination method and the weighted average combination method were used as a benchmark for comparing the results of the novel SGB combination method. The models and combination results are evaluated using statistical and graphical criteria. Overall results of this study show that the use of combination technique can certainly improve the simulated river flows of four selected models for Ohinemuri catchment, New Zealand. The results also indicate that the novel SGB combination method is capable of accurate prediction when used in a combination method of the simulated river flows in New Zealand.

Keywords: Bioinformatics, multi-model combination, rainfall-runoff modeling, stochastic gradient boosting

Procedia PDF Downloads 192
2 Ensemble Methods in Machine Learning: An Algorithmic Approach to Derive Distinctive Behaviors of Criminal Activity Applied to the Poaching Domain

Authors: Zachary Blanks, Solomon Sonya

Abstract:

Poaching presents a serious threat to endangered animal species, environment conservations, and human life. Additionally, some poaching activity has even been linked to supplying funds to support terrorist networks elsewhere around the world. Consequently, agencies dedicated to protecting wildlife habitats have a near intractable task of adequately patrolling an entire area (spanning several thousand kilometers) given limited resources, funds, and personnel at their disposal. Thus, agencies need predictive tools that are both high-performing and easily implementable by the user to help in learning how the significant features (e.g. animal population densities, topography, behavior patterns of the criminals within the area, etc) interact with each other in hopes of abating poaching. This research develops a classification model using machine learning algorithms to aid in forecasting future attacks that is both easy to train and performs well when compared to other models. In this research, we demonstrate how data imputation methods (specifically predictive mean matching, gradient boosting, and random forest multiple imputation) can be applied to analyze data and create significant predictions across a varied data set. Specifically, we apply these methods to improve the accuracy of adopted prediction models (Logistic Regression, Support Vector Machine, etc). Finally, we assess the performance of the model and the accuracy of our data imputation methods by learning on a real-world data set constituting four years of imputed data and testing on one year of non-imputed data. This paper provides three main contributions. First, we extend work done by the Teamcore and CREATE (Center for Risk and Economic Analysis of Terrorism Events) research group at the University of Southern California (USC) working in conjunction with the Department of Homeland Security to apply game theory and machine learning algorithms to develop more efficient ways of reducing poaching. This research introduces ensemble methods (Random Forests and Stochastic Gradient Boosting) and applies it to real-world poaching data gathered from the Ugandan rain forest park rangers. Next, we consider the effect of data imputation on both the performance of various algorithms and the general accuracy of the method itself when applied to a dependent variable where a large number of observations are missing. Third, we provide an alternate approach to predict the probability of observing poaching both by season and by month. The results from this research are very promising. We conclude that by using Stochastic Gradient Boosting to predict observations for non-commercial poaching by season, we are able to produce statistically equivalent results while being orders of magnitude faster in computation time and complexity. Additionally, when predicting potential poaching incidents by individual month vice entire seasons, boosting techniques produce a mean area under the curve increase of approximately 3% relative to previous prediction schedules by entire seasons.

Keywords: Statistical Analysis, Machine Learning, Wildlife Protection, stochastic gradient boosting, imputation, random forests, ensemble methods

Procedia PDF Downloads 154
1 Scoring System for the Prognosis of Sepsis Patients in Intensive Care Units

Authors: Javier E. García-Gallo, Nelson J. Fonseca-Ruiz, John F. Duitama-Munoz

Abstract:

Sepsis is a syndrome that occurs with physiological and biochemical abnormalities induced by severe infection and carries a high mortality and morbidity, therefore the severity of its condition must be interpreted quickly. After patient admission in an intensive care unit (ICU), it is necessary to synthesize the large volume of information that is collected from patients in a value that represents the severity of their condition. Traditional severity of illness scores seeks to be applicable to all patient populations, and usually assess in-hospital mortality. However, the use of machine learning techniques and the data of a population that shares a common characteristic could lead to the development of customized mortality prediction scores with better performance. This study presents the development of a score for the one-year mortality prediction of the patients that are admitted to an ICU with a sepsis diagnosis. 5650 ICU admissions extracted from the MIMICIII database were evaluated, divided into two groups: 70% to develop the score and 30% to validate it. Comorbidities, demographics and clinical information of the first 24 hours after the ICU admission were used to develop a mortality prediction score. LASSO (least absolute shrinkage and selection operator) and SGB (Stochastic Gradient Boosting) variable importance methodologies were used to select the set of variables that make up the developed score; each of this variables was dichotomized and a cut-off point that divides the population into two groups with different mean mortalities was found; if the patient is in the group that presents a higher mortality a one is assigned to the particular variable, otherwise a zero is assigned. These binary variables are used in a logistic regression (LR) model, and its coefficients were rounded to the nearest integer. The resulting integers are the point values that make up the score when multiplied with each binary variables and summed. The one-year mortality probability was estimated using the score as the only variable in a LR model. Predictive power of the score, was evaluated using the 1695 admissions of the validation subset obtaining an area under the receiver operating characteristic curve of 0.7528, which outperforms the results obtained with Sequential Organ Failure Assessment (SOFA), Oxford Acute Severity of Illness Score (OASIS) and Simplified Acute Physiology Score II (SAPSII) scores on the same validation subset. Observed and predicted mortality rates within estimated probabilities deciles were compared graphically and found to be similar, indicating that the risk estimate obtained with the score is close to the observed mortality, it is also observed that the number of events (deaths) is indeed increasing as the outcome go from the decile with the lowest probabilities to the decile with the highest probabilities. Sepsis is a syndrome that carries a high mortality, 43.3% for the patients included in this study; therefore, tools that help clinicians to quickly and accurately predict a worse prognosis are needed. This work demonstrates the importance of customization of mortality prediction scores since the developed score provides better performance than traditional scoring systems.

Keywords: Intensive Care, sepsis, logistic regression model, stochastic gradient boosting, mortality prediction, severity of illness

Procedia PDF Downloads 74