Search results for: logistic regression models
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 9018

Search results for: logistic regression models

8748 Prediction of Compressive Strength Using Artificial Neural Network

Authors: Vijay Pal Singh, Yogesh Chandra Kotiyal

Abstract:

Structures are a combination of various load carrying members which transfer the loads to the foundation from the superstructure safely. At the design stage, the loading of the structure is defined and appropriate material choices are made based upon their properties, mainly related to strength. The strength of materials kept on reducing with time because of many factors like environmental exposure and deformation caused by unpredictable external loads. Hence, to predict the strength of materials used in structures, various techniques are used. Among these techniques, Non-Destructive Techniques (NDT) are the one that can be used to predict the strength without damaging the structure. In the present study, the compressive strength of concrete has been predicted using Artificial Neural Network (ANN). The predicted strength was compared with the experimentally obtained actual compressive strength of concrete and equations were developed for different models. A good co-relation has been obtained between the predicted strength by these models and experimental values. Further, the co-relation has been developed using two NDT techniques for prediction of strength by regression analysis. It was found that the percentage error has been reduced between the predicted strength by using combined techniques in place of single techniques.

Keywords: rebound, ultra-sonic pulse, penetration, ANN, NDT, regression

Procedia PDF Downloads 398
8747 Chemometric Regression Analysis of Radical Scavenging Ability of Kombucha Fermented Kefir-Like Products

Authors: Strahinja Kovacevic, Milica Karadzic Banjac, Jasmina Vitas, Stefan Vukmanovic, Radomir Malbasa, Lidija Jevric, Sanja Podunavac-Kuzmanovic

Abstract:

The present study deals with chemometric regression analysis of quality parameters and the radical scavenging ability of kombucha fermented kefir-like products obtained with winter savory (WS), peppermint (P), stinging nettle (SN) and wild thyme tea (WT) kombucha inoculums. Each analyzed sample was described by milk fat content (MF, %), total unsaturated fatty acids content (TUFA, %), monounsaturated fatty acids content (MUFA, %), polyunsaturated fatty acids content (PUFA, %), the ability of free radicals scavenging (RSA Dₚₚₕ, % and RSA.ₒₕ, %) and pH values measured after each hour from the start until the end of fermentation. The aim of the conducted regression analysis was to establish chemometric models which can predict the radical scavenging ability (RSA Dₚₚₕ, % and RSA.ₒₕ, %) of the samples by correlating it with the MF, TUFA, MUFA, PUFA and the pH value at the beginning, in the middle and at the end of fermentation process which lasted between 11 and 17 hours, until pH value of 4.5 was reached. The analysis was carried out applying univariate linear (ULR) and multiple linear regression (MLR) methods on the raw data and the data standardized by the min-max normalization method. The obtained models were characterized by very limited prediction power (poor cross-validation parameters) and weak statistical characteristics. Based on the conducted analysis it can be concluded that the resulting radical scavenging ability cannot be precisely predicted only on the basis of MF, TUFA, MUFA, PUFA content, and pH values, however, other quality parameters should be considered and included in the further modeling. This study is based upon work from project: Kombucha beverages production using alternative substrates from the territory of the Autonomous Province of Vojvodina, 142-451-2400/2019-03, supported by Provincial Secretariat for Higher Education and Scientific Research of AP Vojvodina.

Keywords: chemometrics, regression analysis, kombucha, quality control

Procedia PDF Downloads 112
8746 Regression for Doubly Inflated Multivariate Poisson Distributions

Authors: Ishapathik Das, Sumen Sen, N. Rao Chaganty, Pooja Sengupta

Abstract:

Dependent multivariate count data occur in several research studies. These data can be modeled by a multivariate Poisson or Negative binomial distribution constructed using copulas. However, when some of the counts are inflated, that is, the number of observations in some cells are much larger than other cells, then the copula based multivariate Poisson (or Negative binomial) distribution may not fit well and it is not an appropriate statistical model for the data. There is a need to modify or adjust the multivariate distribution to account for the inflated frequencies. In this article, we consider the situation where the frequencies of two cells are higher compared to the other cells, and develop a doubly inflated multivariate Poisson distribution function using multivariate Gaussian copula. We also discuss procedures for regression on covariates for the doubly inflated multivariate count data. For illustrating the proposed methodologies, we present a real data containing bivariate count observations with inflations in two cells. Several models and linear predictors with log link functions are considered, and we discuss maximum likelihood estimation to estimate unknown parameters of the models.

Keywords: copula, Gaussian copula, multivariate distributions, inflated distributios

Procedia PDF Downloads 132
8745 Indian Premier League (IPL) Score Prediction: Comparative Analysis of Machine Learning Models

Authors: Rohini Hariharan, Yazhini R, Bhamidipati Naga Shrikarti

Abstract:

In the realm of cricket, particularly within the context of the Indian Premier League (IPL), the ability to predict team scores accurately holds significant importance for both cricket enthusiasts and stakeholders alike. This paper presents a comprehensive study on IPL score prediction utilizing various machine learning algorithms, including Support Vector Machines (SVM), XGBoost, Multiple Regression, Linear Regression, K-nearest neighbors (KNN), and Random Forest. Through meticulous data preprocessing, feature engineering, and model selection, we aimed to develop a robust predictive framework capable of forecasting team scores with high precision. Our experimentation involved the analysis of historical IPL match data encompassing diverse match and player statistics. Leveraging this data, we employed state-of-the-art machine learning techniques to train and evaluate the performance of each model. Notably, Multiple Regression emerged as the top-performing algorithm, achieving an impressive accuracy of 77.19% and a precision of 54.05% (within a threshold of +/- 10 runs). This research contributes to the advancement of sports analytics by demonstrating the efficacy of machine learning in predicting IPL team scores. The findings underscore the potential of advanced predictive modeling techniques to provide valuable insights for cricket enthusiasts, team management, and betting agencies. Additionally, this study serves as a benchmark for future research endeavors aimed at enhancing the accuracy and interpretability of IPL score prediction models.

Keywords: indian premier league (IPL), cricket, score prediction, machine learning, support vector machines (SVM), xgboost, multiple regression, linear regression, k-nearest neighbors (KNN), random forest, sports analytics

Procedia PDF Downloads 20
8744 Analyzing the Influence of Hydrometeorlogical Extremes, Geological Setting, and Social Demographic on Public Health

Authors: Irfan Ahmad Afip

Abstract:

This main research objective is to accurately identify the possibility for a Leptospirosis outbreak severity of a certain area based on its input features into a multivariate regression model. The research question is the possibility of an outbreak in a specific area being influenced by this feature, such as social demographics and hydrometeorological extremes. If the occurrence of an outbreak is being subjected to these features, then the epidemic severity for an area will be different depending on its environmental setting because the features will influence the possibility and severity of an outbreak. Specifically, this research objective was three-fold, namely: (a) to identify the relevant multivariate features and visualize the patterns data, (b) to develop a multivariate regression model based from the selected features and determine the possibility for Leptospirosis outbreak in an area, and (c) to compare the predictive ability of multivariate regression model and machine learning algorithms. Several secondary data features were collected locations in the state of Negeri Sembilan, Malaysia, based on the possibility it would be relevant to determine the outbreak severity in the area. The relevant features then will become an input in a multivariate regression model; a linear regression model is a simple and quick solution for creating prognostic capabilities. A multivariate regression model has proven more precise prognostic capabilities than univariate models. The expected outcome from this research is to establish a correlation between the features of social demographic and hydrometeorological with Leptospirosis bacteria; it will also become a contributor for understanding the underlying relationship between the pathogen and the ecosystem. The relationship established can be beneficial for the health department or urban planner to inspect and prepare for future outcomes in event detection and system health monitoring.

Keywords: geographical information system, hydrometeorological, leptospirosis, multivariate regression

Procedia PDF Downloads 84
8743 Ground Motion Modeling Using the Least Absolute Shrinkage and Selection Operator

Authors: Yildiz Stella Dak, Jale Tezcan

Abstract:

Ground motion models that relate a strong motion parameter of interest to a set of predictive seismological variables describing the earthquake source, the propagation path of the seismic wave, and the local site conditions constitute a critical component of seismic hazard analyses. When a sufficient number of strong motion records are available, ground motion relations are developed using statistical analysis of the recorded ground motion data. In regions lacking a sufficient number of recordings, a synthetic database is developed using stochastic, theoretical or hybrid approaches. Regardless of the manner the database was developed, ground motion relations are developed using regression analysis. Development of a ground motion relation is a challenging process which inevitably requires the modeler to make subjective decisions regarding the inclusion criteria of the recordings, the functional form of the model and the set of seismological variables to be included in the model. Because these decisions are critically important to the validity and the applicability of the model, there is a continuous interest on procedures that will facilitate the development of ground motion models. This paper proposes the use of the Least Absolute Shrinkage and Selection Operator (LASSO) in selecting the set predictive seismological variables to be used in developing a ground motion relation. The LASSO can be described as a penalized regression technique with a built-in capability of variable selection. Similar to the ridge regression, the LASSO is based on the idea of shrinking the regression coefficients to reduce the variance of the model. Unlike ridge regression, where the coefficients are shrunk but never set equal to zero, the LASSO sets some of the coefficients exactly to zero, effectively performing variable selection. Given a set of candidate input variables and the output variable of interest, LASSO allows ranking the input variables in terms of their relative importance, thereby facilitating the selection of the set of variables to be included in the model. Because the risk of overfitting increases as the ratio of the number of predictors to the number of recordings increases, selection of a compact set of variables is important in cases where a small number of recordings are available. In addition, identification of a small set of variables can improve the interpretability of the resulting model, especially when there is a large number of candidate predictors. A practical application of the proposed approach is presented, using more than 600 recordings from the National Geospatial-Intelligence Agency (NGA) database, where the effect of a set of seismological predictors on the 5% damped maximum direction spectral acceleration is investigated. The set of candidate predictors considered are Magnitude, Rrup, Vs30. Using LASSO, the relative importance of the candidate predictors has been ranked. Regression models with increasing levels of complexity were constructed using one, two, three, and four best predictors, and the models’ ability to explain the observed variance in the target variable have been compared. The bias-variance trade-off in the context of model selection is discussed.

Keywords: ground motion modeling, least absolute shrinkage and selection operator, penalized regression, variable selection

Procedia PDF Downloads 304
8742 Early Predictive Signs for Kasai Procedure Success

Authors: Medan Isaeva, Anna Degtyareva

Abstract:

Context: Biliary atresia is a common reason for liver transplants in children, and the Kasai procedure can potentially be successful in avoiding the need for transplantation. However, it is important to identify factors that influence surgical outcomes in order to optimize treatment and improve patient outcomes. Research aim: The aim of this study was to develop prognostic models to assess the outcomes of the Kasai procedure in children with biliary atresia. Methodology: This retrospective study analyzed data from 166 children with biliary atresia who underwent the Kasai procedure between 2002 and 2021. The effectiveness of the operation was assessed based on specific criteria, including post-operative stool color, jaundice reduction, and bilirubin levels. The study involved a comparative analysis of various parameters, such as gestational age, birth weight, age at operation, physical development, liver and spleen sizes, and laboratory values including bilirubin, ALT, AST, and others, measured pre- and post-operation. Ultrasonographic evaluations were also conducted pre-operation, assessing the hepatobiliary system and related quantitative parameters. The study was carried out by two experienced specialists in pediatric hepatology. Comparative analysis and multifactorial logistic regression were used as the primary statistical methods. Findings: The study identified several statistically significant predictors of a successful Kasai procedure, including the presence of the gallbladder and levels of cholesterol and direct bilirubin post-operation. A detectable gallbladder was associated with a higher probability of surgical success, while elevated post-operative cholesterol and direct bilirubin levels were indicative of a reduced chance of positive outcomes. Theoretical importance: The findings of this study contribute to the optimization of treatment strategies for children with biliary atresia undergoing the Kasai procedure. By identifying early predictive signs of success, clinicians can modify treatment plans and manage patient care more effectively and proactively. Data collection and analysis procedures: Data for this analysis were obtained from the health records of patients who received the Kasai procedure. Comparative analysis and multifactorial logistic regression were employed to analyze the data and identify significant predictors. Question addressed: The study addressed the question of identifying predictive factors for the success of the Kasai procedure in children with biliary atresia. Conclusion: The developed prognostic models serve as valuable tools for early detection of patients who are less likely to benefit from the Kasai procedure. This enables clinicians to modify treatment plans and manage patient care more effectively and proactively. Potential limitations of the study: The study has several limitations. Its retrospective nature may introduce biases and inconsistencies in data collection. Being single centered, the results might not be generalizable to wider populations due to variations in surgical and postoperative practices. Also, other potential influencing factors beyond the clinical, laboratory, and ultrasonographic parameters considered in this study were not explored, which could affect the outcomes of the Kasai operation. Future studies could benefit from including a broader range of factors.

Keywords: biliary atresia, kasai operation, prognostic model, native liver survival

Procedia PDF Downloads 25
8741 Analyzing Impacts of Road Network on Vegetation Using Geographic Information System and Remote Sensing Techniques

Authors: Elizabeth Malebogo Mosepele

Abstract:

Road transport has become increasingly common in the world; people rely on road networks for transportation purpose on a daily basis. However, environmental impact of roads on surrounding landscapes extends their potential effects even further. This study investigates the impact of road network on natural vegetation. The study will provide baseline knowledge regarding roadside vegetation and would be helpful in future for conservation of biodiversity along the road verges and improvements of road verges. The general hypothesis of this study is that the amount and condition of road side vegetation could be explained by road network conditions. Remote sensing techniques were used to analyze vegetation conditions. Landsat 8 OLI image was used to assess vegetation cover condition. NDVI image was generated and used as a base from which land cover classes were extracted, comprising four categories viz. healthy vegetation, degraded vegetation, bare surface, and water. The classification of the image was achieved using the supervised classification technique. Road networks were digitized from Google Earth. For observed data, transect based quadrats of 50*50 m were conducted next to road segments for vegetation assessment. Vegetation condition was related to road network, with the multinomial logistic regression confirming a significant relationship between vegetation condition and road network. The null hypothesis formulated was that 'there is no variation in vegetation condition as we move away from the road.' Analysis of vegetation condition revealed degraded vegetation within close proximity of a road segment and healthy vegetation as the distance increase away from the road. The Chi Squared value was compared with critical value of 3.84, at the significance level of 0.05 to determine the significance of relationship. Given that the Chi squared value was 395, 5004, the null hypothesis was therefore rejected; there is significant variation in vegetation the distance increases away from the road. The conclusion is that the road network plays an important role in the condition of vegetation.

Keywords: Chi squared, geographic information system, multinomial logistic regression, remote sensing, road side vegetation

Procedia PDF Downloads 399
8740 Solving Dimensionality Problem and Finding Statistical Constructs on Latent Regression Models: A Novel Methodology with Real Data Application

Authors: Sergio Paez Moncaleano, Alvaro Mauricio Montenegro

Abstract:

This paper presents a novel statistical methodology for measuring and founding constructs in Latent Regression Analysis. This approach uses the qualities of Factor Analysis in binary data with interpretations on Item Response Theory (IRT). In addition, based on the fundamentals of submodel theory and with a convergence of many ideas of IRT, we propose an algorithm not just to solve the dimensionality problem (nowadays an open discussion) but a new research field that promises more fear and realistic qualifications for examiners and a revolution on IRT and educational research. In the end, the methodology is applied to a set of real data set presenting impressive results for the coherence, speed and precision. Acknowledgments: This research was financed by Colciencias through the project: 'Multidimensional Item Response Theory Models for Practical Application in Large Test Designed to Measure Multiple Constructs' and both authors belong to SICS Research Group from Universidad Nacional de Colombia.

Keywords: item response theory, dimensionality, submodel theory, factorial analysis

Procedia PDF Downloads 342
8739 Exploring the Applications of Neural Networks in the Adaptive Learning Environment

Authors: Baladitya Swaika, Rahul Khatry

Abstract:

Computer Adaptive Tests (CATs) is one of the most efficient ways for testing the cognitive abilities of students. CATs are based on Item Response Theory (IRT) which is based on item selection and ability estimation using statistical methods of maximum information selection/selection from posterior and maximum-likelihood (ML)/maximum a posteriori (MAP) estimators respectively. This study aims at combining both classical and Bayesian approaches to IRT to create a dataset which is then fed to a neural network which automates the process of ability estimation and then comparing it to traditional CAT models designed using IRT. This study uses python as the base coding language, pymc for statistical modelling of the IRT and scikit-learn for neural network implementations. On creation of the model and on comparison, it is found that the Neural Network based model performs 7-10% worse than the IRT model for score estimations. Although performing poorly, compared to the IRT model, the neural network model can be beneficially used in back-ends for reducing time complexity as the IRT model would have to re-calculate the ability every-time it gets a request whereas the prediction from a neural network could be done in a single step for an existing trained Regressor. This study also proposes a new kind of framework whereby the neural network model could be used to incorporate feature sets, other than the normal IRT feature set and use a neural network’s capacity of learning unknown functions to give rise to better CAT models. Categorical features like test type, etc. could be learnt and incorporated in IRT functions with the help of techniques like logistic regression and can be used to learn functions and expressed as models which may not be trivial to be expressed via equations. This kind of a framework, when implemented would be highly advantageous in psychometrics and cognitive assessments. This study gives a brief overview as to how neural networks can be used in adaptive testing, not only by reducing time-complexity but also by being able to incorporate newer and better datasets which would eventually lead to higher quality testing.

Keywords: computer adaptive tests, item response theory, machine learning, neural networks

Procedia PDF Downloads 154
8738 Currency Exchange Rate Forecasts Using Quantile Regression

Authors: Yuzhi Cai

Abstract:

In this paper, we discuss a Bayesian approach to quantile autoregressive (QAR) time series model estimation and forecasting. Together with a combining forecasts technique, we then predict USD to GBP currency exchange rates. Combined forecasts contain all the information captured by the fitted QAR models at different quantile levels and are therefore better than those obtained from individual models. Our results show that an unequally weighted combining method performs better than other forecasting methodology. We found that a median AR model can perform well in point forecasting when the predictive density functions are symmetric. However, in practice, using the median AR model alone may involve the loss of information about the data captured by other QAR models. We recommend that combined forecasts should be used whenever possible.

Keywords: combining forecasts, MCMC, predictive density functions, quantile forecasting, quantile modelling

Procedia PDF Downloads 230
8737 Investigating Income Diversification Strategies into Off-Farm Activities Among Rural Households in Ethiopia

Authors: Kibret Berhanu Getinet

Abstract:

Off-farm income diversification by farm rural households has gained the attention of researchers and policymakers due to the fact that agriculture failed to meet the needs of people in developing countries like Ethiopia. The objective of this study was to investigate income diversification strategies into off-farm activities among rural households in Hawassa Zuria Woreda, Sidama National Regional State, Ethiopia. The study used primary and secondary data sources for the primary data collection questionnaire employed as a data collection instrument. A multistage sampling technique was used to collect data from a total of 197 sample households from four kebeles of the study area. Descriptive statistics, as well as econometrics methods of data analysis, were employed. The descriptive statistics result indicates that the majority of sample rural households (68.53 %) have engaged in off-farm income diversification activities while the remaining 31.47% of households did not participate in the diversification in the study area. The choice of participants among the strategies indicates that 6.60% of respondents participated in off-farm wage employment, 30.46% participated in off-farm self-employment, and about 31.47% of them participated in both off-farm wage employment. The study revealed that the share of off-farm income in total annual earnings of households was about 48.457%, and thus, the off-farm diversification significantly contributes to the rural household income. Moreover, binary and multinomial logistic regression models were employed to identify factors that affect the participation and the choices of the off-farm income diversification strategies, respectively. The binary logit model result indicated that agro-ecological zone, education status of the households, available technical skills of the household, household saving, total livestock owned by the households, access to electricity, road access and being married of household head were significant and positively affected the chance of diversification in off-farm activities while the on-farm income of households is negatively affected the chance of diversification. Similarly, the multinomial logistic regression model estimate revealed that agroecological zone, on-farm income, available technical skills, household savings, and access to electricity are positively related and significantly influenced the household’s choice of employment into off-farm wage employment. The off-farm self-employment diversification choice is significantly influenced by on-farm income, available technical skills, household savings, total livestock owned, and access to electricity. Moreover, the result showed that the factors that affect the choice of farm households to engage in both off-farm wage and self-employment are ecological zone, education status, on-farm income, available technical skills, household own saving, market access, total livestock owned, access to electricity and road access. Thus, due attention should be given to addressing the demographic, socio-economic, and institutional constraints to strengthen off-farm income diversification strategies to improve the income of rural households.

Keywords: off-farm, incoem, diversification, logit model

Procedia PDF Downloads 21
8736 Magnitude of Visual Impairment and Associated Factors among Adult Glaucoma Patients Attending University of Gondar, Comprehensive Specialized Hospital, Tertiary Eye Care and Training Center, Northwest Ethiopia, 2022

Authors: Getenet Shumet Birhan, Biruk Lelisa Eticha, Gizachew Tilahun Belete, Fisseha Admassu Ayele

Abstract:

Context: Glaucoma is a significant public health concern globally, being the second leading cause of blindness. This study focuses on adult glaucoma patients in Ethiopia, specifically at the University of Gondar. Research Aim: The main objective is to assess the prevalence of visual impairment and identify associated factors among adult glaucoma patients at the University of Gondar. Methodology: The study used an institution-based cross-sectional design, collecting data from 423 glaucoma patients through interviews and medical chart reviews. Descriptive statistics and logistic regression were employed for analysis. Findings: The study found a high prevalence of visual impairment (77.6%) among adult glaucoma patients, with factors such as female sex, rural residence, glaucoma type, disease stage, and duration of diagnosis significantly associated with visual impairment. Theoretical Importance: This research adds valuable insights into the prevalence and determinants of visual impairment among glaucoma patients in Ethiopia, contributing to the existing literature on eye health in low-resource settings. Data Collection: Data were collected through face-to-face interviews and medical chart reviews at the University of Gondar, utilizing a structured questionnaire. Analysis Procedures: Descriptive statistics, frequency analysis, and binary logistic regression were employed to analyze the data and identify factors associated with visual impairment in adult glaucoma patients. Question Addressed: The study sought to answer the question of the prevalence of visual impairment and its associated factors among adult glaucoma patients at the University of Gondar in Northwest Ethiopia. Conclusion: The research concludes that visual impairment is significantly high among adult glaucoma patients in this setting, with several factors playing a role in its occurrence.

Keywords: visual impairment, glaucoma, Ethiopia, Gondar

Procedia PDF Downloads 24
8735 Derivation of Bathymetry from High-Resolution Satellite Images: Comparison of Empirical Methods through Geographical Error Analysis

Authors: Anusha P. Wijesundara, Dulap I. Rathnayake, Nihal D. Perera

Abstract:

Bathymetric information is fundamental importance to coastal and marine planning and management, nautical navigation, and scientific studies of marine environments. Satellite-derived bathymetry data provide detailed information in areas where conventional sounding data is lacking and conventional surveys are inaccessible. The two empirical approaches of log-linear bathymetric inversion model and non-linear bathymetric inversion model are applied for deriving bathymetry from high-resolution multispectral satellite imagery. This study compares these two approaches by means of geographical error analysis for the site Kankesanturai using WorldView-2 satellite imagery. Based on the Levenberg-Marquardt method calibrated the parameters of non-linear inversion model and the multiple-linear regression model was applied to calibrate the log-linear inversion model. In order to calibrate both models, Single Beam Echo Sounding (SBES) data in this study area were used as reference points. Residuals were calculated as the difference between the derived depth values and the validation echo sounder bathymetry data and the geographical distribution of model residuals was mapped. The spatial autocorrelation was calculated by comparing the performance of the bathymetric models and the results showing the geographic errors for both models. A spatial error model was constructed from the initial bathymetry estimates and the estimates of autocorrelation. This spatial error model is used to generate more reliable estimates of bathymetry by quantifying autocorrelation of model error and incorporating this into an improved regression model. Log-linear model (R²=0.846) performs better than the non- linear model (R²=0.692). Finally, the spatial error models improved bathymetric estimates derived from linear and non-linear models up to R²=0.854 and R²=0.704 respectively. The Root Mean Square Error (RMSE) was calculated for all reference points in various depth ranges. The magnitude of the prediction error increases with depth for both the log-linear and the non-linear inversion models. Overall RMSE for log-linear and the non-linear inversion models were ±1.532 m and ±2.089 m, respectively.

Keywords: log-linear model, multi spectral, residuals, spatial error model

Procedia PDF Downloads 273
8734 Machine Learning Techniques for Estimating Ground Motion Parameters

Authors: Farid Khosravikia, Patricia Clayton

Abstract:

The main objective of this study is to evaluate the advantages and disadvantages of various machine learning techniques in forecasting ground-motion intensity measures given source characteristics, source-to-site distance, and local site condition. Intensity measures such as peak ground acceleration and velocity (PGA and PGV, respectively) as well as 5% damped elastic pseudospectral accelerations at different periods (PSA), are indicators of the strength of shaking at the ground surface. Estimating these variables for future earthquake events is a key step in seismic hazard assessment and potentially subsequent risk assessment of different types of structures. Typically, linear regression-based models, with pre-defined equations and coefficients, are used in ground motion prediction. However, due to the restrictions of the linear regression methods, such models may not capture more complex nonlinear behaviors that exist in the data. Thus, this study comparatively investigates potential benefits from employing other machine learning techniques as a statistical method in ground motion prediction such as Artificial Neural Network, Random Forest, and Support Vector Machine. The algorithms are adjusted to quantify event-to-event and site-to-site variability of the ground motions by implementing them as random effects in the proposed models to reduce the aleatory uncertainty. All the algorithms are trained using a selected database of 4,528 ground-motions, including 376 seismic events with magnitude 3 to 5.8, recorded over the hypocentral distance range of 4 to 500 km in Oklahoma, Kansas, and Texas since 2005. The main reason of the considered database stems from the recent increase in the seismicity rate of these states attributed to petroleum production and wastewater disposal activities, which necessities further investigation in the ground motion models developed for these states. Accuracy of the models in predicting intensity measures, generalization capability of the models for future data, as well as usability of the models are discussed in the evaluation process. The results indicate the algorithms satisfy some physically sound characteristics such as magnitude scaling distance dependency without requiring pre-defined equations or coefficients. Moreover, it is shown that, when sufficient data is available, all the alternative algorithms tend to provide more accurate estimates compared to the conventional linear regression-based method, and particularly, Random Forest outperforms the other algorithms. However, the conventional method is a better tool when limited data is available.

Keywords: artificial neural network, ground-motion models, machine learning, random forest, support vector machine

Procedia PDF Downloads 92
8733 Association of the Time in Targeted Blood Glucose Range of 3.9–10 Mmol/L with the Mortality of Critically Ill Patients with or without Diabetes

Authors: Guo Yu, Haoming Ma, Peiru Zhou

Abstract:

BACKGROUND: In addition to hyperglycemia, hypoglycemia, and glycemic variability, a decrease in the time in the targeted blood glucose range (TIR) may be associated with an increased risk of death for critically ill patients. However, the relationship between the TIR and mortality may be influenced by the presence of diabetes and glycemic variability. METHODS: A total of 998 diabetic and non-diabetic patients with severe diseases in the ICU were selected for this retrospective analysis. The TIR is defined as the percentage of time spent in the target blood glucose range of 3.9–10.0 mmol/L within 24 hours. The relationship between TIR and in-hospital in diabetic and non-diabetic patients was analyzed. The effect of glycemic variability was also analyzed. RESULTS: The binary logistic regression model showed that there was a significant association between the TIR as a continuous variable and the in-hospital death of severely ill non-diabetic patients (OR=0.991, P=0.015). As a classification variable, TIR≥70% was significantly associated with in-hospital death (OR=0.581, P=0.003). Specifically, TIR≥70% was a protective factor for the in-hospital death of severely ill non-diabetic patients. The TIR of severely ill diabetic patients was not significantly associated with in-hospital death; however, glycemic variability was significantly and independently associated with in-hospital death (OR=1.042, P=0.027). Binary logistic regression analysis of comprehensive indices showed that for non-diabetic patients, the C3 index (low TIR & high CV) was a risk factor for increased mortality (OR=1.642, P<0.001). In addition, for diabetic patients, the C3 index was an independent risk factor for death (OR=1.994, P=0.008), and the C4 index (low TIR & low CV) was independently associated with increased survival. CONCLUSIONS: The TIR of non-diabetic patients during ICU hospitalization was associated with in-hospital death even after adjusting for disease severity and glycemic variability. There was no significant association between the TIR and mortality of diabetic patients. However, for both diabetic and non-diabetic critically ill patients, the combined effect of high TIR and low CV was significantly associated with ICU mortality. Diabetic patients seem to have higher blood glucose fluctuations and can tolerate a large TIR range. Both diabetic and non-diabetic critically ill patients should maintain blood glucose levels within the target range to reduce mortality.

Keywords: severe disease, diabetes, blood glucose control, time in targeted blood glucose range, glycemic variability, mortality

Procedia PDF Downloads 195
8732 An Experimental Machine Learning Analysis on Adaptive Thermal Comfort and Energy Management in Hospitals

Authors: Ibrahim Khan, Waqas Khalid

Abstract:

The Healthcare sector is known to consume a higher proportion of total energy consumption in the HVAC market owing to an excessive cooling and heating requirement in maintaining human thermal comfort in indoor conditions, catering to patients undergoing treatment in hospital wards, rooms, and intensive care units. The indoor thermal comfort conditions in selected hospitals of Islamabad, Pakistan, were measured on a real-time basis with the collection of first-hand experimental data using calibrated sensors measuring Ambient Temperature, Wet Bulb Globe Temperature, Relative Humidity, Air Velocity, Light Intensity and CO2 levels. The Experimental data recorded was analyzed in conjunction with the Thermal Comfort Questionnaire Surveys, where the participants, including patients, doctors, nurses, and hospital staff, were assessed based on their thermal sensation, acceptability, preference, and comfort responses. The Recorded Dataset, including experimental and survey-based responses, was further analyzed in the development of a correlation between operative temperature, operative relative humidity, and other measured operative parameters with the predicted mean vote and adaptive predicted mean vote, with the adaptive temperature and adaptive relative humidity estimated using the seasonal data set gathered for both summer – hot and dry, and hot and humid as well as winter – cold and dry, and cold and humid climate conditions. The Machine Learning Logistic Regression Algorithm was incorporated to train the operative experimental data parameters and develop a correlation between patient sensations and the thermal environmental parameters for which a new ML-based adaptive thermal comfort model was proposed and developed in our study. Finally, the accuracy of our model was determined using the K-fold cross-validation.

Keywords: predicted mean vote, thermal comfort, energy management, logistic regression, machine learning

Procedia PDF Downloads 27
8731 Comparative Study on Daily Discharge Estimation of Soolegan River

Authors: Redvan Ghasemlounia, Elham Ansari, Hikmet Kerem Cigizoglu

Abstract:

Hydrological modeling in arid and semi-arid regions is very important. Iran has many regions with these climate conditions such as Chaharmahal and Bakhtiari province that needs lots of attention with an appropriate management. Forecasting of hydrological parameters and estimation of hydrological events of catchments, provide important information that used for design, management and operation of water resources such as river systems, and dams, widely. Discharge in rivers is one of these parameters. This study presents the application and comparison of some estimation methods such as Feed-Forward Back Propagation Neural Network (FFBPNN), Multi Linear Regression (MLR), Gene Expression Programming (GEP) and Bayesian Network (BN) to predict the daily flow discharge of the Soolegan River, located at Chaharmahal and Bakhtiari province, in Iran. In this study, Soolegan, station was considered. This Station is located in Soolegan River at 51° 14՜ Latitude 31° 38՜ longitude at North Karoon basin. The Soolegan station is 2086 meters higher than sea level. The data used in this study are daily discharge and daily precipitation of Soolegan station. Feed Forward Back Propagation Neural Network(FFBPNN), Multi Linear Regression (MLR), Gene Expression Programming (GEP) and Bayesian Network (BN) models were developed using the same input parameters for Soolegan's daily discharge estimation. The results of estimation models were compared with observed discharge values to evaluate performance of the developed models. Results of all methods were compared and shown in tables and charts.

Keywords: ANN, multi linear regression, Bayesian network, forecasting, discharge, gene expression programming

Procedia PDF Downloads 533
8730 Effect of Traffic Volume and Its Composition on Vehicular Speed under Mixed Traffic Conditions: A Kriging Based Approach

Authors: Subhadip Biswas, Shivendra Maurya, Satish Chandra, Indrajit Ghosh

Abstract:

Use of speed prediction models sometimes appears as a feasible alternative to laborious field measurement particularly, in case when field data cannot fulfill designer’s requirements. However, developing speed models is a challenging task specifically in the context of developing countries like India where vehicles with diverse static and dynamic characteristics use the same right of way without any segregation. Here the traffic composition plays a significant role in determining the vehicular speed. The present research was carried out to examine the effects of traffic volume and its composition on vehicular speed under mixed traffic conditions. Classified traffic volume and speed data were collected from different geometrically identical six lane divided arterials in New Delhi. Based on these field data, speed prediction models were developed for individual vehicle category adopting Kriging approximation technique, an alternative for commonly used regression. These models are validated with the data set kept aside earlier for validation purpose. The predicted speeds showed a great deal of agreement with the observed values and also the model outperforms all other existing speed models. Finally, the proposed models were utilized to evaluate the effect of traffic volume and its composition on speed.

Keywords: speed, Kriging, arterial, traffic volume

Procedia PDF Downloads 330
8729 Model Averaging for Poisson Regression

Authors: Zhou Jianhong

Abstract:

Model averaging is a desirable approach to deal with model uncertainty, which, however, has rarely been explored for Poisson regression. In this paper, we propose a model averaging procedure based on an unbiased estimator of the expected Kullback-Leibler distance for the Poisson regression. Simulation study shows that the proposed model average estimator outperforms some other commonly used model selection and model average estimators in some situations. Our proposed methods are further applied to a real data example and the advantage of this method is demonstrated again.

Keywords: model averaging, poission regression, Kullback-Leibler distance, statistics

Procedia PDF Downloads 488
8728 Establishment of the Regression Uncertainty of the Critical Heat Flux Power Correlation for an Advanced Fuel Bundle

Authors: L. Q. Yuan, J. Yang, A. Siddiqui

Abstract:

A new regression uncertainty analysis methodology was applied to determine the uncertainties of the critical heat flux (CHF) power correlation for an advanced 43-element bundle design, which was developed by Canadian Nuclear Laboratories (CNL) to achieve improved economics, resource utilization and energy sustainability. The new methodology is considered more appropriate than the traditional methodology in the assessment of the experimental uncertainty associated with regressions. The methodology was first assessed using both the Monte Carlo Method (MCM) and the Taylor Series Method (TSM) for a simple linear regression model, and then extended successfully to a non-linear CHF power regression model (CHF power as a function of inlet temperature, outlet pressure and mass flow rate). The regression uncertainty assessed by MCM agrees well with that by TSM. An equation to evaluate the CHF power regression uncertainty was developed and expressed as a function of independent variables that determine the CHF power.

Keywords: CHF experiment, CHF correlation, regression uncertainty, Monte Carlo Method, Taylor Series Method

Procedia PDF Downloads 391
8727 The Relationship between Depression, HIV Stigma and Adherence to Antiretroviral Therapy among Adult Patients Living with HIV at a Tertiary Hospital in Durban, South Africa: The Mediating Roles of Self-Efficacy and Social Support

Authors: Muziwandile Luthuli

Abstract:

Although numerous factors predicting adherence to antiretroviral therapy (ART) among people living with HIV/AIDS (PLWHA) have been broadly studied on both regional and global level, up-to-date adherence of patients to ART remains an overarching, dynamic and multifaceted problem that needs to be investigated over time and across various contexts. There is a rarity of empirical data in the literature on interactive mechanisms by which psychosocial factors influence adherence to ART among PLWHA within the South African context. Therefore, this study was designed to investigate the relationship between depression, HIV stigma, and adherence to ART among adult patients living with HIV at a tertiary hospital in Durban, South Africa, and the mediating roles of self-efficacy and social support. The health locus of control theory and the social support theory were the underlying theoretical frameworks for this study. Using a cross-sectional research design, a total of 201 male and female adult patients aged between 18-75 years receiving ART at a tertiary hospital in Durban, KwaZulu-Natal were sampled, using time location sampling (TLS). A self-administered questionnaire was employed to collect the data in this study. Data were analysed through SPSS version 27. Several statistical analyses were conducted in this study, namely univariate statistical analysis, correlational analysis, Pearson’s chi-square analysis, cross-tabulation analysis, binary logistic regression analysis, and mediational analysis. Univariate analysis indicated that the sample mean age was 39.28 years (SD=12.115), while most participants were females 71.0% (n=142), never married 74.2% (n=147), and most were also secondary school educated 48.3% (n=97), as well as unemployed 65.7% (n=132). The prevalence rate of participants who had high adherence to ART was 53.7% (n=108), and 46.3% (n=93) of participants had low adherence to ART. Chi-square analysis revealed that employment status was the only statistically significant socio-demographic influence of adherence to ART in this study (χ2 (3) = 8.745; p < .033). Chi-square analysis showed that there was a statistically significant difference found between depression and adherence to ART (χ2 (4) = 16.140; p < .003), while between HIV stigma and adherence to ART, no statistically significant difference was found (χ2 (1) = .323; p >.570). Binary logistic regression indicated that depression was statistically associated with adherence to ART (OR= .853; 95% CI, .789–.922, P < 001), while the association between self-efficacy and adherence to ART was statistically significant (OR= 1.04; 95% CI, 1.001– 1.078, P < .045) after controlling for the effect of depression. However, the findings showed that the effect of depression on adherence to ART was not significantly mediated by self-efficacy (Sobel test for indirect effect, Z= 1.01, P > 0.31). Binary logistic regression showed that the effect of HIV stigma on adherence to ART was not statistically significant (OR= .980; 95% CI, .937– 1.025, P > .374), but the effect of social support on adherence to ART was statistically significant, only after the effect of HIV stigma was controlled for (OR= 1.017; 95% CI, 1.000– 1.035, P < .046). This study promotes behavioral and social change effected through evidence-based interventions by emphasizing the need for additional research that investigates the interactive mechanisms by which psychosocial factors influence adherence to ART. Depression is a significant predictor of adherence to ART. Thus, to alleviate the psychosocial impact of depression on adherence to ART, effective interventions must be devised, along with special consideration of self-efficacy and social support. Therefore, this study is helpful in informing and effecting change in health policy and healthcare services through its findings

Keywords: ART adherence, depression, HIV/AIDS, PLWHA

Procedia PDF Downloads 160
8726 Assessment of Level of Sedation and Associated Factors Among Intubated Critically Ill Children in Pediatric Intensive Care Unit of Jimma University Medical Center: A Fourteen Months Prospective Observation Study, 2023

Authors: Habtamu Wolde Engudai

Abstract:

Background: Sedation can be provided to facilitate a procedure or to stabilize patients admitted in pediatric intensive care unit (PICU). Sedation is often necessary to maintain optimal care for critically ill children requiring mechanical ventilation. However, if sedation is too deep or too light, it has its own adverse effects, and hence, it is important to monitor the level of sedation and maintain an optimal level. Objectives: The objective is to assess the level of sedation and associated factors among intubated critically ill children admitted to PICU of JUMC, Jimma. Methods: A prospective observation study was conducted in the PICU of JUMC in September 2021 in 105 patients who were going to be admitted to the PICU aged less than 14 and with GCS >8. Data was collected by residents and nurses working in PICU. Data entry was done by Epi data manager (version 4.6.0.2). Statistical analysis and the creation of charts is going to be performed using SPSS version 26. Data was presented as mean, percentage and standard deviation. The assumption of logistic regression and the result of the assumption will be checked. To find potential predictors, bi-variable logistic regression was used for each predictor and outcome variable. A p value of <0.05 was considered as statistically significant. Finally, findings have been presented using figures, AOR, percentages, and a summary table. Result: in this study, 105 critically ill children had been involved who were started on continuous or intermittent forms of sedative drugs. Sedation level was assessed using a comfort scale three times per day. Based on this observation, we got a 44.8% level of suboptimal sedation at the baseline, a 36.2% level of suboptimal sedation at eight hours, and a 24.8% level of suboptimal sedation at sixteen hours. There is a significant association between suboptimal sedation and duration of stay with mechanical ventilation and the rate of unplanned extubation, which was shown by P < 0.05 using the Hosmer-Lemeshow test of goodness of fit (p> 0.44).

Keywords: level of sedation, critically ill children, Pediatric intensive care unit, Jimma university

Procedia PDF Downloads 30
8725 AutoML: Comprehensive Review and Application to Engineering Datasets

Authors: Parsa Mahdavi, M. Amin Hariri-Ardebili

Abstract:

The development of accurate machine learning and deep learning models traditionally demands hands-on expertise and a solid background to fine-tune hyperparameters. With the continuous expansion of datasets in various scientific and engineering domains, researchers increasingly turn to machine learning methods to unveil hidden insights that may elude classic regression techniques. This surge in adoption raises concerns about the adequacy of the resultant meta-models and, consequently, the interpretation of the findings. In response to these challenges, automated machine learning (AutoML) emerges as a promising solution, aiming to construct machine learning models with minimal intervention or guidance from human experts. AutoML encompasses crucial stages such as data preparation, feature engineering, hyperparameter optimization, and neural architecture search. This paper provides a comprehensive overview of the principles underpinning AutoML, surveying several widely-used AutoML platforms. Additionally, the paper offers a glimpse into the application of AutoML on various engineering datasets. By comparing these results with those obtained through classical machine learning methods, the paper quantifies the uncertainties inherent in the application of a single ML model versus the holistic approach provided by AutoML. These examples showcase the efficacy of AutoML in extracting meaningful patterns and insights, emphasizing its potential to revolutionize the way we approach and analyze complex datasets.

Keywords: automated machine learning, uncertainty, engineering dataset, regression

Procedia PDF Downloads 34
8724 Non-Parametric Regression over Its Parametric Couterparts with Large Sample Size

Authors: Jude Opara, Esemokumo Perewarebo Akpos

Abstract:

This paper is on non-parametric linear regression over its parametric counterparts with large sample size. Data set on anthropometric measurement of primary school pupils was taken for the analysis. The study used 50 randomly selected pupils for the study. The set of data was subjected to normality test, and it was discovered that the residuals are not normally distributed (i.e. they do not follow a Gaussian distribution) for the commonly used least squares regression method for fitting an equation into a set of (x,y)-data points using the Anderson-Darling technique. The algorithms for the nonparametric Theil’s regression are stated in this paper as well as its parametric OLS counterpart. The use of a programming language software known as “R Development” was used in this paper. From the analysis, the result showed that there exists a significant relationship between the response and the explanatory variable for both the parametric and non-parametric regression. To know the efficiency of one method over the other, the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) are used, and it is discovered that the nonparametric regression performs better than its parametric regression counterparts due to their lower values in both the AIC and BIC. The study however recommends that future researchers should study a similar work by examining the presence of outliers in the data set, and probably expunge it if detected and re-analyze to compare results.

Keywords: Theil’s regression, Bayesian information criterion, Akaike information criterion, OLS

Procedia PDF Downloads 276
8723 The Relationship between Personal, Psycho-Social and Occupational Risk Factors with Low Back Pain Severity in Industrial Workers

Authors: Omid Giahi, Ebrahim Darvishi, Mahdi Akbarzadeh

Abstract:

Introduction: Occupational low back pain (LBP) is one of the most prevalent work-related musculoskeletal disorders in which a lot of risk factors are involved that. The present study focuses on the relation between personal, psycho-social and occupational risk factors and LBP severity in industrial workers. Materials and Methods: This research was a case-control study which was conducted in Kurdistan province. 100 workers (Mean Age ± SD of 39.9 ± 10.45) with LBP were selected as the case group, and 100 workers (Mean Age ± SD of 37.2 ± 8.5) without LBP were assigned into the control group. All participants were selected from various industrial units, and they had similar occupational conditions. The required data including demographic information (BMI, smoking, alcohol, and family history), occupational (posture, mental workload (MWL), force, vibration and repetition), and psychosocial factors (stress, occupational satisfaction and security) of the participants were collected via consultation with occupational medicine specialists, interview, and the related questionnaires and also the NASA-TLX software and REBA worksheet. Chi-square test, logistic regression and structural equation modeling (SEM) were used to analyze the data. For analysis of data, IBM Statistics SPSS 24 and Mplus6 software have been used. Results: 114 (77%) of the individuals were male and 86 were (23%) female. Mean Career length of the Case Group and Control Group were 10.90 ± 5.92, 9.22 ± 4.24, respectively. The statistical analysis of the data revealed that there was a significant correlation between the Posture, Smoking, Stress, Satisfaction, and MWL with occupational LBP. The odds ratios (95% confidence intervals) derived from a logistic regression model were 2.7 (1.27-2.24) and 2.5 (2.26-5.17) and 3.22 (2.47-3.24) for Stress, MWL, and Posture, respectively. Also, the SEM analysis of the personal, psycho-social and occupational factors with LBP revealed that there was a significant correlation. Conclusion: All three broad categories of risk factors simultaneously increase the risk of occupational LBP in the workplace. But, the risks of Posture, Stress, and MWL have a major role in LBP severity. Therefore, prevention strategies for persons in jobs with high risks for LBP are required to decrease the risk of occupational LBP.

Keywords: industrial workers occupational, low back pain, occupational risk factors, psychosocial factors

Procedia PDF Downloads 232
8722 Optimum Design of Alkali Activated Slag Concretes for Low Chloride Ion Permeability and Water Absorption Capacity

Authors: Müzeyyen Balçikanli, Erdoğan Özbay, Hakan Tacettin Türker, Okan Karahan, Cengiz Duran Atiş

Abstract:

In this research, effect of curing time (TC), curing temperature (CT), sodium concentration (SC) and silicate modules (SM) on the compressive strength, chloride ion permeability, and water absorption capacity of alkali activated slag (AAS) concretes were investigated. For maximization of compressive strength while for minimization of chloride ion permeability and water absorption capacity of AAS concretes, best possible combination of CT, CTime, SC and SM were determined. An experimental program was conducted by using the central composite design method. Alkali solution-slag ratio was kept constant at 0.53 in all mixture. The effects of the independent parameters were characterized and analyzed by using statistically significant quadratic regression models on the measured properties (dependent parameters). The proposed regression models are valid for AAS concretes with the SC from 0.1% to 7.5%, SM from 0.4 to 3.2, CT from 20 °C to 94 °C and TC from 1.2 hours to 25 hours. The results of test and analysis indicate that the most effective parameter for the compressive strength, chloride ion permeability and water absorption capacity is the sodium concentration.

Keywords: alkali activation, slag, rapid chloride permeability, water absorption capacity

Procedia PDF Downloads 285
8721 Coverage Probability Analysis of WiMAX Network under Additive White Gaussian Noise and Predicted Empirical Path Loss Model

Authors: Chaudhuri Manoj Kumar Swain, Susmita Das

Abstract:

This paper explores a detailed procedure of predicting a path loss (PL) model and its application in estimating the coverage probability in a WiMAX network. For this a hybrid approach is followed in predicting an empirical PL model of a 2.65 GHz WiMAX network deployed in a suburban environment. Data collection, statistical analysis, and regression analysis are the phases of operations incorporated in this approach and the importance of each of these phases has been discussed properly. The procedure of collecting data such as received signal strength indicator (RSSI) through experimental set up is demonstrated. From the collected data set, empirical PL and RSSI models are predicted with regression technique. Furthermore, with the aid of the predicted PL model, essential parameters such as PL exponent as well as the coverage probability of the network are evaluated. This research work may assist in the process of deployment and optimisation of any cellular network significantly.

Keywords: WiMAX, RSSI, path loss, coverage probability, regression analysis

Procedia PDF Downloads 140
8720 Machine Learning Model to Predict TB Bacteria-Resistant Drugs from TB Isolates

Authors: Rosa Tsegaye Aga, Xuan Jiang, Pavel Vazquez Faci, Siqing Liu, Simon Rayner, Endalkachew Alemu, Markos Abebe

Abstract:

Tuberculosis (TB) is a major cause of disease globally. In most cases, TB is treatable and curable, but only with the proper treatment. There is a time when drug-resistant TB occurs when bacteria become resistant to the drugs that are used to treat TB. Current strategies to identify drug-resistant TB bacteria are laboratory-based, and it takes a longer time to identify the drug-resistant bacteria and treat the patient accordingly. But machine learning (ML) and data science approaches can offer new approaches to the problem. In this study, we propose to develop an ML-based model to predict the antibiotic resistance phenotypes of TB isolates in minutes and give the right treatment to the patient immediately. The study has been using the whole genome sequence (WGS) of TB isolates as training data that have been extracted from the NCBI repository and contain different countries’ samples to build the ML models. The reason that different countries’ samples have been included is to generalize the large group of TB isolates from different regions in the world. This supports the model to train different behaviors of the TB bacteria and makes the model robust. The model training has been considering three pieces of information that have been extracted from the WGS data to train the model. These are all variants that have been found within the candidate genes (F1), predetermined resistance-associated variants (F2), and only resistance-associated gene information for the particular drug. Two major datasets have been constructed using these three information. F1 and F2 information have been considered as two independent datasets, and the third information is used as a class to label the two datasets. Five machine learning algorithms have been considered to train the model. These are Support Vector Machine (SVM), Random forest (RF), Logistic regression (LR), Gradient Boosting, and Ada boost algorithms. The models have been trained on the datasets F1, F2, and F1F2 that is the F1 and the F2 dataset merged. Additionally, an ensemble approach has been used to train the model. The ensemble approach has been considered to run F1 and F2 datasets on gradient boosting algorithm and use the output as one dataset that is called F1F2 ensemble dataset and train a model using this dataset on the five algorithms. As the experiment shows, the ensemble approach model that has been trained on the Gradient Boosting algorithm outperformed the rest of the models. In conclusion, this study suggests the ensemble approach, that is, the RF + Gradient boosting model, to predict the antibiotic resistance phenotypes of TB isolates by outperforming the rest of the models.

Keywords: machine learning, MTB, WGS, drug resistant TB

Procedia PDF Downloads 23
8719 Automatic and High Precise Modeling for System Optimization

Authors: Stephanie Chen, Mitja Echim, Christof Büskens

Abstract:

To describe and propagate the behavior of a system mathematical models are formulated. Parameter identification is used to adapt the coefficients of the underlying laws of science. For complex systems this approach can be incomplete and hence imprecise and moreover too slow to be computed efficiently. Therefore, these models might be not applicable for the numerical optimization of real systems, since these techniques require numerous evaluations of the models. Moreover not all quantities necessary for the identification might be available and hence the system must be adapted manually. Therefore, an approach is described that generates models that overcome the before mentioned limitations by not focusing on physical laws, but on measured (sensor) data of real systems. The approach is more general since it generates models for every system detached from the scientific background. Additionally, this approach can be used in a more general sense, since it is able to automatically identify correlations in the data. The method can be classified as a multivariate data regression analysis. In contrast to many other data regression methods this variant is also able to identify correlations of products of variables and not only of single variables. This enables a far more precise and better representation of causal correlations. The basis and the explanation of this method come from an analytical background: the series expansion. Another advantage of this technique is the possibility of real-time adaptation of the generated models during operation. Herewith system changes due to aging, wear or perturbations from the environment can be taken into account, which is indispensable for realistic scenarios. Since these data driven models can be evaluated very efficiently and with high precision, they can be used in mathematical optimization algorithms that minimize a cost function, e.g. time, energy consumption, operational costs or a mixture of them, subject to additional constraints. The proposed method has successfully been tested in several complex applications and with strong industrial requirements. The generated models were able to simulate the given systems with an error in precision less than one percent. Moreover the automatic identification of the correlations was able to discover so far unknown relationships. To summarize the above mentioned approach is able to efficiently compute high precise and real-time-adaptive data-based models in different fields of industry. Combined with an effective mathematical optimization algorithm like WORHP (We Optimize Really Huge Problems) several complex systems can now be represented by a high precision model to be optimized within the user wishes. The proposed methods will be illustrated with different examples.

Keywords: adaptive modeling, automatic identification of correlations, data based modeling, optimization

Procedia PDF Downloads 376