Search results for: multinomial Naive Bayes
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 257

167 Modelling the Effect of Physical Environment Factors on Child Pedestrian Severity Collisions in Malaysia: A Multinomial Logistic Regression Analysis

Authors: Muhamad N. Borhan, Nur S. Darus, Siti Z. Ishak, Rozmi Ismail, Siti F. M. Razali

Abstract:

Children are at greater risk of being involved in road traffic collisions due to the complex interaction of various elements in our transportation system. These include interactions between child and driver behavior along with physical and social environment factors. The present study examined the relationship between collision severity and physical environment factors in child pedestrian collisions. The severity of collisions is categorized into four injury outcomes: fatal, serious injury, slight injury, and damage. The sample comprised 2487 cases of child pedestrian-vehicle collisions in Malaysia involving children aged 7 to 12 years during 2006-2015. A multinomial logistic regression was applied to establish the relationship between severity levels and physical environment factors. The results showed that eight contributing factors influence the probability of an injury: road surface material, traffic system, road marking, control type, lighting condition, type of location, land use, and road surface condition. Understanding the effect of physical environment factors may contribute to the improvement of physical environment design and reduce collision involvement.
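As a rough illustration of how a multinomial logistic regression turns a set of factors into probabilities over the four severity outcomes, the sketch below computes one linear score per outcome and normalises the scores with a softmax. The feature names, coefficients, and intercepts are invented for illustration only; they are not estimates from the study.

```python
import math

# Outcome categories as listed in the abstract.
OUTCOMES = ["fatal", "serious injury", "slight injury", "damage"]

def softmax(scores):
    """Convert raw linear scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(features, coefficients, intercepts):
    """Multinomial logit: one linear score per outcome, then softmax."""
    scores = [
        b + sum(w * x for w, x in zip(ws, features))
        for ws, b in zip(coefficients, intercepts)
    ]
    return softmax(scores)

# Hypothetical inputs: two binary factors (say, poor lighting and wet surface).
features = [1.0, 0.0]
# Made-up coefficients; the last row is the reference category (all zeros).
coefficients = [[0.8, 0.3], [0.4, 0.2], [0.1, 0.1], [0.0, 0.0]]
intercepts = [-2.0, -1.0, 0.5, 0.0]

probs = predict_proba(features, coefficients, intercepts)
for name, p in zip(OUTCOMES, probs):
    print(f"{name}: {p:.3f}")
```

Fixing one category's coefficients at zero, as done here for "damage", is the usual way to make such a model identifiable; the other coefficients are then read relative to that reference outcome.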

Keywords: child pedestrian, collisions, primary school, road injuries

Procedia PDF Downloads 142
166 Poverty Dynamics in Thailand: Evidence from Household Panel Data

Authors: Nattabhorn Leamcharaskul

Abstract:

This study examines the determinants of poverty dynamics in Thailand using panel data on 3,567 households for 2007-2017. Four estimation techniques are employed to analyze the situation of poverty across households and time periods: the multinomial logit model, the sequential logit model, the quantile regression model, and the difference-in-difference model. Households are categorized by their experiences into five groups: chronically poor, falling into poverty, re-entering poverty, exiting poverty, and never poor. Estimation results emphasize the effects of demographic and socioeconomic factors as well as unexpected events on the economic status of a household. It is found that remittances have a positive impact on a household's economic status: they are likely to lower the probability of falling into or being trapped in poverty, while they tend to increase the probability of exiting poverty. In addition, receiving a secondary source of household income not only raises the probability of being a never-poor household but also significantly increases household income per capita among chronically poor households and those falling into poverty. Public work programs are recommended as an important tool to relieve household financial burden and uncertainty and thereby increase households' chances of escaping poverty.

Keywords: difference in difference, dynamic, multinomial logit model, panel data, poverty, quantile regression, remittance, sequential logit model, Thailand, transfer

Procedia PDF Downloads 83
165 Early Warning System of Financial Distress Based On Credit Cycle Index

Authors: Bi-Huei Tsai

Abstract:

Previous studies on financial distress prediction choose the conventional failing and non-failing dichotomy; however, the distressed extent differs substantially among different financial distress events. To solve the problem, “non-distressed”, “slightly-distressed” and “reorganization and bankruptcy” are used in our article to approximate the continuum of corporate financial health. This paper explains different financial distress events using the two-stage method. First, this investigation adopts firm-specific financial ratios, corporate governance and market factors to measure the probability of various financial distress events based on multinomial logit models. Specifically, the bootstrapping simulation is performed to examine the difference of estimated misclassifying cost (EMC). Second, this work further applies macroeconomic factors to establish the credit cycle index and determines the distressed cut-off indicator of the two-stage models using such index. Two different models, one-stage and two-stage prediction models, are developed to forecast financial distress, and the results acquired from different models are compared with each other, and with the collected data. The findings show that the two-stage model incorporating financial ratios, corporate governance and market factors has the lowest misclassification error rate. The two-stage model is more accurate than the one-stage model as its distressed cut-off indicators are adjusted according to the macroeconomic-based credit cycle index.

Keywords: Multinomial logit model, corporate governance, company failure, reorganization, bankruptcy

Procedia PDF Downloads 350
164 Evaluation of Machine Learning Algorithms and Ensemble Methods for Prediction of Students’ Graduation

Authors: Soha A. Bahanshal, Vaibhav Verdhan, Bayong Kim

Abstract:

Graduation rates at six-year colleges are becoming a more essential indicator for incoming students and for university rankings. Predicting student graduation is extremely beneficial to schools and has huge potential for targeted intervention. It is important for educational institutions because it enables the development of strategic plans that assist or improve students' performance in graduating on time (GOT). Machine learning techniques offer a first step toward extracting useful information from these data and gaining insight into the prediction of students' progress and performance. Data analysis and visualization techniques are applied to understand and interpret the data. The data used for the analysis cover science-major students who graduated in six years, in the academic year 2017-2018. This analysis can be used to predict the graduation of students in the next academic year. Different predictive models, namely logistic regression, decision trees, support vector machines, Random Forest, Naïve Bayes, and KNeighborsClassifier, are applied to predict whether a student will graduate. These classifiers were evaluated with 5-fold cross-validation, and their performance was compared based on accuracy. The results indicated that the Ensemble Classifier achieves the best accuracy, about 91.12%. This GOT prediction model would hopefully be useful to university administration and academics in developing measures for assisting and boosting students' academic performance and ensuring they graduate on time.
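The evaluation with five folds mentioned above can be sketched in plain Python. This is only a minimal illustration of the splitting and averaging procedure: the labels are toy data, and a majority-class baseline stands in for the real classifiers, which the abstract does not describe in detail.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def majority_label(labels):
    """Baseline 'model': predict the most frequent training label."""
    # sorted() makes tie-breaking deterministic
    return max(sorted(set(labels)), key=labels.count)

def cross_validated_accuracy(labels, k=5):
    """Average the per-fold accuracy of the majority-class baseline."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        prediction = majority_label([labels[i] for i in train_idx])
        correct = sum(1 for i in test_idx if labels[i] == prediction)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)

# Toy labels (1 = graduated on time, 0 = did not); purely illustrative.
labels = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
folds = list(k_fold_indices(len(labels), k=5))
mean_accuracy = cross_validated_accuracy(labels, k=5)
print(f"mean accuracy over {len(folds)} folds: {mean_accuracy:.2f}")
```

In practice the data would normally be shuffled (and often stratified by class) before splitting; contiguous folds are used here only to keep the sketch short.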

Keywords: prediction, decision trees, machine learning, support vector machine, ensemble model, student graduation, GOT graduate on time

Procedia PDF Downloads 48
163 Prediction of Covid-19 Cases and Current Situation of Italy and Its Different Regions Using Machine Learning Algorithm

Authors: Shafait Hussain Ali

Abstract:

The Covid-19 disease, which first broke out in China, is caused by the coronavirus SARS-CoV-2. Italy was the first Western country to be severely affected and the first to take drastic measures to control the disease. At the start of December 2019, a sudden outbreak of coronavirus disease, caused by a new severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), occurred in the Chinese city of Wuhan. The World Health Organization declared the epidemic a public health emergency of international concern on January 30, 2020. By February 14, 2020, 49,053 laboratory-confirmed cases and 1481 deaths had been reported worldwide. The threat of the disease has forced most governments to implement various control measures. It is therefore necessary to analyze the Italian data carefully, in particular to investigate the present situation and the number of infected persons in terms of positive cases, deaths, hospitalizations, and other features, and to present them in a simple form. We therefore used a model that clearly shows the real facts and figures and is understandable to any reader, who can benefit from it. The model must include all features (total positive cases, current positive cases, hospitalized patients, deaths, and recovery frequency rates) that explain a wide range of facts in a very simple form and are helpful to the administration of the country.

Keywords: machine learning tools and techniques, rapid miner tool, Naive-Bayes algorithm, predictions

Procedia PDF Downloads 82
162 A Hybrid Fuzzy Clustering Approach for Fertile and Unfertile Analysis

Authors: Shima Soltanzadeh, Mohammad Hosain Fazel Zarandi, Mojtaba Barzegar Astanjin

Abstract:

Diagnosis of male infertility by laboratory tests is expensive and sometimes intolerable for patients. Filling out a questionnaire and then applying a classification method can be the first step in the decision-making process, so that laboratory tests are used only in cases with a high probability of infertility. In this paper, we evaluated the performance of four classification methods, namely naive Bayesian, neural network, logistic regression, and fuzzy c-means clustering used as a classifier, in the diagnosis of male infertility due to environmental factors. Since the data are unbalanced, ROC curves are the most suitable method for the comparison. We also selected the more important features using a filtering method and examined the impact of this feature reduction on the performance of each method; in general, most of the methods performed better after the filter was applied. We show that fuzzy c-means clustering used as a classifier performs well according to the ROC curves and that its performance is comparable to that of other classification methods such as logistic regression.
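To make the ROC-based comparison concrete, the area under the ROC curve (AUC) can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below uses that pairwise definition on made-up scores and labels; it does not reproduce the paper's data or classifiers.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier scores for an unbalanced sample (2 positives, 3 negatives).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

Because the AUC depends only on how positives rank against negatives, it is less sensitive to class imbalance than plain accuracy, which is consistent with the abstract's reason for preferring ROC curves.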

Keywords: classification, fuzzy c-means, logistic regression, Naive Bayesian, neural network, ROC curve

Procedia PDF Downloads 304
161 A Probabilistic Theory of the Buy-Low and Sell-High for Algorithmic Trading

Authors: Peter Shi

Abstract:

Algorithmic trading is a rapidly expanding domain within quantitative finance, constituting a substantial portion of trading volumes in the US financial market. The demand for rigorous and robust mathematical theories underpinning these trading algorithms is ever-growing. In this study, the author establishes a new stock market model that integrates the Efficient Market Hypothesis and the statistical arbitrage. The model, for the first time, finds probabilistic relations between the rational price and the market price in terms of the conditional expectation. The theory consequently leads to a mathematical justification of the old market adage: buy-low and sell-high. The thresholds for “low” and “high” are precisely derived using a max-min operation on Bayes’s error. This explicit connection harmonizes the Efficient Market Hypothesis and Statistical Arbitrage, demonstrating their compatibility in explaining market dynamics. The amalgamation represents a pioneering contribution to quantitative finance. The study culminates in comprehensive numerical tests using historical market data, affirming that the “buy-low” and “sell-high” algorithm derived from this theory significantly outperforms the general market over the long term in four out of six distinct market environments.

Keywords: efficient market hypothesis, behavioral finance, Bayes' decision, algorithmic trading, risk control, stock market

Procedia PDF Downloads 44
160 Applied Complement of Probability and Information Entropy for Prediction in Student Learning

Authors: Kennedy Efosa Ehimwenma, Sujatha Krishnamoorthy, Safiya Al‑Sharji

Abstract:

The probability of an event lies in the interval [0, 1], with values determined by the number of outcomes of events in a sample space S. The probability Pr(A) that an event A will never occur is 0. The probability Pr(B) that event B will certainly occur is 1. This makes both events A and B a certainty. Furthermore, the sum of probabilities Pr(E₁) + Pr(E₂) + … + Pr(Eₙ) of a finite set of events in a given sample space S equals 1. Conversely, the difference of the sum of two probabilities that will certainly occur is 0. This paper first discusses Bayes, the complement of probability, and the difference of probability for occurrences of learning-events before applying them in the prediction of learning objects in student learning. Given the sum of 1, and in order to make a recommendation for student learning, this paper proposes that the difference between argMaxPr(S) and the probability of student performance quantifies the weight of learning objects for students. Using a dataset of skill-sets, the computational procedure demonstrates i) the probability of skill-set events that have occurred and would lead to higher-level learning; ii) the probability of the events that have not occurred and require subject-matter relearning; iii) the accuracy of the decision tree in predicting student performance into class labels; and iv) the information entropy of the skill-set data and its implications for student cognitive performance and the recommendation of learning.
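The two quantities the abstract builds on, the complement of a probability and information entropy, can be written down directly. The sketch below is a generic illustration with invented numbers; the function names are assumptions, and it does not implement the paper's weighting scheme for learning objects.

```python
import math

def complement(p):
    """Probability that an event with probability p does NOT occur."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a probability must lie in [0, 1]")
    return 1.0 - p

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution (terms with p = 0 contribute 0)."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical example: a student has mastered a skill with probability 0.25.
p_mastered = 0.25
p_not_mastered = complement(p_mastered)
print("needs relearning with probability", p_not_mastered)

# Entropy is highest when outcomes are equally likely and zero when one is certain.
print("entropy of a fair two-way split:", entropy([0.5, 0.5]))
print("entropy of a certain outcome:", entropy([1.0, 0.0]))
```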

Keywords: complement of probability, Bayes’ rule, prediction, pre-assessments, computational education, information theory

Procedia PDF Downloads 127
159 Effectiveness, Safety, and Tolerability Profile of Stribild® in HIV-1-infected Patients in the Clinical Setting

Authors: Heiko Jessen, Laura Tanus, Slobodan Ruzicic

Abstract:

Objectives: The efficacy of Stribild®, an integrase strand transfer inhibitor (INSTI) -based STR, has been evaluated in randomized clinical trials and it has demonstrated durable capability in terms of achieving sustained suppression of HIV-1 RNA-levels. However, differences in monitoring frequency, existing selection bias and profile of patients enrolled in the trials, may all result in divergent efficacy of this regimen in routine clinical settings. The aim of this study was to assess the virologic outcomes, safety and tolerability profile of Stribild® in a routine clinical setting. Methods: This was a retrospective monocentric analysis on HIV-1-infected patients, who started with or were switched to Stribild®. Virological failure (VF) was defined as confirmed HIV-RNA>50 copies/ml. The minimum time of follow-up was 24 weeks. The percentage of patients remaining free of therapeutic failure was estimated using the time-to-loss-of-virologic-response (TLOVR) algorithm, by intent-to-treat analysis. Results: We analyzed the data of 197 patients (56 ART-naïve and 141 treatment-experienced patients), who fulfilled the inclusion criteria. Majority (95.9%) of patients were male. The median time of HIV-infection at baseline was 2 months in treatment-naïve and 70 months in treatment-experienced patients. Median time [IQR] under ART in treatment-experienced patients was 37 months. Among the treatment-experienced patients 27.0% had already been treated with a regimen consisting of two NRTIs and one INSTI, whereas 18.4% of them experienced a VF. The median time [IQR] of virological suppression prior to therapy with Stribild® in the treatment-experienced patients was 10 months [0-27]. At the end of follow-up (median 33 months), 87.3% (95% CI, 83.5-91.2) of treatment-naïve and 80.3% (95% CI, 75.8-84.8) of treatment-experienced patients remained free of therapeutic failure. 
Considering only treatment-experienced patients with baseline VL<50 copies/ml, 83.0% (95% CI, 78.5-87.5) remained free of therapeutic failure. A total of 17 patients stopped treatment with Stribild®, 5.4% (3/56) of them were treatment-naïve and 9.9% (14/141) were treatment-experienced patients. The Stribild® therapy was discontinued in 2 (1.0%) because of VF, loss to follow-up in 4 (2.0%), and drug-drug interactions in 2 (1.0%) patients. Adverse events were in 7 (3.6%) patients the reason to switch from therapy with Stribild® and further 2 (1.0%) patients decided personally to switch. The most frequently observed adverse events were gastrointestinal side effects (20.0%), headache (8%), rash events (7%) and dizziness (6%). In two patients we observed an emergence of novel resistances in integrase-gene. The N155H evolved in one patient and resulted in VF. In another patient S119R evolved either during or shortly upon switch from therapy with Stribild®. In one further patient with VF two novel mutations in the RT-gene were observed when compared to historical genotypic test result (V106I/M and M184V), whereby it is not clear whether they evolved during or already before the switch to Stribild®. Conclusions: Effectiveness of Stribild® for treatment-naïve patients was consistent with data obtained in clinical trials. The safety and tolerability profile as well as resistance development confirmed clinical efficacy of Stribild® in a daily practice setting.

Keywords: ART, HIV, integrase inhibitor, stribild

Procedia PDF Downloads 260
158 The Role of the Gut Microbiome of Marine Invertebrates in the Degradation of Complex Algal Substrates

Authors: Yuchen LI, Martyn Kurr, Peter Golyshin

Abstract:

Biological invasion is a global problem. Invasive species can threaten local ecosystems by competing for resources, consuming local species, and reproducing faster than natives. Sargassum muticum is an invasive alga in the UK. It negatively impacts local algae through overshading and can cause reductions in local biodiversity. One possible explanation for its success is herbivore release. According to the Enemy Release Hypothesis, invasives are less impacted by local herbivores than natives. In many species, gastrointestinal (GI) tract microbes have been found to be a key factor in food preference, and similar mechanisms may exist in the relationship between local consumers and S. muticum. Some populations of native Littorina snails accept S. muticum as a food source, while others avoid it. This project aims to establish the relationship between GI tract microbes and the feeding preferences of L. littorea when offered both native algae and S. muticum. Individuals of L. littorea from a site invaded by S. muticum around 18 years ago were compared to those from an un-invaded site nearby. Sargassum-experienced snails are more likely to consume it than naïve snails, and pronounced differences were found in the GI-tract microbial communities through 16S (prokaryote) and 18S (eukaryote) sequencing. Sargassum-naïve snails were then exposed to faecal pellets from experienced snails to ‘inoculate’ them with microbes from the exposed snails. Preliminary results suggest these faecal-pellet-exposed but otherwise Sargassum-naïve snails subsequently began consuming S. muticum. It is unclear whether these results are due to genuine changes in GI-tract microbes or to some other mechanism, such as behavioural responses to chemical cues in the faecal pellets, but they are nevertheless of significance for invasion ecology, suggesting that foraging preferences for an invasive prey type are malleable and possibly programmable in laboratory settings.

Keywords: invasive algae, sea snails, gut microbiome, biocontrol

Procedia PDF Downloads 47
157 A Bayesian Classification System for Facilitating an Institutional Risk Profile Definition

Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan

Abstract:

This paper presents an approach for easy creation and classification of institutional risk profiles supporting endangerment analysis of file formats. The main contribution of this work is the use of data mining techniques to support the identification of the most important risk factors. Subsequently, risk profiles employ a risk factor classifier and associated configurations to support digital preservation experts with a semi-automatic estimation of the endangerment group for file format risk profiles. Our goal is to make use of an expert knowledge base, acquired through a digital preservation survey, in order to detect preservation risks for a particular institution. Another contribution is support for the visualisation of risk factors along a required dimension of analysis. Using the naive Bayes method, the decision support system recommends to an expert the matching risk profile group for the previously selected institutional risk profile. The proposed methods improve the visibility of risk factor values and the quality of a digital preservation process. The presented approach is designed to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and values of file format risk profiles. To facilitate decision-making, the aggregated information about the risk factors is presented as a multidimensional vector. The goal is to visualise particular dimensions of this vector for analysis by an expert and to define its profile group. A sample risk profile calculation and the visualisation of some risk factor dimensions are presented in the evaluation section.
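A naive Bayes recommendation of this kind can be sketched as a small categorical classifier with add-one (Laplace) smoothing. The risk factors, their values, and the profile groups below are hypothetical placeholders; the abstract does not specify the actual factors or how the authors' classifier is configured.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, class) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values

def predict(model, row):
    """Return the class with the highest log posterior under naive Bayes."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        count = class_counts[label]
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Laplace (add-one) smoothing avoids zero probabilities
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical risk factors: (format age, format openness) -> risk profile group.
rows = [("old", "proprietary"), ("old", "open"), ("new", "open"),
        ("new", "open"), ("old", "proprietary")]
labels = ["high", "high", "low", "low", "high"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("new", "open")))
print(predict(model, ("old", "proprietary")))
```

The "naive" part is the assumption that risk factors are independent given the profile group, which is what lets the per-factor log probabilities simply be added.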

Keywords: linked open data, information integration, digital libraries, data mining

Procedia PDF Downloads 397
156 Agroforestry Systems and Practices and Its Adoption in Kilombero Cluster of Sagcot, Tanzania

Authors: Lazaro E. Nnko, Japhet J. Kashaigili, Gerald C. Monela, Pantaleo K. T. Munishi

Abstract:

Agroforestry systems and practices are perceived to improve livelihoods and the sustainable management of natural resources. However, their adoption in various regions differs with the biophysical conditions and societal characteristics. This study was conducted in Kilombero District to investigate the factors influencing the adoption of different agroforestry systems and practices in agro-ecosystems and farming systems. A household survey, key informant interviews, and focus group discussions were used for data collection in three villages. Descriptive statistics and multinomial logistic regression in SPSS were applied for analysis. Results show that Igima and Ngajengwa villages were dominated by home garden practices, at 63.3% and 66.7%, respectively, while Mbingu village was dominated by mixed intercropping practice, at 56.67%. Agrosilvopasture systems were dominant in Igima and Ngajengwa villages, at 56.7% and 66.7%, respectively, while in Mbingu village the dominant system was agrosilviculture, at 66.7%. The results from the multinomial logistic regression show that different explanatory variables were statistically significant predictors of the adoption of agroforestry systems and practices. Residence type and sex were the dominant factors influencing the adoption of agroforestry systems. Duration of stay in the village, availability of extension education, residence, and sex were the dominant factors influencing the adoption of agroforestry practices. The most important and statistically significant factors among these were residence type and sex. The study concludes that agroforestry will be more successful if local priorities, which include the socio-economic needs of the society, are considered in designing systems and practices. The socio-economic needs of the community should be addressed in the process of expanding the adoption of agroforestry systems and practices.

Keywords: agroforestry adoption, agroforestry systems, agroforestry practices, agroforestry, Kilombero

Procedia PDF Downloads 85
155 Evaluation of Classification Algorithms for Diagnosis of Asthma in Iranian Patients

Authors: Taha SamadSoltani, Peyman Rezaei Hachesu, Marjan GhaziSaeedi, Maryam Zolnoori

Abstract:

Introduction: Data mining is defined as a process of finding patterns and relationships in the data of a database in order to build predictive models. Its application has extended to many sectors, including healthcare services. Medical data mining aims to solve real-world problems in the diagnosis and treatment of diseases. It applies various techniques and algorithms that differ in accuracy and precision. The purpose of this study was to apply knowledge discovery and data mining techniques to the diagnosis of asthma based on patient symptoms and history. Method: Data mining includes several steps and decisions to be made by the user. It starts with developing an understanding of the scope, applying previous knowledge in the area, and identifying the knowledge discovery (KD) process from the stakeholders' point of view, and it finishes with acting on the discovered knowledge by applying it, integrating it with other systems, and documenting and reporting it. In this study, a stepwise methodology was followed to achieve a logical outcome. Results: The sensitivity, specificity, and accuracy of the KNN, SVM, Naïve Bayes, NN, classification tree, and CN2 algorithms, and of related similar studies, were evaluated, and ROC curves were plotted to show the performance of the system. Conclusion: The results show that asthma can be diagnosed with approximately ninety percent accuracy based on demographic and clinical data. The study also showed that methods based on pattern discovery and data mining have higher sensitivity than expert and knowledge-based systems. On the other hand, medical guidelines and evidence-based medicine should be the basis of diagnostic methods; it is therefore recommended that machine learning algorithms be used in combination with knowledge-based algorithms.

Keywords: asthma, datamining, classification, machine learning

Procedia PDF Downloads 422
154 Factors Affecting At-Grade Railway Level Crossing Accidents in Bangladesh

Authors: Armana Huq

Abstract:

Railway networks play a significant role in the economy of any country. As with other transportation modes, many people are killed or injured in railway-related accidents. Railway accidents are less common than roadway accidents, yet they are more devastating and damaging. Despite that, railway accidents are not treated as a major threat, perhaps because they are less frequent than other accident categories. However, the Federal Railroad Administration reported nearly twelve thousand railroad-related train accidents in 2014, resulting in more than eight hundred fatalities and thousands of injuries in the United States alone; nearly one third of these fatalities resulted from railway crossing accidents. An analysis of six years of railway accident data (2005-2010) revealed that 344 collisions occurred in Bangladesh, leaving 200 people dead and 443 injured. This paper includes a comprehensive overview of the railway safety situation in Bangladesh from 1998 to 2015. Each year, on average, eight fatalities are reported at at-grade level crossings due to railway accidents in Bangladesh. In this paper, the number of railway accidents that occurred in Bangladesh is presented, and a fatality rate of 58.62% has been estimated as the percentage of total at-grade railway level crossing accidents. For this study, data on railway accidents in Bangladesh for the period 1998 to 2015 were obtained from the police-reported accident database using MAAP (Microcomputer Accident Analysis Package). The major contributing factors to the railway accidents were investigated using the multinomial logit model. Furthermore, hotspot analysis was conducted using ArcGIS. Finally, some suggestions are provided to mitigate these accidents.

Keywords: safety, human factors, multinomial logit model, railway

Procedia PDF Downloads 123
153 Beyond Adoption: Econometric Analysis of Impacts of Farmer Innovation Systems and Improved Agricultural Technologies on Rice Yield in Ghana

Authors: Franklin N. Mabe, Samuel A. Donkoh, Seidu Al-Hassan

Abstract:

To increase rice yield and bridge yield differences, many farmers have resorted to adopting Farmer Innovation Systems (FISs) and Improved Agricultural Technologies (IATs). This study econometrically analysed the impacts of adopting FISs and IATs on rice yield using multinomial endogenous switching regression (MESR). Nine hundred and seven (907) rice farmers from the Guinea Savannah Zone (GSZ), Forest Savannah Transition Zone (FSTZ), and Coastal Savannah Zone (CSZ) were included in the study. The study used both primary and secondary data. FBO advice, rice farming experience, and distance from farming communities to input markets increase farmers’ adoption of only FISs. Factors that increase farmers’ probability of adopting only IATs are access to extension advice, credit, improved seeds, and contract farming. Farmers located in the CSZ have a higher probability of adopting only IATs than their counterparts living in other agro-ecological zones. Age and access to input subsidy increase the probability of jointly adopting FISs and IATs. FISs and IATs have heterogeneous impacts on rice yield, with adoption of only IATs having the highest impact, followed by joint adoption of FISs and IATs. It is important for stakeholders in the rice subsector to champion the provision of improved rice seeds, the intensification of agricultural extension services, and the contract farming concept. Researchers should endeavour to conduct research into FISs.

Keywords: farmer innovation systems, improved agricultural technologies, multinomial endogenous switching regression, treatment effect

Procedia PDF Downloads 391
152 Data Mining Model for Predicting the Status of HIV Patients during Drug Regimen Change

Authors: Ermias A. Tegegn, Million Meshesha

Abstract:

Human Immunodeficiency Virus and Acquired Immunodeficiency Syndrome (HIV/AIDS) is a major cause of death in most African countries. Ethiopia is one of the seriously affected countries in sub-Saharan Africa. Previously in Ethiopia, having HIV/AIDS was almost equivalent to a death sentence. With the introduction of Antiretroviral Therapy (ART), HIV/AIDS has become a chronic but manageable disease. The study focused on a data mining technique to predict the future living status of HIV/AIDS patients at the time of a drug regimen change, when the ART drug combination currently being taken becomes toxic to the patient. The data were taken from the University of Gondar Hospital ART program database. A hybrid methodology was followed to explore the application of data mining to the ART program dataset. Data cleaning, handling of missing values, and data transformation were used to preprocess the data. The WEKA 3.7.9 data mining tool, classification algorithms, and domain expertise were used to address the research problem. Using four different classification algorithms (J48 classifier, PART rule induction, Naïve Bayes, and neural network) and adjusting their parameters, thirty-two models were built on the pre-processed University of Gondar ART program dataset. The performance of the models was evaluated using the standard metrics of accuracy, precision, recall, and F-measure. The most effective model for predicting the status of HIV patients with drug regimen substitution is the pruned J48 decision tree, with a classification accuracy of 98.01%. This study extracts interesting attributes such as Ever taking Cotrim, Ever taking TbRx, CD4 count, Age, Weight, and Gender to predict the status of drug regimen substitution. The outcome of this study can be used as an assistive tool to help clinicians make more appropriate drug regimen substitutions. Future research directions are proposed toward developing an applicable system in the area of the study.
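The four evaluation metrics named above follow directly from the counts of true and false positives and negatives. The sketch below computes them for a binary case on made-up labels; it is a generic illustration and does not reproduce WEKA's output or the study's results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall) else 0.0
    )
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Toy ground truth and predictions; purely illustrative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Reporting precision, recall, and F-measure alongside accuracy matters when one outcome is rare, since a model can reach high accuracy while missing most cases of the minority class.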

Keywords: HIV drug regimen, data mining, hybrid methodology, predictive model

Procedia PDF Downloads 113
151 Multiobjective Optimization of a Pharmaceutical Formulation Using Regression Method

Authors: J. Satya Eswari, Ch. Venkateswarlu

Abstract:

The formulation of a commercial pharmaceutical product involves several composition factors and response characteristics. When the formulation must satisfy multiple conflicting response characteristics, finding an optimal solution requires an efficient multiobjective optimization technique. In this work, a regression model is combined with non-dominated sorting differential evolution (NSDE) involving Naïve & Slow and ε-constraint techniques to derive different multiobjective optimization strategies, which are then evaluated on a trapidil pharmaceutical formulation. The analysis of the results shows the effectiveness of the strategy that combines the regression model and NSDE with the integration of both Naïve & Slow and ε-constraint techniques for Pareto optimization of the trapidil formulation. With this strategy, the optimal formulation at pH=6.8 is obtained with the decision variables of microcrystalline cellulose, hydroxypropyl methylcellulose, and compression pressure. The corresponding response characteristics of rate constant and release order are also reported. Comparison of these results with the experimental data and with those of other multiple-regression-model-based multiobjective evolutionary optimization strategies indicates better performance for the optimal trapidil formulation.

Keywords: pharmaceutical formulation, multiple regression model, response surface method, radial basis function network, differential evolution, multiobjective optimization

Procedia PDF Downloads 383
150 Advancements in Predicting Diabetes Biomarkers: A Machine Learning Epigenetic Approach

Authors: James Ladzekpo

Abstract:

Background: The urgent need to identify new pharmacological targets for diabetes treatment and prevention has been amplified by the disease's extensive impact on individuals and healthcare systems. A deeper insight into the biological underpinnings of diabetes is crucial for the creation of therapeutic strategies aimed at these biological processes. Current predictive models based on genetic variations fall short of accurately forecasting diabetes. Objectives: Our study aims to pinpoint key epigenetic factors that predispose individuals to diabetes. These factors will inform the development of an advanced predictive model that estimates diabetes risk from genetic profiles, utilizing state-of-the-art statistical and data mining methods. Methodology: We have implemented a recursive feature elimination with cross-validation using the support vector machine (SVM) approach for refined feature selection. Building on this, we developed six machine learning models, including logistic regression, k-Nearest Neighbors (k-NN), Naive Bayes, Random Forest, Gradient Boosting, and Multilayer Perceptron Neural Network, to evaluate their performance. Findings: The Gradient Boosting Classifier excelled, achieving a median recall of 92.17% and outstanding metrics such as area under the receiver operating characteristics curve (AUC) with a median of 68%, alongside median accuracy and precision scores of 76%. Through our machine learning analysis, we identified 31 genes significantly associated with diabetes traits, highlighting their potential as biomarkers and targets for diabetes management strategies. Conclusion: Particularly noteworthy were the Gradient Boosting Classifier and Multilayer Perceptron Neural Network, which demonstrated potential in diabetes outcome prediction. We recommend future investigations to incorporate larger cohorts and a wider array of predictive variables to enhance the models' predictive capabilities.

Keywords: diabetes, machine learning, prediction, biomarkers

Procedia PDF Downloads 19
149 Breast Cancer Survivability Prediction via Classifier Ensemble

Authors: Mohamed Al-Badrashiny, Abdelghani Bellaachia

Abstract:

This paper presents a classifier ensemble approach for predicting the survivability of breast cancer patients using the latest database version of the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The system consists of two main components: a feature selection component and a classifier ensemble component. The feature selection component divides the features in the SEER database into four groups. It then tries to find the most important features among the four groups that maximize the weighted average F-score of a given classification algorithm. The ensemble component uses three different classifiers, each of which models a different set of SEER features chosen by the feature selection module. On top of them, another classifier gives the final decision based on the output decisions and confidence scores of the underlying classifiers. Different classification algorithms have been examined; the best setup found uses the decision tree, Bayesian network, and Naïve Bayes algorithms for the underlying classifiers and Naïve Bayes for the classifier ensemble step. The system outperforms all published systems to date when evaluated against the exact same SEER data (period of 1973-2002). It gives an 87.39% weighted average F-score compared to 85.82% and 81.34% for the other published systems. By increasing the data size to cover the whole database (period of 1973-2014), the overall weighted average F-score rises to 92.4% on the held-out unseen test set.

Keywords: classifier ensemble, breast cancer survivability, data mining, SEER

Procedia PDF Downloads 294
148 Principal Component Analysis Combined Machine Learning Techniques on Pharmaceutical Samples by Laser Induced Breakdown Spectroscopy

Authors: Kemal Efe Eseller, Göktuğ Yazici

Abstract:

Laser-induced breakdown spectroscopy (LIBS) is a rapid optical atomic emission spectroscopy technique used for material identification and analysis, with the advantages of in-situ analysis, elimination of intensive sample preparation, and micro-destructive properties for the material to be tested. LIBS delivers short laser pulses onto the material in order to create plasma by exciting the material to a certain threshold. The plasma characteristics, which consist of wavelength value and intensity amplitude, depend on the material and the experiment’s environment. In the present work, the spectrum profiles of medicine samples were obtained via LIBS. The medicine sample datasets include two different concentrations for both paracetamol-based medicines, namely Aferin and Parafon. The spectrum data of the samples were preprocessed by filling outliers based on quartiles, smoothing spectra to eliminate noise, and normalizing both the wavelength and intensity axes. Statistical information was obtained, and principal component analysis (PCA) was applied to both the preprocessed and raw datasets. The machine learning models were built on two different train-test splits: 70% training – 30% test and 80% training – 20% test. Cross-validation was used to protect the models against overfitting, because the sample size is small. The machine learning results for the preprocessed and raw datasets were compared for both splits. This is the first time that all of these supervised machine learning classification algorithms (Decision Trees, Discriminant, naïve Bayes, Support Vector Machines (SVM), k-Nearest Neighbor (k-NN), Ensemble Learning, and Neural Network algorithms) have been applied to LIBS data of paracetamol-based pharmaceutical samples and their different concentrations, on both preprocessed and raw datasets, in order to observe the effect of preprocessing.

Keywords: machine learning, laser-induced breakdown spectroscopy, medicines, principal component analysis, preprocessing

Procedia PDF Downloads 65
147 Constructing a Semi-Supervised Model for Network Intrusion Detection

Authors: Tigabu Dagne Akal

Abstract:

While advances in computer and communications technology have made the network ubiquitous, they have also rendered networked systems vulnerable to malicious attacks devised from a distance. These attacks or intrusions start with attackers infiltrating a network through a vulnerable host and then launching further attacks on the local network or Intranet. Nowadays, system administrators and network professionals can attempt to prevent such attacks by developing intrusion detection tools and systems using data mining technology. In this study, the experiments were conducted following the Knowledge Discovery in Database Process Model, which starts with selection of the datasets. The dataset used in this study was taken from the Massachusetts Institute of Technology Lincoln Laboratory and was then pre-processed. The major pre-processing activities were filling in missing values, removing outliers, resolving inconsistencies, integrating data that contain both labelled and unlabelled records, dimensionality reduction, size reduction, and data transformation such as discretization. A total of 21,533 intrusion records were used for training the models. To validate the performance of the selected model, a separate set of 3,397 records was used as a testing set. To build a predictive model for intrusion detection, the J48 decision tree and Naïve Bayes algorithms were tested as classification approaches, both with and without feature selection. The model created using 10-fold cross-validation with the J48 decision tree algorithm and default parameter values showed the best classification accuracy. The model has a prediction accuracy of 96.11% on the training datasets and 93.2% on the test dataset when classifying new instances into the normal, DOS, U2R, R2L, and probe classes.
The findings of this study show that data mining methods generate interesting rules that are crucial for intrusion detection and prevention in the networking industry. Future research directions are proposed for developing an applicable system in the area of the study.

Keywords: intrusion detection, data mining, computer science

Procedia PDF Downloads 268
146 An Overbooking Model for Car Rental Service with Different Types of Cars

Authors: Naragain Phumchusri, Kittitach Pongpairoj

Abstract:

Overbooking is a very useful revenue management technique that could help reduce costs caused by either undersales or oversales. In this paper, we propose an overbooking model for two types of cars that can minimize the total cost for car rental service. With two types of cars, there is an upgrade possibility for lower type to upper type. This makes the model more complex than one type of cars scenario. We have found that convexity can be proved in this case. Sensitivity analysis of the parameters is conducted to observe the effects of relevant parameters on the optimal solution. Model simplification is proposed using multiple linear regression analysis, which can help estimate the optimal overbooking level using appropriate independent variables. The results show that the overbooking level from multiple linear regression model is relatively close to the optimal solution (with the adjusted R-squared value of at least 72.8%). To evaluate the performance of the proposed model, the total cost was compared with the case where the decision maker uses a naïve method for the overbooking level. It was found that the total cost from optimal solution is only 0.5 to 1 percent (on average) lower than the cost from regression model, while it is approximately 67% lower than the cost obtained by the naïve method. It indicates that our proposed simplification method using regression analysis can effectively perform in estimating the overbooking level.

Keywords: overbooking, car rental industry, revenue management, stochastic model

Procedia PDF Downloads 146
145 Analyzing Impacts of Road Network on Vegetation Using Geographic Information System and Remote Sensing Techniques

Authors: Elizabeth Malebogo Mosepele

Abstract:

Road transport has become increasingly common in the world; people rely on road networks for transportation purpose on a daily basis. However, environmental impact of roads on surrounding landscapes extends their potential effects even further. This study investigates the impact of road network on natural vegetation. The study will provide baseline knowledge regarding roadside vegetation and would be helpful in future for conservation of biodiversity along the road verges and improvements of road verges. The general hypothesis of this study is that the amount and condition of road side vegetation could be explained by road network conditions. Remote sensing techniques were used to analyze vegetation conditions. Landsat 8 OLI image was used to assess vegetation cover condition. NDVI image was generated and used as a base from which land cover classes were extracted, comprising four categories viz. healthy vegetation, degraded vegetation, bare surface, and water. The classification of the image was achieved using the supervised classification technique. Road networks were digitized from Google Earth. For observed data, transect based quadrats of 50*50 m were conducted next to road segments for vegetation assessment. Vegetation condition was related to road network, with the multinomial logistic regression confirming a significant relationship between vegetation condition and road network. The null hypothesis formulated was that 'there is no variation in vegetation condition as we move away from the road.' Analysis of vegetation condition revealed degraded vegetation within close proximity of a road segment and healthy vegetation as the distance increase away from the road. The Chi Squared value was compared with critical value of 3.84, at the significance level of 0.05 to determine the significance of relationship. 
Given that the Chi squared value was 395, 5004, the null hypothesis was therefore rejected; there is significant variation in vegetation condition as the distance from the road increases. The conclusion is that the road network plays an important role in the condition of vegetation.
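The decision rule described in this abstract (compare the Chi squared statistic with the critical value 3.84 at the 0.05 level, which corresponds to one degree of freedom) can be sketched in Python as follows. This is an illustration only: the 2x2 table layout and its counts are hypothetical assumptions, not the study's data, and the function name `chi_squared_2x2` is invented for the example.

```python
def chi_squared_2x2(table):
    """Pearson chi-squared statistic for a 2x2 contingency table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


CRITICAL_VALUE = 3.84  # chi-squared critical value for df = 1 at alpha = 0.05

# Hypothetical counts: rows = near road / far from road,
# columns = degraded / healthy vegetation.
table = [[30, 10], [10, 30]]
stat = chi_squared_2x2(table)
reject_null = stat > CRITICAL_VALUE  # reject "no variation with distance" if True
```

A statistic above the critical value leads to rejecting the null hypothesis, which is the comparison the abstract reports.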

Keywords: Chi squared, geographic information system, multinomial logistic regression, remote sensing, road side vegetation

Procedia PDF Downloads 399
144 Smallholder Participation in Organized Retail Markets: Evidence from India

Authors: Kedar Vishnu, Parmod Kumar

Abstract:

India is becoming the most favored retail destination in the world. Organized retail has presented many opportunities for farmers to increase income by shifting their cropping pattern from food grains to commercial crops. Previous research revealed potential benefits for farmers from supplying fruits and vegetables to organized retail channels. However, the supply of fruits and vegetables from small and marginal farmers remains lower than expected. The main objective of this paper is to identify the factors determining the market participation of smallholder farmers in modern organized retail chains. An attempt is also made to identify the factors influencing the choice of a particular organized retail collection center over others. The paper is based on a primary survey of 40 bean and tomato farmers who supply organized retail collection centers in Karnataka, India. A multiple regression technique is used to identify the factors determining the quantity sold at collection centers. The regression results show that area under vegetables, yield, price at the modern collection center, and access to technical help significantly affect the quantity sold into modern organized retail channels. Conversely, increased rejection rates and vegetable prices at APMC were found to influence farmers' decisions in the opposite direction. Empirical results of the multinomial logit model show that Reliance Fresh tends to prefer large farmers who can supply higher quality and larger quantities compared with TESCO and More collection centers. The negative sign of area, access to technical help, transportation cost, and number of bore wells led to a higher probability of farmers participating in Reliance Fresh collection centers as compared with More and TESCO.

Keywords: fruits, vegetables, organized retail markets, multinomial logit model

Procedia PDF Downloads 320
143 Indigenous Adaptation Strategies for Climate Change: Small Farmers’ Options for Sustainable Crop Farming in South-Western Nigeria

Authors: Emmanuel Olasope Bamigboye, Ismail Oladeji Oladosu

Abstract:

Local people of south-western Nigeria, as in other climes, continue to be confronted with the vagaries of changing environments. Through the modification of existing practices and a shifting resource base, their coping strategies have enabled them to successfully negotiate shifts in climate and the environment. This article analyses indigenous adaptation strategies for climate change with a view to enhancing sustainable crop farming in south-western Nigeria. A multi-stage sampling procedure was used to select 340 respondents from the two major ecological zones (Forest and Derived Savannah) for good geographical spread. The article draws on mixed methods of qualitative research, literature review, field observations, informal interviews, and multinomial logit regression to capture choice probabilities across the various climate change adaptation options among arable crop farmers. The study revealed that most (85.0%) of the arable crop farmers were male. It also showed that the use of local climate change adaptation strategies had no relationship with the educational level of the respondents, as 77.3% had educational experience at varying levels. Furthermore, the findings showed that seven local adaptation strategies were commonly utilized by arable crop farmers. Nonetheless, crop diversification, consultation with rainmakers, and involvement in non-agricultural ventures were prioritized in the order of 1-3, respectively. Also, the multinomial logit analysis showed that at the p ≤ 0.05 level of significance, household size (p<0.08), sex (p<0.06), access to loan (p<0.16), age (p<0.07), educational level (p<0.17), and functional extension contact (p<0.28) were all important in explaining the indigenous climate change adaptation strategies utilized by arable crop farmers in south-western Nigeria.
The study concluded that all the identified local adaptation strategies need to be integrated into the development process for sustainable climate change adaptation.

Keywords: crop diversification, climate change, adaptation option, sustainable, small farmers

Procedia PDF Downloads 271
142 Study of Circulatory MiR-122 and MiR-130a Expression among Chronic Hepatitis C Egyptian Patients

Authors: Hend K. Moosa, Eman A. Rashwan, Ezzat M. Hassan, Amany A. Ghazy, Amel G. Sheredy

Abstract:

The stability of microRNA (miR) in the circulation can show a great progress toward the discovery of non-invasive diagnostic and prognostic biomarkers in many diseases. In the present study, circulatory miR-122 and miR-130a were analysed in chronic hepatitis C Egyptian patients in predicting the clinical outcome of interferon treatment. In addition, their expression levels were correlated to viral RNA levels, necro-inflammatory markers (AST, ALT) and to each other. This study was conducted on 51 subjects where 36 were chronic HCV patients in which they were divided into naive and interferon treated HCV patients (responders and non-responders) and 15 matched healthy controls. Serum quantification of miR-122 and miR-130a were performed by quantitative Real-time Polymerase Chain Reaction (qRT-PCR). The results showed a significant upregulation of miR-122 in non-responder patients (P=0.049). By receiver operating characteristic analysis curve, miR-122 revealed 65% sensitivity and 92.3% specificity in predicting non-responsiveness of patients to IFN treatment, while miR-130a showed a sensitivity of 100% and specificity of 53.85%. Remarkably, there was a significant positive correlation between miR-122 and miR-130a in naive HCV patients (r=0.714, p=0.003). However, there was no significant correlation between serum miR-122, miR-130a expression levels and necro-inflammatory markers (AST, ALT). To conclude, miR-122 and miR-130a have a significant association with viral RNA levels and accordingly, they may have a synergistic power in promoting viral replication. Interestingly, miR-122 and miR-130a have a predictive power in predicting clinical outcome of IFN treatment which can be further studied in currently used drugs in order to reduce the socio-economic burden of potentially non-responders.

Keywords: hepatitis C, microRNA, miR-122, miR-130a

Procedia PDF Downloads 140
141 Interpretation of the Russia-Ukraine 2022 War via N-Gram Analysis

Authors: Elcin Timur Cakmak, Ayse Oguzlar

Abstract:

This study presents the results of a bigram and trigram analysis of tweets sent by Twitter users about the Russia-Ukraine war. On February 24, 2022, Russian President Vladimir Putin declared a military operation against Ukraine, and all eyes turned to this war. Many people living in Russia and Ukraine reacted to the war, protested, and expressed deep concern, as they felt the safety of their families and their futures was at stake. Most people, especially those living in Russia and Ukraine, express their views on the war in different ways, the most popular of which is social media. Many people prefer to convey their feelings using Twitter, one of the most frequently used social media tools. Since the beginning of the war, thousands of tweets about it have been posted on Twitter from many countries of the world. These tweets were extracted through the Twitter API and analysed using the Python programming language. The aim of the study is to find the word sequences in these tweets by the n-gram method, which is widely used in computational linguistics and natural language processing. The tweet language used in the study is English. The data set consists of data obtained from Twitter between February 24, 2022, and April 24, 2022. The tweets obtained from Twitter using the #ukraine, #russia, #war, #putin, and #zelensky hashtags together were captured as raw data, and the tweets remaining after cleaning in the preprocessing stage were included in the analysis stage. In the data analysis part, sentiment analysis is used to show what people communicate about the war on Twitter. Negative messages make up the majority of all tweets, at 63.6%. Furthermore, the most frequently used bigram and trigram word groups are found.
Regarding the results, the most frequently used word groups are “he, is”, “I, do”, “I, am” for bigrams. Also, the most frequently used word groups are “I, do, not”, “I, am, not”, “I, can, not” for trigrams. In the machine learning phase, the accuracy of classifications is measured by Classification and Regression Trees (CART) and Naïve Bayes (NB) algorithms. The algorithms are used separately for bigrams and trigrams. We gained the highest accuracy and F-measure values by the NB algorithm and the highest precision and recall values by the CART algorithm for bigrams. On the other hand, the highest values for accuracy, precision, and F-measure values are achieved by the CART algorithm, and the highest value for the recall is gained by NB for trigrams.
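The bigram and trigram counting described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the example sentences are hypothetical, tokenization is a plain lowercase whitespace split, and the helper names `ngrams` and `top_ngrams` are invented here; the study's actual preprocessing pipeline is not described in enough detail to reproduce.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(texts, n, k=3):
    """Count n-grams over a collection of texts and return the k most frequent."""
    counts = Counter()
    for text in texts:
        # Simplifying assumption: lowercase and split on whitespace only.
        tokens = text.lower().split()
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)


# Hypothetical example sentences, not tweets from the study's data set.
tweets = ["I do not want war", "I do not understand", "I am not afraid"]
top_bigrams = top_ngrams(tweets, 2)
top_trigrams = top_ngrams(tweets, 3)
```

The resulting n-gram counts could then serve as features for classifiers such as CART or Naïve Bayes, though how the study built its feature vectors is not stated.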

Keywords: classification algorithms, machine learning, sentiment analysis, Twitter

Procedia PDF Downloads 52
140 Theta-Phase Gamma-Amplitude Coupling as a Neurophysiological Marker in Neuroleptic-Naive Schizophrenia

Authors: Jun Won Kim

Abstract:

Objective: Theta-phase gamma-amplitude coupling (TGC) was used as a novel evidence-based tool to reflect the dysfunctional cortico-thalamic interaction in patients with schizophrenia. However, to the best of our knowledge, no studies have reported the diagnostic utility of TGC in the resting-state electroencephalogram (EEG) of neuroleptic-naive patients with schizophrenia compared to healthy controls. Thus, the purpose of this EEG study was to understand the underlying mechanisms in patients with schizophrenia by comparing TGC at rest between the two groups and to evaluate the diagnostic utility of TGC. Method: The subjects included 90 patients with schizophrenia and 90 healthy controls. All patients were diagnosed with schizophrenia according to the criteria of the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) by two independent psychiatrists using semi-structured clinical interviews. Because patients were either drug-naïve (first episode) or had not been taking psychoactive drugs for one month before the study, we could exclude the influence of medications. Six frequency bands were defined for spectral analyses: delta (1–4 Hz), theta (4–8 Hz), slow alpha (8–10 Hz), fast alpha (10–13.5 Hz), beta (13.5–30 Hz), and gamma (30–80 Hz). The spectral power of the EEG data was calculated with the fast Fourier transform using the 'spectrogram.m' function of the signal processing toolbox in Matlab. An analysis of covariance (ANCOVA) was performed to compare the TGC results between the groups, which were adjusted using a Bonferroni correction (P < 0.05/19 = 0.0026). Receiver operating characteristic (ROC) analysis was conducted to examine the discriminating ability of the TGC data for schizophrenia diagnosis. Results: The patients with schizophrenia showed a significant increase in the resting-state TGC at all electrodes.
The delta, theta, slow alpha, fast alpha, and beta powers showed low accuracies of 62.2%, 58.4%, 56.9%, 60.9%, and 59.0%, respectively, in discriminating the patients with schizophrenia from the healthy controls. The ROC analysis performed on the TGC data generated the most accurate result among the EEG measures, displaying an overall classification accuracy of 92.5%. Conclusion: As TGC includes phase, which contains information about neuronal interactions from the EEG recording, TGC is expected to be useful for understanding the mechanisms of the dysfunctional cortico-thalamic interaction in patients with schizophrenia. The resting-state TGC value was increased in the patients with schizophrenia compared to that in the healthy controls and had a higher discriminating ability than the other parameters. These findings may be related to the compensatory hyper-arousal patterns of the dysfunctional default-mode network (DMN) in schizophrenia. Further research exploring the association between TGC and medical or psychiatric conditions that may confound EEG signals will help clarify the potential utility of TGC.
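The Bonferroni-adjusted threshold quoted in this abstract (P < 0.05/19 = 0.0026) can be reproduced with a short Python sketch. The number of comparisons, 19, comes from the abstract; the electrode labels and p-values below are hypothetical placeholders chosen only to show how the threshold is applied.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# 19 comparisons, as stated in the abstract (P < 0.05/19).
threshold = bonferroni_threshold(0.05, 19)

# Hypothetical p-values for three illustrative electrode labels.
p_values = {"Fz": 0.0004, "Cz": 0.0030, "Pz": 0.0100}

# A comparison counts as significant only if its p-value is below the adjusted threshold.
significant = {name: p < threshold for name, p in p_values.items()}
```

Dividing the significance level by the number of tests keeps the family-wise error rate at or below 0.05, at the cost of lower power for each individual comparison.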

Keywords: quantitative electroencephalography (QEEG), theta-phase gamma-amplitude coupling (TGC), schizophrenia, diagnostic utility

Procedia PDF Downloads 114
139 Transcriptional Differences in B cell Subpopulations over the Course of Preclinical Autoimmunity Development

Authors: Aleksandra Bylinska, Samantha Slight-Webb, Kevin Thomas, Miles Smith, Susan Macwana, Nicolas Dominguez, Eliza Chakravarty, Joan T. Merrill, Judith A. James, Joel M. Guthridge

Abstract:

Background: Systemic Lupus Erythematosus (SLE) is an interferon-related autoimmune disease characterized by B cell dysfunction. One of the main hallmarks is a loss of tolerance to self-antigens leading to increased levels of autoantibodies against nuclear components (ANAs). However, up to 20% of healthy ANA+ individuals will not develop clinical illness. SLE is more prevalent among women and minority populations (African, Asian American and Hispanics). Moreover, African Americans have a stronger interferon (IFN) signature and develop more severe symptoms. The exact mechanisms involved in ethnicity-dependent B cell dysregulation and the progression of autoimmune disease from ANA+ healthy individuals to clinical disease remain unclear. Methods: Peripheral blood mononuclear cells (PBMCs) from African (AA) and European American (EA) ANA- (n=12), ANA+ (n=12) and SLE (n=12) individuals were assessed by multimodal scRNA-Seq/CITE-Seq methods to examine differential gene signatures in specific B cell subsets. Library preparation was done with a 10X Genomics Chromium according to established protocols, and libraries were sequenced on an Illumina NextSeq. The data were further analyzed for distinct cluster identification and differential gene signatures with the Seurat package in R, and pathway analysis was performed using Ingenuity Pathways Analysis (IPA). Results: Comparing all subjects, 14 distinct B cell clusters were identified using a community detection algorithm and visualized with Uniform Manifold Approximation and Projection (UMAP). The proportion of each of those clusters varied by disease status and ethnicity. Transitional B cells trended higher in ANA+ healthy individuals, especially in AA. A ribonucleoprotein-high population (HNRNPH1 elevated, heterogeneous nuclear ribonucleoprotein, RNP-Hi) of proliferating Naïve B cells was more prevalent in SLE patients, specifically in EA. An interferon-induced protein high population (IFIT-Hi) of Naive B cells is increased in EA ANA- individuals.
The proportion of memory B cells and plasma cells clusters tend to be expanded in SLE patients. As anticipated, we observed a higher signature of cytokine-related pathways, especially interferon, in SLE individuals. Pathway analysis among AA individuals revealed an NRF2-mediated Oxidative Stress response signature in the transitional B cell cluster, not seen in EA individuals. TNFR1/2 and Sirtuin Signaling pathway genes were higher in AA IFIT-Hi Naive B cells, whereas they were not detected in EA individuals. Interferon signaling was observed in B cells in both ethnicities. Oxidative phosphorylation was found in age-related B cells (ABCs) for both ethnicities, whereas Death Receptor Signaling was found only in EA patients in these cells. Interferon-related transcription factors were elevated in ABCs and IFIT-Hi Naive B cells in SLE subjects of both ethnicities. Conclusions: ANA+ healthy individuals have altered gene expression pathways in B cells that might drive apoptosis and subsequent clinical autoimmune pathogenesis. Increases in certain regulatory pathways may delay progression to SLE. Further, AA individuals have more elevated activation pathways that may make them more susceptible to SLE.

Keywords:

Procedia PDF Downloads 150
138 Data-Driven Surrogate Models for Damage Prediction of Steel Liquid Storage Tanks under Seismic Hazard

Authors: Laura Micheli, Majd Hijazi, Mahmoud Faytarouni

Abstract:

The damage reported by oil and gas industrial facilities revealed the utmost vulnerability of steel liquid storage tanks to seismic events. The failure of steel storage tanks may yield devastating and long-lasting consequences on built and natural environments, including the release of hazardous substances, uncontrolled fires, and soil contamination with hazardous materials. It is, therefore, fundamental to reliably predict the damage that steel liquid storage tanks will likely experience under future seismic hazard events. The seismic performance of steel liquid storage tanks is usually assessed using vulnerability curves obtained from the numerical simulation of a tank under different hazard scenarios. However, the computational demand of high-fidelity numerical simulation models, such as finite element models, makes the vulnerability assessment of liquid storage tanks time-consuming and often impractical. As a solution, this paper presents a surrogate model-based strategy for predicting seismic-induced damage in steel liquid storage tanks. In the proposed strategy, the surrogate model is leveraged to reduce the computational demand of time-consuming numerical simulations. To create the data set for training the surrogate model, field damage data from past earthquakes reconnaissance surveys and reports are collected. Features representative of steel liquid storage tank characteristics (e.g., diameter, height, liquid level, yielding stress) and seismic excitation parameters (e.g., peak ground acceleration, magnitude) are extracted from the field damage data. The collected data are then utilized to train a surrogate model that maps the relationship between tank characteristics, seismic hazard parameters, and seismic-induced damage via a data-driven surrogate model. Different types of surrogate algorithms, including naïve Bayes, k-nearest neighbors, decision tree, and random forest, are investigated, and results in terms of accuracy are reported. 
The model that yields the most accurate predictions is employed to predict future damage as a function of tank characteristics and seismic hazard intensity level. Results show that the proposed approach can be used to estimate the extent of damage in steel liquid storage tanks, where the use of data-driven surrogates represents a viable alternative to computationally expensive numerical simulation models.

Keywords: damage prediction, data-driven model, seismic performance, steel liquid storage tanks, surrogate model

Procedia PDF Downloads 122