Search results for: stepwise regression
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3138

Search results for: stepwise regression

3048 Predicting Survival in Cancer: How Cox Regression Model Compares to Artifial Neural Networks?

Authors: Dalia Rimawi, Walid Salameh, Amal Al-Omari, Hadeel AbdelKhaleq

Abstract:

Predication of Survival time of patients with cancer, is a core factor that influences oncologist decisions in different aspects; such as offered treatment plans, patients’ quality of life and medications development. For a long time proportional hazards Cox regression (ph. Cox) was and still the most well-known statistical method to predict survival outcome. But due to the revolution of data sciences; new predication models were employed and proved to be more flexible and provided higher accuracy in that type of studies. Artificial neural network is one of those models that is suitable to handle time to event predication. In this study we aim to compare ph Cox regression with artificial neural network method according to data handling and Accuracy of each model.

Keywords: Cox regression, neural networks, survival, cancer.

Procedia PDF Downloads 158
3047 Survival and Hazard Maximum Likelihood Estimator with Covariate Based on Right Censored Data of Weibull Distribution

Authors: Al Omari Mohammed Ahmed

Abstract:

This paper focuses on Maximum Likelihood Estimator with Covariate. Covariates are incorporated into the Weibull model. Under this regression model with regards to maximum likelihood estimator, the parameters of the covariate, shape parameter, survival function and hazard rate of the Weibull regression distribution with right censored data are estimated. The mean square error (MSE) and absolute bias are used to compare the performance of Weibull regression distribution. For the simulation comparison, the study used various sample sizes and several specific values of the Weibull shape parameter.

Keywords: weibull regression distribution, maximum likelihood estimator, survival function, hazard rate, right censoring

Procedia PDF Downloads 412
3046 Machine Vision System for Measuring the Quality of Bulk Sun-dried Organic Raisins

Authors: Navab Karimi, Tohid Alizadeh

Abstract:

An intelligent vision-based system was designed to measure the quality and purity of raisins. A machine vision setup was utilized to capture the images of bulk raisins in ranges of 5-50% mixed pure-impure berries. The textural features of bulk raisins were extracted using Grey-level Histograms, Co-occurrence Matrix, and Local Binary Pattern (a total of 108 features). Genetic Algorithm and neural network regression were used for selecting and ranking the best features (21 features). As a result, the GLCM features set was found to have the highest accuracy (92.4%) among the other sets. Followingly, multiple feature combinations of the previous stage were fed into the second regression (linear regression) to increase accuracy, wherein a combination of 16 features was found to be the optimum. Finally, a Support Vector Machine (SVM) classifier was used to differentiate the mixtures, producing the best efficiency and accuracy of 96.2% and 97.35%, respectively.

Keywords: sun-dried organic raisin, genetic algorithm, feature extraction, ann regression, linear regression, support vector machine, south azerbaijan.

Procedia PDF Downloads 44
3045 Analysis of Factors Affecting the Number of Infant and Maternal Mortality in East Java with Geographically Weighted Bivariate Generalized Poisson Regression Method

Authors: Luh Eka Suryani, Purhadi

Abstract:

Poisson regression is a non-linear regression model with response variable in the form of count data that follows Poisson distribution. Modeling for a pair of count data that show high correlation can be analyzed by Poisson Bivariate Regression. Data, the number of infant mortality and maternal mortality, are count data that can be analyzed by Poisson Bivariate Regression. The Poisson regression assumption is an equidispersion where the mean and variance values are equal. However, the actual count data has a variance value which can be greater or less than the mean value (overdispersion and underdispersion). Violations of this assumption can be overcome by applying Generalized Poisson Regression. Characteristics of each regency can affect the number of cases occurred. This issue can be overcome by spatial analysis called geographically weighted regression. This study analyzes the number of infant mortality and maternal mortality based on conditions in East Java in 2016 using Geographically Weighted Bivariate Generalized Poisson Regression (GWBGPR) method. Modeling is done with adaptive bisquare Kernel weighting which produces 3 regency groups based on infant mortality rate and 5 regency groups based on maternal mortality rate. Variables that significantly influence the number of infant and maternal mortality are the percentages of pregnant women visit health workers at least 4 times during pregnancy, pregnant women get Fe3 tablets, obstetric complication handled, clean household and healthy behavior, and married women with the first marriage age under 18 years.

Keywords: adaptive bisquare kernel, GWBGPR, infant mortality, maternal mortality, overdispersion

Procedia PDF Downloads 131
3044 Determining the Causality Variables in Female Genital Mutilation: A Factor Screening Approach

Authors: Ekele Alih, Enejo Jalija

Abstract:

Female Genital Mutilation (FGM) is made up of three types namely: Clitoridectomy, Excision and Infibulation. In this study, we examine the factors responsible for FGM in order to identify the causality variables in a logistic regression approach. From the result of the survey conducted by the Public Health Division, Nigeria Institute of Medical Research, Yaba, Lagos State, the tau statistic, τ was used to screen 9 factors that causes FGM in order to select few of the predictors before multiple regression equation is obtained. The need for this may be that the sample size may not be able to sustain having a regression with all the predictors or to avoid multi-collinearity. A total of 300 respondents, comprising 150 adult males and 150 adult females were selected for the household survey based on the multi-stage sampling procedure. The tau statistic,

Keywords: female genital mutilation, logistic regression, tau statistic, African society

Procedia PDF Downloads 227
3043 A Monte Carlo Fuzzy Logistic Regression Framework against Imbalance and Separation

Authors: Georgios Charizanos, Haydar Demirhan, Duygu Icen

Abstract:

Two of the most impactful issues in classical logistic regression are class imbalance and complete separation. These can result in model predictions heavily leaning towards the imbalanced class on the binary response variable or over-fitting issues. Fuzzy methodology offers key solutions for handling these problems. However, most studies propose the transformation of the binary responses into a continuous format limited within [0,1]. This is called the possibilistic approach within fuzzy logistic regression. Following this approach is more aligned with straightforward regression since a logit-link function is not utilized, and fuzzy probabilities are not generated. In contrast, we propose a method of fuzzifying binary response variables that allows for the use of the logit-link function; hence, a probabilistic fuzzy logistic regression model with the Monte Carlo method. The fuzzy probabilities are then classified by selecting a fuzzy threshold. Different combinations of fuzzy and crisp input, output, and coefficients are explored, aiming to understand which of these perform better under different conditions of imbalance and separation. We conduct numerical experiments using both synthetic and real datasets to demonstrate the performance of the fuzzy logistic regression framework against seven crisp machine learning methods. The proposed framework shows better performance irrespective of the degree of imbalance and presence of separation in the data, while the considered machine learning methods are significantly impacted.

Keywords: fuzzy logistic regression, fuzzy, logistic, machine learning

Procedia PDF Downloads 36
3042 Landslide Susceptibility Mapping: A Comparison between Logistic Regression and Multivariate Adaptive Regression Spline Models in the Municipality of Oudka, Northern of Morocco

Authors: S. Benchelha, H. C. Aoudjehane, M. Hakdaoui, R. El Hamdouni, H. Mansouri, T. Benchelha, M. Layelmam, M. Alaoui

Abstract:

The logistic regression (LR) and multivariate adaptive regression spline (MarSpline) are applied and verified for analysis of landslide susceptibility map in Oudka, Morocco, using geographical information system. From spatial database containing data such as landslide mapping, topography, soil, hydrology and lithology, the eight factors related to landslides such as elevation, slope, aspect, distance to streams, distance to road, distance to faults, lithology map and Normalized Difference Vegetation Index (NDVI) were calculated or extracted. Using these factors, landslide susceptibility indexes were calculated by the two mentioned methods. Before the calculation, this database was divided into two parts, the first for the formation of the model and the second for the validation. The results of the landslide susceptibility analysis were verified using success and prediction rates to evaluate the quality of these probabilistic models. The result of this verification was that the MarSpline model is the best model with a success rate (AUC = 0.963) and a prediction rate (AUC = 0.951) higher than the LR model (success rate AUC = 0.918, rate prediction AUC = 0.901).

Keywords: landslide susceptibility mapping, regression logistic, multivariate adaptive regression spline, Oudka, Taounate

Procedia PDF Downloads 160
3041 Robust Variable Selection Based on Schwarz Information Criterion for Linear Regression Models

Authors: Shokrya Saleh A. Alshqaq, Abdullah Ali H. Ahmadini

Abstract:

The Schwarz information criterion (SIC) is a popular tool for selecting the best variables in regression datasets. However, SIC is defined using an unbounded estimator, namely, the least-squares (LS), which is highly sensitive to outlying observations, especially bad leverage points. A method for robust variable selection based on SIC for linear regression models is thus needed. This study investigates the robustness properties of SIC by deriving its influence function and proposes a robust SIC based on the MM-estimation scale. The aim of this study is to produce a criterion that can effectively select accurate models in the presence of vertical outliers and high leverage points. The advantages of the proposed robust SIC is demonstrated through a simulation study and an analysis of a real dataset.

Keywords: influence function, robust variable selection, robust regression, Schwarz information criterion

Procedia PDF Downloads 113
3040 Generalized Additive Model for Estimating Propensity Score

Authors: Tahmidul Islam

Abstract:

Propensity Score Matching (PSM) technique has been widely used for estimating causal effect of treatment in observational studies. One major step of implementing PSM is estimating the propensity score (PS). Logistic regression model with additive linear terms of covariates is most used technique in many studies. Logistics regression model is also used with cubic splines for retaining flexibility in the model. However, choosing the functional form of the logistic regression model has been a question since the effectiveness of PSM depends on how accurately the PS been estimated. In many situations, the linearity assumption of linear logistic regression may not hold and non-linear relation between the logit and the covariates may be appropriate. One can estimate PS using machine learning techniques such as random forest, neural network etc for more accuracy in non-linear situation. In this study, an attempt has been made to compare the efficacy of Generalized Additive Model (GAM) in various linear and non-linear settings and compare its performance with usual logistic regression. GAM is a non-parametric technique where functional form of the covariates can be unspecified and a flexible regression model can be fitted. In this study various simple and complex models have been considered for treatment under several situations (small/large sample, low/high number of treatment units) and examined which method leads to more covariate balance in the matched dataset. It is found that logistic regression model is impressively robust against inclusion quadratic and interaction terms and reduces mean difference in treatment and control set equally efficiently as GAM does. GAM provided no significantly better covariate balance than logistic regression in both simple and complex models. The analysis also suggests that larger proportion of controls than treatment units leads to better balance for both of the methods.

Keywords: accuracy, covariate balances, generalized additive model, logistic regression, non-linearity, propensity score matching

Procedia PDF Downloads 332
3039 A Comparison of Neural Network and DOE-Regression Analysis for Predicting Resource Consumption of Manufacturing Processes

Authors: Frank Kuebler, Rolf Steinhilper

Abstract:

Artificial neural networks (ANN) as well as Design of Experiments (DOE) based regression analysis (RA) are mainly used for modeling of complex systems. Both methodologies are commonly applied in process and quality control of manufacturing processes. Due to the fact that resource efficiency has become a critical concern for manufacturing companies, these models needs to be extended to predict resource-consumption of manufacturing processes. This paper describes an approach to use neural networks as well as DOE based regression analysis for predicting resource consumption of manufacturing processes and gives a comparison of the achievable results based on an industrial case study of a turning process.

Keywords: artificial neural network, design of experiments, regression analysis, resource efficiency, manufacturing process

Procedia PDF Downloads 492
3038 Logistic Regression Model versus Additive Model for Recurrent Event Data

Authors: Entisar A. Elgmati

Abstract:

Recurrent infant diarrhea is studied using daily data collected in Salvador, Brazil over one year and three months. A logistic regression model is fitted instead of Aalen's additive model using the same covariates that were used in the analysis with the additive model. The model gives reasonably similar results to that using additive regression model. In addition, the problem with the estimated conditional probabilities not being constrained between zero and one in additive model is solved here. Also martingale residuals that have been used to judge the goodness of fit for the additive model are shown to be useful for judging the goodness of fit of the logistic model.

Keywords: additive model, cumulative probabilities, infant diarrhoea, recurrent event

Procedia PDF Downloads 604
3037 Identifying Factors Contributing to the Spread of Lyme Disease: A Regression Analysis of Virginia’s Data

Authors: Fatemeh Valizadeh Gamchi, Edward L. Boone

Abstract:

This research focuses on Lyme disease, a widespread infectious condition in the United States caused by the bacterium Borrelia burgdorferi sensu stricto. It is critical to identify environmental and economic elements that are contributing to the spread of the disease. This study examined data from Virginia to identify a subset of explanatory variables significant for Lyme disease case numbers. To identify relevant variables and avoid overfitting, linear poisson, and regularization regression methods such as a ridge, lasso, and elastic net penalty were employed. Cross-validation was performed to acquire tuning parameters. The methods proposed can automatically identify relevant disease count covariates. The efficacy of the techniques was assessed using four criteria on three simulated datasets. Finally, using the Virginia Department of Health’s Lyme disease data set, the study successfully identified key factors, and the results were consistent with previous studies.

Keywords: lyme disease, Poisson generalized linear model, ridge regression, lasso regression, elastic net regression

Procedia PDF Downloads 96
3036 An Analysis of the Effect of Sharia Financing and Work Relation Founding towards Non-Performing Financing in Islamic Banks in Indonesia

Authors: Muhammad Bahrul Ilmi

Abstract:

The purpose of this research is to analyze the influence of Islamic financing and work relation founding simultaneously and partially towards non-performing financing in Islamic banks. This research was regression quantitative field research, and had been done in Muammalat Indonesia Bank and Islamic Danamon Bank in 3 months. The populations of this research were 15 account officers of Muammalat Indonesia Bank and Islamic Danamon Bank in Surakarta, Indonesia. The techniques of collecting data used in this research were documentation, questionnaire, literary study and interview. Regression analysis result shows that Islamic financing and work relation founding simultaneously has positive and significant effect towards non performing financing of two Islamic Banks. It is obtained with probability value 0.003 which is less than 0.05 and F value 9.584. The analysis result of Islamic financing regression towards non performing financing shows the significant effect. It is supported by double linear regression analysis with probability value 0.001 which is less than 0.05. The regression analysis of work relation founding effect towards non-performing financing shows insignificant effect. This is shown in the double linear regression analysis with probability value 0.161 which is bigger than 0.05.

Keywords: Syariah financing, work relation founding, non-performing financing (NPF), Islamic Bank

Procedia PDF Downloads 404
3035 A Kolmogorov-Smirnov Type Goodness-Of-Fit Test of Multinomial Logistic Regression Model in Case-Control Studies

Authors: Chen Li-Ching

Abstract:

The multinomial logistic regression model is used popularly for inferring the relationship of risk factors and disease with multiple categories. This study based on the discrepancy between the nonparametric maximum likelihood estimator and semiparametric maximum likelihood estimator of the cumulative distribution function to propose a Kolmogorov-Smirnov type test statistic to assess adequacy of the multinomial logistic regression model for case-control data. A bootstrap procedure is presented to calculate the critical value of the proposed test statistic. Empirical type I error rates and powers of the test are performed by simulation studies. Some examples will be illustrated the implementation of the test.

Keywords: case-control studies, goodness-of-fit test, Kolmogorov-Smirnov test, multinomial logistic regression

Procedia PDF Downloads 427
3034 Investigating the Effects of Psychological and Socio-Cultural Factors on the Tendency of Villagers to Use E-Banking Services: Case Study of Agricultural Bank Branches in Ilam

Authors: Nahid Ehsani, Amir Hossein Rezvanfar

Abstract:

The main objective of this study is to investigate psychological and socio-cultural factors effective on the tendency of the villagers to use e-banking services. The current paper is an applied study considering its objectives. The main data gathering tool in the current study is a made questionnaire which is designed and executed based on the conceptual background of the subject matter and the objectives and hypotheses of the study. The statistical population of this study includes all the customers of rural branches of Agricultural Bank in Ilam Province (N=82885). Among these 120 participants were chosen through sample size determination formula and they were studied using stratified random sampling method. In the analytical statistics level the results obtained from calculating Spearman’s Correlative Coefficient showed that socio-cultural and psychological factors had a significant impact of the extent of the tendency of the villagers to use e-banking services of the Agricultural Bank at the 99% level. Furthermore, stepwise multiple regression analysis showed that both sets of psychological factors as well as socio-economic factors were able to explain 50 percent of the variance of the independent variable; namely the tendency of villagers to use e-banking services.

Keywords: e-banking, agricultural bank, tendency, socio-economic factors, psychological factors

Procedia PDF Downloads 508
3033 A Study on Inference from Distance Variables in Hedonic Regression

Authors: Yan Wang, Yasushi Asami, Yukio Sadahiro

Abstract:

In urban area, several landmarks may affect housing price and rents, hedonic analysis should employ distance variables corresponding to each landmarks. Unfortunately, the effects of distances to landmarks on housing prices are generally not consistent with the true price. These distance variables may cause magnitude error in regression, pointing a problem of spatial multicollinearity. In this paper, we provided some approaches for getting the samples with less bias and method on locating the specific sampling area to avoid the multicollinerity problem in two specific landmarks case.

Keywords: landmarks, hedonic regression, distance variables, collinearity, multicollinerity

Procedia PDF Downloads 425
3032 Forecasting of Grape Juice Flavor by Using Support Vector Regression

Authors: Ren-Jieh Kuo, Chun-Shou Huang

Abstract:

The research of juice flavor forecasting has become more important in China. Due to the fast economic growth in China, many different kinds of juices have been introduced to the market. If a beverage company can understand their customers’ preference well, the juice can be served more attractively. Thus, this study intends to introduce the basic theory and computing process of grapes juice flavor forecasting based on support vector regression (SVR). Applying SVR, BPN and LR to forecast the flavor of grapes juice in real data, the result shows that SVR is more suitable and effective at predicting performance.

Keywords: flavor forecasting, artificial neural networks, Support Vector Regression, China

Procedia PDF Downloads 452
3031 Estimation of Coefficients of Ridge and Principal Components Regressions with Multicollinear Data

Authors: Rajeshwar Singh

Abstract:

The presence of multicollinearity is common in handling with several explanatory variables simultaneously due to exhibiting a linear relationship among them. A great problem arises in understanding the impact of explanatory variables on the dependent variable. Thus, the method of least squares estimation gives inexact estimates. In this case, it is advised to detect its presence first before proceeding further. Using the ridge regression degree of its occurrence is reduced but principal components regression gives good estimates in this situation. This paper discusses well-known techniques of the ridge and principal components regressions and applies to get the estimates of coefficients by both techniques. In addition to it, this paper also discusses the conflicting claim on the discovery of the method of ridge regression based on available documents.

Keywords: conflicting claim on credit of discovery of ridge regression, multicollinearity, principal components and ridge regressions, variance inflation factor

Procedia PDF Downloads 375
3030 Anxiety and Self-Perceived L2 Proficiency: A Comparison of Which Can Better Predict L2 Pronunciation Performance

Authors: Jiexuan Lin, Huiyi Chen

Abstract:

The development of L2 pronunciation competence remains understudied in the literature and it is not clear what may influence learners’ development of L2 pronunciation. The present study was an attempt to find out which of the two common factors in L2 acquisition, i.e., foreign language anxiety or self-perceived L2 proficiency, can better predict Chinese EFL learners’ pronunciation performance. 78 first-year English majors, who had received a three-month pronunciation training course, were asked to 1) fill out a questionnaire on foreign language classroom anxiety, 2) self-report their L2 proficiency in general, in speaking and in pronunciation, and 3) complete an oral and a written test on their L2 pronunciation (the score of the oral part indicates participants’ pronunciation proficiency in oral production, and the score of the written part indexes participants’ ability in applying pronunciation knowledge in comprehension.) Results showed that the pronunciation scores were negatively correlated with the anxiety scores, and were positively correlated with the self-perceived pronunciation proficiency. But only the written scores in the L2 pronunciation test, not the oral scores, were positively correlated with the L2 self-perceived general proficiency. Neither the oral nor the written scores in the L2 pronunciation test had a significant correlation with the self-perceived speaking proficiency. Given the fairly strong correlations, the anxiety scores and the self-perceived pronunciation proficiency were put in regression models to predict L2 pronunciation performance. The anxiety factor alone accounted for 13.9% of the variance and the self-perceived pronunciation proficiency alone explained 12.1% of the variance. But when both anxiety scores and self-perceived pronunciation proficiency were put in a stepwise regression model, only the anxiety scores had a significant and unique contribution to the L2 pronunciation performance (4.8%). Taken together, the results suggested that the learners’ anxiety level could better predict their L2 pronunciation performance, compared with the self-perceived proficiency levels. The obtained data have the following pedagogical implications. 1) Given the fairly strong correlation between anxiety and L2 pronunciation performance, the instructors who are interested in predicting learners’ L2 pronunciation proficiency may measure their anxiety level, instead of their proficiency, as the predicting variable. 2) The correlation of oral scores (in the pronunciation test) with pronunciation proficiency, rather than with speaking proficiency, indicates that a) learners after receiving some amounts of training are to some extent able to evaluate their own pronunciation ability, implying the feasibility of incorporating self-evaluation and peer comments in course instruction; b) the ‘proficiency’ measure used to predict pronunciation performance should be used with caution. The proficiency of specific skills seemingly highly related to pronunciation (i.e., speaking in this case) may not be taken for granted as an effective predictor for pronunciation performance. 3) The correlation between the written scores with general L2 proficiency is interesting.

Keywords: anxiety, Chinese EFL learners, L2 pronunciation, self-perceived L2 proficiency

Procedia PDF Downloads 332
3029 Genetic and Non-Genetic Factors Affecting the Response to Clopidogrel Therapy

Authors: Snezana Mugosa, Zoran Todorovic, Zoran Bukumiric, Ivan Radosavljevic, Natasa Djordjevic

Abstract:

Introduction: Various studies have shown that the frequency of clopidogrel resistance ranges from 4-40%. The aim of this study was to provide in depth analysis of genetic and non-genetic factors that influence clopidogrel resistance in cardiology patients. Methods: We have conducted a prospective study in 200 hospitalized patients hospitalized at Cardiology Centre of the Clinical Centre of Montenegro. CYP2C19 genetic testing was conducted, and the PREDICT score was calculated in 102 out of 200 patients treated with clopidogrel in order to determine the influence of genetic and non-genetic factors on outcomes of interest. Adverse cardiovascular events and adverse reactions to clopidogrel were assessed during 12 months follow up period. Results: PREDICT score and CYP2C19 enzymatic activity were found to be statistically significant predictors of expressing lack of therapeutic efficacy of clopidogrel by multivariate logistic regression, without multicollinearity or interaction between the predictors (p = 0.002 and 0.009, respectively). Conclusions: Pharmacogenetics analyses that were done in the Montenegrin population of patients for the first time suggest that these analyses can predict patient response to the certain therapy. Stepwise approach could be used in assessing the clopidogrel resistance in cardiology patients, combining the PREDICT score, platelet aggregation test, and genetic testing for CYP2C19 polymorphism.

Keywords: clopidogrel, pharmacogenetics, pharmacotherapy, PREDICT score

Procedia PDF Downloads 324
3028 Estimate of Maximum Expected Intensity of One-Half-Wave Lines Dancing

Authors: A. Bekbaev, M. Dzhamanbaev, R. Abitaeva, A. Karbozova, G. Nabyeva

Abstract:

In this paper, the regression dependence of dancing intensity from wind speed and length of span was established due to the statistic data obtained from multi-year observations on line wires dancing accumulated by power systems of Kazakhstan and the Russian Federation. The lower and upper limitations of the equations parameters were estimated, as well as the adequacy of the regression model. The constructed model will be used in research of dancing phenomena for the development of methods and means of protection against dancing and for zoning plan of the territories of line wire dancing.

Keywords: power lines, line wire dancing, dancing intensity, regression equation, dancing area intensity

Procedia PDF Downloads 287
3027 Incorporating Anomaly Detection in a Digital Twin Scenario Using Symbolic Regression

Authors: Manuel Alves, Angelica Reis, Armindo Lobo, Valdemar Leiras

Abstract:

In industry 4.0, it is common to have a lot of sensor data. In this deluge of data, hints of possible problems are difficult to spot. The digital twin concept aims to help answer this problem, but it is mainly used as a monitoring tool to handle the visualisation of data. Failure detection is of paramount importance in any industry, and it consumes a lot of resources. Any improvement in this regard is of tangible value to the organisation. The aim of this paper is to add the ability to forecast test failures, curtailing detection times. To achieve this, several anomaly detection algorithms were compared with a symbolic regression approach. To this end, Isolation Forest, One-Class SVM and an auto-encoder have been explored. For the symbolic regression PySR library was used. The first results show that this approach is valid and can be added to the tools available in this context as a low resource anomaly detection method since, after training, the only requirement is the calculation of a polynomial, a useful feature in the digital twin context.

Keywords: anomaly detection, digital twin, industry 4.0, symbolic regression

Procedia PDF Downloads 89
3026 Impact of Infrastructural Development on Socio-Economic Growth: An Empirical Investigation in India

Authors: Jonardan Koner

Abstract:

The study attempts to find out the impact of infrastructural investment on state economic growth in India. It further tries to determine the magnitude of the impact of infrastructural investment on economic indicator, i.e., per-capita income (PCI) in Indian States. The study uses panel regression technique to measure the impact of infrastructural investment on per-capita income (PCI) in Indian States. Panel regression technique helps incorporate both the cross-section and time-series aspects of the dataset. In order to analyze the difference in impact of the explanatory variables on the explained variables across states, the study uses Fixed Effect Panel Regression Model. The conclusions of the study are that infrastructural investment has a desirable impact on economic development and that the impact is different for different states in India. We analyze time series data (annual frequency) ranging from 1991 to 2010. The study reveals that the infrastructural investment significantly explains the variation of economic indicators.

Keywords: infrastructural investment, multiple regression, panel regression techniques, economic development, fixed effect dummy variable model

Procedia PDF Downloads 345
3025 A Quadratic Model to Early Predict the Blastocyst Stage with a Time Lapse Incubator

Authors: Cecile Edel, Sandrine Giscard D'Estaing, Elsa Labrune, Jacqueline Lornage, Mehdi Benchaib

Abstract:

Introduction: The use of incubator equipped with time-lapse technology in Artificial Reproductive Technology (ART) allows a continuous surveillance. With morphocinetic parameters, algorithms are available to predict the potential outcome of an embryo. However, the different proposed time-lapse algorithms do not take account the missing data, and then some embryos could not be classified. The aim of this work is to construct a predictive model even in the case of missing data. Materials and methods: Patients: A retrospective study was performed, in biology laboratory of reproduction at the hospital ‘Femme Mère Enfant’ (Lyon, France) between 1 May 2013 and 30 April 2015. Embryos (n= 557) obtained from couples (n=108) were cultured in a time-lapse incubator (Embryoscope®, Vitrolife, Goteborg, Sweden). Time-lapse incubator: The morphocinetic parameters obtained during the three first days of embryo life were used to build the predictive model. Predictive model: A quadratic regression was performed between the number of cells and time. N = a. T² + b. T + c. N: number of cells at T time (T in hours). The regression coefficients were calculated with Excel software (Microsoft, Redmond, WA, USA), a program with Visual Basic for Application (VBA) (Microsoft) was written for this purpose. The quadratic equation was used to find a value that allows to predict the blastocyst formation: the synthetize value. The area under the curve (AUC) obtained from the ROC curve was used to appreciate the performance of the regression coefficients and the synthetize value. A cut-off value has been calculated for each regression coefficient and for the synthetize value to obtain two groups where the difference of blastocyst formation rate according to the cut-off values was maximal. The data were analyzed with SPSS (IBM, Il, Chicago, USA). Results: Among the 557 embryos, 79.7% had reached the blastocyst stage. The synthetize value corresponds to the value calculated with time value equal to 99, the highest AUC was then obtained. The AUC for regression coefficient ‘a’ was 0.648 (p < 0.001), 0.363 (p < 0.001) for the regression coefficient ‘b’, 0.633 (p < 0.001) for the regression coefficient ‘c’, and 0.659 (p < 0.001) for the synthetize value. The results are presented as follow: blastocyst formation rate under cut-off value versus blastocyst rate formation above cut-off value. For the regression coefficient ‘a’ the optimum cut-off value was -1.14.10-3 (61.3% versus 84.3%, p < 0.001), 0.26 for the regression coefficient ‘b’ (83.9% versus 63.1%, p < 0.001), -4.4 for the regression coefficient ‘c’ (62.2% versus 83.1%, p < 0.001) and 8.89 for the synthetize value (58.6% versus 85.0%, p < 0.001). Conclusion: This quadratic regression allows to predict the outcome of an embryo even in case of missing data. Three regression coefficients and a synthetize value could represent the identity card of an embryo. ‘a’ regression coefficient represents the acceleration of cells division, ‘b’ regression coefficient represents the speed of cell division. We could hypothesize that ‘c’ regression coefficient could represent the intrinsic potential of an embryo. This intrinsic potential could be dependent from oocyte originating the embryo. These hypotheses should be confirmed by studies analyzing relationship between regression coefficients and ART parameters.

Keywords: ART procedure, blastocyst formation, time-lapse incubator, quadratic model

Procedia PDF Downloads 284
3024 Foot Self-Monitoring Knowledge, Attitude, Practice, and Related Factors among Diabetic Patients: A Descriptive and Correlational Study in a Taiwan Teaching Hospital

Authors: Li-Ching Lin, Yu-Tzu Dai

Abstract:

Recurrent foot ulcers or foot amputation have a major impact on patients with diabetes mellitus (DM), medical professionals, and society. A critical procedure for foot care is foot self-monitoring. Medical professionals’ understanding of patients’ foot self-monitoring knowledge, attitude, and practice is beneficial for raising patients’ disease awareness. This study investigated these and related factors among patients with DM through a descriptive study of the correlations. A scale for measuring the foot self-monitoring knowledge, attitude, and practice of patients with DM was used. Purposive sampling was adopted, and 100 samples were collected from the respondents’ self-reports or from interviews. The statistical methods employed were an independent-sample t-test, one-way analysis of variance, Pearson correlation coefficient, and multivariate regression analysis. The findings were as follows: the respondents scored an average of 12.97 on foot self-monitoring knowledge, and the correct answer rate was 68.26%. The respondents performed relatively lower in foot health screenings and recording, and awareness of neuropathy in the foot. The respondents held a positive attitude toward self-monitoring their feet and a negative attitude toward having others check the soles of their feet. The respondents scored an average of 12.64 on foot self-monitoring practice. Their scores were lower in their frequency of self-monitoring their feet, recording their self-monitoring results, checking their pedal pulse, and examining if their soles were red immediately after taking off their shoes. Significant positive correlations were observed among foot self-monitoring knowledge, attitude, and practice. The correlation coefficient between self-monitoring knowledge and self-monitoring practice was 0.20, and that between self-monitoring attitude and self-monitoring practice was 0.44. Stepwise regression analysis revealed that the main predictive factors of the foot self-monitoring practice in patients with DM were foot self-monitoring attitude, prior experience in foot care, and an educational attainment of college or higher. These factors predicted 33% of the variance. This study concludes that patients with DM lacked foot self-monitoring practice and advises that the patients’ self-monitoring abilities be evaluated first, including whether patients have poor eyesight, difficulties in bending forward due to obesity, and people who can assist them in self-monitoring. In addition, patient education should emphasize self-monitoring knowledge and practice, such as perceptions regarding the symptoms of foot neurovascular lesions, pulse monitoring methods, and new foot self-monitoring equipment. By doing so, new or recurring ulcers may be discovered in their early stages.

Keywords: diabetic foot, foot self-monitoring attitude, foot self-monitoring knowledge, foot self-monitoring practice

Procedia PDF Downloads 168
3023 Two-Phase Sampling for Estimating a Finite Population Total in Presence of Missing Values

Authors: Daniel Fundi Murithi

Abstract:

Missing data is a real bane in many surveys. To overcome the problems caused by missing data, partial deletion, and single imputation methods, among others, have been proposed. However, problems such as discarding usable data and inaccuracy in reproducing known population parameters and standard errors are associated with them. For regression and stochastic imputation, it is assumed that there is a variable with complete cases to be used as a predictor in estimating missing values in the other variable, and the relationship between the two variables is linear, which might not be realistic in practice. In this project, we estimate population total in presence of missing values in two-phase sampling. Instead of regression or stochastic models, non-parametric model based regression model is used in imputing missing values. Empirical study showed that nonparametric model-based regression imputation is better in reproducing variance of population total estimate obtained when there were no missing values compared to mean, median, regression, and stochastic imputation methods. Although regression and stochastic imputation were better than nonparametric model-based imputation in reproducing population total estimates obtained when there were no missing values in one of the sample sizes considered, nonparametric model-based imputation may be used when the relationship between outcome and predictor variables is not linear.

Keywords: finite population total, missing data, model-based imputation, two-phase sampling

Procedia PDF Downloads 104
3022 A Novel Approach towards Test Case Prioritization Technique

Authors: Kamna Solanki, Yudhvir Singh, Sandeep Dalal

Abstract:

Software testing is a time and cost intensive process. A scrutiny of the code and rigorous testing is required to identify and rectify the putative bugs. The process of bug identification and its consequent correction is continuous in nature and often some of the bugs are removed after the software has been launched in the market. This process of code validation of the altered software during the maintenance phase is termed as Regression testing. Regression testing ubiquitously considers resource constraints; therefore, the deduction of an appropriate set of test cases, from the ensemble of the entire gamut of test cases, is a critical issue for regression test planning. This paper presents a novel method for designing a suitable prioritization process to optimize fault detection rate and performance of regression test on predefined constraints. The proposed method for test case prioritization m-ACO alters the food source selection criteria of natural ants and is basically a modified version of Ant Colony Optimization (ACO). The proposed m-ACO approach has been coded in 'Perl' language and results are validated using three examples by computation of Average Percentage of Faults Detected (APFD) metric.

Keywords: regression testing, software testing, test case prioritization, test suite optimization

Procedia PDF Downloads 304
3021 Establishment and Application of Numerical Simulation Model for Shot Peen Forming Stress Field Method

Authors: Shuo Tian, Xuepiao Bai, Jianqin Shang, Pengtao Gai, Yuansong Zeng

Abstract:

Shot peen forming is an essential forming process for aircraft metal wing panel. With the development of computer simulation technology, scholars have proposed a numerical simulation method of shot peen forming based on stress field. Three shot peen forming indexes of crater diameter, shot speed and surface coverage are required as simulation parameters in the stress field method. It is necessary to establish the relationship between simulation and experimental process parameters in order to simulate the deformation under different shot peen forming parameters. The shot peen forming tests of the 2024-T351 aluminum alloy workpieces were carried out using uniform test design method, and three factors of air pressure, feed rate and shot flow were selected. The second-order response surface model between simulation parameters and uniform test factors was established by stepwise regression method using MATLAB software according to the results. The response surface model was combined with the stress field method to simulate the shot peen forming deformation of the workpiece. Compared with the experimental results, the simulated values were smaller than the corresponding test values, the maximum and average errors were 14.8% and 9%, respectively.

Keywords: shot peen forming, process parameter, response surface model, numerical simulation

Procedia PDF Downloads 53
3020 ScRNA-Seq RNA Sequencing-Based Program-Polygenic Risk Scores Associated with Pancreatic Cancer Risks in the UK Biobank Cohort

Authors: Yelin Zhao, Xinxiu Li, Martin Smelik, Oleg Sysoev, Firoj Mahmud, Dina Mansour Aly, Mikael Benson

Abstract:

Background: Early diagnosis of pancreatic cancer is clinically challenging due to vague, or no symptoms, and lack of biomarkers. Polygenic risk score (PRS) scores may provide a valuable tool to assess increased or decreased risk of PC. This study aimed to develop such PRS by filtering genetic variants identified by GWAS using transcriptional programs identified by single-cell RNA sequencing (scRNA-seq). Methods: ScRNA-seq data from 24 pancreatic ductal adenocarcinoma (PDAC) tumor samples and 11 normal pancreases were analyzed to identify differentially expressed genes (DEGs) in in tumor and microenvironment cell types compared to healthy tissues. Pathway analysis showed that the DEGs were enriched for hundreds of significant pathways. These were clustered into 40 “programs” based on gene similarity, using the Jaccard index. Published genetic variants associated with PDAC were mapped to each program to generate program PRSs (pPRSs). These pPRSs, along with five previously published PRSs (PGS000083, PGS000725, PGS000663, PGS000159, and PGS002264), were evaluated in a European-origin population from the UK Biobank, consisting of 1,310 PDAC participants and 407,473 non-pancreatic cancer participants. Stepwise Cox regression analysis was performed to determine associations between pPRSs with the development of PC, with adjustments of sex and principal components of genetic ancestry. Results: The PDAC genetic variants were mapped to 23 programs and were used to generate pPRSs for these programs. Four distinct pPRSs (P1, P6, P11, and P16) and two published PRSs (PGS000663 and PGS002264) were significantly associated with an increased risk of developing PC. Among these, P6 exhibited the greatest hazard ratio (adjusted HR[95% CI] = 1.67[1.14-2.45], p = 0.008). In contrast, P10 and P4 were associated with lower risk of developing PC (adjusted HR[95% CI] = 0.58[0.42-0.81], p = 0.001, and adjusted HR[95% CI] = 0.75[0.59-0.96], p = 0.019). By comparison, two of the five published PRS exhibited an association with PDAC onset with HR (PGS000663: adjusted HR[95% CI] = 1.24[1.14-1.35], p < 0.001 and PGS002264: adjusted HR[95% CI] = 1.14[1.07-1.22], p < 0.001). Conclusion: Compared to published PRSs, scRNA-seq-based pPRSs may be used not only to assess increased but also decreased risk of PDAC.

Keywords: cox regression, pancreatic cancer, polygenic risk score, scRNA-seq, UK biobank

Procedia PDF Downloads 58
3019 Prediction of the Thermodynamic Properties of Hydrocarbons Using Gaussian Process Regression

Authors: N. Alhazmi

Abstract:

Knowing the thermodynamics properties of hydrocarbons is vital when it comes to analyzing the related chemical reaction outcomes and understanding the reaction process, especially in terms of petrochemical industrial applications, combustions, and catalytic reactions. However, measuring the thermodynamics properties experimentally is time-consuming and costly. In this paper, Gaussian process regression (GPR) has been used to directly predict the main thermodynamic properties - standard enthalpy of formation, standard entropy, and heat capacity -for more than 360 cyclic and non-cyclic alkanes, alkenes, and alkynes. A simple workflow has been proposed that can be applied to directly predict the main properties of any hydrocarbon by knowing its descriptors and chemical structure and can be generalized to predict the main properties of any material. The model was evaluated by calculating the statistical error R², which was more than 0.9794 for all the predicted properties.

Keywords: thermodynamic, Gaussian process regression, hydrocarbons, regression, supervised learning, entropy, enthalpy, heat capacity

Procedia PDF Downloads 188