Search results for: logistic regression factor analysis
32151 Application Difference between Cox and Logistic Regression Models
Authors: Idrissa Kayijuka
Abstract:
The logistic regression and Cox regression models (proportional hazard model) at present are being employed in the analysis of prospective epidemiologic research looking into risk factors in their application on chronic diseases. However, a theoretical relationship between the two models has been studied. By definition, Cox regression model also called Cox proportional hazard model is a procedure that is used in modeling data regarding time leading up to an event where censored cases exist. Whereas the Logistic regression model is mostly applicable in cases where the independent variables consist of numerical as well as nominal values while the resultant variable is binary (dichotomous). Arguments and findings of many researchers focused on the overview of Cox and Logistic regression models and their different applications in different areas. In this work, the analysis is done on secondary data whose source is SPSS exercise data on BREAST CANCER with a sample size of 1121 women where the main objective is to show the application difference between Cox regression model and logistic regression model based on factors that cause women to die due to breast cancer. Thus we did some analysis manually i.e. on lymph nodes status, and SPSS software helped to analyze the mentioned data. This study found out that there is an application difference between Cox and Logistic regression models which is Cox regression model is used if one wishes to analyze data which also include the follow-up time whereas Logistic regression model analyzes data without follow-up-time. Also, they have measurements of association which is different: hazard ratio and odds ratio for Cox and logistic regression models respectively. A similarity between the two models is that they are both applicable in the prediction of the upshot of a categorical variable i.e. a variable that can accommodate only a restricted number of categories. In conclusion, Cox regression model differs from logistic regression by assessing a rate instead of proportion. The two models can be applied in many other researches since they are suitable methods for analyzing data but the more recommended is the Cox, regression model.Keywords: logistic regression model, Cox regression model, survival analysis, hazard ratio
Procedia PDF Downloads 45232150 The Theory behind Logistic Regression
Authors: Jan Henrik Wosnitza
Abstract:
The logistic regression has developed into a standard approach for estimating conditional probabilities in a wide range of applications including credit risk prediction. The article at hand contributes to the current literature on logistic regression fourfold: First, it is demonstrated that the binary logistic regression automatically meets its model assumptions under very general conditions. This result explains, at least in part, the logistic regression's popularity. Second, the requirement of homoscedasticity in the context of binary logistic regression is theoretically substantiated. The variances among the groups of defaulted and non-defaulted obligors have to be the same across the level of the aggregated default indicators in order to achieve linear logits. Third, this article sheds some light on the question why nonlinear logits might be superior to linear logits in case of a small amount of data. Fourth, an innovative methodology for estimating correlations between obligor-specific log-odds is proposed. In order to crystallize the key ideas, this paper focuses on the example of credit risk prediction. However, the results presented in this paper can easily be transferred to any other field of application.Keywords: correlation, credit risk estimation, default correlation, homoscedasticity, logistic regression, nonlinear logistic regression
Procedia PDF Downloads 42532149 Factors for Entry Timing Choices Using Principal Axis Factorial Analysis and Logistic Regression Model
Authors: C. M. Mat Isa, H. Mohd Saman, S. R. Mohd Nasir, A. Jaapar
Abstract:
International market expansion involves a strategic process of market entry decision through which a firm expands its operation from domestic to the international domain. Hence, entry timing choices require the needs to balance the early entry risks and the problems in losing opportunities as a result of late entry into a new market. Questionnaire surveys administered to 115 Malaysian construction firms operating in 51 countries worldwide have resulted in 39.1 percent response rate. Factor analysis was used to determine the most significant factors affecting entry timing choices of the firms to penetrate the international market. A logistic regression analysis used to examine the firms’ entry timing choices, indicates that the model has correctly classified 89.5 per cent of cases as late movers. The findings reveal that the most significant factor influencing the construction firms’ choices as late movers was the firm factor related to the firm’s international experience, resources, competencies and financing capacity. The study also offers valuable information to construction firms with intention to internationalize their businesses.Keywords: factors, early movers, entry timing choices, late movers, logistic regression model, principal axis factorial analysis, Malaysian construction firms
Procedia PDF Downloads 37532148 Logistic Regression Model versus Additive Model for Recurrent Event Data
Authors: Entisar A. Elgmati
Abstract:
Recurrent infant diarrhea is studied using daily data collected in Salvador, Brazil over one year and three months. A logistic regression model is fitted instead of Aalen's additive model using the same covariates that were used in the analysis with the additive model. The model gives reasonably similar results to that using additive regression model. In addition, the problem with the estimated conditional probabilities not being constrained between zero and one in additive model is solved here. Also martingale residuals that have been used to judge the goodness of fit for the additive model are shown to be useful for judging the goodness of fit of the logistic model.Keywords: additive model, cumulative probabilities, infant diarrhoea, recurrent event
Procedia PDF Downloads 63432147 Use of Multistage Transition Regression Models for Credit Card Income Prediction
Authors: Denys Osipenko, Jonathan Crook
Abstract:
Because of the variety of the card holders’ behaviour types and income sources each consumer account can be transferred to a variety of states. Each consumer account can be inactive, transactor, revolver, delinquent, defaulted and requires an individual model for the income prediction. The estimation of transition probabilities between statuses at the account level helps to avoid the memorylessness of the Markov Chains approach. This paper investigates the transition probabilities estimation approaches to credit cards income prediction at the account level. The key question of empirical research is which approach gives more accurate results: multinomial logistic regression or multistage conditional logistic regression with binary target. Both models have shown moderate predictive power. Prediction accuracy for conditional logistic regression depends on the order of stages for the conditional binary logistic regression. On the other hand, multinomial logistic regression is easier for usage and gives integrate estimations for all states without priorities. Thus further investigations can be concentrated on alternative modeling approaches such as discrete choice models.Keywords: multinomial regression, conditional logistic regression, credit account state, transition probability
Procedia PDF Downloads 48432146 Instability Index Method and Logistic Regression to Assess Landslide Susceptibility in County Route 89, Taiwan
Authors: Y. H. Wu, Ji-Yuan Lin, Yu-Ming Liou
Abstract:
This study aims to set up the landslide susceptibility map of County Route 89 at Ren-Ai Township in Nantou County using the Instability Index Method and Logistic regression. Seven susceptibility factors including Slope Angle, Aspect, Elevation, Distance to fold, Distance to River, Distance to Road and Accumulated Rainfall were obtained by GIS based on the Typhoon Toraji landslide area identified by Industrial Technology Research Institute in 2001. To calculate the landslide percentage of each factor and acquire the weight and grade the grid by means of Instability Index Method. In this study, landslide susceptibility can be classified into four grades: high, medium high, medium low and low, in order to determine the advantages and disadvantages of the two models. The precision of this model is verified by classification error matrix and SRC curve. These results suggest that the logistic regression model is a preferred method than instability index in the assessment of landslide susceptibility. It is suitable for the landslide prediction and precaution in this area in the future.Keywords: instability index method, logistic regression, landslide susceptibility, SRC curve
Procedia PDF Downloads 28932145 MapReduce Logistic Regression Algorithms with RHadoop
Authors: Byung Ho Jung, Dong Hoon Lim
Abstract:
Logistic regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. Logistic regression is used extensively in numerous disciplines, including the medical and social science fields. In this paper, we address the problem of estimating parameters in the logistic regression based on MapReduce framework with RHadoop that integrates R and Hadoop environment applicable to large scale data. There exist three learning algorithms for logistic regression, namely Gradient descent method, Cost minimization method and Newton-Rhapson's method. The Newton-Rhapson's method does not require a learning rate, while gradient descent and cost minimization methods need to manually pick a learning rate. The experimental results demonstrated that our learning algorithms using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also compared the performance of our Newton-Rhapson's method with gradient descent and cost minimization methods. The results showed that our newton's method appeared to be the most robust to all data tested.Keywords: big data, logistic regression, MapReduce, RHadoop
Procedia PDF Downloads 28232144 Probability Model Accidents of Motorcyclist Based on Driver's Personality
Authors: Margareth E. Bolla, Ludfi Djakfar, Achmad Wicaksono
Abstract:
The increase in the number of motorcycle users in Indonesia is in line with the increase in accidents involving motorcycles. Several previous studies have shown that humans are the biggest factor causing accidents, and the driver's personality factor will affect his behavior on the road. This study was conducted to see how a person's personality traits will affect the probability of having an accident while driving. The Big Five Inventory (BFI) questionnaire and the Honda Riding Trainer (HRT) simulator were used as measuring tools, while the analysis carried out was logistic regression analysis. The results of the descriptive analysis of the respondent's personality based on the BFI show that the majority of drivers have the dominant character of neuroticism (34%), while the smallest group is the driver with the dominant type of openness character (6%). The percentage of motorists who were not involved in an accident was 54%. The results of the logistic regression analysis form a mathematical model as follows Y = -3.852 - 0.288 X1 + 0.596 X2 + 0.429 X3 - 0.386 X4 - 0.094 X5 + 0.436 X6 + 0.162 X7, where the results of hypothesis testing indicate that the variables openness, conscientiousness, extraversion, agreeableness, neuroticism, history of traffic accidents and age at starting driving did not have a significant effect on the probability of a motorcyclist being involved in an accident.Keywords: accidents, BFI, probability, simulator
Procedia PDF Downloads 14532143 Generalized Additive Model for Estimating Propensity Score
Authors: Tahmidul Islam
Abstract:
Propensity Score Matching (PSM) technique has been widely used for estimating causal effect of treatment in observational studies. One major step of implementing PSM is estimating the propensity score (PS). Logistic regression model with additive linear terms of covariates is most used technique in many studies. Logistics regression model is also used with cubic splines for retaining flexibility in the model. However, choosing the functional form of the logistic regression model has been a question since the effectiveness of PSM depends on how accurately the PS been estimated. In many situations, the linearity assumption of linear logistic regression may not hold and non-linear relation between the logit and the covariates may be appropriate. One can estimate PS using machine learning techniques such as random forest, neural network etc for more accuracy in non-linear situation. In this study, an attempt has been made to compare the efficacy of Generalized Additive Model (GAM) in various linear and non-linear settings and compare its performance with usual logistic regression. GAM is a non-parametric technique where functional form of the covariates can be unspecified and a flexible regression model can be fitted. In this study various simple and complex models have been considered for treatment under several situations (small/large sample, low/high number of treatment units) and examined which method leads to more covariate balance in the matched dataset. It is found that logistic regression model is impressively robust against inclusion quadratic and interaction terms and reduces mean difference in treatment and control set equally efficiently as GAM does. GAM provided no significantly better covariate balance than logistic regression in both simple and complex models. The analysis also suggests that larger proportion of controls than treatment units leads to better balance for both of the methods.Keywords: accuracy, covariate balances, generalized additive model, logistic regression, non-linearity, propensity score matching
Procedia PDF Downloads 36632142 A Monte Carlo Fuzzy Logistic Regression Framework against Imbalance and Separation
Authors: Georgios Charizanos, Haydar Demirhan, Duygu Icen
Abstract:
Two of the most impactful issues in classical logistic regression are class imbalance and complete separation. These can result in model predictions heavily leaning towards the imbalanced class on the binary response variable or over-fitting issues. Fuzzy methodology offers key solutions for handling these problems. However, most studies propose the transformation of the binary responses into a continuous format limited within [0,1]. This is called the possibilistic approach within fuzzy logistic regression. Following this approach is more aligned with straightforward regression since a logit-link function is not utilized, and fuzzy probabilities are not generated. In contrast, we propose a method of fuzzifying binary response variables that allows for the use of the logit-link function; hence, a probabilistic fuzzy logistic regression model with the Monte Carlo method. The fuzzy probabilities are then classified by selecting a fuzzy threshold. Different combinations of fuzzy and crisp input, output, and coefficients are explored, aiming to understand which of these perform better under different conditions of imbalance and separation. We conduct numerical experiments using both synthetic and real datasets to demonstrate the performance of the fuzzy logistic regression framework against seven crisp machine learning methods. The proposed framework shows better performance irrespective of the degree of imbalance and presence of separation in the data, while the considered machine learning methods are significantly impacted.Keywords: fuzzy logistic regression, fuzzy, logistic, machine learning
Procedia PDF Downloads 7132141 Determining the Causality Variables in Female Genital Mutilation: A Factor Screening Approach
Authors: Ekele Alih, Enejo Jalija
Abstract:
Female Genital Mutilation (FGM) is made up of three types namely: Clitoridectomy, Excision and Infibulation. In this study, we examine the factors responsible for FGM in order to identify the causality variables in a logistic regression approach. From the result of the survey conducted by the Public Health Division, Nigeria Institute of Medical Research, Yaba, Lagos State, the tau statistic, τ was used to screen 9 factors that causes FGM in order to select few of the predictors before multiple regression equation is obtained. The need for this may be that the sample size may not be able to sustain having a regression with all the predictors or to avoid multi-collinearity. A total of 300 respondents, comprising 150 adult males and 150 adult females were selected for the household survey based on the multi-stage sampling procedure. The tau statistic,Keywords: female genital mutilation, logistic regression, tau statistic, African society
Procedia PDF Downloads 26032140 A Kolmogorov-Smirnov Type Goodness-Of-Fit Test of Multinomial Logistic Regression Model in Case-Control Studies
Authors: Chen Li-Ching
Abstract:
The multinomial logistic regression model is used popularly for inferring the relationship of risk factors and disease with multiple categories. This study based on the discrepancy between the nonparametric maximum likelihood estimator and semiparametric maximum likelihood estimator of the cumulative distribution function to propose a Kolmogorov-Smirnov type test statistic to assess adequacy of the multinomial logistic regression model for case-control data. A bootstrap procedure is presented to calculate the critical value of the proposed test statistic. Empirical type I error rates and powers of the test are performed by simulation studies. Some examples will be illustrated the implementation of the test.Keywords: case-control studies, goodness-of-fit test, Kolmogorov-Smirnov test, multinomial logistic regression
Procedia PDF Downloads 45532139 Generalized Extreme Value Regression with Binary Dependent Variable: An Application for Predicting Meteorological Drought Probabilities
Authors: Retius Chifurira
Abstract:
Logistic regression model is the most used regression model to predict meteorological drought probabilities. When the dependent variable is extreme, the logistic model fails to adequately capture drought probabilities. In order to adequately predict drought probabilities, we use the generalized linear model (GLM) with the quantile function of the generalized extreme value distribution (GEVD) as the link function. The method maximum likelihood estimation is used to estimate the parameters of the generalized extreme value (GEV) regression model. We compare the performance of the logistic and the GEV regression models in predicting drought probabilities for Zimbabwe. The performance of the regression models are assessed using the goodness-of-fit tests, namely; relative root mean square error (RRMSE) and relative mean absolute error (RMAE). Results show that the GEV regression model performs better than the logistic model, thereby providing a good alternative candidate for predicting drought probabilities. This paper provides the first application of GLM derived from extreme value theory to predict drought probabilities for a drought-prone country such as Zimbabwe.Keywords: generalized extreme value distribution, general linear model, mean annual rainfall, meteorological drought probabilities
Procedia PDF Downloads 19932138 Reminiscence Therapy for Alzheimer’s Disease Restrained on Logistic Regression Based Linear Bootstrap Aggregating
Authors: P. S. Jagadeesh Kumar, Mingmin Pan, Xianpei Li, Yanmin Yuan, Tracy Lin Huan
Abstract:
Researchers are doing enchanting research into the inherited features of Alzheimer’s disease and probable consistent therapies. In Alzheimer’s, memories are extinct in reverse order; memories formed lately are more transitory than those from formerly. Reminiscence therapy includes the conversation of past actions, trials and knowledges with another individual or set of people, frequently with the help of perceptible reminders such as photos, household and other acquainted matters from the past, music and collection of tapes. In this manuscript, the competence of reminiscence therapy for Alzheimer’s disease is measured using logistic regression based linear bootstrap aggregating. Logistic regression is used to envisage the experiential features of the patient’s memory through various therapies. Linear bootstrap aggregating shows better stability and accuracy of reminiscence therapy used in statistical classification and regression of memories related to validation therapy, supportive psychotherapy, sensory integration and simulated presence therapy.Keywords: Alzheimer’s disease, linear bootstrap aggregating, logistic regression, reminiscence therapy
Procedia PDF Downloads 30832137 Landslide Susceptibility Mapping: A Comparison between Logistic Regression and Multivariate Adaptive Regression Spline Models in the Municipality of Oudka, Northern of Morocco
Authors: S. Benchelha, H. C. Aoudjehane, M. Hakdaoui, R. El Hamdouni, H. Mansouri, T. Benchelha, M. Layelmam, M. Alaoui
Abstract:
The logistic regression (LR) and multivariate adaptive regression spline (MarSpline) are applied and verified for analysis of landslide susceptibility map in Oudka, Morocco, using geographical information system. From spatial database containing data such as landslide mapping, topography, soil, hydrology and lithology, the eight factors related to landslides such as elevation, slope, aspect, distance to streams, distance to road, distance to faults, lithology map and Normalized Difference Vegetation Index (NDVI) were calculated or extracted. Using these factors, landslide susceptibility indexes were calculated by the two mentioned methods. Before the calculation, this database was divided into two parts, the first for the formation of the model and the second for the validation. The results of the landslide susceptibility analysis were verified using success and prediction rates to evaluate the quality of these probabilistic models. The result of this verification was that the MarSpline model is the best model with a success rate (AUC = 0.963) and a prediction rate (AUC = 0.951) higher than the LR model (success rate AUC = 0.918, rate prediction AUC = 0.901).Keywords: landslide susceptibility mapping, regression logistic, multivariate adaptive regression spline, Oudka, Taounate
Procedia PDF Downloads 18632136 Statistical Analysis with Prediction Models of User Satisfaction in Software Project Factors
Authors: Katawut Kaewbanjong
Abstract:
We analyzed a volume of data and found significant user satisfaction in software project factors. A statistical significance analysis (logistic regression) and collinearity analysis determined the significance factors from a group of 71 pre-defined factors from 191 software projects in ISBSG Release 12. The eight prediction models used for testing the prediction potential of these factors were Neural network, k-NN, Naïve Bayes, Random forest, Decision tree, Gradient boosted tree, linear regression and logistic regression prediction model. Fifteen pre-defined factors were truly significant in predicting user satisfaction, and they provided 82.71% prediction accuracy when used with a neural network prediction model. These factors were client-server, personnel changes, total defects delivered, project inactive time, industry sector, application type, development type, how methodology was acquired, development techniques, decision making process, intended market, size estimate approach, size estimate method, cost recording method, and effort estimate method. These findings may benefit software development managers considerably.Keywords: prediction model, statistical analysis, software project, user satisfaction factor
Procedia PDF Downloads 12332135 Credit Risk Prediction Based on Bayesian Estimation of Logistic Regression Model with Random Effects
Authors: Sami Mestiri, Abdeljelil Farhat
Abstract:
The aim of this current paper is to predict the credit risk of banks in Tunisia, over the period (2000-2005). For this purpose, two methods for the estimation of the logistic regression model with random effects: Penalized Quasi Likelihood (PQL) method and Gibbs Sampler algorithm are applied. By using the information on a sample of 528 Tunisian firms and 26 financial ratios, we show that Bayesian approach improves the quality of model predictions in terms of good classification as well as by the ROC curve result.Keywords: forecasting, credit risk, Penalized Quasi Likelihood, Gibbs Sampler, logistic regression with random effects, curve ROC
Procedia PDF Downloads 54132134 On Estimating the Headcount Index by Using the Logistic Regression Estimator
Authors: Encarnación Álvarez, Rosa M. García-Fernández, Juan F. Muñoz, Francisco J. Blanco-Encomienda
Abstract:
The problem of estimating a proportion has important applications in the field of economics, and in general, in many areas such as social sciences. A common application in economics is the estimation of the headcount index. In this paper, we define the general headcount index as a proportion. Furthermore, we introduce a new quantitative method for estimating the headcount index. In particular, we suggest to use the logistic regression estimator for the problem of estimating the headcount index. Assuming a real data set, results derived from Monte Carlo simulation studies indicate that the logistic regression estimator can be more accurate than the traditional estimator of the headcount index.Keywords: poverty line, poor, risk of poverty, Monte Carlo simulations, sample
Procedia PDF Downloads 42132133 A Hybrid Model Tree and Logistic Regression Model for Prediction of Soil Shear Strength in Clay
Authors: Ehsan Mehryaar, Seyed Armin Motahari Tabari
Abstract:
Without a doubt, soil shear strength is the most important property of the soil. The majority of fatal and catastrophic geological accidents are related to shear strength failure of the soil. Therefore, its prediction is a matter of high importance. However, acquiring the shear strength is usually a cumbersome task that might need complicated laboratory testing. Therefore, prediction of it based on common and easy to get soil properties can simplify the projects substantially. In this paper, A hybrid model based on the classification and regression tree algorithm and logistic regression is proposed where each leaf of the tree is an independent regression model. A database of 189 points for clay soil, including Moisture content, liquid limit, plastic limit, clay content, and shear strength, is collected. The performance of the developed model compared to the existing models and equations using root mean squared error and coefficient of correlation.Keywords: model tree, CART, logistic regression, soil shear strength
Procedia PDF Downloads 19532132 Heart Attack Prediction Using Several Machine Learning Methods
Authors: Suzan Anwar, Utkarsh Goyal
Abstract:
Heart rate (HR) is a predictor of cardiovascular, cerebrovascular, and all-cause mortality in the general population, as well as in patients with cardio and cerebrovascular diseases. Machine learning (ML) significantly improves the accuracy of cardiovascular risk prediction, increasing the number of patients identified who could benefit from preventive treatment while avoiding unnecessary treatment of others. This research examines relationship between the individual's various heart health inputs like age, sex, cp, trestbps, thalach, oldpeaketc, and the likelihood of developing heart disease. Machine learning techniques like logistic regression and decision tree, and Python are used. The results of testing and evaluating the model using the Heart Failure Prediction Dataset show the chance of a person having a heart disease with variable accuracy. Logistic regression has yielded an accuracy of 80.48% without data handling. With data handling (normalization, standardscaler), the logistic regression resulted in improved accuracy of 87.80%, decision tree 100%, random forest 100%, and SVM 100%.Keywords: heart rate, machine learning, SVM, decision tree, logistic regression, random forest
Procedia PDF Downloads 13632131 Self-Image of Police Officers
Authors: Leo Carlo B. Rondina
Abstract:
Self-image is an important factor to improve the self-esteem of the personnel. The purpose of the study is to determine the self-image of the police. The respondents were the 503 policemen assigned in different Police Station in Davao City, and they were chosen with the used of random sampling. With the used of Exploratory Factor Analysis (EFA), latent construct variables of police image were identified as follows; professionalism, obedience, morality and justice and fairness. Further, ordinal regression indicates statistical characteristics on ages 21-40 which means the age of the respondent statistically improves self-image.Keywords: police image, exploratory factor analysis, ordinal regression, Galatea effect
Procedia PDF Downloads 28532130 An Automated Stock Investment System Using Machine Learning Techniques: An Application in Australia
Authors: Carol Anne Hargreaves
Abstract:
A key issue in stock investment is how to select representative features for stock selection. The objective of this paper is to firstly determine whether an automated stock investment system, using machine learning techniques, may be used to identify a portfolio of growth stocks that are highly likely to provide returns better than the stock market index. The second objective is to identify the technical features that best characterize whether a stock’s price is likely to go up and to identify the most important factors and their contribution to predicting the likelihood of the stock price going up. Unsupervised machine learning techniques, such as cluster analysis, were applied to the stock data to identify a cluster of stocks that was likely to go up in price – portfolio 1. Next, the principal component analysis technique was used to select stocks that were rated high on component one and component two – portfolio 2. Thirdly, a supervised machine learning technique, the logistic regression method, was used to select stocks with a high probability of their price going up – portfolio 3. The predictive models were validated with metrics such as, sensitivity (recall), specificity and overall accuracy for all models. All accuracy measures were above 70%. All portfolios outperformed the market by more than eight times. The top three stocks were selected for each of the three stock portfolios and traded in the market for one month. After one month the return for each stock portfolio was computed and compared with the stock market index returns. The returns for all three stock portfolios was 23.87% for the principal component analysis stock portfolio, 11.65% for the logistic regression portfolio and 8.88% for the K-means cluster portfolio while the stock market performance was 0.38%. This study confirms that an automated stock investment system using machine learning techniques can identify top performing stock portfolios that outperform the stock market.Keywords: machine learning, stock market trading, logistic regression, cluster analysis, factor analysis, decision trees, neural networks, automated stock investment system
Procedia PDF Downloads 15632129 Robustified Asymmetric Logistic Regression Model for Global Fish Stock Assessment
Authors: Osamu Komori, Shinto Eguchi, Hiroshi Okamura, Momoko Ichinokawa
Abstract:
The long time-series data on population assessments are essential for global ecosystem assessment because the temporal change of biomass in such a database reflects the status of global ecosystem properly. However, the available assessment data usually have limited sample sizes and the ratio of populations with low abundance of biomass (collapsed) to those with high abundance (non-collapsed) is highly imbalanced. To allow for the imbalance and uncertainty involved in the ecological data, we propose a binary regression model with mixed effects for inferring ecosystem status through an asymmetric logistic model. In the estimation equation, we observe that the weights for the non-collapsed populations are relatively reduced, which in turn puts more importance on the small number of observations of collapsed populations. Moreover, we extend the asymmetric logistic regression model using propensity score to allow for the sample biases observed in the labeled and unlabeled datasets. It robustified the estimation procedure and improved the model fitting.Keywords: double robust estimation, ecological binary data, mixed effect logistic regression model, propensity score
Procedia PDF Downloads 26432128 Qualitative and Quantitative Analysis of Motivation Letters to Model Turnover in Non-Governmental Organization
Authors: A. Porshnev, A. Zaporozhtchuk
Abstract:
Motivation regarded as a key factor of labor turnover, is especially important for volunteers working on an altruistic basis in NGO. Despite the motivational letter, candidate selection depends on the impression of the selection committee, which can be subject to human bias. We expect that structured and unstructured information provided in motivation letters could be used to improve candidate selection procedures. In our paper, we perform qualitative and quantitative analysis of 2280 motivation letters, create logistic regression, and build a decision tree to improve selection procedures. Our analysis showed that motivation factors are significant and enable human resources department to forecast labor turnover and provide extra information to demographic, professional and timing questions. In spite of the average level of accuracy the model demonstrates the selection procedures of company of under consideration can be improved. We also discuss interrelation between answers to open and closed motivation questions, recommend changes in motivational letter templates to ensure more relevant information about applicants and further steps to create more accurate model.Keywords: decision trees, logistic regression, model, motivational letter, non-governmental organization, retention, turnover
Procedia PDF Downloads 17432127 A Hybrid Fuzzy Clustering Approach for Fertile and Unfertile Analysis
Authors: Shima Soltanzadeh, Mohammad Hosain Fazel Zarandi, Mojtaba Barzegar Astanjin
Abstract:
Diagnosis of male infertility by the laboratory tests is expensive and, sometimes it is intolerable for patients. Filling out the questionnaire and then using classification method can be the first step in decision-making process, so only in the cases with a high probability of infertility we can use the laboratory tests. In this paper, we evaluated the performance of four classification methods including naive Bayesian, neural network, logistic regression and fuzzy c-means clustering as a classification, in the diagnosis of male infertility due to environmental factors. Since the data are unbalanced, the ROC curves are most suitable method for the comparison. In this paper, we also have selected the more important features using a filtering method and examined the impact of this feature reduction on the performance of each methods; generally, most of the methods had better performance after applying the filter. We have showed that using fuzzy c-means clustering as a classification has a good performance according to the ROC curves and its performance is comparable to other classification methods like logistic regression.Keywords: classification, fuzzy c-means, logistic regression, Naive Bayesian, neural network, ROC curve
Procedia PDF Downloads 33532126 Comparative Analysis of Predictive Models for Customer Churn Prediction in the Telecommunication Industry
Authors: Deepika Christopher, Garima Anand
Abstract:
To determine the best model for churn prediction in the telecom industry, this paper compares 11 machine learning algorithms, namely Logistic Regression, Support Vector Machine, Random Forest, Decision Tree, XGBoost, LightGBM, Cat Boost, AdaBoost, Extra Trees, Deep Neural Network, and Hybrid Model (MLPClassifier). It also aims to pinpoint the top three factors that lead to customer churn and conducts customer segmentation to identify vulnerable groups. According to the data, the Logistic Regression model performs the best, with an F1 score of 0.6215, 81.76% accuracy, 68.95% precision, and 56.57% recall. The top three attributes that cause churn are found to be tenure, Internet Service Fiber optic, and Internet Service DSL; conversely, the top three models in this article that perform the best are Logistic Regression, Deep Neural Network, and AdaBoost. The K means algorithm is applied to establish and analyze four different customer clusters. This study has effectively identified customers that are at risk of churn and may be utilized to develop and execute strategies that lower customer attrition.Keywords: attrition, retention, predictive modeling, customer segmentation, telecommunications
Procedia PDF Downloads 5632125 Lean Implementation Analysis on the Safety Performance of Construction Projects in the Philippines
Authors: Kim Lindsay F. Restua, Jeehan Kyra A. Rivero, Joneka Myles D. Taguba
Abstract:
Lean construction is defined as an approach in construction with the purpose of reducing waste in the process without compromising the value of the project. There are numerous lean construction tools that are applied in the construction process, which maximizes the efficiency of work and satisfaction of customers while minimizing waste. However, the complexity and differences of construction projects cause a rise in challenges on achieving the lean benefits construction can give, such as improvement in safety performance. The objective of this study is to determine the relationship between lean construction tools and their effects on safety performance. The relationship between construction tools applied in construction and safety performance is identified through Logistic Regression Analysis, and Correlation Analysis was conducted thereafter. Based on the findings, it was concluded that almost 60% of the factors listed in the study, which are different tools and effects of lean construction, were determined to have a significant relationship with the level of safety in construction projects.Keywords: correlation analysis, lean construction tools, lean construction, logistic regression analysis, risk management, safety
Procedia PDF Downloads 18332124 An Information Matrix Goodness-of-Fit Test of the Conditional Logistic Model for Matched Case-Control Studies
Authors: Li-Ching Chen
Abstract:
The case-control design has been widely applied in clinical and epidemiological studies to investigate the association between risk factors and a given disease. The retrospective design can be easily implemented and is more economical over prospective studies. To adjust effects for confounding factors, methods such as stratification at the design stage and may be adopted. When some major confounding factors are difficult to be quantified, a matching design provides an opportunity for researchers to control the confounding effects. The matching effects can be parameterized by the intercepts of logistic models and the conditional logistic regression analysis is then adopted. This study demonstrates an information-matrix-based goodness-of-fit statistic to test the validity of the logistic regression model for matched case-control data. The asymptotic null distribution of this proposed test statistic is inferred. It needs neither to employ a simulation to evaluate its critical values nor to partition covariate space. The asymptotic power of this test statistic is also derived. The performance of the proposed method is assessed through simulation studies. An example of the real data set is applied to illustrate the implementation of the proposed method as well.Keywords: conditional logistic model, goodness-of-fit, information matrix, matched case-control studies
Procedia PDF Downloads 29032123 A New Method to Estimate the Low Income Proportion: Monte Carlo Simulations
Authors: Encarnación Álvarez, Rosa M. García-Fernández, Juan F. Muñoz
Abstract:
Estimation of a proportion has many applications in economics and social studies. A common application is the estimation of the low income proportion, which gives the proportion of people classified as poor into a population. In this paper, we present this poverty indicator and propose to use the logistic regression estimator for the problem of estimating the low income proportion. Various sampling designs are presented. Assuming a real data set obtained from the European Survey on Income and Living Conditions, Monte Carlo simulation studies are carried out to analyze the empirical performance of the logistic regression estimator under the various sampling designs considered in this paper. Results derived from Monte Carlo simulation studies indicate that the logistic regression estimator can be more accurate than the customary estimator under the various sampling designs considered in this paper. The stratified sampling design can also provide more accurate results.Keywords: poverty line, risk of poverty, auxiliary variable, ratio method
Procedia PDF Downloads 45532122 Teachers’ Intention to Leave: Educational Policies as External Stress Factor
Authors: A. Myrzabekova, D. Nurmukhamed, K. Nurumov, A. Zhulbarissova
Abstract:
It is widely believed that stress can affect teachers’ intention to change the workplace. While existing research primarily focuses on the intrinsic sources of stress stemming from the school climate, the current attempt analyzes educational policies as one of the determinants of teacher’s intention to leave schools. In this respect, Kazakhstan presents a unique case since the country endorsed several educational policies which directly impacted teaching and administrative practices within schools. Using Teaching and Learning International Survey 2018 (TALIS) data with the country specific questionnaire, we construct a statistical measure of stress caused by the implementation of educational policies and test its impact on teacher’s intention to leave through the logistic regression. In addition, we control for sociodemographic, professional, and students related covariates while considering the intrinsic dimension of stress stemming from the school climate. Overall, our results suggest that stress caused by the educational policies has a statistically significant positive effect on teachers’ intentions to transfer between schools. Both policy makers and educational scholars could find these results beneficial. For the former careful planning and addressing the negative effects of the educational policies is critical for the sustainability of the educational process. For the latter, accounting for exogenous sources of stress can lead to a more complete understanding of why teachers decide to change their schools.Keywords: educational policies, Kazakhstani teachers, logistic regression factor analysis, sustainability education TALIS, teacher turnover intention, work stress
Procedia PDF Downloads 108