Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 17

Logistic Regression Related Publications

17 Developing an Advanced Algorithm Capable of Classifying News, Articles and Other Textual Documents Using Text Mining Techniques

Authors: R. B. Knudsen, O. T. Rasmussen, R. A. Alphinas

Abstract:

The reason for conducting this research is to develop an algorithm that is capable of classifying news articles from the automobile industry, according to the competitive actions that they entail, with the use of Text Mining (TM) methods. It is needed to test how to properly preprocess the data for this research by preparing pipelines which fits each algorithm the best. The pipelines are tested along with nine different classification algorithms in the realm of regression, support vector machines, and neural networks. Preliminary testing for identifying the optimal pipelines and algorithms resulted in the selection of two algorithms with two different pipelines. The two algorithms are Logistic Regression (LR) and Artificial Neural Network (ANN). These algorithms are optimized further, where several parameters of each algorithm are tested. The best result is achieved with the ANN. The final model yields an accuracy of 0.79, a precision of 0.80, a recall of 0.78, and an F1 score of 0.76. By removing three of the classes that created noise, the final algorithm is capable of reaching an accuracy of 94%.

Keywords: Text Mining, Logistic Regression, Artificial Neural Network, text classification, Competitive dynamics

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 111
16 The Relationship between Class Attendance and Performance of Industrial Engineering Students Enrolled for a Statistics Subject at the University of Technology

Authors: Tshaudi Motsima

Abstract:

Class attendance is key at all levels of education. At tertiary level many students develop a tendency of not attending all classes without being aware of the repercussions of not attending all classes. It is important for all students to attend all classes as they can receive first-hand information and they can benefit more. The student who attends classes is likely to perform better academically than the student who does not. The aim of this paper is to assess the relationship between class attendance and academic performance of industrial engineering students. The data for this study were collected through the attendance register of students and the other data were accessed from the Integrated Tertiary Software and the Higher Education Data Analyzer Portal. Data analysis was conducted on a sample of 93 students. The results revealed that students with medium predicate scores (OR = 3.8; p = 0.027) and students with low predicate scores (OR = 21.4, p < 0.001) were significantly likely to attend less than 80% of the classes as compared to students with high predicate scores. Students with examination performance of less than 50% were likely to attend less than 80% of classes than students with examination performance of 50% and above, but the differences were not statistically significant (OR = 1.3; p = 0.750).

Keywords: Logistic Regression, class attendance, examination performance, final outcome

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 109
15 Machine Learning Techniques in Bank Credit Analysis

Authors: Fernanda M. Assef, Maria Teresinha A. Steiner

Abstract:

The aim of this paper is to compare and discuss better classifier algorithm options for credit risk assessment by applying different Machine Learning techniques. Using records from a Brazilian financial institution, this study uses a database of 5,432 companies that are clients of the bank, where 2,600 clients are classified as non-defaulters, 1,551 are classified as defaulters and 1,281 are temporarily defaulters, meaning that the clients are overdue on their payments for up 180 days. For each case, a total of 15 attributes was considered for a one-against-all assessment using four different techniques: Artificial Neural Networks Multilayer Perceptron (ANN-MLP), Artificial Neural Networks Radial Basis Functions (ANN-RBF), Logistic Regression (LR) and finally Support Vector Machines (SVM). For each method, different parameters were analyzed in order to obtain different results when the best of each technique was compared. Initially the data were coded in thermometer code (numerical attributes) or dummy coding (for nominal attributes). The methods were then evaluated for each parameter and the best result of each technique was compared in terms of accuracy, false positives, false negatives, true positives and true negatives. This comparison showed that the best method, in terms of accuracy, was ANN-RBF (79.20% for non-defaulter classification, 97.74% for defaulters and 75.37% for the temporarily defaulter classification). However, the best accuracy does not always represent the best technique. For instance, on the classification of temporarily defaulters, this technique, in terms of false positives, was surpassed by SVM, which had the lowest rate (0.07%) of false positive classifications. All these intrinsic details are discussed considering the results found, and an overview of what was presented is shown in the conclusion of this study.

Keywords: Artificial Neural Networks, Machine Learning, Support Vector Machines, Logistic Regression, credit risk assessment, ANNs, Classifier algorithms

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 562
14 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach

Authors: Rajvir Kaur, Jeewani Anupama Ginige

Abstract:

With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is at the stage of its transition from clinician oriented to technology oriented. Many people around the world die of cancer because the diagnosis of disease was not done at an early stage. Nowadays, the computational methods in the form of Machine Learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper aims to carry out the comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is carried out based on standard evaluation metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results based on the evaluation metrics show that ANN showed the highest-level accuracy (99.4%) when tested with breast cancer dataset. On the other hand, when these ML classifiers are tested with the cervical cancer dataset, Ensemble (Bagged Tree) technique gave better accuracy (93.1%) in comparison to other classifiers.

Keywords: Artificial Neural Networks, Precision, Machine Learning, Breast Cancer, Cervical Cancer, Logistic Regression, support vector machine, classifiers, recall, F-score, cancer dataset

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1093
13 Nuclear Fuel Safety Threshold Determined by Logistic Regression Plus Uncertainty

Authors: D. S. Gomes, A. T. Silva

Abstract:

Analysis of the uncertainty quantification related to nuclear safety margins applied to the nuclear reactor is an important concept to prevent future radioactive accidents. The nuclear fuel performance code may involve the tolerance level determined by traditional deterministic models producing acceptable results at burn cycles under 62 GWd/MTU. The behavior of nuclear fuel can simulate applying a series of material properties under irradiation and physics models to calculate the safety limits. In this study, theoretical predictions of nuclear fuel failure under transient conditions investigate extended radiation cycles at 75 GWd/MTU, considering the behavior of fuel rods in light-water reactors under reactivity accident conditions. The fuel pellet can melt due to the quick increase of reactivity during a transient. Large power excursions in the reactor are the subject of interest bringing to a treatment that is known as the Fuchs-Hansen model. The point kinetic neutron equations show similar characteristics of non-linear differential equations. In this investigation, the multivariate logistic regression is employed to a probabilistic forecast of fuel failure. A comparison of computational simulation and experimental results was acceptable. The experiments carried out use the pre-irradiated fuels rods subjected to a rapid energy pulse which exhibits the same behavior during a nuclear accident. The propagation of uncertainty utilizes the Wilk's formulation. The variables chosen as essential to failure prediction were the fuel burnup, the applied peak power, the pulse width, the oxidation layer thickness, and the cladding type.

Keywords: Logistic Regression, Uncertainty Propagation, reactivity-initiated accident, safety margins

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 703
12 An Analysis of Classification of Imbalanced Datasets by Using Synthetic Minority Over-Sampling Technique

Authors: Ghada A. Alfattni

Abstract:

Analysing unbalanced datasets is one of the challenges that practitioners in machine learning field face. However, many researches have been carried out to determine the effectiveness of the use of the synthetic minority over-sampling technique (SMOTE) to address this issue. The aim of this study was therefore to compare the effectiveness of the SMOTE over different models on unbalanced datasets. Three classification models (Logistic Regression, Support Vector Machine and Nearest Neighbour) were tested with multiple datasets, then the same datasets were oversampled by using SMOTE and applied again to the three models to compare the differences in the performances. Results of experiments show that the highest number of nearest neighbours gives lower values of error rates. 

Keywords: Machine Learning, Logistic Regression, support vector machine, nearest neighbour, SMOTE, imbalanced datasets

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1000
11 Measuring Enterprise Growth: Pitfalls and Implications

Authors: N. Šarlija, S. Pfeifer, M. Jeger, A. Bilandžić

Abstract:

Enterprise growth is generally considered as a key driver of competitiveness, employment, economic development and social inclusion. As such, it is perceived to be a highly desirable outcome of entrepreneurship for scholars and decision makers. The huge academic debate resulted in the multitude of theoretical frameworks focused on explaining growth stages, determinants and future prospects. It has been widely accepted that enterprise growth is most likely nonlinear, temporal and related to the variety of factors which reflect the individual, firm, organizational, industry or environmental determinants of growth. However, factors that affect growth are not easily captured, instruments to measure those factors are often arbitrary, causality between variables and growth is elusive, indicating that growth is not easily modeled. Furthermore, in line with heterogeneous nature of the growth phenomenon, there is a vast number of measurement constructs assessing growth which are used interchangeably. Differences among various growth measures, at conceptual as well as at operationalization level, can hinder theory development which emphasizes the need for more empirically robust studies. In line with these highlights, the main purpose of this paper is twofold. Firstly, to compare structure and performance of three growth prediction models based on the main growth measures: Revenues, employment and assets growth. Secondly, to explore the prospects of financial indicators, set as exact, visible, standardized and accessible variables, to serve as determinants of enterprise growth. Finally, to contribute to the understanding of the implications on research results and recommendations for growth caused by different growth measures. The models include a range of financial indicators as lag determinants of the enterprises’ performances during the 2008-2013, extracted from the national register of the financial statements of SMEs in Croatia. The design and testing stage of the modeling used the logistic regression procedures. Findings confirm that growth prediction models based on different measures of growth have different set of predictors. Moreover, the relationship between particular predictors and growth measure is inconsistent, namely the same predictor positively related to one growth measure may exert negative effect on a different growth measure. Overall, financial indicators alone can serve as good proxy of growth and yield adequate predictive power of the models. The paper sheds light on both methodology and conceptual framework of enterprise growth by using a range of variables which serve as a proxy for the multitude of internal and external determinants, but are unlike them, accessible, available, exact and free of perceptual nuances in building up the model. Selection of the growth measure seems to have significant impact on the implications and recommendations related to growth. Furthermore, the paper points out to potential pitfalls of measuring and predicting growth. Overall, the results and the implications of the study are relevant for advancing academic debates on growth-related methodology, and can contribute to evidence-based decisions of policy makers.

Keywords: Logistic Regression, small and medium-sized enterprises, growth measurement constructs, prediction of growth potential

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1630
10 The Risk Factors Associated with Under-Five Mortality in Lesotho Using the 2009 Lesotho Demographic and Health Survey

Authors: T. Motsima

Abstract:

The under-5 mortality rate is high in sub-Saharan Africa with Lesotho being amongst the highest under-5 mortality rates in the world. The objective of the study is to determine the factors associated with under-5 mortality in Lesotho. The data used for this analysis come from the nationally representative household survey called the 2009 Lesotho Demographic and Health Survey. Odds ratios produced by the logistic regression models were used to measure the effect of each independent variable on the dependent variable. Female children were significantly 38% less likely to die than male children. Children who were breastfed for 13 to 18 months and those who were breastfed for more than 19 months were significantly less likely to die than those who were breastfed for 12 months or less. Furthermore, children of mothers who stayed in Quthing, Qacha’s Nek and Thaba Tseka ran the greatest risk of dying. The results suggested that: sex of child, type of birth, breastfeeding duration, district, source of energy and marital status were significant predictors of under-5 mortality, after correcting for all variables.

Keywords: Breastfeeding, Logistic Regression, Risk Factors, millennium development goals, Under-5 mortality

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1007
9 Applying the Regression Technique for Prediction of the Acute Heart Attack

Authors: Paria Soleimani, Arezoo Neshati

Abstract:

Myocardial infarction is one of the leading causes of death in the world. Some of these deaths occur even before the patient reaches the hospital. Myocardial infarction occurs as a result of impaired blood supply. Because the most of these deaths are due to coronary artery disease, hence the awareness of the warning signs of a heart attack is essential. Some heart attacks are sudden and intense, but most of them start slowly, with mild pain or discomfort, then early detection and successful treatment of these symptoms is vital to save them. Therefore, importance and usefulness of a system designing to assist physicians in early diagnosis of the acute heart attacks is obvious. The main purpose of this study would be to enable patients to become better informed about their condition and to encourage them to seek professional care at an earlier stage in the appropriate situations. For this purpose, the data were collected on 711 heart patients in Iran hospitals. 28 attributes of clinical factors can be reported by patients; were studied. Three logistic regression models were made on the basis of the 28 features to predict the risk of heart attacks. The best logistic regression model in terms of performance had a C-index of 0.955 and with an accuracy of 94.9%. The variables, severe chest pain, back pain, cold sweats, shortness of breath, nausea and vomiting, were selected as the main features.

Keywords: coronary heart disease, Logistic Regression, prediction, Acute heart attacks

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2015
8 Comparative Study - Three Artificial Intelligence Techniques for Rain Domain in Precipitation Forecast

Authors: Nabilah Filzah Mohd Radzuan, Andi Putra, Zalinda Othman, Azuraliza Abu Bakar, Abdul Razak Hamdan

Abstract:

Precipitation forecast is important in avoid incident of natural disaster which can cause loss in involved area. This review paper involves three techniques from artificial intelligence namely logistic regression, decisions tree, and random forest which used in making precipitation forecast. These combination techniques through VAR model in finding advantages and strength for every technique in forecast process. Data contains variables from rain domain. Adaptation of artificial intelligence techniques involved on rain domain enables the process to be easier and systematic for precipitation forecast.

Keywords: Logistic Regression, decisions tree, random forest, VAR model

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1704
7 Comparison of Neural Network and Logistic Regression Methods to Predict Xerostomia after Radiotherapy

Authors: Hui-Min Ting, Tsair-Fwu Lee, Ming-Yuan Cho, Pei-Ju Chao, Chun-Ming Chang, Long-Chang Chen, Fu-Min Fang

Abstract:

To evaluate the ability to predict xerostomia after radiotherapy, we constructed and compared neural network and logistic regression models. In this study, 61 patients who completed a questionnaire about their quality of life (QoL) before and after a full course of radiation therapy were included. Based on this questionnaire, some statistical data about the condition of the patients’ salivary glands were obtained, and these subjects were included as the inputs of the neural network and logistic regression models in order to predict the probability of xerostomia. Seven variables were then selected from the statistical data according to Cramer’s V and point-biserial correlation values and were trained by each model to obtain the respective outputs which were 0.88 and 0.89 for AUC, 9.20 and 7.65 for SSE, and 13.7% and 19.0% for MAPE, respectively. These parameters demonstrate that both neural network and logistic regression methods are effective for predicting conditions of parotid glands.

Keywords: Logistic Regression, ANN, xerostomia, NPC

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1289
6 Decision Trees for Predicting Risk of Mortality using Routinely Collected Data

Authors: Tessy Badriyah, Jim S. Briggs, Dave R. Prytherch

Abstract:

It is well known that Logistic Regression is the gold standard method for predicting clinical outcome, especially predicting risk of mortality. In this paper, the Decision Tree method has been proposed to solve specific problems that commonly use Logistic Regression as a solution. The Biochemistry and Haematology Outcome Model (BHOM) dataset obtained from Portsmouth NHS Hospital from 1 January to 31 December 2001 was divided into four subsets. One subset of training data was used to generate a model, and the model obtained was then applied to three testing datasets. The performance of each model from both methods was then compared using calibration (the χ2 test or chi-test) and discrimination (area under ROC curve or c-index). The experiment presented that both methods have reasonable results in the case of the c-index. However, in some cases the calibration value (χ2) obtained quite a high result. After conducting experiments and investigating the advantages and disadvantages of each method, we can conclude that Decision Trees can be seen as a worthy alternative to Logistic Regression in the area of Data Mining.

Keywords: Decision trees, Logistic Regression, clinical outcome, risk of mortality

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2244
5 Household Demand for Solid Waste Disposal Options in Malaysia

Authors: Pek Chuen-Khee, Jamal Othman

Abstract:

This paper estimates the economic values of household preference for enhanced solid waste disposal services in Malaysia. The contingent valuation (CV) method estimates an average additional monthly willingness-to-pay (WTP) in solid waste management charges of Ôé¼0.77 to 0.80 for improved waste disposal services quality. The finding of a slightly higher WTP from the generic CV question than that of label-specific, further reveals a higher WTP for sanitary landfill, at Ôé¼0.90, than incineration, at Ôé¼0.63. This suggests that sanitary landfill is a more preferred alternative. The logistic regression estimation procedure reveals that household-s concern of where their rubbish is disposed, age, ownership of house, household income and format of CV question are significant factors in influencing WTP.

Keywords: Logistic Regression, solid waste disposal, contingent valuation, willingness-to-pay

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2256
4 Application of Company Financial Crisis Early Warning Model- Use of “Financial Reference Database“

Authors: Chiung-ying Lee, Chia-hua Chang

Abstract:

In July 1, 2007, Taiwan Stock Exchange (TWSE) on market observation post system (MOPS) adds a new "Financial reference database" for investors to do investment reference. This database as a warning to public offering companies listed on the public financial information and it original within eight targets. In this paper, this database provided by the indicators for the application of company financial crisis early warning model verify that the database provided by the indicator forecast for the financial crisis, whether or not companies have a high accuracy rate as opposed to domestic and foreign scholars have positive results. There is use of Logistic Regression Model application of the financial early warning model, in which no joined back-conditions is the first model, joined it in is the second model, has been taken occurred in the financial crisis of companies to research samples and then business took place before the financial crisis point with T-1 and T-2 sample data to do positive analysis. The results show that this database provided the debt ratio and net per share for the best forecast variables.

Keywords: Logistic Regression, Financial reference database, Financial early warning model

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1148
3 Categorical Data Modeling: Logistic Regression Software

Authors: Abdellatif Tchantchane

Abstract:

A Matlab based software for logistic regression is developed to enhance the process of teaching quantitative topics and assist researchers with analyzing wide area of applications where categorical data is involved. The software offers an option of performing stepwise logistic regression to select the most significant predictors. The software includes a feature to detect influential observations in data, and investigates the effect of dropping or misclassifying an observation on a predictor variable. The input data may consist either as a set of individual responses (yes/no) with the predictor variables or as grouped records summarizing various categories for each unique set of predictor variables' values. Graphical displays are used to output various statistical results and to assess the goodness of fit of the logistic regression model. The software recognizes possible convergence constraints when present in data, and the user is notified accordingly.

Keywords: MATLAB, Logistic Regression, categorical data, Influential observation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1568
2 Defect Cause Modeling with Decision Tree and Regression Analysis

Authors: B. Bakır, İ. Batmaz, F. A. Güntürkün, İ. A. İpekçi, G. Köksal, N. E. Özdemirel

Abstract:

The main aim of this study is to identify the most influential variables that cause defects on the items produced by a casting company located in Turkey. To this end, one of the items produced by the company with high defective percentage rates is selected. Two approaches-the regression analysis and decision treesare used to model the relationship between process parameters and defect types. Although logistic regression models failed, decision tree model gives meaningful results. Based on these results, it can be claimed that the decision tree approach is a promising technique for determining the most important process variables.

Keywords: Logistic Regression, Quality improvement, Casting Industry, decision tree algorithm C5.0

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2139
1 Parametric Approach for Reserve Liability Estimate in Mortgage Insurance

Authors: Rajinder Singh, Ram Valluru

Abstract:

Chain Ladder (CL) method, Expected Loss Ratio (ELR) method and Bornhuetter-Ferguson (BF) method, in addition to more complex transition-rate modeling, are commonly used actuarial reserving methods in general insurance. There is limited published research about their relative performance in the context of Mortgage Insurance (MI). In our experience, these traditional techniques pose unique challenges and do not provide stable claim estimates for medium to longer term liabilities. The relative strengths and weaknesses among various alternative approaches revolve around: stability in the recent loss development pattern, sufficiency and reliability of loss development data, and agreement/disagreement between reported losses to date and ultimate loss estimate. CL method results in volatile reserve estimates, especially for accident periods with little development experience. The ELR method breaks down especially when ultimate loss ratios are not stable and predictable. While the BF method provides a good tradeoff between the loss development approach (CL) and ELR, the approach generates claim development and ultimate reserves that are disconnected from the ever-to-date (ETD) development experience for some accident years that have more development experience. Further, BF is based on subjective a priori assumption. The fundamental shortcoming of these methods is their inability to model exogenous factors, like the economy, which impact various cohorts at the same chronological time but at staggered points along their life-time development. This paper proposes an alternative approach of parametrizing the loss development curve and using logistic regression to generate the ultimate loss estimate for each homogeneous group (accident year or delinquency period). The methodology was tested on an actual MI claim development dataset where various cohorts followed a sigmoidal trend, but levels varied substantially depending upon the economic and operational conditions during the development period spanning over many years. The proposed approach provides the ability to indirectly incorporate such exogenous factors and produce more stable loss forecasts for reserving purposes as compared to the traditional CL and BF methods.

Keywords: Logistic Regression, volatility, actuarial loss reserving techniques, parametric function

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 89