Search results for: multinomial logistic regression
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3427


3397 Local Interpretable Model-agnostic Explanations (LIME) Approach to Email Spam Detection

Authors: Rohini Hariharan, Yazhini R., Blessy Maria Mathew

Abstract:

Detecting email spam is an important task in the digital era, and it requires effective ways of curbing unwanted messages. This paper presents an approach that makes email spam classification algorithms transparent, reliable and more trustworthy by incorporating Local Interpretable Model-agnostic Explanations (LIME). Our technique provides interpretable explanations for the classification of individual emails, helping users understand the model's decision-making process. In this study, we developed a complete pipeline that integrates LIME into the spam classification framework and builds simplified, interpretable models tailored to individual emails. LIME identifies influential terms, pointing out the key elements that drive classification results and thus reducing the opacity inherent in conventional machine learning models. Additionally, we propose a visualization scheme for displaying keywords that improves users' understanding of classification decisions. We test our method on a diverse email dataset and compare its performance with various baseline models, such as Gaussian Naive Bayes, Multinomial Naive Bayes, Bernoulli Naive Bayes, Support Vector Classifier, K-Nearest Neighbors, Decision Tree, and Logistic Regression. Our results show that our model outperforms all the other models, achieving an accuracy of 96.59% and a precision of 99.12%.

Keywords: text classification, LIME (local interpretable model-agnostic explanations), stemming, tokenization, logistic regression.
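LIME explains a single prediction by perturbing the input (here, dropping words), querying the black-box classifier on each perturbed email, and fitting a locally weighted linear surrogate whose coefficients rank the words by influence. The following is a minimal pure-Python sketch of that idea, not the authors' pipeline and not the `lime` package: the toy classifier `toy_spam_proba`, the kernel width, the ridge penalty and the choice to treat each token position as one feature are all illustrative assumptions.

```python
import math
import random


def _solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def explain_text(text, predict_proba, num_samples=500, kernel_width=0.25,
                 ridge=1e-3, seed=0):
    """Return (word, weight) pairs from a LIME-style weighted linear surrogate."""
    words = text.split()
    d = len(words)
    rng = random.Random(seed)
    masks = [[1] * d]  # the unperturbed email is always included
    for _ in range(num_samples - 1):
        removed = set(rng.sample(range(d), rng.randint(1, d)))
        masks.append([0 if i in removed else 1 for i in range(d)])

    p = d + 1  # intercept + one coefficient per token (assumption: position = feature)
    xtwx = [[0.0] * p for _ in range(p)]
    xtwy = [0.0] * p
    for mask in masks:
        kept = sum(mask)
        # cosine distance between the binary mask and the all-ones (original) vector
        distance = 1.0 - (math.sqrt(kept / d) if kept else 0.0)
        weight = math.exp(-(distance ** 2) / kernel_width ** 2)  # exponential kernel
        y = predict_proba(" ".join(w for w, keep in zip(words, mask) if keep))
        x = [1.0] + [float(v) for v in mask]
        for i in range(p):
            xtwy[i] += weight * x[i] * y
            for j in range(p):
                xtwx[i][j] += weight * x[i] * x[j]
    for i in range(1, p):  # small ridge penalty on word coefficients only
        xtwx[i][i] += ridge
    coef = _solve(xtwx, xtwy)
    return sorted(zip(words, coef[1:]), key=lambda pair: -abs(pair[1]))


def toy_spam_proba(text):
    """Hypothetical black-box classifier: spam probability driven only by 'free'."""
    return 0.9 if "free" in text.split() else 0.1


print(explain_text("claim your free prize now", toy_spam_proba)[:2])
```

Because the toy classifier depends only on "free", the surrogate assigns that word a weight close to 0.8 (the jump from 0.1 to 0.9) and near-zero weights to the rest; with a real model the ranked weights are what a keyword visualization would display.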

3396 Breast Cancer Mortality and Comorbidities in Portugal: A Predictive Model Built with Real World Data

Authors: Cecília M. Antão, Paulo Jorge Nogueira

Abstract:

Breast cancer (BC) is the leading cause of cancer mortality among Portuguese women. This retrospective observational study aimed to identify comorbidities in female BC patients admitted to Portuguese public hospitals (2010-2018), to investigate the effect of comorbidities on the BC mortality rate, and to build a predictive model using logistic regression. Results showed that BC mortality in Portugal decreased over this period, reaching 4.37% in 2018. Adjusted odds ratios indicated that secondary malignant neoplasms of the liver and of bone and bone marrow, congestive heart failure, and diabetes were associated with an increased chance of dying from breast cancer. Although the Lisbon district (the most populated area) accounted for the largest percentage of BC patients, the logistic regression model showed that, besides the patient's age, residence in the Bragança, Castelo Branco, or Porto districts was directly associated with an increased mortality rate.

Keywords: breast cancer, comorbidities, logistic regression, adjusted odds ratio

3395 Relationship and Associated Factors of Breastfeeding Self-efficacy among Postpartum Couples in Malawi: A Cross-sectional Study

Authors: Roselyn Chipojola, Shu-yu Kuo

Abstract:

Background: Breastfeeding self-efficacy in both mothers and fathers plays a crucial role in improving exclusive breastfeeding rates. However, little is known about the relationship between paternal and maternal breastfeeding self-efficacy or about its predictors. This study aimed to examine the relationship and associated factors of breastfeeding self-efficacy (BSE) among mothers and fathers in Malawi. Methods: A cross-sectional study was conducted on 180 pairs of postpartum mothers and fathers at a tertiary maternity facility in central Malawi. BSE was measured using the Breastfeeding Self-Efficacy Scale Short-Form. Depressive symptoms were assessed by the Edinburgh Postnatal Depression Scale. A structured questionnaire was used to collect demographic and health variables. Data were analyzed using multivariable logistic regression and multinomial logistic regression. Results: Self-efficacy scores were higher in mothers (mean = 55.7, standard deviation (SD) = 6.5) than in fathers (mean = 50.2, SD = 11.9). A significant association between paternal and maternal breastfeeding self-efficacy was found (r = 0.32). Age, employment status, and mode of birth were significantly related to maternal and paternal BSE, respectively. Older age and caesarean section delivery were significant factors for combined BSE scores in couples. A higher BSE score in either the mother or her partner predicted higher exclusive breastfeeding rates. BSE scores were lower when couples' depressive symptoms were high. Conclusion: BSE scores are highly correlated between Malawian mothers and fathers, with relatively higher maternal scores. Importantly, high BSE in couples predicted higher odds of exclusive breastfeeding, which highlights the need to include both mothers and fathers in future breastfeeding promotion strategies.

Keywords: paternal, maternal, exclusive breastfeeding, breastfeeding self‑efficacy, Malawi

3394 A Hybrid Fuzzy Clustering Approach for Fertile and Unfertile Analysis

Authors: Shima Soltanzadeh, Mohammad Hosain Fazel Zarandi, Mojtaba Barzegar Astanjin

Abstract:

Diagnosing male infertility with laboratory tests is expensive and sometimes intolerable for patients. Filling out a questionnaire and then applying a classification method can be the first step in the decision-making process, so that laboratory tests are used only in cases with a high probability of infertility. In this paper, we evaluated the performance of four classification methods, namely naive Bayes, neural network, logistic regression and fuzzy c-means clustering used as a classifier, in diagnosing male infertility due to environmental factors. Since the data are imbalanced, ROC curves are the most suitable method for the comparison. We also selected the more important features using a filtering method and examined the impact of this feature reduction on the performance of each method; generally, most of the methods performed better after applying the filter. We have shown that fuzzy c-means clustering used as a classifier performs well according to the ROC curves, and its performance is comparable to that of other classification methods such as logistic regression.

Keywords: classification, fuzzy c-means, logistic regression, Naive Bayesian, neural network, ROC curve
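Fuzzy c-means alternates between centroid updates (means of the points weighted by membership raised to the fuzzifier m) and membership updates u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)). One simple way to use it "as a classification", assumed here rather than taken from the paper, is to assign each case to the cluster with the largest membership. The sketch below uses m = 2, a random initialization with a fixed seed and toy two-dimensional points, all of which are illustrative assumptions.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Return (centers, memberships) for fuzzy c-means with fuzzifier m > 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # random initial memberships; each row sums to 1
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # centroid update: mean of the points weighted by membership ** m
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * points[i][k] for i in range(n)) / sw
                            for k in range(dim)])
        # membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        new_u = []
        for i in range(n):
            d = [math.dist(points[i], ctr) for ctr in centers]
            if min(d) == 0.0:  # point sits on a centre: full membership there
                hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(c))
                              for j in range(c)])
        change = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break
    return centers, u


def hard_labels(u):
    """Classify each case by its largest membership (illustrative decision rule)."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, u = fuzzy_c_means(pts, c=2)
print(hard_labels(u))
```

Cluster indices are arbitrary, so in a diagnostic setting each cluster would still have to be mapped to "fertile" or "unfertile", for example by the majority class of the training cases it captures; the membership itself can serve as a score for an ROC curve.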

3393 Comparative Study of Three Artificial Intelligence Techniques for Rain Domain in Precipitation Forecast

Authors: Nabilah Filzah Mohd Radzuan, Andi Putra, Zalinda Othman, Azuraliza Abu Bakar, Abdul Razak Hamdan

Abstract:

Precipitation forecasting is important for avoiding natural disasters that can cause losses in the affected area. This paper reviews three techniques used in precipitation forecasting: logistic regression, decision tree, and random forest. Combining these techniques through the vector autoregression (VAR) model helps identify the advantages and strengths of each technique in the forecast process. The dataset contains variables from the rain domain. Adapting artificial intelligence techniques to the rain domain makes precipitation forecasting easier and more systematic.

Keywords: logistic regression, decision tree, random forest, VAR model

3392 Comparative Analysis of Predictive Models for Customer Churn Prediction in the Telecommunication Industry

Authors: Deepika Christopher, Garima Anand

Abstract:

To determine the best model for churn prediction in the telecom industry, this paper compares 11 machine learning algorithms, namely Logistic Regression, Support Vector Machine, Random Forest, Decision Tree, XGBoost, LightGBM, CatBoost, AdaBoost, Extra Trees, Deep Neural Network, and Hybrid Model (MLPClassifier). It also aims to pinpoint the top three factors that lead to customer churn and conducts customer segmentation to identify vulnerable groups. According to the results, the Logistic Regression model performs the best, with an F1 score of 0.6215, 81.76% accuracy, 68.95% precision, and 56.57% recall. The top three attributes that drive churn are tenure, Internet Service Fiber optic, and Internet Service DSL, and the three best-performing models in this article are Logistic Regression, Deep Neural Network, and AdaBoost. The K-means algorithm is applied to establish and analyze four distinct customer clusters. This study has effectively identified customers at risk of churn, and its results may be used to develop and execute strategies that lower customer attrition.

Keywords: attrition, retention, predictive modeling, customer segmentation, telecommunications
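The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Plugging in the reported precision (68.95%) and recall (56.57%) reproduces the reported F1 of 0.6215, which makes a quick consistency check on the published figures. The function name below is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: no correct positive predictions at all
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.6895, 0.5657), 4))  # -> 0.6215
```

Because the harmonic mean is pulled toward the smaller of the two values, the comparatively low recall (56.57%) is what keeps F1 well below the 81.76% accuracy.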

3391 An Information Matrix Goodness-of-Fit Test of the Conditional Logistic Model for Matched Case-Control Studies

Authors: Li-Ching Chen

Abstract:

The case-control design has been widely applied in clinical and epidemiological studies to investigate the association between risk factors and a given disease. The retrospective design can be easily implemented and is more economical than prospective studies. To adjust for the effects of confounding factors, methods such as stratification at the design stage may be adopted. When some major confounding factors are difficult to quantify, a matching design provides an opportunity for researchers to control the confounding effects. The matching effects can be parameterized by set-specific intercepts in the logistic model, and conditional logistic regression analysis is then adopted. This study demonstrates an information-matrix-based goodness-of-fit statistic to test the validity of the logistic regression model for matched case-control data. The asymptotic null distribution of the proposed test statistic is derived. The test requires neither simulation to evaluate its critical values nor a partition of the covariate space. The asymptotic power of this test statistic is also derived. The performance of the proposed method is assessed through simulation studies. A real data set is also used to illustrate the implementation of the proposed method.

Keywords: conditional logistic model, goodness-of-fit, information matrix, matched case-control studies
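In conditional logistic regression the set-specific intercepts cancel: for a matched set with one case and its controls, the set's contribution to the likelihood is exp(x_case·β) / Σ_j exp(x_j·β), summed over all members j of the set. The sketch below evaluates this conditional log-likelihood for a single covariate and fits β by Newton-Raphson. It illustrates the model whose fit the proposed test assesses; it is not the information-matrix goodness-of-fit test itself, and the matched sets used for checking are invented.

```python
import math


def _set_terms(xs, beta):
    """Within one matched set: weighted mean, variance and log normaliser of x."""
    top = max(x * beta for x in xs)  # shift exponents for numerical stability
    w = [math.exp(x * beta - top) for x in xs]
    total = sum(w)
    mean = sum(wi * x for wi, x in zip(w, xs)) / total
    var = sum(wi * x * x for wi, x in zip(w, xs)) / total - mean ** 2
    return mean, var, top + math.log(total)


def conditional_loglik(sets, beta):
    """Sum over sets of log[exp(x_case*beta) / sum_j exp(x_j*beta)]."""
    return sum(case_x * beta - _set_terms([case_x] + controls, beta)[2]
               for case_x, controls in sets)


def fit_conditional_logit(sets, beta=0.0, max_iter=50, tol=1e-10):
    """Newton-Raphson for one covariate; sets = [(case_x, [control_x, ...]), ...]."""
    info = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for case_x, controls in sets:
            mean, var, _ = _set_terms([case_x] + controls, beta)
            score += case_x - mean  # observed minus expected covariate in the set
            info += var             # conditional variance = information contribution
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)


# Hypothetical 1:1 matched pairs with a binary exposure (1 = exposed)
pairs = [(1.0, [0.0])] * 6 + [(0.0, [1.0])] * 2 + [(1.0, [1.0])] * 3 + [(0.0, [0.0])] * 4
beta_hat, se = fit_conditional_logit(pairs)
print(round(math.exp(beta_hat), 3), round(se, 3))
```

For 1:1 pairs with a binary exposure, the estimate reduces to log(n10 / n01), the log ratio of the two kinds of discordant pairs (6 and 2 here), and concordant pairs contribute nothing, which is a convenient check on the code.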

3390 Investigating Income Diversification Strategies into Off-Farm Activities Among Rural Households in Ethiopia

Authors: Kibret Berhanu Getinet

Abstract:

Off-farm income diversification by rural farm households has gained the attention of researchers and policymakers because agriculture has failed to meet the needs of people in developing countries like Ethiopia. The objective of this study was to investigate income diversification strategies into off-farm activities among rural households in Hawassa Zuria Woreda, Sidama National Regional State, Ethiopia. The study used primary and secondary data sources, with a questionnaire employed as the instrument for primary data collection. A multistage sampling technique was used to collect data from a total of 197 sample households in four kebeles of the study area. Descriptive statistics as well as econometric methods of data analysis were employed. The descriptive results indicate that the majority of sample rural households (68.53%) engaged in off-farm income diversification activities, while the remaining 31.47% did not participate in diversification. Regarding the choice among strategies, 6.60% of respondents participated in off-farm wage employment, 30.46% in off-farm self-employment, and about 31.47% in both off-farm wage employment and self-employment. The study revealed that the share of off-farm income in total annual household earnings was about 48.457%; thus, off-farm diversification contributes significantly to rural household income. Moreover, binary and multinomial logistic regression models were employed to identify the factors that affect participation in, and the choice of, off-farm income diversification strategies, respectively.
The binary logit model results indicated that agro-ecological zone, education status of the household, available technical skills of the household, household saving, total livestock owned by the household, access to electricity, road access and the household head being married significantly and positively affected the likelihood of diversifying into off-farm activities, while the household's on-farm income negatively affected it. Similarly, the multinomial logistic regression estimates revealed that agro-ecological zone, on-farm income, available technical skills, household savings, and access to electricity were positively related to, and significantly influenced, the household's choice of off-farm wage employment. The choice of off-farm self-employment was significantly influenced by on-farm income, available technical skills, household savings, total livestock owned, and access to electricity. Moreover, the results showed that the factors affecting farm households' choice to engage in both off-farm wage employment and self-employment are ecological zone, education status, on-farm income, available technical skills, household savings, market access, total livestock owned, access to electricity and road access. Thus, due attention should be given to addressing demographic, socio-economic, and institutional constraints in order to strengthen off-farm income diversification strategies and improve the income of rural households.

Keywords: off-farm, income, diversification, logit model
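In a multinomial logit, each non-reference outcome j has its own coefficient vector β_j, and P(Y = j | x) = exp(x·β_j) / (1 + Σ_k exp(x·β_k)), with the reference category's linear predictor fixed at 0. The sketch below computes these probabilities for the four outcomes described above, taking non-participation ("none") as the reference; the covariates and coefficient values are hypothetical and are not the study's estimates.

```python
import math


def multinomial_logit_probs(x, coefs, reference="none"):
    """Category probabilities for a multinomial logit with a reference category.

    x: covariate values, first entry 1.0 for the intercept.
    coefs: dict mapping each non-reference category to its coefficient list.
    """
    scores = {cat: sum(b * v for b, v in zip(beta, x)) for cat, beta in coefs.items()}
    scores[reference] = 0.0  # reference category's linear predictor is fixed at zero
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {cat: math.exp(s - top) for cat, s in scores.items()}
    total = sum(exps.values())
    return {cat: e / total for cat, e in exps.items()}


# Hypothetical coefficients: [intercept, has_savings (0/1), has_electricity (0/1)]
coefs = {
    "wage": [-1.5, 0.6, 0.8],
    "self": [-0.5, 0.4, 0.7],
    "both": [-1.0, 0.9, 1.1],
}
probs = multinomial_logit_probs([1.0, 1.0, 1.0], coefs)
print({cat: round(p, 3) for cat, p in probs.items()})
```

The ratio P(j) / P(reference) equals exp(x·β_j), which is why multinomial logit coefficients are read as log relative-risk ratios against the base category rather than as effects on each probability directly.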

3389 Minimizing the Impact of Covariate Detection Limit in Logistic Regression

Authors: Shahadut Hossain, Jacek Wesolowski, Zahirul Hoque

Abstract:

In many epidemiological and environmental studies, covariate measurements are subject to a detection limit. In most applications, covariate measurements are truncated from below, which is known as left-truncation, because the device used to measure the covariate fails to detect values falling below a certain threshold. In regression analyses, this inflates the bias of the estimators and makes their mean squared error (MSE) inaccurate. This paper suggests a response-based regression calibration method to correct the deleterious impact of the covariate detection limit on the estimators of the parameters of a simple logistic regression model. Compared to the maximum likelihood method, the proposed method is computationally simpler and hence easier to implement. It is robust to violations of the distributional assumption about the covariate of interest. The performance of the proposed method in producing correct inference, compared with other competing methods, has been investigated through extensive simulations. A real-life application of the method is also shown using data from a population-based case-control study of non-Hodgkin lymphoma.

Keywords: environmental exposure, detection limit, left truncation, bias, ad-hoc substitution

3388 Measurement Errors and Misclassifications in Covariates in Logistic Regression: Bayesian Adjustment of Main and Interaction Effects and the Sample Size Implications

Authors: Shahadut Hossain

Abstract:

Measurement errors in continuous covariates and/or misclassifications in categorical covariates are common in epidemiological studies. Regression analysis that ignores such mismeasurements seriously biases the estimated main and interaction effects of covariates on the outcome of interest. Thus, adjustments for such mismeasurements are necessary. In this research, we propose a Bayesian parametric framework for eliminating the deleterious impacts of covariate mismeasurements in logistic regression. The proposed adjustment method is unified and thus can be applied to any generalized linear or non-linear regression model. Furthermore, adjustment for covariate mismeasurements requires validation data, usually in the form of either gold-standard measurements or replicates of the mismeasured covariates on a subset of the study population. Initial investigation shows that the adequacy of such adjustment depends on the sizes of the main and validation samples, especially when the prevalences of the categorical covariates are low. Thus, we investigate the impact of main and validation sample sizes on the adjusted estimates and provide general guidelines on these sample sizes based on simulation studies.

Keywords: measurement errors, misclassification, mismeasurement, validation sample, Bayesian adjustment

3387 Poverty Dynamics in Thailand: Evidence from Household Panel Data

Authors: Nattabhorn Leamcharaskul

Abstract:

This study aims to examine the determinants of poverty dynamics in Thailand using panel data on 3,567 households in 2007-2017. Four estimation techniques are employed to analyze poverty across households and time periods: the multinomial logit model, the sequential logit model, the quantile regression model, and the difference-in-difference model. Households are categorized according to their experiences into 5 groups, namely chronically poor, falling into poverty, re-entering poverty, exiting from poverty and never poor households. Estimation results emphasize the effects of demographic and socioeconomic factors, as well as unexpected events, on the economic status of a household. It is found that remittances have a positive impact on a household's economic status, in that they are likely to lower the probability of falling into or being trapped in poverty, while they tend to increase the probability of exiting from poverty. In addition, receiving a secondary source of household income not only raises the probability of being a never-poor household but also significantly increases household income per capita for chronically poor households and those falling into poverty. Public work programs are recommended as an important tool to relieve household financial burden and uncertainty and thereby increase households' chances of escaping poverty.

Keywords: difference in difference, dynamic, multinomial logit model, panel data, poverty, quantile regression, remittance, sequential logit model, Thailand, transfer
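The difference-in-difference estimator compares the change in an outcome for a treated group with the change for an untreated group over the same period: DiD = (mean treated post - mean treated pre) - (mean control post - mean control pre). The sketch below computes this two-group, two-period estimate; the "treated" group could stand for, say, households receiving a transfer, but the income figures are hypothetical and not taken from the study.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-difference estimate from outcome lists."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change  # change beyond the common trend


# Hypothetical household income per capita (arbitrary units)
effect = diff_in_diff(
    treated_pre=[100, 110, 90], treated_post=[130, 140, 120],
    control_pre=[105, 95, 100], control_post=[115, 105, 110],
)
print(effect)  # -> 20
```

The same number is the coefficient on the group × period interaction in an OLS regression of the outcome on group, period and their interaction; the regression form is what allows covariates and standard errors to be added.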

3386 Influence of HIV Testing on Knowledge of HIV/AIDS Prevention Practices and Transmission among Undergraduate Youths in North-West University, Mafikeng

Authors: Paul Bigala, Samuel Oladipo, Steven Adebowale

Abstract:

This study examines factors influencing knowledge of HIV/AIDS Prevention Practices and Transmission (KHAPPT) among young undergraduate students (15-24 years). A knowledge composite index was computed for 820 randomly selected students. Chi-square, ANOVA, and multinomial logistic regression were used for the analyses (α = .05). The overall mean knowledge score was 16.5 ± 3.4 out of a possible 28. About 83% of the students had undergone an HIV test, 21.0% had high KHAPPT, 18% said there is a cure for the disease, 23% believed that asking for a condom is embarrassing, and 11.7% said it is safe to share unsterilized sharp objects with friends or family members. The likelihood of high KHAPPT was higher among students who had had an HIV test (OR = 3.314; CI = 1.787-6.145, p < 0.001), even after controlling for other variables. The identified predictors of high KHAPPT were ever having had an HIV test, faculty, and ever having used any HIV/AIDS prevention services. North-West University Mafikeng should intensify its HIV/AIDS awareness program on campus.

Keywords: HIV/AIDS knowledge, undergraduate students, HIV testing, Mafikeng
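A logistic regression coefficient β converts to an odds ratio OR = exp(β), and a Wald interval is exp(β ± z·SE). Working backwards from the reported OR = 3.314 and CI 1.787-6.145, and assuming this is a 95% Wald interval (the abstract does not state the level or method), recovers β ≈ 1.198 and SE ≈ 0.315; the OR is then the geometric mean of the interval limits. The helper names below are illustrative.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile (assumed level)


def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio and Wald confidence limits from a logit coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def beta_se_from_ci(lower, upper, z=Z_95):
    """Log-odds coefficient and SE implied by a reported Wald CI for an odds ratio."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    return (log_lo + log_hi) / 2, (log_hi - log_lo) / (2 * z)


beta, se = beta_se_from_ci(1.787, 6.145)
print(round(beta, 3), round(se, 3))
print([round(v, 3) for v in odds_ratio_ci(beta, se)])
```

Recovering β and SE this way is useful when only published odds ratios are available, for instance to check that a reported interval is symmetric on the log scale, as a Wald interval should be.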

3385 Factors for Entry Timing Choices Using Principal Axis Factorial Analysis and Logistic Regression Model

Authors: C. M. Mat Isa, H. Mohd Saman, S. R. Mohd Nasir, A. Jaapar

Abstract:

International market expansion involves a strategic market entry decision process through which a firm expands its operations from the domestic to the international domain. Hence, entry timing choices require balancing the risks of early entry against the opportunities lost through late entry into a new market. Questionnaire surveys administered to 115 Malaysian construction firms operating in 51 countries worldwide yielded a 39.1 percent response rate. Factor analysis was used to determine the most significant factors affecting the firms' entry timing choices when penetrating the international market. A logistic regression analysis of the firms' entry timing choices indicates that the model correctly classified 89.5 percent of cases as late movers. The findings reveal that the most significant factor influencing the construction firms' choices as late movers was the firm factor, related to the firm's international experience, resources, competencies and financing capacity. The study also offers valuable information to construction firms intending to internationalize their businesses.

Keywords: factors, early movers, entry timing choices, late movers, logistic regression model, principal axis factorial analysis, Malaysian construction firms

3384 Applying the Regression Technique for Prediction of the Acute Heart Attack

Authors: Paria Soleimani, Arezoo Neshati

Abstract:

Myocardial infarction is one of the leading causes of death in the world. Some of these deaths occur even before the patient reaches the hospital. Myocardial infarction occurs as a result of impaired blood supply. Because most of these deaths are due to coronary artery disease, awareness of the warning signs of a heart attack is essential. Some heart attacks are sudden and intense, but most start slowly, with mild pain or discomfort, so early detection and successful treatment of these symptoms is vital to saving patients. Therefore, a system designed to assist physicians in the early diagnosis of acute heart attacks is clearly important and useful. The purpose of this study is to determine how well a predictive model would perform based only on patient-reportable clinical history factors, without using diagnostic tests or physical exams. This type of prediction model might have applications outside the hospital setting, giving accurate advice to patients to encourage them to seek care in appropriate situations. For this purpose, data were collected on 711 heart patients in Iranian hospitals, and 28 clinical attributes that can be reported by patients were studied. Three logistic regression models were built on the basis of the 28 features to predict the risk of heart attacks. The best logistic regression model in terms of performance had a C-index of 0.955 and an accuracy of 94.9%. The variables severe chest pain, back pain, cold sweats, shortness of breath, nausea, and vomiting were selected as the main features.

Keywords: Coronary heart disease, Acute heart attacks, Prediction, Logistic regression
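For a binary outcome, the C-index (concordance statistic) equals the area under the ROC curve: the probability that a randomly chosen patient who had a heart attack receives a higher predicted risk than a randomly chosen patient who did not, with ties counted as one half. The sketch below computes it directly from all case/non-case pairs; the scores and labels are made up for illustration.

```python
def c_index(scores, labels):
    """Concordance for binary labels (= ROC AUC): share of case/non-case pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0  # case ranked above non-case
            elif p == n:
                concordant += 0.5  # tie counts as half
    return concordant / (len(pos) * len(neg))


# Hypothetical predicted risks for 3 heart-attack patients (1) and 2 others (0)
print(c_index([0.9, 0.8, 0.35, 0.4, 0.2], [1, 1, 1, 0, 0]))
```

Here 5 of the 6 case/non-case pairs are ordered correctly, giving 5/6. The pairwise loop is O(n_pos × n_neg), which is fine for a few hundred patients; a rank-based formula scales better for large datasets.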

3383 Developing an Advanced Algorithm Capable of Classifying News, Articles and Other Textual Documents Using Text Mining Techniques

Authors: R. B. Knudsen, O. T. Rasmussen, R. A. Alphinas

Abstract:

This research aims to develop an algorithm capable of classifying news articles from the automobile industry according to the competitive actions they entail, using Text Mining (TM) methods. Proper preprocessing of the data is tested by preparing the pipeline that best fits each algorithm. The pipelines are tested along with nine different classification algorithms in the realm of regression, support vector machines, and neural networks. Preliminary testing to identify the optimal pipelines and algorithms resulted in the selection of two algorithms with two different pipelines. The two algorithms are Logistic Regression (LR) and Artificial Neural Network (ANN). These algorithms are optimized further by testing several parameters of each. The best result is achieved with the ANN. The final model yields an accuracy of 0.79, a precision of 0.80, a recall of 0.78, and an F1 score of 0.76. By removing three classes that created noise, the final algorithm reaches an accuracy of 94%.

Keywords: Artificial Neural network, Competitive dynamics, Logistic Regression, Text classification, Text mining

3382 The Impact of International Financial Reporting Standards (IFRS) Adoption on Performance’s Measure: A Study of UK Companies

Authors: Javad Izadi, Sahar Majioud

Abstract:

This study presents an approach to assessing the choice of performance measures by companies in the United Kingdom after the application of IFRS in 2005. The aim of this study is to investigate the effects of IFRS on the choice of performance evaluation methods by UK companies. Using an econometric model, we analyse the relationship between the dependent variable, the firm's performance, which is a nominal variable, and the independent variables. The independent variables are split into two main groups. The first is the group of accounting-based measures: earnings per share, return on assets and return on equity. The second is the group of market-based measures: market value of property, plant and equipment, research and development, sales growth, market-to-book value, leverage, segment and size of companies. The regression used is a multinomial logistic regression performed on a sample of 130 UK listed companies. Our findings show that after IFRS adoption, companies give more importance to some variables, such as return on equity and sales growth, in assessing their performance, whereas return on assets and the market-to-book value ratio no longer carry as much importance as before IFRS in evaluating company performance. Also, some variables, such as earnings per share, no longer have any impact on the performance measures. This finding is empirically important for businesses in matters related to IFRS and companies' performance measurement.

Keywords: performance’s Measure, nominal variable, econometric model, evaluation methods

3381 Logistic Regression Based Model for Predicting Students’ Academic Performance in Higher Institutions

Authors: Emmanuel Osaze Oshoiribhor, Adetokunbo MacGregor John-Otumu

Abstract:

In recent years, there has been a desire to forecast student academic achievement before graduation, in order to help students improve their grades, particularly those with poor performance. The goal of this study is to employ supervised learning techniques to construct a predictive model for student academic achievement. Many researchers have already constructed models that predict student academic achievement based on factors such as smoking, demography, culture, social media, parents' educational background, parents' finances, and family background, to name a few. These features and the models employed may not have correctly classified students in terms of their academic performance. This model is built using a logistic regression classifier with basic features, such as the previous semester's course score, class attendance, class participation, and the total number of course materials or resources the student is able to cover per semester, to predict whether the student will perform well in future related courses. The model outperformed other classifiers such as Naive Bayes, Support Vector Machine (SVM), Decision Tree, Random Forest, and AdaBoost, achieving 96.7% accuracy. The model is available as a desktop application with user-friendly interfaces, allowing both instructors and students to predict student academic achievement. It is therefore recommended that both students and instructors use this tool to better forecast outcomes.

Keywords: artificial intelligence, ML, logistic regression, performance, prediction

3380 Binary Logistic Regression Model in Predicting the Employability of Senior High School Graduates

Authors: Cromwell F. Gopo, Joy L. Picar

Abstract:

This study aimed to predict the employability of senior high school graduates for S.Y. 2018-2019 in the Davao del Norte Division through a quantitative research design using descriptive and predictive approaches on the indicated parameters, namely gender, school type, academics, academic award recipient, skills, values, and strand. The study covered 33 secondary schools offering senior high school programs, identified through simple random sampling, which yielded 1,530 cases of graduates' secondary data; these were analyzed using frequency, percentage, mean, standard deviation, and binary logistic regression. Results showed that the majority of the senior high school graduates who came from large schools were female. Further, less than half of these graduates received an academic award in any semester. In general, the graduates' performance in academics, skills, and values was proficient. Moreover, less than half of the graduates were not employed. Those who were employed were contractual, casual, or part-time workers, predominantly GAS graduates. Further, the predictors of employability were gender and the Information and Communications Technology (ICT) strand, while the remaining variables did not contribute significantly to the model. The null hypothesis was rejected, as the coefficients of the predictors in the binary logistic regression equation were not equal to 0. Using the model, it was concluded that Technical-Vocational-Livelihood (TVL) graduates, except those in ICT, had greater estimated employability.

Keywords: employability, senior high school graduates, Davao del Norte, Philippines
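One common way to test H0: β = 0 for an individual coefficient in a binary logistic regression is the Wald test, z = β / SE, compared against a standard normal distribution (the abstract does not say which test was used; likelihood-ratio tests are another option). The sketch computes z and the two-sided p-value with `math.erfc`, using 2(1 - Φ(|z|)) = erfc(|z| / √2); the coefficient and standard error in the example are hypothetical, not the study's estimates.

```python
import math


def wald_test(beta, se):
    """Wald z statistic and two-sided p-value for H0: beta = 0."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical coefficient and standard error for a gender indicator
z, p = wald_test(0.62, 0.21)
print(round(z, 3), round(p, 4))
```

A predictor is retained as significant at the 5% level when p < 0.05, that is when |z| exceeds about 1.96; predictors with larger p-values are the ones described as not adding significantly to the model.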

3379 Determinants of Poverty: A Logit Regression Analysis of Zakat Applicants

Authors: Zunaidah Ab Hasan, Azhana Othman, Abd Halim Mohd Noor, Nor Shahrina Mohd Rafien

Abstract:

Zakat is a portion of wealth contributed by financially able Muslims to be distributed to predetermined recipients, chief among them the poor and the needy. The zakat fund is distributed with the objective of lifting recipients out of poverty. Due to the multidimensional and multifaceted nature of poverty, it is imperative that the causes of poverty be properly identified so that the assistance given by zakat authorities reaches the intended target. Despite various studies undertaken to identify the poor correctly, there are reports of the poor not receiving the adequate assistance they require from zakat. Thus, this study examines the determinants of poverty among applicants for zakat assistance distributed by the State Islamic Religious Council in Malacca (SIRCM). Malacca is a state in Malaysia. Respondents were drawn from the list of new zakat applicants for April and May 2014 provided by SIRCM. A binary logistic regression was estimated on these data, with the acceptance or rejection of the zakat application as the dependent variable and a set of demographic variables and health as the explanatory variables. Overall, the logistic model successfully predicted the factors behind the acceptance of zakat applications. The independent variables gender, age, household size and health significantly explain the likelihood of a successful zakat application. Among other things, the findings suggest the importance of focusing on providing educational opportunities to help the poor.

Keywords: logistic regression, zakat distribution, status of zakat applications, poverty, education

Procedia PDF Downloads 336
3378 Efficient Credit Card Fraud Detection Based on Multiple ML Algorithms

Authors: Neha Ahirwar

Abstract:

In the contemporary digital era, the rise of credit card fraud poses a significant threat to both financial institutions and consumers. As fraudulent activities become more sophisticated, there is an escalating demand for robust and effective fraud detection mechanisms. Advanced machine learning algorithms have become crucial tools in addressing this challenge. This paper conducts a thorough examination of the design and evaluation of a credit card fraud detection system, utilizing four prominent machine learning algorithms: random forest, logistic regression, decision tree, and XGBoost. The surge in digital transactions has opened avenues for fraudsters to exploit vulnerabilities within payment systems. Consequently, there is an urgent need for proactive and adaptable fraud detection systems. This study addresses this imperative by exploring the efficacy of machine learning algorithms in identifying fraudulent credit card transactions. The selection of random forest, logistic regression, decision tree, and XGBoost for scrutiny in this study is based on their documented effectiveness in diverse domains, particularly in credit card fraud detection. These algorithms are renowned for their capability to model intricate patterns and provide accurate predictions. Each algorithm is implemented and evaluated for its performance in a controlled environment, utilizing a diverse dataset comprising both genuine and fraudulent credit card transactions.

Keywords: efficient credit card fraud detection, random forest, logistic regression, XGBoost, decision tree

Procedia PDF Downloads 66
3377 An Analysis of Classification of Imbalanced Datasets by Using Synthetic Minority Over-Sampling Technique

Authors: Ghada A. Alfattni

Abstract:

Analysing imbalanced datasets is one of the challenges that practitioners in the machine learning field face. Many studies have been carried out to determine the effectiveness of the synthetic minority over-sampling technique (SMOTE) in addressing this issue. The aim of this study was therefore to compare the effectiveness of SMOTE across different models on imbalanced datasets. Three classification models (Logistic Regression, Support Vector Machine and Nearest Neighbour) were tested on multiple datasets; the same datasets were then oversampled using SMOTE and applied again to the three models to compare the differences in performance. The experimental results show that the highest number of nearest neighbours gives lower error rates.
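The oversampling step can be sketched as follows. This is a minimal SMOTE-style generator in plain Python, not the implementation used in the study: each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority-class neighbours. Here `k` is SMOTE's own neighbour count (5 is a common default, e.g. in imbalanced-learn), not the parameter of the Nearest Neighbour classifier; the toy points are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling: for a randomly chosen minority sample, pick one of
    its k nearest minority-class neighbours and create a new point at a random
    position on the line segment between the two."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples (Euclidean)
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda pt: math.dist(base, pt))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class samples with two features
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_synthetic=4, k=2)
balanced_minority = minority + new_points
print(new_points)
```

Because every new point lies between two real minority samples, oversampling stays inside the region the minority class already occupies rather than duplicating points exactly.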

Keywords: imbalanced datasets, SMOTE, machine learning, logistic regression, support vector machine, nearest neighbour

Procedia PDF Downloads 350
3376 Comparing Performance Indicators among Mechanistic, Organic, and Bureaucratic Organizations

Authors: Benchamat Laksaniyanon, Padcharee Phasuk, Rungtawan Boonphanakan

Abstract:

With globalization, organizations have had to adjust to an unstable environment in order to survive in a competitive arena. Within the field of management, organizations are typically classified as mechanistic, bureaucratic or organic. Bureaucratic and mechanistic organizations have some characteristics in common. Bureaucracy is a type of Thai organization adapted from the mechanistic concept to suit the characteristics and culture of Thailand. The objective of this study is to compare the adjustment strategies of both types of organization in order to find key performance indicators (KPI) suitable for improving organizations in Thailand. The methodology employed is binary logistic regression. The results of this study will be valuable for developing future management strategies for both bureaucratic and mechanistic organizations.

Keywords: mechanistic, bureaucratic and organic organization, binary logistic regression, key performance indicators (KPI)

Procedia PDF Downloads 359
3375 Can Empowering Women Farmers Reduce Household Food Insecurity? Evidence from Malawi

Authors: Christopher Manyamba

Abstract:

Women in Malawi perform between 50 and 70 percent of all agricultural tasks, and yet the majority remain food insecure. The aim of this paper is to build on existing mixed evidence indicating that empowering women in agriculture is conducive to improving food security. The Women's Empowerment in Agriculture Index (WEAI) is used to provide evidence on the relationship between women's empowerment in agriculture and household food security. A multinomial logistic regression is applied to the WEAI components and the Household Hunger Scale. The overall results show that the WEAI can be used to determine household food insecurity; however, it has to be adapted to the local context. Asset ownership, credit, group membership and leisure time are positively associated with food security. Contrary to other literature, empowerment in control over and decisions on income shows a negative association with household food security. These results could help inform dialogue among public, private and civil society stakeholders on creating the most effective and sustainable interventions to help women attain long-term food security.
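A multinomial logistic regression predicts one probability per outcome category by taking the softmax of category-specific linear scores, with one reference category fixed at score 0. The minimal sketch below shows that prediction step only. The three-category coding of the Household Hunger Scale used here (little to no, moderate, severe hunger), the predictor names, and all coefficients are assumptions for illustration, not the paper's estimates.

```python
import math


def multinomial_logit_probs(x, coefs, reference):
    """Multinomial logistic regression prediction. Each non-reference category k
    has a linear score b0_k + sum_j b_kj * x_j; the reference category's score is
    fixed at 0. Probabilities are the softmax of the scores."""
    scores = {reference: 0.0}
    for category, (intercept, slopes) in coefs.items():
        scores[category] = intercept + sum(b * xj for b, xj in zip(slopes, x))
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {cat: math.exp(s - top) for cat, s in scores.items()}
    total = sum(exps.values())
    return {cat: e / total for cat, e in exps.items()}


# Hypothetical coefficients, not the paper's estimates.
# Predictors (0/1): [asset_ownership, credit_access, group_membership]
coefs = {
    "moderate hunger": (-0.5, [-0.6, -0.3, -0.2]),
    "severe hunger": (-1.5, [-0.9, -0.5, -0.4]),
}
probs = multinomial_logit_probs([1, 0, 1], coefs, reference="little to no hunger")
for category, p in probs.items():
    print(f"{category}: {p:.3f}")
```

In this parameterization, exp(b_kj) is a relative risk ratio: the multiplicative change in P(category k) / P(reference) for a one-unit increase in predictor j, so a negative coefficient on a hunger category corresponds to a positive association with food security.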

Keywords: food security, gender, empowerment, agriculture index, framework for African food security, household hunger scale

Procedia PDF Downloads 368
3374 Business Constraints and Growth Potential of SMEs: Case Study of Electrical Industry in Pakistan

Authors: Muhammad Waseem Akram

Abstract:

The current study analyzes the impact of business constraints on the growth potential and performance of Small and Medium Enterprises (SMEs) in the electrical industry of Pakistan. The study uses primary data collected from the electrical industry cluster in Sargodha, Pakistan. OLS regression is used to assess the impact of business constraints on the performance of SMEs, controlling for Technology Level, Innovations, and Firm Size. To associate business constraints with the growth potential of SMEs, the study uses Tetrachoric Correlation and Logistic Regression. Findings reveal that all the business constraints except Political Instability negatively affect the performance of SMEs in the electrical industry. Results of the Tetrachoric Correlation show that all the business constraints are negatively correlated with the growth potential of SMEs. Logistic Regression results show that three business constraints, namely Energy Constraint, Inflation and Price Instability, and Bad Business Practices, reduce the probability of income growth in the sample SMEs.
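Tetrachoric correlation estimates the correlation of two latent normal variables from a 2x2 table of binary outcomes (here, constraint present or absent versus growth or no growth). A quick way to see the sign and rough size is the cosine-pi approximation r ≈ cos(π / (1 + sqrt(ad / bc))), sketched below. This is an approximation chosen for illustration, not the maximum-likelihood estimate a statistics package would report, and the cell counts are hypothetical.

```python
import math


def tetrachoric_approx(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation of a 2x2 table

                      growth   no growth
        constraint      a          b
        no constraint   c          d

    r ~= cos(pi / (1 + sqrt(a*d / (b*c)))). A quick approximation, not the
    maximum-likelihood estimate."""
    if a * d == 0 and b * c == 0:
        raise ValueError("table is degenerate")
    if b * c == 0:
        return 1.0  # limiting case: perfect positive association
    if a * d == 0:
        return -1.0  # limiting case: perfect negative association
    return math.cos(math.pi / (1.0 + math.sqrt(a * d / (b * c))))


# Hypothetical counts: firms facing the constraint report growth less often
print(tetrachoric_approx(a=12, b=38, c=30, d=20))
```

When a*d equals b*c (no association) the approximation returns 0; it is negative whenever firms facing the constraint are less likely to grow, matching the direction reported above.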

Keywords: SMEs, business constraints, performance, growth potential

Procedia PDF Downloads 169
3373 Paraoxonase 1 (PON 1) Arylesterase Activity and Apolipoprotein B: Predictors of Myocardial Infarction

Authors: Mukund Ramchandra Mogarekar, Pankaj Kumar, Shraddha Vilas More

Abstract:

Background: Myocardial infarction (MI) is defined as myocardial cell death due to prolonged ischemia as a consequence of atherosclerosis. Total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), very low-density lipoprotein cholesterol (VLDL-C), Apo B, and lipoprotein(a) were found to be atherogenic factors, while high-density lipoprotein cholesterol (HDL-C) was anti-atherogenic. Methods and Results: The study group consisted of 40 MI subjects and 40 healthy individuals as controls. PON 1 arylesterase (ARE) activity was measured using phenylacetate. Phenotyping was done by the double substrate method, serum advanced oxidation protein products (AOPP) were measured using chloramine T, and Apo B by turbidimetric immunoassay. PON 1 ARE activities were significantly lower (p < 0.05), and AOPP and Apo B were higher, in MI subjects (p > 0.05). The trimodal distribution of QQ, QR, and RR phenotypes in the study population showed no significant difference between cases and controls (p > 0.05). Univariate binary logistic regression analysis showed an independent association of TC, HDL, LDL, AOPP, Apo B, and PON 1 ARE activity with MI, and multiple forward binary logistic regression identified PON 1 ARE activity and serum Apo B as independent predictors of MI. Conclusions: Lower PON 1 ARE activity in MI subjects than in controls suggests increased oxidative stress in MI, which is reflected by significantly increased AOPP and Apo B. The QQ, QR and RR PON1 polymorphisms showed no significant difference in protection against MI. Univariate and multiple binary logistic regression showed PON1 ARE activity and serum Apo B to be independent predictors of MI.
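Forward selection starts from an intercept-only model and repeatedly adds the candidate predictor that most improves the fit. The abstract does not state the entry criterion; the minimal sketch below assumes AIC as a stand-in, whereas statistical packages often use score or likelihood-ratio tests with a p-value threshold. The data are synthetic and standardized, and the column names are hypothetical labels, not the study's measurements.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, cols, lr=1.0, epochs=200):
    """Fit a logistic regression on the feature indices in `cols` by batch
    gradient ascent. Returns (weights, log-likelihood at the fitted weights)."""
    n = len(X)
    w = [0.0] * (len(cols) + 1)

    def linear(row):
        return w[0] + sum(w[i + 1] * row[c] for i, c in enumerate(cols))

    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            err = target - sigmoid(linear(row))
            grad[0] += err
            for i, c in enumerate(cols):
                grad[i + 1] += err * row[c]
        for j in range(len(w)):
            w[j] += lr * grad[j] / n  # in-place update, so `linear` sees it

    log_lik = 0.0
    for row, target in zip(X, y):
        p = min(max(sigmoid(linear(row)), 1e-12), 1.0 - 1e-12)
        log_lik += math.log(p) if target == 1 else math.log(1.0 - p)
    return w, log_lik


def forward_select(X, y, names):
    """Greedy forward selection: at each step add the predictor that lowers
    AIC = 2 * (number of parameters) - 2 * log-likelihood the most; stop when
    no remaining predictor lowers it."""
    selected = []
    _, log_lik = fit_logistic(X, y, selected)
    best_aic = 2 * 1 - 2 * log_lik  # intercept-only model
    while len(selected) < len(names):
        best_col = None
        for c in range(len(names)):
            if c in selected:
                continue
            _, log_lik = fit_logistic(X, y, selected + [c])
            aic = 2 * (len(selected) + 2) - 2 * log_lik
            if aic < best_aic:
                best_aic, best_col = aic, c
        if best_col is None:
            break
        selected.append(best_col)
    return [names[c] for c in selected]


# Synthetic standardized data (hypothetical): two real signals plus one noise column
rng = random.Random(1)
names = ["pon1_are", "apo_b", "noise"]
X, y = [], []
for _ in range(200):
    row = [rng.gauss(0.0, 1.0) for _ in names]
    prob = sigmoid(-1.5 * row[0] + 1.2 * row[1])
    X.append(row)
    y.append(1 if rng.random() < prob else 0)

chosen = forward_select(X, y, names)
print("selected predictors:", chosen)
```

A purely random column can occasionally pass an AIC-based entry rule, which is one reason packages commonly pair forward entry with an explicit significance threshold.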

Keywords: advanced oxidation protein product, apolipoprotein B, PON 1 arylesterase activity, myocardial infarction

Procedia PDF Downloads 265
3372 Monocytic Paraoxonase 2 (PON 2) Lactonase Activity Is Related to Myocardial Infarction

Authors: Mukund Ramchandra Mogarekar, Pankaj Kumar, Shraddha V. More

Abstract:

Background: Total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), very low-density lipoprotein cholesterol (VLDL-C), Apo B, and lipoprotein(a) were found to be atherogenic factors, while high-density lipoprotein cholesterol (HDL-C) was anti-atherogenic. Methods and Results: The study group consisted of 40 myocardial infarction (MI) subjects as cases and 40 healthy individuals as controls. Monocytic PON 2 lactonase (LACT) activity was measured using dihydrocoumarin (DHC) as substrate. Phenotyping was done by the method of Mogarekar MR et al., serum advanced oxidation protein products (AOPP) by the modified method of Witko-Sarsat V et al., and Apo B by turbidimetric immunoassay. PON 2 LACT activities were significantly lower (p < 0.05), and AOPP and Apo B were higher, in MI subjects (p > 0.05). The trimodal distribution of QQ, QR and RR phenotypes in the study population showed no significant difference between cases and controls (p > 0.05). Univariate binary logistic regression analysis showed an independent association of TC, HDL, LDL, AOPP, Apo B, and PON 2 LACT activity with MI, and multiple forward binary logistic regression identified PON 2 LACT activity and serum Apo B as independent predictors of MI. Conclusions: Lower PON 2 LACT activity in MI subjects than in controls suggests increased oxidative stress in MI, which is reflected by significantly increased AOPP and Apo B. The QQ, QR and RR PON 1 polymorphisms showed no significant difference in protection against MI. Univariate and multiple forward binary logistic regression showed PON 2 LACT activity and serum Apo B to be independent predictors of MI.

Keywords: advanced oxidation protein products, apolipoprotein-B, myocardial infarction, paraoxonase 2 lactonase

Procedia PDF Downloads 237
3371 Prediction of Coronary Artery Stenosis Severity Based on Machine Learning Algorithms

Authors: Yu-Jia Jian, Emily Chia-Yu Su, Hui-Ling Hsu, Jian-Jhih Chen

Abstract:

The coronary arteries are the major suppliers of myocardial blood flow. When fat and cholesterol are deposited in the coronary arterial wall, narrowing and stenosis of the artery occur, which may lead to myocardial ischemia and eventually infarction. According to the World Health Organization (WHO), an estimated 7.4 million people died of coronary heart disease in 2015. According to statistics from the Ministry of Health and Welfare in Taiwan, heart disease (except for hypertensive diseases) ranked second among the top 10 causes of death from 2013 to 2016, and it still shows a growing trend. According to the American Heart Association (AHA), the risk factors for coronary heart disease include age (> 65 years), sex (men to women at a 2:1 ratio), obesity, diabetes, hypertension, hyperlipidemia, smoking, family history, lack of exercise and more. We collected a dataset of 421 patients who received coronary computed tomography (CT) angiography at a hospital in northern Taiwan. There were 300 males (71.26%) and 121 females (28.74%), with ages ranging from 24 to 92 years and a mean age of 56.3 years. Prior to coronary CT angiography, basic patient data, including age, gender, body mass index (BMI), diastolic blood pressure, systolic blood pressure, diabetes, hypertension, hyperlipidemia, smoking, family history of coronary heart disease and exercise habits, were collected and used as input variables. The output variable of the prediction module is the degree of coronary artery stenosis. The dataset was randomly divided into a training set (80%) and a test set (20%). Four machine learning algorithms, namely logistic regression, stepwise regression, neural network and decision tree, were used to generate predictions. We compared the four models using area under the curve (AUC) and accuracy (Acc.): the best model was the neural network, followed by stepwise logistic regression, decision tree, and logistic regression, with AUC / Acc. of 0.68 / 79%, 0.68 / 74%, 0.65 / 78%, and 0.65 / 74%, respectively. The neural network had a sensitivity of 27.3% and a specificity of 90.8%; stepwise logistic regression, 18.2% and 92.3%; the decision tree, 13.6% and 100%; and logistic regression, 27.3% and 89.2%. In future work, we hope to improve accuracy by tuning the model parameters or by other methods, and to address the low sensitivity by adjusting the imbalanced proportion of positive and negative data.
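The gap between high specificity and low sensitivity reported above is typical of imbalanced test sets, where a classifier can score well on accuracy while missing most positives. The minimal sketch below computes sensitivity, specificity, accuracy and a rank-based AUC, and shows how lowering the decision threshold trades specificity for sensitivity. The labels, scores and thresholds are hypothetical, not the study's data.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for 0/1 labels (1 = positive,
    e.g. significant stenosis). Assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # share of positives detected
        "specificity": tn / (tn + fp),  # share of negatives correctly rejected
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive scores higher than a random
    negative (ties count 1/2), i.e. the Mann-Whitney formulation."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def predict_with_threshold(scores, threshold=0.5):
    """Turn predicted probabilities into 0/1 labels."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical imbalanced test set: 2 positives, 8 negatives
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.45, 0.30, 0.35, 0.20, 0.10, 0.15, 0.05, 0.25, 0.40, 0.12]

print(confusion_metrics(y_true, predict_with_threshold(scores, 0.5)))
print(confusion_metrics(y_true, predict_with_threshold(scores, 0.3)))
print("AUC:", auc(y_true, scores))
```

With these toy values both thresholds give 80% accuracy, yet sensitivity rises from 0% to 100% when the threshold drops, which is why AUC, sensitivity and specificity are reported alongside accuracy.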

Keywords: decision support, computed tomography, coronary artery, machine learning

Procedia PDF Downloads 229
3370 Using Machine-Learning Methods for Allergen Amino Acid Sequence's Permutations

Authors: Kuei-Ling Sun, Emily Chia-Yu Su

Abstract:

Allergy is a hypersensitive overreaction of the immune system to environmental stimuli and a major health problem. These overreactions include rashes, sneezing, fever, food allergies, anaphylaxis, asthma, shock, and other abnormal conditions. Allergies can be caused by food, insect stings, pollen, animal hair, and other allergens. The development of allergies is due to both genetic and environmental factors. Allergies involve immunoglobulin E (IgE) antibodies, part of the body's immune system. IgE antibodies bind to an allergen and then to a receptor on mast cells or basophils, triggering the release of inflammatory chemicals such as histamine. Motivated by the increasingly serious problems of environmental change, lifestyle changes, air pollution, and other factors, in this study we collect allergens and non-allergens from several databases and use several machine learning classification methods, including logistic regression (LR), stepwise regression, decision tree (DT) and neural networks (NN), to compare the models and determine the permutations of allergen amino acid sequences.

Keywords: allergy, classification, decision tree, logistic regression, machine learning

Procedia PDF Downloads 303
3369 Multinomial Dirichlet Gaussian Process Model for Classification of Multidimensional Data

Authors: Wanhyun Cho, Soonja Kang, Sanggoon Kim, Soonyoung Park

Abstract:

We present a probabilistic multinomial Dirichlet classification model with Gaussian process priors for multidimensional data. We consider an efficient computational method for obtaining the approximate posteriors of the latent variables and parameters needed to define the multiclass Gaussian process classification model. We first derive the posterior distributions of the various parameters and the latent function using variational Bayesian approximations and the importance sampling method, and then derive the predictive distribution of the latent function needed to classify new samples. The proposed model is applied to a synthetic multivariate dataset to verify its performance. The experimental results show that our model is more accurate than the other approximation methods.
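The importance-sampling ingredient can be illustrated generically. The minimal sketch below is self-normalized importance sampling for a posterior known only up to a constant; it is not the authors' Gaussian process model. The toy target (the posterior of a success probability after 7 successes in 10 trials under a uniform prior) and the uniform proposal are assumptions chosen because the exact posterior mean, 8/12, is known for comparison.

```python
import math
import random


def self_normalized_is(log_target, sample_proposal, log_proposal, f, n=20000, seed=0):
    """Estimate E[f(theta)] under a target density known only up to a constant.
    Each proposal draw gets weight target/proposal; weights are normalized to sum to 1."""
    rng = random.Random(seed)
    draws = [sample_proposal(rng) for _ in range(n)]
    log_w = [log_target(t) - log_proposal(t) for t in draws]
    shift = max(log_w)  # subtract the max log-weight to avoid underflow
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(wi * f(t) for wi, t in zip(w, draws)) / sum(w)


successes, trials = 7, 10


def log_unnormalized_posterior(theta):
    # Bernoulli likelihood times a uniform prior, up to a constant
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)


estimate = self_normalized_is(
    log_target=log_unnormalized_posterior,
    sample_proposal=lambda rng: rng.uniform(1e-9, 1.0 - 1e-9),  # Uniform(0, 1) proposal
    log_proposal=lambda theta: 0.0,  # log density of Uniform(0, 1)
    f=lambda theta: theta,
)
print(estimate, (successes + 1) / (trials + 2))
```

In higher-dimensional latent spaces such as a Gaussian process, the proposal matters far more than in this one-dimensional toy: a poor proposal concentrates nearly all the weight on a few draws.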

Keywords: multinomial dirichlet classification model, Gaussian process priors, variational Bayesian approximation, importance sampling, approximate posterior distribution, marginal likelihood evidence

Procedia PDF Downloads 444
3368 Naïve Bayes: A Classical Approach for the Epileptic Seizures Recognition

Authors: Bhaveek Maini, Sanjay Dhanka, Surita Maini

Abstract:

Electroencephalography (EEG) is used worldwide to classify several types of epileptic seizures. Identifying an epileptic seizure through manual EEG analysis is a demanding task for neurologists, as it takes considerable effort and time. The risk of human error in EEG analysis is high, as acquiring signals requires manual intervention. Disease diagnosis using machine learning (ML) has been explored continuously since its inception, and where large datasets have to be analyzed, ML is a boon for doctors. In this research paper, the authors propose two ML models, logistic regression (LR) and Naïve Bayes (NB), to predict epileptic seizures based on general parameters. Both techniques are applied to the Epileptic Seizure Recognition dataset available in the UCI ML repository. The algorithms are implemented with an 80:20 train-test split (80% for training and 20% for testing), and model performance was validated by 10-fold cross-validation. The study reports accuracies of 81.87% for LR and 95.49% for NB.
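The abstract does not say which Naïve Bayes variant was used. Since EEG-derived features are continuous, the minimal sketch below assumes a Gaussian Naïve Bayes classifier in plain Python, with a small additive variance floor (`var_smoothing`, a hypothetical parameter here) to avoid division by zero. The two-feature toy data are illustrative, not taken from the UCI dataset.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate, per class, the prior and the per-feature mean and variance.
    Features are assumed conditionally independent and Gaussian given the class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_smoothing  # floor avoids zero variance
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior (up to a shared constant)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        total = math.log(prior)
        for x, m, v in zip(row, means, variances):
            total += -0.5 * math.log(2.0 * math.pi * v) - (x - m) ** 2 / (2.0 * v)
        return total
    return max(model, key=log_posterior)


# Toy two-feature data (hypothetical): class 1 = seizure, class 0 = no seizure
X = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [0.1, 0.0]), predict_gaussian_nb(model, [5.1, 5.0]))
```

Working with log probabilities keeps the per-feature product from underflowing when there are many features, as with high-dimensional EEG segments.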

Keywords: epileptic seizure recognition, logistic regression, Naïve Bayes, machine learning

Procedia PDF Downloads 61