Search results for: logistic regression models
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 9405

9015 Free Fatty Acid Assessment of Crude Palm Oil Using a Non-Destructive Approach

Authors: Siti Nurhidayah Naqiah Abdull Rani, Herlina Abdul Rahim, Rashidah Ghazali, Noramli Abdul Razak

Abstract:

Near infrared (NIR) spectroscopy has always been of great interest in the food and agriculture industries. The development of prediction models has facilitated the estimation process in recent years. In this study, 110 crude palm oil (CPO) samples were used to build a free fatty acid (FFA) prediction model. 60% of the collected data were used for training purposes and the remaining 40% used for testing. The visible peaks on the NIR spectrum were at 1725 nm and 1760 nm, indicating the existence of the first overtone of C-H bands. Principal component regression (PCR) was applied to the data in order to build this mathematical prediction model. The optimal number of principal components was 10. The results showed R2=0.7147 for the training set and R2=0.6404 for the testing set.

Keywords: palm oil, fatty acid, NIRS, regression
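
The evaluation protocol in the abstract above (a 60/40 train/test split and an R2 score per set) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the split was drawn or which R2 definition was used, so the shuffled split, the `seed` parameter, and the coefficient-of-determination formula are assumptions, and the PCR model itself is not reproduced.

```python
import random


def train_test_split(samples, train_fraction=0.6, seed=0):
    """Shuffle the samples and split them; 0.6 mirrors the 60/40 split in the abstract.

    The shuffling and the seed are assumptions, not details from the study.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# 110 sample indices, as in the abstract; the indices stand in for CPO samples.
train_ids, test_ids = train_test_split(range(110))
```

With 110 samples, this yields 66 training and 44 testing samples. Comparing `r_squared` on both sets is what exposes the gap between the reported 0.7147 (training) and 0.6404 (testing).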

Procedia PDF Downloads 506
9014 Weighted Rank Regression with Adaptive Penalty Function

Authors: Kang-Mo Jung

Abstract:

The use of regularization for statistical methods has become popular. The least absolute shrinkage and selection operator (LASSO) framework has become the standard tool for sparse regression. However, it is well known that the LASSO is sensitive to outliers or leverage points. We consider a new robust estimator composed of a weighted loss function of the pairwise differences of residuals and an adaptive penalty function that regulates the tuning parameter for each variable. Rank regression is resistant to regression outliers, but not to leverage points. By adopting a weighted loss function, the proposed method is also robust to leverage points in the predictor variables. Furthermore, the adaptive penalty function provides good statistical properties in variable selection, such as the oracle property and consistency. We develop an efficient algorithm to compute the proposed estimator using basic functions in R. We select the tuning parameter using the Bayesian information criterion (BIC). Numerical simulation shows that the proposed estimator is effective for analyzing real and contaminated data sets.

Keywords: adaptive penalty function, robust penalized regression, variable selection, weighted rank regression
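
A rough sketch of the kind of objective the abstract above describes: a weighted loss on pairwise differences of residuals plus an adaptive L1 penalty. The exact weights and penalty used by the author are not given in the abstract, so the product weights `w_i * w_j`, the penalty form `lam * |beta_j| / |beta_init_j|`, and all function names are assumptions for illustration. The author's implementation is in R; Python is used here only for the sketch.

```python
def residuals(X, y, beta):
    """Residuals y_i - x_i . beta (no intercept: pairwise differences cancel it)."""
    return [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]


def weighted_rank_loss(X, y, beta, weights):
    """Sum over pairs i < j of w_i * w_j * |e_i - e_j| (assumed weighting scheme)."""
    e = residuals(X, y, beta)
    n = len(e)
    return sum(
        weights[i] * weights[j] * abs(e[i] - e[j])
        for i in range(n)
        for j in range(i + 1, n)
    )


def adaptive_penalty(beta, beta_init, lam, eps=1e-8):
    """Adaptive L1 penalty: coefficients with a small initial estimate are penalized more."""
    return lam * sum(abs(b) / (abs(b0) + eps) for b, b0 in zip(beta, beta_init))


def objective(X, y, beta, weights, beta_init, lam):
    """Penalized objective that an optimizer would minimize over beta."""
    return weighted_rank_loss(X, y, beta, weights) + adaptive_penalty(beta, beta_init, lam)


# Toy data (hypothetical): the third observation is far from the others.
X = [[0.0], [1.0], [2.0]]
y = [0.0, 1.0, 5.0]
loss_equal = weighted_rank_loss(X, y, [1.0], [1.0, 1.0, 1.0])
loss_downweighted = weighted_rank_loss(X, y, [1.0], [1.0, 1.0, 0.5])
```

Halving the weight of the third observation halves its contribution to the loss, which is the mechanism by which down-weighting leverage points limits their influence.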

Procedia PDF Downloads 474
9013 An Application of Quantile Regression to Large-Scale Disaster Research

Authors: Katarzyna Wyka, Dana Sylvan, JoAnn Difede

Abstract:

Background and significance: Following a disaster, population-based screening programs are routinely established to assess the physical and psychological consequences of exposure. These data sets are highly skewed, as only a small percentage of trauma-exposed individuals develop health issues. Commonly used statistical methodology in post-disaster mental health generally involves population-averaged models. Such models aim to capture the overall response to the disaster and its aftermath; however, they may not be sensitive enough to accommodate population heterogeneity in symptomatology, such as post-traumatic stress or depressive symptoms. Methods: We use an archival longitudinal data set from the Weill-Cornell 9/11 Mental Health Screening Program, established following the World Trade Center (WTC) terrorist attacks in New York in 2001. Participants are rescue and recovery workers who participated in the site cleanup and restoration (n=2960). The main outcome is the post-traumatic stress disorder (PTSD) symptom severity score assessed via clinician interviews (CAPS). For a detailed understanding of the response to the disaster and its aftermath, we adapt quantile regression methodology with a particular focus on predictors of extreme distress and resilience to trauma. Results: The response variable was defined as the quantile of the CAPS score for each individual under two different scenarios specifying the unconditional quantiles based on: 1) clinically meaningful CAPS cutoff values and 2) the CAPS distribution in the population. We present graphical summaries of the differential effects. For instance, we found that WTC exposures, namely seeing bodies and feeling that life was in danger during rescue/recovery work, were associated with very high PTSD symptoms. A similar effect was apparent in individuals with prior psychiatric history. Differential effects were also present for the age and education level of the individuals. Conclusion: We evaluate the utility of quantile regression in disaster research in contrast to the commonly used population-averaged models. We focused on assessing the distribution of risk factors for post-traumatic stress symptoms across quantiles. This innovative approach provides a comprehensive understanding of the relationship between dependent and independent variables and could be used for developing tailored training programs and response plans for different vulnerability groups.

Keywords: disaster workers, post traumatic stress, PTSD, quantile regression

Procedia PDF Downloads 284
9012 Human Immunodeficiency Virus (HIV) Test Predictive Modeling and Identify Determinants of HIV Testing for People with Age above Fourteen Years in Ethiopia Using Data Mining Techniques: EDHS 2011

Authors: S. Abera, T. Gidey, W. Terefe

Abstract:

Introduction: Testing for HIV is the key entry point to HIV prevention, treatment, and care and support services. Predictive data mining techniques can help to analyze and discover new patterns in huge datasets such as the EDHS 2011 data. Objectives: The objective of this study is to build a predictive model for HIV testing and identify determinants of HIV testing for people above fourteen years of age using data mining techniques. Methods: The Cross-Industry Standard Process for Data Mining (CRISP-DM) was used to build the predictive model for HIV testing and explore association rules between HIV testing and the selected attributes among adult Ethiopians. Decision tree, Naïve Bayes, logistic regression, and artificial neural network techniques were used to build the predictive models. Results: The target dataset contained 30,625 study participants, of which 16,515 (53.9%) were women. Nearly three-fifths, 17,719 (58%), had never been tested for HIV, while the remaining 12,906 (42%) had been tested. Ethiopians with a higher wealth index, a higher educational level, aged 20 to 29 years, with no stigmatizing attitude towards HIV-positive persons, living in urban areas, with HIV-related knowledge, with information about family planning from mass media, and knowing a place to get tested for HIV showed increased patterns of HIV testing. Conclusion and Recommendation: Public health interventions should consider the identified determinants to encourage people to get tested for HIV.

Keywords: data mining, HIV, testing, Ethiopia

Procedia PDF Downloads 496
9011 Prediction of Mechanical Strength of Multiscale Hybrid Reinforced Cementitious Composite

Authors: Salam Alrekabi, A. B. Cundy, Mohammed Haloob Al-Majidi

Abstract:

Novel multiscale hybrid reinforced cementitious composites based on carbon nanotubes (MHRCC-CNT) and carbon nanofibers (MHRCC-CNF) are new types of cement-based material fabricated with micro steel fibers and nanofilaments, featuring superior strain hardening, ductility, and energy absorption. This study focused on establishing models to predict the compressive strength and the direct and splitting tensile strengths of the produced cementitious composites. The analysis was based on experimental data from the authors’ previous study, regression analysis, and established models available in the literature. The obtained models showed small differences between the predictions and target values, and experimental verification indicated that the mechanical properties could be estimated with good accuracy.

Keywords: multiscale hybrid reinforced cementitious composites, carbon nanotubes, carbon nanofibers, mechanical strength prediction

Procedia PDF Downloads 161
9010 Walmart Sales Forecasting using Machine Learning in Python

Authors: Niyati Sharma, Om Anand, Sanjeev Kumar Prasad

Abstract:

Forecasting future sales is one of the most essential elements of strategic planning for any organization. Walmart sales forecasting is a good problem for beginners to work with because it has a large retail data set. Walmart also uses this sales forecasting problem for hiring purposes. We analyze how internal and external factors affecting one of the largest companies in the US can shape its weekly sales in the future. Demand forecasting is the estimation of future demand for products or services on the basis of present and past data and different market conditions. All organizations face an unknown future and do not know the future demand for their goods. Hence, by exploring historical and recent market data, we forecast the upcoming demand for and production of individual goods, which is especially challenging in the near term. As a result, the required products can be produced in advance in line with market demand. We use several machine learning models and test their accuracy. Linear regression fitted to the training data gives an accuracy of 8.88%, while the extra trees regression model gives the best accuracy of 97.15%.

Keywords: random forest algorithm, linear regression algorithm, extra trees classifier, mean absolute error

Procedia PDF Downloads 149
9009 The Relationship between Self-Injurious Behavior and Manner of Death

Authors: Sait Ozsoy, Hacer Yasar Teke, Mustafa Dalgic, Cetin Ketenci, Ertugrul Gok, Kenan Karbeyaz, Azem Irez, Mesut Akyol

Abstract:

Self-mutilating behavior or self-injury behavior (SIB) is defined as “intentional harm to one’s body without intent to commit suicide”. SIB cases are commonly seen in psychiatry and forensic medicine practice. Despite the variety of SIB methods, cuts in the skin are the most common injury (70-97%) in this group of patients. Subjects with SIB have one or more comorbidities, which include depression, anxiety, depersonalization, feelings of worthlessness, borderline personality disorder, antisocial behaviors, and histrionic personality. These individuals feel a high level of hostility towards themselves and their surroundings. Research has also revealed a strong relationship between antisocial personality disorder, criminal behavior, and SIB. This study retrospectively evaluated 6,599 autopsy cases performed in 2013 at forensic medicine institutes in six major cities of Turkey (Ankara, Izmir, Diyarbakir, Erzurum, Trabzon, Eskisehir). The study group consisted of all cases with SIB findings (psychopathic cuts, cigarette burns, scars, etc.). The relationship between causes of death in the study group (SIB subjects) and the control group was investigated. The control group was created from subjects without signs of SIB. The Mann-Whitney U test was used for age variables and the Chi-square test for categorical variables. Multinomial logistic regression analysis was used to analyze group differences with respect to manner of death (natural, accident, homicide, suicide), and risk factors associated with each group were determined by binomial logistic regression analysis. SPSS Statistics 15.0 was used for all statistical analyses. The threshold for statistical significance was p<0.05. There was no significant difference between accidental and natural death among the groups (p=0.737). In addition, a unit increase in the number of cuts in the psychopathic group was associated with a decrease in accidental death by a factor of 0.967 (95% CI: 0.941-0.993; p=0.015). In contrast, there was a significant difference between suicidal and natural death (p<0.001), and also between homicidal and natural death (p=0.025). SIB is often seen with borderline and antisocial personality disorder but may be associated with many psychiatric illnesses. Studies have shown a relationship between antisocial personality disorder and criminal behavior, and between SIB and suicidal behavior. In our study, the rates of suicide, murder, and intoxication were higher than in the control group. It could be concluded that SIB can be used as a predictor of the possibility of harm to oneself and to other people.

Keywords: autopsy, cause of death, forensic science, self-injury behaviour

Procedia PDF Downloads 510
9008 Arsenic Contamination in Drinking Water Is Associated with Dyslipidemia in Pregnancy

Authors: Begum Rokeya, Rahelee Zinnat, Fatema Jebunnesa, Israt Ara Hossain, A. Rahman

Abstract:

Background and Aims: Arsenic in drinking water is a global environmental health problem, and exposure may increase dyslipidemia and cerebrovascular disease mortality, most likely by causing atherosclerosis. However, the mechanism linking lipid metabolism, atherosclerosis formation, arsenic exposure, and its impact in pregnancy is still unclear. Recent epidemiological evidence indicates a close association between inorganic arsenic exposure via drinking water and dyslipidemia. However, the exact mechanism of this arsenic-mediated increase in atherosclerosis risk factors remains enigmatic. We explore the association between arsenic exposure and serum lipid profile in pregnant subjects. Methods: A total of 200 pregnant mothers from an arsenic-exposed area were screened in this cross-sectional study. The study group included 100 exposed subjects as cases and 100 non-exposed healthy pregnant subjects as controls. Clinical and anthropometric measurements were done by standard techniques. Lipidemic status was assessed by the enzymatic endpoint method. Urinary arsenic was measured by inductively coupled plasma-mass spectrometry and adjusted for specific gravity. Subjects with a urinary arsenic level > 100 μg/L were categorized as arsenic exposed and those with < 100 μg/L as non-exposed. Multivariate logistic regression and Student’s t-test were used for statistical analysis. Results: Both systolic and diastolic blood pressure were significantly higher in the arsenic-exposed pregnant subjects than in the non-exposed group (p<0.001). Arsenic-exposed subjects had about 2 times higher odds of developing hypertensive pregnancy (odds ratio 2.2). In parallel, arsenic-exposed subjects showed significantly higher triglyceride, total cholesterol, and low-density lipoprotein levels than non-exposed pregnant subjects. Urinary arsenic level also correlated significantly with SBP, DBP, TG, total cholesterol, and serum LDL-cholesterol. Multivariate logistic regression showed that urinary arsenic had a positive association with DBP, SBP, triglyceride, and LDL-c. Conclusion: Arsenic exposure may induce dyslipidemia-related conditions such as atherosclerosis by modifying reverse cholesterol transport in cholesterol metabolism. To decrease atherosclerosis-related mortality associated with arsenic, preventing exposure from environmental sources in early life is an important element.

Keywords: Arsenic Exposure, Dyslipidemia, Gestational Diabetes Mellitus, Serum lipid profile

Procedia PDF Downloads 125
9007 The Prognostic Prediction Value of Positive Lymph Nodes Numbers for the Hypopharyngeal Squamous Cell Carcinoma

Authors: Wendu Pang, Yaxin Luo, Junhong Li, Yu Zhao, Danni Cheng, Yufang Rao, Minzi Mao, Ke Qiu, Yijun Dong, Fei Chen, Jun Liu, Jian Zou, Haiyang Wang, Wei Xu, Jianjun Ren

Abstract:

We aimed to compare the prognostic prediction value of positive lymph node number (PLNN) to the American Joint Committee on Cancer (AJCC) tumor, lymph node, and metastasis (TNM) staging system for patients with hypopharyngeal squamous cell carcinoma (HPSCC). A total of 826 patients with HPSCC from the Surveillance, Epidemiology, and End Results database (2004–2015) were identified and split into two independent cohorts: training (n=461) and validation (n=365). Univariate and multivariate Cox regression analyses were used to evaluate the prognostic effects of PLNN in patients with HPSCC. We further applied six Cox regression models to compare the survival predictive values of the PLNN and AJCC TNM staging system. PLNN showed a significant association with overall survival (OS) and cancer-specific survival (CSS) (P < 0.001) in both univariate and multivariable analyses, and was divided into three groups (PLNN 0, PLNN 1-5, and PLNN>5). In the training cohort, multivariate analysis revealed that the increased PLNN of HPSCC gave rise to significantly poor OS and CSS after adjusting for age, sex, tumor size, and cancer stage; this trend was also verified by the validation cohort. Additionally, the survival model incorporating a composite of PLNN and TNM classification (C-index, 0.705, 0.734) performed better than the PLNN and AJCC TNM models. PLNN can serve as a powerful survival predictor for patients with HPSCC and is a surrogate supplement for cancer staging systems.

Keywords: hypopharyngeal squamous cell carcinoma, positive lymph nodes number, prognosis, prediction models, survival predictive values
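
The C-index values quoted in the abstract above (0.705 and 0.734) measure how well a model's risk scores rank patients by survival time. A minimal sketch of Harrell's concordance index is given below. It illustrates the metric only: the study's Cox models are not reproduced, the toy data are hypothetical, and tied survival times are simply skipped, which is a simplifying assumption.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk score.

    A pair (i, j) is comparable when subject i had the event (events[i] is truthy)
    and a strictly shorter follow-up time than subject j. The pair is concordant
    when i also has the higher risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical toy cohort: follow-up in months, 1 = death observed, 0 = censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
mixed = concordance_index(times, events, [0.9, 0.1, 0.5, 0.7])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so values around 0.7 indicate moderate discrimination.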

Procedia PDF Downloads 154
9006 The Perspective of Waste Frying Oil in São Paulo and Its Dimensions in the Reverse Logistics of the Production of Biodiesel

Authors: Max Filipe Goncalves, Alessandra Concilio, Rodrigo Shimada

Abstract:

Waste frying oil is highly polluting when disposed of incorrectly in the environment. Reverse logistics must be examined to identify how it can be structured to return such waste to the production chain for use in new processes. In this context, the objective of this paper is to analyze the perspective of waste frying oil in São Paulo and its dimensions in the production of biodiesel. Underlying factors such as the agents, motivators, and legal aspects were analyzed to demonstrate it. Then, a SWOT matrix was built from the observed aspects, covering the strengths, weaknesses, opportunities, and threats of the reverse logistics chain in São Paulo.

Keywords: biodiesel, perspective, reverse logistic, WFO

Procedia PDF Downloads 209
9005 A Generalized Weighted Loss for Support Vextor Classification and Multilayer Perceptron

Authors: Filippo Portera

Abstract:

Standard algorithms usually employ a loss in which each error is the mere absolute difference between the true value and the prediction, in the case of a regression task. In this work, we present several error weighting schemes that generalize this established routine. We study both a binary classification model for Support Vector Classification and a regression net for Multilayer Perceptron. Results show that the error is never worse than with the standard procedure, and in several cases it is better.

Keywords: loss, binary-classification, MLP, weights, regression

Procedia PDF Downloads 95
9004 Factor Associated with Uncertainty Undergoing Hematopoietic Stem Cell Transplantation

Authors: Sandra Adarve, Jhon Osorio

Abstract:

Uncertainty has been studied in patients with different types of cancer, but not in patients with hematologic cancer undergoing transplantation. The purpose of this study was to identify factors associated with uncertainty in adult patients with malignant hemato-oncological diseases who are scheduled to undergo hematopoietic stem cell transplantation, based on Merle Mishel’s Uncertainty theory. This was a cross-sectional study with an analytical purpose. The study sample included 50 patients with leukemia, myeloma, and lymphoma selected by non-probability convenience and intentional sampling. Sociodemographic and clinical variables were measured. Mishel’s Scale of Uncertainty in Illness was used to measure uncertainty. Bivariate and multivariate analyses were performed to explore the relationships and associations between the different variables and uncertainty level. For this analysis, the distribution of the uncertainty scale values was evaluated with the Shapiro-Wilk normality test to identify the statistical tests to be used. The multivariate analysis was conducted through stepwise logistic regression. Patients were 18-74 years old, with a mean age of 44.8. The disease course had a median duration of 9.5 months, and the time to transplantation was < 20 days for 50% of the patients. Regarding the uncertainty scale, a mean score of 95.46 was identified. When the dimensions of the scale were analyzed, the mean score was 25.6 for the framework of stimuli, 47.4 for cognitive ability, and 22.8 for structure providers. Age was found to correlate with the total uncertainty score (p=0.012). Additionally, statistically significant differences in uncertainty score were found by religious creed (p=0.023), education level (p=0.012), family history of cancer (p=0.001), presence of comorbidities (p=0.023), and previous radiotherapy treatment (p=0.022). After performing logistic regression, previous radiotherapy treatment (OR=0.04, 95% CI 0.004-0.48) and family history of cancer (OR=30.7, 95% CI 2.7-349) were found to be factors associated with a high level of uncertainty. Uncertainty is present at high levels in patients who are going to undergo bone marrow transplantation, and it is the responsibility of the nurse to assess the levels of uncertainty and the presence of factors that may contribute to it. Once it has been assessed, uncertainty must be addressed through the identified associated factors, especially those related to cognitive capacity. This implies the design and implementation of intervention strategies to improve knowledge related to the disease and the therapeutic procedures that the patients will undergo. All interventions should favor the adaptation of these patients to their current experience and contribute to seeing uncertainty as an opportunity for growth and transcendence.

Keywords: hematopoietic stem cell transplantation, hematologic diseases, nursing, uncertainty

Procedia PDF Downloads 166
9003 Bayesian Value at Risk Forecast Using Realized Conditional Autoregressive Expectiel Mdodel with an Application of Cryptocurrency

Authors: Niya Chen, Jennifer Chan

Abstract:

In the financial market, risk management helps to minimize potential loss and maximize profit. There are two ways to assess risk. The first is to calculate the risk directly based on volatility. The most common risk measurements are Value at Risk (VaR), the Sharpe ratio, and beta. Alternatively, we can look at the quantiles of the return to assess the risk. Popular return models such as GARCH and stochastic volatility (SV) focus on modeling the mean of the return distribution by capturing the volatility dynamics; however, the quantile/expectile method gives us an idea of the distribution at extreme return values. It allows us to forecast VaR directly from returns. The advantage of these non-parametric methods is that they are not bound by the distributional assumptions of parametric methods. The difference between them is that expectile regression uses a second-order loss function while quantile regression uses a first-order loss function. We consider several quantile functions, different volatility measures, and estimates from some volatility models. To estimate the expectile, we use the Realized Conditional Autoregressive Expectile (CARE) model with a Bayesian method. We examine whether our proposed models outperform existing models for cryptocurrency, testing them mainly on Bitcoin as well as Ethereum.

Keywords: expectile, CARE Model, CARR Model, quantile, cryptocurrency, Value at Risk
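
The distinction drawn in the abstract above between a first-order (quantile) and a second-order (expectile) loss can be made concrete with a short sketch. This is an illustration under stated assumptions only: it computes an unconditional sample expectile by an iteratively reweighted mean, not the Realized CARE model or its Bayesian estimation, and the function names and toy returns are hypothetical.

```python
def pinball_loss(u, tau):
    """First-order (asymmetric absolute) loss minimized by the tau-quantile."""
    return u * tau if u >= 0 else u * (tau - 1.0)


def expectile_loss(u, tau):
    """Second-order (asymmetric squared) loss minimized by the tau-expectile."""
    weight = tau if u >= 0 else 1.0 - tau
    return weight * u * u


def sample_expectile(returns, tau, iterations=100):
    """Sample tau-expectile via an iteratively reweighted mean.

    Observations above the current estimate get weight tau, the rest 1 - tau;
    the fixed point satisfies the first-order condition of expectile_loss.
    """
    m = sum(returns) / len(returns)
    for _ in range(iterations):
        w = [tau if r > m else 1.0 - tau for r in returns]
        m = sum(wi * r for wi, r in zip(w, returns)) / sum(w)
    return m


# Hypothetical daily returns in percent.
returns = [-3.0, -1.0, 0.0, 1.0, 3.0]
mean_like = sample_expectile(returns, 0.5)   # tau = 0.5 recovers the mean
tail = sample_expectile(returns, 0.05)       # low tau moves into the loss tail
```

Because the expectile loss is squared, the size of extreme returns affects the estimate, whereas a quantile depends only on how many observations fall on each side.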

Procedia PDF Downloads 109
9002 Interference among Lambsquarters and Oil Rapeseed Cultivars

Authors: Reza Siyami, Bahram Mirshekari

Abstract:

Seed and oil yield of rapeseed is considerably affected by weed interference, including from mustard (Sinapis arvensis L.), lambsquarters (Chenopodium album L.) and redroot pigweed (Amaranthus retroflexus L.), throughout the East Azerbaijan province in Iran. To formulate the relationship between the four independent growth variables measured in our experiment and the dependent variable, multiple regression analysis was carried out with weed leaf number per plant (X1), green cover percentage (X2), LAI (X3) and leaf area per plant (X4) as independent variables and rapeseed oil yield as the dependent variable. The multiple regression equation is as follows: Seed essential oil yield (kg/ha) = 0.156 + 0.0325 (X1) + 0.0489 (X2) + 0.0415 (X3) + 0.133 (X4). Furthermore, stepwise regression analysis was carried out on the data to test the significance of the independent variables affecting the oil yield. The resulting stepwise regression equation is as follows: Oil yield = 4.42 + 0.0841 (X2) + 0.0801 (X3); R2 = 81.5. The stepwise regression analysis verified that the green cover percentage and LAI of weed had a marked increasing effect on the oil yield of rapeseed.

Keywords: green cover percentage, independent variable, interference, regression
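
The two fitted equations quoted in the abstract above can be evaluated directly. The sketch below simply encodes the published coefficients; the function names and the example inputs are hypothetical, and the units of the stepwise equation are not stated in the abstract.

```python
def oil_yield_full(x1, x2, x3, x4):
    """Multiple regression equation from the abstract (kg/ha).

    x1: weed leaves per plant, x2: green cover percentage,
    x3: LAI, x4: leaf area per plant.
    """
    return 0.156 + 0.0325 * x1 + 0.0489 * x2 + 0.0415 * x3 + 0.133 * x4


def oil_yield_stepwise(x2, x3):
    """Stepwise equation from the abstract: only green cover (x2) and LAI (x3) remain."""
    return 4.42 + 0.0841 * x2 + 0.0801 * x3


# Hypothetical inputs: 10% green cover and an LAI of 2.
example = oil_yield_stepwise(10, 2)

# Effect of one extra percentage point of green cover, holding LAI fixed.
slope_x2 = oil_yield_stepwise(11, 2) - oil_yield_stepwise(10, 2)
```

Each coefficient is the change in predicted yield per unit change in that variable with the others held fixed, which is how the abstract's statement about green cover percentage and LAI should be read.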

Procedia PDF Downloads 420
9001 Statistical Model to Examine the Impact of the Inflation Rate and Real Interest Rate on the Bahrain Economy

Authors: Ghada Abo-Zaid

Abstract:

Introduction: Oil is one of the main sources of income in Bahrain. Low oil prices influence economic growth and the investment rate in Bahrain. For example, economic growth was 3.7% in 2012 and fell to 2.9% in 2015. The investment rate was 9.8% in 2012 and fell to 5.9% and -12.1% in 2014 and 2015, respectively. The inflation rate peaked in 2013 at 3.3%. Objectives: The objective is to build statistical models to examine the effect of the real interest rate and the inflation rate on economic growth in Bahrain from 2000 to 2018. Methods: This study covers 18 years, and a multiple regression model is used for the analysis. All missing data are omitted from the analysis. Results: A regression model is used to examine the association between the gross national product (GNP), the inflation rate, and the real interest rate. We found that (i) an increase in the real interest rate decreases the GNP, and (ii) an increase in the inflation rate does not affect economic growth in Bahrain, since the average inflation rate was almost 2%, which is considered a low percentage. Conclusion: There is a positive impact of the real interest rate on the GNP in Bahrain, while the inflation rate does not show any negative influence on the GNP, as the inflation rate was not large enough to negatively affect the economic growth rate in Bahrain.

Keywords: gross national product, egypt, regression model, interest rate

Procedia PDF Downloads 164
9000 Association of Preoperative Pain Catastrophizing with Postoperative Pain after Lower Limb Trauma Surgery

Authors: Asish Subedi, Krishna Pokharel, Birendra Prasad Sah, Pashupati Chaudhary

Abstract:

Objectives: To evaluate the association between preoperative Nepali pain catastrophizing scale (N-PCS) scores and postoperative pain intensity and total opioid consumption. Methods: In this prospective cohort study, we enrolled 135 patients with an American Society of Anaesthesiologists physical status I or II, aged between 18 and 65 years, and scheduled for surgery for lower-extremity fracture under spinal anaesthesia. Maximum postoperative pain reported during the first 24 h was classified into two groups: a no-mild pain group (numeric rating scale [NRS] scores 1 to 3) and a moderate-severe pain group (NRS 4-10). The Spearman correlation coefficient was used to assess the association between the baseline N-PCS scores and the outcome variables, i.e., the maximum NRS pain score and the total tramadol consumption within the first 24 h after surgery. Logistic regression models were used to identify the predictors of postoperative pain intensity. Results: As four patients violated the protocol, the data of 131 patients were analysed. The mean N-PCS score in the moderate-severe pain group was 27.39 ±9.50, compared to 18.64 ±10 in the no-mild pain group (p<0.001). Preoperative PCS scores correlated positively with postoperative pain intensity (r =0.39, [95% CI 0.23-0.52], p<0.001) and total tramadol consumption (r =0.32, [95% CI 0.16-0.47], p<0.001). An increase in catastrophizing scores was associated with postoperative moderate-severe pain (odds ratio, 1.08 [95% confidence interval, 1.02-1.15], p=0.006) after adjusting for gender, ethnicity and preoperative anxiety. Conclusion: Patients who reported higher pain catastrophizing preoperatively were at increased risk of experiencing moderate-severe postoperative pain.

Keywords: nepali, pain catastrophizing, postoperative pain, trauma
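
To read the odds ratio reported in the abstract above, it helps to recall that logistic regression is linear in the log-odds, so odds ratios multiply across units. The sketch below assumes the reported 1.08 applies per one-point increase in N-PCS score, which the abstract implies but does not state; the 10-point difference is a hypothetical example.

```python
import math


def log_odds_coefficient(odds_ratio_per_unit):
    """Logistic regression coefficient implied by a per-unit odds ratio."""
    return math.log(odds_ratio_per_unit)


def odds_multiplier(odds_ratio_per_unit, delta):
    """Multiplicative change in odds for a predictor change of `delta` units."""
    return math.exp(log_odds_coefficient(odds_ratio_per_unit) * delta)


# Reported estimate and 95% confidence limits from the abstract.
point, lower, upper = 1.08, 1.02, 1.15

# Hypothetical comparison: two patients whose N-PCS scores differ by 10 points.
ten_point = odds_multiplier(point, 10)
ten_point_ci = (odds_multiplier(lower, 10), odds_multiplier(upper, 10))
```

Under that assumption, a 10-point difference corresponds to roughly 2.2 times the odds of moderate-severe pain, with the interval widening accordingly. This is a change in odds, not in probability.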

Procedia PDF Downloads 120
8999 Logistic Model Tree and Expectation-Maximization for Pollen Recognition and Grouping

Authors: Endrick Barnacin, Jean-Luc Henry, Jack Molinié, Jimmy Nagau, Hélène Delatte, Gérard Lebreton

Abstract:

Palynology is a field of interest for many disciplines. It has multiple applications such as chronological dating, climatology, allergy treatment, and even honey characterization. Unfortunately, the analysis of a pollen slide is a complicated and time-consuming task that requires the intervention of experts in the field, which is becoming increasingly rare due to economic and social conditions. So, the automation of this task is a necessity. Pollen slides analysis is mainly a visual process as it is carried out with the naked eye. That is the reason why a primary method to automate palynology is the use of digital image processing. This method presents the lowest cost and has relatively good accuracy in pollen retrieval. In this work, we propose a system combining recognition and grouping of pollen. It consists of using a Logistic Model Tree to classify pollen already known by the proposed system while detecting any unknown species. Then, the unknown pollen species are divided using a cluster-based approach. Success rates for the recognition of known species have been achieved, and automated clustering seems to be a promising approach.

Keywords: pollen recognition, logistic model tree, expectation-maximization, local binary pattern

Procedia PDF Downloads 182
8998 Housing Price Prediction Using Machine Learning Algorithms: The Case of Melbourne City, Australia

Authors: The Danh Phan

Abstract:

House price forecasting is a main topic in the real estate market research. Effective house price prediction models could not only allow home buyers and real estate agents to make better data-driven decisions but may also be beneficial for the property policymaking process. This study investigates the housing market by using machine learning techniques to analyze real historical house sale transactions in Australia. It seeks useful models which could be deployed as an application for house buyers and sellers. Data analytics show a high discrepancy between the house price in the most expensive suburbs and the most affordable suburbs in the city of Melbourne. In addition, experiments demonstrate that the combination of Stepwise and Support Vector Machine (SVM), based on the Mean Squared Error (MSE) measurement, consistently outperforms other models in terms of prediction accuracy.

Keywords: house price prediction, regression trees, neural network, support vector machine, stepwise

Procedia PDF Downloads 230
8997 Lean Models Classification: Towards a Holistic View

Authors: Y. Tiamaz, N. Souissi

Abstract:

The purpose of this paper is to present a classification of Lean models that aims to capture all the concepts related to this approach and thus facilitate its implementation. This classification allows the identification of the most relevant models according to several dimensions. From this perspective, we present a review and an analysis of the Lean models literature, and we propose dimensions for classifying the current proposals that respect, among others, the axes of the Lean approach, the maturity of the models, and their application domains. This classification allowed us to conclude that researchers essentially consider the Lean approach as a toolbox and that they design their models to solve problems related to a specific environment. Since the Lean approach is no longer intended only for the automotive sector where it was invented, but for all fields (IT, Hospital, ...), we consider that this approach requires a generic model that can be implemented in all areas.

Keywords: lean approach, lean models, classification, dimensions, holistic view

Procedia PDF Downloads 434
8996 A Framework for Auditing Multilevel Models Using Explainability Methods

Authors: Debarati Bhaumik, Diptish Dey

Abstract:

Multilevel models, increasingly deployed in industries such as insurance, food production, and entertainment within functions such as marketing and supply chain management, need to be transparent and ethical. Applications usually result in binary classification within groups or hierarchies based on a set of input features. Using open-source datasets, we demonstrate that popular explainability methods, such as SHAP and LIME, consistently underperform in accuracy when interpreting these models. They fail to predict the order of feature importance, the magnitudes, and occasionally even the nature of the feature contribution (negative versus positive contribution to the outcome). Besides accuracy, the computational intractability of SHAP for binomial classification is a cause of concern. For transparent and ethical applications of these hierarchical statistical models, sound audit frameworks need to be developed. In this paper, we propose an audit framework for technical assessment of multilevel regression models focusing on three aspects: (i) model assumptions & statistical properties, (ii) model transparency using different explainability methods, and (iii) discrimination assessment. To this end, we undertake a quantitative approach and compare intrinsic model methods with SHAP and LIME. The framework comprises a shortlist of KPIs, such as PoCE (Percentage of Correct Explanations) and MDG (Mean Discriminatory Gap) per feature, for each of these three aspects. A traffic light risk assessment method is furthermore coupled to these KPIs. The audit framework will assist regulatory bodies in performing conformity assessments of AI systems using multilevel binomial classification models at businesses. It will also help businesses deploying multilevel models to be future-proof and aligned with the European Commission’s proposed Regulation on Artificial Intelligence.

Keywords: audit, multilevel model, model transparency, model explainability, discrimination, ethics

Procedia PDF Downloads 93
8995 Performance Analysis of Proprietary and Non-Proprietary Tools for Regression Testing Using Genetic Algorithm

Authors: K. Hema Shankari, R. Thirumalaiselvi, N. V. Balasubramanian

Abstract:

The present paper addresses research in the area of regression testing, with emphasis on automated tools as well as prioritization of test cases. The uniqueness of regression testing and its cyclic nature are pointed out. The difference in approach between industry, with the business model as its basis, and academia, with its focus on data mining, is highlighted. Test metrics are discussed as a prelude to our formula for prioritization; a case study is further discussed to illustrate this methodology. An industrial case study is also described in the paper, where the number of test cases is so large that they have to be grouped as Test Suites. In such situations, a genetic algorithm proposed by us can be used to reconfigure these Test Suites in each cycle of regression testing. A comparison is made between a proprietary tool and an open source tool using the above-mentioned metrics. Our approach is clarified through several tables.

Keywords: APFD metric, genetic algorithm, regression testing, RFT tool, test case prioritization, selenium tool
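
The APFD metric listed in the keywords can be illustrated with a short sketch. It uses the standard definition APFD = 1 - (TF1 + ... + TFm)/(n·m) + 1/(2n), where n is the number of test cases, m the number of faults, and TFi the position of the first test that exposes fault i. The test names and the fault matrix below are hypothetical and are not taken from the paper's case study.

```python
def apfd(order, detects):
    """Average Percentage of Faults Detected for one test ordering.

    order   -- list of test ids in execution order
    detects -- dict mapping a test id to the set of fault ids it exposes
    """
    n = len(order)
    # Only faults that some test in this ordering can expose are counted.
    faults = set().union(*(detects.get(t, set()) for t in order))
    m = len(faults)
    if n == 0 or m == 0:
        raise ValueError("need at least one test and one detectable fault")

    # Position (1-based) of the first test that exposes each fault.
    first_pos = {}
    for pos, test in enumerate(order, start=1):
        for fault in detects.get(test, ()):
            first_pos.setdefault(fault, pos)

    return 1 - sum(first_pos.values()) / (n * m) + 1 / (2 * n)


# Hypothetical fault matrix: which test exposes which fault.
DETECTS = {"t1": {"f1"}, "t2": {"f1", "f2"}, "t3": {"f3"}, "t4": set()}

print(apfd(["t2", "t3", "t1", "t4"], DETECTS))  # faults found early, about 0.79
print(apfd(["t4", "t1", "t3", "t2"], DETECTS))  # faults found late, 0.375
```

A higher APFD means faults are exposed earlier in the run, so a prioritization scheme (a genetic algorithm, for instance) can use it as a fitness score when comparing candidate orderings.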

Procedia PDF Downloads 435
8994 A Cross-Sectional Observational Study of Prescription Pattern of Gastro-Protective Drugs with Non-Steroidal Anti-Inflammatory Drugs in Nilgiris, India

Authors: B.S. Roopa

Abstract:

Objectives: To investigate the prevalence of concomitant use of gastro-protective drugs (GPDs) in patients treated with non-steroidal anti-inflammatory drugs (NSAIDs), and whether GPDs were given at the recommended dose and frequency as prophylaxis; and to assess the association between risk factors and the prescription of GPDs in patients treated with NSAIDs. Methods: The study was a prospective, observational, cross-sectional survey. Data from patients prescribed NSAIDs at the out-patient departments of a secondary care hospital in Nilgiris, India, were collected in a specially designed proforma over a period of 45 days. Discrete variables were analyzed using χ2 tests. Factors that might be associated with the prescription of GPDs with NSAIDs were assessed in multiple logistic regression models. Results: Three hundred and three patients were included in this study, and the rate of GPD prescription was 89.1%. Most of the patients received an H2-receptor antagonist and, to a lesser degree, an antacid or a proton pump inhibitor. Patients with a history of GI ulcer/bleeding were much more likely to be co-prescribed a GPD than those who had no history of GI disorders. Compared with patients who were managed in a general outpatient clinic, those managed in the secondary care hospital in Nilgiris, India, were more likely to receive a GPD. Conclusions: The prescription rate of GPDs with NSAIDs is high. Patients were prescribed an H2RA at a dose of 150 mg twice daily, which is not effective in reducing the risk of NSAID-induced gastric ulcer. Only the frequency of NSAID prescription was a significant determinant of co-prescription with GPDs, both in patients < 65 years and in those ≥ 65 years old.

Keywords: gastro-protective agents, non-steroidal anti-inflammatory agents

Procedia PDF Downloads 296
8993 Predicting the Impact of Scope Changes on Project Cost and Schedule Using Machine Learning Techniques

Authors: Soheila Sadeghi

Abstract:

In the dynamic landscape of project management, scope changes are an inevitable reality that can significantly impact project performance. These changes, whether initiated by stakeholders, external factors, or internal project dynamics, can lead to cost overruns and schedule delays. Accurately predicting the consequences of these changes is crucial for effective project control and informed decision-making. This study aims to develop predictive models to estimate the impact of scope changes on project cost and schedule using machine learning techniques. The research utilizes a comprehensive dataset containing detailed information on project tasks, including the Work Breakdown Structure (WBS), task type, productivity rate, estimated cost, actual cost, duration, task dependencies, scope change magnitude, and scope change timing. Multiple machine learning models are developed and evaluated to predict the impact of scope changes on project cost and schedule. These models include Linear Regression, Decision Tree, Ridge Regression, Random Forest, Gradient Boosting, and XGBoost. The dataset is split into training and testing sets, and the models are trained using the preprocessed data. Cross-validation techniques are employed to assess the robustness and generalization ability of the models. The performance of the models is evaluated using metrics such as Mean Squared Error (MSE) and R-squared. Residual plots are generated to assess the goodness of fit and identify any patterns or outliers. Hyperparameter tuning is performed to optimize the XGBoost model and improve its predictive accuracy. The feature importance analysis reveals the relative significance of different project attributes in predicting the impact on cost and schedule. Key factors such as productivity rate, scope change magnitude, task dependencies, estimated cost, actual cost, duration, and specific WBS elements are identified as influential predictors. 
The study highlights the importance of considering both cost and schedule implications when managing scope changes. The developed predictive models provide project managers with a data-driven tool to proactively assess the potential impact of scope changes on project cost and schedule. By leveraging these insights, project managers can make informed decisions, optimize resource allocation, and develop effective mitigation strategies. The findings of this research contribute to improved project planning, risk management, and overall project success.
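
The evaluation loop described in this abstract (train/test splitting, cross-validation, MSE and R-squared) can be sketched as follows. This is a minimal illustration only: a one-feature least-squares line stands in for the study's models, and the data (a hypothetical scope-change magnitude against a hypothetical cost overrun) are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def k_fold_scores(xs, ys, k):
    """Return one (MSE, R^2) pair per fold; fold f holds out every k-th point."""
    n = len(xs)
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        test_x = [xs[i] for i in sorted(test_idx)]
        test_y = [ys[i] for i in sorted(test_idx)]
        slope, intercept = fit_line(train_x, train_y)
        preds = [slope * x + intercept for x in test_x]
        scores.append((mse(test_y, preds), r_squared(test_y, preds)))
    return scores


# Hypothetical data: scope-change magnitude (%) vs. cost overrun (k$).
magnitude = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
overrun = [3.1, 4.8, 7.4, 8.9, 11.2, 12.7, 15.3, 16.8, 19.1, 21.2, 22.8, 25.1]

for fold_mse, fold_r2 in k_fold_scores(magnitude, overrun, k=3):
    print(f"MSE={fold_mse:.3f}  R^2={fold_r2:.3f}")
```

Scores that stay similar across folds suggest the fit generalizes; a fold with a much larger MSE points to instability or outliers of the kind the residual plots are meant to reveal.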

Keywords: cost impact, machine learning, predictive modeling, schedule impact, scope changes

Procedia PDF Downloads 39
8992 Using Machine Learning to Enhance Win Ratio for College Ice Hockey Teams

Authors: Sadixa Sanjel, Ahmed Sadek, Naseef Mansoor, Zelalem Denekew

Abstract:

Collegiate ice hockey (NCAA) sports analytics is different from the national level hockey (NHL). We apply and compare multiple machine learning models such as Linear Regression, Random Forest, and Neural Networks to predict the win ratio for a team based on their statistics. Data exploration helps determine which statistics are most useful in increasing the win ratio, which would be beneficial to coaches and team managers. We ran experiments to select the best model and chose Random Forest as the best performing. We conclude with how to bridge the gap between the college and national levels of sports analytics and the use of machine learning to enhance team performance despite not having a lot of metrics or budget for automatic tracking.

Keywords: NCAA, NHL, sports analytics, random forest, regression, neural networks, game predictions

Procedia PDF Downloads 114
8991 Spatial Pattern and Predictors of Malaria in Ethiopia: Application of Auto Logistics Spatial Regression

Authors: Melkamu A. Zeru, Yamral M. Warkaw, Aweke A. Mitku, Muluwerk Ayele

Abstract:

Introduction: Malaria is a severe health threat worldwide, mainly in Africa. It is a major cause of health problems, and the risk of morbidity and mortality associated with malaria cases is characterized by spatial variation across the country. This study aimed to investigate the spatial patterns and predictors of malaria distribution in Ethiopia. Methods: A weighted sample of 15,239 individuals with rapid diagnosis tests was obtained from the Central Statistical Agency and the Ethiopia malaria indicator survey of 2015. Global Moran's I and Moran scatter plots were used to determine the distribution of malaria cases, whereas the local Moran's I statistic was used to identify exposed areas. In data manipulation, machine learning was used for variable reduction, and the statistical software R, Stata, and Python were used for data management and analysis. The auto logistics spatial binary regression model was used to investigate the predictors of malaria. Results: The final auto logistics regression model reported that male clients had a positive significant effect on malaria cases as compared to female clients [AOR=2.401, 95% CI: (2.125-2.713)]. The distribution of malaria differed across the regions. The highest incidence of malaria was found in Gambela [AOR=52.55, 95% CI: (40.54-68.12)], followed by Beneshangul [AOR=34.95, 95% CI: (27.159-44.963)]. By contrast, individuals in Amhara [AOR=0.243, 95% CI: (0.195-0.303)], Oromiya [AOR=0.197, 95% CI: (0.158-0.244)], Dire Dawa [AOR=0.064, 95% CI: (0.049-0.082)], Addis Ababa [AOR=0.057, 95% CI: (0.044-0.075)], Somali [AOR=0.077, 95% CI: (0.059-0.097)], SNNPR [OR=0.329, 95% CI: (0.261-0.413)] and Harari [AOR=0.256, 95% CI: (0.201-0.325)] had a lower incidence of malaria as compared with Tigray. Furthermore, for a one-meter increase in altitude, the odds of a positive rapid diagnostic test (RDT) decrease by 1.6% [AOR=0.984, 95% CI: (0.984-0.984)].
The use of a shared toilet facility was reported as a protective factor for malaria in Ethiopia [AOR=1.671, 95% CI: (1.504-1.854)]. The spatial autocorrelation variable changes the constant from AOR = 0.471 for logistic regression to AOR = 0.164 for auto logistics regression. Conclusions: This study found that the incidence of malaria in Ethiopia had a spatial pattern that is associated with socio-economic, demographic, and geographic risk factors. Spatial clustering of malaria cases occurred in all regions, and the risk of clustering differed across the regions. The risk of malaria was found to be higher for those who live in houses with soil floors than for those who live in houses with cement or ceramic floors. Similarly, households with thatched, metal and thin, and other roof types have a higher risk of malaria than households with ceramic tile roofs. Moreover, using a protected anti-mosquito net reduced the risk of malaria incidence.
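
The adjusted odds ratios quoted above can be read with some simple arithmetic. In a logistic model, an AOR is the exponential of a regression coefficient, so the percentage change in odds per unit of a predictor is (AOR - 1) × 100, and the effect over several units is the AOR raised to that power. The sketch below applies this to figures quoted in the abstract; the 100 m altitude difference is a hypothetical illustration and assumes the log-odds are linear in altitude.

```python
import math


def percent_change_in_odds(aor):
    """Percentage change in odds implied by an odds ratio."""
    return (aor - 1) * 100


def log_odds_coefficient(aor):
    """Logistic regression coefficient (log-odds scale) behind an odds ratio."""
    return math.log(aor)


def aor_over_units(aor_per_unit, units):
    """Odds ratio for a change of `units`, assuming a linear log-odds effect."""
    return aor_per_unit ** units


# Figures quoted in the abstract.
print(percent_change_in_odds(0.984))  # altitude, per metre: about -1.6 %
print(percent_change_in_odds(2.401))  # male vs. female: about +140 %

# Hypothetical illustration: odds ratio for living 100 m higher.
print(aor_over_units(0.984, 100))     # roughly 0.2
```

An AOR below 1 therefore indicates lower odds than the reference group and an AOR above 1 indicates higher odds, which is a quick check to apply when reading each reported interval.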

Keywords: malaria, Ethiopia, auto logistics, spatial model, spatial clustering

Procedia PDF Downloads 34
8990 A Survey on Quasi-Likelihood Estimation Approaches for Longitudinal Set-ups

Authors: Naushad Mamode Khan

Abstract:

The Com-Poisson (CMP) model is one of the most popular discrete generalized linear models (GLMs) and handles equi-, over- and under-dispersed data. In the longitudinal context, an integer-valued autoregressive (INAR(1)) process that incorporates covariate specification has been developed to model longitudinal CMP counts. However, the joint CMP likelihood function is difficult to specify, which restricts likelihood-based estimation. The joint generalized quasi-likelihood approach (GQL-I) was considered instead, but it is rather computationally intensive and may even fail to estimate the regression effects because of a complex and frequently ill-conditioned covariance structure. This paper proposes a new GQL approach for estimating the regression parameters (GQL-III) that is based on a single score vector representation. The performance of GQL-III is compared with GQL-I and separate marginal GQLs (GQL-II) through simulation experiments; it is shown to yield estimates as efficient as those of GQL-I while being far more computationally stable.
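
To see how the CMP model covers equi-, over- and under-dispersion, the sketch below evaluates the standard Conway-Maxwell-Poisson probability mass function, P(Y = y) = λ^y / (y!)^ν / Z(λ, ν), and compares the mean with the variance for three values of ν. It does not implement any of the GQL estimators; the truncation point `max_y` and the parameter values are assumptions chosen for illustration.

```python
import math


def cmp_pmf(lam, nu, max_y=200):
    """Truncated Conway-Maxwell-Poisson pmf on 0..max_y.

    Weights lam**y / (y!)**nu are computed on the log scale to avoid
    overflow; the normalizing constant Z is approximated by the truncated sum.
    """
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_y + 1)]
    peak = max(log_w)
    weights = [math.exp(v - peak) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def mean_and_variance(pmf):
    """Mean and variance of a pmf given as a list indexed by the count y."""
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var


for nu in (0.5, 1.0, 2.0):
    mean, var = mean_and_variance(cmp_pmf(lam=3.0, nu=nu))
    print(f"nu={nu}: mean={mean:.3f}, variance={var:.3f}")
```

With ν = 1 the distribution reduces to the Poisson (mean equals variance), ν < 1 gives over-dispersion, and ν > 1 gives under-dispersion.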

Keywords: longitudinal, com-Poisson, ill-conditioned, INAR(1), GLMS, GQL

Procedia PDF Downloads 354
8989 A Regression Model for Residual-State Creep Failure

Authors: Deepak Raj Bhat, Ryuichi Yatabe

Abstract:

In this study, a residual-state creep failure model was developed based on the residual-state creep test results of clayey soils. To develop the proposed model, regression analyses were performed using R. The model results for the failure time (tf) and critical displacement (δc) were compared with experimental results and found to be in close agreement. It is expected that the proposed regression model for residual-state creep failure will be useful for predicting the displacement of different clayey soils in the future.

Keywords: regression model, residual-state creep failure, displacement prediction, clayey soils

Procedia PDF Downloads 408
8988 Logistics Model for Improving Quality in Railway Transport

Authors: Eva Nedeliakova, Juraj Camaj, Jaroslav Masek

Abstract:

This contribution focuses on a methodology for identifying levels of quality and improving quality through a new logistics model in railway transport. It is oriented toward the application of dynamic quality models, which represent an innovative method of evaluating service quality. With this conception, the time factor and the expected and perceived quality at each moment of the transportation process within the logistics chain can be taken into account. Various models describe the improvement of quality with emphasis on the time factor throughout the whole transportation logistics chain. Quality of services in railway transport can be determined from the existing level of service quality and by detecting the causes of dissatisfaction among employees as well as customers, in order to uncover strengths and weaknesses. This new logistics model is able to recognize critical processes in the logistics chain. It includes a service quality rating that must respect the specific properties of services, namely unrepeatability, impalpability, consumption at the time they are provided and, particularly, changeability, which is a significant factor in the conditions of rail transport as well. These peculiarities influence service quality in view of constantly increasing requirements, and they lead to new ways of finding progressive approaches to service quality rating.

Keywords: logistics model, quality, railway transport

Procedia PDF Downloads 568
8987 The Effect of Particle Porosity in Mixed Matrix Membrane Permeation Models

Authors: Z. Sadeghi, M. R. Omidkhah, M. E. Masoomi

Abstract:

The purpose of this paper is to examine the gas transport behavior of mixed matrix membranes (MMMs) combined with porous particles. The main existing models are categorized into two groups: two-phase (ideal contact) and three-phase (non-ideal contact). A new coefficient, J, was obtained to express equations for estimating the effect of particle porosity in two-phase and three-phase models. The modified models are evaluated against existing models and experimental data using Matlab software. Comparison of the gas permeability predicted by the proposed modified models with that of existing models in different MMMs shows a better prediction of gas permeability in MMMs.

Keywords: mixed matrix membrane, permeation models, porous particles, porosity

Procedia PDF Downloads 384
8986 Assessing and Identifying Factors Affecting Customers Satisfaction of Commercial Bank of Ethiopia: The Case of West Shoa Zone (Bako, Gedo, Ambo, Ginchi and Holeta), Ethiopia

Authors: Habte Tadesse Likassa, Bacha Edosa

Abstract:

Customer satisfaction is essential for banks, as for any organization or business, to remain productive and successful. The main goal of the study is to assess and identify factors that influence customer satisfaction at Commercial Bank of Ethiopia branches in West Shoa Zone (Holeta, Ginchi, Ambo, Gedo and Bako). A stratified random sampling procedure was used, and 520 customers were drawn from the target population by simple random sampling (lottery method). The sample size for each branch was allocated using the probability proportional to size technique. Both descriptive and inferential statistical methods were used. A binary logistic regression model was fitted to test the significance of factors affecting customer satisfaction. The SPSS statistical package was used for data analysis. The results reveal that the overall level of customer satisfaction in the study area is low: 38.85% of customers were satisfied, compared with 61.15% who were not. Almost all factors included in the study were significantly associated with customer satisfaction. Based on a comparison of branches using odds ratios, customers of the Ambo and Bako branches were less satisfied than customers of the Holeta branch, whereas customers of the Ginchi and Gedo branches were more satisfied than those of Holeta. Since the level of customer satisfaction in the study area was low, it is recommended that the concerned bodies work cooperatively to maximize the satisfaction of their customers.
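
The allocation of the 520 sampled customers across branches in proportion to branch size can be sketched as below. The branch customer counts are hypothetical, since the abstract does not report them, and the largest-remainder rounding rule is an assumption used only to make the allocations sum exactly to the total sample.

```python
def proportional_allocation(populations, total_sample):
    """Split total_sample across strata in proportion to stratum size.

    Fractional shares are rounded down, then the leftover units go to the
    strata with the largest fractional remainders (largest-remainder rule).
    """
    total_pop = sum(populations.values())
    exact = {s: total_sample * p / total_pop for s, p in populations.items()}
    alloc = {s: int(share) for s, share in exact.items()}
    leftover = total_sample - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for stratum in by_remainder[:leftover]:
        alloc[stratum] += 1
    return alloc


# Hypothetical customer counts per branch (not reported in the abstract).
BRANCH_CUSTOMERS = {
    "Holeta": 3000,
    "Ginchi": 2000,
    "Ambo": 5000,
    "Gedo": 1500,
    "Bako": 1500,
}

allocation = proportional_allocation(BRANCH_CUSTOMERS, total_sample=520)
print(allocation)
print(sum(allocation.values()))  # always equals the requested sample size
```

Within each branch, the allocated number of customers would then be drawn by simple random sampling, as the abstract describes.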

Keywords: customers, satisfaction, binary logistic, complain handling process, waiting time

Procedia PDF Downloads 464