Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 3198

Search results for: multi-variable regression

3168 Model Averaging for Poisson Regression

Abstract:

Model averaging is a desirable approach to deal with model uncertainty, which, however, has rarely been explored for Poisson regression. In this paper, we propose a model averaging procedure based on an unbiased estimator of the expected Kullback-Leibler distance for the Poisson regression. Simulation study shows that the proposed model average estimator outperforms some other commonly used model selection and model average estimators in some situations. Our proposed methods are further applied to a real data example and the advantage of this method is demonstrated again.
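As a rough illustration of what a model average estimator for Poisson regression looks like, the sketch below combines the fitted means of several candidate models using weights derived from an information-criterion score. This is only a sketch under stated assumptions: the abstract's weights come from an unbiased estimator of the expected Kullback-Leibler distance, which is not given in this listing, so generic smoothed information-criterion weights are swapped in. The scores, coefficients, and function names are hypothetical.

```python
import math

def ic_weights(ic_values):
    """Turn information-criterion scores (lower is better) into weights that sum to 1."""
    best = min(ic_values)
    raw = [math.exp(-0.5 * (v - best)) for v in ic_values]
    total = sum(raw)
    return [r / total for r in raw]

def averaged_mean(x_row, coef_sets, weights):
    """Weighted average of the Poisson means exp(x'beta_m) of the candidate models."""
    means = [math.exp(sum(b * x for b, x in zip(beta, x_row))) for beta in coef_sets]
    return sum(w * m for w, m in zip(weights, means))

# Hypothetical scores and coefficients for three candidate models (not from the paper).
weights = ic_weights([210.4, 208.9, 214.0])
coefs = [[0.5, 0.2, 0.0], [0.4, 0.25, 0.1], [0.6, 0.0, 0.0]]
prediction = averaged_mean([1.0, 2.0, 0.5], coefs, weights)
print(weights, prediction)
```

The candidate with the lowest score receives the largest weight, and the averaged prediction always lies between the smallest and largest single-model predictions.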

Keywords: model averaging, Poisson regression, Kullback-Leibler distance, statistics

Procedia PDF Downloads 509

3167 Establishment of the Regression Uncertainty of the Critical Heat Flux Power Correlation for an Advanced Fuel Bundle

Authors: L. Q. Yuan, J. Yang, A. Siddiqui

Abstract:

A new regression uncertainty analysis methodology was applied to determine the uncertainties of the critical heat flux (CHF) power correlation for an advanced 43-element bundle design, which was developed by Canadian Nuclear Laboratories (CNL) to achieve improved economics, resource utilization and energy sustainability. The new methodology is considered more appropriate than the traditional methodology in the assessment of the experimental uncertainty associated with regressions. The methodology was first assessed using both the Monte Carlo Method (MCM) and the Taylor Series Method (TSM) for a simple linear regression model, and then extended successfully to a non-linear CHF power regression model (CHF power as a function of inlet temperature, outlet pressure and mass flow rate). The regression uncertainty assessed by MCM agrees well with that by TSM. An equation to evaluate the CHF power regression uncertainty was developed and expressed as a function of independent variables that determine the CHF power.
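The comparison of the Monte Carlo Method and the Taylor Series Method for a simple linear regression can be sketched as follows. This is a minimal illustration, not the authors' methodology: it assumes independent Gaussian noise of known standard deviation on the response only, and the data values and function names are hypothetical. Under that assumption the first-order propagation gives u(b) = sigma_y / sqrt(Sxx) for the slope b.

```python
import math
import random

def slope(xs, ys):
    """Ordinary least squares slope of y on x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def slope_uncertainty_tsm(xs, sigma_y):
    """Taylor series (first-order) propagation: u(b) = sigma_y / sqrt(Sxx)."""
    mx = sum(xs) / len(xs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sigma_y / math.sqrt(sxx)

def slope_uncertainty_mcm(xs, ys, sigma_y, trials=5000, seed=0):
    """Monte Carlo: perturb the responses, refit, and take the spread of the slopes."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(trials):
        noisy = [y + rng.gauss(0.0, sigma_y) for y in ys]
        slopes.append(slope(xs, noisy))
    mean = sum(slopes) / trials
    return math.sqrt(sum((s - mean) ** 2 for s in slopes) / (trials - 1))

# Hypothetical measurements (not CHF data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
u_tsm = slope_uncertainty_tsm(xs, 0.2)
u_mcm = slope_uncertainty_mcm(xs, ys, 0.2)
print(u_tsm, u_mcm)
```

Because the slope is linear in the responses, the two estimates should agree up to Monte Carlo sampling error in this simple case; for a non-linear model the agreement is something to check rather than assume.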

Keywords: CHF experiment, CHF correlation, regression uncertainty, Monte Carlo Method, Taylor Series Method

Procedia PDF Downloads 408

3166 Non-Parametric Regression over Its Parametric Counterparts with Large Sample Size

Authors: Jude Opara, Esemokumo Perewarebo Akpos

Abstract:

This paper compares non-parametric linear regression with its parametric counterpart for a large sample size. A data set of anthropometric measurements of primary school pupils was used for the analysis, based on 50 randomly selected pupils. The data were subjected to a normality test using the Anderson-Darling technique, which showed that the residuals from the commonly used least squares method for fitting an equation to a set of (x,y) data points are not normally distributed (i.e. they do not follow a Gaussian distribution). The algorithms for the nonparametric Theil’s regression and for its parametric OLS counterpart are stated in this paper. The analysis was carried out using the programming language software known as “R Development”. The results showed a significant relationship between the response and the explanatory variable for both the parametric and non-parametric regression. To compare the efficiency of the two methods, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) were used, and the nonparametric regression was found to perform better than its parametric counterpart because of its lower AIC and BIC values. The study recommends that future researchers carry out similar work that examines the data set for outliers, removes any that are detected, and re-analyzes the data to compare results.
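The contrast between Theil's regression and OLS can be sketched in a few lines. The version below assumes the common form of Theil's estimator (slope as the median of all pairwise slopes, intercept as the median of y - slope * x); the abstract does not state which variant the paper uses, and the data are hypothetical, with one deliberate outlier.

```python
from statistics import median

def theil_fit(xs, ys):
    """Theil's estimator: median of pairwise slopes, then median residual intercept."""
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[j] != xs[i]
    ]
    b = median(slopes)
    a = median(y - b * x for x, y in zip(xs, ys))
    return a, b

def ols_fit(xs, ys):
    """Ordinary least squares intercept and slope."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Hypothetical data on the line y = 1 + 2x, with the last point replaced by an outlier.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [3, 5, 7, 9, 11, 13, 40]
theil_a, theil_b = theil_fit(xs, ys)
ols_a, ols_b = ols_fit(xs, ys)
print(theil_a, theil_b, ols_a, ols_b)
```

On these data the median-based fit recovers the underlying line while the least squares slope is pulled upward by the single outlier, which is the kind of robustness that motivates a nonparametric alternative when residuals are not normal.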

Keywords: Theil’s regression, Bayesian information criterion, Akaike information criterion, OLS

Procedia PDF Downloads 299

3165 The Relationship between Proximity to Sources of Industrial-Related Outdoor Air Pollution and Children Emergency Department Visits for Asthma in the Census Metropolitan Area of Edmonton, Canada, 2004/2005 to 2009/2010

Authors: Laura A. Rodriguez-Villamizar, Alvaro Osornio-Vargas, Brian H. Rowe, Rhonda J. Rosychuk

Abstract:

Introduction/Objectives: The Census Metropolitan Area of Edmonton (CMAE) has important industrial emissions to the air from the Industrial Heartland Alberta (IHA) at the Northeast and the coal-fired power plants (CFPP) at the West. The objective of the study was to explore the presence of clusters of children asthma ED visits in the areas around the IHA and the CFPP. Methods: Retrospective data on children asthma ED visits were collected at the dissemination area (DA) level for children between 2 and 14 years of age, living in the CMAE between April 1, 2004, and March 31, 2010. We conducted a spatial analysis of disease clusters around putative sources with count (ecological) data using descriptive, hypothesis testing, and multivariable modeling analysis. Results: The mean crude rate of asthma ED visits was 9.3/1,000 children population per year during the study period. The circular spatial scan test for cases and events identified a cluster of children asthma ED visits in the DA where the CFPP are located in the Wabamun area. No clusters were identified around the IHA area. The multivariable models suggest that there is a significant decline in risk for children asthma ED visits as distance increases around the CFPP area; this effect is modified at the SE direction with mean angle 125.58 degrees, where the risk increases with distance. In contrast, the regression models for IHA suggest that there is a significant increase in risk for children asthma ED visits as distance increases around the IHA area, and this effect is modified at the SW direction with mean angle 216.52 degrees, where the risk increases at shorter distances. Conclusions: Different methods for detecting clusters of disease consistently suggested the existence of a cluster of children asthma ED visits around the CFPP but not around the IHA within the CMAE.
These results are probably explained by the direction of the air pollutants dispersion caused by the predominant and subdominant wind direction at each point. The use of different approaches to detect clusters of disease is valuable to have a better understanding of the presence, shape, direction and size of clusters of disease around pollution sources.

Keywords: air pollution, asthma, disease cluster, industry

Procedia PDF Downloads 280

3164 Cigarette Smoking and Alcohol Use among Mauritian Adolescents: Analysis of 2017 WHO Global School-Based Student Health Survey

Authors: Iyanujesu Adereti, Tajudeen Basiru, Ayodamola Olanipekun

Abstract:

Background: Substance abuse among adolescents is of public health concern globally. Although alcohol and cigarettes are the substances most abused by adolescents, there are limited studies on the prevalence of alcohol use and cigarette smoking among adolescents in Mauritius. Objectives: To determine the prevalence of cigarette smoking, alcohol use and associated correlates among school-going adolescents in Mauritius. Methodology: Data obtained from the 2017 WHO Global School-based Student Health Survey (GSHS) of 3,012 school-going adolescents in Mauritius were analyzed using STATA. Descriptive statistics were used to obtain prevalence. Bivariate and multivariate logistic regression analysis was used to evaluate predictors of cigarette smoking and alcohol use. Results: Prevalence of alcohol consumption and cigarette smoking were 26.0% and 17.1%, respectively. Smoking and alcohol use were more prevalent among males, younger adolescents, and those in higher school grades (p-value <.000). In multivariable logistic regression, male gender was associated with a higher risk of cigarette smoking (adjusted Odds Ratio (aOR) [95% Confidence Interval (CI)]= 1.51[1.06-2.14]) but lower risk of alcohol use (aOR[95%CI]= 0.69[0.53-0.90]) while older age (mid and late adolescence) and parental smoking were found to be associated with increased risk of alcohol use (aOR[95%CI]= 1.94[1.34-2.99] and 1.36[1.05-1.78] respectively). Marijuana use, truancy, being in a fight and suicide ideation were associated with increased odds of alcohol use (aOR[95%CI]= 3.82[3.39-6.09]; 2.15[1.62-2.87]; 1.83[1.34-2.49] and 1.93[1.38-2.69] respectively) and cigarette smoking (aOR[95%CI]= 17.28[10.4-28.51]; 1.73[1.21-2.49]; 1.67[1.14-2.45] and 2.17[1.43-3.28] respectively) while involvement in sexual activity was associated with reduced risk of alcohol use (aOR[95%CI]= 0.50[0.37-0.68]) and cigarette smoking (aOR[95%CI]= 0.47[0.33-0.69]).
Parental support and parental monitoring were uniquely associated with lower risk of cigarette smoking (aOR[95%CI]= 0.69[0.47-0.99] and 0.62[0.43-0.91] respectively). Conclusion: The high prevalence of alcohol use and cigarette smoking in this study shows the need for the government of Mauritius to enhance policies that will help address this issue, taking into account the various risk and protective factors.
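For readers unfamiliar with the "odds ratio [95% CI]" notation used above, the sketch below computes a crude (unadjusted) odds ratio and its Wald confidence interval from a 2x2 table. It is only an illustration of the notation: the adjusted odds ratios in the abstract come from a multivariable logistic regression fitted in STATA, which this does not reproduce, and the counts are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald interval on the log scale.

    a: exposed with outcome,   b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts (not taken from the survey).
or_value, ci_low, ci_high = odds_ratio_ci(60, 240, 40, 360)
print(f"{or_value:.2f}[{ci_low:.2f}-{ci_high:.2f}]")
```

An interval that excludes 1 corresponds to an association that is statistically significant at the 5% level, which is how the bracketed ranges in the abstract can be read.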

Keywords: adolescent health, alcohol use, cigarette smoking, global school-based student health survey

Procedia PDF Downloads 241

3163 Relationship and Associated Factors of Breastfeeding Self-efficacy among Postpartum Couples in Malawi: A Cross-sectional Study

Authors: Roselyn Chipojola, Shu-yu Kuo

Abstract:

Background: Breastfeeding self-efficacy in both mothers and fathers plays a crucial role in improving exclusive breastfeeding rates. However, less is known about the relationship and predictors of paternal and maternal breastfeeding self-efficacy. This study aimed to examine the relationship and associated factors of breastfeeding self-efficacy (BSE) among mothers and fathers in Malawi. Methods: A cross-sectional study was conducted on 180 pairs of postpartum mothers and fathers at a tertiary maternity facility in central Malawi. BSE was measured using the Breastfeeding Self-Efficacy Scale Short-Form. Depressive symptoms were assessed by the Edinburgh Postnatal Depression Scale. A structured questionnaire was used to collect demographic and health variables. Data were analyzed using multivariable logistic regression and multinomial logistic regression. Results: A higher score of self-efficacy was found in mothers (mean=55.7, Standard Deviation (SD)=6.5) compared to fathers (mean=50.2, SD=11.9). A significant association between paternal and maternal breastfeeding self-efficacy was found (r=0.32). Age, employment status, and mode of birth were significantly related to maternal and paternal BSE, respectively. Older age and caesarean section delivery were significant factors of combined BSE scores in couples. A higher BSE score in either the mother or her partner predicted higher exclusive breastfeeding rates. BSE scores were lower when couples’ depressive symptoms were high. Conclusion: BSE scores are highly correlated between Malawian mothers and fathers, with a relatively higher score in maternal BSE. Importantly, a high BSE in couples predicted higher odds of exclusive breastfeeding, which highlights the need to include both mothers and fathers in future breastfeeding promotion strategies.

Keywords: paternal, maternal, exclusive breastfeeding, breastfeeding self‑efficacy, Malawi

Procedia PDF Downloads 63

3162 Use of Multistage Transition Regression Models for Credit Card Income Prediction

Authors: Denys Osipenko, Jonathan Crook

Abstract:

Because of the variety of the card holders’ behaviour types and income sources, each consumer account can be transferred to a variety of states. Each consumer account can be inactive, transactor, revolver, delinquent, or defaulted, and each state requires an individual model for the income prediction. The estimation of transition probabilities between statuses at the account level helps to avoid the memorylessness of the Markov Chains approach. This paper investigates the transition probabilities estimation approaches to credit cards income prediction at the account level. The key question of empirical research is which approach gives more accurate results: multinomial logistic regression or multistage conditional logistic regression with binary target. Both models have shown moderate predictive power. Prediction accuracy for conditional logistic regression depends on the order of stages for the conditional binary logistic regression. On the other hand, multinomial logistic regression is easier to use and gives integrated estimates for all states without priorities. Thus further investigations can be concentrated on alternative modeling approaches such as discrete choice models.
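The multistage conditional approach described above can be read as a chain of binary decisions: each stage gives the probability of one state given that none of the earlier states applied. The sketch below shows how such conditional probabilities combine into a full distribution over states, and why the stage order matters. The stage order and probabilities are hypothetical; in the paper they would come from fitted binary logistic models.

```python
def multistage_state_probs(conditional_probs):
    """Combine stage-wise conditional probabilities into unconditional state probabilities.

    conditional_probs[k] is P(state k | account is in none of states 0..k-1).
    The final state receives whatever probability mass remains.
    """
    probs = []
    remaining = 1.0
    for p in conditional_probs:
        probs.append(remaining * p)
        remaining *= 1.0 - p
    probs.append(remaining)
    return probs

# Hypothetical stage order: inactive, transactor, revolver, delinquent, then defaulted.
state_probs = multistage_state_probs([0.2, 0.5, 0.6, 0.75])
print(state_probs)
```

Reordering the stages changes which binary models are fitted and on which subsets of accounts, which is one way to see why the abstract reports that accuracy depends on the order of stages.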

Keywords: multinomial regression, conditional logistic regression, credit account state, transition probability

Procedia PDF Downloads 478

3161 Semiparametric Regression Of Truncated Spline Biresponse On Farmer Loyalty And Attachment Modeling

Authors: Adji Achmad Rinaldo Fernandes

Abstract:

Regression analysis is a statistical method that is able to describe and predict causal relationships between individuals. Not all relationships have a known curve shape; often the shape of the relationship curve is unknown. In addition, a cause can have an impact on more than one effect, so the effects can also be closely related to each other. Such relationships can be approached with truncated spline biresponse semiparametric regression. The purpose of this study is to examine the function estimator and determine the best model of truncated spline biresponse semiparametric regression. The results of the secondary data study showed that the best model has a highest order of quadratic and a maximum of two knots, with a goodness-of-fit value in the form of Adjusted R2 of 88.5%.

Keywords: biresponse, farmer attachment, farmer loyalty, truncated spline

Procedia PDF Downloads 27

3160 Internet Purchases in European Union Countries: Multiple Linear Regression Approach

Authors: Ksenija Dumičić, Anita Čeh Časni, Irena Palić

Abstract:

This paper examines the influence of economic and Information and Communication Technology (ICT) development on recently increasing Internet purchases by individuals in European Union member states. After a growing trend for Internet purchases in EU27 was noticed, all possible regressions analysis was applied using nine independent variables in 2011. Finally, two linear regression models were studied in detail. The simple linear regression analysis confirmed the research hypothesis that Internet purchases in the analysed EU countries are positively correlated with the statistically significant variable Gross Domestic Product per capita (GDPpc). Also, the analysed multiple linear regression model with four regressors, showing ICT development level, indicates that ICT development is crucial for explaining Internet purchases by individuals, confirming the research hypothesis.

Keywords: European Union, Internet purchases, multiple linear regression model, outlier

Procedia PDF Downloads 296

3159 Copula-Based Estimation of Direct and Indirect Effects in Path Analysis Models

Authors: Alam Ali, Ashok Kumar Pathak

Abstract:

Path analysis is a statistical technique used to evaluate the direct and indirect effects of variables in path models. One or more structural regression equations are used to estimate a series of parameters in path models to find the better fit of data. However, sometimes the assumptions of classical regression models, such as ordinary least squares (OLS), are violated by the nature of the data, resulting in insignificant direct and indirect effects of exogenous variables. This article aims to explore the effectiveness of a copula-based regression approach as an alternative to classical regression, specifically when variables are linked through an elliptical copula.

Keywords: path analysis, copula-based regression models, direct and indirect effects, k-fold cross validation technique

Procedia PDF Downloads 30

3158 Optimization of Slider Crank Mechanism Using Design of Experiments and Multi-Linear Regression

Authors: Galal Elkobrosy, Amr M. Abdelrazek, Bassuny M. Elsouhily, Mohamed E. Khidr

Abstract:

Crank shaft length, connecting rod length, crank angle, engine rpm, cylinder bore, mass of piston and compression ratio are the inputs that can control the performance of the slider crank mechanism and then its efficiency. Several combinations of these seven inputs are used and compared. The throughput engine torque predicted by the simulation is analyzed through two different regression models, with and without interaction terms, developed according to multi-linear regression using LU decomposition to solve the system of algebraic equations. These models are validated. A regression model in seven inputs including their interaction terms lowered the polynomial degree from 3rd degree to 1st degree and suggested valid predictions and stable explanations.
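Fitting a multi-linear regression by solving the normal equations with LU decomposition can be sketched as below. This is a minimal illustration under assumptions: it uses two hypothetical inputs and one interaction term rather than the seven engine inputs, the data are synthetic, and partial pivoting is added for numerical safety even though the abstract does not say which LU variant was used.

```python
def lu_solve(a, b):
    """Solve a x = b by in-place LU decomposition with partial pivoting."""
    n = len(a)
    lu = [row[:] for row in a]
    rhs = b[:]
    for k in range(n):
        # Pick the largest pivot in column k and swap it into place.
        pivot = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if abs(lu[pivot][k]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        lu[k], lu[pivot] = lu[pivot], lu[k]
        rhs[k], rhs[pivot] = rhs[pivot], rhs[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]          # multiplier stored in the L part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    # Forward substitution with unit lower-triangular L.
    y = rhs[:]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    # Back substitution with upper-triangular U.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / lu[i][i]
    return x

def fit_linear(rows, ys):
    """Least squares coefficients (intercept first) from the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return lu_solve(xtx, xty)

# Synthetic data: y = 1 + 2*x1 - 3*x2 + 0.5*x1*x2 on a 3x3 grid.
grid = [(x1, x2) for x1 in (0.0, 1.0, 2.0) for x2 in (0.0, 1.0, 2.0)]
rows = [(x1, x2, x1 * x2) for x1, x2 in grid]
ys = [1 + 2 * x1 - 3 * x2 + 0.5 * x1 * x2 for x1, x2 in grid]
coefficients = fit_linear(rows, ys)
print(coefficients)
```

Treating the product x1*x2 as an extra column keeps the model linear in its coefficients, which is how interaction terms can be included without raising the degree of the fitted polynomial in each regressor.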

Keywords: design of experiments, regression analysis, SI engine, statistical modeling

Procedia PDF Downloads 178

3157 Determinants of Diarrhoea Prevalence Variations in Mountainous Informal Settlements of Kigali City, Rwanda

Authors: Dieudonne Uwizeye

Abstract:

Introduction: Diarrhoea is one of the major causes of morbidity and mortality among communities living in urban informal settlements of developing countries. It is assumed that mountainous environment introduces variations of the burden among residents of the same settlements. Design and Objective: A cross-sectional study was done in Kigali to explore the effect of mountainous informal settlements on diarrhoea risk variations. Data were collected among 1,152 households through household survey and transect walk to observe the status of sanitation. The outcome variable was the incidence of diarrhoea among household members of any age. The study used the most knowledgeable person in the household as the main respondent. Mostly this was the woman of the house as she was more likely to know the health status of every household member as she plays various roles: mother, wife, and head of the household among others. The analysis used cross tabulation and logistic regression analysis. Results: Results suggest that risks for diarrhoea vary depending on home location in the settlements. Diarrhoea risk increased as the distance from the road increased. The results of the logistic regression analysis indicate the adjusted odds ratio of 2.97 with 95% confidence interval being 1.35-6.55 and 3.50 adjusted odds ratio with 95% confidence interval being 1.61-7.60 in level two and three respectively compared with level one. The status of sanitation within and around homes was also significantly associated with the increase of diarrhoea. Equally, it is indicated that stable households were less likely to have diarrhoea. The logistic regression analysis indicated the adjusted odds ratio of 0.45 with 95% confidence interval being 0.25-0.81. However, the study did not find evidence for a significant association between diarrhoea risks and household socioeconomic status in the multivariable model. It is assumed that environmental factors in mountainous settings prevailed. 
Households using the available public water sources were more likely to have diarrhoea in their households. Recommendation: The study recommends the provision and extension of infrastructure for improved water, drainage, sanitation and wastes management facilities. Equally, studies should be done to identify the level of contamination and potential origin of contaminants for water sources in the valleys to adequately control the risks for diarrhoea in mountainous urban settings.

Keywords: urbanisation, diarrhoea risk, mountainous environment, urban informal settlements in Rwanda

Procedia PDF Downloads 164

3156 An Epsilon Hierarchical Fuzzy Twin Support Vector Regression

Authors: Arindam Chaudhuri

Abstract:

The research presents epsilon-hierarchical fuzzy twin support vector regression (epsilon-HFTSVR) based on epsilon-fuzzy twin support vector regression (epsilon-FTSVR) and epsilon-twin support vector regression (epsilon-TSVR). Epsilon-FTSVR is achieved by incorporating trapezoidal fuzzy numbers to epsilon-TSVR which takes care of uncertainty existing in forecasting problems. Epsilon-FTSVR determines a pair of epsilon-insensitive proximal functions by solving two related quadratic programming problems. The structural risk minimization principle is implemented by introducing regularization term in primal problems of epsilon-FTSVR. This yields dual stable positive definite problems which improves regression performance. Epsilon-FTSVR is then reformulated as epsilon-HFTSVR consisting of a set of hierarchical layers each containing epsilon-FTSVR. Experimental results on both synthetic and real datasets reveal that epsilon-HFTSVR has remarkable generalization performance with minimum training time.

Keywords: regression, epsilon-TSVR, epsilon-FTSVR, epsilon-HFTSVR

Procedia PDF Downloads 363

3155 The Prognostic Prediction Value of Positive Lymph Nodes Numbers for the Hypopharyngeal Squamous Cell Carcinoma

Authors: Wendu Pang, Yaxin Luo, Junhong Li, Yu Zhao, Danni Cheng, Yufang Rao, Minzi Mao, Ke Qiu, Yijun Dong, Fei Chen, Jun Liu, Jian Zou, Haiyang Wang, Wei Xu, Jianjun Ren

Abstract:

We aimed to compare the prognostic prediction value of positive lymph node number (PLNN) to the American Joint Committee on Cancer (AJCC) tumor, lymph node, and metastasis (TNM) staging system for patients with hypopharyngeal squamous cell carcinoma (HPSCC). A total of 826 patients with HPSCC from the Surveillance, Epidemiology, and End Results database (2004–2015) were identified and split into two independent cohorts: training (n=461) and validation (n=365). Univariate and multivariate Cox regression analyses were used to evaluate the prognostic effects of PLNN in patients with HPSCC. We further applied six Cox regression models to compare the survival predictive values of the PLNN and AJCC TNM staging system. PLNN showed a significant association with overall survival (OS) and cancer-specific survival (CSS) (P < 0.001) in both univariate and multivariable analyses, and was divided into three groups (PLNN 0, PLNN 1-5, and PLNN>5). In the training cohort, multivariate analysis revealed that the increased PLNN of HPSCC gave rise to significantly poor OS and CSS after adjusting for age, sex, tumor size, and cancer stage; this trend was also verified by the validation cohort. Additionally, the survival model incorporating a composite of PLNN and TNM classification (C-index, 0.705, 0.734) performed better than the PLNN and AJCC TNM models. PLNN can serve as a powerful survival predictor for patients with HPSCC and is a surrogate supplement for cancer staging systems.

Keywords: hypopharyngeal squamous cell carcinoma, positive lymph nodes number, prognosis, prediction models, survival predictive values

Procedia PDF Downloads 144

3154 Association of Maternal Age, Ethnicity and BMI with Gestational Diabetes Prevalence in Multi-Racial Singapore

Authors: Nur Atiqah Adam, Mor Jack Ng, Bernard Chern, Kok Hian Tan

Abstract:

Introduction: Gestational diabetes (GDM) is a common pregnancy complication with short and long-term health consequences for both mother and fetus. Factors such as family history of diabetes mellitus, maternal obesity, maternal age, ethnicity and parity have been reported to influence the risk of GDM. In a multi-racial country like Singapore, it is worthwhile to study the GDM prevalences of different ethnicities. We aim to investigate the influence of ethnicity on the racial prevalences of GDM in Singapore. This is important as it may help us to improve guidelines on GDM healthcare services according to significant risk factors unique to Singapore. Materials and Methods: Obstetric cohort data of 926 singleton deliveries in KK Women’s and Children’s Hospital (KKH) from 2011 to 2013 was obtained. Only patients aged 18 and above and without complicated pregnancies or chronic illnesses were targeted. Factors such as ethnicity, maternal age, parity and maternal body mass index (BMI) at booking visit were studied. A multivariable logistic regression model, adjusted for confounders, was used to determine which of these factors are significantly associated with an increased risk of GDM. Results: The overall GDM prevalence rate based on WHO 1999 criteria & at risk screening (race alone not a risk factor) was 8.86%. GDM rates were higher among women above 35 years old (15.96%), obese (15.15%) and multiparous women (10.12%). Indians had a higher GDM rate (13.0 %) compared to the Chinese (9.57%) and Malays (5.20%). However, using multiple logistic regression model, variables that are significantly related to GDM rates were maternal age (p < 0.001) and maternal BMI at booking visit (p = 0.006). Conclusion: Maternal age (p < 0.001) and maternal booking BMI (p = 0.006) are the strongest risk factors for GDM. Ethnicity per se does not seem to have a significant influence on the prevalence of GDM in Singapore (p = 0.064). 
Hence we should tailor guidelines on GDM healthcare services according to maternal age and booking BMI rather than ethnicity.

Keywords: ethnicity, gestational diabetes, healthcare, pregnancy

Procedia PDF Downloads 223

3153 Nonparametric Truncated Spline Regression Model on the Data of Human Development Index in Indonesia

Authors: Kornelius Ronald Demu, Dewi Retno Sari Saputro, Purnami Widyaningsih

Abstract:

Human Development Index (HDI) is a standard measurement for a country's human development. Several factors may influence it, such as life expectancy, gross domestic product (GDP) based on the province's annual expenditure, the number of poor people, and the percentage of illiterate people. The scatter plots between HDI and these factors do not follow a specific pattern or form. Therefore, the HDI data in Indonesia can be modeled with a nonparametric regression model. The estimation of the regression curve in the nonparametric regression model is flexible because it follows the shape of the data pattern. One nonparametric regression method is the truncated spline, which is a modification of segmented polynomial functions. The estimator of a truncated spline regression model is affected by the selection of the optimal knot points. A knot point is a focus point of the truncated spline function. The optimal knot points were determined by the minimum value of generalized cross validation (GCV). In this article, a truncated spline nonparametric regression model was applied to the HDI data. The best truncated spline regression model for the HDI data in Indonesia was obtained with the combination of optimal knot points 5-5-5-4. Life expectancy and the percentage of illiterate people were the factors that significantly affect the HDI in Indonesia. The coefficient of determination is 94.54%. This means the regression model is good enough to be applied to the HDI data in Indonesia.
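To make the two ingredients named above concrete, the sketch below evaluates a truncated spline for one predictor and computes a GCV score from a residual sum of squares. It assumes the standard truncated power basis and the usual GCV form, mean squared error divided by (1 - trace(A)/n) squared, where A is the hat matrix; the coefficients, knot, and numbers are hypothetical and the fitting step itself is omitted.

```python
def truncated_spline(x, poly_coefs, knot_coefs, knots, degree=1):
    """Polynomial part plus truncated power terms c_k * max(x - knot_k, 0)**degree."""
    value = sum(c * x ** j for j, c in enumerate(poly_coefs))
    value += sum(c * max(x - k, 0.0) ** degree for c, k in zip(knot_coefs, knots))
    return value

def gcv_score(rss, n, trace_hat):
    """Generalized cross validation: (rss / n) / (1 - trace_hat / n) ** 2."""
    return (rss / n) / (1.0 - trace_hat / n) ** 2

# Hypothetical linear spline with one knot at x = 5: slope 2 before the knot, -1 after.
before_knot = truncated_spline(4.0, [1.0, 2.0], [-3.0], [5.0])
after_knot = truncated_spline(6.0, [1.0, 2.0], [-3.0], [5.0])
score = gcv_score(rss=8.0, n=20, trace_hat=4.0)
print(before_knot, after_knot, score)
```

In a knot search, a score like this would be computed for each candidate set of knot points and the set with the smallest value kept, since more knots lower the residual sum of squares but raise the trace term.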

Keywords: generalized cross validation (GCV), Human Development Index (HDI), knots point, nonparametric regression, truncated spline

Procedia PDF Downloads 328

3152 Regression Model Evaluation on Depth Camera Data for Gaze Estimation

Authors: James Purnama, Riri Fitri Sari

Abstract:

We investigate the machine learning algorithm selection problem in terms of depth image based eye gaze estimation, with respect to its essential difficulty in reducing the number of required training samples and the duration of training. Statistics-based prediction accuracy measures are increasingly used to assess and evaluate prediction or estimation in gaze estimation. This article uses Root Mean Squared Error (RMSE) and R-Squared statistical analysis to assess machine learning methods on depth camera data for gaze estimation. Four machine learning methods were evaluated: Random Forest Regression, Regression Tree, Support Vector Machine (SVM), and Linear Regression. The experimental results show that Random Forest Regression has the lowest RMSE and the highest R-Squared, which means that it is the best among the methods evaluated.
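The two evaluation statistics named above can be computed as follows. This is a generic sketch of the standard definitions, with hypothetical values standing in for true and predicted gaze coordinates; it does not reproduce the Orange-based workflow mentioned in the keywords.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical targets and predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
error = rmse(actual, predicted)
fit = r_squared(actual, predicted)
print(error, fit)
```

A lower RMSE and a higher R-Squared both indicate predictions closer to the targets, so a method that wins on both, as Random Forest Regression does in the abstract, is ranked best without any trade-off between the two.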

Keywords: gaze estimation, gaze tracking, eye tracking, kinect, regression model, orange python

Procedia PDF Downloads 531

3151 Epidemiological Investigation of Abortion in Ewes in Algeria

Authors: Laatra Zemmouri, Said Boukhechem, Samia Haffaf, Mohamed Lafri

Abstract:

A study was conducted in order to determine the prevalence and risk factors associated with abortion in ewes in the region of M’sila, located in central-eastern Algeria. A questionnaire was carried out to obtain information about the occurrence of abortion, sheep housing conditions, vaccination, feeding and management practices, and whether the farmers kept other livestock. This cross-sectional study was conducted for 36 months (between 2016 and 2019). A total of 71 sheep flocks were visited. Among 8168 ewes, we recorded 734 (8.99%) abortions and 3861 lambings. The risk factor analysis using multivariable logistic regression showed an association between abortion and vaccination against brucellosis (CI 95%= 2,76-1,35; p<0,001). Abortion decreased when dogs were owned (CI 95%= 0,36-0,84; p= 0.006); however, abortion increased with the presence of cats on farms (CI 95%= 1,24-2,8; p=0.003). There was a significant association between abortion and keeping goats (CI 95%= 1,18-2,40; p= 0.004), bovines (CI 95%= 0,3-0,68; p<0,001) and poultry (CI 95%= 0,39-0,77; p= 0.001) on farms. This study also found a strong association between the occurrence of abortion and estrus synchronization, stillbirth occurrence, and feed supplementation (p<0.05). Identification of the causes of abortion is an important task to reduce foetal losses and to improve livestock productivity.

Keywords: abortion, ewes, questionnaire, risk factors

Procedia PDF Downloads 219

3150 Generalized Extreme Value Regression with Binary Dependent Variable: An Application for Predicting Meteorological Drought Probabilities

Authors: Retius Chifurira

Abstract:

The logistic regression model is the most widely used regression model for predicting meteorological drought probabilities. When the dependent variable is extreme, the logistic model fails to adequately capture drought probabilities. In order to adequately predict drought probabilities, we use the generalized linear model (GLM) with the quantile function of the generalized extreme value distribution (GEVD) as the link function. The method of maximum likelihood estimation is used to estimate the parameters of the generalized extreme value (GEV) regression model. We compare the performance of the logistic and the GEV regression models in predicting drought probabilities for Zimbabwe. The performance of the regression models is assessed using the goodness-of-fit tests, namely relative root mean square error (RRMSE) and relative mean absolute error (RMAE). Results show that the GEV regression model performs better than the logistic model, thereby providing a good alternative candidate for predicting drought probabilities. This paper provides the first application of GLM derived from extreme value theory to predict drought probabilities for a drought-prone country such as Zimbabwe.
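The difference between a logistic link and a GEV quantile link can be illustrated with the pair of functions below. The sketch assumes one common formulation, in which the probability is the standard GEV distribution function evaluated at the linear predictor, p = exp(-(1 + xi * eta) ** (-1 / xi)), with shape parameter xi; the abstract does not spell out the exact parameterization, so the formula and the parameter values here are assumptions.

```python
import math

def gev_inverse_link(eta, xi):
    """Probability from linear predictor eta using the standard GEV distribution function."""
    if abs(xi) < 1e-12:                      # Gumbel limit as xi -> 0
        return math.exp(-math.exp(-eta))
    t = 1.0 + xi * eta
    if t <= 0.0:                             # eta lies outside the support
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_link(p, xi):
    """GEV quantile function: linear predictor that corresponds to probability p."""
    if abs(xi) < 1e-12:
        return -math.log(-math.log(p))
    return ((-math.log(p)) ** (-xi) - 1.0) / xi

def logistic_inverse_link(eta):
    """Symmetric logistic response for comparison."""
    return 1.0 / (1.0 + math.exp(-eta))

# Round trip for a rare event and the response at eta = 0 (hypothetical shape values).
eta_rare = gev_link(0.05, -0.3)
p_back = gev_inverse_link(eta_rare, -0.3)
p_gev_zero = gev_inverse_link(0.0, 0.2)
p_logit_zero = logistic_inverse_link(0.0)
print(eta_rare, p_back, p_gev_zero, p_logit_zero)
```

The logistic curve is symmetric about a probability of 0.5, whereas the GEV response is skewed and approaches 0 and 1 at different rates, which is the property that makes it attractive when the event of interest is rare or extreme.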

Keywords: generalized extreme value distribution, general linear model, mean annual rainfall, meteorological drought probabilities

Procedia PDF Downloads 192

3149 The Extended Skew Gaussian Process for Regression

Authors: M. T. Alodat

Abstract:

In this paper, we propose a generalization of the Gaussian process regression (GPR) model called the extended skew Gaussian process for regression (ESGPr) model. The ESGPr model works better than the GPR model when the errors are skewed. We derive the predictive distribution for the ESGPr model at a new input. We also apply the ESGPr model to FOREX data and find that it fits the FOREX data better than the GPR model.

Keywords: extended skew normal distribution, Gaussian process for regression, predictive distribution, ESGPR model

Procedia PDF Downloads 546

3148 Integrated Nested Laplace Approximations For Quantile Regression

Authors: Kajingulu Malandala, Ranganai Edmore

Abstract:

The asymmetric Laplace distribution (ADL) is commonly used as the likelihood function in Bayesian quantile regression, and it offers different families of likelihood methods for quantile regression. Notwithstanding its popularity and practicality, the ADL is not smooth, which makes its likelihood difficult to maximize. Furthermore, Bayesian inference is time consuming, and the choice of likelihood may mislead the inference, as Bayes' theorem does not automatically establish the posterior inference. In addition, the ADL does not account for greater skewness and kurtosis. This paper develops a new quantile regression approach for count data based on the inverse of the cumulative distribution function of the Poisson, binomial and Delaporte distributions, using integrated nested Laplace approximations. Our results validate the benefit of using integrated nested Laplace approximations and support the approach for count data.
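For background on why the asymmetric Laplace likelihood is not smooth: maximizing it with respect to the location is equivalent to minimizing the check (pinball) loss, which has a kink at zero. The Python sketch below illustrates that loss on made-up numbers. It is not the count-data approach developed in the paper, and all function names are hypothetical.

```python
def check_loss(u, tau):
    """Check (pinball) loss for a residual u at quantile level tau in (0, 1)."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def total_check_loss(c, data, tau):
    """Sum of check losses when every observation is predicted by the constant c."""
    return sum(check_loss(y - c, tau) for y in data)

def best_constant(data, tau):
    """Search the observed values for the constant with the smallest total loss.

    The total loss is piecewise linear and convex in c, so a minimizer
    can always be found among the data points themselves.
    """
    return min(data, key=lambda c: total_check_loss(c, data, tau))

data = [1.0, 2.0, 3.0, 4.0, 100.0]   # made-up sample with one large value
median_fit = best_constant(data, 0.5)  # tau = 0.5 targets the median
upper_fit = best_constant(data, 0.9)   # tau = 0.9 targets an upper quantile
```

Because the loss is linear on each side of zero, its derivative jumps at the kink, which is why gradient-based maximization of the corresponding likelihood is awkward.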

Keywords: quantile regression, Delaporte distribution, count data, integrated nested Laplace approximation

Procedia PDF Downloads 156

3147 The Use of Geographically Weighted Regression for Deforestation Analysis: Case Study in Brazilian Cerrado

Authors: Ana Paula Camelo, Keila Sanches

Abstract:

Geographically Weighted Regression (GWR) was proposed in the geography literature to allow the relationships in a regression model to vary over space. In Brazil, the agricultural exploitation of the Cerrado Biome is the main cause of deforestation. In this study, we propose a methodology using geostatistical methods to characterize the spatial dependence of deforestation in the Cerrado based on agricultural production indicators. To this end, we used a set of exploratory spatial data analysis (ESDA) tools and a confirmatory analysis using GWR. We calibrated a non-spatial model, evaluated the nature of the regression curve, selected the variables by a stepwise process and carried out a multicollinearity analysis. After evaluating the non-spatial model, we fitted the spatial regression model, evaluated the intercept statistically and verified its effect on the calibration. Spearman’s correlation between deforestation and livestock was +0.783, and between deforestation and soybeans +0.405. The model presented R² = 0.936 and showed a strong spatial dependence of the agricultural activity of soybeans associated with maize and cotton crops. GWR is a very effective tool, presenting results closer to the reality of deforestation in the Cerrado than other analyses.

Keywords: deforestation, geographically weighted regression, land use, spatial analysis

Procedia PDF Downloads 354

3146 Social Participation and Associated Life Satisfaction among Older Adults in India: Moderating Role of Marital Status and Living Arrangements

Authors: Varsha Pandurang Nagargoje, K. S. James

Abstract:

Background: Social participation is considered one of the central components of successful and healthy aging. This study aimed to examine the moderating role of marital status and living arrangement in the relationship between social participation and life satisfaction, along with other potential factors associated with the life satisfaction of Indian older adults. Method: For the analyses, a nationally representative sample of 31,464 adults aged ≥60 years was extracted from the Longitudinal Ageing Study in India (LASI) wave 1, 2017-18. Descriptive statistics and bivariate analysis were performed to determine the proportion of life satisfaction. The first set of multivariable linear regression analyses examined Diener’s Satisfaction with Life Scale and its association with various predictor variables, including social participation, marital status, living arrangements, and socio-demographic, economic, and health-related variables. The second and third sets of regressions investigated the moderating role of marital status and living arrangements, respectively, in the association between social participation and the level of life satisfaction among Indian older adults. Results: Overall, the proportion of life satisfaction among older men was relatively higher than among their women counterparts across most background characteristics. Regression results stressed the importance of older adults’ involvement in social participation [β = 0.39, p < 0.05], being in a marital union [β = 0.68, p < 0.001] and co-residential living arrangements, either only with a spouse [β = 1.73, p < 0.001] or with other family members [β = 2.18, p < 0.001], for the improvement of life satisfaction. Results also showed that some factors were significant for life satisfaction, in particular increased age, a higher level of educational status, MPCE quintile, and caste category. A higher risk of life dissatisfaction among Indian older adults exposed to vulnerabilities such as tobacco consumption, poor self-rated health, and difficulty in performing ADL and IADL was of major concern. The interaction effects of social participation with marital status and with living arrangements showed that being currently married, and co-residing either with a spouse only or with other family members, remained important modifiable factors for life satisfaction irrespective of involvement in social participation. Conclusion: It would be crucial for policymakers and practitioners to advocate social policy programs and service delivery oriented towards meaningful social connections, especially for those Indian older adults who were living alone or not currently in a marital union, to enhance their overall life satisfaction.

Keywords: Indian, older adults, social participation, life satisfaction, marital status, living arrangement

Procedia PDF Downloads 125

3145 Weighted Rank Regression with Adaptive Penalty Function

Authors: Kang-Mo Jung

Abstract:

The use of regularization for statistical methods has become popular. The least absolute shrinkage and selection operator (LASSO) framework has become the standard tool for sparse regression. However, it is well known that the LASSO is sensitive to outliers and leverage points. We consider a new robust estimator composed of a weighted loss function of the pairwise differences of residuals and an adaptive penalty function that regulates the tuning parameter for each variable. Rank regression is resistant to regression outliers, but not to leverage points. By adopting a weighted loss function, the proposed method is robust to leverage points of the predictor variables. Furthermore, the adaptive penalty function gives good statistical properties in variable selection, such as the oracle property and consistency. We develop an efficient algorithm to compute the proposed estimator using basic functions in R. We use an optimal tuning parameter based on the Bayesian information criterion (BIC). Numerical simulation shows that the proposed estimator is effective for analyzing real data sets and contaminated data.
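The abstract does not state the objective function, but an estimator of the kind described could plausibly take the form below. This is a sketch under assumptions: the pairwise weights $w_{ij}$ and the per-variable tuning parameters $\lambda_k$ are placeholders whose exact definitions are not given in the abstract.

```latex
\hat{\beta} = \arg\min_{\beta}
  \sum_{i<j} w_{ij}\,\bigl| e_i(\beta) - e_j(\beta) \bigr|
  + \sum_{k=1}^{p} \lambda_k\,|\beta_k|,
\qquad e_i(\beta) = y_i - x_i^{\top}\beta
```

In this reading, the first term is the rank-type loss on pairwise differences of residuals, with $w_{ij}$ downweighting pairs that involve leverage points, and the second term is a LASSO-type penalty in which each coefficient has its own tuning parameter.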

Keywords: adaptive penalty function, robust penalized regression, variable selection, weighted rank regression

Procedia PDF Downloads 458

3144 MapReduce Logistic Regression Algorithms with RHadoop

Authors: Byung Ho Jung, Dong Hoon Lim

Abstract:

Logistic regression is a statistical method for analyzing a dataset in which one or more independent variables determine an outcome. Logistic regression is used extensively in numerous disciplines, including the medical and social science fields. In this paper, we address the problem of estimating parameters in logistic regression based on the MapReduce framework with RHadoop, which integrates R and the Hadoop environment and is applicable to large-scale data. There exist three learning algorithms for logistic regression, namely the gradient descent method, the cost minimization method and the Newton-Raphson method. The Newton-Raphson method does not require a learning rate, while the gradient descent and cost minimization methods require a learning rate to be picked manually. The experimental results demonstrated that our learning algorithms using RHadoop scale well and efficiently process large data sets on commodity hardware. We also compared the performance of our Newton-Raphson method with the gradient descent and cost minimization methods. The results showed that our Newton-Raphson method appeared to be the most robust across all data tested.
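The contrast between gradient descent, which needs a learning rate, and Newton-Raphson, which does not, can be sketched on a single machine as follows. This is a minimal Python illustration with one predictor and made-up data, not the authors' RHadoop implementation; the function names, the learning rate and the iteration counts are assumptions chosen for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient(beta, xs, ys):
    """Gradient of the negative log-likelihood for P(y=1) = sigmoid(b0 + b1*x)."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        r = sigmoid(beta[0] + beta[1] * x) - y
        g0 += r
        g1 += r * x
    return g0, g1

def fit_gradient_descent(xs, ys, learning_rate=0.1, steps=20000):
    """Plain gradient descent: the step size must be chosen by hand."""
    beta = [0.0, 0.0]
    n = len(xs)
    for _ in range(steps):
        g0, g1 = gradient(beta, xs, ys)
        beta[0] -= learning_rate * g0 / n
        beta[1] -= learning_rate * g1 / n
    return beta

def fit_newton_raphson(xs, ys, steps=25):
    """Newton-Raphson: the Hessian sets the step, so no learning rate is needed."""
    beta = [0.0, 0.0]
    for _ in range(steps):
        g0, g1 = gradient(beta, xs, ys)
        h00 = h01 = h11 = 0.0
        for x in xs:
            p = sigmoid(beta[0] + beta[1] * x)
            w = p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system H * step = gradient in closed form
        beta[0] -= (h11 * g0 - h01 * g1) / det
        beta[1] -= (h00 * g1 - h01 * g0) / det
    return beta

# Made-up, non-separable data so that the maximum likelihood estimate exists
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
beta_gd = fit_gradient_descent(xs, ys)
beta_nr = fit_newton_raphson(xs, ys)
```

Both the gradient and the Hessian are sums over observations, so in a MapReduce setting they could in principle be accumulated per data partition and then added together; how the paper actually organizes this is not described in the abstract.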

Keywords: big data, logistic regression, MapReduce, RHadoop

Procedia PDF Downloads 271

3143 A Generalized Weighted Loss for Support Vector Classification and Multilayer Perceptron

Authors: Filippo Portera

Abstract:

Standard algorithms usually employ a loss in which, for a regression task, each error is simply the absolute difference between the true value and the prediction. Here, we present several error-weighting schemes that generalize this established routine. We study both a binary classification model for Support Vector Classification and a regression network for the Multilayer Perceptron. Results show that the error is never worse than with the standard procedure, and in several cases it is better.

Keywords: loss, binary-classification, MLP, weights, regression

Procedia PDF Downloads 86

3142 Interference among Lambsquarters and Oil Rapeseed Cultivars

Authors: Reza Siyami, Bahram Mirshekari

Abstract:

Seed and oil yield of rapeseed is considerably affected by weed interference, including mustard (Sinapis arvensis L.), lambsquarters (Chenopodium album L.) and redroot pigweed (Amaranthus retroflexus L.), throughout the East Azerbaijan province in Iran. To formulate the relationship between the four independent growth variables measured in our experiment and a dependent variable, multiple regression analysis was carried out with the weed leaf number per plant (X1), green cover percentage (X2), LAI (X3) and leaf area per plant (X4) as independent variables and rapeseed oil yield as the dependent variable. The multiple regression equation is as follows: Seed essential oil yield (kg/ha) = 0.156 + 0.0325 (X1) + 0.0489 (X2) + 0.0415 (X3) + 0.133 (X4). Furthermore, stepwise regression analysis was carried out on the data obtained to test the significance of the independent variables affecting the oil yield. The resulting stepwise regression equation is as follows: Oil yield = 4.42 + 0.0841 (X2) + 0.0801 (X3); R² = 81.5. The stepwise regression analysis verified that the green cover percentage and LAI of the weed had a marked increasing effect on the oil yield of rapeseed.
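To make the two fitted equations concrete, the short Python sketch below evaluates them. The coefficients are copied from the abstract; the input values are hypothetical and chosen only to show the arithmetic, and the function names are made up for this example.

```python
def oil_yield_full(x1, x2, x3, x4):
    """Multiple regression equation from the abstract (all four weed variables)."""
    return 0.156 + 0.0325 * x1 + 0.0489 * x2 + 0.0415 * x3 + 0.133 * x4

def oil_yield_stepwise(x2, x3):
    """Stepwise regression equation from the abstract (green cover % and LAI only)."""
    return 4.42 + 0.0841 * x2 + 0.0801 * x3

# Hypothetical inputs: 10 % green cover and an LAI of 2
example = oil_yield_stepwise(x2=10.0, x3=2.0)  # 4.42 + 0.841 + 0.1602

# Effect of one extra percentage point of green cover, holding LAI fixed
delta = oil_yield_stepwise(11.0, 2.0) - oil_yield_stepwise(10.0, 2.0)
```

In the stepwise equation each additional percentage point of green cover changes the predicted oil yield by 0.0841 units, which is how the coefficients should be read.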

Keywords: green cover percentage, independent variable, interference, regression

Procedia PDF Downloads 412

3141 Copula-Based Estimation of Direct and Indirect Effects in Path Analysis Model

Authors: Alam Ali, Ashok Kumar Pathak

Abstract:

Path analysis is a statistical technique used to evaluate the strength of the direct and indirect effects of variables. One or more structural regression equations are used to estimate a series of parameters in order to find a better fit to the data. Sometimes, exogenous variables do not show significant direct and indirect effects when the assumptions of classical regression (ordinary least squares (OLS)) are violated by the nature of the data. The main aim of this article is to investigate the efficacy of the copula-based regression approach over the classical regression approach and to calculate the direct and indirect effects of variables when the data violate the OLS assumptions and the variables are linked through an elliptical copula. We perform this study using a well-organized numerical scheme. Finally, a real data application is presented to demonstrate the superiority of the copula approach.

Keywords: path analysis, copula-based regression models, direct and indirect effects, k-fold cross validation technique

Procedia PDF Downloads 65

3140 Performance Analysis of Proprietary and Non-Proprietary Tools for Regression Testing Using Genetic Algorithm

Authors: K. Hema Shankari, R. Thirumalaiselvi, N. V. Balasubramanian

Abstract:

The present paper addresses research in the area of regression testing, with emphasis on automated tools as well as the prioritization of test cases. The uniqueness of regression testing and its cyclic nature are pointed out. The difference in approach between industry, with the business model as its basis, and academia, with its focus on data mining, is highlighted. Test metrics are discussed as a prelude to our formula for prioritization; a case study is further discussed to illustrate this methodology. An industrial case study is also described in the paper, where the number of test cases is so large that they have to be grouped into Test Suites. In such situations, a genetic algorithm proposed by us can be used to reconfigure these Test Suites in each cycle of regression testing. A comparison is made between a proprietary tool and an open source tool using the above-mentioned metrics. Our approach is clarified through several tables.

Keywords: APFD metric, genetic algorithm, regression testing, RFT tool, test case prioritization, selenium tool

Procedia PDF Downloads 424

3139 A Hybrid Model Tree and Logistic Regression Model for Prediction of Soil Shear Strength in Clay

Authors: Ehsan Mehryaar, Seyed Armin Motahari Tabari

Abstract:

Without a doubt, soil shear strength is the most important property of the soil. The majority of fatal and catastrophic geological accidents are related to shear strength failure of the soil. Therefore, its prediction is a matter of high importance. However, acquiring the shear strength is usually a cumbersome task that may require complicated laboratory testing. Predicting it from common and easily obtained soil properties can therefore simplify projects substantially. In this paper, a hybrid model based on the classification and regression tree algorithm and logistic regression is proposed, in which each leaf of the tree is an independent regression model. A database of 189 points for clay soil, including moisture content, liquid limit, plastic limit, clay content, and shear strength, is collected. The performance of the developed model is compared with that of existing models and equations using the root mean squared error and the coefficient of correlation.

Keywords: model tree, CART, logistic regression, soil shear strength

Procedia PDF Downloads 188