Search results for: ridge regression
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3161

3131 Model-Based Software Regression Test Suite Reduction

Authors: Shiwei Deng, Yang Bao

Abstract:

In this paper, we present a model-based regression test suite reduction approach that uses EFSM model dependence analysis and a probability-driven greedy algorithm to reduce software regression test suites. The approach automatically identifies the difference between the original model and the modified model as a set of elementary model modifications. The EFSM dependence analysis is performed for each elementary modification to reduce the regression test suite, and the probability-driven greedy algorithm is then adopted to select the minimum set of test cases from the reduced regression test suite that covers all interaction patterns. Our initial experience shows that the approach may significantly reduce the size of regression test suites.
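
As an illustration of the selection step only (not the authors' EFSM dependence analysis), the following Python sketch shows a probability-driven greedy reduction over a hypothetical coverage mapping from test cases to interaction patterns; the data representation and tie-breaking rule are assumptions.

```python
# Hypothetical sketch of a probability-driven greedy reduction step: each test
# case covers a set of interaction patterns and carries a selection probability;
# we greedily pick the case covering the most still-uncovered patterns,
# breaking ties by probability.
def reduce_suite(coverage, probability):
    """coverage: dict test_id -> set of interaction patterns
       probability: dict test_id -> float in (0, 1]"""
    uncovered = set().union(*coverage.values())
    selected = []
    while uncovered:
        best = max(coverage,
                   key=lambda t: (len(coverage[t] & uncovered), probability[t]))
        gain = coverage[best] & uncovered
        if not gain:
            break  # no remaining test covers an uncovered pattern
        selected.append(best)
        uncovered -= gain
    return selected

suite = {"t1": {"p1", "p2"}, "t2": {"p2", "p3"}, "t3": {"p3"}}
probs = {"t1": 0.9, "t2": 0.7, "t3": 0.4}
print(reduce_suite(suite, probs))   # ['t1', 't2']
```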

Keywords: dependence analysis, EFSM model, greedy algorithm, regression test

Procedia PDF Downloads 399
3130 Factors Affecting the Ultimate Compressive Strength of the Quaternary Calcarenites, North Western Desert, Egypt

Authors: M. A. Rashed, A. S. Mansour, H. Faris, W. Afify

Abstract:

The calcarenite carbonate rocks of the Quaternary ridges, which extend along the northwestern Mediterranean coastal plain of Egypt, represent an excellent model for the transformation of loose sediments into real sedimentary rocks through the different stages of meteoric diagenesis. The depositional and diagenetic fabrics of the rocks, in addition to the strata orientation, strongly affect their ultimate compressive strength (UCS) and other geotechnical properties. There is a marked increase in the compressive strength from the first to the fourth ridge rock samples. The lowest values are related to the loosely packed, weakly cemented aragonitic ooid sediments with high porosity, as well as the irregular distribution of cement, which reduce the ability of these rocks to withstand crushing under direct pressure. The high UCS values are attributed to the low porosity, the presence of micritic cement, the reduction in grain size and the occurrence of micritization and calcretization processes. The strata orientation has a notable effect on the measured UCS. The lowest values have been recorded for the samples cored in the inclined direction, whereas the highest values have been noticed in most samples cored in the directions vertical and parallel to the bedding plane. In the case of the inclined direction, the bedding planes were oriented close to the plane of maximum shear stress. The lowest and highest anisotropy values have been recorded for the first and the third ridge rock samples, respectively, which may be attributed to the relative homogeneity and well-sorted grainstone of the first ridge rock samples, and to the relative heterogeneity in grain and pore size distribution and degree of cementation of the third ridge rock samples, besides the abundance of shell fragments with intra-particle pore spaces, which may produce lines of weakness within the rock.

Keywords: compressive strength, anisotropy, calcarenites, Egypt

Procedia PDF Downloads 347
3129 Segmentation of Piecewise Polynomial Regression Model by Using Reversible Jump MCMC Algorithm

Authors: Suparman

Abstract:

The piecewise polynomial regression model is a very flexible model for data modeling. When the piecewise polynomial regression model is fitted to data, its parameters are generally unknown. This paper studies the parameter estimation problem for the piecewise polynomial regression model. A Bayesian method is used to estimate the parameters of the piecewise polynomial regression model. Unfortunately, the Bayes estimator cannot be found analytically. A reversible jump MCMC algorithm is proposed to solve this problem. The reversible jump MCMC algorithm generates a Markov chain that converges to the posterior distribution of the piecewise polynomial regression model parameters. The resulting Markov chain is used to calculate the Bayes estimator for the parameters of the piecewise polynomial regression model.

Keywords: piecewise regression, Bayesian, reversible jump MCMC, segmentation

Procedia PDF Downloads 342
3128 A Fuzzy Linear Regression Model Based on Dissemblance Index

Authors: Shih-Pin Chen, Shih-Syuan You

Abstract:

Fuzzy regression models are useful for investigating the relationship between explanatory variables and responses in fuzzy environments. To overcome the deficiencies of previous models and increase the explanatory power of fuzzy data, the graded mean integration (GMI) representation is applied to determine representative crisp regression coefficients. A fuzzy regression model is constructed based on the modified dissemblance index (MDI), which can precisely measure the actual total error. Based on the proposed MDI and a distance criterion, the results from commonly used test examples show that the proposed fuzzy linear regression model has higher explanatory power and forecasting accuracy than previous models.
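
As a hedged illustration of the GMI idea only (the paper may use a more general fuzzy-number form), the sketch below computes the commonly used graded mean integration of a triangular fuzzy coefficient, which yields a representative crisp value.

```python
# Minimal sketch (assumed representation): the graded mean integration (GMI) of
# a triangular fuzzy number A = (a, b, c) is commonly defined as
#   P(A) = (a + 4*b + c) / 6,
# turning each fuzzy regression coefficient into a representative crisp value.
def gmi_triangular(a, b, c):
    return (a + 4.0 * b + c) / 6.0

# Example: a fuzzy coefficient with support [1.0, 3.0] and mode 1.8
print(gmi_triangular(1.0, 1.8, 3.0))  # about 1.87
```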

Keywords: dissemblance index, fuzzy linear regression, graded mean integration, mathematical programming

Procedia PDF Downloads 408
3127 The Theory behind Logistic Regression

Authors: Jan Henrik Wosnitza

Abstract:

Logistic regression has developed into a standard approach for estimating conditional probabilities in a wide range of applications, including credit risk prediction. The article at hand contributes to the current literature on logistic regression in four ways: First, it is demonstrated that binary logistic regression automatically meets its model assumptions under very general conditions. This result explains, at least in part, the popularity of logistic regression. Second, the requirement of homoscedasticity in the context of binary logistic regression is theoretically substantiated. The variances among the groups of defaulted and non-defaulted obligors have to be the same across the level of the aggregated default indicators in order to achieve linear logits. Third, this article sheds some light on the question of why nonlinear logits might be superior to linear logits in the case of a small amount of data. Fourth, an innovative methodology for estimating correlations between obligor-specific log-odds is proposed. In order to crystallize the key ideas, this paper focuses on the example of credit risk prediction. However, the results presented in this paper can easily be transferred to any other field of application.
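
The following minimal sketch, on synthetic data rather than real credit data, shows how conditional default probabilities and the corresponding linear logits are obtained from a fitted binary logistic regression; it illustrates the standard model only, not the article's correlation methodology.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative sketch on synthetic data (not the article's data): fit a binary
# logistic regression for default prediction and inspect the estimated
# conditional probabilities and the corresponding (linear) logits.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                      # two obligor-specific features
logit_true = 0.8 * X[:, 0] - 1.2 * X[:, 1] - 0.5   # assumed linear logit
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_true)))

model = LogisticRegression().fit(X, y)
prob_default = model.predict_proba(X)[:, 1]        # conditional default probabilities
log_odds = model.decision_function(X)              # estimated logits, linear in X
print(model.coef_, model.intercept_)
```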

Keywords: correlation, credit risk estimation, default correlation, homoscedasticity, logistic regression, nonlinear logistic regression

Procedia PDF Downloads 394
3126 Study of Dermatoglyphics Pattern in Patient with Hypertension

Authors: Ajeevan Gautam, Gulam Anwer Khan, Pratibha Pokhrel

Abstract:

Introduction: Dermatoglyphics is the science which deals with the study of dermal ridge configurations on the digits, palms and soles. These surfaces are grooved by ridges that form a variety of configurations. The aim of the study was to identify dermal ridge patterns on the fingertips of hypertensive patients and of a normal population, and to compare the patterns between them. Methods: The subjects of the study were 130 hypertensive and 130 non-hypertensive cases from the Kathmandu Valley, aged between 40 and 80 years. Case histories were recorded and, after consent, fingerprints were taken. Different parameters, namely whorl, loop, arch and composite patterns, were studied and analysed. Result: The analysis revealed an increased whorl pattern in hypertensives. It showed 65.69% whorl, 29.23% loop and 5.07% arch patterns in the right hand of hypertensive people. In controls, the corresponding values were 34.46% whorl, 58.15% loop and 5.38% arch. Similarly, in the left hand, the hypertensive group showed 63.69% whorl, 32% loop and 4.30% arch patterns. In the control group, the values were 60.15% loop, 35.69% whorl and 15% arch. Discussion: Based on the findings, it was concluded that the whorl, loop and arch patterns were observed as 65.69%, 29.23% and 5.07%, respectively, in hypertensive cases in the right hand. Similarly, in the left hand, the patterns were found to be 4.30% arch, 32% loop and 63.69% whorl; in normotensive subjects these patterns were recorded as 36.43%, 58.15% and 5.38% in the right hand, and 35.69%, 60.15% and 4.15% in the left hand, for whorl, loop and arch, respectively.

Keywords: arch, dermatoglyphics, hypertension, loop, whorl

Procedia PDF Downloads 267
3125 Offline Signature Verification Using Minutiae and Curvature Orientation

Authors: Khaled Nagaty, Heba Nagaty, Gerard McKee

Abstract:

A signature is a behavioral biometric that is used for authenticating users in most financial and legal transactions. Signatures can be easily forged by skilled forgers. Therefore, it is essential to verify whether a signature is genuine or forged. The aim of any signature verification algorithm is to accommodate the differences between signatures of the same person and increase the ability to discriminate between signatures of different persons. The work presented in this paper proposes an automatic signature verification system to indicate whether a signature is genuine or not. The system comprises four phases: (1) The pre-processing phase, in which image scaling, binarization, image rotation, dilation, thinning, and connecting ridge breaks are applied. (2) The feature extraction phase, in which global and local features are extracted. The local features are minutiae points, curvature orientation, and curve plateau. The global features are signature area, signature aspect ratio, and Hu moments. (3) The post-processing phase, in which false minutiae are removed. (4) The classification phase, in which features are enhanced before feeding them into the classifier. k-nearest neighbors and support vector machines are used. The classifier was trained on a benchmark dataset to compare the performance of the proposed offline signature verification system against the state-of-the-art. The accuracy of the proposed system is 92.3%.

Keywords: signature, ridge breaks, minutiae, orientation

Procedia PDF Downloads 121
3124 Jalovchat Gabbroic Intrusive of the Caucasus: Petrological Study, Geochemical Peculiarities and Formation Conditions

Authors: Giorgi Chichinadze, David Shengelia, Tamara Tsutsunava, Nikoloz Maisuradze, Giorgi Beridze

Abstract:

The Jalovchat intrusive is built up of hornblende gabbros, gabbro-norites and norites. Within the intrusive, hornblende-bearing gabbro-pegmatites are widespread. This is a coarse-grained rock with gigantic hornblende crystals. Owing to its unusual composition, the Jalovchat intrusive has no analogue in the Caucasus. However, petrologically and geochemically, the intrusive rocks have been insufficiently studied. For comprehensive investigation, the authors applied appropriate methodologies: microscopic study of thin sections, petro- and geochemical analyses of the samples, and various diagrams and spidergrams of petrogenic, rare and rare earth elements. The analytical study established that the Jalovchat intrusive corresponds in composition mainly to mid-ocean ridge basalts and, in terms of geodynamic type, belongs to the subduction type. In general, this is an anomalous phenomenon, as crystallization of hornblende, and especially of its gigantic crystals, is atypical in rocks of such composition. The authors believe that the water-rich magma reservoir, which was necessary for the crystallization of gigantic hornblende crystals, appeared as a result of melting of water-rich mid-ocean ridge basaltic rocks during the subduction process in Bajocian time.

Keywords: gabbro-pegmatite, intrusive, petrogenesis, petrogeochemistry, the Caucasus

Procedia PDF Downloads 177
3123 Model Averaging for Poisson Regression

Authors: Zhou Jianhong

Abstract:

Model averaging is a desirable approach for dealing with model uncertainty, which, however, has rarely been explored for Poisson regression. In this paper, we propose a model averaging procedure based on an unbiased estimator of the expected Kullback-Leibler distance for Poisson regression. A simulation study shows that the proposed model average estimator outperforms some other commonly used model selection and model average estimators in some situations. Our proposed method is further applied to a real data example, and its advantage is demonstrated again.

Keywords: model averaging, Poisson regression, Kullback-Leibler distance, statistics

Procedia PDF Downloads 488
3122 Establishment of the Regression Uncertainty of the Critical Heat Flux Power Correlation for an Advanced Fuel Bundle

Authors: L. Q. Yuan, J. Yang, A. Siddiqui

Abstract:

A new regression uncertainty analysis methodology was applied to determine the uncertainties of the critical heat flux (CHF) power correlation for an advanced 43-element bundle design, which was developed by Canadian Nuclear Laboratories (CNL) to achieve improved economics, resource utilization and energy sustainability. The new methodology is considered more appropriate than the traditional methodology in the assessment of the experimental uncertainty associated with regressions. The methodology was first assessed using both the Monte Carlo Method (MCM) and the Taylor Series Method (TSM) for a simple linear regression model, and then extended successfully to a non-linear CHF power regression model (CHF power as a function of inlet temperature, outlet pressure and mass flow rate). The regression uncertainty assessed by MCM agrees well with that by TSM. An equation to evaluate the CHF power regression uncertainty was developed and expressed as a function of independent variables that determine the CHF power.
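
A minimal sketch of the Monte Carlo idea for regression uncertainty is given below; it uses a simple linear model and an assumed measurement uncertainty on synthetic data, not the CNL CHF correlation or its experimental database.

```python
import numpy as np

# Hedged sketch of the Monte Carlo Method (MCM) idea for a simple linear
# regression: perturb the observations with their assumed measurement
# uncertainty, re-fit repeatedly, and take the spread of the fitted
# predictions as the regression uncertainty at a new input.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, x.size)   # synthetic data
sigma_y = 0.5                                       # assumed measurement uncertainty

x_new = 5.0
preds = []
for _ in range(5000):
    y_pert = y + rng.normal(0.0, sigma_y, y.size)   # Monte Carlo perturbation
    slope, intercept = np.polyfit(x, y_pert, 1)
    preds.append(slope * x_new + intercept)

print(np.mean(preds), np.std(preds))                # prediction and its MC uncertainty
```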

Keywords: CHF experiment, CHF correlation, regression uncertainty, Monte Carlo Method, Taylor Series Method

Procedia PDF Downloads 391
3121 Non-Parametric Regression over Its Parametric Counterparts with Large Sample Size

Authors: Jude Opara, Esemokumo Perewarebo Akpos

Abstract:

This paper compares non-parametric linear regression with its parametric counterpart for a large sample size. A data set of anthropometric measurements of primary school pupils was used for the analysis. The study used 50 randomly selected pupils. The data set was subjected to a normality test using the Anderson-Darling technique, and it was discovered that the residuals of the commonly used least squares regression method for fitting an equation to a set of (x,y) data points are not normally distributed (i.e. they do not follow a Gaussian distribution). The algorithms for nonparametric Theil's regression are stated in this paper, as well as its parametric OLS counterpart. A programming language software known as "R Development" was used in this paper. From the analysis, the results showed that there exists a significant relationship between the response and the explanatory variable for both the parametric and non-parametric regression. To compare the efficiency of one method against the other, the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) were used, and it was discovered that the nonparametric regression performs better than its parametric counterpart due to its lower values of both AIC and BIC. The study, however, recommends that future researchers examine the data set for outliers, expunge any that are detected, and re-analyze the data to compare results.
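
For illustration, the sketch below contrasts the nonparametric Theil slope estimator with its OLS counterpart on synthetic, heavy-tailed data (the study itself used anthropometric data and the R environment); SciPy is used here purely for convenience.

```python
import numpy as np
from scipy import stats

# Brief sketch (synthetic data, not the pupils' anthropometric data): compare
# the nonparametric Theil slope estimator with its parametric OLS counterpart
# when the errors are clearly non-Gaussian.
rng = np.random.default_rng(2)
x = np.arange(50, dtype=float)
y = 1.5 * x + 10 + rng.standard_t(df=3, size=50)    # heavy-tailed errors

theil_slope, theil_intercept, lo, hi = stats.theilslopes(y, x)  # median of pairwise slopes
ols = stats.linregress(x, y)                                     # ordinary least squares
print("Theil slope:", theil_slope, "OLS slope:", ols.slope)
```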

Keywords: Theil’s regression, Bayesian information criterion, Akaike information criterion, OLS

Procedia PDF Downloads 276
3120 Development and Validation of a Coronary Heart Disease Risk Score in Indian Type 2 Diabetes Mellitus Patients

Authors: Faiz N. K. Yusufi, Aquil Ahmed, Jamal Ahmad

Abstract:

Diabetes in India is growing at an alarming rate, and the complications caused by it need to be controlled. Coronary heart disease (CHD) is one of the complications whose prediction is discussed in this study. India has the second largest number of diabetes patients in the world. To the best of our knowledge, there is no CHD risk score for Indian type 2 diabetes patients. Any form of CHD has been taken as the event of interest. A sample of 750 patients was determined and randomly collected from the Rajiv Gandhi Centre for Diabetes and Endocrinology, J.N.M.C., A.M.U., Aligarh, India. Collected variables include patient data such as sex, age, height, weight, body mass index (BMI), blood sugar fasting (BSF), post prandial sugar (PP), glycosylated haemoglobin (HbA1c), diastolic blood pressure (DBP), systolic blood pressure (SBP), smoking, alcohol habits, total cholesterol (TC), triglycerides (TG), high density lipoprotein (HDL), low density lipoprotein (LDL), very low density lipoprotein (VLDL), physical activity, duration of diabetes, diet control, history of antihypertensive drug treatment, family history of diabetes, waist circumference, hip circumference, medications, central obesity and history of CHD. Predictive risk scores for CHD events are designed by Cox proportional hazard regression. Model calibration and discrimination are assessed with the Hosmer-Lemeshow test and the area under the receiver operating characteristic (ROC) curve. Overfitting and underfitting of the model are checked by applying regularization techniques, and the best method is selected among ridge, lasso and elastic net regression. Youden's index is used to choose the optimal cut-off point from the scores. The five-year probability of CHD is predicted by both the survival function and a two-state Markov chain model, and the better technique is identified. The CHD risk scores developed can be calculated by doctors and patients for self-management of diabetes. Furthermore, the five-year probabilities can also be used to forecast and monitor the condition of patients.
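
As a small, hedged illustration of one step of such a workflow, the sketch below chooses an optimal cut-off for a synthetic risk score by maximizing Youden's index J = TPR - FPR over the ROC curve; the outcome labels and scores are simulated, not the study's cohort.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Illustrative sketch (synthetic risk scores, not the study's data): choose the
# optimal cut-off point of a risk score with Youden's index J = TPR - FPR,
# maximized over the ROC curve.
rng = np.random.default_rng(3)
y_true = rng.binomial(1, 0.3, size=750)
score = y_true * 1.2 + rng.normal(size=750)        # assumed risk score

fpr, tpr, thresholds = roc_curve(y_true, score)
j = tpr - fpr                                       # Youden's index at each threshold
best = np.argmax(j)
print("AUC:", roc_auc_score(y_true, score))
print("Optimal cut-off:", thresholds[best], "J:", j[best])
```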

Keywords: coronary heart disease, cox proportional hazard regression, ROC curve, type 2 diabetes Mellitus

Procedia PDF Downloads 189
3119 Use of Multistage Transition Regression Models for Credit Card Income Prediction

Authors: Denys Osipenko, Jonathan Crook

Abstract:

Because of the variety of cardholders' behaviour types and income sources, each consumer account can move through a variety of states. An account can be inactive, transactor, revolver, delinquent or defaulted, and each state requires an individual model for income prediction. The estimation of transition probabilities between statuses at the account level helps to avoid the memorylessness of the Markov chain approach. This paper investigates approaches to estimating transition probabilities for credit card income prediction at the account level. The key question of the empirical research is which approach gives more accurate results: multinomial logistic regression, or multistage conditional logistic regression with a binary target. Both models have shown moderate predictive power. Prediction accuracy for the conditional logistic regression depends on the order of stages of the conditional binary logistic regression. On the other hand, multinomial logistic regression is easier to use and gives integrated estimates for all states without prioritization. Further investigations can therefore be concentrated on alternative modeling approaches such as discrete choice models.
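
A minimal sketch of the multinomial option follows, with synthetic covariates and illustrative state labels (not the paper's data): a single multinomial logistic regression yields a full row of transition probabilities per account.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hedged sketch with synthetic account data: a multinomial logistic regression
# estimating next-period state probabilities from account-level covariates.
states = ["inactive", "transactor", "revolver", "delinquent", "defaulted"]
rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 3))                      # e.g. utilisation, payment ratio, balance
y = rng.integers(0, len(states), size=2000)         # next-period state (synthetic, coded 0..4)

model = LogisticRegression(max_iter=1000).fit(X, y) # multinomial softmax with lbfgs by default
trans_probs = model.predict_proba(X[:1])            # one account's row of transition probabilities
print(dict(zip(states, trans_probs[0].round(3))))
```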

Keywords: multinomial regression, conditional logistic regression, credit account state, transition probability

Procedia PDF Downloads 460
3118 Internet Purchases in European Union Countries: Multiple Linear Regression Approach

Authors: Ksenija Dumičić, Anita Čeh Časni, Irena Palić

Abstract:

This paper examines the influence of economic and Information and Communication Technology (ICT) development on the recently increasing Internet purchases by individuals in European Union member states. After a growing trend in Internet purchases in the EU27 was noticed, an all-possible-regressions analysis was applied using nine independent variables for 2011. Finally, two linear regression models were studied in detail. The simple linear regression analysis conducted confirmed the research hypothesis that Internet purchases in the analysed EU countries are positively correlated with the statistically significant variable Gross Domestic Product per capita (GDPpc). The analysed multiple linear regression model with four regressors reflecting the level of ICT development also indicates that ICT development is crucial for explaining Internet purchases by individuals, confirming the research hypothesis.

Keywords: European Union, Internet purchases, multiple linear regression model, outlier

Procedia PDF Downloads 277
3117 Optimization of Slider Crank Mechanism Using Design of Experiments and Multi-Linear Regression

Authors: Galal Elkobrosy, Amr M. Abdelrazek, Bassuny M. Elsouhily, Mohamed E. Khidr

Abstract:

Crankshaft length, connecting rod length, crank angle, engine rpm, cylinder bore, mass of piston and compression ratio are the inputs that can control the performance of the slider crank mechanism and hence its efficiency. Several combinations of these seven inputs are used and compared. The throughput engine torque predicted by the simulation is analyzed through two different regression models, with and without interaction terms, developed according to multi-linear regression using LU decomposition to solve the system of algebraic equations. These models are validated. A regression model in the seven inputs including their interaction terms lowered the polynomial degree from 3rd to 1st degree and suggested valid predictions and stable explanations.
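
The sketch below illustrates the fitting step under stated assumptions: a first-degree design matrix with pairwise interaction terms is built from synthetic inputs, and the least-squares normal equations are solved via LU decomposition; the variables and data are placeholders, not the engine simulation outputs.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve
from itertools import combinations

# Minimal sketch of the fitting step (synthetic inputs): build a first-degree
# design matrix with pairwise interaction terms and solve the least-squares
# normal equations X'X b = X'y via LU decomposition.
rng = np.random.default_rng(5)
n, p = 60, 3                                       # e.g. 3 of the 7 mechanism inputs
X_raw = rng.uniform(size=(n, p))
y = 4.0 + X_raw @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, n)

cols = [np.ones(n)] + [X_raw[:, j] for j in range(p)]
cols += [X_raw[:, i] * X_raw[:, j] for i, j in combinations(range(p), 2)]
X = np.column_stack(cols)                          # intercept, main effects, interactions

A, rhs = X.T @ X, X.T @ y
beta = lu_solve(lu_factor(A), rhs)                 # LU solve of the normal equations
print(beta)
```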

Keywords: design of experiments, regression analysis, SI engine, statistical modeling

Procedia PDF Downloads 155
3116 An Epsilon Hierarchical Fuzzy Twin Support Vector Regression

Authors: Arindam Chaudhuri

Abstract:

This research presents epsilon-hierarchical fuzzy twin support vector regression (epsilon-HFTSVR) based on epsilon-fuzzy twin support vector regression (epsilon-FTSVR) and epsilon-twin support vector regression (epsilon-TSVR). Epsilon-FTSVR is achieved by incorporating trapezoidal fuzzy numbers into epsilon-TSVR, which takes care of the uncertainty existing in forecasting problems. Epsilon-FTSVR determines a pair of epsilon-insensitive proximal functions by solving two related quadratic programming problems. The structural risk minimization principle is implemented by introducing a regularization term in the primal problems of epsilon-FTSVR. This yields stable positive definite dual problems, which improves regression performance. Epsilon-FTSVR is then reformulated as epsilon-HFTSVR, consisting of a set of hierarchical layers, each containing an epsilon-FTSVR. Experimental results on both synthetic and real datasets reveal that epsilon-HFTSVR has remarkable generalization performance with minimum training time.

Keywords: regression, epsilon-TSVR, epsilon-FTSVR, epsilon-HFTSVR

Procedia PDF Downloads 332
3115 Nonparametric Truncated Spline Regression Model on the Data of Human Development Index in Indonesia

Authors: Kornelius Ronald Demu, Dewi Retno Sari Saputro, Purnami Widyaningsih

Abstract:

The Human Development Index (HDI) is a standard measurement of a country's human development. Several factors may influence it, such as life expectancy, gross domestic product (GDP) based on the province's annual expenditure, the number of poor people, and the percentage of illiterate people. The scatter plots between HDI and the influencing factors show that the data do not follow a specific pattern or form. Therefore, a nonparametric regression model can be applied to the HDI data in Indonesia. The estimate of the regression curve in the nonparametric regression model is flexible because it follows the shape of the data pattern. One of the nonparametric regression methods is the truncated spline. Truncated spline regression is a nonparametric approach based on a modification of segmented polynomial functions. The estimator of a truncated spline regression model is affected by the selection of the optimal knot points, which are the focus points of truncated spline functions. The optimal knot points were determined by the minimum value of generalized cross-validation (GCV). In this article, a truncated spline nonparametric regression model was applied to the Human Development Index data. The best truncated spline regression model for the HDI data in Indonesia was obtained with the optimal knot point combination 5-5-5-4. Life expectancy and the percentage of illiterate people were the factors significantly related to the HDI in Indonesia. The coefficient of determination is 94.54%. This means the regression model is good enough to be applied to the HDI data in Indonesia.
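
A brief sketch of the mechanics follows, on synthetic data: a linear truncated-power spline basis is built for candidate knot sets and the set with minimum GCV is chosen; the basis degree, candidate knots and GCV form used here are standard textbook choices, not necessarily the article's exact configuration.

```python
import numpy as np

# Illustrative sketch with synthetic data: a linear truncated spline basis with
# candidate knots, fitted by least squares, with generalized cross-validation
# GCV = n * RSS / (n - trace(H))**2 used to pick the knot set.
def truncated_design(x, knots, degree=1):
    cols = [x ** d for d in range(degree + 1)]                 # polynomial part
    cols += [np.maximum(x - k, 0.0) ** degree for k in knots]  # truncated terms
    return np.column_stack(cols)

def gcv(x, y, knots):
    X = truncated_design(x, knots)
    H = X @ np.linalg.pinv(X)                                  # hat matrix
    resid = y - H @ y
    n = len(y)
    return n * np.sum(resid ** 2) / (n - np.trace(H)) ** 2

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 10, 100))
y = np.where(x < 5, x, 10 - x) + rng.normal(0, 0.3, 100)       # piecewise pattern

candidates = [[3.0], [5.0], [7.0], [3.0, 7.0]]
best = min(candidates, key=lambda k: gcv(x, y, k))
print("optimal knots:", best)
```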

Keywords: generalized cross validation (GCV), Human Development Index (HDI), knots point, nonparametric regression, truncated spline

Procedia PDF Downloads 305
3114 Identifying Diabetic Retinopathy Complication by Predictive Techniques in Indian Type 2 Diabetes Mellitus Patients

Authors: Faiz N. K. Yusufi, Aquil Ahmed, Jamal Ahmad

Abstract:

Predicting the risk of diabetic retinopathy (DR) in Indian type 2 diabetes patients is immensely necessary. India is the second largest country after China in terms of the number of diabetic patients, yet, to the best of our knowledge, not a single risk score for complications has ever been investigated. Diabetic retinopathy is a serious complication and is the topmost reason for visual impairment across countries. Any type or form of DR has been taken as the event of interest, be it mild, background, or grade I, II, III and IV DR. A sample was determined and randomly collected from the Rajiv Gandhi Centre for Diabetes and Endocrinology, J.N.M.C., A.M.U., Aligarh, India. Collected variables include patient data such as sex, age, height, weight, body mass index (BMI), blood sugar fasting (BSF), post prandial sugar (PP), glycosylated haemoglobin (HbA1c), diastolic blood pressure (DBP), systolic blood pressure (SBP), smoking, alcohol habits, total cholesterol (TC), triglycerides (TG), high density lipoprotein (HDL), low density lipoprotein (LDL), very low density lipoprotein (VLDL), physical activity, duration of diabetes, diet control, history of antihypertensive drug treatment, family history of diabetes, waist circumference, hip circumference, medications, central obesity and history of DR. Cox proportional hazard regression is used to design risk scores for the prediction of retinopathy. Model calibration and discrimination are assessed with the Hosmer-Lemeshow test and the area under the receiver operating characteristic (ROC) curve. Overfitting and underfitting of the model are checked by applying regularization techniques, and the best method is selected among ridge, lasso and elastic net regression. The optimal cut-off point is chosen by Youden's index. The five-year probability of DR is predicted by both the survival function and a two-state Markov chain model, and the better technique is identified. The risk scores developed can be applied by doctors and patients themselves for self-evaluation. Furthermore, the five-year probabilities can also be used to forecast and monitor the condition of patients. This provides immense benefit in the real application of DR prediction in T2DM.

Keywords: Cox proportional hazard regression, diabetic retinopathy, ROC curve, type 2 diabetes mellitus

Procedia PDF Downloads 148
3113 Regression Model Evaluation on Depth Camera Data for Gaze Estimation

Authors: James Purnama, Riri Fitri Sari

Abstract:

We investigate the machine learning algorithm selection problem for depth-image-based eye gaze estimation, with respect to the essential difficulty of reducing the number of required training samples and the training time. Statistics-based prediction accuracy measures are increasingly used to assess and evaluate prediction or estimation in gaze estimation. This article evaluates Root Mean Squared Error (RMSE) and R-squared statistical analysis to assess machine learning methods on depth camera data for gaze estimation. Four machine learning methods have been evaluated: Random Forest Regression, Regression Tree, Support Vector Machine (SVM), and Linear Regression. The experimental results show that Random Forest Regression has the lowest RMSE and the highest R-squared, which means that it is the best among the evaluated methods.
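
The evaluation loop can be sketched as below, on synthetic regression data rather than the depth-camera gaze data, comparing the four methods by RMSE and R-squared on a held-out split (default hyperparameters are an assumption).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Hedged sketch on synthetic data (not the depth-camera gaze data): evaluate the
# four regression methods mentioned above by RMSE and R-squared on a held-out set.
X, y = make_regression(n_samples=400, n_features=6, noise=10.0, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=7)

models = {
    "Random Forest": RandomForestRegressor(random_state=7),
    "Regression Tree": DecisionTreeRegressor(random_state=7),
    "SVM (SVR)": SVR(),
    "Linear Regression": LinearRegression(),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rmse = np.sqrt(mean_squared_error(y_te, pred))
    print(f"{name}: RMSE={rmse:.2f}, R2={r2_score(y_te, pred):.3f}")
```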

Keywords: gaze estimation, gaze tracking, eye tracking, Kinect, regression model, Orange Python

Procedia PDF Downloads 508
3112 Generalized Extreme Value Regression with Binary Dependent Variable: An Application for Predicting Meteorological Drought Probabilities

Authors: Retius Chifurira

Abstract:

The logistic regression model is the most commonly used regression model for predicting meteorological drought probabilities. When the dependent variable is extreme, the logistic model fails to adequately capture drought probabilities. In order to adequately predict drought probabilities, we use the generalized linear model (GLM) with the quantile function of the generalized extreme value distribution (GEVD) as the link function. Maximum likelihood estimation is used to estimate the parameters of the generalized extreme value (GEV) regression model. We compare the performance of the logistic and the GEV regression models in predicting drought probabilities for Zimbabwe. The performance of the regression models is assessed using goodness-of-fit measures, namely the relative root mean square error (RRMSE) and the relative mean absolute error (RMAE). Results show that the GEV regression model performs better than the logistic model, thereby providing a good alternative candidate for predicting drought probabilities. This paper provides the first application of a GLM derived from extreme value theory to predict drought probabilities for a drought-prone country such as Zimbabwe.
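
For reference, a short sketch of the two measures as they are commonly defined (relative to the mean of the observations) is given below; the paper's exact definitions may differ slightly, and the data are placeholders.

```python
import numpy as np

# Short sketch of the two goodness-of-fit measures in a common form
# (relative to the mean of the observations).
def rrmse(obs, pred):
    return np.sqrt(np.mean((obs - pred) ** 2)) / np.mean(obs)

def rmae(obs, pred):
    return np.mean(np.abs(obs - pred)) / np.mean(obs)

observed = np.array([0.20, 0.35, 0.10, 0.55, 0.40])   # e.g. drought probabilities
predicted = np.array([0.25, 0.30, 0.12, 0.50, 0.45])
print(rrmse(observed, predicted), rmae(observed, predicted))
```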

Keywords: generalized extreme value distribution, general linear model, mean annual rainfall, meteorological drought probabilities

Procedia PDF Downloads 160
3111 The Extended Skew Gaussian Process for Regression

Authors: M. T. Alodat

Abstract:

In this paper, we propose a generalization of the Gaussian process regression (GPR) model called the extended skew Gaussian process for regression (ESGPR) model. The ESGPR model works better than the GPR model when the errors are skewed. We derive the predictive distribution of the ESGPR model at a new input. We also apply the ESGPR model to FOREX data and find that it fits the FOREX data better than the GPR model.

Keywords: extended skew normal distribution, Gaussian process for regression, predictive distribution, ESGPR model

Procedia PDF Downloads 522
3110 Integrated Nested Laplace Approximations for Quantile Regression

Authors: Kajingulu Malandala, Ranganai Edmore

Abstract:

The asymmetric Laplace distribution (ALD) is commonly used as the likelihood function in Bayesian quantile regression, and it offers a family of likelihood-based methods for quantile regression. Notwithstanding its popularity and practicality, the ALD is not smooth, which makes its likelihood difficult to maximize. Furthermore, Bayesian inference is time-consuming, and the selection of the likelihood may mislead the inference, as Bayes' theorem does not automatically establish the posterior inference. In addition, the ALD does not account for greater skewness and kurtosis. This paper develops a new quantile regression approach for count data based on the inverse of the cumulative distribution function of the Poisson, binomial and Delaporte distributions, using integrated nested Laplace approximations. Our results validate the benefit of using integrated nested Laplace approximations and support the approach for count data.

Keywords: quantile regression, Delaporte distribution, count data, integrated nested Laplace approximation

Procedia PDF Downloads 134
3109 The Use of Geographically Weighted Regression for Deforestation Analysis: Case Study in Brazilian Cerrado

Authors: Ana Paula Camelo, Keila Sanches

Abstract:

Geographically Weighted Regression (GWR) was proposed in the geography literature to allow relationships in a regression model to vary over space. In Brazil, the agricultural exploitation of the Cerrado biome is the main cause of deforestation. In this study, we propose a methodology using geostatistical methods to characterize the spatial dependence of deforestation in the Cerrado based on agricultural production indicators. The set of exploratory spatial data analysis (ESDA) tools was therefore used, together with confirmatory analysis using GWR. A non-spatial model was calibrated, the nature of the regression curve was evaluated, variables were selected by a stepwise process, and multicollinearity was analysed. After the evaluation of the non-spatial model, the spatial regression model was processed, with statistical evaluation of the intercept and verification of its effect on calibration. In a Spearman's correlation analysis, the correlation between deforestation and livestock was +0.783, and that with soybeans was +0.405. The model presented R² = 0.936 and showed a strong spatial dependence of the agricultural activity of soybeans associated with maize and cotton crops. GWR is a very effective tool, presenting results closer to the reality of deforestation in the Cerrado when compared with other analyses.
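
A minimal sketch of the GWR principle is given below, with synthetic coordinates and a single illustrative regressor (livestock): each location gets its own weighted least-squares fit with Gaussian kernel weights and a fixed bandwidth, so the coefficients vary over space; the data and bandwidth are assumptions, not the Cerrado dataset.

```python
import numpy as np

# Minimal sketch of the GWR idea with synthetic data: at each location, fit a
# weighted least-squares regression whose weights decay with distance from that
# location (Gaussian kernel, fixed bandwidth), so coefficients vary over space.
rng = np.random.default_rng(8)
n = 200
coords = rng.uniform(0, 100, size=(n, 2))                  # spatial coordinates
livestock = rng.uniform(size=n)
beta_local = 0.5 + 0.01 * coords[:, 0]                     # effect varies from west to east
deforestation = beta_local * livestock + rng.normal(0, 0.05, n)

X = np.column_stack([np.ones(n), livestock])
bandwidth = 20.0
local_betas = np.empty((n, 2))
for i in range(n):
    d = np.linalg.norm(coords - coords[i], axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)                # Gaussian kernel weights
    W = np.diag(w)
    local_betas[i] = np.linalg.solve(X.T @ W @ X, X.T @ W @ deforestation)

print(local_betas[:3])                                     # intercept and slope per location
```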

Keywords: deforestation, geographically weighted regression, land use, spatial analysis

Procedia PDF Downloads 330
3108 Sparse Modelling of Cancer Patients’ Survival Based on Genomic Copy Number Alterations

Authors: Khaled M. Alqahtani

Abstract:

Copy number alterations (CNA) are variations in the structure of the genome, where certain regions deviate from the typical two chromosomal copies. These alterations are pivotal in understanding tumor progression and are indicative of patients' survival outcomes. However, effectively modeling patients' survival based on their genomic CNA profiles while identifying relevant genomic regions remains a statistical challenge. Various methods, such as the Cox proportional hazard (PH) model with ridge, lasso, or elastic net penalties, have been proposed but often overlook the inherent dependencies between genomic regions, leading to results that are hard to interpret. In this study, we enhance the elastic net penalty by incorporating an additional penalty that accounts for these dependencies. This approach yields smooth parameter estimates and facilitates variable selection, resulting in a sparse solution. Our findings demonstrate that this method outperforms other models in predicting survival outcomes, as evidenced by our simulation study. Moreover, it allows for a more meaningful interpretation of genomic regions associated with patients' survival. We demonstrate the efficacy of our approach using both real data from a lung cancer cohort and simulated datasets.

Keywords: copy number alterations, cox proportional hazard, lung cancer, regression, sparse solution

Procedia PDF Downloads 14
3107 Weighted Rank Regression with Adaptive Penalty Function

Authors: Kang-Mo Jung

Abstract:

The use of regularization in statistical methods has become popular. The least absolute shrinkage and selection operator (LASSO) framework has become the standard tool for sparse regression. However, it is well known that the LASSO is sensitive to outliers or leverage points. We consider a new robust estimator which is composed of a weighted loss function of the pairwise differences of residuals and an adaptive penalty function regulating the tuning parameter for each variable. Rank regression is resistant to regression outliers, but not to leverage points. By adopting a weighted loss function, the proposed method is robust to leverage points of the predictor variable. Furthermore, the adaptive penalty function gives good statistical properties in variable selection, such as the oracle property and consistency. We develop an efficient algorithm to compute the proposed estimator using basic functions in the R program. The optimal tuning parameter is chosen based on the Bayesian information criterion (BIC). Numerical simulation shows that the proposed estimator is effective for analyzing real and contaminated data sets.
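
To make the loss concrete, the sketch below minimizes a weighted sum of absolute pairwise differences of residuals for a simple line fit; the leverage-based weights and the numerical optimizer are illustrative choices, and the adaptive penalty of the proposed estimator is omitted.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import minimize

# Hedged sketch of the rank-regression idea (not the paper's full estimator):
# minimize a weighted sum of absolute pairwise differences of residuals; the
# weights here simply downweight high-leverage observations.
rng = np.random.default_rng(9)
n = 40
x = rng.normal(size=n)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, n)
y[0] += 10.0                                               # a regression outlier

hat = 1.0 / n + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
w = 1.0 / np.sqrt(hat)                                     # leverage-based weights
pairs = list(combinations(range(n), 2))

def dispersion(b):
    r = y - b[0] * x                                       # pairwise differences cancel the intercept
    return sum(w[i] * w[j] * abs(r[i] - r[j]) for i, j in pairs)

fit = minimize(dispersion, x0=np.array([0.0]), method="Nelder-Mead")
slope = fit.x[0]
intercept = np.median(y - slope * x)                       # location recovered separately
print(slope, intercept)                                    # slope stays close to 2 despite the outlier
```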

Keywords: adaptive penalty function, robust penalized regression, variable selection, weighted rank regression

Procedia PDF Downloads 432
3106 MapReduce Logistic Regression Algorithms with RHadoop

Authors: Byung Ho Jung, Dong Hoon Lim

Abstract:

Logistic regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. Logistic regression is used extensively in numerous disciplines, including the medical and social science fields. In this paper, we address the problem of estimating the parameters of logistic regression based on the MapReduce framework with RHadoop, which integrates R and the Hadoop environment and is applicable to large-scale data. There exist three learning algorithms for logistic regression, namely the gradient descent method, the cost minimization method and the Newton-Raphson method. The Newton-Raphson method does not require a learning rate, while the gradient descent and cost minimization methods need a manually chosen learning rate. The experimental results demonstrated that our learning algorithms using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also compared the performance of our Newton-Raphson method with the gradient descent and cost minimization methods. The results showed that the Newton-Raphson method appeared to be the most robust on all data tested.
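
A stand-alone sketch of the Newton-Raphson update for logistic regression (which needs no learning rate) is shown below on synthetic data; it illustrates the algorithm itself, outside the MapReduce/RHadoop setting.

```python
import numpy as np

# Newton-Raphson for logistic regression on synthetic data: iterate
# beta <- beta - H^{-1} g, with gradient g = X'(y - p) and Hessian H = -X'WX,
# where W = diag(p * (1 - p)).
rng = np.random.default_rng(10)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept + 2 features
true_beta = np.array([-0.5, 1.0, -2.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

beta = np.zeros(3)
for _ in range(25):                                           # Newton-Raphson iterations
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    gradient = X.T @ (y - p)
    hessian = -(X * (p * (1 - p))[:, None]).T @ X
    step = np.linalg.solve(hessian, -gradient)                # -H^{-1} g
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

print(beta)   # close to the true coefficients
```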

Keywords: big data, logistic regression, MapReduce, RHadoop

Procedia PDF Downloads 247
3105 A Generalized Weighted Loss for Support Vector Classification and Multilayer Perceptron

Authors: Filippo Portera

Abstract:

Standard algorithms usually employ a loss in which each error is the mere absolute difference between the true value and the prediction, in the case of a regression task. In the present work, we present several error-weighting schemes that are a generalization of this consolidated routine. We study both a binary classification model for Support Vector Classification and a regression network for the Multilayer Perceptron. Results show that the error is never worse than with the standard procedure and is often better.

Keywords: loss, binary-classification, MLP, weights, regression

Procedia PDF Downloads 64
3104 Interference among Lambsquarters and Oil Rapeseed Cultivars

Authors: Reza Siyami, Bahram Mirshekari

Abstract:

The seed and oil yield of rapeseed is considerably affected by weed interference, including mustard (Sinapis arvensis L.), lambsquarters (Chenopodium album L.) and redroot pigweed (Amaranthus retroflexus L.), throughout the East Azerbaijan province in Iran. To formulate the relationship between the four independent growth variables measured in our experiment and a dependent variable, multiple regression analysis was carried out with the number of weed leaves per plant (X1), green cover percentage (X2), LAI (X3) and leaf area per plant (X4) as independent variables and rapeseed oil yield as the dependent variable. The multiple regression equation is as follows: Seed essential oil yield (kg/ha) = 0.156 + 0.0325 (X1) + 0.0489 (X2) + 0.0415 (X3) + 0.133 (X4). Furthermore, stepwise regression analysis was also carried out on the data obtained to test the significance of the independent variables affecting the oil yield as the dependent variable. The resulting stepwise regression equation is as follows: Oil yield = 4.42 + 0.0841 (X2) + 0.0801 (X3); R² = 81.5. The stepwise regression analysis verified that the green cover percentage and LAI of the weed had a marked increasing effect on the oil yield of rapeseed.

Keywords: green cover percentage, independent variable, interference, regression

Procedia PDF Downloads 389
3103 Copula-Based Estimation of Direct and Indirect Effects in Path Analysis Model

Authors: Alam Ali, Ashok Kumar Pathak

Abstract:

Path analysis is a statistical technique used to evaluate the strength of the direct and indirect effects of variables. One or more structural regression equations are used to estimate a series of parameters in order to find a better fit to the data. Sometimes, exogenous variables do not show a significant strength in their direct and indirect effects when the assumptions of classical regression (ordinary least squares, OLS) are violated by the nature of the data. The main motive of this article is to investigate the efficacy of the copula-based regression approach over the classical regression approach and to calculate the direct and indirect effects of variables when the data violate the OLS assumptions and the variables are linked through an elliptical copula. We perform this study using a well-organized numerical scheme. Finally, a real data application is also presented to demonstrate the superior performance of the copula approach.

Keywords: path analysis, copula-based regression models, direct and indirect effects, k-fold cross validation technique

Procedia PDF Downloads 46
3102 Performance Analysis of Proprietary and Non-Proprietary Tools for Regression Testing Using Genetic Algorithm

Authors: K. Hema Shankari, R. Thirumalaiselvi, N. V. Balasubramanian

Abstract:

The present paper addresses research in the area of regression testing, with emphasis on automated tools as well as prioritization of test cases. The uniqueness of regression testing and its cyclic nature is pointed out. The difference in approach between industry, with the business model as basis, and academia, with a focus on data mining, is highlighted. Test Metrics are discussed as a prelude to our formula for prioritization; a case study is further discussed to illustrate this methodology. An industrial case study is also described in the paper, where the number of test cases is so large that they have to be grouped as Test Suites. In such situations, a genetic algorithm proposed by us can be used to reconfigure these Test Suites in each cycle of regression testing. The comparison is made between a proprietary tool and an open source tool using the above-mentioned metrics. Our approach is clarified through several tables.
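
Since the APFD metric is central to comparing prioritized test orderings, a small sketch of its usual definition is given below; the fault-detection data are hypothetical.

```python
# Small sketch of the APFD (Average Percentage of Faults Detected) metric that
# is commonly used to compare prioritized test orderings:
#   APFD = 1 - (TF1 + ... + TFm) / (n * m) + 1 / (2 * n)
# where n is the number of test cases, m the number of faults, and TFi the
# 1-based position of the first test case that reveals fault i.
def apfd(order, fault_matrix):
    """order: list of test ids in execution order
       fault_matrix: dict test id -> set of faults it detects"""
    faults = set().union(*fault_matrix.values())
    n, m = len(order), len(faults)
    first_pos = {}
    for pos, test in enumerate(order, start=1):
        for fault in fault_matrix[test]:
            first_pos.setdefault(fault, pos)
    return 1.0 - sum(first_pos[f] for f in faults) / (n * m) + 1.0 / (2 * n)

detects = {"t1": {"f1"}, "t2": {"f1", "f2"}, "t3": {"f3"}, "t4": set()}
print(apfd(["t2", "t3", "t1", "t4"], detects))   # about 0.79 (better ordering)
print(apfd(["t4", "t1", "t3", "t2"], detects))   # about 0.38
```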

Keywords: APFD metric, genetic algorithm, regression testing, RFT tool, test case prioritization, selenium tool

Procedia PDF Downloads 402