Search results for: stepwise regression.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 760

Search results for: stepwise regression.

700 Forecasting the Sea Level Change in Strait of Hormuz

Authors: Hamid Goharnejad, Amir Hossein Eghbali

Abstract:

Recent investigations have demonstrated the global sea level rise due to climate change impacts. In this study, climate changes study the effects of increasing water level in the strait of Hormuz. The probable changes of sea level rise should be investigated to employ the adaption strategies. The climatic output data of a GCM (General Circulation Model) named CGCM3 under climate change scenario of A1b and A2 were used. Among different variables simulated by this model, those of maximum correlation with sea level changes in the study region and least redundancy among themselves were selected for sea level rise prediction by using stepwise regression. One of models (Discrete Wavelet artificial Neural Network) was developed to explore the relationship between climatic variables and sea level changes. In these models, wavelet was used to disaggregate the time series of input and output data into different components and then ANN was used to relate the disaggregated components of predictors and input parameters to each other. The results showed in the Shahid Rajae Station for scenario A1B sea level rise is among 64 to 75 cm and for the A2 Scenario sea level rise is among 90 t0 105 cm. Furthermore, the result showed a significant increase of sea level at the study region under climate change impacts, which should be incorporated in coastal areas management.

Keywords: Climate change scenarios, sea-level rise, strait of Hormuz, artificial neural network, fuzzy logic.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2380
699 Economic Dispatch Fuzzy Linear Regression and Optimization

Authors: A. K. Al-Othman

Abstract:

This study presents a new approach based on Tanaka's fuzzy linear regression (FLP) algorithm to solve well-known power system economic load dispatch problem (ELD). Tanaka's fuzzy linear regression (FLP) formulation will be employed to compute the optimal solution of optimization problem after linearization. The unknowns are expressed as fuzzy numbers with a triangular membership function that has middle and spread value reflected on the unknowns. The proposed fuzzy model is formulated as a linear optimization problem, where the objective is to minimize the sum of the spread of the unknowns, subject to double inequality constraints. Linear programming technique is employed to obtain the middle and the symmetric spread for every unknown (power generation level). Simulation results of the proposed approach will be compared with those reported in literature.

Keywords: Economic Dispatch, Fuzzy Linear Regression (FLP)and Optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2254
698 Fuzzy Cost Support Vector Regression

Authors: Hadi Sadoghi Yazdi, Tahereh Royani, Mehri Sadoghi Yazdi, Sohrab Effati

Abstract:

In this paper, a new version of support vector regression (SVR) is presented namely Fuzzy Cost SVR (FCSVR). Individual property of the FCSVR is operation over fuzzy data whereas fuzzy cost (fuzzy margin and fuzzy penalty) are maximized. This idea admits to have uncertainty in the penalty and margin terms jointly. Robustness against noise is shown in the experimental results as a property of the proposed method and superiority relative conventional SVR.

Keywords: Support vector regression, Fuzzy input, Fuzzy cost.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1330
697 Landslide Susceptibility Mapping: A Comparison between Logistic Regression and Multivariate Adaptive Regression Spline Models in the Municipality of Oudka, Northern of Morocco

Authors: S. Benchelha, H. C. Aoudjehane, M. Hakdaoui, R. El Hamdouni, H. Mansouri, T. Benchelha, M. Layelmam, M. Alaoui

Abstract:

The logistic regression (LR) and multivariate adaptive regression spline (MarSpline) are applied and verified for analysis of landslide susceptibility map in Oudka, Morocco, using geographical information system. From spatial database containing data such as landslide mapping, topography, soil, hydrology and lithology, the eight factors related to landslides such as elevation, slope, aspect, distance to streams, distance to road, distance to faults, lithology map and Normalized Difference Vegetation Index (NDVI) were calculated or extracted. Using these factors, landslide susceptibility indexes were calculated by the two mentioned methods. Before the calculation, this database was divided into two parts, the first for the formation of the model and the second for the validation. The results of the landslide susceptibility analysis were verified using success and prediction rates to evaluate the quality of these probabilistic models. The result of this verification was that the MarSpline model is the best model with a success rate (AUC = 0.963) and a prediction rate (AUC = 0.951) higher than the LR model (success rate AUC = 0.918, rate prediction AUC = 0.901).

Keywords: Landslide susceptibility mapping, regression logistic, multivariate adaptive regression spline, Oudka, Taounate, Morocco.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 941
696 Small Sample Bootstrap Confidence Intervals for Long-Memory Parameter

Authors: Josu Arteche, Jesus Orbe

Abstract:

The log periodogram regression is widely used in empirical applications because of its simplicity, since only a least squares regression is required to estimate the memory parameter, d, its good asymptotic properties and its robustness to misspecification of the short term behavior of the series. However, the asymptotic distribution is a poor approximation of the (unknown) finite sample distribution if the sample size is small. Here the finite sample performance of different nonparametric residual bootstrap procedures is analyzed when applied to construct confidence intervals. In particular, in addition to the basic residual bootstrap, the local and block bootstrap that might adequately replicate the structure that may arise in the errors of the regression are considered when the series shows weak dependence in addition to the long memory component. Bias correcting bootstrap to adjust the bias caused by that structure is also considered. Finally, the performance of the bootstrap in log periodogram regression based confidence intervals is assessed in different type of models and how its performance changes as sample size increases.

Keywords: bootstrap, confidence interval, log periodogram regression, long memory.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1699
695 Identifying Factors Contributing to the Spread of Lyme Disease: A Regression Analysis of Virginia’s Data

Authors: Fatemeh Valizadeh Gamchi, Edward L. Boone

Abstract:

This research focuses on Lyme disease, a widespread infectious condition in the United States caused by the bacterium Borrelia burgdorferi sensu stricto. It is critical to identify environmental and economic elements that are contributing to the spread of the disease. This study examined data from Virginia to identify a subset of explanatory variables significant for Lyme disease case numbers. To identify relevant variables and avoid overfitting, linear poisson, and regularization regression methods such as ridge, lasso, and elastic net penalty were employed. Cross-validation was performed to acquire tuning parameters. The methods proposed can automatically identify relevant disease count covariates. The efficacy of the techniques was assessed using four criteria on three simulated datasets. Finally, using the Virginia Department of Health’s Lyme disease dataset, the study successfully identified key factors, and the results were consistent with previous studies.

Keywords: Lyme disease, Poisson generalized linear model, Ridge regression, Lasso Regression, elastic net regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 49
694 A Study on Inference from Distance Variables in Hedonic Regression

Authors: Yan Wang, Yasushi Asami, Yukio Sadahiro

Abstract:

In urban area, several landmarks may affect housing price and rents, and hedonic analysis should employ distance variables corresponding to each landmarks. Unfortunately, the effects of distances to landmarks on housing prices are generally not consistent with the true price. These distance variables may cause magnitude error in regression, pointing a problem of spatial multicollinearity. In this paper, we provided some approaches for getting the samples with less bias and method on locating the specific sampling area to avoid the multicollinerity problem in two specific landmarks case.

Keywords: Landmarks, hedonic regression, distance variables, collinearity, multicollinerity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1852
693 Using Combination of Optimized Recurrent Neural Network with Design of Experiments and Regression for Control Chart Forecasting

Authors: R. Behmanesh, I. Rahimi

Abstract:

recurrent neural network (RNN) is an efficient tool for modeling production control process as well as modeling services. In this paper one RNN was combined with regression model and were employed in order to be checked whether the obtained data by the model in comparison with actual data, are valid for variable process control chart. Therefore, one maintenance process in workshop of Esfahan Oil Refining Co. (EORC) was taken for illustration of models. First, the regression was made for predicting the response time of process based upon determined factors, and then the error between actual and predicted response time as output and also the same factors as input were used in RNN. Finally, according to predicted data from combined model, it is scrutinized for test values in statistical process control whether forecasting efficiency is acceptable. Meanwhile, in training process of RNN, design of experiments was set so as to optimize the RNN.

Keywords: RNN, DOE, regression, control chart.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1623
692 Estimate of Maximum Expected Intensity of One-Half-Wave Lines Dancing

Authors: A. Bekbaev, M. Dzhamanbaev, R. Abitaeva, A. Karbozova, G. Nabyeva

Abstract:

In this paper, the regression dependence of dancing intensity from wind speed and length of span was established due to the statistic data obtained from multi-year observations on line wires dancing accumulated by power systems of Kazakhstan and the Russian Federation. The lower and upper limitations of the equations parameters were estimated, as well as the adequacy of the regression model. The constructed model will be used in research of dancing phenomena for the development of methods and means of protection against dancing and for zoning plan of the territories of line wire dancing.

Keywords: Power lines, line wire dancing, dancing intensity, regression equation, dancing area intensity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1169
691 Optimized Calculation of Hourly Price Forward Curve (HPFC)

Authors: Ahmed Abdolkhalig

Abstract:

This paper examines many mathematical methods for molding the hourly price forward curve (HPFC); the model will be constructed by numerous regression methods, like polynomial regression, radial basic function neural networks & a furrier series. Examination the models goodness of fit will be done by means of statistical & graphical tools. The criteria for choosing the model will depend on minimize the Root Mean Squared Error (RMSE), using the correlation analysis approach for the regression analysis the optimal model will be distinct, which are robust against model misspecification. Learning & supervision technique employed to determine the form of the optimal parameters corresponding to each measure of overall loss. By using all the numerical methods that mentioned previously; the explicit expressions for the optimal model derived and the optimal designs will be implemented.

Keywords: Forward curve, furrier series, regression, radial basic function neural networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4174
690 Principal Component Regression in Noninvasive Pineapple Soluble Solids Content Assessment Based On Shortwave Near Infrared Spectrum

Authors: K. S. Chia, H. Abdul Rahim, R. Abdul Rahim

Abstract:

The Principal component regression (PCR) is a combination of principal component analysis (PCA) and multiple linear regression (MLR). The objective of this paper is to revise the use of PCR in shortwave near infrared (SWNIR) (750-1000nm) spectral analysis. The idea of PCR was explained mathematically and implemented in the non-destructive assessment of the soluble solid content (SSC) of pineapple based on SWNIR spectral data. PCR achieved satisfactory results in this application with root mean squared error of calibration (RMSEC) of 0.7611 Brix°, coefficient of determination (R2) of 0.5865 and root mean squared error of crossvalidation (RMSECV) of 0.8323 Brix° with principal components (PCs) of 14.

Keywords: Pineapple, Shortwave near infrared, Principal component regression, Non-invasive measurement; Soluble solids content

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1987
689 Forecasting of Grape Juice Flavor by Using Support Vector Regression

Authors: Ren-Jieh Kuo, Chun-Shou Huang

Abstract:

The research of juice flavor forecasting has become more important in China. Due to the fast economic growth in China, many different kinds of juices have been introduced to the market. If a beverage company can understand their customers’ preference well, the juice can be served more attractive. Thus, this study intends to introducing the basic theory and computing process of grapes juice flavor forecasting based on support vector regression (SVR). Applying SVR, BPN, and LR to forecast the flavor of grapes juice in real data shows that SVR is more suitable and effective at predicting performance.

Keywords: Flavor forecasting, artificial neural networks, support vector regression, grape juice flavor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2176
688 Predictive Clustering Hybrid Regression(pCHR) Approach and Its Application to Sucrose-Based Biohydrogen Production

Authors: Nikhil, Ari Visa, Chin-Chao Chen, Chiu-Yue Lin, Jaakko A. Puhakka, Olli Yli-Harja

Abstract:

A predictive clustering hybrid regression (pCHR) approach was developed and evaluated using dataset from H2- producing sucrose-based bioreactor operated for 15 months. The aim was to model and predict the H2-production rate using information available about envirome and metabolome of the bioprocess. Selforganizing maps (SOM) and Sammon map were used to visualize the dataset and to identify main metabolic patterns and clusters in bioprocess data. Three metabolic clusters: acetate coupled with other metabolites, butyrate only, and transition phases were detected. The developed pCHR model combines principles of k-means clustering, kNN classification and regression techniques. The model performed well in modeling and predicting the H2-production rate with mean square error values of 0.0014 and 0.0032, respectively.

Keywords: Biohydrogen, bioprocess modeling, clusteringhybrid regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1730
687 A Study of Panel Logit Model and Adaptive Neuro-Fuzzy Inference System in the Prediction of Financial Distress Periods

Authors: Ε. Giovanis

Abstract:

The purpose of this paper is to present two different approaches of financial distress pre-warning models appropriate for risk supervisors, investors and policy makers. We examine a sample of the financial institutions and electronic companies of Taiwan Security Exchange (TSE) market from 2002 through 2008. We present a binary logistic regression with paned data analysis. With the pooled binary logistic regression we build a model including more variables in the regression than with random effects, while the in-sample and out-sample forecasting performance is higher in random effects estimation than in pooled regression. On the other hand we estimate an Adaptive Neuro-Fuzzy Inference System (ANFIS) with Gaussian and Generalized Bell (Gbell) functions and we find that ANFIS outperforms significant Logit regressions in both in-sample and out-of-sample periods, indicating that ANFIS is a more appropriate tool for financial risk managers and for the economic policy makers in central banks and national statistical services.

Keywords: ANFIS, Binary logistic regression, Financialdistress, Panel data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2305
686 Development of Regression Equation for Surface Finish and Analysis of Surface Integrity in EDM

Authors: Md. Ashikur Rahman Khan, M. M. Rahman

Abstract:

Electrical discharge machining (EDM) is a relatively modern machining process having distinct advantages over other machining processes and can machine Ti-alloys effectively. The present study emphasizes the features of the development of regression equation based on response surface methodology (RSM) for correlating the interactive and higher-order influences of machining parameters on surface finish of Titanium alloy Ti-6Al-4V. The process parameters selected in this study are discharge current, pulse on time, pulse off time and servo voltage. Machining has been accomplished using negative polarity of Graphite electrode. Analysis of variance is employed to ascertain the adequacy of the developed regression model. Experiments based on central composite of response surface method are carried out. Scanning electron microscopy (SEM) analysis was performed to investigate the surface topography of the EDMed job. The results evidence that the proposed regression equation can predict the surface roughness effectively. The lower ampere and short pulse on time yield better surface finish.

Keywords: Graphite electrode, regression model, response surface methodology, surface roughness.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2509
685 Dichotomous Logistic Regression with Leave-One-Out Validation

Authors: Sin Yin Teh, Abdul Rahman Othman, Michael Boon Chong Khoo

Abstract:

In this paper, the concepts of dichotomous logistic regression (DLR) with leave-one-out (L-O-O) were discussed. To illustrate this, the L-O-O was run to determine the importance of the simulation conditions for robust test of spread procedures with good Type I error rates. The resultant model was then evaluated. The discussions included 1) assessment of the accuracy of the model, and 2) parameter estimates. These were presented and illustrated by modeling the relationship between the dichotomous dependent variable (Type I error rates) with a set of independent variables (the simulation conditions). The base SAS software containing PROC LOGISTIC and DATA step functions can be making used to do the DLR analysis.

Keywords: Dichotomous logistic regression, leave-one-out, testof spread.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2024
684 A General Regression Test Selection Technique

Authors: Walid S. Abd El-hamid, Sherif S. El-etriby, Mohiy M. Hadhoud

Abstract:

This paper presents a new methodology to select test cases from regression test suites. The selection strategy is based on analyzing the dynamic behavior of the applications that written in any programming language. Methods based on dynamic analysis are more safe and efficient. We design a technique that combine the code based technique and model based technique, to allow comparing the object oriented of an application that written in any programming language. We have developed a prototype tool that detect changes and select test cases from test suite.

Keywords: Regression testing, Model based testing, Dynamicbehavior.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1934
683 Detecting Earnings Management via Statistical and Neural Network Techniques

Authors: Mohammad Namazi, Mohammad Sadeghzadeh Maharluie

Abstract:

Predicting earnings management is vital for the capital market participants, financial analysts and managers. The aim of this research is attempting to respond to this query: Is there a significant difference between the regression model and neural networks’ models in predicting earnings management, and which one leads to a superior prediction of it? In approaching this question, a Linear Regression (LR) model was compared with two neural networks including Multi-Layer Perceptron (MLP), and Generalized Regression Neural Network (GRNN). The population of this study includes 94 listed companies in Tehran Stock Exchange (TSE) market from 2003 to 2011. After the results of all models were acquired, ANOVA was exerted to test the hypotheses. In general, the summary of statistical results showed that the precision of GRNN did not exhibit a significant difference in comparison with MLP. In addition, the mean square error of the MLP and GRNN showed a significant difference with the multi variable LR model. These findings support the notion of nonlinear behavior of the earnings management. Therefore, it is more appropriate for capital market participants to analyze earnings management based upon neural networks techniques, and not to adopt linear regression models.

Keywords: Earnings management, generalized regression neural networks, linear regression, multi-layer perceptron, Tehran stock exchange.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2067
682 Electron-Impact Excitation of Kr 5s, 5p Levels

Authors: Alla A. Mityureva

Abstract:

The available data on the cross sections of electronimpact excitation of krypton 5s and 5p configuration levels out of the ground state are represented in convenient and compact form. The results are obtained by regression through all known published data related to this process.

Keywords: Cross section, electron excitation, krypton, regression

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1051
681 Zero Inflated Strict Arcsine Regression Model

Authors: Y. N. Phang, E. F. Loh

Abstract:

Zero inflated strict arcsine model is a newly developed model which is found to be appropriate in modeling overdispersed count data. In this study, we extend zero inflated strict arcsine model to zero inflated strict arcsine regression model by taking into consideration the extra variability caused by extra zeros and covariates in count data. Maximum likelihood estimation method is used in estimating the parameters for this zero inflated strict arcsine regression model.

Keywords: Overdispersed count data, maximum likelihood estimation, simulated annealing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1712
680 Clustering Protein Sequences with Tailored General Regression Model Technique

Authors: G. Lavanya Devi, Allam Appa Rao, A. Damodaram, GR Sridhar, G. Jaya Suma

Abstract:

Cluster analysis divides data into groups that are meaningful, useful, or both. Analysis of biological data is creating a new generation of epidemiologic, prognostic, diagnostic and treatment modalities. Clustering of protein sequences is one of the current research topics in the field of computer science. Linear relation is valuable in rule discovery for a given data, such as if value X goes up 1, value Y will go down 3", etc. The classical linear regression models the linear relation of two sequences perfectly. However, if we need to cluster a large repository of protein sequences into groups where sequences have strong linear relationship with each other, it is prohibitively expensive to compare sequences one by one. In this paper, we propose a new technique named General Regression Model Technique Clustering Algorithm (GRMTCA) to benignly handle the problem of linear sequences clustering. GRMT gives a measure, GR*, to tell the degree of linearity of multiple sequences without having to compare each pair of them.

Keywords: Clustering, General Regression Model, Protein Sequences, Similarity Measure.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1522
679 Comparison of Polynomial and Radial Basis Kernel Functions based SVR and MLR in Modeling Mass Transfer by Vertical and Inclined Multiple Plunging Jets

Authors: S. Deswal, M. Pal

Abstract:

Presently various computational techniques are used in modeling and analyzing environmental engineering data. In the present study, an intra-comparison of polynomial and radial basis kernel functions based on Support Vector Regression and, in turn, an inter-comparison with Multi Linear Regression has been attempted in modeling mass transfer capacity of vertical (θ = 90O) and inclined (θ multiple plunging jets (varying from 1 to 16 numbers). The data set used in this study consists of four input parameters with a total of eighty eight cases, forty four each for vertical and inclined multiple plunging jets. For testing, tenfold cross validation was used. Correlation coefficient values of 0.971 and 0.981 along with corresponding root mean square error values of 0.0025 and 0.0020 were achieved by using polynomial and radial basis kernel functions based Support Vector Regression respectively. An intra-comparison suggests improved performance by radial basis function in comparison to polynomial kernel based Support Vector Regression. Further, an inter-comparison with Multi Linear Regression (correlation coefficient = 0.973 and root mean square error = 0.0024) reveals that radial basis kernel functions based Support Vector Regression performs better in modeling and estimating mass transfer by multiple plunging jets.

Keywords: Mass transfer, multiple plunging jets, polynomial and radial basis kernel functions, Support Vector Regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1384
678 Solving Single Machine Total Weighted Tardiness Problem Using Gaussian Process Regression

Authors: Wanatchapong Kongkaew

Abstract:

This paper proposes an application of probabilistic technique, namely Gaussian process regression, for estimating an optimal sequence of the single machine with total weighted tardiness (SMTWT) scheduling problem. In this work, the Gaussian process regression (GPR) model is utilized to predict an optimal sequence of the SMTWT problem, and its solution is improved by using an iterated local search based on simulated annealing scheme, called GPRISA algorithm. The results show that the proposed GPRISA method achieves a very good performance and a reasonable trade-off between solution quality and time consumption. Moreover, in the comparison of deviation from the best-known solution, the proposed mechanism noticeably outperforms the recently existing approaches.

 

Keywords: Gaussian process regression, iterated local search, simulated annealing, single machine total weighted tardiness.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2197
677 Neuro-fuzzy Model and Regression Model a Comparison Study of MRR in Electrical Discharge Machining of D2 Tool Steel

Authors: M. K. Pradhan, C. K. Biswas,

Abstract:

In the current research, neuro-fuzzy model and regression model was developed to predict Material Removal Rate in Electrical Discharge Machining process for AISI D2 tool steel with copper electrode. Extensive experiments were conducted with various levels of discharge current, pulse duration and duty cycle. The experimental data are split into two sets, one for training and the other for validation of the model. The training data were used to develop the above models and the test data, which was not used earlier to develop these models were used for validation the models. Subsequently, the models are compared. It was found that the predicted and experimental results were in good agreement and the coefficients of correlation were found to be 0.999 and 0.974 for neuro fuzzy and regression model respectively

Keywords: Electrical discharge machining, material removal rate, neuro-fuzzy model, regression model, mountain clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1357
676 Comparative Studies of Support Vector Regression between Reproducing Kernel and Gaussian Kernel

Authors: Wei Zhang, Su-Yan Tang, Yi-Fan Zhu, Wei-Ping Wang

Abstract:

Support vector regression (SVR) has been regarded as a state-of-the-art method for approximation and regression. The importance of kernel function, which is so-called admissible support vector kernel (SV kernel) in SVR, has motivated many studies on its composition. The Gaussian kernel (RBF) is regarded as a “best" choice of SV kernel used by non-expert in SVR, whereas there is no evidence, except for its superior performance on some practical applications, to prove the statement. Its well-known that reproducing kernel (R.K) is also a SV kernel which possesses many important properties, e.g. positive definiteness, reproducing property and composing complex R.K by simpler ones. However, there are a limited number of R.Ks with explicit forms and consequently few quantitative comparison studies in practice. In this paper, two R.Ks, i.e. SV kernels, composed by the sum and product of a translation invariant kernel in a Sobolev space are proposed. An exploratory study on the performance of SVR based general R.K is presented through a systematic comparison to that of RBF using multiple criteria and synthetic problems. The results show that the R.K is an equivalent or even better SV kernel than RBF for the problems with more input variables (more than 5, especially more than 10) and higher nonlinearity.

Keywords: admissible support vector kernel, reproducing kernel, reproducing kernel Hilbert space, support vector regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1544
675 Developing Pedotransfer Functions for Estimating Some Soil Properties using Artificial Neural Network and Multivariate Regression Approaches

Authors: Fereydoon Sarmadian, Ali Keshavarzi

Abstract:

Study of soil properties like field capacity (F.C.) and permanent wilting point (P.W.P.) play important roles in study of soil moisture retention curve. Although these parameters can be measured directly, their measurement is difficult and expensive. Pedotransfer functions (PTFs) provide an alternative by estimating soil parameters from more readily available soil data. In this investigation, 70 soil samples were collected from different horizons of 15 soil profiles located in the Ziaran region, Qazvin province, Iran. The data set was divided into two subsets for calibration (80%) and testing (20%) of the models and their normality were tested by Kolmogorov-Smirnov method. Both multivariate regression and artificial neural network (ANN) techniques were employed to develop the appropriate PTFs for predicting soil parameters using easily measurable characteristics of clay, silt, O.C, S.P, B.D and CaCO3. The performance of the multivariate regression and ANN models was evaluated using an independent test data set. In order to evaluate the models, root mean square error (RMSE) and R2 were used. The comparison of RSME for two mentioned models showed that the ANN model gives better estimates of F.C and P.W.P than the multivariate regression model. The value of RMSE and R2 derived by ANN model for F.C and P.W.P were (2.35, 0.77) and (2.83, 0.72), respectively. The corresponding values for multivariate regression model were (4.46, 0.68) and (5.21, 0.64), respectively. Results showed that ANN with five neurons in hidden layer had better performance in predicting soil properties than multivariate regression.

Keywords: Artificial neural network, Field capacity, Permanentwilting point, Pedotransfer functions.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1775
674 Time Series Regression with Meta-Clusters

Authors: Monika Chuchro

Abstract:

This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain subgroups of time series data with normal distribution from the inflow into wastewater treatment plant data, composed of several groups differing by mean value. Two simple algorithms, K-mean and EM, were chosen as a clustering method. The Rand index was used to measure the similarity. After simple meta-clustering, a regression model was performed for each subgroups. The final model was a sum of the subgroups models. The quality of the obtained model was compared with the regression model made using the same explanatory variables, but with no clustering of data. Results were compared using determination coefficient (R2), measure of prediction accuracy- mean absolute percentage error (MAPE) and comparison on a linear chart. Preliminary results allow us to foresee the potential of the presented technique.

Keywords: Clustering, Data analysis, Data mining, Predictive models.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1903
673 The Profit Trend of Cosmetics Products Using Bootstrap Edgeworth Approximation

Authors: Edlira Donefski, Lorenc Ekonomi, Tina Donefski

Abstract:

Edgeworth approximation is one of the most important statistical methods that has a considered contribution in the reduction of the sum of standard deviation of the independent variables’ coefficients in a Quantile Regression Model. This model estimates the conditional median or other quantiles. In this paper, we have applied approximating statistical methods in an economical problem. We have created and generated a quantile regression model to see how the profit gained is connected with the realized sales of the cosmetic products in a real data, taken from a local business. The Linear Regression of the generated profit and the realized sales was not free of autocorrelation and heteroscedasticity, so this is the reason that we have used this model instead of Linear Regression. Our aim is to analyze in more details the relation between the variables taken into study: the profit and the finalized sales and how to minimize the standard errors of the independent variable involved in this study, the level of realized sales. The statistical methods that we have applied in our work are Edgeworth Approximation for Independent and Identical distributed (IID) cases, Bootstrap version of the Model and the Edgeworth approximation for Bootstrap Quantile Regression Model. The graphics and the results that we have presented here identify the best approximating model of our study.

Keywords: Bootstrap, Edgeworth approximation, independent and Identical distributed, quantile.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 378
672 Methods for Data Selection in Medical Databases: The Binary Logistic Regression -Relations with the Calculated Risks

Authors: Cristina G. Dascalu, Elena Mihaela Carausu, Daniela Manuc

Abstract:

The medical studies often require different methods for parameters selection, as a second step of processing, after the database-s designing and filling with information. One common task is the selection of fields that act as risk factors using wellknown methods, in order to find the most relevant risk factors and to establish a possible hierarchy between them. Different methods are available in this purpose, one of the most known being the binary logistic regression. We will present the mathematical principles of this method and a practical example of using it in the analysis of the influence of 10 different psychiatric diagnostics over 4 different types of offences (in a database made from 289 psychiatric patients involved in different types of offences). Finally, we will make some observations about the relation between the risk factors hierarchy established through binary logistic regression and the individual risks, as well as the results of Chi-squared test. We will show that the hierarchy built using the binary logistic regression doesn-t agree with the direct order of risk factors, even if it was naturally to assume this hypothesis as being always true.

Keywords: Databases, risk factors, binary logisticregression, hierarchy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1292
671 Second Order Admissibilities in Multi-parameter Logistic Regression Model

Authors: Chie Obayashi, Hidekazu Tanaka, Yoshiji Takagi

Abstract:

In multi-parameter family of distributions, conditions for a modified maximum likelihood estimator to be second order admissible are given. Applying these results to the multi-parameter logistic regression model, it is shown that the maximum likelihood estimator is always second order inadmissible. Also, conditions for the Berkson estimator to be second order admissible are given.

Keywords: Berkson estimator, modified maximum likelihood estimator, Multi-parameter logistic regression model, second order admissibility.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1580