##### 738 Estimating Regression Parameters in Linear Regression Model with a Censored Response Variable

**Authors:**
Jesus Orbe,
Vicente Nunez-Anton

**Abstract:**

In this work we study the effect of several covariates X on a censored response variable T with unknown probability distribution. In this context, most studies in the literature fall into two general classes of regression models: models that study the effect of the covariates on the hazard function, and models that study the effect of the covariates on the censored response variable. The proposals in this paper belong to the second class and, more specifically, to the least-squares-based approach. Using the bootstrap estimate of the bias, we try to improve the estimation of the regression parameters by reducing their bias for small sample sizes. Simulation results presented in the paper show that, for reasonable sample sizes and censoring levels, the bias is always smaller for the new proposals.

**Keywords:**
Censored response variable,
regression,
bias.
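
The bootstrap bias correction mentioned in the abstract can be illustrated with a minimal sketch. It does not reproduce the authors' censored-regression estimator: the function name `bootstrap_bias_corrected`, the plug-in variance used as the example estimator, and the sample values are all assumptions chosen only to show the generic correction, corrected = estimate minus bootstrap bias.

```python
import random
import statistics


def bootstrap_bias_corrected(data, estimator, n_boot=2000, seed=0):
    """Return (theta_hat, bias_estimate, theta_corrected) for `estimator`.

    Hypothetical helper: generic bootstrap bias correction, not the
    censored-regression procedure of the paper.
    """
    rng = random.Random(seed)
    theta_hat = estimator(data)
    n = len(data)
    # Recompute the estimator on resamples drawn with replacement.
    boot = [estimator([data[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_boot)]
    bias = statistics.fmean(boot) - theta_hat  # bootstrap estimate of the bias
    return theta_hat, bias, theta_hat - bias   # i.e. 2 * theta_hat - mean(boot)


def plug_in_variance(sample):
    """Variance with divisor n, which is biased downward in small samples."""
    return statistics.pvariance(sample)


# Illustrative (made-up) sample of size 10.
sample = [2.1, 3.4, 1.9, 5.6, 4.4, 2.8, 3.9, 6.1, 2.2, 4.7]
theta_hat, bias, corrected = bootstrap_bias_corrected(sample, plug_in_variance)
```

For this example estimator the estimated bias is negative, so the corrected value is pushed upward, which is the direction one would expect for a divisor-n variance.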

##### 737 Computational Aspects of Regression Analysis of Interval Data

**Authors:**
Michal Cerny

**Abstract:**

We consider linear regression models where both input data (the values of independent variables) and output data (the observations of the dependent variable) are interval-censored. We introduce a possibilistic generalization of the least squares estimator, the so-called OLS-set for the interval model. This set captures the impact of the loss of information on the OLS estimator caused by interval censoring and provides a tool for quantifying this effect. We study complexity-theoretic properties of the OLS-set. We also deal with restricted versions of the general interval linear regression model, in particular the crisp input – interval output model. We give an argument that natural descriptions of the OLS-set in the crisp input – interval output model cannot be computed in polynomial time. We then derive easily computable approximations of the OLS-set which can be used instead of the exact description. We illustrate the approach with an example.

**Keywords:**
Linear regression,
interval-censored data,
computational complexity.
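
As a rough illustration of the crisp input – interval output case, the sketch below computes the component-wise range of the OLS intercept and slope when each response is only known to lie in an interval. Because the OLS estimator is linear in the responses, each coefficient attains its extremes at interval endpoints. This bounding box encloses the OLS-set but is generally larger than it, and it is not claimed to be one of the approximations derived in the paper; the function name `ols_interval_hull` and the data are assumptions.

```python
def ols_interval_hull(x, y_lo, y_hi):
    """Component-wise range of (intercept, slope) over all y in [y_lo, y_hi].

    Hypothetical helper for simple regression with exact (crisp) x values.
    """
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    # OLS is linear in y: slope = sum(w_i * y_i), intercept = sum(v_i * y_i).
    w = [(xi - x_bar) / s_xx for xi in x]
    v = [1.0 / n - x_bar * wi for wi in w]

    def value_range(weights):
        # Each term is minimized or maximized independently at an endpoint.
        lo = sum(min(c * a, c * b) for c, a, b in zip(weights, y_lo, y_hi))
        hi = sum(max(c * a, c * b) for c, a, b in zip(weights, y_lo, y_hi))
        return lo, hi

    return value_range(v), value_range(w)


# Illustrative data: three crisp inputs, each response known up to width 1.
intercept_range, slope_range = ols_interval_hull(
    [0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [1.0, 2.0, 3.0]
)
```

With these inputs the slope can lie anywhere between 0.5 and 1.5, which gives a quick sense of how much information the interval censoring removes.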

##### 736 Survival Model for Partly Interval-Censored Data with Application to Anti D in Rhesus D Negative Studies

**Authors:**
F. A. M. Elfaki,
Amar Abobakar,
M. Azram,
M. Usman

**Abstract:**

This paper discusses regression analysis of partly interval-censored failure time data, which occur in many fields including demographical, epidemiological, financial, medical and sociological studies. We focus on the situation where the survival time of interest can be described by the additive hazards model in the presence of partly interval-censored data. A major advantage of the approach is its simplicity, and it can be easily implemented using R software. Simulation studies indicate that the approach performs well in practical situations and is comparable to existing methods. The methodology is applied to a set of partly interval-censored failure time data arising from anti D in Rhesus D negative studies.

**Keywords:**
Anti D in Rhesus D negative,
Cox’s model,
EM algorithm.

##### 735 A Renovated Cook's Distance Based On The Buckley-James Estimate In Censored Regression

**Authors:**
Nazrina Aziz,
Dong Q. Wang

**Abstract:**

Various regression-based methods have been developed to handle data sets containing censored observations, e.g. the Buckley-James method, Miller's method, the Cox method, and the Koul-Susarla-Van Ryzin estimators. Even though comparison studies show that the Buckley-James method performs better than some other methods, it is still rarely used by researchers, mainly because few diagnostic tools have been developed for it thus far. Therefore, a diagnostic tool for the Buckley-James method is proposed in this paper. It is called the renovated Cook's Distance, $RD^*_i$, and has been developed based on Cook's idea. Depending on the analyst's needs, $RD^*_i$ has advantages over (i) the change in the fitted value for a single case, $DFIT^*_i$, because it measures the influence of case i on all n fitted values $\hat{Y}^*$ (not just the fitted value for case i, as $DFIT^*_i$ does); and (ii) the change in the estimate of the coefficient when the ith case is deleted, $DBETA^*_i$, because $DBETA^*_i$ has one component for each of the p variables, so it is usually easier to look at a single diagnostic measure such as $RD^*_i$, in which information from all p variables is considered simultaneously. Finally, an example using the Stanford Heart Transplant data is provided to illustrate the proposed diagnostic tool.

**Keywords:**
Buckley-James estimators,
censored regression,
censored data,
diagnostic analysis,
product-limit estimator,
renovated Cook's Distance.

##### 734 Discovery of Fuzzy Censored Production Rules from Large Set of Discovered Fuzzy if then Rules

**Authors:**
Tamanna Siddiqui,
M. Afshar Alam

**Abstract:**

**Keywords:**
Uncertainty Quantification,
Fuzzy if then rules,
Fuzzy Censored Production Rules,
Learning algorithm.

##### 733 Relationship between Sums of Squares in Linear Regression and Semi-parametric Regression

**Authors:**
Dursun Aydın,
Bilgin Senel

**Abstract:**

**Keywords:**
Semi-parametric regression,
Penalized Least Squares,
Residuals,
Deviance,
Smoothing Spline.

##### 732 A Comparison of the Sum of Squares in Linear and Partial Linear Regression Models

**Authors:**
Dursun Aydın

**Abstract:**

**Keywords:**
Partial Linear Regression Model,
Linear Regression Model,
Residuals,
Deviance,
Smoothing Spline.

##### 731 A Comparison of the Nonparametric Regression Models using Smoothing Spline and Kernel Regression

**Authors:**
Dursun Aydin

**Abstract:**

**Keywords:**
Kernel regression,
Nonparametric models,
Prediction,
Smoothing spline.

##### 730 Maximum Likelihood Estimation of Burr Type V Distribution under Left Censored Samples

**Abstract:**

The paper deals with the maximum likelihood estimation of the parameters of the Burr type V distribution based on left censored samples. The maximum likelihood estimators (MLE) of the parameters have been derived and the Fisher information matrix for the parameters of the said distribution has been obtained explicitly. The confidence intervals for the parameters have also been discussed. A simulation study has been conducted to investigate the performance of the point and interval estimates.

**Keywords:**
Fisher information matrix,
confidence intervals,
censoring.

##### 729 Orthogonal Regression for Nonparametric Estimation of Errors-in-Variables Models

**Authors:**
Anastasiia Yu. Timofeeva

**Abstract:**

Two new algorithms for nonparametric estimation of errors-in-variables models are proposed. The first algorithm is based on a penalized regression spline. The spline is represented as a piecewise-linear function, and orthogonal regression is estimated for each linear portion. This algorithm is iterative. The second algorithm involves locally weighted regression estimation. When the independent variable is measured with error, such estimation is a complex nonlinear optimization problem. The simulation results show the advantage of the second algorithm under the assumption that the true values of the smoothing parameters are known. Nevertheless, using certain indexes of fit to select the smoothing parameters gives similar results and has an oversmoothing effect.

**Keywords:**
Grade point average,
orthogonal regression,
penalized regression spline,
locally weighted regression.

##### 728 On Bayesian Analysis of Failure Rate under Topp Leone Distribution using Complete and Censored Samples

**Abstract:**

The article is concerned with the analysis of the failure rate (shape parameter) under the Topp Leone distribution using a Bayesian framework. Different loss functions and a couple of noninformative priors have been assumed for posterior estimation. The posterior predictive distributions have also been derived. A simulation study has been carried out to compare the performance of different estimators. A real-life example has been used to illustrate the applicability of the results obtained. The findings of the study suggest that the precautionary loss function based on the Jeffreys prior and singly type II censored samples can effectively be employed to obtain the Bayes estimate of the failure rate under the Topp Leone distribution.

**Keywords:**
loss functions,
type II censoring,
posterior distribution,
Bayes estimators.

##### 727 Evolutionary Approach for Automated Discovery of Censored Production Rules

**Authors:**
Kamal K. Bharadwaj,
Basheer M. Al-Maqaleh

**Abstract:**

**Keywords:**
Censored Production Rule,
Data Mining,
Machine Learning,
Evolutionary Algorithms.

##### 726 On the Outlier Detection in Nonlinear Regression

**Authors:**
Hossein Riazoshams,
Midi Habshah,
Jr.,
Mohamad Bakri Adam

**Abstract:**

**Keywords:**
Nonlinear Regression,
outliers,
Gradient,
Least Squares,
M-estimate,
MM-estimate.

##### 725 Robust Regression and its Application in Financial Data Analysis

**Authors:**
Mansoor Momeni,
Mahmoud Dehghan Nayeri,
Ali Faal Ghayoumi,
Hoda Ghorbani

**Abstract:**

This research aims to describe the application of robust regression and its advantages over least squares regression in analyzing financial data. To do this, the relationship between earnings per share, book value of equity per share and share price (the price model), and between earnings per share, annual change in earnings per share and stock return (the return model), is examined using both robust and least squares regressions, and the outcomes are compared. The comparison shows that robust regression allows a better and more realistic analysis because it eliminates or reduces the contribution of outliers and influential data. Therefore, robust regression is recommended for obtaining more precise results in financial data analysis.

**Keywords:**
Financial data analysis,
Influential data,
Outliers,
Robust regression.

##### 724 Regression Test Selection Technique for Multi-Programming Language

**Authors:**
Walid S. Abd El-hamid,
Sherif S. El-Etriby,
Mohiy M. Hadhoud

**Abstract:**

**Keywords:**
Regression testing,
testing,
test selection,
software evolution,
software maintenance.

##### 723 Model-Based Software Regression Test Suite Reduction

**Authors:**
Shiwei Deng,
Yang Bao

**Abstract:**

**Keywords:**
Dependence analysis,
EFSM model,
greedy algorithm,
regression test.

##### 722 Stock Market Prediction by Regression Model with Social Moods

**Authors:**
Masahiro Ohmura,
Koh Kakusho,
Takeshi Okadome

**Abstract:**

This paper presents a regression model with autocorrelated errors in which the inputs are social moods obtained by analyzing the adjectives in Twitter posts using a document topic model, where document topics are extracted using LDA. The regression model predicts the Dow Jones Industrial Average (DJIA) more precisely than autoregressive moving-average models.

**Keywords:**
Regression model,
social mood,
stock market prediction,
Twitter.

##### 721 A Fuzzy Linear Regression Model Based on Dissemblance Index

**Authors:**
Shih-Pin Chen,
Shih-Syuan You

**Abstract:**

**Keywords:**
Dissemblance index,
fuzzy linear regression,
graded mean integration,
mathematical programming.

##### 720 A Forward Automatic Censored Cell-Averaging Detector for Multiple Target Situations in Log-Normal Clutter

**Authors:**
Musa'ed N. Almarshad,
Saleh A. Alshebeili,
Mourad Barkat

**Abstract:**

**Keywords:**
CFAR,
Log-normal clutter,
Censoring,
Probability of detection,
Probability of false alarm,
Probability of false censoring.

##### 719 A Cumulative Learning Approach to Data Mining Employing Censored Production Rules (CPRs)

**Authors:**
Rekha Kandwal,
Kamal K. Bharadwaj

**Abstract:**

Knowledge is indispensable, but voluminous knowledge becomes a bottleneck for efficient processing. A great challenge for data mining is the generation of a large number of potential rules as a result of the mining process. In fact, the result size is sometimes comparable to that of the original data. Traditional data mining pruning criteria such as support do not sufficiently reduce the huge rule space. Moreover, many practical applications are characterized by continual change of data and knowledge, which makes the knowledge more voluminous with each change. The most predominant representation of the discovered knowledge is the standard Production Rule (PR) of the form If P Then D. Michalski & Winston proposed Censored Production Rules (CPRs), an extension of production rules that exhibit variable precision and support an efficient mechanism for handling exceptions. A CPR is an augmented production rule of the form If P Then D Unless C, where C (Censor) is an exception to the rule. Such rules are employed in situations in which the conditional statement 'If P Then D' holds frequently and the assertion C holds rarely. With a rule of this type, we are free to ignore the exception condition when the resources needed to establish its presence are tight or there is simply no information available as to whether it holds. Thus the 'If P Then D' part of the CPR expresses the important information, while the Unless C part acts only as a switch that changes the polarity of D to ~D. In this paper, a scheme based on a Dempster-Shafer Theory (DST) interpretation of a CPR is suggested for discovering CPRs from the discovered flat PRs. The discovery of CPRs from flat rules would result in a considerable reduction of the already discovered rules. The proposed scheme incrementally incorporates new knowledge and also reduces the size of the knowledge base considerably with each episode. Examples are given to demonstrate the behaviour of the proposed scheme. The suggested cumulative learning scheme would be useful in mining data streams.

**Keywords:**
Censored production rules,
cumulative learning,
data mining,
machine learning.
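
The rule form If P Then D Unless C described above can be sketched as a small data structure. This is only an illustration of the variable-precision behaviour (the censor is consulted when resources allow and ignored otherwise); it is not the DST-based discovery scheme of the paper, and the class name `CensoredProductionRule`, the `check_censor` flag, and the bird/penguin facts are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CensoredProductionRule:
    """Hypothetical representation of 'If P Then D Unless C'."""
    premise: Callable[[dict], bool]            # P
    decision: str                              # D
    censor: Callable[[dict], Optional[bool]]   # C; None means "unknown"

    def fire(self, facts: dict, check_censor: bool = True) -> Optional[str]:
        if not self.premise(facts):
            return None  # the rule does not apply
        # The censor flips D to ~D only when it is checked and known to hold.
        if check_censor and self.censor(facts) is True:
            return "not " + self.decision
        return self.decision


# Illustrative rule: If bird Then flies Unless penguin.
rule = CensoredProductionRule(
    premise=lambda f: f.get("bird", False),
    decision="flies",
    censor=lambda f: f.get("penguin"),
)

full_check = rule.fire({"bird": True, "penguin": True})
quick_answer = rule.fire({"bird": True, "penguin": True}, check_censor=False)
unknown_censor = rule.fire({"bird": True})
not_applicable = rule.fire({"bird": False})
```

Skipping the censor check trades certainty for speed: the quick answer is the frequently correct 'If P Then D' part, and the full check revises it only in the rare exceptional case.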

##### 718 Segmentation of Piecewise Polynomial Regression Model by Using Reversible Jump MCMC Algorithm

**Authors:**
Suparman

**Abstract:**

The piecewise polynomial regression model is a very flexible model for data. When a piecewise polynomial regression model is fitted to data, its parameters are generally unknown. This paper studies the parameter estimation problem for the piecewise polynomial regression model. The parameters are estimated with a Bayesian method. Unfortunately, the Bayes estimator cannot be found analytically, so a reversible jump MCMC algorithm is proposed to solve this problem. The reversible jump MCMC algorithm generates a Markov chain whose limit distribution is the posterior distribution of the model parameters. The resulting Markov chain is used to calculate the Bayes estimator of the parameters of the piecewise polynomial regression model.

**Keywords:**
Piecewise,
Bayesian,
reversible jump MCMC,
segmentation.

##### 717 Fuzzy Logic Approach to Robust Regression Models of Uncertain Medical Categories

**Authors:**
Arkady Bolotin

**Abstract:**

Dichotomization of the outcome by a single cut-off point is an important part of various medical studies. Usually the relationship between the resulting dichotomized dependent variable and the explanatory variables is analyzed with linear regression, probit regression or logistic regression. However, in many real-life situations, the cut-off point dividing the outcome into two groups is unknown and can be specified only approximately, i.e. it is surrounded by some (small) uncertainty. This means that, in order to have any practical meaning, the regression model must be robust to this uncertainty. In this paper, we show that neither the beta in the linear regression model nor its significance level is robust to small variations in the dichotomization cut-off point. As an alternative robust approach to the problem of uncertain medical categories, we propose to use the linear regression model with a fuzzy membership function as the dependent variable. This fuzzy membership function denotes to what degree the value of the underlying (continuous) outcome falls below or above the dichotomization cut-off point. We demonstrate that the linear regression model of the fuzzy dependent variable can be insensitive to the uncertainty in the cut-off point location. We present modeling results from a real study of low hemoglobin levels in infants. We systematically test the robustness of the binomial regression model and of the linear regression model with the fuzzy dependent variable by changing the boundary for the category Anemia, and show that the behavior of the latter model persists over quite a wide interval.

**Keywords:**
Categorization,
Uncertain medical categories,
Binomial regression model,
Fuzzy dependent variable,
Robustness.
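
To make the contrast between a crisp dichotomization and a fuzzy dependent variable concrete, here is a minimal sketch. The abstract does not specify the shape of the membership function, so the logistic form, the `spread` parameter, and the numeric values below are assumptions for illustration only.

```python
import math


def crisp_below(value, cutoff):
    """Crisp dichotomization: 1 if the outcome is below the cut-off, else 0."""
    return 1.0 if value < cutoff else 0.0


def fuzzy_below(value, cutoff, spread):
    """Degree in [0, 1] to which `value` lies below `cutoff`.

    Assumed logistic shape; `spread` (hypothetical) controls how gradual
    the transition around the cut-off is.
    """
    return 1.0 / (1.0 + math.exp((value - cutoff) / spread))


# Illustrative outcome value and two nearby candidate cut-offs.
value = 10.9
crisp_change = abs(crisp_below(value, 11.0) - crisp_below(value, 10.8))
fuzzy_change = abs(fuzzy_below(value, 11.0, 0.5) - fuzzy_below(value, 10.8, 0.5))
```

A small shift of the cut-off flips the crisp label completely, while the fuzzy membership moves by only about a tenth, which is the intuition behind using it as a more robust dependent variable.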

##### 716 The Relative Efficiency of Parameter Estimation in Linear Weighted Regression

**Authors:**
Baoguang Tian,
Nan Chen

**Abstract:**

A new relative efficiency, defined in the literature for the linear model, is introduced into linear weighted regression, and its upper and lower bounds are proposed. In the linear weighted regression model, two new relative efficiencies of the best linear unbiased estimator of the mean matrix with respect to the least-squares estimator are given, and their upper and lower bounds are also studied.

**Keywords:**
Linear weighted regression,
Relative efficiency,
Mean matrix,
Trace.

##### 715 Internet Purchases in European Union Countries: Multiple Linear Regression Approach

**Authors:**
Ksenija Dumičić,
Anita Čeh Časni,
Irena Palić

**Abstract:**

This paper examines the influence of economic and Information and Communication Technology (ICT) development on the recent increase in Internet purchases by individuals in European Union member states. After a growing trend in Internet purchases in the EU27 was noticed, all-possible-regressions analysis was applied using nine independent variables in 2011. Finally, two linear regression models were studied in detail. The simple linear regression analysis confirmed the research hypothesis that Internet purchases in the analyzed EU countries are positively correlated with the statistically significant variable Gross Domestic Product *per capita* (GDPpc). In addition, the multiple linear regression model with four regressors describing the ICT development level indicates that ICT development is crucial for explaining Internet purchases by individuals, confirming the research hypothesis.

**Keywords:**
European Union,
Internet purchases,
multiple linear regression model,
outlier

##### 714 Extended Least Squares LS–SVM

**Authors:**
József Valyon,
Gábor Horváth

**Abstract:**

**Keywords:**
Function estimation,
Least–Squares Support Vector Machines,
Regression,
System Modeling

##### 713 Optimization of Slider Crank Mechanism Using Design of Experiments and Multi-Linear Regression

**Authors:**
Galal Elkobrosy,
Amr M. Abdelrazek,
Bassuny M. Elsouhily,
Mohamed E. Khidr

**Abstract:**

Crank shaft length, connecting rod length, crank angle, engine rpm, cylinder bore, mass of piston and compression ratio are the inputs that control the performance of the slider crank mechanism and hence its efficiency. Several combinations of these seven inputs are used and compared. The throughput engine torque predicted by the simulation is analyzed through two different regression models, with and without interaction terms, developed according to multi-linear regression using LU decomposition to solve the system of algebraic equations. These models are validated. A regression model in the seven inputs including their interaction terms lowered the polynomial degree from third degree to first degree and gave valid predictions and stable explanations.

**Keywords:**
Design of experiments,
regression analysis,
SI Engine,
statistical modeling.

##### 712 Churn Prediction: Does Technology Matter?

**Authors:**
John Hadden,
Ashutosh Tiwari,
Rajkumar Roy,
Dymitr Ruta

**Abstract:**

**Keywords:**
Churn,
Decision Trees,
Neural Networks,
Regression.

##### 711 Categorical Data Modeling: Logistic Regression Software

**Authors:**
Abdellatif Tchantchane

**Abstract:**

Matlab-based software for logistic regression is developed to enhance the teaching of quantitative topics and to assist researchers in analyzing a wide range of applications involving categorical data. The software offers the option of performing stepwise logistic regression to select the most significant predictors. It includes a feature to detect influential observations in the data, and it investigates the effect of dropping or misclassifying an observation on a predictor variable. The input data may be given either as a set of individual responses (yes/no) with the predictor variables or as grouped records summarizing the categories for each unique set of predictor values. Graphical displays are used to present statistical results and to assess the goodness of fit of the logistic regression model. The software recognizes possible convergence problems when present in the data and notifies the user accordingly.

**Keywords:**
Logistic regression,
Matlab,
Categorical data,
Influential observation.

##### 710 Research on the Problems of Housing Prices in Qingdao from a Macro Perspective

**Authors:**
Liu Zhiyuan,
Sun Zongdi

**Abstract:**

Qingdao is a seaside city. Taking into account the characteristics of Qingdao, this article establishes a multiple linear regression model to analyze the impact of macroeconomic factors on housing prices. We use the stepwise regression method to carry out the multiple linear regression analysis and examine the F-test and t-test values. The model is optimized iteratively according to these results. Finally, the article obtains the multiple linear regression equation and the influencing factors, and the reliability of the model is verified by the F test and t test.

**Keywords:**
Housing prices,
multiple linear regression model,
macroeconomic factors,
Qingdao City.

##### 709 Adjusted Ratio and Regression Type Estimators for Estimation of Population Mean when some Observations are missing

**Authors:**
Nuanpan Nangsue

**Abstract:**

Ratio and regression type estimators have been used by previous authors to estimate a population mean for the principal variable from samples in which both auxiliary x and principal y variable data are available. However, missing data are a common problem in statistical analyses with real data. Ratio and regression type estimators have also been used for imputing values of missing y data. In this paper, six new ratio and regression type estimators are proposed for imputing values for any missing y data and estimating a population mean for y from samples with missing x and/or y data. A simulation study has been conducted to compare the six ratio and regression type estimators with a previous estimator of Rueda. Two population sizes N = 1,000 and 5,000 have been considered with sample sizes of 10% and 30% and with correlation coefficients between population variables X and Y of 0.5 and 0.8. In the simulations, 10 and 40 percent of sample y values and 10 and 40 percent of sample x values were randomly designated as missing. The new ratio and regression type estimators give similar mean absolute percentage errors that are smaller than the Rueda estimator for all cases. The new estimators give a large reduction in errors for the case of 40% missing y values and sampling fraction of 30%.

**Keywords:**
Auxiliary variable,
missing data,
ratio and regression type estimators.
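
For orientation, the classical ratio and regression estimators of a population mean, on which this family of estimators builds, can be sketched as follows. These are the textbook forms computed from complete (x, y) pairs with a known population mean of x; they are not the six new estimators or the imputation procedure of the paper, and the function names and data are assumptions.

```python
def ratio_estimate(x, y, pop_mean_x):
    """Classical ratio estimator: (y_bar / x_bar) * X_bar."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    return (y_bar / x_bar) * pop_mean_x


def regression_estimate(x, y, pop_mean_x):
    """Classical regression estimator: y_bar + b * (X_bar - x_bar)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx  # least squares slope of y on x
    return y_bar + b * (pop_mean_x - x_bar)


# Illustrative sample following y = 2x + 1, with assumed population mean X_bar = 4.
x_sample = [1.0, 2.0, 3.0]
y_sample = [3.0, 5.0, 7.0]
ratio_value = ratio_estimate(x_sample, y_sample, 4.0)
regression_value = regression_estimate(x_sample, y_sample, 4.0)
```

Here the regression estimator returns 9, the value of the line at x = 4, while the ratio estimator returns 10 because it implicitly assumes a relationship through the origin.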