Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7233

Search results for: regression models

7233 New Segmentation of Piecewise Linear Regression Models Using Reversible Jump MCMC Algorithm

Authors: Suparman

Abstract:

Piecewise linear regression models are very flexible models for describing data. When a piecewise linear regression model is fitted to data, its parameters are generally unknown. This paper studies the problem of estimating the parameters of piecewise linear regression models. The parameters are estimated with a Bayesian method, but the Bayes estimator cannot be found analytically. To overcome this problem, the reversible jump MCMC algorithm is proposed. The reversible jump MCMC algorithm generates a Markov chain whose limiting distribution is the posterior distribution of the parameters of the piecewise linear regression model. The resulting Markov chain is used to calculate the Bayes estimator of these parameters.

Keywords: regression, piecewise, Bayesian, reversible jump MCMC

Procedia PDF Downloads 430
7232 Application Difference between Cox and Logistic Regression Models

Authors: Idrissa Kayijuka

Abstract:

Logistic regression and Cox regression (proportional hazards) models are currently employed in the analysis of prospective epidemiologic research into risk factors for chronic diseases, and the theoretical relationship between the two models has been studied. By definition, the Cox regression model, also called the Cox proportional hazards model, is a procedure for modeling time-to-event data in which censored cases exist. The logistic regression model, in contrast, is mostly applicable where the independent variables consist of numerical as well as nominal values and the outcome variable is binary (dichotomous). The arguments and findings of many researchers have focused on an overview of Cox and logistic regression models and their different applications in different areas. In this work, the analysis is done on secondary data, the SPSS exercise data on breast cancer, with a sample size of 1121 women; the main objective is to show the application difference between the Cox regression model and the logistic regression model based on factors that cause women to die of breast cancer. Some of the analysis, i.e., on lymph node status, was done manually, and SPSS software was used to analyze the data. The study found that there is an application difference between the two models: the Cox regression model is used when the data to be analyzed include follow-up time, whereas the logistic regression model analyzes data without follow-up time. Their measures of association also differ: the hazard ratio for the Cox model and the odds ratio for the logistic model. A similarity between the two models is that both are applicable to predicting the outcome of a categorical variable, i.e., a variable that can take only a restricted number of categories.
In conclusion, the Cox regression model differs from logistic regression by assessing a rate instead of a proportion. Both models can be applied in many other studies since they are suitable methods for analyzing data, but the Cox regression model is the more recommended of the two.
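
The distinction between a proportion-based and a rate-based measure can be illustrated with a small sketch. The counts and person-time below are hypothetical, not taken from the breast cancer data, and the crude rate ratio is only a simple stand-in for the idea behind a hazard ratio; it is not a fitted Cox model.

```python
def odds_ratio(exposed_events, exposed_no_events,
               unexposed_events, unexposed_no_events):
    """Odds ratio from a 2x2 table; follow-up time plays no role."""
    return (exposed_events * unexposed_no_events) / (
        exposed_no_events * unexposed_events)


def crude_rate_ratio(exposed_events, exposed_person_time,
                     unexposed_events, unexposed_person_time):
    """Ratio of event rates per unit of follow-up time.

    This is a crude incidence rate ratio, used here only to show how
    follow-up time enters a rate-based comparison.
    """
    exposed_rate = exposed_events / exposed_person_time
    unexposed_rate = unexposed_events / unexposed_person_time
    return exposed_rate / unexposed_rate


# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed had the event.
print(odds_ratio(30, 70, 10, 90))
# Same events, but with hypothetical person-years of follow-up per group.
print(crude_rate_ratio(30, 400.0, 10, 500.0))
```

The two numbers differ because the second calculation accounts for how long each group was observed, which is the information logistic regression does not use.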

Keywords: logistic regression model, Cox regression model, survival analysis, hazard ratio

Procedia PDF Downloads 323
7231 Orthogonal Regression for Nonparametric Estimation of Errors-In-Variables Models

Authors: Anastasiia Yu. Timofeeva

Abstract:

Two new algorithms for nonparametric estimation of errors-in-variables models are proposed. The first algorithm is based on a penalized regression spline. The spline is represented as a piecewise-linear function, and an orthogonal regression is estimated for each linear portion. This algorithm is iterative. The second algorithm involves locally weighted regression estimation. When the independent variable is measured with error, such estimation is a complex nonlinear optimization problem. The simulation results show the advantage of the second algorithm under the assumption that the true values of the smoothing parameters are known. Nevertheless, using some indexes of fit to select the smoothing parameters gives similar results and has an oversmoothing effect.
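
As a rough illustration of the orthogonal regression step on a single linear portion, the sketch below fits one straight line by minimizing perpendicular distances. It assumes equal error variances in both variables and uses the standard closed-form slope; the function name and data are illustrative, and the spline and local-weighting machinery of the proposed algorithms is not reproduced.

```python
import math


def orthogonal_regression(xs, ys):
    """Fit y = intercept + slope * x by minimizing perpendicular distances.

    Assumes equal error variance in x and y (an assumption of this sketch).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("slope is undefined when x and y are uncorrelated")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = orthogonal_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```

Unlike ordinary least squares, this fit treats x and y symmetrically, which is why it suits the case where the independent variable is also measured with error.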

Keywords: grade point average, orthogonal regression, penalized regression spline, locally weighted regression

Procedia PDF Downloads 333
7230 A Fuzzy Linear Regression Model Based on Dissemblance Index

Authors: Shih-Pin Chen, Shih-Syuan You

Abstract:

Fuzzy regression models are useful for investigating the relationship between explanatory variables and responses in fuzzy environments. To overcome the deficiencies of previous models and increase the explanatory power of fuzzy data, the graded mean integration (GMI) representation is applied to determine representative crisp regression coefficients. A fuzzy regression model is constructed based on the modified dissemblance index (MDI), which can precisely measure the actual total error. Compared with previous studies based on the proposed MDI and distance criterion, the results from commonly used test examples show that the proposed fuzzy linear regression model has higher explanatory power and forecasting accuracy.

Keywords: dissemblance index, fuzzy linear regression, graded mean integration, mathematical programming

Procedia PDF Downloads 345
7229 Use of Multistage Transition Regression Models for Credit Card Income Prediction

Authors: Denys Osipenko, Jonathan Crook

Abstract:

Because of the variety of card holders’ behaviour types and income sources, each consumer account can move between a variety of states. An account can be inactive, transactor, revolver, delinquent, or defaulted, and each state requires an individual model for income prediction. Estimating the transition probabilities between states at the account level helps to avoid the memorylessness of the Markov chain approach. This paper investigates approaches to estimating transition probabilities for credit card income prediction at the account level. The key question of the empirical research is which approach gives more accurate results: multinomial logistic regression or multistage conditional logistic regression with a binary target. Both models have shown moderate predictive power. Prediction accuracy for conditional logistic regression depends on the order of the stages of the conditional binary logistic regression. On the other hand, multinomial logistic regression is easier to use and gives integrated estimates for all states without priorities. Further investigation can therefore concentrate on alternative modeling approaches such as discrete choice models.

Keywords: multinomial regression, conditional logistic regression, credit account state, transition probability

Procedia PDF Downloads 404
7228 The Strengths and Limitations of the Statistical Modeling of Complex Social Phenomenon: Focusing on SEM, Path Analysis, or Multiple Regression Models

Authors: Jihye Jeon

Abstract:

This paper analyzes the conceptual framework of three statistical methods: multiple regression, path analysis, and structural equation models. When establishing a research model for the statistical modeling of complex social phenomena, it is important to know the strengths and limitations of these three statistical models. This study explored the character, strengths, and limitations of each modeling approach and suggested some strategies for accurately explaining or predicting the causal relationships among variables. In particular, common mistakes in research modeling were discussed for studies of depression or mental health.

Keywords: multiple regression, path analysis, structural equation models, statistical modeling, social and psychological phenomenon

Procedia PDF Downloads 488
7227 Robust Variable Selection Based on Schwarz Information Criterion for Linear Regression Models

Authors: Shokrya Saleh A. Alshqaq, Abdullah Ali H. Ahmadini

Abstract:

The Schwarz information criterion (SIC) is a popular tool for selecting the best variables in regression datasets. However, SIC is defined using an unbounded estimator, namely, the least-squares (LS) estimator, which is highly sensitive to outlying observations, especially bad leverage points. A method for robust variable selection based on SIC for linear regression models is thus needed. This study investigates the robustness properties of SIC by deriving its influence function and proposes a robust SIC based on the MM-estimation scale. The aim of this study is to produce a criterion that can effectively select accurate models in the presence of vertical outliers and high leverage points. The advantages of the proposed robust SIC are demonstrated through a simulation study and an analysis of a real dataset.

Keywords: influence function, robust variable selection, robust regression, Schwarz information criterion

Procedia PDF Downloads 63
7226 Optimization of Slider Crank Mechanism Using Design of Experiments and Multi-Linear Regression

Authors: Galal Elkobrosy, Amr M. Abdelrazek, Bassuny M. Elsouhily, Mohamed E. Khidr

Abstract:

Crank shaft length, connecting rod length, crank angle, engine rpm, cylinder bore, mass of piston and compression ratio are the inputs that control the performance of the slider crank mechanism and hence its efficiency. Several combinations of these seven inputs are used and compared. The engine torque predicted by the simulation is analyzed through two different regression models, with and without interaction terms, developed according to multi-linear regression using LU decomposition to solve the system of algebraic equations. These models are validated. A regression model in the seven inputs including their interaction terms lowered the polynomial degree from 3rd degree to 1st degree and gave valid predictions and stable explanations.
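
A minimal sketch of multi-linear regression solved through LU decomposition is given below. It assumes the coefficients come from the normal equations and uses a Doolittle factorization without pivoting; the abstract does not say which system the authors factorized, so this is one plausible reading, and all names and data are illustrative.

```python
def lu_decompose(matrix):
    """Doolittle LU factorization (no pivoting): matrix = lower @ upper."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # row i of the upper factor
            upper[i][j] = matrix[i][j] - sum(
                lower[i][k] * upper[k][j] for k in range(i))
        lower[i][i] = 1.0
        for j in range(i + 1, n):  # column i of the lower factor
            lower[j][i] = (matrix[j][i] - sum(
                lower[j][k] * upper[k][i] for k in range(i))) / upper[i][i]
    return lower, upper


def lu_solve(lower, upper, rhs):
    """Solve lower @ upper @ x = rhs by forward then back substitution."""
    n = len(rhs)
    y = [0.0] * n
    for i in range(n):
        y[i] = rhs[i] - sum(lower[i][k] * y[k] for k in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(upper[i][k] * x[k]
                           for k in range(i + 1, n))) / upper[i][i]
    return x


def fit_linear_model(rows, targets):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    design = [[1.0] + list(row) for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    lower, upper = lu_decompose(xtx)
    return lu_solve(lower, upper, xty)


# Toy data generated from y = 1 + 2*a + 3*b (hypothetical, two inputs only).
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
targets = [1, 3, 4, 6, 8]
print(fit_linear_model(rows, targets))
```

An interaction term can be added by appending the product of two inputs to each row before fitting, which keeps the model linear in its coefficients.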

Keywords: design of experiments, regression analysis, SI engine, statistical modeling

Procedia PDF Downloads 117
7225 A Learning-Based EM Mixture Regression Algorithm

Authors: Yi-Cheng Tian, Miin-Shen Yang

Abstract:

The mixture likelihood approach to clustering is a popular clustering method in which the expectation-maximization (EM) algorithm is the most widely used mixture likelihood method. In the literature, the EM algorithm has been used for mixture regression models. However, these EM mixture regression algorithms are sensitive to initial values and require the number of clusters to be given a priori. In this paper, to resolve these drawbacks, we construct a learning-based schema for the EM mixture regression algorithm such that it is free of initializations and can automatically obtain an approximately optimal number of clusters. Some numerical examples and comparisons demonstrate the superiority and usefulness of the proposed learning-based EM mixture regression algorithm.

Keywords: clustering, EM algorithm, Gaussian mixture model, mixture regression model

Procedia PDF Downloads 426
7224 Stock Market Prediction by Regression Model with Social Moods

Authors: Masahiro Ohmura, Koh Kakusho, Takeshi Okadome

Abstract:

This paper presents a regression model with autocorrelated errors in which the inputs are social moods obtained by analyzing the adjectives in Twitter posts using a document topic model. The regression model predicts the Dow Jones Industrial Average (DJIA) more precisely than autoregressive moving-average models.

Keywords: stock market prediction, social moods, regression model, DJIA

Procedia PDF Downloads 445
7223 Time Series Regression with Meta-Clusters

Authors: Monika Chuchro

Abstract:

This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed to obtain normally distributed subgroups of time series data from wastewater treatment plant inflow data, which are composed of several groups differing by mean value. Two simple algorithms, K-means and EM, were chosen as the clustering methods. The Rand index was used to measure similarity. After simple meta-clustering, a regression model was fitted for each subgroup. The final model was the sum of the subgroup models. The quality of the obtained model was compared with that of a regression model built using the same explanatory variables but with no clustering of data. Results were compared using the coefficient of determination (R2), the mean absolute percentage error (MAPE) as a measure of prediction accuracy, and a comparison on a line chart. Preliminary results indicate the potential of the presented technique.
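
The two comparison measures named in the abstract, R2 and MAPE, can be computed as in the sketch below. The inflow values are made up for illustration and the function names are not from the paper.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observed and modelled inflow values.
observed = [100.0, 200.0, 300.0, 400.0]
modelled = [110.0, 190.0, 310.0, 390.0]
print(r_squared(observed, modelled), mape(observed, modelled))
```

Computing both measures once for the clustered model and once for the unclustered model gives the kind of comparison the abstract describes.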

Keywords: clustering, data analysis, data mining, predictive models

Procedia PDF Downloads 338
7222 Modelling and Mapping Malnutrition Toddlers in Bojonegoro Regency with Mixed Geographically Weighted Regression Approach

Authors: Elvira Mustikawati P.H., Iis Dewi Ratih, Dita Amelia

Abstract:

Bojonegoro has proclaimed a policy of zero malnutrition. Therefore, as an effort to address cases of child malnutrition in Bojonegoro, this study used the Mixed Geographically Weighted Regression (MGWR) approach to determine the factors that influence the percentage of malnourished children under five, where the factors can be divided into local factors that are influential in each district and global factors that are influential throughout the regency. Based on the goodness-of-fit tests, the R2 and AIC values of the GWR models are better than those of the MGWR models. The R2 and AIC values of the MGWR models are 84.37% and 14.28, while those of the GWR models are 91.04% and -62.04, respectively. Based on the analysis with the GWR models, Sekar, Bubulan, Gondang, and Dander are the districts in which three predictor variables (the percentage of vitamin A, the percentage of births assisted by health personnel, and the percentage of clean water) significantly influence the percentage of malnourished children under five.

Keywords: GWR, MGWR, R2, AIC

Procedia PDF Downloads 201
7221 Quantitative Structure-Activity Relationship Study of Some Quinoline Derivatives as Antimalarial Agents

Authors: M. Ouassaf, S. Belaid

Abstract:

A series of quinoline derivatives with antimalarial activity were subjected to two-dimensional quantitative structure-activity relationship (2D-QSAR) studies. Three models were built using multiple linear regression (MLR), partial least squares (PLS) regression, and nonlinear regression (MNLR) to see which descriptors are closely related to the biological activity. We relied on a principal component analysis (PCA). Based on our results, a comparison of the quality of the MLR, PLS, and MNLR models shows that the MNLR model (R = 0.914, R² = 0.835, RCV = 0.853) has substantially better predictive capability than the MLR model (R = 0.835, R² = 0.752, RCV = 0.601) and the PLS model (R = 0.742, R² = 0.552, RCV = 0.550). The MNLR model gave statistically significant results and showed good stability to data variation in leave-one-out cross-validation. The results suggest that the proposed MNLR model may be useful for predicting the biological activity of quinoline derivatives.

Keywords: antimalarial, quinoline, QSAR, PCA, MLR, MNLR

Procedia PDF Downloads 87
7220 Arabic Character Recognition Using Regression Curves with the Expectation Maximization Algorithm

Authors: Abdullah A. AlShaher

Abstract:

In this paper, we demonstrate how regression curves can be used to recognize 2D non-rigid handwritten shapes. Each shape is represented by a set of non-overlapping, uniformly distributed landmarks. The underlying models use second-order polynomials to model shapes within a training set. To estimate the regression models, we need to extract the coefficients that describe the variations within a shape class. Hence, a least-squares method is used to estimate such models. We then proceed by training these coefficients using the apparatus of the Expectation Maximization algorithm. Recognition is carried out by finding the least-error landmark displacement with respect to the model curves. Handwritten isolated Arabic characters are used to evaluate our approach.

Keywords: character recognition, regression curves, handwritten Arabic letters, expectation maximization algorithm

Procedia PDF Downloads 78
7219 Generalized Extreme Value Regression with Binary Dependent Variable: An Application for Predicting Meteorological Drought Probabilities

Authors: Retius Chifurira

Abstract:

The logistic regression model is the most widely used regression model for predicting meteorological drought probabilities. When the dependent variable is extreme, the logistic model fails to adequately capture drought probabilities. In order to adequately predict drought probabilities, we use the generalized linear model (GLM) with the quantile function of the generalized extreme value distribution (GEVD) as the link function. Maximum likelihood estimation is used to estimate the parameters of the generalized extreme value (GEV) regression model. We compare the performance of the logistic and the GEV regression models in predicting drought probabilities for Zimbabwe. The performance of the regression models is assessed using two goodness-of-fit measures, namely the relative root mean square error (RRMSE) and the relative mean absolute error (RMAE). Results show that the GEV regression model performs better than the logistic model, thereby providing a good alternative candidate for predicting drought probabilities. This paper provides the first application of a GLM derived from extreme value theory to predict drought probabilities for a drought-prone country such as Zimbabwe.
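
The role of the GEV quantile function as a link can be sketched as follows. The sketch assumes a standardized GEV (location 0, scale 1) with a nonzero shape parameter; the shape value used is hypothetical, whereas in the paper the parameters are estimated by maximum likelihood.

```python
import math


def gev_link(probability, shape):
    """GEV quantile function: maps a probability in (0, 1) to the linear predictor."""
    if shape == 0:
        raise ValueError("this sketch covers only a nonzero shape parameter")
    return ((-math.log(probability)) ** (-shape) - 1.0) / shape


def gev_inverse_link(linear_predictor, shape):
    """GEV distribution function: maps the linear predictor back to a probability."""
    if shape == 0:
        raise ValueError("this sketch covers only a nonzero shape parameter")
    base = 1.0 + shape * linear_predictor
    if base <= 0.0:
        # Outside the support: below it when shape > 0, above it when shape < 0.
        return 0.0 if shape > 0 else 1.0
    return math.exp(-base ** (-1.0 / shape))


shape = -0.25  # hypothetical value
eta = gev_link(0.3, shape)
print(eta, gev_inverse_link(eta, shape))
```

Unlike the logit link, this response curve is asymmetric, which is the property that lets the model treat rare or extreme outcomes differently from common ones.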

Keywords: generalized extreme value distribution, general linear model, mean annual rainfall, meteorological drought probabilities

Procedia PDF Downloads 105
7218 Predicting Survival in Cancer: How Cox Regression Model Compares to Artificial Neural Networks?

Authors: Dalia Rimawi, Walid Salameh, Amal Al-Omari, Hadeel AbdelKhaleq

Abstract:

Prediction of the survival time of patients with cancer is a core factor that influences oncologists' decisions in several respects, such as the treatment plans offered, patients’ quality of life, and medication development. For a long time, proportional hazards Cox regression (ph. Cox) was, and still is, the best-known statistical method for predicting survival outcomes. With the revolution in data science, however, new prediction models have been employed and have proved to be more flexible and to provide higher accuracy in this type of study. The artificial neural network is one such model that is suitable for time-to-event prediction. In this study, we aim to compare ph. Cox regression with the artificial neural network method with respect to the data handling and accuracy of each model.

Keywords: Cox regression, neural networks, survival, cancer.

Procedia PDF Downloads 63
7217 Prediction of Energy Storage Areas for Static Photovoltaic System Using Irradiation and Regression Modelling

Authors: Kisan Sarda, Bhavika Shingote

Abstract:

This paper aims to evaluate regression modelling for predicting the energy storage of a solar photovoltaic (PV) system using semiparametric regression techniques, because some parameters are known while others, such as humidity and dust, are unknown. Solar irradiation differs from place to place depending on latitude, so by identifying the areas that give more storage, PV systems can be installed at those places to meet energy needs. The regression modelling is done for daily, monthly and seasonal prediction of solar energy storage. R modules were used to design the algorithm. The algorithm gives better comparative results than other regression models for solar PV cell energy storage.

Keywords: semi parametric regression, photovoltaic (PV) system, regression modelling, irradiation

Procedia PDF Downloads 279
7216 A Study on Characteristics of Hedonic Price Models in Korea Based on Meta-Regression Analysis

Authors: Minseo Jo

Abstract:

The purpose of this paper is to examine the factors in hedonic price models that have a significant impact on determining the price of apartments. Many variables are employed in hedonic price models, and their effectiveness varies according to the researchers and the regions they analyse. In order to consider these various conditions, meta-regression analysis was selected for the study. In this paper, four meta-independent variables drawn from 65 hedonic price models are analyzed: the factors that influence the prices of apartments, the regions in which the research was performed (divided into two), the years in which the research was performed, and the coefficients of the functions employed. The covariance between the four meta-variables and the p-values of the coefficients, and between the four meta-variables and the number of data points used in the 65 hedonic price models, is analyzed in this study. The six factors that are most important in deciding the prices of apartments are the positioning of the apartments, the noise of the apartments, the compass orientation and views from the apartments, proximity to public transportation, the companies that constructed the apartments, and the social environment (such as schools).

Keywords: hedonic price model, housing price, meta-regression analysis, characteristics

Procedia PDF Downloads 312
7215 Behind Fuzzy Regression Approach: An Exploration Study

Authors: Lavinia B. Dulla

Abstract:

This exploratory study of the fuzzy regression approach attempts to show that fuzzy regression can be used as a possible alternative to classical regression. It likewise seeks to assess the differences and characteristics of simple linear regression and fuzzy regression using the width of the prediction interval, the mean absolute deviation, and the variance of residuals. Compared with the simple linear regression model, the fuzzy regression approach is worth considering as an alternative when the sample size is between 10 and 20. As the sample size increases, the fuzzy regression approach is no longer appropriate, since the large-sample assumption underlying simple linear regression is then satisfied. Nonetheless, it can be suggested as a practical alternative when decisions have to be made on the basis of small data sets.

Keywords: fuzzy regression approach, minimum fuzziness criterion, interval regression, prediction interval

Procedia PDF Downloads 137
7214 Chemometric QSRR Evaluation of Behavior of s-Triazine Pesticides in Liquid Chromatography

Authors: Lidija R. Jevrić, Sanja O. Podunavac-Kuzmanović, Strahinja Z. Kovačević

Abstract:

This study considers the selection of the most suitable in silico molecular descriptors that could be used for s-triazine pesticides characterization. Suitable descriptors among topological, geometrical and physicochemical are used for quantitative structure-retention relationships (QSRR) model establishment. Established models were obtained using linear regression (LR) and multiple linear regression (MLR) analysis. In this paper, MLR models were established avoiding multicollinearity among the selected molecular descriptors. Statistical quality of established models was evaluated by standard and cross-validation statistical parameters. For detection of similarity or dissimilarity among investigated s-triazine pesticides and their classification, principal component analysis (PCA) and hierarchical cluster analysis (HCA) were used and gave similar grouping. This study is financially supported by COST action TD1305.

Keywords: chemometrics, classification analysis, molecular descriptors, pesticides, regression analysis

Procedia PDF Downloads 307
7213 Multiple Linear Regression for Rapid Estimation of Subsurface Resistivity from Apparent Resistivity Measurements

Authors: Sabiu Bala Muhammad, Rosli Saad

Abstract:

Multiple linear regression (MLR) models for fast estimation of true subsurface resistivity from apparent resistivity field measurements are developed and assessed in this study. The parameters investigated were apparent resistivity (ρₐ), horizontal location (X) and depth (Z) of measurement as the independent variables, and true resistivity (ρₜ) as the dependent variable. To achieve linearity in both resistivity variables, the datasets were first transformed into the logarithmic domain following diagnostic checks of normality of the dependent variable and of heteroscedasticity to ensure accurate models. Four MLR models were developed based on hierarchical combinations of the independent variables. The generated MLR coefficients were applied to another data set to estimate ρₜ values for validation. Contours of the estimated ρₜ values were plotted and compared with the observed data plots at the same colour scale and blanking for visual assessment. The accuracy of the models was assessed using the coefficient of determination (R²), standard error (SE) and weighted mean absolute percentage error (wMAPE). It is concluded that the MLR models can estimate ρₜ with a high level of accuracy.

Keywords: apparent resistivity, depth, horizontal location, multiple linear regression, true resistivity

Procedia PDF Downloads 198
7212 Agriculture Yield Prediction Using Predictive Analytic Techniques

Authors: Nagini Sabbineni, Rajini T. V. Kanth, B. V. Kiranmayee

Abstract:

India’s economy depends primarily on agricultural yield growth and allied agro-industry products. Agricultural yield prediction is among the toughest tasks for agricultural departments across the globe. Agricultural yield depends on various factors. In countries like India in particular, the majority of agricultural growth depends on rain water, which is highly unpredictable. Agricultural growth depends on different parameters, namely water, nitrogen, weather, soil characteristics, crop rotation, soil moisture, surface temperature and rain water. In our paper, extensive exploratory data analysis is carried out and various predictive models are designed. Further, various regression models, such as linear, multiple linear, and non-linear models, are tested for effective prediction or forecasting of the agricultural yield of various crops in the states of Andhra Pradesh and Telangana.

Keywords: agriculture yield growth, agriculture yield prediction, explorative data analysis, predictive models, regression models

Procedia PDF Downloads 232
7211 Using the Bootstrap for Problems Statistics

Authors: Brahim Boukabcha, Amar Rebbouh

Abstract:

The bootstrap method, based on the idea of exploiting all the information provided by the initial sample, allows us to study the properties of estimators. In this article, we present a theoretical study of the different bootstrap methods and of the use of resampling in statistical inference to calculate the standard error of an estimator and to determine a confidence interval for an estimated parameter. We apply these methods to regression models and to the Pareto model, where they give the best approximations.
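
The resampling idea can be sketched for the simplest case, the sample mean. The sketch below uses the nonparametric bootstrap with a percentile confidence interval, which is only one of the variants such a study might cover; the data, resample count and seed are illustrative.

```python
import random
import statistics


def bootstrap_mean(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Bootstrap standard error and percentile confidence interval of the mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    standard_error = statistics.stdev(means)
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return standard_error, (lower, upper)


data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]  # hypothetical observations
standard_error, interval = bootstrap_mean(data)
print(standard_error, interval)
```

Replacing the mean with another statistic, such as a median or a regression coefficient, only requires changing the function applied to each resample.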

Keywords: bootstrap, standard error, bias, jackknife, mean, median, variance, confidence interval, regression models

Procedia PDF Downloads 286
7210 Landslide Susceptibility Mapping: A Comparison between Logistic Regression and Multivariate Adaptive Regression Spline Models in the Municipality of Oudka, Northern of Morocco

Authors: S. Benchelha, H. C. Aoudjehane, M. Hakdaoui, R. El Hamdouni, H. Mansouri, T. Benchelha, M. Layelmam, M. Alaoui

Abstract:

Logistic regression (LR) and multivariate adaptive regression spline (MarSpline) models are applied and verified for landslide susceptibility mapping in Oudka, Morocco, using a geographical information system. From a spatial database containing data such as landslide mapping, topography, soil, hydrology and lithology, eight factors related to landslides were calculated or extracted: elevation, slope, aspect, distance to streams, distance to roads, distance to faults, lithology and the Normalized Difference Vegetation Index (NDVI). Using these factors, landslide susceptibility indexes were calculated by the two methods. Before the calculation, the database was divided into two parts, the first for building the model and the second for validation. The results of the landslide susceptibility analysis were verified using success and prediction rates to evaluate the quality of these probabilistic models. The verification showed that the MarSpline model is the better model, with a success rate (AUC = 0.963) and a prediction rate (AUC = 0.951) higher than those of the LR model (success rate AUC = 0.918, prediction rate AUC = 0.901).
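
The AUC figures reported above can be understood through the pairwise definition sketched below: the share of (landslide, non-landslide) pairs in which the landslide cell receives the higher susceptibility score. The scores and labels are invented, and the sketch assumes that the success rate is the AUC on the model-building part and the prediction rate the AUC on the validation part.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison; ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical susceptibility scores and observed landslide labels (1 = landslide).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))
```

This quadratic-time version is meant for clarity; rank-based formulas give the same value more efficiently on large rasters.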

Keywords: landslide susceptibility mapping, logistic regression, multivariate adaptive regression spline, Oudka, Taounate

Procedia PDF Downloads 104
7209 Detecting Earnings Management via Statistical and Neural Networks Techniques

Authors: Mohammad Namazi, Mohammad Sadeghzadeh Maharluie

Abstract:

Predicting earnings management is vital for capital market participants, financial analysts and managers. This research attempts to answer the following question: is there a significant difference between the regression model and neural network models in predicting earnings management, and which one leads to a superior prediction? In approaching this question, a Linear Regression (LR) model was compared with two neural networks, a Multi-Layer Perceptron (MLP) and a Generalized Regression Neural Network (GRNN). The population of this study includes 94 companies listed on the Tehran Stock Exchange (TSE) from 2003 to 2011. After the results of all models were obtained, ANOVA was applied to test the hypotheses. In general, the statistical results showed that the precision of GRNN did not differ significantly from that of MLP. In addition, the mean square errors of the MLP and GRNN differed significantly from that of the multivariable LR model. These findings support the notion of nonlinear behavior of earnings management. Therefore, it is more appropriate for capital market participants to analyze earnings management using neural network techniques rather than linear regression models.

Keywords: earnings management, generalized linear regression, neural networks multi-layer perceptron, Tehran stock exchange

Procedia PDF Downloads 358
7208 Forecasting Equity Premium Out-of-Sample with Sophisticated Regression Training Techniques

Authors: Jonathan Iworiso

Abstract:

Forecasting the equity premium out-of-sample is a major concern to researchers in finance and emerging markets. The quest for a superior model that can forecast the equity premium with significant economic gains has resulted in several controversies among scholars on the choice of variables and suitable techniques. This research focuses mainly on the application of Regression Training (RT) techniques to forecast the monthly equity premium out-of-sample recursively with an expanding window method. A broad category of sophisticated regression models involving model complexity was employed. The RT models, which include Ridge, Forward-Backward (FOBA) Ridge, Least Absolute Shrinkage and Selection Operator (LASSO), Relaxed LASSO, Elastic Net, and Least Angle Regression, were trained and used to forecast the equity premium out-of-sample. In this study, the empirical investigation of the RT models demonstrates significant evidence of equity premium predictability, both statistically and economically, relative to the benchmark historical average, delivering significant utility gains. The models seek to provide meaningful economic information on mean-variance portfolio investment for investors who are timing the market to earn future gains at minimal risk. Thus, the forecasting models appear to benefit an investor in a market setting who optimally reallocates a monthly portfolio between equities and risk-free treasury bills using equity premium forecasts at minimal risk.
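
The recursive expanding-window scheme and the historical-average benchmark can be sketched as follows. The out-of-sample R² used here to compare a model against the benchmark is a common choice in this literature, but it is an assumption of the sketch rather than a metric named in the abstract; function names and numbers are illustrative.

```python
def expanding_window_forecasts(series, first_forecast_index, fit_and_predict):
    """Forecast each point from first_forecast_index on, using all earlier data."""
    forecasts = []
    for t in range(first_forecast_index, len(series)):
        history = series[:t]  # the window grows by one observation each step
        forecasts.append(fit_and_predict(history))
    return forecasts


def historical_average(history):
    """Benchmark forecast: the mean of everything observed so far."""
    return sum(history) / len(history)


def out_of_sample_r2(actual, model_forecasts, benchmark_forecasts):
    """1 - SSE(model) / SSE(benchmark); positive means the model beats the benchmark."""
    model_sse = sum((a - f) ** 2 for a, f in zip(actual, model_forecasts))
    benchmark_sse = sum((a - f) ** 2 for a, f in zip(actual, benchmark_forecasts))
    return 1.0 - model_sse / benchmark_sse


premium = [1.0, 2.0, 3.0, 4.0]  # hypothetical monthly equity premium values
benchmark = expanding_window_forecasts(premium, 2, historical_average)
print(benchmark)
```

Any regression model can be passed in place of `historical_average`, as long as it is refitted on the history available at each step and returns a one-step-ahead forecast.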

Keywords: regression training, out-of-sample forecasts, expanding window, statistical predictability, economic significance, utility gains

Procedia PDF Downloads 11
7207 Internet Purchases in European Union Countries: Multiple Linear Regression Approach

Authors: Ksenija Dumičić, Anita Čeh Časni, Irena Palić

Abstract:

This paper examines the influence of economic and Information and Communication Technology (ICT) development on the recent increase in Internet purchases by individuals in European Union member states. After a growing trend in Internet purchases in the EU27 was noticed, an all-possible-regressions analysis was applied using nine independent variables for 2011. Finally, two linear regression models were studied in detail. The simple linear regression analysis confirmed the research hypothesis that Internet purchases in the analysed EU countries are positively correlated with the statistically significant variable Gross Domestic Product per capita (GDPpc). In addition, the multiple linear regression model with four regressors reflecting the level of ICT development indicates that ICT development is crucial for explaining Internet purchases by individuals, confirming the research hypothesis.

Keywords: European Union, Internet purchases, multiple linear regression model, outlier

Procedia PDF Downloads 224
7206 Optimization of Machine Learning Regression Results: An Application on Health Expenditures

Authors: Songul Cinaroglu

Abstract:

Machine learning regression methods are recommended as an alternative to classical regression methods in the presence of variables that are difficult to model. Health expenditure data are typically non-normal and heavily skewed. This study aims to compare machine learning regression methods, with hyperparameter tuning, for predicting health expenditure per capita. A multiple regression model was fitted, and the performance of Lasso Regression, Random Forest Regression, and Support Vector Machine Regression was recorded under different hyperparameter settings. The hyperparameters were the lambda (λ) value for Lasso Regression, the number of trees for Random Forest Regression, and the epsilon (ε) value for Support Vector Regression. Results obtained with k-fold cross-validation, with k varied from 5 to 50, indicate that the differences between the machine learning regression results in terms of R², RMSE, and MAE are statistically significant (p < 0.001). The results show that Random Forest Regression (R² > 0.7500, RMSE ≤ 0.6000, and MAE ≤ 0.4000) outperforms the other machine learning regression methods. It is highly advisable to use machine learning regression methods for modelling health expenditures.
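
As an illustration of tuning a hyperparameter by k-fold cross-validation, the following sketch selects λ for a lasso fit by comparing held-out RMSE and MAE. It is a simplified assumption-laden example, not the study's procedure: it uses one made-up predictor, a closed-form single-predictor lasso (soft-thresholding), a hypothetical λ grid, and k = 5 only.

```python
import math
import random


def lasso_fit(x, y, lam):
    """Lasso for one predictor: minimizes (1/2n) * SSE + lam * |beta| via soft-thresholding."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / n
    var = sum((a - x_mean) ** 2 for a in x) / n
    shrunk = math.copysign(max(abs(cov) - lam, 0.0), cov)
    beta = shrunk / var
    return y_mean - beta * x_mean, beta


def k_fold_scores(x, y, lam, k, seed=0):
    """Return (RMSE, MAE), each averaged over k held-out folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    rmses, maes = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = lasso_fit([x[i] for i in train], [y[i] for i in train], lam)
        errors = [y[i] - (a + b * x[i]) for i in fold]
        rmses.append(math.sqrt(sum(e * e for e in errors) / len(errors)))
        maes.append(sum(abs(e) for e in errors) / len(errors))
    return sum(rmses) / k, sum(maes) / k


# Synthetic data (made up): a linear signal plus Gaussian noise.
rng = random.Random(1)
x = [rng.random() for _ in range(100)]
y = [2.0 * a + rng.gauss(0.0, 0.2) for a in x]

grid = [0.0, 0.05, 0.1, 1.0]  # hypothetical lambda grid
scores = {lam: k_fold_scores(x, y, lam, k=5) for lam in grid}
best_lam = min(grid, key=lambda lam: scores[lam][0])  # lowest mean RMSE
for lam in grid:
    rmse, mae = scores[lam]
    print(f"lambda={lam}: RMSE={rmse:.3f}, MAE={mae:.3f}")
print("selected lambda:", best_lam)
```

The same loop could be repeated for several values of k, and for the hyperparameters of other learners, to reproduce the kind of comparison the abstract describes.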

Keywords: machine learning, lasso regression, random forest regression, support vector regression, hyperparameter tuning, health expenditure

Procedia PDF Downloads 108
7205 Switched System Diagnosis Based on Intelligent State Filtering with Unknown Models

Authors: Nada Slimane, Foued Theljani, Faouzi Bouani

Abstract:

The paper addresses the problem of fault diagnosis for systems operating in several modes (normal or faulty) based on state assessment. For this purpose, we use a methodology consisting of three main processes: 1) sequential data clustering, 2) linear model regression, and 3) state filtering. The Kalman Filter (KF) is an algorithm that estimates unknown states from a sequence of I/O measurements. Although it is an efficient technique for state estimation, it has two main weaknesses. First, it merely predicts states without being able to isolate or classify them according to their operating modes, whether normal or faulty. To address this limitation, the KF is endowed with an extra clustering step based on a sequential version of the k-means algorithm. Second, to provide state estimates, the KF requires state-space models, which can be unknown. A linear regularized regression is used to identify the required models. To demonstrate its effectiveness, the proposed approach is assessed on a simulated benchmark.
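
The state-filtering step can be sketched with a scalar Kalman filter. This is a minimal illustration under stated assumptions, not the authors' method: the model parameters `a`, `c`, `q`, and `r` are assumed known here (the paper identifies the model by regularized regression), the clustering step is omitted, and the simulated data are made up.

```python
import math
import random


def kalman_filter_1d(measurements, a, c, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x[t+1] = a*x[t] + w and y[t] = c*x[t] + v,
    where Var(w) = q and Var(v) = r."""
    x, p = x0, p0
    estimates = []
    for y in measurements:
        # Predict the state and its error variance one step ahead.
        x_pred = a * x
        p_pred = a * p * a + q
        # Update with the new measurement.
        gain = p_pred * c / (c * p_pred * c + r)
        x = x_pred + gain * (y - c * x_pred)
        p = (1.0 - gain * c) * p_pred
        estimates.append(x)
    return estimates


# Simulated system (made-up parameters).
a, c, q, r = 0.95, 1.0, 0.01, 1.0
rng = random.Random(2)
state, states, measurements = 0.0, [], []
for _ in range(300):
    state = a * state + rng.gauss(0.0, math.sqrt(q))
    states.append(state)
    measurements.append(c * state + rng.gauss(0.0, math.sqrt(r)))

estimates = kalman_filter_1d(measurements, a, c, q, r)
mse_raw = sum((y - s) ** 2 for y, s in zip(measurements, states)) / len(states)
mse_filtered = sum((e - s) ** 2 for e, s in zip(estimates, states)) / len(states)
print(f"MSE of raw measurements: {mse_raw:.3f}")
print(f"MSE of filtered estimates: {mse_filtered:.3f}")
```

The filter only tracks the state. Assigning each estimate to a normal or faulty mode would require the additional clustering step that the abstract describes.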

Keywords: clustering, diagnosis, Kalman Filtering, k-means, regularized regression

Procedia PDF Downloads 109
7204 Generalized Additive Model for Estimating Propensity Score

Authors: Tahmidul Islam

Abstract:

The Propensity Score Matching (PSM) technique has been widely used for estimating the causal effect of a treatment in observational studies. One major step in implementing PSM is estimating the propensity score (PS). A logistic regression model with additive linear terms of the covariates is the most commonly used technique in many studies. The logistic regression model is also used with cubic splines to retain flexibility in the model. However, choosing the functional form of the logistic regression model has been an open question, since the effectiveness of PSM depends on how accurately the PS is estimated. In many situations, the linearity assumption of linear logistic regression may not hold, and a non-linear relation between the logit and the covariates may be appropriate. One can estimate the PS using machine learning techniques, such as random forests or neural networks, for more accuracy in non-linear situations. In this study, an attempt has been made to assess the efficacy of the Generalized Additive Model (GAM) in various linear and non-linear settings and to compare its performance with the usual logistic regression. GAM is a non-parametric technique in which the functional form of the covariates can be left unspecified and a flexible regression model can be fitted. Various simple and complex models were considered for the treatment under several situations (small/large sample, low/high number of treatment units), and we examined which method leads to more covariate balance in the matched dataset. It is found that the logistic regression model is impressively robust to the inclusion of quadratic and interaction terms and reduces the mean difference between the treatment and control sets as efficiently as GAM does. GAM provided no significantly better covariate balance than logistic regression in either simple or complex models. The analysis also suggests that a larger proportion of controls than treatment units leads to better balance for both methods.
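
The logistic-regression baseline for PSM can be sketched as follows: estimate the propensity score, match each treated unit to its nearest control, and compare covariate balance before and after matching. This is an illustrative sketch only. It assumes a single made-up covariate, fits the logistic model by plain gradient ascent, matches 1:1 with replacement, and measures balance by the standardized mean difference. The GAM alternative is not implemented here, and all function names are hypothetical.

```python
import math
import random


def fit_logistic(x, treated, lr=0.5, steps=1000):
    """One-covariate logistic regression fitted by gradient ascent on the mean log-likelihood."""
    w0 = w1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, ti in zip(x, treated):
            p = 1.0 / (1.0 + math.exp(-(w0 + w1 * xi)))
            g0 += ti - p
            g1 += (ti - p) * xi
        w0 += lr * g0 / n
        w1 += lr * g1 / n
    return w0, w1


def standardized_mean_difference(a, b):
    """(mean(a) - mean(b)) divided by the pooled standard deviation."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / (len(a) - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt((var_a + var_b) / 2.0)


# Synthetic data (made up): treatment is more likely for larger x,
# and controls outnumber treated units.
rng = random.Random(3)
x = [rng.gauss(0.0, 1.0) for _ in range(400)]
treated = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(-1.0 + xi))) else 0 for xi in x]

w0, w1 = fit_logistic(x, treated)
ps = [1.0 / (1.0 + math.exp(-(w0 + w1 * xi))) for xi in x]

treated_idx = [i for i, t in enumerate(treated) if t == 1]
control_idx = [i for i, t in enumerate(treated) if t == 0]
# 1:1 nearest-neighbor matching on the propensity score, with replacement.
matched_idx = [min(control_idx, key=lambda j: abs(ps[i] - ps[j])) for i in treated_idx]

smd_before = standardized_mean_difference(
    [x[i] for i in treated_idx], [x[j] for j in control_idx]
)
smd_after = standardized_mean_difference(
    [x[i] for i in treated_idx], [x[j] for j in matched_idx]
)
print(f"SMD before matching: {smd_before:.3f}")
print(f"SMD after matching: {smd_after:.3f}")
```

A standardized mean difference closer to zero after matching indicates better covariate balance, which is the criterion the abstract uses to compare logistic regression with GAM.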

Keywords: accuracy, covariate balances, generalized additive model, logistic regression, non-linearity, propensity score matching

Procedia PDF Downloads 303