Search results for: linear regression models.
4475 A Comparison of the Sum of Squares in Linear and Partial Linear Regression Models
Authors: Dursun Aydın
Abstract:
In this paper, the linear regression model is estimated by the ordinary least squares method, and the partially linear regression model is estimated by the penalized least squares method using a smoothing spline. We then investigate the differences and similarities between the sums of squares for linear regression and partial linear regression models (semi-parametric regression models). We show that the sum of squares in linear regression is reduced to the sum of squares in partial linear regression models. Furthermore, we show that the various sums of squares in linear regression are similar to different deviance statements in partial linear regression. In addition, the coefficient of determination derived for the linear regression model is easily generalized to the coefficient of determination of the partial linear regression model. To this end, two applications are carried out: a simulated and a real data set are considered to support these claims.
Keywords: Partial Linear Regression Model, Linear Regression Model, Residuals, Deviance, Smoothing Spline.
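As a rough illustration of the quantities this abstract refers to for the ordinary least squares case only, the sketch below fits a simple linear regression and computes the total, regression, and residual sums of squares together with the coefficient of determination. The function name and the data are made up for illustration; the penalized (smoothing spline) fit of the partial linear model is not shown.

```python
# Minimal sketch: OLS fit of y = b0 + b1 * x and its sums of squares.
# The function name and the data are illustrative assumptions.

def ols_sums_of_squares(x, y):
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope
    b0 = y_mean - b1 * x_mean      # intercept
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_mean) ** 2 for yi in y)               # total
    ssr = sum((fi - y_mean) ** 2 for fi in fitted)          # regression
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual
    return {"b0": b0, "b1": b1, "SST": sst, "SSR": ssr, "SSE": sse,
            "R2": 1.0 - sse / sst}

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
result = ols_sums_of_squares(x, y)
print(result)
```

For an OLS fit with an intercept, SST equals SSR plus SSE up to rounding, which is the decomposition that the coefficient of determination relies on.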
Downloads: 1872
4474 Relationship between Sums of Squares in Linear Regression and Semi-parametric Regression
Authors: Dursun Aydın, Bilgin Senel
Abstract:
In this paper, the sum of squares in linear regression is reduced to the sum of squares in semi-parametric regression. We show that the different sums of squares in linear regression are similar to various deviance statements in semi-parametric regression. In addition, the coefficient of determination derived for the linear regression model is easily generalized to the coefficient of determination of the semi-parametric regression model. An application is then carried out to support the theory of linear regression and semi-parametric regression; the study is thus supported with a simulated data example.
Keywords: Semi-parametric regression, Penalized Least Squares, Residuals, Deviance, Smoothing Spline.
Downloads: 1854
4473 A Fuzzy Linear Regression Model Based on Dissemblance Index
Authors: Shih-Pin Chen, Shih-Syuan You
Abstract:
Fuzzy regression models are useful for investigating the relationship between explanatory variables and responses in fuzzy environments. To overcome the deficiencies of previous models and increase the explanatory power of fuzzy data, the graded mean integration (GMI) representation is applied to determine representative crisp regression coefficients. A fuzzy regression model is constructed based on the modified dissemblance index (MDI), which can precisely measure the actual total error. Compared with previous studies based on the proposed MDI and distance criterion, the results from commonly used test examples show that the proposed fuzzy linear regression model has higher explanatory power and forecasting accuracy.
Keywords: Dissemblance index, fuzzy linear regression, graded mean integration, mathematical programming.
Downloads: 1442
4472 Orthogonal Regression for Nonparametric Estimation of Errors-in-Variables Models
Authors: Anastasiia Yu. Timofeeva
Abstract:
Two new algorithms for nonparametric estimation of errors-in-variables models are proposed. The first algorithm is based on a penalized regression spline. The spline is represented as a piecewise-linear function, and orthogonal regression is estimated for each linear portion. This algorithm is iterative. The second algorithm involves locally weighted regression estimation. When the independent variable is measured with error, such estimation is a complex nonlinear optimization problem. The simulation results have shown the advantage of the second algorithm under the assumption that the true smoothing parameter values are known. Nevertheless, using some indexes of fit for smoothing parameter selection gives similar results and has an oversmoothing effect.
Keywords: Grade point average, orthogonal regression, penalized regression spline, locally weighted regression.
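To make the building block named in this abstract concrete, the sketch below fits a single straight line by orthogonal regression, that is, by minimizing the perpendicular distances to the line rather than the vertical ones. It uses the standard closed form for the slope and assumes equal error variances in both variables; the function name and data are illustrative, and neither of the paper's two algorithms (the penalized spline or the locally weighted variant) is reproduced here.

```python
import math

# Minimal sketch: orthogonal (total least squares) fit of one straight line.
# Assumes equal error variance in x and y; names and data are illustrative.

def orthogonal_regression(x, y):
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    if sxy == 0.0:
        raise ValueError("slope is undefined when the sample covariance is zero")
    # Closed-form slope minimizing the sum of squared perpendicular distances.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    intercept = y_mean - slope * x_mean
    return slope, intercept

slope, intercept = orthogonal_regression([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept)
```

When the points lie exactly on a line, as in this toy input, the orthogonal fit and the ordinary least squares fit coincide; they differ once both variables carry noise.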
Downloads: 2133
4471 Internet Purchases in European Union Countries: Multiple Linear Regression Approach
Authors: Ksenija Dumičić, Anita Čeh Časni, Irena Palić
Abstract:
This paper examines the influence of economic and Information and Communication Technology (ICT) development on the recently increasing Internet purchases by individuals in European Union member states. After a growing trend in Internet purchases in the EU27 was noticed, all possible regressions analysis was applied using nine independent variables for 2011. Finally, two linear regression models were studied in detail. The simple linear regression analysis confirmed the research hypothesis that Internet purchases in the analyzed EU countries are positively correlated with the statistically significant variable Gross Domestic Product per capita (GDPpc). In addition, the analyzed multiple linear regression model with four regressors reflecting the level of ICT development indicates that ICT development is crucial for explaining Internet purchases by individuals, confirming the research hypothesis.
Keywords: European Union, Internet purchases, multiple linear regression model, outlier
Downloads: 2955
4470 The Relative Efficiency of Parameter Estimation in Linear Weighted Regression
Authors: Baoguang Tian, Nan Chen
Abstract:
A new relative efficiency for the linear model, taken from the literature, is introduced into linear weighted regression, and its upper and lower bounds are proposed. In the linear weighted regression model, two new relative efficiencies are given for the best linear unbiased estimation of the mean matrix with respect to the least-squares estimation, and their upper and lower bounds are also studied.
Keywords: Linear weighted regression, Relative efficiency, Mean matrix, Trace.
Downloads: 2473
4469 Selection of Designs in Ordinal Regression Models under Linear Predictor Misspecification
Authors: Ishapathik Das
Abstract:
The purpose of this article is to find a method of comparing designs for ordinal regression models using quantile dispersion graphs in the presence of linear predictor misspecification. The true relationship between the response variable and the corresponding control variables is usually unknown. The experimenter assumes a certain form of the linear predictor of the ordinal regression model. The assumed form of the linear predictor may not always be correct. Thus, the maximum likelihood estimates (MLE) of the unknown parameters of the model may be biased due to misspecification of the linear predictor. In this article, the uncertainty in the linear predictor is represented by an unknown function. An algorithm is provided to estimate the unknown function at the design points where observations are available. The unknown function is estimated at all points in the design region using multivariate parametric kriging. The comparison of the designs is based on a scalar-valued function of the mean squared error of prediction (MSEP) matrix, which incorporates both the variance and the bias of the prediction caused by the misspecification in the linear predictor. The designs are compared using the quantile dispersion graphs approach. The graphs also visually depict the robustness of the designs to changes in the parameter values. Numerical examples are presented to illustrate the proposed methodology.
Keywords: Model misspecification, multivariate kriging, multivariate logistic link, ordinal response models, quantile dispersion graphs.
Downloads: 1002
4468 Optimization of Slider Crank Mechanism Using Design of Experiments and Multi-Linear Regression
Authors: Galal Elkobrosy, Amr M. Abdelrazek, Bassuny M. Elsouhily, Mohamed E. Khidr
Abstract:
Crank shaft length, connecting rod length, crank angle, engine rpm, cylinder bore, mass of piston and compression ratio are the inputs that control the performance of the slider crank mechanism and hence its efficiency. Several combinations of these seven inputs are used and compared. The throughput engine torque predicted by the simulation is analyzed through two different regression models, with and without interaction terms, developed according to multi-linear regression using LU decomposition to solve the system of algebraic equations. These models are validated. A regression model in the seven inputs including their interaction terms lowered the polynomial degree from 3rd degree to 1st degree and yielded valid predictions and stable explanations.
Keywords: Design of experiments, regression analysis, SI Engine, statistical modeling.
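The combination of multi-linear regression and LU decomposition mentioned in this abstract can be sketched as follows: build the normal equations for a design matrix that includes an interaction column, then solve them by LU decomposition with partial pivoting. Everything here is a hypothetical toy setup: two inputs instead of the paper's seven, invented data, and helper names (`lu_solve`, `fit_multilinear`) that do not come from the source.

```python
# Minimal sketch: multi-linear regression with one interaction term,
# solved through the normal equations and an LU decomposition.
# Two toy inputs stand in for the seven engine inputs; data are invented.

def lu_solve(a, b):
    """Solve a x = b by LU decomposition with partial pivoting."""
    n = len(a)
    lu = [row[:] for row in a]
    rhs = b[:]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[pivot] = lu[pivot], lu[k]
        rhs[k], rhs[pivot] = rhs[pivot], rhs[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]              # multiplier stored in L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    z = rhs[:]                                # forward substitution: L z = P b
    for i in range(n):
        for j in range(i):
            z[i] -= lu[i][j] * z[j]
    x = z[:]                                  # back substitution: U x = z
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x

def fit_multilinear(columns, y):
    """Least squares coefficients for rows of regressors plus an intercept."""
    design = [[1.0] + list(row) for row in columns]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return lu_solve(xtx, xty)

inputs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
# Regressors: x1, x2 and the interaction x1 * x2.
regressors = [(x1, x2, x1 * x2) for x1, x2 in inputs]
# Responses generated exactly from y = 1 + 2*x1 - 3*x2 + 0.5*x1*x2.
y = [1.0 + 2.0 * x1 - 3.0 * x2 + 0.5 * x1 * x2 for x1, x2 in inputs]
coefficients = fit_multilinear(regressors, y)
print(coefficients)
```

Treating the product x1 * x2 as one more column keeps the model linear in its coefficients, which is why a first-degree fit can absorb interaction effects.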
Downloads: 1252
4467 A Comparison of the Nonparametric Regression Models using Smoothing Spline and Kernel Regression
Authors: Dursun Aydin
Abstract:
This paper studies the use of nonparametric models for Gross National Product data in Turkey and Stanford heart transplant data. Two nonparametric techniques, smoothing spline and kernel regression, are discussed. The main goal is to compare the techniques used for prediction in nonparametric regression models. According to the results of the numerical studies, it is concluded that smoothing spline regression estimators are better than those of kernel regression.
Keywords: Kernel regression, Nonparametric models, Prediction, Smoothing spline.
Downloads: 3101
4466 Estimating Regression Parameters in Linear Regression Model with a Censored Response Variable
Authors: Jesus Orbe, Vicente Nunez-Anton
Abstract:
In this work we study the effect of several covariates X on a censored response variable T with unknown probability distribution. In this context, most of the studies in the literature can be located in two possible general classes of regression models: models that study the effect the covariates have on the hazard function; and models that study the effect the covariates have on the censored response variable. Proposals in this paper are in the second class of models and, more specifically, on least squares based model approach. Thus, using the bootstrap estimate of the bias, we try to improve the estimation of the regression parameters by reducing their bias, for small sample sizes. Simulation results presented in the paper show that, for reasonable sample sizes and censoring levels, the bias is always smaller for the new proposals.
Keywords: Censored response variable, regression, bias.
Downloads: 1476
4465 Clustering Protein Sequences with Tailored General Regression Model Technique
Authors: G. Lavanya Devi, Allam Appa Rao, A. Damodaram, GR Sridhar, G. Jaya Suma
Abstract:
Cluster analysis divides data into groups that are meaningful, useful, or both. Analysis of biological data is creating a new generation of epidemiologic, prognostic, diagnostic and treatment modalities. Clustering of protein sequences is one of the current research topics in the field of computer science. A linear relation is valuable in rule discovery for given data, such as "if value X goes up 1, value Y will go down 3", etc. Classical linear regression models the linear relation of two sequences perfectly. However, if we need to cluster a large repository of protein sequences into groups where sequences have a strong linear relationship with each other, it is prohibitively expensive to compare sequences one by one. In this paper, we propose a new technique named General Regression Model Technique Clustering Algorithm (GRMTCA) to benignly handle the problem of linear sequences clustering. GRMT gives a measure, GR*, to tell the degree of linearity of multiple sequences without having to compare each pair of them.
Keywords: Clustering, General Regression Model, Protein Sequences, Similarity Measure.
Downloads: 1567
4464 Fuzzy Logic Approach to Robust Regression Models of Uncertain Medical Categories
Authors: Arkady Bolotin
Abstract:
Dichotomization of the outcome by a single cut-off point is an important part of various medical studies. Usually the relationship between the resulted dichotomized dependent variable and explanatory variables is analyzed with linear regression, probit regression or logistic regression. However, in many real-life situations, a certain cut-off point dividing the outcome into two groups is unknown and can be specified only approximately, i.e. surrounded by some (small) uncertainty. It means that in order to have any practical meaning the regression model must be robust to this uncertainty. In this paper, we show that neither the beta in the linear regression model, nor its significance level is robust to the small variations in the dichotomization cut-off point. As an alternative robust approach to the problem of uncertain medical categories, we propose to use the linear regression model with the fuzzy membership function as a dependent variable. This fuzzy membership function denotes to what degree the value of the underlying (continuous) outcome falls below or above the dichotomization cut-off point. In the paper, we demonstrate that the linear regression model of the fuzzy dependent variable can be insensitive against the uncertainty in the cut-off point location. In the paper we present the modeling results from the real study of low hemoglobin levels in infants. We systematically test the robustness of the binomial regression model and the linear regression model with the fuzzy dependent variable by changing the boundary for the category Anemia and show that the behavior of the latter model persists over a quite wide interval.
Keywords: Categorization, Uncertain medical categories, Binomial regression model, Fuzzy dependent variable, Robustness.
Downloads: 1559
4463 Detecting Earnings Management via Statistical and Neural Network Techniques
Authors: Mohammad Namazi, Mohammad Sadeghzadeh Maharluie
Abstract:
Predicting earnings management is vital for capital market participants, financial analysts and managers. This research aims to answer the following question: is there a significant difference between the regression model and neural network models in predicting earnings management, and which one leads to a superior prediction? To approach this question, a Linear Regression (LR) model was compared with two neural networks: a Multi-Layer Perceptron (MLP) and a Generalized Regression Neural Network (GRNN). The population of this study includes 94 companies listed on the Tehran Stock Exchange (TSE) market from 2003 to 2011. After the results of all models were acquired, ANOVA was applied to test the hypotheses. In general, the summary of statistical results showed that the precision of GRNN did not exhibit a significant difference in comparison with MLP. In addition, the mean square error of the MLP and GRNN showed a significant difference from that of the multivariable LR model. These findings support the notion of nonlinear behavior of earnings management. Therefore, it is more appropriate for capital market participants to analyze earnings management using neural network techniques rather than linear regression models.
Keywords: Earnings management, generalized regression neural networks, linear regression, multi-layer perceptron, Tehran stock exchange.
Downloads: 2104
4462 System Identification Based on Stepwise Regression for Dynamic Market Representation
Authors: Alexander Efremov
Abstract:
A system for market identification (SMI) is presented. The resulting representations are multivariable dynamic demand models. The market specifics are analyzed. Appropriate models and identification techniques are chosen. Multivariate static and dynamic models are used to represent the market behavior. The steps of the first stage of SMI, named data preprocessing, are mentioned. Next, the second stage, which is the model estimation, is considered in more detail. Stepwise linear regression (SWR) is used to determine the significant cross-effects and the orders of the model polynomials. The estimates of the model parameters are obtained by a numerically stable estimator. Real market data is used to analyze SMI performance. The main conclusion is related to the applicability of multivariate dynamic models for representation of market systems.
Keywords: market identification, dynamic models, stepwise regression.
Downloads: 1618
4461 Computational Aspects of Regression Analysis of Interval Data
Authors: Michal Cerny
Abstract:
We consider linear regression models where both input data (the values of independent variables) and output data (the observations of the dependent variable) are interval-censored. We introduce a possibilistic generalization of the least squares estimator, so called OLS-set for the interval model. This set captures the impact of the loss of information on the OLS estimator caused by interval censoring and provides a tool for quantification of this effect. We study complexity-theoretic properties of the OLS-set. We also deal with restricted versions of the general interval linear regression model, in particular the crisp input – interval output model. We give an argument that natural descriptions of the OLS-set in the crisp input – interval output cannot be computed in polynomial time. Then we derive easily computable approximations for the OLS-set which can be used instead of the exact description. We illustrate the approach by an example.
Keywords: Linear regression, interval-censored data, computational complexity.
Downloads: 1470
4460 On the outlier Detection in Nonlinear Regression
Authors: Hossein Riazoshams, Midi Habshah, Jr., Mohamad Bakri Adam
Abstract:
The detection of outliers is essential because outliers are responsible for serious interpretative problems in linear as well as nonlinear regression analysis. Much work has been accomplished on the identification of outliers in linear regression, but not in nonlinear regression. In this article we propose several outlier detection techniques for nonlinear regression. The main idea is to use the linear approximation of a nonlinear model and consider the gradient as the design matrix. Subsequently, the detection techniques are formulated. Six detection measures are developed and combined with three estimation techniques: the Least-Squares, M and MM-estimators. The study shows that among the six measures, only the studentized residual and Cook's distance, combined with the MM-estimator, are consistently capable of identifying the correct outliers.
Keywords: Nonlinear Regression, outliers, Gradient, Least Square, M-estimate, MM-estimate.
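Two of the measures named in this abstract, the studentized residual and Cook's distance, can be illustrated in the simplest setting. The sketch below computes them for an ordinary least squares fit of a straight line, using the textbook formulas with leverages taken from the design. This is an assumption-laden simplification: the paper works with nonlinear models (with the gradient in place of the design matrix) and with M and MM-estimators, none of which is reproduced here, and the function name and data are invented.

```python
import math

# Minimal sketch: internally studentized residuals and Cook's distance
# for a simple linear OLS fit. Names and data are illustrative assumptions.

def influence_measures(x, y):
    n, p = len(x), 2                       # p = number of fitted parameters
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Leverage: diagonal of the hat matrix for a straight-line fit.
    leverages = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    s2 = sum(e * e for e in residuals) / (n - p)   # residual variance
    studentized = [e / math.sqrt(s2 * (1.0 - h))
                   for e, h in zip(residuals, leverages)]
    cooks = [r * r * h / (p * (1.0 - h))
             for r, h in zip(studentized, leverages)]
    return leverages, studentized, cooks

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 30.0]   # last point is contaminated
leverages, studentized, cooks = influence_measures(x, y)
suspect = cooks.index(max(cooks))
print(suspect, cooks[suspect])
```

In this toy data the contaminated last observation has both the largest studentized residual in absolute value and the largest Cook's distance, so `suspect` points at it.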
Downloads: 3180
4459 Economic Dispatch Fuzzy Linear Regression and Optimization
Authors: A. K. Al-Othman
Abstract:
This study presents a new approach based on Tanaka's fuzzy linear regression (FLP) algorithm to solve the well-known power system economic load dispatch (ELD) problem. Tanaka's fuzzy linear regression (FLP) formulation is employed to compute the optimal solution of the optimization problem after linearization. The unknowns are expressed as fuzzy numbers with a triangular membership function that has a middle and a spread value reflected on the unknowns. The proposed fuzzy model is formulated as a linear optimization problem, where the objective is to minimize the sum of the spreads of the unknowns, subject to double inequality constraints. A linear programming technique is employed to obtain the middle and the symmetric spread for every unknown (power generation level). Simulation results of the proposed approach are compared with those reported in the literature.
Keywords: Economic Dispatch, Fuzzy Linear Regression (FLP) and Optimization.
Downloads: 2293
4458 Research on the Problems of Housing Prices in Qingdao from a Macro Perspective
Authors: Liu Zhiyuan, Sun Zongdi
Abstract:
Qingdao is a seaside city. Taking into account the characteristics of Qingdao, this article established a multiple linear regression model to analyze the impact of macroeconomic factors on housing prices. We used stepwise regression method to make multiple linear regression analysis, and made statistical analysis of F test values and T test values. According to the analysis results, the model is continuously optimized. Finally, this article obtained the multiple linear regression equation and the influencing factors, and the reliability of the model was verified by F test and T test.
Keywords: Housing prices, multiple linear regression model, macroeconomic factors, Qingdao City.
Downloads: 1179
4457 Dry Relaxation Shrinkage Prediction of Bordeaux Fiber Using a Feed Forward Neural
Authors: Baeza S. Roberto
Abstract:
The knitted fabric suffers a deformation in its dimensions due to stretching and tension factors, transverse and longitudinal respectively, during the process in rectilinear knitting machines so it performs a dry relaxation shrinkage procedure and thermal action of prefixed to obtain stable conditions in the knitting. This paper presents a dry relaxation shrinkage prediction of Bordeaux fiber using a feed forward neural network and linear regression models. Six operational alternatives of shrinkage were predicted. A comparison of the results found that the neural network models had higher levels of explained variability and prediction. The presence of different reposes is included. The models were obtained through a neural toolbox of Matlab and Minitab software with real data from a knitting company in Southern Guanajuato. The results allow predicting the dry relaxation shrinkage of each alternative operation.
Keywords: Neural network, dry relaxation, knitting, linear regression.
Downloads: 1760
4456 Modeling Aeration of Sharp Crested Weirs by Using Support Vector Machines
Authors: Arun Goel
Abstract:
The present paper attempts to investigate the prediction of the air entrainment rate and aeration efficiency of free overfall jets issuing from a triangular sharp crested weir by using regression-based modelling. Empirical equations, support vector machine (polynomial and radial basis function) models and linear regression techniques were applied to triangular sharp crested weirs, relating the air entrainment rate and the aeration efficiency to the input parameters, namely drop height, discharge, and vertex angle. It was observed that there is good agreement between the measured values and the values obtained using the empirical equations, the support vector machine (polynomial and RBF) models and the linear regression techniques. The test results demonstrated that the SVM-based (polynomial and RBF) models also provided acceptable predictions of the measured values with reasonable accuracy, along with the empirical equations and linear regression techniques, in modelling the air entrainment rate and the aeration efficiency of free overfall jets issuing from a triangular sharp crested weir. Further, a sensitivity analysis has been performed to study the impact of the input parameters on the output in terms of air entrainment rate and aeration efficiency.
Keywords: Air entrainment rate, dissolved oxygen, regression, SVM, weir.
Downloads: 1956
4455 Comparison of Machine Learning Techniques for Single Imputation on Audiograms
Authors: Sarah Beaver, Renee Bryce
Abstract:
Audiograms detect hearing impairment, but missing values pose problems. This work explores imputations in an attempt to improve accuracy. This work implements Linear Regression, Lasso, Linear Support Vector Regression, Bayesian Ridge, K Nearest Neighbors (KNN), and Random Forest machine learning techniques to impute audiogram frequencies ranging from 125 Hz to 8000 Hz. The data contain patients who had or were candidates for cochlear implants. Accuracy is compared across two different Nested Cross-Validation k values. Over 4000 audiograms were used from 800 unique patients. Additionally, training on data combines and compares left and right ear audiograms versus single ear side audiograms. The accuracy achieved using Root Mean Square Error (RMSE) values for the best models for Random Forest ranges from 4.74 to 6.37. The R2 values for the best models for Random Forest ranges from .91 to .96. The accuracy achieved using RMSE values for the best models for KNN ranges from 5.00 to 7.72. The R2 values for the best models for KNN ranges from .89 to .95. The best imputation models received R2 between .89 to .96 and RMSE values less than 8dB. We also show that the accuracy of classification predictive models performed better with our imputation models versus constant imputations by a two percent increase.
Keywords: Machine Learning, audiograms, data imputations, single imputations.
Downloads: 161
4454 Predictive Models for Compressive Strength of High Performance Fly Ash Cement Concrete for Pavements
Authors: S. M. Gupta, Vanita Aggarwal, Som Nath Sachdeva
Abstract:
The work reported through this paper is an experimental work conducted on High Performance Concrete (HPC) with super plasticizer with the aim to develop some models suitable for prediction of compressive strength of HPC mixes. In this study, the effect of varying proportions of fly ash (0% to 50% @ 10% increment) on compressive strength of high performance concrete has been evaluated. The mix designs studied were M30, M40 and M50 to compare the effect of fly ash addition on the properties of these concrete mixes. In all eighteen concrete mixes that have been designed, three were conventional concretes for three grades under discussion and fifteen were HPC with fly ash with varying percentages of fly ash. The concrete mix designing has been done in accordance with Indian standard recommended guidelines. All the concrete mixes have been studied in terms of compressive strength at 7 days, 28 days, 90 days, and 365 days. All the materials used have been kept same throughout the study to get a perfect comparison of values of results. The models for compressive strength prediction have been developed using Linear Regression method (LR), Artificial Neural Network (ANN) and Leave-One-Out Validation (LOOV) methods.
Keywords: ANN, concrete mixes, compressive strength, fly ash, high performance concrete, linear regression, strength prediction models.
Downloads: 2078
4453 Two New Relative Efficiencies of Linear Weighted Regression
Authors: Shuimiao Wan, Chao Yuan, Baoguang Tian
Abstract:
In statistical parameter estimation theory, there are usually two kinds of parameter estimation: least-squares estimation (LSE) and best linear unbiased estimation (BLUE). By the determining theorem of the minimum variance unbiased estimator (MVUE), the BLUE of the parameters in a linear model is the most ideal. However, because the calculations are complicated or the covariance is not given, the solution is hard to obtain. Therefore, people prefer to use LSE rather than BLUE, and this substitution incurs some loss. To quantify the loss, many scholars have presented different relative efficiencies from different points of view. For the linear weighted regression model, this paper discusses the relative efficiencies of the LSE of β with respect to the BLUE of β. It also defines two new relative efficiencies and gives their lower bounds.
Keywords: Linear weighted regression, Relative efficiency, Lower bound, Parameter estimation.
Downloads: 2118
4452 Statistical Analysis and Impact Forecasting of Connected and Autonomous Vehicles on the Environment: Case Study in the State of Maryland
Authors: Alireza Ansariyar, Safieh Laaly
Abstract:
Over the last decades, the vehicle industry has shown increased interest in integrating autonomous, connected, and electrical technologies in vehicle design with the primary hope of improving mobility and road safety while reducing transportation's environmental impact. Using the State of Maryland (MD) in the United States as a pilot study, this research investigates the fuel consumption and air pollutants of Connected and Autonomous Vehicles (CAVs), including Carbon Monoxide (CO), Particulate Matter (PM), and Nitrogen Oxides (NOx), and uses linear regression models to predict CAVs' environmental effects. The Maryland transportation network was simulated in VISUM software, and data on a set of variables were collected through a comprehensive survey. The amounts of pollutants and fuel consumption were obtained for the time interval 2010 to 2021 from the macro simulation. Eventually, four linear regression models were proposed to predict the amounts of CO, NOx, and PM pollutants and fuel consumption in the future. The results highlighted that CAVs' pollutants and fuel consumption have a significant correlation with the income, age, and race of CAV customers. Furthermore, the reliability of the four statistical models was compared with the reliability of the macro simulation model outputs for the year 2030. The error for the three pollutants and fuel consumption obtained by the statistical models in SPSS was less than 9%. This study is expected to assist researchers and policymakers with planning decisions to reduce CAV environmental impacts in MD.
Keywords: Connected and autonomous vehicles, statistical model, environmental effects, pollutants and fuel consumption, VISUM, linear regression models.
Downloads: 462
4451 Time Series Regression with Meta-Clusters
Authors: Monika Chuchro
Abstract:
This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain subgroups of normally distributed time series data from the data on inflow into a wastewater treatment plant, which is composed of several groups differing in mean value. Two simple algorithms, K-means and EM, were chosen as clustering methods. The Rand index was used to measure the similarity. After simple meta-clustering, a regression model was fitted for each subgroup. The final model was a sum of the subgroup models. The quality of the obtained model was compared with that of the regression model built using the same explanatory variables but with no clustering of the data. Results were compared using the determination coefficient (R2), a measure of prediction accuracy, the mean absolute percentage error (MAPE), and a comparison on a line chart. Preliminary results allow us to foresee the potential of the presented technique.
Keywords: Clustering, Data analysis, Data mining, Predictive models.
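The comparison described in this abstract (one regression per subgroup versus a single regression on all data, scored with R2 and MAPE) can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are made up, the cluster labels are assumed to come from a prior K-means or EM step, and the helper names `fit_line`, `r_squared`, and `mape` are hypothetical, not taken from the paper.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def mape(ys, preds):
    """Mean absolute percentage error, in percent (requires y != 0)."""
    return 100.0 * mean(abs((y - p) / y) for y, p in zip(ys, preds))

# Made-up data with two regimes that differ in mean level.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10.0, 12.1, 13.9, 16.0, 50.2, 51.9, 54.1, 56.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # assumed output of a clustering step

# Baseline: one regression fitted on all observations.
a, b = fit_line(xs, ys)
pooled = [a + b * x for x in xs]

# One regression per cluster; each point is predicted by its cluster's model.
models = {}
for g in set(labels):
    gx = [x for x, lab in zip(xs, labels) if lab == g]
    gy = [y for y, lab in zip(ys, labels) if lab == g]
    models[g] = fit_line(gx, gy)
clustered = [models[lab][0] + models[lab][1] * x for x, lab in zip(xs, labels)]

print("pooled:    R2 =", r_squared(ys, pooled), " MAPE =", mape(ys, pooled))
print("clustered: R2 =", r_squared(ys, clustered), " MAPE =", mape(ys, clustered))
```

Predicting each point with its own cluster's model is equivalent to summing the subgroup models weighted by cluster membership indicators, which is one plausible reading of "sum of the subgroup models".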
Downloads 1951
4450 Density Estimation using Generalized Linear Model and a Linear Combination of Gaussians
Authors: Aly Farag, Ayman El-Baz, Refaat Mohamed
Abstract:
In this paper we present a novel approach to density estimation. The proposed approach uses the logistic regression model to obtain an initial estimate of the given empirical density. Because the empirical data do not exactly follow the logistic regression model, the empirical density deviates from the density estimated by the logistic regression model. This deviation may be positive or negative. We model this deviation with a linear combination of Gaussians (LCG) with positive and negative components, and we use the expectation maximization (EM) algorithm to estimate the parameters of the LCG. Experiments on real images demonstrate the accuracy of our approach.
Keywords: Logistic regression model, Expectation maximization, Segmentation.
Downloads 1733
4449 Climate Change in Albania and Its Effect on Cereal Yield
Abstract:
This study analyzes climate change in Albania and its potential effects on cereal yields. First, monthly temperature and rainfall in Albania were studied for the period 1960-2021. Climatic variables are important when modeling cereal yield behavior, especially when significant changes in weather conditions are observed. For this purpose, in the second part of the study, linear and nonlinear models explaining cereal yield are constructed for the same period, 1960-2021. Multiple linear regression and lasso regression are applied to model cereal yield against the independent variables: average temperature, average rainfall, fertilizer consumption, arable land, land under cereal production, and nitrous oxide emissions. In our regression model, heteroscedasticity is not observed, the data follow a normal distribution, and the correlation between factors is low, so multicollinearity is not a problem. Machine learning methods, such as Random Forest (RF), are used to predict cereal yield responses to climatic and other variables. RF showed higher accuracy than the other statistical models in predicting cereal yield. We found that changes in average temperature negatively affect cereal yield. The coefficients of fertilizer consumption, arable land, and land under cereal production are positive. Our results show that RF is an effective and versatile machine learning method for cereal yield prediction compared to the other two methods, multiple linear regression and lasso regression.
Keywords: Cereal yield, climate change, machine learning, multiple regression model, random forest.
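To illustrate how lasso regression differs from ordinary multiple linear regression, the following is a minimal coordinate-descent sketch. It assumes centered data with no intercept and minimizes (1/(2n))·||y − Xβ||² + λ·||β||₁; the function names `soft_threshold` and `lasso_cd` and the tiny data set are made up for illustration and have no connection to the study's data or software.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1.

    Assumes the columns of X and the response y are already centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            # Correlation of column j with the partial residual
            # (the residual with coefficient j's contribution added back).
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            resid = [r - v * delta for r, v in zip(resid, col)]
            beta[j] = new
    return beta

# Made-up orthogonal design: y = 3*x1 + 0.1*x2, both columns centered.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [3.1, 2.9, -2.9, -3.1]

beta_ols = lasso_cd(X, y, lam=0.0)    # no penalty: least-squares solution
beta_lasso = lasso_cd(X, y, lam=0.5)  # penalty shrinks x1 and drops x2
print(beta_ols, beta_lasso)
```

With λ = 0 the procedure reduces to least squares; with a positive λ the large coefficient is shrunk and the small one is set exactly to zero, which is the variable-selection behavior that distinguishes lasso from plain multiple regression.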
Downloads 249
4448 Extended Least Squares LS–SVM
Authors: József Valyon, Gábor Horváth
Abstract:
Among neural models, Support Vector Machine (SVM) solutions are attracting increasing attention, mostly because they eliminate certain crucial questions involved in neural network construction. The main drawback of the standard SVM is its high computational complexity; therefore, a new technique, the Least Squares SVM (LS–SVM), has recently been introduced. In this paper we present an extended view of Least Squares Support Vector Regression (LS–SVR), which enables us to develop new formulations and algorithms for this regression technique. By manipulating the linear equation set, which embodies all information about the regression in the learning process, some new methods are introduced to simplify the formulations, speed up the calculations, and/or provide better results.
Keywords: Function estimation, Least–Squares Support Vector Machines, Regression, System Modeling
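For readers unfamiliar with the linear equation set mentioned in this abstract, the sketch below shows the standard LS–SVR formulation, in which training reduces to solving one linear system for the bias b and the multipliers α. It does not reproduce the paper's extended methods. The RBF kernel, the parameter values, and the function names `rbf`, `solve`, `lssvr_fit`, and `lssvr_predict` are illustrative assumptions.

```python
import math

def rbf(x, z, sigma=1.0):
    """Gaussian (RBF) kernel for scalar inputs."""
    return math.exp(-((x - z) ** 2) / (2.0 * sigma ** 2))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def lssvr_fit(xs, ys, gamma=100.0, sigma=1.0):
    """Build and solve the LS-SVR system
         [ 0   1^T         ] [ b     ]   [ 0 ]
         [ 1   K + I/gamma ] [ alpha ] = [ y ]
    and return (b, alphas)."""
    n = len(xs)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(xs[i], xs[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(ys))
    return sol[0], sol[1:]

def lssvr_predict(x, xs, b, alphas, sigma=1.0):
    """Evaluate f(x) = b + sum_i alpha_i * k(x, x_i)."""
    return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alphas, xs))

# Made-up example: fit samples of sin(x) on [0, 6].
xs = [i * 0.5 for i in range(13)]
ys = [math.sin(x) for x in xs]
gamma = 100.0
b, alphas = lssvr_fit(xs, ys, gamma=gamma)
pred = lssvr_predict(1.25, xs, b, alphas)
print(pred, math.sin(1.25))
```

Unlike the standard SVM, which requires a quadratic program, every training point receives a nonzero multiplier here, so the cost is dominated by solving an (n+1)×(n+1) linear system.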
Downloads 2009
4447 Estimating Reaction Rate Constants with Neural Networks
Authors: Benedek Kovacs, Janos Toth
Abstract:
Two solutions are proposed for the central problem of estimating the reaction rate coefficients in homogeneous kinetics. The first is based on the fact that the right-hand side of a kinetic differential equation is linear in the rate constants, whereas the second uses neural networks. The second is discussed in depth, and its advantages, disadvantages, and conditions of applicability are analyzed in comparison with the first. Numerical analysis was carried out on practical models using simulated data, with our programs written in Mathematica.
Keywords: Neural networks, parameter estimation, linear regression, kinetic models, reaction rate coefficients.
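The first approach in this abstract relies on the kinetic equation being linear in the rate constants, so they can be estimated by linear least squares. A minimal sketch for the simplest case, a first-order reaction A → B with dc/dt = −k·c, is given below. The reaction, the simulated data, the central-difference derivative, and the function name `estimate_rate_constant` are assumptions chosen for illustration; the paper's own programs were written in Mathematica and are not reproduced here.

```python
import math

def estimate_rate_constant(cs, dt):
    """Least-squares estimate of k in dc/dt = -k*c from equally spaced samples.

    Derivatives are approximated by central differences at interior points,
    then k minimizes sum_i (d_i + k*c_i)^2, giving k = -sum(c*d) / sum(c*c).
    """
    derivs = [(cs[i + 1] - cs[i - 1]) / (2.0 * dt) for i in range(1, len(cs) - 1)]
    conc = cs[1:-1]
    return -sum(c * d for c, d in zip(conc, derivs)) / sum(c * c for c in conc)

# Simulated, noise-free concentration data with a known rate constant.
k_true = 0.7
dt = 0.05
ts = [i * dt for i in range(101)]
cs = [math.exp(-k_true * t) for t in ts]

k_hat = estimate_rate_constant(cs, dt)
print(k_true, k_hat)
```

With several reactions the same idea leads to a multi-column least-squares problem, one column per rate constant; noisy measurements would typically call for smoothing before differentiation.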
Downloads 1997
4446 Estimation of Time-Varying Linear Regression with Unknown Time-Volatility via Continuous Generalization of the Akaike Information Criterion
Authors: Elena Ezhova, Vadim Mottl, Olga Krasotkina
Abstract:
The problem of estimating time-varying regression is inevitably concerned with the necessity to choose the appropriate level of model volatility - ranging from the full stationarity of instant regression models to their absolute independence of each other. In the stationary case the number of regression coefficients to be estimated equals that of regressors, whereas the absence of any smoothness assumptions augments the dimension of the unknown vector by the factor of the time-series length. The Akaike Information Criterion is a commonly adopted means of adjusting a model to the given data set within a succession of nested parametric model classes, but its crucial restriction is that the classes are rigidly defined by the growing integer-valued dimension of the unknown vector. To make the Kullback information maximization principle underlying the classical AIC applicable to the problem of time-varying regression estimation, we extend it onto a wider class of data models in which the dimension of the parameter is fixed, but the freedom of its values is softly constrained by a family of continuously nested a priori probability distributions.
Keywords: Time varying regression, time-volatility of regression coefficients, Akaike Information Criterion (AIC), Kullback information maximization principle.
Downloads 1534