Search results for: least squares regression
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3405

3405 Least Squares Method Identification of Corona Current-Voltage Characteristics and Electromagnetic Field in Electrostatic Precipitator

Authors: H. Nouri, I. E. Achouri, A. Grimes, H. Ait Said, M. Aissou, Y. Zebboudj

Abstract:

This paper aims to analyse the behaviour of DC corona discharge in wire-to-plate electrostatic precipitators (ESP). Current-voltage curves are analysed in particular. Experimental results show that the discharge current is strongly affected by the applied voltage. The proposed method of current identification is the method of least squares. Least squares problems fall into two categories: linear (or ordinary) least squares and non-linear least squares, depending on whether or not the residuals are linear in all unknowns. The linear least-squares problem occurs in statistical regression analysis; it has a closed-form solution. A closed-form solution (or closed-form expression) is any formula that can be evaluated in a finite number of standard operations. The non-linear problem has no closed-form solution and is usually solved iteratively.
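
The closed-form solution mentioned in the abstract above can be illustrated for the simplest linear case, a straight line y = a + b*x. This is only a minimal sketch: the function name and the voltage and current values are invented for illustration and are not taken from the paper, which does not state the model it fits.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares fit of y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical voltage and current readings (illustration only).
voltages = [10.0, 12.0, 14.0, 16.0]
currents = [21.0, 25.0, 29.0, 33.0]
intercept, slope = fit_line(voltages, currents)
print(intercept, slope)
```

No iteration is needed here because the residuals are linear in both unknowns; a non-linear model would instead require an iterative solver.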

Keywords: electrostatic precipitator, current-voltage characteristics, least squares method, electric field, magnetic field

Procedia PDF Downloads 424
3404 QSRR Analysis of 17-Picolyl and 17-Picolinylidene Androstane Derivatives Based on Partial Least Squares and Principal Component Regression

Authors: Sanja Podunavac-Kuzmanović, Strahinja Kovačević, Lidija Jevrić, Evgenija Djurendić, Jovana Ajduković

Abstract:

There are several methods for determining the lipophilicity of biologically active compounds; however, chromatography has been shown to be a very suitable method for this purpose. Chromatographic (C18-RP-HPLC) analysis of a series of 24 17-picolyl and 17-picolinylidene androstane derivatives was carried out. The obtained retention indices (logk, methanol (90%) / water (10%)) were correlated with calculated physicochemical and lipophilicity descriptors. The QSRR analysis was carried out by applying principal component regression (PCR) and partial least squares regression (PLS). The PCR and PLS models were selected on the basis of the highest variance and the lowest root mean square error of cross-validation. The obtained PCR and PLS models successfully correlate the calculated molecular descriptors with the logk parameter, indicating the significance of the lipophilicity of the compounds in the chromatographic process. On the basis of the obtained results, it can be concluded that the logk parameters of the analyzed androstane derivatives can be considered as their chromatographic lipophilicity. These results are part of project No. 114-451-347/2015-02, financially supported by the Provincial Secretariat for Science and Technological Development of Vojvodina and CMST COST Action CM1105.

Keywords: androstane derivatives, chromatography, molecular structure, principal component regression, partial least squares regression

Procedia PDF Downloads 269
3403 Genetic Algorithm to Construct and Enumerate 4×4 Pan-Magic Squares

Authors: Younis R. Elhaddad, Mohamed A. Alshaari

Abstract:

The problem of constructing magic squares has attracted many researchers since 2700 B.C. Magic squares are one of the most difficult challenges for mathematicians. In this work, we describe how to construct and enumerate pan-magic squares using a genetic algorithm with a new chromosome encoding technique. The results were promising and were obtained within a reasonable time.

Keywords: genetic algorithm, magic square, pan-magic square, computational intelligence

Procedia PDF Downloads 568
3402 Application of the Least Squares Method in the Adjustment of Chlorodifluoromethane (HCFC-142b) Regression Models

Authors: L. J. de Bessa Neto, V. S. Filho, J. V. Ferreira Nunes, G. C. Bergamo

Abstract:

There are many situations in which human activities have significant effects on the environment. Damage to the ozone layer is one of them. The objective of this work is to use the Least Squares Method, considering the linear, exponential, logarithmic, power and second-degree polynomial models, to determine through the coefficient of determination (R²) which model best fits the behavior of chlorodifluoromethane (HCFC-142b), in parts per trillion, between 1992 and 2018, and to estimate future concentrations 5 and 10 periods ahead, i.e. the concentration of this pollutant in the years 2023 and 2028 under each of the adjustments. A total of 809 observations of the concentration of HCFC-142b at one of the stations monitoring the precursor gases of ozone layer deterioration during the period studied were selected and, using these data, Excel was used to make the scatter plots for each of the adjustment models. The study found that the logarithmic fit was the model that best fit the data set since, besides having a significant R², its adjusted curve was compatible with the natural trend curve of the phenomenon.
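
The model comparison described above can be sketched as follows, restricted to the linear and logarithmic models. This is an illustrative sketch only: the series is synthetic and deliberately constructed to follow a logarithmic trend, so it says nothing about the actual HCFC-142b data, and all names are hypothetical.

```python
import math


def ols(xs, ys):
    """Return (intercept, slope) of the least squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


def r_squared(ys, predictions):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic series built to follow y = 2 + 3*ln(t) (not real measurements).
periods = [1, 2, 3, 4, 5, 6]
conc = [2.0 + 3.0 * math.log(t) for t in periods]

# Linear model: y = a + b*t.
a_lin, b_lin = ols(periods, conc)
r2_linear = r_squared(conc, [a_lin + b_lin * t for t in periods])

# Logarithmic model: y = a + b*ln(t), fitted by regressing y on ln(t).
log_t = [math.log(t) for t in periods]
a_log, b_log = ols(log_t, conc)
r2_log = r_squared(conc, [a_log + b_log * lt for lt in log_t])

print(r2_linear, r2_log)
```

The logarithmic model is fitted with the same linear least squares routine after transforming the predictor, which is the usual way such trend lines are obtained.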

Keywords: chlorodifluoromethane (HCFC-142b), ozone, least squares method, regression models

Procedia PDF Downloads 115
3401 Copula-Based Estimation of Direct and Indirect Effects in Path Analysis Models

Authors: Alam Ali, Ashok Kumar Pathak

Abstract:

Path analysis is a statistical technique used to evaluate the direct and indirect effects of variables in path models. One or more structural regression equations are used to estimate a series of parameters in path models to find the better fit of data. However, sometimes the assumptions of classical regression models, such as ordinary least squares (OLS), are violated by the nature of the data, resulting in insignificant direct and indirect effects of exogenous variables. This article aims to explore the effectiveness of a copula-based regression approach as an alternative to classical regression, specifically when variables are linked through an elliptical copula.

Keywords: path analysis, copula-based regression models, direct and indirect effects, k-fold cross validation technique

Procedia PDF Downloads 30
3400 Non-Parametric Regression over Its Parametric Counterparts with Large Sample Size

Authors: Jude Opara, Esemokumo Perewarebo Akpos

Abstract:

This paper compares non-parametric linear regression with its parametric counterpart for a large sample size. A data set of anthropometric measurements of primary school pupils was used for the analysis, with 50 randomly selected pupils. The data were subjected to a normality test using the Anderson-Darling technique, and it was found that the residuals of the commonly used least squares regression method for fitting an equation to a set of (x,y) data points are not normally distributed (i.e. they do not follow a Gaussian distribution). The algorithms for the nonparametric Theil’s regression are stated in this paper, as well as those for its parametric OLS counterpart. The programming software known as “R Development” was used in this paper. The analysis showed that there is a significant relationship between the response and the explanatory variable for both the parametric and the non-parametric regression. To assess the efficiency of one method over the other, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are used, and it is found that the nonparametric regression performs better than its parametric counterpart because of its lower AIC and BIC values. The study recommends that future researchers carry out similar work that examines the presence of outliers in the data set, removes them if detected, and re-analyzes the data to compare results.
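
As a rough illustration of the two estimators compared above, the sketch below implements Theil's estimator in its common form (the median of all pairwise slopes, with the intercept taken as the median of y - slope*x) next to the OLS slope. The data are invented, with one deliberate outlier, and the paper's own algorithm statements may differ in detail.

```python
from itertools import combinations
from statistics import median


def theil_fit(xs, ys):
    """Theil's estimator: median of pairwise slopes, median-based intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return intercept, slope


def ols_slope(xs, ys):
    """Slope of the ordinary least squares line, for comparison."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            / sum((x - mean_x) ** 2 for x in xs))


# Invented data: y = 2*x except for the last point, which is an outlier.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 30]

theil_intercept, theil_slope = theil_fit(xs, ys)
least_squares_slope = ols_slope(xs, ys)
print(theil_slope, least_squares_slope)
```

On this toy data the single outlier pulls the least squares slope well away from 2, while the median-based slope is unaffected.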

Keywords: Theil’s regression, Bayesian information criterion, Akaike information criterion, OLS

Procedia PDF Downloads 299
3399 A Fuzzy Nonlinear Regression Model for Interval Type-2 Fuzzy Sets

Authors: O. Poleshchuk, E. Komarov

Abstract:

This paper presents a regression model for interval type-2 fuzzy sets based on the least squares estimation technique. Unknown coefficients are assumed to be triangular fuzzy numbers. The basic idea is to determine aggregation intervals for type-1 fuzzy sets whose membership functions are the lower and upper membership functions of an interval type-2 fuzzy set. These aggregation intervals are called weighted intervals. The lower and upper membership functions of the input and output interval type-2 fuzzy sets for the developed regression models are considered as piecewise linear functions.

Keywords: interval type-2 fuzzy sets, fuzzy regression, weighted interval

Procedia PDF Downloads 363
3398 Copula-Based Estimation of Direct and Indirect Effects in Path Analysis Model

Authors: Alam Ali, Ashok Kumar Pathak

Abstract:

Path analysis is a statistical technique used to evaluate the strength of the direct and indirect effects of variables. One or more structural regression equations are used to estimate a series of parameters in order to find the better fit of data. Sometimes, exogenous variables do not show a significant strength of their direct and indirect effects when the assumptions of classical regression (ordinary least squares (OLS)) are violated by the nature of the data. The main aim of this article is to investigate the efficacy of the copula-based regression approach over the classical regression approach and to calculate the direct and indirect effects of variables when the data violate the OLS assumptions and the variables are linked through an elliptical copula. We perform this study using a well-organized numerical scheme. Finally, a real data application is also presented to demonstrate the superiority of the copula approach.

Keywords: path analysis, copula-based regression models, direct and indirect effects, k-fold cross validation technique

Procedia PDF Downloads 65
3397 Regression Analysis in Estimating Stream-Flow and the Effect of Hierarchical Clustering Analysis: A Case Study in Euphrates-Tigris Basin

Authors: Goksel Ezgi Guzey, Bihrat Onoz

Abstract:

The scarcity of streamflow gauging stations and the increasing effects of global warming make designing water management systems very difficult. This study is a significant contribution to assessing regional regression models for estimating streamflow. In this study, simulated meteorological data were related to the observed streamflow data from 1971 to 2020 for 33 stream gauging stations of the Euphrates-Tigris Basin. Ordinary least squares regression was used to predict flow for 2020-2100 with the simulated meteorological data. CORDEX-EURO and CORDEX-MENA domains were used with 0.11 and 0.22 grids, respectively, to estimate climate conditions under certain climate scenarios. Twelve meteorological variables simulated by two regional climate models, RCA4 and RegCM4, were used as independent variables in the ordinary least squares regression, where the observed streamflow was the dependent variable. The variability of streamflow was then calculated with 5-6 meteorological variables and watershed characteristics such as area and height prior to the application. After the regression analysis of data from 31 stream gauging stations, the stations were subjected to a clustering analysis, which grouped the stations into two clusters in terms of their hydrometeorological properties. Two streamflow equations were found for the two clusters of stream gauging stations for every domain and every regional climate model, which increased the efficiency of streamflow estimation by 10-15% for all the models. This study underlines the importance of the homogeneity of a region in estimating streamflow, not only in terms of geographical location but also in terms of the meteorological characteristics of that region.

Keywords: hydrology, streamflow estimation, climate change, hydrologic modeling, HBV, hydropower

Procedia PDF Downloads 120
3396 Robust Variable Selection Based on Schwarz Information Criterion for Linear Regression Models

Authors: Shokrya Saleh A. Alshqaq, Abdullah Ali H. Ahmadini

Abstract:

The Schwarz information criterion (SIC) is a popular tool for selecting the best variables in regression datasets. However, SIC is defined using an unbounded estimator, namely, the least-squares (LS) estimator, which is highly sensitive to outlying observations, especially bad leverage points. A method for robust variable selection based on SIC for linear regression models is thus needed. This study investigates the robustness properties of SIC by deriving its influence function and proposes a robust SIC based on the MM-estimation scale. The aim of this study is to produce a criterion that can effectively select accurate models in the presence of vertical outliers and high leverage points. The advantages of the proposed robust SIC are demonstrated through a simulation study and an analysis of a real dataset.

Keywords: influence function, robust variable selection, robust regression, Schwarz information criterion

Procedia PDF Downloads 132
3395 Public Squares and Their Potential for Social Interactions: A Case Study of Historical Public Squares in Tehran

Authors: Asma Mehan

Abstract:

Under the thrust of technological change, population growth and vehicular traffic, Iranian historical squares have lost their significance and are no longer the main social nodes of society. This research focuses on how historical public squares can inspire designers to enhance social interactions among citizens in the Iranian urban context. Moreover, the recent master plan of Tehran demonstrates the lack of public spaces designed for people’s social gatherings. To fill this gap, the current situation of 7 selected primary historical public squares in Tehran, namely Sabze Meydan, Arg, Topkhaneh, Baherstan, Mokhber-al-dole, Rah Ahan and Hassan Abad, is first compared. Next, the elements influencing social interactions in public squares, such as subjective factors (human relationships and memories) and objective factors (natural and built environment), are investigated. In conclusion, some strategies are proposed for improving social interactions in historical public squares: holding cultural, national, athletic and religious events; defining different and new functions in the squares’ surroundings; increasing pedestrian routes; reviving the collective memory; demonstrating the historical importance of the square; eliminating visual obstacles across the square; organizing the natural elements of the square; and providing appropriate pavement for social activities. Finally, it is argued that the combination of all influencing factors, which are human interactions, natural elements and built environment criteria, will enhance the historical public squares’ potential for social interaction.

Keywords: historical square, Iranian public square, social interaction, Tehran

Procedia PDF Downloads 395
3394 Estimation of Coefficients of Ridge and Principal Components Regressions with Multicollinear Data

Authors: Rajeshwar Singh

Abstract:

Multicollinearity is common when handling several explanatory variables simultaneously, because a linear relationship often exists among them. It creates a serious problem in understanding the impact of the explanatory variables on the dependent variable, and the method of least squares then gives inexact estimates. In this case, it is advisable to detect its presence before proceeding further. Using ridge regression, the degree of its occurrence is reduced, but principal components regression gives good estimates in this situation. This paper discusses the well-known techniques of ridge and principal components regression and applies both to obtain estimates of the coefficients. In addition, this paper discusses the conflicting claims on the discovery of the method of ridge regression based on available documents.
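
To make the ridge idea above concrete, the sketch below solves the ridge normal equations (X'X + lam*I) beta = X'y for two strongly correlated predictors in a model without an intercept. It is a minimal sketch under stated assumptions: the data, the function name, and the penalty value lam = 1 are invented for illustration and do not come from the paper.

```python
def ridge_two_predictors(x1, x2, y, lam):
    """Solve (X'X + lam*I) beta = X'y for a two-predictor, no-intercept model."""
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12  # Cramer's rule for the 2x2 system
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


# Invented, nearly collinear predictors; y is exactly x1 + x2.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.0, -1.0, 0.0, 1.0, 2.5]
y = [a + b for a, b in zip(x1, x2)]

beta_ols = ridge_two_predictors(x1, x2, y, lam=0.0)    # plain least squares
beta_ridge = ridge_two_predictors(x1, x2, y, lam=1.0)  # penalised estimate
print(beta_ols, beta_ridge)
```

With lam = 0 the function reduces to ordinary least squares; a positive lam enlarges the determinant of the system, which is what stabilises the estimates when the predictors are nearly collinear, at the cost of shrinking the coefficients.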

Keywords: conflicting claim on credit of discovery of ridge regression, multicollinearity, principal components and ridge regressions, variance inflation factor

Procedia PDF Downloads 410
3393 Quantitative Structure-Activity Relationship Study of Some Quinoline Derivatives as Antimalarial Agents

Authors: M. Ouassaf, S. Belaid

Abstract:

A series of quinoline derivatives with antimalarial activity was subjected to two-dimensional quantitative structure-activity relationship (2D-QSAR) studies. Three models were implemented, using multiple linear regression (MLR), partial least squares regression (PLS), and multiple nonlinear regression (MNLR), to see which descriptors are closely related to the biological activity. We relied on a principal component analysis (PCA). Based on our results, a comparison of the quality of the MLR, PLS, and MNLR models shows that the MNLR model (R = 0.914, R² = 0.835, RCV = 0.853) has substantially better predictive capability, giving better results than MLR (R = 0.835, R² = 0.752, RCV = 0.601) and PLS (R = 0.742, R² = 0.552, RCV = 0.550). The MNLR model gave statistically significant results and showed good stability to data variation in leave-one-out cross-validation. The obtained results suggest that our proposed MNLR model may be useful for predicting the biological activity of quinoline derivatives.

Keywords: antimalarial, quinoline, QSAR, PCA, MLR, MNLR

Procedia PDF Downloads 150
3392 Behind Fuzzy Regression Approach: An Exploration Study

Authors: Lavinia B. Dulla

Abstract:

This exploratory study of the fuzzy regression approach attempts to show that fuzzy regression can be used as a possible alternative to classical regression. It likewise seeks to assess the differences and characteristics of simple linear regression and fuzzy regression using the width of the prediction interval, the mean absolute deviation, and the variance of residuals. Compared with the simple linear regression model, the fuzzy regression approach is worth considering as an alternative when the sample size is between 10 and 20. As the sample size increases, the fuzzy regression approach is no longer appropriate, since the large-sample assumption of simple linear regression is then satisfied. Nonetheless, it can be suggested as a practical alternative when decisions often have to be made on the basis of small data sets.

Keywords: fuzzy regression approach, minimum fuzziness criterion, interval regression, prediction interval

Procedia PDF Downloads 286
3391 Analytical Authentication of Butter Using Fourier Transform Infrared Spectroscopy Coupled with Chemometrics

Authors: M. Bodner, M. Scampicchio

Abstract:

Fourier Transform Infrared (FT-IR) spectroscopy coupled with chemometrics was used to distinguish between butter samples and non-butter samples. Further, quantification of the content of margarine in adulterated butter samples was investigated. The fingerprinting region (1400-800 cm⁻¹) was used to develop unsupervised pattern recognition (Principal Component Analysis, PCA), supervised modeling (Soft Independent Modelling by Class Analogy, SIMCA), classification (Partial Least Squares Discriminant Analysis, PLS-DA) and regression (Partial Least Squares Regression, PLS-R) models. PCA of the fingerprinting region shows a clustering of the two sample types. All samples were classified in their rightful class by the SIMCA approach; however, nine adulterated samples (between 1% and 30% w/w of margarine) were classified as belonging to both the butter class and the non-butter class. For the two-class PLS-DA model (R² = 0.73; Root Mean Square Error of Prediction, RMSEP = 0.26% w/w), the sensitivity was 71.4% and the Positive Predictive Value (PPV) was 100%. Its threshold was calculated at 7% w/w of margarine in adulterated butter samples. Finally, a PLS-R model (R² = 0.84, RMSEP = 16.54%) was developed. PLS-DA was a suitable classification tool and PLS-R a proper quantification approach. The results demonstrate that FT-IR spectroscopy combined with PLS-R can be used as a rapid, simple and safe method to distinguish pure butter samples from adulterated ones and to determine the degree of adulteration with margarine in butter samples.

Keywords: adulterated butter, margarine, PCA, PLS-DA, PLS-R, SIMCA

Procedia PDF Downloads 135
3390 Analysis of Two Methods to Estimation Stochastic Demand in the Vehicle Routing Problem

Authors: Fatemeh Torfi

Abstract:

Estimation of stochastic demand in physical distribution in general, and in efficient transport route management in particular, is emerging as a crucial factor in the urban planning domain. It is particularly important in some municipalities such as Tehran, where sound demand management calls for a realistic analysis of the routing system. The methodology involved critically investigating a fuzzy least-squares linear regression approach (FLLR) to estimate the stochastic demands in the vehicle routing problem (VRP), bearing in mind the customers' preference order. An FLLR method is proposed for solving the VRP with stochastic demands. The approximate-distance fuzzy least-squares (ADFL) estimator is applied to original data taken from a case study. The SSR values of the ADFL estimator and real demand are obtained and then compared to the SSR values of the nominal demand and real demand. Empirical results showed that the proposed methods can be viable for solving problems with vague and imprecise performance ratings. The results further showed that the ADFL was a realistic and efficient estimator for facing the stochastic demand challenges in vehicle routing system management and solving relevant problems.

Keywords: fuzzy least-squares, stochastic, location, routing problems

Procedia PDF Downloads 427
3389 Partial Least Square Regression for High-Dimensional and High-Correlated Data

Authors: Mohammed Abdullah Alshahrani

Abstract:

The research focuses on investigating the use of partial least squares (PLS) methodology for addressing challenges associated with high-dimensional correlated data. Recent technological advancements have led to experiments producing data characterized by a large number of variables compared to observations, with substantial inter-variable correlations. Such data patterns are common in chemometrics, where near-infrared (NIR) spectrometer calibrations record chemical absorbance levels across hundreds of wavelengths, and in genomics, where thousands of genomic regions' copy number alterations (CNA) are recorded from cancer patients. PLS serves as a widely used method for analyzing high-dimensional data, functioning as a regression tool in chemometrics and a classification method in genomics. It handles data complexity by creating latent variables (components) from original variables. However, applying PLS can present challenges. The study investigates key areas to address these challenges, including unifying interpretations across three main PLS algorithms and exploring unusual negative shrinkage factors encountered during model fitting. The research presents an alternative approach to addressing the interpretation challenge of predictor weights associated with PLS. Sparse estimation of predictor weights is employed using a penalty function combining a lasso penalty for sparsity and a Cauchy distribution-based penalty to account for variable dependencies. The results demonstrate sparse and grouped weight estimates, aiding interpretation and prediction tasks in genomic data analysis. High-dimensional data scenarios, where predictors outnumber observations, are common in regression analysis applications. Ordinary least squares regression (OLS), the standard method, performs inadequately with high-dimensional and highly correlated data. 
Copy number alterations (CNA) in key genes have been linked to disease phenotypes, highlighting the importance of accurate classification of gene expression data in bioinformatics and biology using regularized methods like PLS for regression and classification.

Keywords: partial least square regression, genetics data, negative filter factors, high dimensional data, high correlated data

Procedia PDF Downloads 44
3388 Online Estimation of Clutch Drag Torque in Wet Dual Clutch Transmission Based on Recursive Least Squares

Authors: Hongkui Li, Tongli Lu, Jianwu Zhang

Abstract:

This paper focuses on developing an estimation method of clutch drag torque in wet DCT. The modelling of clutch drag torque is investigated. As the main factor affecting the clutch drag torque, dynamic viscosity of oil is discussed. The paper proposes an estimation method of clutch drag torque based on recursive least squares by utilizing the dynamic equations of gear shifting synchronization process. The results demonstrate that the estimation method has good accuracy and efficiency.
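
For readers unfamiliar with recursive least squares, the sketch below shows the standard scalar update with a forgetting factor, estimating one unknown coefficient theta in y = theta * phi one sample at a time. It is not the paper's estimator: the regressor and measurement values are invented, and the paper's dynamic equations for the gear shifting synchronization process are not reproduced here.

```python
def rls_update(theta, p, phi, y, forgetting=1.0):
    """One recursive least squares step for the scalar model y = theta * phi."""
    gain = p * phi / (forgetting + phi * p * phi)
    theta = theta + gain * (y - phi * theta)   # correct by the prediction error
    p = (p - gain * phi * p) / forgetting      # update the covariance term
    return theta, p


# Invented noise-free samples generated with a true coefficient of 0.5.
regressors = [1.0, 2.0, 3.0, 4.0]
measurements = [0.5 * phi for phi in regressors]

theta, p = 0.0, 1000.0  # initial guess and a large initial covariance
for phi, y in zip(regressors, measurements):
    theta, p = rls_update(theta, p, phi, y)

print(theta)
```

Each step only needs the previous estimate and covariance, which is why this form suits online estimation; a forgetting factor below 1 would weight recent samples more heavily.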

Keywords: clutch drag torque, wet DCT, dynamic viscosity, recursive least squares

Procedia PDF Downloads 315
3387 Support Vector Regression with Weighted Least Absolute Deviations

Authors: Kang-Mo Jung

Abstract:

Least squares support vector machine (LS-SVM) is a penalized regression which considers both the fitting and the generalization ability of a model. However, the squared loss function is very sensitive to even a single outlier. We propose a weighted absolute deviation loss function for the robustness of the estimates in the least absolute deviation support vector machine. The proposed estimates can be obtained by a quadratic programming algorithm. Numerical experiments on simulated datasets show that the proposed algorithm is competitive in terms of robustness to outliers.

Keywords: least absolute deviation, quadratic programming, robustness, support vector machine, weight

Procedia PDF Downloads 519
3386 Variogram Fitting Based on the Wilcoxon Norm

Authors: Hazem Al-Mofleh, John Daniels, Joseph McKean

Abstract:

Within geostatistics research, effective estimation of the variogram points has been examined, particularly in developing robust alternatives. The parametric fit of these variogram points, which eventually defines the kriging weights, however, has not received the same attention from a robust perspective. This paper proposes the use of the non-linear Wilcoxon norm over weighted non-linear least squares as a robust variogram fitting alternative. First, we introduce the concept of variogram estimation and fitting. Then, as an alternative to non-linear weighted least squares, we discuss the non-linear Wilcoxon estimator. Next, the robustness properties of the non-linear Wilcoxon are demonstrated using a contaminated spatial data set. Finally, under simulated conditions, increasing levels of contaminated spatial processes have their variogram points estimated and fit. In the fitting of these variogram points, both non-linear Weighted Least Squares and non-linear Wilcoxon fits are examined for efficiency. At all levels of contamination (including 0%), using a robust estimation and robust fitting procedure, the non-weighted Wilcoxon outperforms weighted Least Squares.

Keywords: non-linear Wilcoxon, robust estimation, variogram estimation, Wilcoxon norm

Procedia PDF Downloads 447
3385 Optimization of Machine Learning Regression Results: An Application on Health Expenditures

Authors: Songul Cinaroglu

Abstract:

Machine learning regression methods are recommended as an alternative to classical regression methods when variables are difficult to model. Health expenditure data are typically non-normal and have a heavily skewed distribution. This study aims to compare machine learning regression methods by hyperparameter tuning to predict health expenditure per capita. A multiple regression model was fitted, and the performance of Lasso Regression, Random Forest Regression and Support Vector Machine Regression was recorded when different hyperparameters were assigned. The lambda (λ) value for Lasso Regression, the number of trees for Random Forest Regression, and the epsilon (ε) value for Support Vector Regression were used as hyperparameters. Results obtained using k-fold cross validation, with k varied from 5 to 50, indicate that the differences between the machine learning regression results in terms of R², RMSE and MAE values are statistically significant (p < 0.001). The results reveal that Random Forest Regression (R² > 0.7500, RMSE ≤ 0.6000 and MAE ≤ 0.4000) outperforms the other machine learning regression methods. It is highly advisable to use machine learning regression methods for modelling health expenditures.
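
The k-fold cross validation used above to compare models can be sketched as an index-splitting routine. This is a generic illustration with contiguous, unshuffled folds; the function name is hypothetical and the study's actual resampling procedure is not described in the abstract.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


# Ten samples split into five folds of two samples each.
folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print(test, train)
```

A model would be fitted on each `train` list and scored on the matching `test` list, and the scores (for example R², RMSE or MAE) averaged over the folds for each hyperparameter setting.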

Keywords: machine learning, lasso regression, random forest regression, support vector regression, hyperparameter tuning, health expenditure

Procedia PDF Downloads 217
3384 Urban Energy Demand Modelling: Spatial Analysis Approach

Authors: Hung-Chu Chen, Han Qi, Bauke de Vries

Abstract:

Energy consumption in the urban environment has attracted much research in recent decades. However, it is comparatively rare to find studies that investigate 3D spatial analysis in urban energy demand modelling. In order to analyze the spatial correlation between urban morphology and energy demand comprehensively, this paper investigates their relation using spatial regression tools. The spatial regression tools applied in this paper are the ordinary least squares regression (OLS) and geographically weighted regression (GWR) models. The Normalized Difference Built-up Index (NDBI), the Normalized Difference Vegetation Index (NDVI), and building volume are descriptors of urban morphology, which act as independent variables of the Energy-land use (E-L) model. NDBI and NDVI are used as indices to describe five types of land use: urban area (U), open space (O), artificial green area (G), natural green area (V), and water body (W). Accordingly, annual electricity, gas demand and energy demand are the dependent variables of the E-L model. The analytical results of the E-L model reveal that energy demand and urban morphology are closely connected, and the possible causes and practical uses are discussed. In addition, the spatial analysis methods of OLS and GWR are compared.

Keywords: energy demand model, geographically weighted regression, normalized difference built-up index, normalized difference vegetation index, spatial statistics

Procedia PDF Downloads 141
3383 A Comparison of Smoothing Spline Method and Penalized Spline Regression Method Based on Nonparametric Regression Model

Authors: Autcha Araveeporn

Abstract:

This paper presents a study of a nonparametric regression model using a smoothing spline method and a penalized spline regression method. We also compare the techniques used for estimation and prediction in the nonparametric regression model. We applied both methods to crude oil prices in dollars per barrel and to the Stock Exchange of Thailand (SET) index. According to the results, the smoothing spline method performs better than the penalized spline regression method.

Keywords: nonparametric regression model, penalized spline regression method, smoothing spline method, Stock Exchange of Thailand (SET)

Procedia PDF Downloads 430
3382 Comparison between Some of Robust Regression Methods with OLS Method with Application

Authors: Sizar Abed Mohammed, Zahraa Ghazi Sadeeq

Abstract:

The classical least squares (OLS) method for estimating linear regression parameters has good properties, such as unbiasedness, minimum variance, and consistency, when its assumptions hold. Alternative statistical techniques have been developed to estimate the parameters when the data are contaminated with outliers; these are robust (or resistant) methods. In this paper, three robust methods are studied: the maximum likelihood type estimator (M-estimator), the modified maximum likelihood type estimator (MM-estimator), and the least trimmed squares estimator (LTS-estimator), and their results are compared with the OLS method. These methods are applied to real data taken from the Duhok company for manufacturing furniture, and the obtained results are compared using the criteria Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE) and Mean Sum of Absolute Error (MSAE). The important conclusions of this study are as follows. The numbers of typical values detected by the four methods in the furniture line data are very close, which indicates that the distribution of the standard errors is close to normal; in the doors line data, however, the typical values detected using OLS are fewer than those detected by the robust methods, which means that the distribution of the standard errors departs far from normality. Another important conclusion is that the parameter values estimated using the lifeline are very far from the values estimated using the robust methods for the doors line: the LTS-estimator gave better results under the MSE criterion, the M-estimator gave better results under the MAPE criterion, and the MM-estimator is better under the MSAE criterion. The programs S-plus (version 8.0, professional 2007), Minitab (version 13.2) and SPSS (version 17) were used to analyze the data.

Keywords: Robust, LTS, M estimate, MSE
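
As a rough illustration of the kind of comparison this abstract describes, the sketch below fits OLS and a Huber-type M-estimator to contaminated data and scores both with MSE, MAPE, and MSAE. It rests on several assumptions that are not taken from the paper: the data are synthetic (not the Duhok furniture data), the M-estimator uses Huber weights with tuning constant 1.345 fitted by iteratively reweighted least squares, MSAE is interpreted as the mean of absolute errors, and the MM and LTS estimators are not shown.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares via NumPy's least-squares solver."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def huber_m_estimator(X, y, c=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimator fitted by iteratively reweighted least squares (IRLS)."""
    beta = ols(X, y)  # start from the OLS fit
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust scale estimate: normalized median absolute deviation.
        scale = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        u = np.abs(r) / scale
        # Huber weights: 1 inside the threshold c, c/|u| outside it.
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

def mse(y, yhat):
    return float(np.mean((y - yhat) ** 2))

def mape(y, yhat):
    # Assumes y contains no zeros.
    return float(100.0 * np.mean(np.abs((y - yhat) / y)))

def msae(y, yhat):
    # Interpreted here (an assumption) as the mean of absolute errors.
    return float(np.mean(np.abs(y - yhat)))

# Synthetic example: a straight line with 10% of the responses shifted upward.
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y_clean = 5.0 + 2.0 * x
y = y_clean + rng.normal(0.0, 0.5, n)
y[:10] += 30.0  # outliers

beta_ols = ols(X, y)
beta_huber = huber_m_estimator(X, y)
for name, beta in [("OLS", beta_ols), ("Huber M", beta_huber)]:
    fit = X @ beta
    print(name, beta, mse(y_clean, fit), mape(y_clean, fit), msae(y_clean, fit))
```

Here the criteria are evaluated against the uncontaminated signal, which is only possible because the data are simulated; with real data they would be computed against the observed responses.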

Procedia PDF Downloads 227
3381 Hybrid Artificial Bee Colony and Least Squares Method for Rule-Based Systems Learning

Authors: Ahcene Habbi, Yassine Boudouaoui

Abstract:

This paper deals with the problem of automatic rule generation for fuzzy systems design. The proposed approach is based on a hybrid of artificial bee colony (ABC) optimization and the weighted least squares (LS) method, and aims to find the structure and parameters of fuzzy systems simultaneously. More precisely, two ABC-based fuzzy modeling strategies are presented and compared. The first strategy uses global optimization to learn fuzzy models; the second hybridizes ABC with the weighted least squares estimation method. The performances of the proposed ABC and ABC-LS fuzzy modeling strategies are evaluated on complex modeling problems and compared to other advanced modeling methods.

Keywords: automatic design, learning, fuzzy rules, hybrid, swarm optimization
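
The weighted least squares step mentioned in this abstract can be sketched in isolation. The example below is a minimal illustration only: the ABC search is not shown, and the Gaussian membership function, its centre and width, the linear rule consequent, and the sine target are all hypothetical choices made for the demonstration, not details from the paper.

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Minimize sum_i w_i * (y_i - x_i @ theta)**2 for non-negative weights w."""
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns the weighted problem into an ordinary one.
    theta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return theta

# Hypothetical single rule: Gaussian membership centred at 2.0 with width 1.0.
x = np.linspace(0.0, 4.0, 41)
y = np.sin(x)
w = np.exp(-0.5 * ((x - 2.0) / 1.0) ** 2)

# Linear consequent y ~ theta[0] + theta[1] * x, fitted locally around x = 2.
X = np.column_stack([np.ones_like(x), x])
theta = weighted_least_squares(X, y, w)
print(theta)
```

Points with a high membership degree dominate the fit, so the resulting line approximates the target near the centre of the rule rather than over the whole range.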

Procedia PDF Downloads 432
3380 Determination of the Effective Economic and/or Demographic Indicators in Classification of European Union Member and Candidate Countries Using Partial Least Squares Discriminant Analysis

Authors: Esra Polat

Abstract:

Partial Least Squares Discriminant Analysis (PLSDA) is a statistical method for classification that consists of a classical Partial Least Squares Regression (PLSR) in which the dependent variable is categorical and expresses the class membership of each observation. PLSDA can be applied in many cases where classical discriminant analysis cannot, for example when the number of observations is low and the number of independent variables is high. When there are missing values, PLSDA can be applied to the data that are available. Finally, it is suitable when multicollinearity between the independent variables is high. The aim of this study is to determine the economic and/or demographic indicators that are effective in grouping the 28 European Union (EU) member countries and 7 candidate countries (including potential candidates Bosnia and Herzegovina (BiH) and Kosova), using a data set obtained from the World Bank database for 2014. Leaving political issues aside, the analysis is concerned only with the economic and demographic variables that may influence a country’s eligibility for EU entrance. Hence, this study analyzes both the performance of the PLSDA method in classifying the countries correctly into their pre-defined groups (candidate or member) and the differences between the EU countries and the candidate countries in terms of these indicators. The PLSDA yields a correct classification rate of 100%, indicating that all 35 countries are classified correctly. Moreover, the most important variables that determine the statuses of member and candidate countries in terms of economic indicators are identified as 'external balance on goods and services (% GDP)', 'gross domestic savings (% GDP)' and 'gross national expenditure (% GDP)', which means that, for 2014, the economic structure of countries is the most important determinant of EU membership.
Subsequently, the model was validated to assess its predictive ability using the data set for 2015. For the prediction sample, 97.14% of the countries are correctly classified. An interesting result is obtained only for BiH, which is still a potential candidate for the EU but is predicted as a member of the EU when the indicator data set for 2015 is used as the prediction sample. Although BiH has made a significant transformation from a war-torn country to a semi-functional state, ethnic tensions, nationalistic rhetoric and political disagreements are still evident, which inhibit Bosnian progress towards the EU.

Keywords: classification, demographic indicators, economic indicators, European Union, partial least squares discriminant analysis
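
The idea of PLSDA as a PLS regression on a categorical response can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's procedure: it uses a single-response PLS (PLS1) fitted with the NIPALS iterations, a 0/1 coding of the two classes, a 0.5 decision threshold, two components, centring without scaling, and synthetic data in place of the World Bank indicators. The function names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS; returns coefficients and centring constants."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                 # weight vector: covariance direction with y
        w = w / np.linalg.norm(w)
        t = Xr @ w                    # scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        qa = (yr @ t) / tt            # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - qa * t              # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def plsda_predict(X, coef, x_mean, y_mean, threshold=0.5):
    """Predict class 1 when the fitted response exceeds the threshold."""
    yhat = (X - x_mean) @ coef + y_mean
    return (yhat > threshold).astype(int)

# Synthetic two-class data: few observations, many variables.
rng = np.random.default_rng(1)
n_per_class, p = 15, 20
X0 = rng.normal(0.0, 1.0, (n_per_class, p))
X1 = rng.normal(0.0, 1.0, (n_per_class, p))
X1[:, :3] += 2.0  # only the first three variables separate the classes
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

coef, x_mean, y_mean = pls1_fit(X, y, n_components=2)
labels = plsda_predict(X, coef, x_mean, y_mean)
accuracy = float(np.mean(labels == y))
print("training accuracy:", accuracy)
```

The accuracy printed here is measured on the training data; an honest assessment would use a separate prediction sample, as the abstract does with the 2015 data.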

Procedia PDF Downloads 276
3379 Orthogonal Regression for Nonparametric Estimation of Errors-In-Variables Models

Authors: Anastasiia Yu. Timofeeva

Abstract:

Two new algorithms for nonparametric estimation of errors-in-variables models are proposed. The first algorithm is based on a penalized regression spline. The spline is represented as a piecewise-linear function, and for each linear portion an orthogonal regression is estimated. This algorithm is iterative. The second algorithm involves locally weighted regression estimation. When the independent variable is measured with error, such estimation is a complex nonlinear optimization problem. The simulation results show the advantage of the second algorithm under the assumption that the true values of the smoothing parameters are known. Nevertheless, using some indexes of fit to select the smoothing parameters gives similar results and has an oversmoothing effect.

Keywords: grade point average, orthogonal regression, penalized regression spline, locally weighted regression
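
Orthogonal regression for a single linear portion can be sketched with a singular value decomposition. This is an illustrative building block only: it fits one straight line, assumes equal error variances in both variables and a non-vertical line, and does not implement the penalized spline or the locally weighted algorithm proposed in the paper. The data are simulated.

```python
import numpy as np

def orthogonal_regression_line(x, y):
    """Fit y = a + b*x by minimizing perpendicular distances to the line."""
    pts = np.column_stack([x, y])
    centre = pts.mean(axis=0)
    # The first right singular vector is the direction of largest variance.
    _, _, vt = np.linalg.svd(pts - centre)
    dx, dy = vt[0]
    b = dy / dx  # assumes the fitted line is not vertical
    a = centre[1] - b * centre[0]
    return a, b

# Errors-in-variables example: both x and y are observed with noise.
rng = np.random.default_rng(0)
n = 500
x_true = rng.uniform(0.0, 10.0, n)
x_obs = x_true + rng.normal(0.0, 1.0, n)
y_obs = 1.0 + 2.0 * x_true + rng.normal(0.0, 1.0, n)

a_orth, b_orth = orthogonal_regression_line(x_obs, y_obs)
b_ols = np.polyfit(x_obs, y_obs, 1)[0]  # ordinary least squares slope
print("orthogonal slope:", b_orth, "OLS slope:", b_ols)
```

Ordinary least squares ignores the error in the independent variable and tends to shrink the slope toward zero, which is the motivation for orthogonal regression in this setting.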

Procedia PDF Downloads 407
3378 Dynamic Process Monitoring of an Ammonia Synthesis Fixed-Bed Reactor

Authors: Bothinah Altaf, Gary Montague, Elaine B. Martin

Abstract:

This study involves the modeling and monitoring of an ammonia synthesis fixed-bed reactor using partial least squares (PLS) and its variants. The process exhibits complex dynamic behavior due to the presence of heat recycling and feed quench. One limitation of a static PLS model in this situation is that it does not take account of the process dynamics, and hence dynamic PLS was used. Although it showed superior performance to static PLS in terms of prediction, the monitoring scheme was inappropriate, and hence adaptive PLS was considered. A limitation of adaptive PLS is that non-conforming observations also contribute to the model; therefore, a new adaptive approach, robust adaptive dynamic PLS, was developed. This approach updates a dynamic PLS model and is robust to non-representative data. The developed methodology showed a clear improvement over existing approaches in terms of the modeling of the reactor and the detection of faults.

Keywords: ammonia synthesis fixed-bed reactor, dynamic partial least squares modeling, recursive partial least squares, robust modeling
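
One common way to give a PLS model access to process dynamics is to augment the predictor matrix with time-lagged copies of the measurements before fitting. Whether the authors used exactly this construction is not stated in the abstract, so the sketch below is an assumption; the helper name `lagged_matrix` and the number of lags are hypothetical.

```python
import numpy as np

def lagged_matrix(X, n_lags):
    """Stack X(t), X(t-1), ..., X(t-n_lags) side by side.

    The first n_lags rows are dropped because their history is incomplete,
    so the result has X.shape[0] - n_lags rows.
    """
    n = X.shape[0]
    blocks = [X[n_lags - k : n - k] for k in range(n_lags + 1)]
    return np.hstack(blocks)

# Five time steps of two measured variables.
X = np.arange(10.0).reshape(5, 2)
X_dyn = lagged_matrix(X, n_lags=2)
print(X_dyn)
```

The response vector has to be trimmed in the same way (dropping its first `n_lags` entries) so that each row of the lagged matrix lines up with the response at the same time step.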

Procedia PDF Downloads 384
3377 A Learning-Based EM Mixture Regression Algorithm

Authors: Yi-Cheng Tian, Miin-Shen Yang

Abstract:

The mixture likelihood approach to clustering is a popular clustering method in which the expectation-maximization (EM) algorithm is the most widely used mixture likelihood method. In the literature, the EM algorithm has been used for mixture regression models. However, these EM mixture regression algorithms are sensitive to initial values and require the number of clusters to be specified a priori. In this paper, to resolve these drawbacks, we construct a learning-based schema for the EM mixture regression algorithm such that it is free of initializations and can automatically obtain an approximately optimal number of clusters. Some numerical examples and comparisons demonstrate the superiority and usefulness of the proposed learning-based EM mixture regression algorithm.

Keywords: clustering, EM algorithm, Gaussian mixture model, mixture regression model
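
For context, the standard EM algorithm for a mixture of linear regressions can be sketched as below. This is the baseline that the abstract describes as sensitive to initialization, with the number of components fixed in advance; it is not the learning-based algorithm proposed in the paper. The Gaussian error model, the synthetic data, and the initialization from the sign of the pooled OLS residuals are assumptions made for the demonstration.

```python
import numpy as np

def em_mixture_regression(X, y, k, init_resp=None, n_iter=100, seed=0):
    """Standard EM for a k-component mixture of Gaussian linear regressions."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Responsibilities: random soft assignment unless an initial guess is given.
    resp = rng.dirichlet(np.ones(k), size=n) if init_resp is None else init_resp.copy()
    betas, sigma2, pi = np.zeros((k, d)), np.ones(k), np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # M-step: weighted least squares, variance, and mixing weight per component.
        for j in range(k):
            w = resp[:, j]
            sw = np.sqrt(w)
            betas[j], *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
            r = y - X @ betas[j]
            sigma2[j] = max(np.sum(w * r ** 2) / np.sum(w), 1e-8)
            pi[j] = np.mean(w)
        # E-step: posterior probability of each component for each observation.
        r = y[:, None] - X @ betas.T
        log_dens = np.log(pi) - 0.5 * np.log(2 * np.pi * sigma2) - 0.5 * r ** 2 / sigma2
        log_dens -= log_dens.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_dens)
        resp /= resp.sum(axis=1, keepdims=True)
    return betas, sigma2, pi, resp

# Synthetic data from two parallel lines: y = 2x and y = 10 + 2x.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, 5.0, n)
z = rng.integers(0, 2, n)
y = 2.0 * x + 10.0 * z + rng.normal(0.0, 0.5, n)
X = np.column_stack([np.ones(n), x])

# Initial hard assignment from the sign of the pooled OLS residuals (an assumption).
beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
below = (y - X @ beta0) < 0
init = np.column_stack([below, ~below]).astype(float)

betas, sigma2, pi, resp = em_mixture_regression(X, y, k=2, init_resp=init)
print(betas, sigma2, pi)
```

Calling the function without `init_resp` falls back to a random soft assignment, which is where the dependence on initial values that the abstract mentions can show up.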

Procedia PDF Downloads 504
3376 The Determinants of the Operational Performance in Airline Industry: A Case of a Turkish Airline Company

Authors: Mustafa K. Yilmaz, Ahmet Kaplan, Murat Guven, Vildan Kesici

Abstract:

The aviation industry influences social and economic growth across countries. Further, airline companies are highly affected by social, political, and financial crises and show a high degree of cyclicality in operational performance. Hence, this paper investigates the effects of available seat kilometers (ASK), revenue per kilometer (RPK), and passenger load factor (PLF), as well as socio-political crisis, on the number of passengers carried (PC) by Turkish Airlines over the period 2010M1-2018M12. To conduct the analysis, we employ fully modified ordinary least squares (FMOLS), dynamic ordinary least squares (DOLS), and canonical cointegration regression (CCR) techniques using monthly data. We use ASK, RPK, and PLF as independent variables to identify the determinants of PC, the dependent variable. We also test the effect of the socio-political crisis. The results reveal a significant and negative relationship between ASK and PC, while the relationship between RPK and PC is positive and significant. We also find an insignificant relationship between PLF and PC. Further, we find a negative effect of the crisis on PC. These findings show that, although the crisis had an immediate effect on the operational performance of Turkish Airlines, the company recovered from the crisis and coped with the situation very promptly. This demonstrates the resilience and agile management ability of the company.

Keywords: airline industry, operational performance, air traffic, socio-political crisis
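
Of the three estimators named in this abstract, DOLS is the simplest to sketch: the long-run regression is augmented with leads and lags of the first differences of the regressors and then estimated by OLS. The example below is a simplified illustration with a single regressor, a fixed number of leads and lags, simulated data, and no standard errors; it does not reproduce the paper's model or data, and the function name `dols` is hypothetical.

```python
import numpy as np

def dols(y, x, k=1):
    """Dynamic OLS with one regressor.

    Regresses y_t on a constant, x_t, and the differences dx_{t-k}, ..., dx_{t+k},
    and returns the estimated intercept and long-run slope.
    """
    dx = np.diff(x)                    # dx[i] = x[i+1] - x[i], the difference at time i+1
    n = len(y)
    t = np.arange(k + 1, n - k)        # time points with all leads and lags available
    cols = [np.ones(len(t)), x[t]]
    for j in range(-k, k + 1):
        cols.append(dx[t + j - 1])     # difference of x at time t + j
    Z = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(Z, y[t], rcond=None)
    return coef[0], coef[1]

# Simulated cointegrated pair: x is a random walk and y = 1 + 2x + noise.
rng = np.random.default_rng(0)
n = 300
x = np.cumsum(rng.normal(0.0, 1.0, n))
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)

a_hat, b_hat = dols(y, x, k=2)
print("intercept:", a_hat, "long-run slope:", b_hat)
```

In applied work the number of leads and lags is usually chosen by an information criterion and inference uses long-run variance estimates, neither of which is shown here.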

Procedia PDF Downloads 166