Search results for: penalized spline regression method
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 20830

Search results for: penalized spline regression method

20770 Analysis of Factors Affecting the Number of Infant and Maternal Mortality in East Java with Geographically Weighted Bivariate Generalized Poisson Regression Method

Authors: Luh Eka Suryani, Purhadi

Abstract:

Poisson regression is a non-linear regression model with response variable in the form of count data that follows Poisson distribution. Modeling for a pair of count data that show high correlation can be analyzed by Poisson Bivariate Regression. Data, the number of infant mortality and maternal mortality, are count data that can be analyzed by Poisson Bivariate Regression. The Poisson regression assumption is an equidispersion where the mean and variance values are equal. However, the actual count data has a variance value which can be greater or less than the mean value (overdispersion and underdispersion). Violations of this assumption can be overcome by applying Generalized Poisson Regression. Characteristics of each regency can affect the number of cases occurred. This issue can be overcome by spatial analysis called geographically weighted regression. This study analyzes the number of infant mortality and maternal mortality based on conditions in East Java in 2016 using Geographically Weighted Bivariate Generalized Poisson Regression (GWBGPR) method. Modeling is done with adaptive bisquare Kernel weighting which produces 3 regency groups based on infant mortality rate and 5 regency groups based on maternal mortality rate. Variables that significantly influence the number of infant and maternal mortality are the percentages of pregnant women visit health workers at least 4 times during pregnancy, pregnant women get Fe3 tablets, obstetric complication handled, clean household and healthy behavior, and married women with the first marriage age under 18 years.

Keywords: adaptive bisquare kernel, GWBGPR, infant mortality, maternal mortality, overdispersion

Procedia PDF Downloads 131
20769 A Novel Approach towards Test Case Prioritization Technique

Authors: Kamna Solanki, Yudhvir Singh, Sandeep Dalal

Abstract:

Software testing is a time and cost intensive process. A scrutiny of the code and rigorous testing is required to identify and rectify the putative bugs. The process of bug identification and its consequent correction is continuous in nature and often some of the bugs are removed after the software has been launched in the market. This process of code validation of the altered software during the maintenance phase is termed as Regression testing. Regression testing ubiquitously considers resource constraints; therefore, the deduction of an appropriate set of test cases, from the ensemble of the entire gamut of test cases, is a critical issue for regression test planning. This paper presents a novel method for designing a suitable prioritization process to optimize fault detection rate and performance of regression test on predefined constraints. The proposed method for test case prioritization m-ACO alters the food source selection criteria of natural ants and is basically a modified version of Ant Colony Optimization (ACO). The proposed m-ACO approach has been coded in 'Perl' language and results are validated using three examples by computation of Average Percentage of Faults Detected (APFD) metric.

Keywords: regression testing, software testing, test case prioritization, test suite optimization

Procedia PDF Downloads 306
20768 Imputation of Incomplete Large-Scale Monitoring Count Data via Penalized Estimation

Authors: Mohamed Dakki, Genevieve Robin, Marie Suet, Abdeljebbar Qninba, Mohamed A. El Agbani, Asmâa Ouassou, Rhimou El Hamoumi, Hichem Azafzaf, Sami Rebah, Claudia Feltrup-Azafzaf, Nafouel Hamouda, Wed a.L. Ibrahim, Hosni H. Asran, Amr A. Elhady, Haitham Ibrahim, Khaled Etayeb, Essam Bouras, Almokhtar Saied, Ashrof Glidan, Bakar M. Habib, Mohamed S. Sayoud, Nadjiba Bendjedda, Laura Dami, Clemence Deschamps, Elie Gaget, Jean-Yves Mondain-Monval, Pierre Defos Du Rau

Abstract:

In biodiversity monitoring, large datasets are becoming more and more widely available and are increasingly used globally to estimate species trends and con- servation status. These large-scale datasets challenge existing statistical analysis methods, many of which are not adapted to their size, incompleteness and heterogeneity. The development of scalable methods to impute missing data in incomplete large-scale monitoring datasets is crucial to balance sampling in time or space and thus better inform conservation policies. We developed a new method based on penalized Poisson models to impute and analyse incomplete monitoring data in a large-scale framework. The method al- lows parameterization of (a) space and time factors, (b) the main effects of predic- tor covariates, as well as (c) space–time interactions. It also benefits from robust statistical and computational capability in large-scale settings. The method was tested extensively on both simulated and real-life waterbird data, with the findings revealing that it outperforms six existing methods in terms of missing data imputation errors. Applying the method to 16 waterbird species, we estimated their long-term trends for the first time at the entire North African scale, a region where monitoring data suffer from many gaps in space and time series. This new approach opens promising perspectives to increase the accuracy of species-abundance trend estimations. We made it freely available in the r package ‘lori’ (https://CRAN.R-project.org/package=lori) and recommend its use for large- scale count data, particularly in citizen science monitoring programmes.

Keywords: biodiversity monitoring, high-dimensional statistics, incomplete count data, missing data imputation, waterbird trends in North-Africa

Procedia PDF Downloads 118
20767 Integrated Nested Laplace Approximations For Quantile Regression

Authors: Kajingulu Malandala, Ranganai Edmore

Abstract:

The asymmetric Laplace distribution (ADL) is commonly used as the likelihood function of the Bayesian quantile regression, and it offers different families of likelihood method for quantile regression. Notwithstanding their popularity and practicality, ADL is not smooth and thus making it difficult to maximize its likelihood. Furthermore, Bayesian inference is time consuming and the selection of likelihood may mislead the inference, as the Bayes theorem does not automatically establish the posterior inference. Furthermore, ADL does not account for greater skewness and Kurtosis. This paper develops a new aspect of quantile regression approach for count data based on inverse of the cumulative density function of the Poisson, binomial and Delaporte distributions using the integrated nested Laplace Approximations. Our result validates the benefit of using the integrated nested Laplace Approximations and support the approach for count data.

Keywords: quantile regression, Delaporte distribution, count data, integrated nested Laplace approximation

Procedia PDF Downloads 135
20766 Generalized Extreme Value Regression with Binary Dependent Variable: An Application for Predicting Meteorological Drought Probabilities

Authors: Retius Chifurira

Abstract:

Logistic regression model is the most used regression model to predict meteorological drought probabilities. When the dependent variable is extreme, the logistic model fails to adequately capture drought probabilities. In order to adequately predict drought probabilities, we use the generalized linear model (GLM) with the quantile function of the generalized extreme value distribution (GEVD) as the link function. The method maximum likelihood estimation is used to estimate the parameters of the generalized extreme value (GEV) regression model. We compare the performance of the logistic and the GEV regression models in predicting drought probabilities for Zimbabwe. The performance of the regression models are assessed using the goodness-of-fit tests, namely; relative root mean square error (RRMSE) and relative mean absolute error (RMAE). Results show that the GEV regression model performs better than the logistic model, thereby providing a good alternative candidate for predicting drought probabilities. This paper provides the first application of GLM derived from extreme value theory to predict drought probabilities for a drought-prone country such as Zimbabwe.

Keywords: generalized extreme value distribution, general linear model, mean annual rainfall, meteorological drought probabilities

Procedia PDF Downloads 161
20765 Integrating Process Planning, WMS Dispatching, and WPPW Weighted Due Date Assignment Using a Genetic Algorithm

Authors: Halil Ibrahim Demir, Tarık Cakar, Ibrahim Cil, Muharrem Dugenci, Caner Erden

Abstract:

Conventionally, process planning, scheduling, and due-date assignment functions are performed separately and sequentially. The interdependence of these functions requires integration. Although integrated process planning and scheduling, and scheduling with due date assignment problems are popular research topics, only a few works address the integration of these three functions. This work focuses on the integration of process planning, WMS scheduling, and WPPW due date assignment. Another novelty of this work is the use of a weighted due date assignment. In the literature, due dates are generally assigned without considering the importance of customers. However, in this study, more important customers get closer due dates. Typically, only tardiness is punished, but the JIT philosophy punishes both earliness and tardiness. In this study, all weighted earliness, tardiness, and due date related costs are penalized. As no customer desires distant due dates, such distant due dates should be penalized. In this study, various levels of integration of these three functions are tested and genetic search and random search are compared both with each other and with ordinary solutions. Higher integration levels are superior, while search is always useful. Genetic searches outperformed random searches.

Keywords: process planning, weighted scheduling, weighted due-date assignment, genetic algorithm, random search

Procedia PDF Downloads 355
20764 Application of Regularized Spatio-Temporal Models to the Analysis of Remote Sensing Data

Authors: Salihah Alghamdi, Surajit Ray

Abstract:

Space-time data can be observed over irregularly shaped manifolds, which might have complex boundaries or interior gaps. Most of the existing methods do not consider the shape of the data, and as a result, it is difficult to model irregularly shaped data accommodating the complex domain. We used a method that can deal with space-time data that are distributed over non-planner shaped regions. The method is based on partial differential equations and finite element analysis. The model can be estimated using a penalized least squares approach with a regularization term that controls the over-fitting. The model is regularized using two roughness penalties, which consider the spatial and temporal regularities separately. The integrated square of the second derivative of the basis function is used as temporal penalty. While the spatial penalty consists of the integrated square of Laplace operator, which is integrated exclusively over the domain of interest that is determined using finite element technique. In this paper, we applied a spatio-temporal regression model with partial differential equations regularization (ST-PDE) approach to analyze a remote sensing data measuring the greenness of vegetation, measure by an index called enhanced vegetation index (EVI). The EVI data consist of measurements that take values between -1 and 1 reflecting the level of greenness of some region over a period of time. We applied (ST-PDE) approach to irregular shaped region of the EVI data. The approach efficiently accommodates the irregular shaped regions taking into account the complex boundaries rather than smoothing across the boundaries. Furthermore, the approach succeeds in capturing the temporal variation in the data.

Keywords: irregularly shaped domain, partial differential equations, finite element analysis, complex boundray

Procedia PDF Downloads 120
20763 Multiobjective Optimization of a Pharmaceutical Formulation Using Regression Method

Authors: J. Satya Eswari, Ch. Venkateswarlu

Abstract:

The formulation of a commercial pharmaceutical product involves several composition factors and response characteristics. When the formulation requires to satisfy multiple response characteristics which are conflicting, an optimal solution requires the need for an efficient multiobjective optimization technique. In this work, a regression is combined with a non-dominated sorting differential evolution (NSDE) involving Naïve & Slow and ε constraint techniques to derive different multiobjective optimization strategies, which are then evaluated by means of a trapidil pharmaceutical formulation. The analysis of the results show the effectiveness of the strategy that combines the regression model and NSDE with the integration of both Naïve & Slow and ε constraint techniques for Pareto optimization of trapidil formulation. With this strategy, the optimal formulation at pH=6.8 is obtained with the decision variables of micro crystalline cellulose, hydroxypropyl methylcellulose and compression pressure. The corresponding response characteristics of rate constant and release order are also noted down. The comparison of these results with the experimental data and with those of other multiple regression model based multiobjective evolutionary optimization strategies signify the better performance for optimal trapidil formulation.

Keywords: pharmaceutical formulation, multiple regression model, response surface method, radial basis function network, differential evolution, multiobjective optimization

Procedia PDF Downloads 384
20762 Arabic Character Recognition Using Regression Curves with the Expectation Maximization Algorithm

Authors: Abdullah A. AlShaher

Abstract:

In this paper, we demonstrate how regression curves can be used to recognize 2D non-rigid handwritten shapes. Each shape is represented by a set of non-overlapping uniformly distributed landmarks. The underlying models utilize 2nd order of polynomials to model shapes within a training set. To estimate the regression models, we need to extract the required coefficients which describe the variations for a set of shape class. Hence, a least square method is used to estimate such modes. We then proceed by training these coefficients using the apparatus Expectation Maximization algorithm. Recognition is carried out by finding the least error landmarks displacement with respect to the model curves. Handwritten isolated Arabic characters are used to evaluate our approach.

Keywords: character recognition, regression curves, handwritten Arabic letters, expectation maximization algorithm

Procedia PDF Downloads 116
20761 Prediction of Energy Storage Areas for Static Photovoltaic System Using Irradiation and Regression Modelling

Authors: Kisan Sarda, Bhavika Shingote

Abstract:

This paper aims to evaluate regression modelling for prediction of Energy storage of solar photovoltaic (PV) system using Semi parametric regression techniques because there are some parameters which are known while there are some unknown parameters like humidity, dust etc. Here irradiation of solar energy is different for different places on the basis of Latitudes, so by finding out areas which give more storage we can implement PV systems at those places and our need of energy will be fulfilled. This regression modelling is done for daily, monthly and seasonal prediction of solar energy storage. In this, we have used R modules for designing the algorithm. This algorithm will give the best comparative results than other regression models for the solar PV cell energy storage.

Keywords: semi parametric regression, photovoltaic (PV) system, regression modelling, irradiation

Procedia PDF Downloads 350
20760 Formulating a Flexible-Spread Fuzzy Regression Model Based on Dissemblance Index

Authors: Shih-Pin Chen, Shih-Syuan You

Abstract:

This study proposes a regression model with flexible spreads for fuzzy input-output data to cope with the situation that the existing measures cannot reflect the actual estimation error. The main idea is that a dissemblance index (DI) is carefully identified and defined for precisely measuring the actual estimation error. Moreover, the graded mean integration (GMI) representation is adopted for determining more representative numeric regression coefficients. Notably, to comprehensively compare the performance of the proposed model with other ones, three different criteria are adopted. The results from commonly used test numerical examples and an application to Taiwan's business monitoring indicator illustrate that the proposed dissemblance index method not only produces valid fuzzy regression models for fuzzy input-output data, but also has satisfactory and stable performance in terms of the total estimation error based on these three criteria.

Keywords: dissemblance index, forecasting, fuzzy sets, linear regression

Procedia PDF Downloads 330
20759 Optimizing the Scanning Time with Radiation Prediction Using a Machine Learning Technique

Authors: Saeed Eskandari, Seyed Rasoul Mehdikhani

Abstract:

Radiation sources have been used in many industries, such as gamma sources in medical imaging. These waves have destructive effects on humans and the environment. It is very important to detect and find the source of these waves because these sources cannot be seen by the eye. A portable robot has been designed and built with the purpose of revealing radiation sources that are able to scan the place from 5 to 20 meters away and shows the location of the sources according to the intensity of the waves on a two-dimensional digital image. The operation of the robot is done by measuring the pixels separately. By increasing the image measurement resolution, we will have a more accurate scan of the environment, and more points will be detected. But this causes a lot of time to be spent on scanning. In this paper, to overcome this challenge, we designed a method that can optimize this time. In this method, a small number of important points of the environment are measured. Hence the remaining pixels are predicted and estimated by regression algorithms in machine learning. The research method is based on comparing the actual values of all pixels. These steps have been repeated with several other radiation sources. The obtained results of the study show that the values estimated by the regression method are very close to the real values.

Keywords: regression, machine learning, scan radiation, robot

Procedia PDF Downloads 57
20758 Prediction of Index-Mechanical Properties of Pyroclastic Rock Utilizing Electrical Resistivity Method

Authors: İsmail İnce

Abstract:

The aim of this study is to determine index and mechanical properties of pyroclastic rock in a practical way by means of electrical resistivity method. For this purpose, electrical resistivity, uniaxial compressive strength, point load strength, P-wave velocity, density and porosity values of 10 different pyroclastic rocks were measured in the laboratory. A simple regression analysis was made among the index-mechanical properties of the samples compatible with electrical resistivity values. A strong exponentially relation was found between index-mechanical properties and electrical resistivity values. The electrical resistivity method can be used to assess the engineering properties of the rock from which it is difficult to obtain regular shaped samples as a non-destructive method.

Keywords: electrical resistivity, index-mechanical properties, pyroclastic rocks, regression analysis

Procedia PDF Downloads 441
20757 A Study on Inference from Distance Variables in Hedonic Regression

Authors: Yan Wang, Yasushi Asami, Yukio Sadahiro

Abstract:

In urban area, several landmarks may affect housing price and rents, hedonic analysis should employ distance variables corresponding to each landmarks. Unfortunately, the effects of distances to landmarks on housing prices are generally not consistent with the true price. These distance variables may cause magnitude error in regression, pointing a problem of spatial multicollinearity. In this paper, we provided some approaches for getting the samples with less bias and method on locating the specific sampling area to avoid the multicollinerity problem in two specific landmarks case.

Keywords: landmarks, hedonic regression, distance variables, collinearity, multicollinerity

Procedia PDF Downloads 426
20756 Applying Element Free Galerkin Method on Beam and Plate

Authors: Mahdad M’hamed, Belaidi Idir

Abstract:

This paper develops a meshless approach, called Element Free Galerkin (EFG) method, which is based on the weak form Moving Least Squares (MLS) of the partial differential governing equations and employs the interpolation to construct the meshless shape functions. The variation weak form is used in the EFG where the trial and test functions are approximated bye the MLS approximation. Since the shape functions constructed by this discretization have the weight function property based on the randomly distributed points, the essential boundary conditions can be implemented easily. The local weak form of the partial differential governing equations is obtained by the weighted residual method within the simple local quadrature domain. The spline function with high continuity is used as the weight function. The presently developed EFG method is a truly meshless method, as it does not require the mesh, either for the construction of the shape functions, or for the integration of the local weak form. Several numerical examples of two-dimensional static structural analysis are presented to illustrate the performance of the present EFG method. They show that the EFG method is highly efficient for the implementation and highly accurate for the computation. The present method is used to analyze the static deflection of beams and plate hole

Keywords: numerical computation, element-free Galerkin (EFG), moving least squares (MLS), meshless methods

Procedia PDF Downloads 261
20755 A Hybrid Fuzzy Clustering Approach for Fertile and Unfertile Analysis

Authors: Shima Soltanzadeh, Mohammad Hosain Fazel Zarandi, Mojtaba Barzegar Astanjin

Abstract:

Diagnosis of male infertility by the laboratory tests is expensive and, sometimes it is intolerable for patients. Filling out the questionnaire and then using classification method can be the first step in decision-making process, so only in the cases with a high probability of infertility we can use the laboratory tests. In this paper, we evaluated the performance of four classification methods including naive Bayesian, neural network, logistic regression and fuzzy c-means clustering as a classification, in the diagnosis of male infertility due to environmental factors. Since the data are unbalanced, the ROC curves are most suitable method for the comparison. In this paper, we also have selected the more important features using a filtering method and examined the impact of this feature reduction on the performance of each methods; generally, most of the methods had better performance after applying the filter. We have showed that using fuzzy c-means clustering as a classification has a good performance according to the ROC curves and its performance is comparable to other classification methods like logistic regression.

Keywords: classification, fuzzy c-means, logistic regression, Naive Bayesian, neural network, ROC curve

Procedia PDF Downloads 304
20754 Application Difference between Cox and Logistic Regression Models

Authors: Idrissa Kayijuka

Abstract:

The logistic regression and Cox regression models (proportional hazard model) at present are being employed in the analysis of prospective epidemiologic research looking into risk factors in their application on chronic diseases. However, a theoretical relationship between the two models has been studied. By definition, Cox regression model also called Cox proportional hazard model is a procedure that is used in modeling data regarding time leading up to an event where censored cases exist. Whereas the Logistic regression model is mostly applicable in cases where the independent variables consist of numerical as well as nominal values while the resultant variable is binary (dichotomous). Arguments and findings of many researchers focused on the overview of Cox and Logistic regression models and their different applications in different areas. In this work, the analysis is done on secondary data whose source is SPSS exercise data on BREAST CANCER with a sample size of 1121 women where the main objective is to show the application difference between Cox regression model and logistic regression model based on factors that cause women to die due to breast cancer. Thus we did some analysis manually i.e. on lymph nodes status, and SPSS software helped to analyze the mentioned data. This study found out that there is an application difference between Cox and Logistic regression models which is Cox regression model is used if one wishes to analyze data which also include the follow-up time whereas Logistic regression model analyzes data without follow-up-time. Also, they have measurements of association which is different: hazard ratio and odds ratio for Cox and logistic regression models respectively. A similarity between the two models is that they are both applicable in the prediction of the upshot of a categorical variable i.e. a variable that can accommodate only a restricted number of categories. In conclusion, Cox regression model differs from logistic regression by assessing a rate instead of proportion. The two models can be applied in many other researches since they are suitable methods for analyzing data but the more recommended is the Cox, regression model.

Keywords: logistic regression model, Cox regression model, survival analysis, hazard ratio

Procedia PDF Downloads 424
20753 Robust Variable Selection Based on Schwarz Information Criterion for Linear Regression Models

Authors: Shokrya Saleh A. Alshqaq, Abdullah Ali H. Ahmadini

Abstract:

The Schwarz information criterion (SIC) is a popular tool for selecting the best variables in regression datasets. However, SIC is defined using an unbounded estimator, namely, the least-squares (LS), which is highly sensitive to outlying observations, especially bad leverage points. A method for robust variable selection based on SIC for linear regression models is thus needed. This study investigates the robustness properties of SIC by deriving its influence function and proposes a robust SIC based on the MM-estimation scale. The aim of this study is to produce a criterion that can effectively select accurate models in the presence of vertical outliers and high leverage points. The advantages of the proposed robust SIC is demonstrated through a simulation study and an analysis of a real dataset.

Keywords: influence function, robust variable selection, robust regression, Schwarz information criterion

Procedia PDF Downloads 114
20752 Stock Market Prediction by Regression Model with Social Moods

Authors: Masahiro Ohmura, Koh Kakusho, Takeshi Okadome

Abstract:

This paper presents a regression model with autocorrelated errors in which the inputs are social moods obtained by analyzing the adjectives in Twitter posts using a document topic model. The regression model predicts Dow Jones Industrial Average (DJIA) more precisely than autoregressive moving-average models.

Keywords: stock market prediction, social moods, regression model, DJIA

Procedia PDF Downloads 521
20751 Modelling the Indonesian Goverment Securities Yield Curve Using Nelson-Siegel-Svensson and Support Vector Regression

Authors: Jamilatuzzahro, Rezzy Eko Caraka

Abstract:

The yield curve is the plot of the yield to maturity of zero-coupon bonds against maturity. In practice, the yield curve is not observed but must be extracted from observed bond prices for a set of (usually) incomplete maturities. There exist many methodologies and theory to analyze of yield curve. We use two methods (the Nelson-Siegel Method, the Svensson Method, and the SVR method) in order to construct and compare our zero-coupon yield curves. The objectives of this research were: (i) to study the adequacy of NSS model and SVR to Indonesian government bonds data, (ii) to choose the best optimization or estimation method for NSS model and SVR. To obtain that objective, this research was done by the following steps: data preparation, cleaning or filtering data, modeling, and model evaluation.

Keywords: support vector regression, Nelson-Siegel-Svensson, yield curve, Indonesian government

Procedia PDF Downloads 216
20750 An Overbooking Model for Car Rental Service with Different Types of Cars

Authors: Naragain Phumchusri, Kittitach Pongpairoj

Abstract:

Overbooking is a very useful revenue management technique that could help reduce costs caused by either undersales or oversales. In this paper, we propose an overbooking model for two types of cars that can minimize the total cost for car rental service. With two types of cars, there is an upgrade possibility for lower type to upper type. This makes the model more complex than one type of cars scenario. We have found that convexity can be proved in this case. Sensitivity analysis of the parameters is conducted to observe the effects of relevant parameters on the optimal solution. Model simplification is proposed using multiple linear regression analysis, which can help estimate the optimal overbooking level using appropriate independent variables. The results show that the overbooking level from multiple linear regression model is relatively close to the optimal solution (with the adjusted R-squared value of at least 72.8%). To evaluate the performance of the proposed model, the total cost was compared with the case where the decision maker uses a naïve method for the overbooking level. It was found that the total cost from optimal solution is only 0.5 to 1 percent (on average) lower than the cost from regression model, while it is approximately 67% lower than the cost obtained by the naïve method. It indicates that our proposed simplification method using regression analysis can effectively perform in estimating the overbooking level.

Keywords: overbooking, car rental industry, revenue management, stochastic model

Procedia PDF Downloads 146
20749 Model-Based Software Regression Test Suite Reduction

Authors: Shiwei Deng, Yang Bao

Abstract:

In this paper, we present a model-based regression test suite reducing approach that uses EFSM model dependence analysis and probability-driven greedy algorithm to reduce software regression test suites. The approach automatically identifies the difference between the original model and the modified model as a set of elementary model modifications. The EFSM dependence analysis is performed for each elementary modification to reduce the regression test suite, and then the probability-driven greedy algorithm is adopted to select the minimum set of test cases from the reduced regression test suite that cover all interaction patterns. Our initial experience shows that the approach may significantly reduce the size of regression test suites.

Keywords: dependence analysis, EFSM model, greedy algorithm, regression test

Procedia PDF Downloads 399
20748 On the Performance of Improvised Generalized M-Estimator in the Presence of High Leverage Collinearity Enhancing Observations

Authors: Habshah Midi, Mohammed A. Mohammed, Sohel Rana

Abstract:

Multicollinearity occurs when two or more independent variables in a multiple linear regression model are highly correlated. The ridge regression is the commonly used method to rectify this problem. However, the ridge regression cannot handle the problem of multicollinearity which is caused by high leverage collinearity enhancing observation (HLCEO). Since high leverage points (HLPs) are responsible for inducing multicollinearity, the effect of HLPs needs to be reduced by using Generalized M estimator. The existing GM6 estimator is based on the Minimum Volume Ellipsoid (MVE) which tends to swamp some low leverage points. Hence an improvised GM (MGM) estimator is presented to improve the precision of the GM6 estimator. Numerical example and simulation study are presented to show how HLPs can cause multicollinearity. The numerical results show that our MGM estimator is the most efficient method compared to some existing methods.

Keywords: identification, high leverage points, multicollinearity, GM-estimator, DRGP, DFFITS

Procedia PDF Downloads 221
20747 Influence of Parameters of Modeling and Data Distribution for Optimal Condition on Locally Weighted Projection Regression Method

Authors: Farhad Asadi, Mohammad Javad Mollakazemi, Aref Ghafouri

Abstract:

Recent research in neural networks science and neuroscience for modeling complex time series data and statistical learning has focused mostly on learning from high input space and signals. Local linear models are a strong choice for modeling local nonlinearity in data series. Locally weighted projection regression is a flexible and powerful algorithm for nonlinear approximation in high dimensional signal spaces. In this paper, different learning scenario of one and two dimensional data series with different distributions are investigated for simulation and further noise is inputted to data distribution for making different disordered distribution in time series data and for evaluation of algorithm in locality prediction of nonlinearity. Then, the performance of this algorithm is simulated and also when the distribution of data is high or when the number of data is less the sensitivity of this approach to data distribution and influence of important parameter of local validity in this algorithm with different data distribution is explained.

Keywords: local nonlinear estimation, LWPR algorithm, online training method, locally weighted projection regression method

Procedia PDF Downloads 466
20746 Applicability of Cameriere’s Age Estimation Method in a Sample of Turkish Adults

Authors: Hatice Boyacioglu, Nursel Akkaya, Humeyra Ozge Yilanci, Hilmi Kansu, Nihal Avcu

Abstract:

The strong relationship between the reduction in the size of the pulp cavity and increasing age has been reported in the literature. This relationship can be utilized to estimate the age of an individual by measuring the pulp cavity size using dental radiographs as a non-destructive method. The purpose of this study is to develop a population specific regression model for age estimation in a sample of Turkish adults by applying Cameriere’s method on panoramic radiographs. The sample consisted of 100 panoramic radiographs of Turkish patients (40 men, 60 women) aged between 20 and 70 years. Pulp and tooth area ratios (AR) of the maxilla¬¬ry canines were measured by two maxillofacial radiologists and then the results were subjected to regression analysis. There were no statistically significant intra-observer and inter-observer differences. The correlation coefficient between age and the AR of the maxillary canines was -0.71 and the following regression equation was derived: Estimated Age = 77,365 – ( 351,193 × AR ). The mean prediction error was 4 years which is within acceptable errors limits for age estimation. This shows that the pulp/tooth area ratio is a useful variable for assessing age with reasonable accuracy. Based on the results of this research, it was concluded that Cameriere’s method is suitable for dental age estimation and it can be used for forensic procedures in Turkish adults. These instructions give you guidelines for preparing papers for conferences or journals.

Keywords: age estimation by teeth, forensic dentistry, panoramic radiograph, Cameriere's method

Procedia PDF Downloads 423
20745 Incorporating Anomaly Detection in a Digital Twin Scenario Using Symbolic Regression

Authors: Manuel Alves, Angelica Reis, Armindo Lobo, Valdemar Leiras

Abstract:

In industry 4.0, it is common to have a lot of sensor data. In this deluge of data, hints of possible problems are difficult to spot. The digital twin concept aims to help answer this problem, but it is mainly used as a monitoring tool to handle the visualisation of data. Failure detection is of paramount importance in any industry, and it consumes a lot of resources. Any improvement in this regard is of tangible value to the organisation. The aim of this paper is to add the ability to forecast test failures, curtailing detection times. To achieve this, several anomaly detection algorithms were compared with a symbolic regression approach. To this end, Isolation Forest, One-Class SVM and an auto-encoder have been explored. For the symbolic regression PySR library was used. The first results show that this approach is valid and can be added to the tools available in this context as a low resource anomaly detection method since, after training, the only requirement is the calculation of a polynomial, a useful feature in the digital twin context.

Keywords: anomaly detection, digital twin, industry 4.0, symbolic regression

Procedia PDF Downloads 89
20744 Study on the Geometric Similarity in Computational Fluid Dynamics Calculation and the Requirement of Surface Mesh Quality

Authors: Qian Yi Ooi

Abstract:

At present, airfoil parameters are still designed and optimized according to the scale of conventional aircraft, and there are still some slight deviations in terms of scale differences. However, insufficient parameters or poor surface mesh quality is likely to occur if these small deviations are embedded in a future civil aircraft with a size that is quite different from conventional aircraft, such as a blended-wing-body (BWB) aircraft with future potential, resulting in large deviations in geometric similarity in computational fluid dynamics (CFD) simulations. To avoid this situation, the study on the CFD calculation on the geometric similarity of airfoil parameters and the quality of the surface mesh is conducted to obtain the ability of different parameterization methods applied on different airfoil scales. The research objects are three airfoil scales, including the wing root and wingtip of conventional civil aircraft and the wing root of the giant hybrid wing, used by three parameterization methods to compare the calculation differences between different sizes of airfoils. In this study, the constants including NACA 0012, a Reynolds number of 10 million, an angle of attack of zero, a C-grid for meshing, and the k-epsilon (k-ε) turbulence model are used. The experimental variables include three airfoil parameterization methods: point cloud method, B-spline curve method, and class function/shape function transformation (CST) method. The airfoil dimensions are set to 3.98 meters, 17.67 meters, and 48 meters, respectively. In addition, this study also uses different numbers of edge meshing and the same bias factor in the CFD simulation. Studies have shown that with the change of airfoil scales, different parameterization methods, the number of control points, and the meshing number of divisions should be used to improve the accuracy of the aerodynamic performance of the wing. When the airfoil ratio increases, the most basic point cloud parameterization method will require more and larger data to support the accuracy of the airfoil’s aerodynamic performance, which will face the severe test of insufficient computer capacity. On the other hand, when using the B-spline curve method, average number of control points and meshing number of divisions should be set appropriately to obtain higher accuracy; however, the quantitative balance cannot be directly defined, but the decisions should be made repeatedly by adding and subtracting. Lastly, when using the CST method, it is found that limited control points are enough to accurately parameterize the larger-sized wing; a higher degree of accuracy and stability can be obtained by using a lower-performance computer.

Keywords: airfoil, computational fluid dynamics, geometric similarity, surface mesh quality

Procedia PDF Downloads 194
20743 A Fuzzy Linear Regression Model Based on Dissemblance Index

Authors: Shih-Pin Chen, Shih-Syuan You

Abstract:

Fuzzy regression models are useful for investigating the relationship between explanatory variables and responses in fuzzy environments. To overcome the deficiencies of previous models and increase the explanatory power of fuzzy data, the graded mean integration (GMI) representation is applied to determine representative crisp regression coefficients. A fuzzy regression model is constructed based on the modified dissemblance index (MDI), which can precisely measure the actual total error. Compared with previous studies based on the proposed MDI and distance criterion, the results from commonly used test examples show that the proposed fuzzy linear regression model has higher explanatory power and forecasting accuracy.

Keywords: dissemblance index, fuzzy linear regression, graded mean integration, mathematical programming

Procedia PDF Downloads 409
20742 Regret-Regression for Multi-Armed Bandit Problem

Authors: Deyadeen Ali Alshibani

Abstract:

In the literature, the multi-armed bandit problem as a statistical decision model of an agent trying to optimize his decisions while improving his information at the same time. There are several different algorithms models and their applications on this problem. In this paper, we evaluate the Regret-regression through comparing with Q-learning method. A simulation on determination of optimal treatment regime is presented in detail.

Keywords: optimal, bandit problem, optimization, dynamic programming

Procedia PDF Downloads 424
20741 The Effect of Accounting Conservatism on Cost of Capital: A Quantile Regression Approach for MENA Countries

Authors: Maha Zouaoui Khalifa, Hakim Ben Othman, Hussaney Khaled

Abstract:

Prior empirical studies have investigated the economic consequences of accounting conservatism by examining its impact on the cost of equity capital (COEC). However, findings are not conclusive. We assume that inconsistent results of such association may be attributed to the regression models used in data analysis. To address this issue, we re-examine the effect of different dimension of accounting conservatism: unconditional conservatism (U_CONS) and conditional conservatism (C_CONS) on the COEC for a sample of listed firms from Middle Eastern and North Africa (MENA) countries, applying quantile regression (QR) approach developed by Koenker and Basset (1978). While classical ordinary least square (OLS) method is widely used in empirical accounting research, however it may produce inefficient and bias estimates in the case of departures from normality or long tail error distribution. QR method is more powerful than OLS to handle this kind of problem. It allows the coefficient on the independent variables to shift across the distribution of the dependent variable whereas OLS method only estimates the conditional mean effects of a response variable. We find as predicted that U_CONS has a significant positive effect on the COEC however, C_CONS has a negative impact. Findings suggest also that the effect of the two dimensions of accounting conservatism differs considerably across COEC quantiles. Comparing results from QR method with those of OLS, this study throws more lights on the association between accounting conservatism and COEC.

Keywords: unconditional conservatism, conditional conservatism, cost of equity capital, OLS, quantile regression, emerging markets, MENA countries

Procedia PDF Downloads 328