Search results for: regression estimators
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3167

Search results for: regression estimators

3077 Survival and Hazard Maximum Likelihood Estimator with Covariate Based on Right Censored Data of Weibull Distribution

Authors: Al Omari Mohammed Ahmed

Abstract:

This paper focuses on Maximum Likelihood Estimator with Covariate. Covariates are incorporated into the Weibull model. Under this regression model with regards to maximum likelihood estimator, the parameters of the covariate, shape parameter, survival function and hazard rate of the Weibull regression distribution with right censored data are estimated. The mean square error (MSE) and absolute bias are used to compare the performance of Weibull regression distribution. For the simulation comparison, the study used various sample sizes and several specific values of the Weibull shape parameter.

Keywords: weibull regression distribution, maximum likelihood estimator, survival function, hazard rate, right censoring

Procedia PDF Downloads 407
3076 Machine Vision System for Measuring the Quality of Bulk Sun-dried Organic Raisins

Authors: Navab Karimi, Tohid Alizadeh

Abstract:

An intelligent vision-based system was designed to measure the quality and purity of raisins. A machine vision setup was utilized to capture the images of bulk raisins in ranges of 5-50% mixed pure-impure berries. The textural features of bulk raisins were extracted using Grey-level Histograms, Co-occurrence Matrix, and Local Binary Pattern (a total of 108 features). Genetic Algorithm and neural network regression were used for selecting and ranking the best features (21 features). As a result, the GLCM features set was found to have the highest accuracy (92.4%) among the other sets. Followingly, multiple feature combinations of the previous stage were fed into the second regression (linear regression) to increase accuracy, wherein a combination of 16 features was found to be the optimum. Finally, a Support Vector Machine (SVM) classifier was used to differentiate the mixtures, producing the best efficiency and accuracy of 96.2% and 97.35%, respectively.

Keywords: sun-dried organic raisin, genetic algorithm, feature extraction, ann regression, linear regression, support vector machine, south azerbaijan.

Procedia PDF Downloads 37
3075 Analysis of Factors Affecting the Number of Infant and Maternal Mortality in East Java with Geographically Weighted Bivariate Generalized Poisson Regression Method

Authors: Luh Eka Suryani, Purhadi

Abstract:

Poisson regression is a non-linear regression model with response variable in the form of count data that follows Poisson distribution. Modeling for a pair of count data that show high correlation can be analyzed by Poisson Bivariate Regression. Data, the number of infant mortality and maternal mortality, are count data that can be analyzed by Poisson Bivariate Regression. The Poisson regression assumption is an equidispersion where the mean and variance values are equal. However, the actual count data has a variance value which can be greater or less than the mean value (overdispersion and underdispersion). Violations of this assumption can be overcome by applying Generalized Poisson Regression. Characteristics of each regency can affect the number of cases occurred. This issue can be overcome by spatial analysis called geographically weighted regression. This study analyzes the number of infant mortality and maternal mortality based on conditions in East Java in 2016 using Geographically Weighted Bivariate Generalized Poisson Regression (GWBGPR) method. Modeling is done with adaptive bisquare Kernel weighting which produces 3 regency groups based on infant mortality rate and 5 regency groups based on maternal mortality rate. Variables that significantly influence the number of infant and maternal mortality are the percentages of pregnant women visit health workers at least 4 times during pregnancy, pregnant women get Fe3 tablets, obstetric complication handled, clean household and healthy behavior, and married women with the first marriage age under 18 years.

Keywords: adaptive bisquare kernel, GWBGPR, infant mortality, maternal mortality, overdispersion

Procedia PDF Downloads 125
3074 Determining the Causality Variables in Female Genital Mutilation: A Factor Screening Approach

Authors: Ekele Alih, Enejo Jalija

Abstract:

Female Genital Mutilation (FGM) is made up of three types namely: Clitoridectomy, Excision and Infibulation. In this study, we examine the factors responsible for FGM in order to identify the causality variables in a logistic regression approach. From the result of the survey conducted by the Public Health Division, Nigeria Institute of Medical Research, Yaba, Lagos State, the tau statistic, τ was used to screen 9 factors that causes FGM in order to select few of the predictors before multiple regression equation is obtained. The need for this may be that the sample size may not be able to sustain having a regression with all the predictors or to avoid multi-collinearity. A total of 300 respondents, comprising 150 adult males and 150 adult females were selected for the household survey based on the multi-stage sampling procedure. The tau statistic,

Keywords: female genital mutilation, logistic regression, tau statistic, African society

Procedia PDF Downloads 225
3073 A Monte Carlo Fuzzy Logistic Regression Framework against Imbalance and Separation

Authors: Georgios Charizanos, Haydar Demirhan, Duygu Icen

Abstract:

Two of the most impactful issues in classical logistic regression are class imbalance and complete separation. These can result in model predictions heavily leaning towards the imbalanced class on the binary response variable or over-fitting issues. Fuzzy methodology offers key solutions for handling these problems. However, most studies propose the transformation of the binary responses into a continuous format limited within [0,1]. This is called the possibilistic approach within fuzzy logistic regression. Following this approach is more aligned with straightforward regression since a logit-link function is not utilized, and fuzzy probabilities are not generated. In contrast, we propose a method of fuzzifying binary response variables that allows for the use of the logit-link function; hence, a probabilistic fuzzy logistic regression model with the Monte Carlo method. The fuzzy probabilities are then classified by selecting a fuzzy threshold. Different combinations of fuzzy and crisp input, output, and coefficients are explored, aiming to understand which of these perform better under different conditions of imbalance and separation. We conduct numerical experiments using both synthetic and real datasets to demonstrate the performance of the fuzzy logistic regression framework against seven crisp machine learning methods. The proposed framework shows better performance irrespective of the degree of imbalance and presence of separation in the data, while the considered machine learning methods are significantly impacted.

Keywords: fuzzy logistic regression, fuzzy, logistic, machine learning

Procedia PDF Downloads 30
3072 Landslide Susceptibility Mapping: A Comparison between Logistic Regression and Multivariate Adaptive Regression Spline Models in the Municipality of Oudka, Northern of Morocco

Authors: S. Benchelha, H. C. Aoudjehane, M. Hakdaoui, R. El Hamdouni, H. Mansouri, T. Benchelha, M. Layelmam, M. Alaoui

Abstract:

The logistic regression (LR) and multivariate adaptive regression spline (MarSpline) are applied and verified for analysis of landslide susceptibility map in Oudka, Morocco, using geographical information system. From spatial database containing data such as landslide mapping, topography, soil, hydrology and lithology, the eight factors related to landslides such as elevation, slope, aspect, distance to streams, distance to road, distance to faults, lithology map and Normalized Difference Vegetation Index (NDVI) were calculated or extracted. Using these factors, landslide susceptibility indexes were calculated by the two mentioned methods. Before the calculation, this database was divided into two parts, the first for the formation of the model and the second for the validation. The results of the landslide susceptibility analysis were verified using success and prediction rates to evaluate the quality of these probabilistic models. The result of this verification was that the MarSpline model is the best model with a success rate (AUC = 0.963) and a prediction rate (AUC = 0.951) higher than the LR model (success rate AUC = 0.918, rate prediction AUC = 0.901).

Keywords: landslide susceptibility mapping, regression logistic, multivariate adaptive regression spline, Oudka, Taounate

Procedia PDF Downloads 154
3071 Robust Variable Selection Based on Schwarz Information Criterion for Linear Regression Models

Authors: Shokrya Saleh A. Alshqaq, Abdullah Ali H. Ahmadini

Abstract:

The Schwarz information criterion (SIC) is a popular tool for selecting the best variables in regression datasets. However, SIC is defined using an unbounded estimator, namely, the least-squares (LS), which is highly sensitive to outlying observations, especially bad leverage points. A method for robust variable selection based on SIC for linear regression models is thus needed. This study investigates the robustness properties of SIC by deriving its influence function and proposes a robust SIC based on the MM-estimation scale. The aim of this study is to produce a criterion that can effectively select accurate models in the presence of vertical outliers and high leverage points. The advantages of the proposed robust SIC is demonstrated through a simulation study and an analysis of a real dataset.

Keywords: influence function, robust variable selection, robust regression, Schwarz information criterion

Procedia PDF Downloads 107
3070 Generalized Additive Model for Estimating Propensity Score

Authors: Tahmidul Islam

Abstract:

Propensity Score Matching (PSM) technique has been widely used for estimating causal effect of treatment in observational studies. One major step of implementing PSM is estimating the propensity score (PS). Logistic regression model with additive linear terms of covariates is most used technique in many studies. Logistics regression model is also used with cubic splines for retaining flexibility in the model. However, choosing the functional form of the logistic regression model has been a question since the effectiveness of PSM depends on how accurately the PS been estimated. In many situations, the linearity assumption of linear logistic regression may not hold and non-linear relation between the logit and the covariates may be appropriate. One can estimate PS using machine learning techniques such as random forest, neural network etc for more accuracy in non-linear situation. In this study, an attempt has been made to compare the efficacy of Generalized Additive Model (GAM) in various linear and non-linear settings and compare its performance with usual logistic regression. GAM is a non-parametric technique where functional form of the covariates can be unspecified and a flexible regression model can be fitted. In this study various simple and complex models have been considered for treatment under several situations (small/large sample, low/high number of treatment units) and examined which method leads to more covariate balance in the matched dataset. It is found that logistic regression model is impressively robust against inclusion quadratic and interaction terms and reduces mean difference in treatment and control set equally efficiently as GAM does. GAM provided no significantly better covariate balance than logistic regression in both simple and complex models. The analysis also suggests that larger proportion of controls than treatment units leads to better balance for both of the methods.

Keywords: accuracy, covariate balances, generalized additive model, logistic regression, non-linearity, propensity score matching

Procedia PDF Downloads 327
3069 A Comparison of Neural Network and DOE-Regression Analysis for Predicting Resource Consumption of Manufacturing Processes

Authors: Frank Kuebler, Rolf Steinhilper

Abstract:

Artificial neural networks (ANN) as well as Design of Experiments (DOE) based regression analysis (RA) are mainly used for modeling of complex systems. Both methodologies are commonly applied in process and quality control of manufacturing processes. Due to the fact that resource efficiency has become a critical concern for manufacturing companies, these models needs to be extended to predict resource-consumption of manufacturing processes. This paper describes an approach to use neural networks as well as DOE based regression analysis for predicting resource consumption of manufacturing processes and gives a comparison of the achievable results based on an industrial case study of a turning process.

Keywords: artificial neural network, design of experiments, regression analysis, resource efficiency, manufacturing process

Procedia PDF Downloads 486
3068 Logistic Regression Model versus Additive Model for Recurrent Event Data

Authors: Entisar A. Elgmati

Abstract:

Recurrent infant diarrhea is studied using daily data collected in Salvador, Brazil over one year and three months. A logistic regression model is fitted instead of Aalen's additive model using the same covariates that were used in the analysis with the additive model. The model gives reasonably similar results to that using additive regression model. In addition, the problem with the estimated conditional probabilities not being constrained between zero and one in additive model is solved here. Also martingale residuals that have been used to judge the goodness of fit for the additive model are shown to be useful for judging the goodness of fit of the logistic model.

Keywords: additive model, cumulative probabilities, infant diarrhoea, recurrent event

Procedia PDF Downloads 598
3067 Identifying Factors Contributing to the Spread of Lyme Disease: A Regression Analysis of Virginia’s Data

Authors: Fatemeh Valizadeh Gamchi, Edward L. Boone

Abstract:

This research focuses on Lyme disease, a widespread infectious condition in the United States caused by the bacterium Borrelia burgdorferi sensu stricto. It is critical to identify environmental and economic elements that are contributing to the spread of the disease. This study examined data from Virginia to identify a subset of explanatory variables significant for Lyme disease case numbers. To identify relevant variables and avoid overfitting, linear poisson, and regularization regression methods such as a ridge, lasso, and elastic net penalty were employed. Cross-validation was performed to acquire tuning parameters. The methods proposed can automatically identify relevant disease count covariates. The efficacy of the techniques was assessed using four criteria on three simulated datasets. Finally, using the Virginia Department of Health’s Lyme disease data set, the study successfully identified key factors, and the results were consistent with previous studies.

Keywords: lyme disease, Poisson generalized linear model, ridge regression, lasso regression, elastic net regression

Procedia PDF Downloads 81
3066 An Analysis of the Effect of Sharia Financing and Work Relation Founding towards Non-Performing Financing in Islamic Banks in Indonesia

Authors: Muhammad Bahrul Ilmi

Abstract:

The purpose of this research is to analyze the influence of Islamic financing and work relation founding simultaneously and partially towards non-performing financing in Islamic banks. This research was regression quantitative field research, and had been done in Muammalat Indonesia Bank and Islamic Danamon Bank in 3 months. The populations of this research were 15 account officers of Muammalat Indonesia Bank and Islamic Danamon Bank in Surakarta, Indonesia. The techniques of collecting data used in this research were documentation, questionnaire, literary study and interview. Regression analysis result shows that Islamic financing and work relation founding simultaneously has positive and significant effect towards non performing financing of two Islamic Banks. It is obtained with probability value 0.003 which is less than 0.05 and F value 9.584. The analysis result of Islamic financing regression towards non performing financing shows the significant effect. It is supported by double linear regression analysis with probability value 0.001 which is less than 0.05. The regression analysis of work relation founding effect towards non-performing financing shows insignificant effect. This is shown in the double linear regression analysis with probability value 0.161 which is bigger than 0.05.

Keywords: Syariah financing, work relation founding, non-performing financing (NPF), Islamic Bank

Procedia PDF Downloads 398
3065 A Kolmogorov-Smirnov Type Goodness-Of-Fit Test of Multinomial Logistic Regression Model in Case-Control Studies

Authors: Chen Li-Ching

Abstract:

The multinomial logistic regression model is used popularly for inferring the relationship of risk factors and disease with multiple categories. This study based on the discrepancy between the nonparametric maximum likelihood estimator and semiparametric maximum likelihood estimator of the cumulative distribution function to propose a Kolmogorov-Smirnov type test statistic to assess adequacy of the multinomial logistic regression model for case-control data. A bootstrap procedure is presented to calculate the critical value of the proposed test statistic. Empirical type I error rates and powers of the test are performed by simulation studies. Some examples will be illustrated the implementation of the test.

Keywords: case-control studies, goodness-of-fit test, Kolmogorov-Smirnov test, multinomial logistic regression

Procedia PDF Downloads 422
3064 A Study on Inference from Distance Variables in Hedonic Regression

Authors: Yan Wang, Yasushi Asami, Yukio Sadahiro

Abstract:

In urban area, several landmarks may affect housing price and rents, hedonic analysis should employ distance variables corresponding to each landmarks. Unfortunately, the effects of distances to landmarks on housing prices are generally not consistent with the true price. These distance variables may cause magnitude error in regression, pointing a problem of spatial multicollinearity. In this paper, we provided some approaches for getting the samples with less bias and method on locating the specific sampling area to avoid the multicollinerity problem in two specific landmarks case.

Keywords: landmarks, hedonic regression, distance variables, collinearity, multicollinerity

Procedia PDF Downloads 420
3063 Forecasting of Grape Juice Flavor by Using Support Vector Regression

Authors: Ren-Jieh Kuo, Chun-Shou Huang

Abstract:

The research of juice flavor forecasting has become more important in China. Due to the fast economic growth in China, many different kinds of juices have been introduced to the market. If a beverage company can understand their customers’ preference well, the juice can be served more attractively. Thus, this study intends to introduce the basic theory and computing process of grapes juice flavor forecasting based on support vector regression (SVR). Applying SVR, BPN and LR to forecast the flavor of grapes juice in real data, the result shows that SVR is more suitable and effective at predicting performance.

Keywords: flavor forecasting, artificial neural networks, Support Vector Regression, China

Procedia PDF Downloads 448
3062 Estimation of Coefficients of Ridge and Principal Components Regressions with Multicollinear Data

Authors: Rajeshwar Singh

Abstract:

The presence of multicollinearity is common in handling with several explanatory variables simultaneously due to exhibiting a linear relationship among them. A great problem arises in understanding the impact of explanatory variables on the dependent variable. Thus, the method of least squares estimation gives inexact estimates. In this case, it is advised to detect its presence first before proceeding further. Using the ridge regression degree of its occurrence is reduced but principal components regression gives good estimates in this situation. This paper discusses well-known techniques of the ridge and principal components regressions and applies to get the estimates of coefficients by both techniques. In addition to it, this paper also discusses the conflicting claim on the discovery of the method of ridge regression based on available documents.

Keywords: conflicting claim on credit of discovery of ridge regression, multicollinearity, principal components and ridge regressions, variance inflation factor

Procedia PDF Downloads 369
3061 Estimate of Maximum Expected Intensity of One-Half-Wave Lines Dancing

Authors: A. Bekbaev, M. Dzhamanbaev, R. Abitaeva, A. Karbozova, G. Nabyeva

Abstract:

In this paper, the regression dependence of dancing intensity from wind speed and length of span was established due to the statistic data obtained from multi-year observations on line wires dancing accumulated by power systems of Kazakhstan and the Russian Federation. The lower and upper limitations of the equations parameters were estimated, as well as the adequacy of the regression model. The constructed model will be used in research of dancing phenomena for the development of methods and means of protection against dancing and for zoning plan of the territories of line wire dancing.

Keywords: power lines, line wire dancing, dancing intensity, regression equation, dancing area intensity

Procedia PDF Downloads 280
3060 Incorporating Anomaly Detection in a Digital Twin Scenario Using Symbolic Regression

Authors: Manuel Alves, Angelica Reis, Armindo Lobo, Valdemar Leiras

Abstract:

In industry 4.0, it is common to have a lot of sensor data. In this deluge of data, hints of possible problems are difficult to spot. The digital twin concept aims to help answer this problem, but it is mainly used as a monitoring tool to handle the visualisation of data. Failure detection is of paramount importance in any industry, and it consumes a lot of resources. Any improvement in this regard is of tangible value to the organisation. The aim of this paper is to add the ability to forecast test failures, curtailing detection times. To achieve this, several anomaly detection algorithms were compared with a symbolic regression approach. To this end, Isolation Forest, One-Class SVM and an auto-encoder have been explored. For the symbolic regression PySR library was used. The first results show that this approach is valid and can be added to the tools available in this context as a low resource anomaly detection method since, after training, the only requirement is the calculation of a polynomial, a useful feature in the digital twin context.

Keywords: anomaly detection, digital twin, industry 4.0, symbolic regression

Procedia PDF Downloads 83
3059 An Estimating Equation for Survival Data with a Possibly Time-Varying Covariates under a Semiparametric Transformation Models

Authors: Yemane Hailu Fissuh, Zhongzhan Zhang

Abstract:

An estimating equation technique is an alternative method of the widely used maximum likelihood methods, which enables us to ease some complexity due to the complex characteristics of time-varying covariates. In the situations, when both the time-varying covariates and left-truncation are considered in the model, the maximum likelihood estimation procedures become much more burdensome and complex. To ease the complexity, in this study, the modified estimating equations those have been given high attention and considerations in many researchers under semiparametric transformation model was proposed. The purpose of this article was to develop the modified estimating equation under flexible and general class of semiparametric transformation models for left-truncated and right censored survival data with time-varying covariates. Besides the commonly applied Cox proportional hazards model, such kind of problems can be also analyzed with a general class of semiparametric transformation models to estimate the effect of treatment given possibly time-varying covariates on the survival time. The consistency and asymptotic properties of the estimators were intuitively derived via the expectation-maximization (EM) algorithm. The characteristics of the estimators in the finite sample performance for the proposed model were illustrated via simulation studies and Stanford heart transplant real data examples. To sum up the study, the bias for covariates has been adjusted by estimating density function for the truncation time variable. Then the effect of possibly time-varying covariates was evaluated in some special semiparametric transformation models.

Keywords: EM algorithm, estimating equation, semiparametric transformation models, time-to-event outcomes, time varying covariate

Procedia PDF Downloads 117
3058 Impact of Infrastructural Development on Socio-Economic Growth: An Empirical Investigation in India

Authors: Jonardan Koner

Abstract:

The study attempts to find out the impact of infrastructural investment on state economic growth in India. It further tries to determine the magnitude of the impact of infrastructural investment on economic indicator, i.e., per-capita income (PCI) in Indian States. The study uses panel regression technique to measure the impact of infrastructural investment on per-capita income (PCI) in Indian States. Panel regression technique helps incorporate both the cross-section and time-series aspects of the dataset. In order to analyze the difference in impact of the explanatory variables on the explained variables across states, the study uses Fixed Effect Panel Regression Model. The conclusions of the study are that infrastructural investment has a desirable impact on economic development and that the impact is different for different states in India. We analyze time series data (annual frequency) ranging from 1991 to 2010. The study reveals that the infrastructural investment significantly explains the variation of economic indicators.

Keywords: infrastructural investment, multiple regression, panel regression techniques, economic development, fixed effect dummy variable model

Procedia PDF Downloads 338
3057 A Quadratic Model to Early Predict the Blastocyst Stage with a Time Lapse Incubator

Authors: Cecile Edel, Sandrine Giscard D'Estaing, Elsa Labrune, Jacqueline Lornage, Mehdi Benchaib

Abstract:

Introduction: The use of incubator equipped with time-lapse technology in Artificial Reproductive Technology (ART) allows a continuous surveillance. With morphocinetic parameters, algorithms are available to predict the potential outcome of an embryo. However, the different proposed time-lapse algorithms do not take account the missing data, and then some embryos could not be classified. The aim of this work is to construct a predictive model even in the case of missing data. Materials and methods: Patients: A retrospective study was performed, in biology laboratory of reproduction at the hospital ‘Femme Mère Enfant’ (Lyon, France) between 1 May 2013 and 30 April 2015. Embryos (n= 557) obtained from couples (n=108) were cultured in a time-lapse incubator (Embryoscope®, Vitrolife, Goteborg, Sweden). Time-lapse incubator: The morphocinetic parameters obtained during the three first days of embryo life were used to build the predictive model. Predictive model: A quadratic regression was performed between the number of cells and time. N = a. T² + b. T + c. N: number of cells at T time (T in hours). The regression coefficients were calculated with Excel software (Microsoft, Redmond, WA, USA), a program with Visual Basic for Application (VBA) (Microsoft) was written for this purpose. The quadratic equation was used to find a value that allows to predict the blastocyst formation: the synthetize value. The area under the curve (AUC) obtained from the ROC curve was used to appreciate the performance of the regression coefficients and the synthetize value. A cut-off value has been calculated for each regression coefficient and for the synthetize value to obtain two groups where the difference of blastocyst formation rate according to the cut-off values was maximal. The data were analyzed with SPSS (IBM, Il, Chicago, USA). Results: Among the 557 embryos, 79.7% had reached the blastocyst stage. The synthetize value corresponds to the value calculated with time value equal to 99, the highest AUC was then obtained. The AUC for regression coefficient ‘a’ was 0.648 (p < 0.001), 0.363 (p < 0.001) for the regression coefficient ‘b’, 0.633 (p < 0.001) for the regression coefficient ‘c’, and 0.659 (p < 0.001) for the synthetize value. The results are presented as follow: blastocyst formation rate under cut-off value versus blastocyst rate formation above cut-off value. For the regression coefficient ‘a’ the optimum cut-off value was -1.14.10-3 (61.3% versus 84.3%, p < 0.001), 0.26 for the regression coefficient ‘b’ (83.9% versus 63.1%, p < 0.001), -4.4 for the regression coefficient ‘c’ (62.2% versus 83.1%, p < 0.001) and 8.89 for the synthetize value (58.6% versus 85.0%, p < 0.001). Conclusion: This quadratic regression allows to predict the outcome of an embryo even in case of missing data. Three regression coefficients and a synthetize value could represent the identity card of an embryo. ‘a’ regression coefficient represents the acceleration of cells division, ‘b’ regression coefficient represents the speed of cell division. We could hypothesize that ‘c’ regression coefficient could represent the intrinsic potential of an embryo. This intrinsic potential could be dependent from oocyte originating the embryo. These hypotheses should be confirmed by studies analyzing relationship between regression coefficients and ART parameters.

Keywords: ART procedure, blastocyst formation, time-lapse incubator, quadratic model

Procedia PDF Downloads 281
3056 Two-Phase Sampling for Estimating a Finite Population Total in Presence of Missing Values

Authors: Daniel Fundi Murithi

Abstract:

Missing data is a real bane in many surveys. To overcome the problems caused by missing data, partial deletion, and single imputation methods, among others, have been proposed. However, problems such as discarding usable data and inaccuracy in reproducing known population parameters and standard errors are associated with them. For regression and stochastic imputation, it is assumed that there is a variable with complete cases to be used as a predictor in estimating missing values in the other variable, and the relationship between the two variables is linear, which might not be realistic in practice. In this project, we estimate population total in presence of missing values in two-phase sampling. Instead of regression or stochastic models, non-parametric model based regression model is used in imputing missing values. Empirical study showed that nonparametric model-based regression imputation is better in reproducing variance of population total estimate obtained when there were no missing values compared to mean, median, regression, and stochastic imputation methods. Although regression and stochastic imputation were better than nonparametric model-based imputation in reproducing population total estimates obtained when there were no missing values in one of the sample sizes considered, nonparametric model-based imputation may be used when the relationship between outcome and predictor variables is not linear.

Keywords: finite population total, missing data, model-based imputation, two-phase sampling

Procedia PDF Downloads 98
3055 A Novel Approach towards Test Case Prioritization Technique

Authors: Kamna Solanki, Yudhvir Singh, Sandeep Dalal

Abstract:

Software testing is a time and cost intensive process. A scrutiny of the code and rigorous testing is required to identify and rectify the putative bugs. The process of bug identification and its consequent correction is continuous in nature and often some of the bugs are removed after the software has been launched in the market. This process of code validation of the altered software during the maintenance phase is termed as Regression testing. Regression testing ubiquitously considers resource constraints; therefore, the deduction of an appropriate set of test cases, from the ensemble of the entire gamut of test cases, is a critical issue for regression test planning. This paper presents a novel method for designing a suitable prioritization process to optimize fault detection rate and performance of regression test on predefined constraints. The proposed method for test case prioritization m-ACO alters the food source selection criteria of natural ants and is basically a modified version of Ant Colony Optimization (ACO). The proposed m-ACO approach has been coded in 'Perl' language and results are validated using three examples by computation of Average Percentage of Faults Detected (APFD) metric.

Keywords: regression testing, software testing, test case prioritization, test suite optimization

Procedia PDF Downloads 301
3054 Prediction of the Thermodynamic Properties of Hydrocarbons Using Gaussian Process Regression

Authors: N. Alhazmi

Abstract:

Knowing the thermodynamics properties of hydrocarbons is vital when it comes to analyzing the related chemical reaction outcomes and understanding the reaction process, especially in terms of petrochemical industrial applications, combustions, and catalytic reactions. However, measuring the thermodynamics properties experimentally is time-consuming and costly. In this paper, Gaussian process regression (GPR) has been used to directly predict the main thermodynamic properties - standard enthalpy of formation, standard entropy, and heat capacity -for more than 360 cyclic and non-cyclic alkanes, alkenes, and alkynes. A simple workflow has been proposed that can be applied to directly predict the main properties of any hydrocarbon by knowing its descriptors and chemical structure and can be generalized to predict the main properties of any material. The model was evaluated by calculating the statistical error R², which was more than 0.9794 for all the predicted properties.

Keywords: thermodynamic, Gaussian process regression, hydrocarbons, regression, supervised learning, entropy, enthalpy, heat capacity

Procedia PDF Downloads 182
3053 Solving Single Machine Total Weighted Tardiness Problem Using Gaussian Process Regression

Authors: Wanatchapong Kongkaew

Abstract:

This paper proposes an application of probabilistic technique, namely Gaussian process regression, for estimating an optimal sequence of the single machine with total weighted tardiness (SMTWT) scheduling problem. In this work, the Gaussian process regression (GPR) model is utilized to predict an optimal sequence of the SMTWT problem, and its solution is improved by using an iterated local search based on simulated annealing scheme, called GPRISA algorithm. The results show that the proposed GPRISA method achieves a very good performance and a reasonable trade-off between solution quality and time consumption. Moreover, in the comparison of deviation from the best-known solution, the proposed mechanism noticeably outperforms the recently existing approaches.

Keywords: Gaussian process regression, iterated local search, simulated annealing, single machine total weighted tardiness

Procedia PDF Downloads 274
3052 The Profit Trend of Cosmetics Products Using Bootstrap Edgeworth Approximation

Authors: Edlira Donefski, Lorenc Ekonomi, Tina Donefski

Abstract:

Edgeworth approximation is one of the most important statistical methods that has a considered contribution in the reduction of the sum of standard deviation of the independent variables’ coefficients in a Quantile Regression Model. This model estimates the conditional median or other quantiles. In this paper, we have applied approximating statistical methods in an economical problem. We have created and generated a quantile regression model to see how the profit gained is connected with the realized sales of the cosmetic products in a real data, taken from a local business. The Linear Regression of the generated profit and the realized sales was not free of autocorrelation and heteroscedasticity, so this is the reason that we have used this model instead of Linear Regression. Our aim is to analyze in more details the relation between the variables taken into study: the profit and the finalized sales and how to minimize the standard errors of the independent variable involved in this study, the level of realized sales. The statistical methods that we have applied in our work are Edgeworth Approximation for Independent and Identical distributed (IID) cases, Bootstrap version of the Model and the Edgeworth approximation for Bootstrap Quantile Regression Model. The graphics and the results that we have presented here identify the best approximating model of our study.

Keywords: bootstrap, edgeworth approximation, IID, quantile

Procedia PDF Downloads 121
3051 Asymptotic Spectral Theory for Nonlinear Random Fields

Authors: Karima Kimouche

Abstract:

In this paper, we consider the asymptotic problems in spectral analysis of stationary causal random fields. We impose conditions only involving (conditional) moments, which are easily verifiable for a variety of nonlinear random fields. Limiting distributions of periodograms and smoothed periodogram spectral density estimates are obtained and applications to the spectral domain bootstrap are given.

Keywords: spatial nonlinear processes, spectral estimators, GMC condition, bootstrap method

Procedia PDF Downloads 418
3050 Transition From Economic Growth-Energy Use to Green Growth-Green Energy Towards Environmental Quality: Evidence from Africa Using Econometric Approaches

Authors: Jackson Niyongabo

Abstract:

This study addresses a notable gap in the existing literature on the relationship between energy consumption, economic growth, and CO₂ emissions, particularly within the African context. While numerous studies have explored these dynamics globally and regionally across various development levels, few have delved into the nuances of regions and income levels specific to African countries. Furthermore, the evaluation of the interplay between green growth policies, green energy technologies, and their impact on environmental quality has been underexplored. This research aims to fill these gaps by conducting a comprehensive analysis of the transition from conventional economic growth and energy consumption to a paradigm of green growth coupled with green energy utilization across the African continent from 1980 to 2018. The study is structured into three main parts: an empirical examination of the long-term effects of energy intensity, renewable energy consumption, and economic growth on CO₂ emissions across diverse African regions and income levels; an estimation of the long-term impact of green growth and green energy use on CO₂ emissions for countries implementing green policies within Africa, as well as at regional and global levels; and a comparative analysis of the impact of green growth policies on environmental degradation before and after implementation. Employing advanced econometric methods and panel estimators, the study utilizes a testing framework, panel unit tests, and various estimators to derive meaningful insights. The anticipated results and conclusions will be elucidated through causality tests, impulse response, and variance decomposition analyses, contributing valuable knowledge to the discourse on sustainable development in the African context.

Keywords: economic growth, green growth, energy consumption, CO₂ emissions, econometric models, green energy

Procedia PDF Downloads 24
3049 Time Series Regression with Meta-Clusters

Authors: Monika Chuchro

Abstract:

This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain a subgroups of time series data with normal distribution from inflow into waste water treatment plant data which Composed of several groups differing by mean value. Two simple algorithms: K-mean and EM were chosen as a clustering method. The rand index was used to measure the similarity. After simple meta-clustering, regression model was performed for each subgroups. The final model was a sum of subgroups models. The quality of obtained model was compared with the regression model made using the same explanatory variables but with no clustering of data. Results were compared by determination coefficient (R2), measure of prediction accuracy mean absolute percentage error (MAPE) and comparison on linear chart. Preliminary results allows to foresee the potential of the presented technique.

Keywords: clustering, data analysis, data mining, predictive models

Procedia PDF Downloads 430
3048 Economic Analysis of Cowpea (Unguiculata spp) Production in Northern Nigeria: A Case Study of Kano Katsina and Jigawa States

Authors: Yakubu Suleiman, S. A. Musa

Abstract:

Nigeria is the largest cowpea producer in the world, accounting for about 45%, followed by Brazil with about 17%. Cowpea is grown in Kano, Bauchi, Katsina, Borno in the north, Oyo in the west, and to the lesser extent in Enugu in the east. This study was conducted to determine the input–output relationship of Cowpea production in Kano, Katsina, and Jigawa states of Nigeria. The data were collected with the aid of 1000 structured questionnaires that were randomly distributed to Cowpea farmers in the three states mentioned above of the study area. The data collected were analyzed using regression analysis (Cobb–Douglass production function model). The result of the regression analysis revealed the coefficient of multiple determinations, R2, to be 72.5% and the F ration to be 106.20 and was found to be significant (P < 0.01). The regression coefficient of constant is 0.5382 and is significant (P < 0.01). The regression coefficient with respect to labor and seeds were 0.65554 and 0.4336, respectively, and they are highly significant (P < 0.01). The regression coefficient with respect to fertilizer is 0.26341 which is significant (P < 0.05). This implies that a unit increase of any one of the variable inputs used while holding all other variables inputs constants, will significantly increase the total Cowpea output by their corresponding coefficient. This indicated that farmers in the study area are operating in stage II of the production function. The result revealed that Cowpea farmer in Kano, Jigawa and Katsina States realized a profit of N15,997, N34,016 and N19,788 per hectare respectively. It is hereby recommended that more attention should be given to Cowpea production by government and research institutions.

Keywords: coefficient, constant, inputs, regression

Procedia PDF Downloads 381