Search results for: mixture regression model
19692 Robust Shrinkage Principal Component Parameter Estimator for Combating Multicollinearity and Outliers’ Problems in a Poisson Regression Model
Authors: Arum Kingsley Chinedu, Ugwuowo Fidelis Ifeanyi, Oranye Henrietta Ebele
Abstract:
The Poisson regression model (PRM) is a nonlinear model that belongs to the exponential family of distribution. PRM is suitable for studying count variables using appropriate covariates and sometimes experiences the problem of multicollinearity in the explanatory variables and outliers on the response variable. This study aims to address the problem of multicollinearity and outliers jointly in a Poisson regression model. We developed an estimator called the robust modified jackknife PCKL parameter estimator by combining the principal component estimator, modified jackknife KL and transformed M-estimator estimator to address both problems in a PRM. The superiority conditions for this estimator were established, and the properties of the estimator were also derived. The estimator inherits the characteristics of the combined estimators, thereby making it efficient in addressing both problems. And will also be of immediate interest to the research community and advance this study in terms of novelty compared to other studies undertaken in this area. The performance of the estimator (robust modified jackknife PCKL) with other existing estimators was compared using mean squared error (MSE) as a performance evaluation criterion through a Monte Carlo simulation study and the use of real-life data. The results of the analytical study show that the estimator outperformed other existing estimators compared with by having the smallest MSE across all sample sizes, different levels of correlation, percentages of outliers and different numbers of explanatory variables.Keywords: jackknife modified KL, outliers, multicollinearity, principal component, transformed M-estimator.
Procedia PDF Downloads 6619691 Factors Related with Self-Care Behaviors among Iranian Type 2 Diabetic Patients: An Application of Health Belief Model
Authors: Ali Soroush, Mehdi Mirzaei Alavijeh, Touraj Ahmadi Jouybari, Fazel Zinat-Motlagh, Abbas Aghaei, Mari Ataee
Abstract:
Diabetes is a disease with long cardiovascular, renal, ophthalmic and neural complications. It is prevalent all around the world including Iran, and its prevalence is increasing. The aim of this study was to determine the factors related to self-care behavior based on health belief model among sample of Iranian diabetic patients. This cross-sectional study was conducted among 301 type 2 diabetic patients in Gachsaran, Iran. Data collection was based on an interview and the data were analyzed by SPSS version 20 using ANOVA, t-tests, Pearson correlation, and linear regression statistical tests at 95% significant level. Linear regression analyses showed the health belief model variables accounted for 29% of the variation in self-care behavior; and perceived severity and perceived self-efficacy are more influential predictors on self-care behavior among diabetic patients.Keywords: diabetes, patients, self-care behaviors, health belief model
Procedia PDF Downloads 46819690 Copula-Based Estimation of Direct and Indirect Effects in Path Analysis Models
Authors: Alam Ali, Ashok Kumar Pathak
Abstract:
Path analysis is a statistical technique used to evaluate the direct and indirect effects of variables in path models. One or more structural regression equations are used to estimate a series of parameters in path models to find the better fit of data. However, sometimes the assumptions of classical regression models, such as ordinary least squares (OLS), are violated by the nature of the data, resulting in insignificant direct and indirect effects of exogenous variables. This article aims to explore the effectiveness of a copula-based regression approach as an alternative to classical regression, specifically when variables are linked through an elliptical copula.Keywords: path analysis, copula-based regression models, direct and indirect effects, k-fold cross validation technique
Procedia PDF Downloads 4119689 Factors for Entry Timing Choices Using Principal Axis Factorial Analysis and Logistic Regression Model
Authors: C. M. Mat Isa, H. Mohd Saman, S. R. Mohd Nasir, A. Jaapar
Abstract:
International market expansion involves a strategic process of market entry decision through which a firm expands its operation from domestic to the international domain. Hence, entry timing choices require the needs to balance the early entry risks and the problems in losing opportunities as a result of late entry into a new market. Questionnaire surveys administered to 115 Malaysian construction firms operating in 51 countries worldwide have resulted in 39.1 percent response rate. Factor analysis was used to determine the most significant factors affecting entry timing choices of the firms to penetrate the international market. A logistic regression analysis used to examine the firms’ entry timing choices, indicates that the model has correctly classified 89.5 per cent of cases as late movers. The findings reveal that the most significant factor influencing the construction firms’ choices as late movers was the firm factor related to the firm’s international experience, resources, competencies and financing capacity. The study also offers valuable information to construction firms with intention to internationalize their businesses.Keywords: factors, early movers, entry timing choices, late movers, logistic regression model, principal axis factorial analysis, Malaysian construction firms
Procedia PDF Downloads 37619688 A Regression Model for Predicting Sugar Crystal Size in a Fed-Batch Vacuum Evaporative Crystallizer
Authors: Sunday B. Alabi, Edikan P. Felix, Aniediong M. Umo
Abstract:
Crystal size distribution is of great importance in the sugar factories. It determines the market value of granulated sugar and also influences the cost of production of sugar crystals. Typically, sugar is produced using fed-batch vacuum evaporative crystallizer. The crystallization quality is examined by crystal size distribution at the end of the process which is quantified by two parameters: the average crystal size of the distribution in the mean aperture (MA) and the width of the distribution of the coefficient of variation (CV). Lack of real-time measurement of the sugar crystal size hinders its feedback control and eventual optimisation of the crystallization process. An attractive alternative is to use a soft sensor (model-based method) for online estimation of the sugar crystal size. Unfortunately, the available models for sugar crystallization process are not suitable as they do not contain variables that can be measured easily online. The main contribution of this paper is the development of a regression model for estimating the sugar crystal size as a function of input variables which are easy to measure online. This has the potential to provide real-time estimates of crystal size for its effective feedback control. Using 7 input variables namely: initial crystal size (Lo), temperature (T), vacuum pressure (P), feed flowrate (Ff), steam flowrate (Fs), initial super-saturation (S0) and crystallization time (t), preliminary studies were carried out using Minitab 14 statistical software. Based on the existing sugar crystallizer models, and the typical ranges of these 7 input variables, 128 datasets were obtained from a 2-level factorial experimental design. These datasets were used to obtain a simple but online-implementable 6-input crystal size model. It seems the initial crystal size (Lₒ) does not play a significant role. The goodness of the resulting regression model was evaluated. The coefficient of determination, R² was obtained as 0.994, and the maximum absolute relative error (MARE) was obtained as 4.6%. The high R² (~1.0) and the reasonably low MARE values are an indication that the model is able to predict sugar crystal size accurately as a function of the 6 easy-to-measure online variables. Thus, the model can be used as a soft sensor to provide real-time estimates of sugar crystal size during sugar crystallization process in a fed-batch vacuum evaporative crystallizer.Keywords: crystal size, regression model, soft sensor, sugar, vacuum evaporative crystallizer
Procedia PDF Downloads 20819687 Economic Analysis of Cowpea (Unguiculata spp) Production in Northern Nigeria: A Case Study of Kano Katsina and Jigawa States
Authors: Yakubu Suleiman, S. A. Musa
Abstract:
Nigeria is the largest cowpea producer in the world, accounting for about 45%, followed by Brazil with about 17%. Cowpea is grown in Kano, Bauchi, Katsina, Borno in the north, Oyo in the west, and to the lesser extent in Enugu in the east. This study was conducted to determine the input–output relationship of Cowpea production in Kano, Katsina, and Jigawa states of Nigeria. The data were collected with the aid of 1000 structured questionnaires that were randomly distributed to Cowpea farmers in the three states mentioned above of the study area. The data collected were analyzed using regression analysis (Cobb–Douglass production function model). The result of the regression analysis revealed the coefficient of multiple determinations, R2, to be 72.5% and the F ration to be 106.20 and was found to be significant (P < 0.01). The regression coefficient of constant is 0.5382 and is significant (P < 0.01). The regression coefficient with respect to labor and seeds were 0.65554 and 0.4336, respectively, and they are highly significant (P < 0.01). The regression coefficient with respect to fertilizer is 0.26341 which is significant (P < 0.05). This implies that a unit increase of any one of the variable inputs used while holding all other variables inputs constants, will significantly increase the total Cowpea output by their corresponding coefficient. This indicated that farmers in the study area are operating in stage II of the production function. The result revealed that Cowpea farmer in Kano, Jigawa and Katsina States realized a profit of N15,997, N34,016 and N19,788 per hectare respectively. It is hereby recommended that more attention should be given to Cowpea production by government and research institutions.Keywords: coefficient, constant, inputs, regression
Procedia PDF Downloads 40919686 Climate Changes in Albania and Their Effect on Cereal Yield
Authors: Lule Basha, Eralda Gjika
Abstract:
This study is focused on analyzing climate change in Albania and its potential effects on cereal yields. Initially, monthly temperature and rainfalls in Albania were studied for the period 1960-2021. Climacteric variables are important variables when trying to model cereal yield behavior, especially when significant changes in weather conditions are observed. For this purpose, in the second part of the study, linear and nonlinear models explaining cereal yield are constructed for the same period, 1960-2021. The multiple linear regression analysis and lasso regression method are applied to the data between cereal yield and each independent variable: average temperature, average rainfall, fertilizer consumption, arable land, land under cereal production, and nitrous oxide emissions. In our regression model, heteroscedasticity is not observed, data follow a normal distribution, and there is a low correlation between factors, so we do not have the problem of multicollinearity. Machine-learning methods, such as random forest, are used to predict cereal yield responses to climacteric and other variables. Random Forest showed high accuracy compared to the other statistical models in the prediction of cereal yield. We found that changes in average temperature negatively affect cereal yield. The coefficients of fertilizer consumption, arable land, and land under cereal production are positively affecting production. Our results show that the Random Forest method is an effective and versatile machine-learning method for cereal yield prediction compared to the other two methods.Keywords: cereal yield, climate change, machine learning, multiple regression model, random forest
Procedia PDF Downloads 9119685 Influence of Thermal History on the Undrained Shear Strength of the Bentonite-Sand Mixture
Authors: K. Ravi, Sabu Subhash
Abstract:
Densely compacted bentonite or bentonite–sand mixture has been identified as a suitable buffer in the deep geological repository (DGR) for the safe disposal of high-level nuclear waste (HLW) due to its favourable physicochemical and hydro-mechanical properties. The addition of sand to the bentonite enhances the thermal conductivity and compaction properties and reduces the drying shrinkage of the buffer material. The buffer material may undergo cyclic wetting and drying upon ingress of groundwater from the surrounding rock mass and from evaporation due to high temperature (50–210 °C) derived from the waste canister. The cycles of changes in temperature may result in thermal history, and the hydro-mechanical properties of the buffer material may be affected. This paper examines the influence of thermal history on the undrained shear strength of bentonite and bentonite-sand mixture. Bentonite from Rajasthan state and sand from the Assam state of India are used in this study. The undrained shear strength values are obtained by conducting unconfined compressive strength (UCS) tests on cylindrical specimens (dry densities 1.30 and 1.5 Mg/m3) of bentonite and bentonite-sand mixture consisting of 30 % bentonite+ 70 % sand. The specimens are preheated at temperatures varying from 50-150 °C for one, two and four hours in hot air oven. The results indicate that the undrained shear strength is increased by the thermal history of the buffer material. The specimens of bentonite-sand mixture exhibited more increase in strength compared to the pure bentonite specimens. This indicates that the sand content of the mixture plays a vital role in taking the thermal stresses of the bentonite buffer in DGR conditions.Keywords: bentonite, deep geological repository, thermal history, undrained shear strength
Procedia PDF Downloads 34519684 Fuzzy Logic Classification Approach for Exponential Data Set in Health Care System for Predication of Future Data
Authors: Manish Pandey, Gurinderjit Kaur, Meenu Talwar, Sachin Chauhan, Jagbir Gill
Abstract:
Health-care management systems are a unit of nice connection as a result of the supply a straightforward and fast management of all aspects relating to a patient, not essentially medical. What is more, there are unit additional and additional cases of pathologies during which diagnosing and treatment may be solely allotted by victimization medical imaging techniques. With associate ever-increasing prevalence, medical pictures area unit directly acquired in or regenerate into digital type, for his or her storage additionally as sequent retrieval and process. Data Mining is the process of extracting information from large data sets through using algorithms and Techniques drawn from the field of Statistics, Machine Learning and Data Base Management Systems. Forecasting may be a prediction of what's going to occur within the future, associated it's an unsure method. Owing to the uncertainty, the accuracy of a forecast is as vital because the outcome foretold by foretelling the freelance variables. A forecast management should be wont to establish if the accuracy of the forecast is within satisfactory limits. Fuzzy regression strategies have normally been wont to develop shopper preferences models that correlate the engineering characteristics with shopper preferences relating to a replacement product; the patron preference models offer a platform, wherever by product developers will decide the engineering characteristics so as to satisfy shopper preferences before developing the merchandise. Recent analysis shows that these fuzzy regression strategies area units normally will not to model client preferences. We tend to propose a Testing the strength of Exponential Regression Model over regression toward the mean Model.Keywords: health-care management systems, fuzzy regression, data mining, forecasting, fuzzy membership function
Procedia PDF Downloads 27919683 A Model for Diagnosis and Prediction of Coronavirus Using Neural Network
Authors: Sajjad Baghernezhad
Abstract:
Meta-heuristic and hybrid algorithms have high adeer in modeling medical problems. In this study, a neural network was used to predict covid-19 among high-risk and low-risk patients. This study was conducted to collect the applied method and its target population consisting of 550 high-risk and low-risk patients from the Kerman University of medical sciences medical center to predict the coronavirus. In this study, the memetic algorithm, which is a combination of a genetic algorithm and a local search algorithm, has been used to update the weights of the neural network and develop the accuracy of the neural network. The initial study showed that the accuracy of the neural network was 88%. After updating the weights, the memetic algorithm increased by 93%. For the proposed model, sensitivity, specificity, positive predictivity value, value/accuracy to 97.4, 92.3, 95.8, 96.2, and 0.918, respectively; for the genetic algorithm model, 87.05, 9.20 7, 89.45, 97.30 and 0.967 and for logistic regression model were 87.40, 95.20, 93.79, 0.87 and 0.916. Based on the findings of this study, neural network models have a lower error rate in the diagnosis of patients based on individual variables and vital signs compared to the regression model. The findings of this study can help planners and health care providers in signing programs and early diagnosis of COVID-19 or Corona.Keywords: COVID-19, decision support technique, neural network, genetic algorithm, memetic algorithm
Procedia PDF Downloads 6619682 Oral Toxicity of Low Doses of Fungicides, Propinebe, Propiconazole and Their Mixtures in the Male Rat
Authors: Mallem Leila, Aiche Mohamed Amine, Boulakoud Mohamed Salah
Abstract:
A number of chemical compounds are being used to protect agricultural crops from diseases. Residues of these chemicals lead to environmental pollution and pose some threat to non target organisms, human and animal. The aim of this study is to detect the toxicity of these fungicides and their mixtures in the fertility and biochemical’s parameters in the rat. The male of rats (28) were used, they were divided in four groups (7 rats of each group) and one group was used as control. Rats were dosed orally with propiconazole (60 mg/kg body weight/day), propinebe (100 mg/Kg body weight/day) and their mixture (50:50) for 4 weeks. Animals were observed for clinical toxicity. At the end of treatment period, animals of all groups were scarified and samples of different organs were fixed in the formol 10% for histopathological study, and blood was collected for hematological and biochemical’s analysis. The results indicated that the fungicide and their mixture of fungicides were toxic in the treated animals. The semen study showed a decrease in the count, mobility and speed of spermatozoa in all treated group especially those dosed with the mixture and Propiconazole, it was also a decrease in the weight of the testis and epidydimis in the treated group as compared with control. Remarquable histological changes were observed in the testis and epidydimis and liver in the group treated with mixture.Keywords: fungicides, mixture, fertility, hematological, biochemical's parameters
Procedia PDF Downloads 57119681 Analysis of Spatial Heterogeneity of Residential Prices in Guangzhou: An Actual Study Based on Point of Interest Geographically Weighted Regression Model
Authors: Zichun Guo
Abstract:
Guangzhou's house price has long been lower than the other three major cities; with the gradual increase in Guangzhou's house price, the influencing factors of house price have gradually been paid attention to; this paper tries to use house price data and POI (Point of Interest) data, and explores the distribution of house price and influencing factors by applying the Kriging spatial interpolation method and geographically weighted regression model in ArcGIS. The results show that the interpolation result of house price has a significant relationship with the economic development and development potential of the region and that different POI types have different impacts on the growth of house prices in different regions.Keywords: POI, house price, spatial heterogeneity, Guangzhou
Procedia PDF Downloads 5519680 The Effects of the Waste Plastic Modification of the Asphalt Mixture on the Permanent Deformation
Authors: Soheil Heydari, Ailar Hajimohammadi, Nasser Khalili
Abstract:
The application of plastic waste for asphalt modification is a sustainable strategy to deal with the enormous plastic waste generated each year and enhance the properties of asphalt. The modification is either practiced by the dry process or the wet process. In the dry process, plastics are added straight into the asphalt mixture, and in the wet process, they are mixed and digested into bitumen. In this article, the effects of plastic inclusion in asphalt mixture, through the dry process, on the permanent deformation of the asphalt are investigated. The main waste plastics that are usually used in asphalt modification are taken into account, which is linear, low-density polyethylene, low-density polyethylene, high-density polyethylene, and polypropylene. Also, to simulate a plastic waste stream, different grades of each virgin plastic are mixed and used. For instance, four different grades of polypropylene are mixed and used as representative of polypropylene. A precisely designed mixing condition is considered to dry-mix the plastics into the mixture such that the polymer was melted and modified by the later introduced binder. In this mixing process, plastics are first added to the hot aggregates and mixed three times in different time intervals, then bitumen is introduced, and the whole mixture is mixed three times in fifteen minutes intervals. Marshall specimens were manufactured, and dynamic creep tests were conducted to evaluate the effects of modification on the permanent deformation of the asphalt mixture. Dynamic creep is a common repeated loading test conducted at different stress levels and temperatures. Loading cycles are applied to the AC specimen until failure occurs; with the amount of deformation constantly recorded, the cumulative, permanent strain is determined and reported as a function of the number of cycles. The results of this study showed that the dry inclusion of the waste plastics is very effective in enhancing the resistance against permanent deformation of the mixture. However, the mixing process must be precisely engineered to melt the plastics, and a homogenous mixture is achieved.Keywords: permanent deformation, waste plastics, low-density polyethene, high-density polyethene, polypropylene, linear low-density polyethene, dry process
Procedia PDF Downloads 8819679 An Epsilon Hierarchical Fuzzy Twin Support Vector Regression
Authors: Arindam Chaudhuri
Abstract:
The research presents epsilon- hierarchical fuzzy twin support vector regression (epsilon-HFTSVR) based on epsilon-fuzzy twin support vector regression (epsilon-FTSVR) and epsilon-twin support vector regression (epsilon-TSVR). Epsilon-FTSVR is achieved by incorporating trapezoidal fuzzy numbers to epsilon-TSVR which takes care of uncertainty existing in forecasting problems. Epsilon-FTSVR determines a pair of epsilon-insensitive proximal functions by solving two related quadratic programming problems. The structural risk minimization principle is implemented by introducing regularization term in primal problems of epsilon-FTSVR. This yields dual stable positive definite problems which improves regression performance. Epsilon-FTSVR is then reformulated as epsilon-HFTSVR consisting of a set of hierarchical layers each containing epsilon-FTSVR. Experimental results on both synthetic and real datasets reveal that epsilon-HFTSVR has remarkable generalization performance with minimum training time.Keywords: regression, epsilon-TSVR, epsilon-FTSVR, epsilon-HFTSVR
Procedia PDF Downloads 37519678 Minimizing the Impact of Covariate Detection Limit in Logistic Regression
Authors: Shahadut Hossain, Jacek Wesolowski, Zahirul Hoque
Abstract:
In many epidemiological and environmental studies covariate measurements are subject to the detection limit. In most applications, covariate measurements are usually truncated from below which is known as left-truncation. Because the measuring device, which we use to measure the covariate, fails to detect values falling below the certain threshold. In regression analyses, it causes inflated bias and inaccurate mean squared error (MSE) to the estimators. This paper suggests a response-based regression calibration method to correct the deleterious impact introduced by the covariate detection limit in the estimators of the parameters of simple logistic regression model. Compared to the maximum likelihood method, the proposed method is computationally simpler, and hence easier to implement. It is robust to the violation of distributional assumption about the covariate of interest. In producing correct inference, the performance of the proposed method compared to the other competing methods has been investigated through extensive simulations. A real-life application of the method is also shown using data from a population-based case-control study of non-Hodgkin lymphoma.Keywords: environmental exposure, detection limit, left truncation, bias, ad-hoc substitution
Procedia PDF Downloads 23619677 Non-Methane Hydrocarbons Emission during the Photocopying Process
Authors: Kiurski S. Jelena, Aksentijević M. Snežana, Kecić S. Vesna, Oros B. Ivana
Abstract:
The prosperity of electronic equipment in photocopying environment not only has improved work efficiency, but also has changed indoor air quality. Considering the number of photocopying employed, indoor air quality might be worse than in general office environments. Determining the contribution from any type of equipment to indoor air pollution is a complex matter. Non-methane hydrocarbons are known to have an important role of air quality due to their high reactivity. The presence of hazardous pollutants in indoor air has been detected in one photocopying shop in Novi Sad, Serbia. Air samples were collected and analyzed for five days, during 8-hr working time in three-time intervals, whereas three different sampling points were determined. Using multiple linear regression model and software package STATISTICA 10 the concentrations of occupational hazards and micro-climates parameters were mutually correlated. Based on the obtained multiple coefficients of determination (0.3751, 0.2389, and 0.1975), a weak positive correlation between the observed variables was determined. Small values of parameter F indicated that there was no statistically significant difference between the concentration levels of non-methane hydrocarbons and micro-climates parameters. The results showed that variable could be presented by the general regression model: y = b0 + b1xi1+ b2xi2. Obtained regression equations allow to measure the quantitative agreement between the variation of variables and thus obtain more accurate knowledge of their mutual relations.Keywords: non-methane hydrocarbons, photocopying process, multiple regression analysis, indoor air quality, pollutant emission
Procedia PDF Downloads 37819676 Free Fatty Acid Assessment of Crude Palm Oil Using a Non-Destructive Approach
Authors: Siti Nurhidayah Naqiah Abdull Rani, Herlina Abdul Rahim, Rashidah Ghazali, Noramli Abdul Razak
Abstract:
Near infrared (NIR) spectroscopy has always been of great interest in the food and agriculture industries. The development of prediction models has facilitated the estimation process in recent years. In this study, 110 crude palm oil (CPO) samples were used to build a free fatty acid (FFA) prediction model. 60% of the collected data were used for training purposes and the remaining 40% used for testing. The visible peaks on the NIR spectrum were at 1725 nm and 1760 nm, indicating the existence of the first overtone of C-H bands. Principal component regression (PCR) was applied to the data in order to build this mathematical prediction model. The optimal number of principal components was 10. The results showed R2=0.7147 for the training set and R2=0.6404 for the testing set.Keywords: palm oil, fatty acid, NIRS, regression
Procedia PDF Downloads 50619675 Multicollinearity and MRA in Sustainability: Application of the Raise Regression
Authors: Claudia García-García, Catalina B. García-García, Román Salmerón-Gómez
Abstract:
Much economic-environmental research includes the analysis of possible interactions by using Moderated Regression Analysis (MRA), which is a specific application of multiple linear regression analysis. This methodology allows analyzing how the effect of one of the independent variables is moderated by a second independent variable by adding a cross-product term between them as an additional explanatory variable. Due to the very specification of the methodology, the moderated factor is often highly correlated with the constitutive terms. Thus, great multicollinearity problems arise. The appearance of strong multicollinearity in a model has important consequences. Inflated variances of the estimators may appear, there is a tendency to consider non-significant regressors that they probably are together with a very high coefficient of determination, incorrect signs of our coefficients may appear and also the high sensibility of the results to small changes in the dataset. Finally, the high relationship among explanatory variables implies difficulties in fixing the individual effects of each one on the model under study. These consequences shifted to the moderated analysis may imply that it is not worth including an interaction term that may be distorting the model. Thus, it is important to manage the problem with some methodology that allows for obtaining reliable results. After a review of those works that applied the MRA among the ten top journals of the field, it is clear that multicollinearity is mostly disregarded. Less than 15% of the reviewed works take into account potential multicollinearity problems. To overcome the issue, this work studies the possible application of recent methodologies to MRA. Particularly, the raised regression is analyzed. This methodology mitigates collinearity from a geometrical point of view: the collinearity problem arises because the variables under study are very close geometrically, so by separating both variables, the problem can be mitigated. Raise regression maintains the available information and modifies the problematic variables instead of deleting variables, for example. Furthermore, the global characteristics of the initial model are also maintained (sum of squared residuals, estimated variance, coefficient of determination, global significance test and prediction). The proposal is implemented to data from countries of the European Union during the last year available regarding greenhouse gas emissions, per capita GDP and a dummy variable that represents the topography of the country. The use of a dummy variable as the moderator is a special variant of MRA, sometimes called “subgroup regression analysis.” The main conclusion of this work is that applying new techniques to the field can improve in a substantial way the results of the analysis. Particularly, the use of raised regression mitigates great multicollinearity problems, so the researcher is able to rely on the interaction term when interpreting the results of a particular study.Keywords: multicollinearity, MRA, interaction, raise
Procedia PDF Downloads 10419674 Using the Bootstrap for Problems Statistics
Authors: Brahim Boukabcha, Amar Rebbouh
Abstract:
The bootstrap method based on the idea of exploiting all the information provided by the initial sample, allows us to study the properties of estimators. In this article we will present a theoretical study on the different methods of bootstrapping and using the technique of re-sampling in statistics inference to calculate the standard error of means of an estimator and determining a confidence interval for an estimated parameter. We apply these methods tested in the regression models and Pareto model, giving the best approximations.Keywords: bootstrap, error standard, bias, jackknife, mean, median, variance, confidence interval, regression models
Procedia PDF Downloads 38019673 A Machine Learning Model for Predicting Students’ Academic Performance in Higher Institutions
Authors: Emmanuel Osaze Oshoiribhor, Adetokunbo MacGregor John-Otumu
Abstract:
There has been a need in recent years to predict student academic achievement prior to graduation. This is to assist them in improving their grades, especially for those who have struggled in the past. The purpose of this research is to use supervised learning techniques to create a model that predicts student academic progress. Many scholars have developed models that predict student academic achievement based on characteristics including smoking, demography, culture, social media, parent educational background, parent finances, and family background, to mention a few. This element, as well as the model used, could have misclassified the kids in terms of their academic achievement. As a prerequisite to predicting if the student will perform well in the future on related courses, this model is built using a logistic regression classifier with basic features such as the previous semester's course score, attendance to class, class participation, and the total number of course materials or resources the student is able to cover per semester. With a 96.7 percent accuracy, the model outperformed other classifiers such as Naive bayes, Support vector machine (SVM), Decision Tree, Random forest, and Adaboost. This model is offered as a desktop application with user-friendly interfaces for forecasting student academic progress for both teachers and students. As a result, both students and professors are encouraged to use this technique to predict outcomes better.Keywords: artificial intelligence, ML, logistic regression, performance, prediction
Procedia PDF Downloads 10919672 Binary Logistic Regression Model in Predicting the Employability of Senior High School Graduates
Authors: Cromwell F. Gopo, Joy L. Picar
Abstract:
This study aimed to predict the employability of senior high school graduates for S.Y. 2018- 2019 in the Davao del Norte Division through quantitative research design using the descriptive status and predictive approaches among the indicated parameters, namely gender, school type, academics, academic award recipient, skills, values, and strand. The respondents of the study were the 33 secondary schools offering senior high school programs identified through simple random sampling, which resulted in 1,530 cases of graduates’ secondary data, which were analyzed using frequency, percentage, mean, standard deviation, and binary logistic regression. Results showed that the majority of the senior high school graduates who come from large schools were females. Further, less than half of these graduates received any academic award in any semester. In general, the graduates’ performance in academics, skills, and values were proficient. Moreover, less than half of the graduates were not employed. Then, those who were employed were either contractual, casual, or part-time workers dominated by GAS graduates. Further, the predictors of employability were gender and the Information and Communications Technology (ICT) strand, while the remaining variables did not add significantly to the model. The null hypothesis had been rejected as the coefficients of the predictors in the binary logistic regression equation did not take the value of 0. After utilizing the model, it was concluded that Technical-Vocational-Livelihood (TVL) graduates except ICT had greater estimates of employability.Keywords: employability, senior high school graduates, Davao del Norte, Philippines
Procedia PDF Downloads 15219671 Model Averaging in a Multiplicative Heteroscedastic Model
Authors: Alan Wan
Abstract:
In recent years, the body of literature on frequentist model averaging in statistics has grown significantly. Most of this work focuses on models with different mean structures but leaves out the variance consideration. In this paper, we consider a regression model with multiplicative heteroscedasticity and develop a model averaging method that combines maximum likelihood estimators of unknown parameters in both the mean and variance functions of the model. Our weight choice criterion is based on a minimisation of a plug-in estimator of the model average estimator's squared prediction risk. We prove that the new estimator possesses an asymptotic optimality property. Our investigation of finite-sample performance by simulations demonstrates that the new estimator frequently exhibits very favourable properties compared to some existing heteroscedasticity-robust model average estimators. The model averaging method hedges against the selection of very bad models and serves as a remedy to variance function misspecification, which often discourages practitioners from modeling heteroscedasticity altogether. The proposed model average estimator is applied to the analysis of two real data sets.Keywords: heteroscedasticity-robust, model averaging, multiplicative heteroscedasticity, plug-in, squared prediction risk
Procedia PDF Downloads 38419670 Poverty Dynamics in Thailand: Evidence from Household Panel Data
Authors: Nattabhorn Leamcharaskul
Abstract:
This study aims to examine determining factors of the dynamics of poverty in Thailand by using panel data of 3,567 households in 2007-2017. Four techniques of estimation are employed to analyze the situation of poverty across households and time periods: the multinomial logit model, the sequential logit model, the quantile regression model, and the difference in difference model. Households are categorized based on their experiences into 5 groups, namely chronically poor, falling into poverty, re-entering into poverty, exiting from poverty and never poor households. Estimation results emphasize the effects of demographic and socioeconomic factors as well as unexpected events on the economic status of a household. It is found that remittances have positive impact on household’s economic status in that they are likely to lower the probability of falling into poverty or trapping in poverty while they tend to increase the probability of exiting from poverty. In addition, not only receiving a secondary source of household income can raise the probability of being a never poor household, but it also significantly increases household income per capita of the chronically poor and falling into poverty households. Public work programs are recommended as an important tool to relieve household financial burden and uncertainty and thus consequently increase a chance for households to escape from poverty.Keywords: difference in difference, dynamic, multinomial logit model, panel data, poverty, quantile regression, remittance, sequential logit model, Thailand, transfer
Procedia PDF Downloads 11219669 BART Matching Method: Using Bayesian Additive Regression Tree for Data Matching
Authors: Gianna Zou
Abstract:
Propensity score matching (PSM), introduced by Paul R. Rosenbaum and Donald Rubin in 1983, is a popular statistical matching technique which tries to estimate the treatment effects by taking into account covariates that could impact the efficacy of study medication in clinical trials. PSM can be used to reduce the bias due to confounding variables. However, PSM assumes that the response values are normally distributed. In some cases, this assumption may not be held. In this paper, a machine learning method - Bayesian Additive Regression Tree (BART), is used as a more robust method of matching. BART can work well when models are misspecified since it can be used to model heterogeneous treatment effects. Moreover, it has the capability to handle non-linear main effects and multiway interactions. In this research, a BART Matching Method (BMM) is proposed to provide a more reliable matching method over PSM. By comparing the analysis results from PSM and BMM, BMM can perform well and has better prediction capability when the response values are not normally distributed.Keywords: BART, Bayesian, matching, regression
Procedia PDF Downloads 14719668 Determining the Factors Affecting Social Media Addiction (Virtual Tolerance, Virtual Communication), Phubbing, and Perception of Addiction in Nurses
Authors: Fatima Zehra Allahverdi, Nukhet Bayer
Abstract:
Objective: Three questions were formulated to examine stressful working units (intensive care units, emergency unit nurses) utilizing the self-perception theory and social support theory. This study provides a distinctive input by inspecting the combination of variables regarding stressful working environments. Method: The descriptive research was conducted with the participation of 400 nurses working at Ankara City Hospital. The study used Multivariate Analysis of Variance (MANOVA), regression analysis, and a mediation model. Hypothesis one used MANOVA followed by a Scheffe post hoc test. Hypothesis two utilized regression analysis using a hierarchical linear regression model. Hypothesis three used a mediation model. Result: The study utilized mediation analyses. Findings supported the hypotheses that intensive care units have significantly high scores in virtual communication and virtual tolerance. The number of years on the job, virtual communication, virtual tolerance, and phubbing significantly predicted 51% of the variance of perception of addiction. Interestingly, the number of years on the job, while significant, was negatively related to perception of addiction. Conclusion: The reasoning behind these findings and the lack of significance in the emergency unit is discussed. Around 7% of the variance of phubbing was accounted for through working in intensive care units. The model accounted for 26.80 % of the differences in the perception of addiction.Keywords: phubbing, social media, working units, years on the job, stress
Procedia PDF Downloads 5319667 Laboratory Evaluation of Rutting and Fatigue Damage Resistance of Asphalt Mixtures Modified with Carbon Nano Tubes
Authors: Ali Zain Ul Abadeen, Arshad Hussain
Abstract:
Roads are considered as the national capital, and huge developmental budget is spent on its construction, maintenance, and rehabilitation. Due to proliferating traffic volume, heavier loads and challenging environmental factors, the need for high-performance asphalt pavement is increased. In this research, the asphalt mixture was modified with carbon nanotubes ranging from 0.2% to 2% of binder to study the effect of CNT modification on rutting potential and fatigue life of asphalt mixtures. During this study, the conventional and modified asphalt mixture was subjected to a uni-axial dynamic creep test and dry Hamburg wheel tracking test to study rutting resistance. Fatigue behavior of asphalt mixture was studied using a four-point bending test apparatus. The plateau value of asphalt mixture was taken as a measure of fatigue performance according to the ratio of dissipated energy approach. Results of these experiments showed that CNT modified asphalt mixtures had reduced rut depth and increased rutting and fatigue resistance at higher percentages of carbon nanotubes.Keywords: carbon nanotubes, fatigue, four point bending test, modified asphalt, rutting
Procedia PDF Downloads 14419666 Influence of the Granular Mixture Properties on the Rheological Properties of Concrete: Yield Stress Determination Using Modified Chateau et al. Model
Authors: Rachid Zentar, Mokrane Bala, Pascal Boustingorry
Abstract:
The prediction of the rheological behavior of concrete is at the center of current concerns of the concrete industry for different reasons. The shortage of good quality standard materials combined with variable properties of available materials imposes to improve existing models to take into account these variations at the design stage of concrete. The main reasons for improving the predictive models are, of course, saving time and cost at the design stage as well as to optimize concrete performances. In this study, we will highlight the different properties of the granular mixtures that affect the rheological properties of concrete. Our objective is to identify the intrinsic parameters of the aggregates which make it possible to predict the yield stress of concrete. The work was done using two typologies of grains: crushed and rolled aggregates. The experimental results have shown that the rheology of concrete is improved by increasing the packing density of the granular mixture using rolled aggregates. The experimental program realized allowed to model the yield stress of concrete by a modified model of Chateau et al. through a dimensionless parameter following Krieger-Dougherty law. The modelling confirms that the yield stress of concrete depends not only on the properties of cement paste but also on the packing density of the granular skeleton and the shape of grains.Keywords: crushed aggregates, intrinsic viscosity, packing density, rolled aggregates, slump, yield stress of concrete
Procedia PDF Downloads 12619665 Ground Motion Modeling Using the Least Absolute Shrinkage and Selection Operator
Authors: Yildiz Stella Dak, Jale Tezcan
Abstract:
Ground motion models that relate a strong motion parameter of interest to a set of predictive seismological variables describing the earthquake source, the propagation path of the seismic wave, and the local site conditions constitute a critical component of seismic hazard analyses. When a sufficient number of strong motion records are available, ground motion relations are developed using statistical analysis of the recorded ground motion data. In regions lacking a sufficient number of recordings, a synthetic database is developed using stochastic, theoretical or hybrid approaches. Regardless of the manner the database was developed, ground motion relations are developed using regression analysis. Development of a ground motion relation is a challenging process which inevitably requires the modeler to make subjective decisions regarding the inclusion criteria of the recordings, the functional form of the model and the set of seismological variables to be included in the model. Because these decisions are critically important to the validity and the applicability of the model, there is a continuous interest on procedures that will facilitate the development of ground motion models. This paper proposes the use of the Least Absolute Shrinkage and Selection Operator (LASSO) in selecting the set predictive seismological variables to be used in developing a ground motion relation. The LASSO can be described as a penalized regression technique with a built-in capability of variable selection. Similar to the ridge regression, the LASSO is based on the idea of shrinking the regression coefficients to reduce the variance of the model. Unlike ridge regression, where the coefficients are shrunk but never set equal to zero, the LASSO sets some of the coefficients exactly to zero, effectively performing variable selection. Given a set of candidate input variables and the output variable of interest, LASSO allows ranking the input variables in terms of their relative importance, thereby facilitating the selection of the set of variables to be included in the model. Because the risk of overfitting increases as the ratio of the number of predictors to the number of recordings increases, selection of a compact set of variables is important in cases where a small number of recordings are available. In addition, identification of a small set of variables can improve the interpretability of the resulting model, especially when there is a large number of candidate predictors. A practical application of the proposed approach is presented, using more than 600 recordings from the National Geospatial-Intelligence Agency (NGA) database, where the effect of a set of seismological predictors on the 5% damped maximum direction spectral acceleration is investigated. The set of candidate predictors considered are Magnitude, Rrup, Vs30. Using LASSO, the relative importance of the candidate predictors has been ranked. Regression models with increasing levels of complexity were constructed using one, two, three, and four best predictors, and the models’ ability to explain the observed variance in the target variable have been compared. The bias-variance trade-off in the context of model selection is discussed.Keywords: ground motion modeling, least absolute shrinkage and selection operator, penalized regression, variable selection
Procedia PDF Downloads 33019664 Econometric Analysis of West African Countries’ Container Terminal Throughput and Gross Domestic Products
Authors: Kehinde Peter Oyeduntan, Kayode Oshinubi
Abstract:
The west African ports have been experiencing large inflow and outflow of containerized cargo in the last decades, and this has created a quest amongst the countries to attain the status of hub port for the sub-region. This study analyzed the relationship between the container throughput and Gross Domestic Products (GDP) of nine west African countries, using Simple Linear Regression (SLR), Polynomial Regression Model (PRM) and Support Vector Machines (SVM) with a time series of 20 years. The results showed that there exists a high correlation between the GDP and container throughput. The model also predicted the container throughput in west Africa for the next 20 years. The findings and recommendations presented in this research will guide policy makers and help improve the management of container ports and terminals in west Africa, thereby boosting the economy.Keywords: container, ports, terminals, throughput
Procedia PDF Downloads 21419663 A Survey on Quasi-Likelihood Estimation Approaches for Longitudinal Set-ups
Authors: Naushad Mamode Khan
Abstract:
The Com-Poisson (CMP) model is one of the most popular discrete generalized linear models (GLMS) that handles both equi-, over- and under-dispersed data. In longitudinal context, an integer-valued autoregressive (INAR(1)) process that incorporates covariate specification has been developed to model longitudinal CMP counts. However, the joint likelihood CMP function is difficult to specify and thus restricts the likelihood based estimating methodology. The joint generalized quasilikelihood approach (GQL-I) was instead considered but is rather computationally intensive and may not even estimate the regression effects due to a complex and frequently ill conditioned covariance structure. This paper proposes a new GQL approach for estimating the regression parameters (GQLIII) that are based on a single score vector representation. The performance of GQL-III is compared with GQL-I and separate marginal GQLs (GQL-II) through some simulation experiments and is proved to yield equally efficient estimates as GQL-I and is far more computationally stable.Keywords: longitudinal, com-Poisson, ill-conditioned, INAR(1), GLMS, GQL
Procedia PDF Downloads 354