Search results for: poisson regression model
18695 Statistical Analysis with Prediction Models of User Satisfaction in Software Project Factors
Authors: Katawut Kaewbanjong
Abstract:
We analyzed a volume of data and found significant user satisfaction in software project factors. A statistical significance analysis (logistic regression) and collinearity analysis determined the significance factors from a group of 71 pre-defined factors from 191 software projects in ISBSG Release 12. The eight prediction models used for testing the prediction potential of these factors were Neural network, k-NN, Naïve Bayes, Random forest, Decision tree, Gradient boosted tree, linear regression and logistic regression prediction model. Fifteen pre-defined factors were truly significant in predicting user satisfaction, and they provided 82.71% prediction accuracy when used with a neural network prediction model. These factors were client-server, personnel changes, total defects delivered, project inactive time, industry sector, application type, development type, how methodology was acquired, development techniques, decision making process, intended market, size estimate approach, size estimate method, cost recording method, and effort estimate method. These findings may benefit software development managers considerably.Keywords: prediction model, statistical analysis, software project, user satisfaction factor
Procedia PDF Downloads 12418694 The Strengths and Limitations of the Statistical Modeling of Complex Social Phenomenon: Focusing on SEM, Path Analysis, or Multiple Regression Models
Authors: Jihye Jeon
Abstract:
This paper analyzes the conceptual framework of three statistical methods, multiple regression, path analysis, and structural equation models. When establishing research model of the statistical modeling of complex social phenomenon, it is important to know the strengths and limitations of three statistical models. This study explored the character, strength, and limitation of each modeling and suggested some strategies for accurate explaining or predicting the causal relationships among variables. Especially, on the studying of depression or mental health, the common mistakes of research modeling were discussed.Keywords: multiple regression, path analysis, structural equation models, statistical modeling, social and psychological phenomenon
Procedia PDF Downloads 65218693 Predicting Options Prices Using Machine Learning
Authors: Krishang Surapaneni
Abstract:
The goal of this project is to determine how to predict important aspects of options, including the ask price. We want to compare different machine learning models to learn the best model and the best hyperparameters for that model for this purpose and data set. Option pricing is a relatively new field, and it can be very complicated and intimidating, especially to inexperienced people, so we want to create a machine learning model that can predict important aspects of an option stock, which can aid in future research. We tested multiple different models and experimented with hyperparameter tuning, trying to find some of the best parameters for a machine-learning model. We tested three different models: a Random Forest Regressor, a linear regressor, and an MLP (multi-layer perceptron) regressor. The most important feature in this experiment is the ask price; this is what we were trying to predict. In the field of stock pricing prediction, there is a large potential for error, so we are unable to determine the accuracy of the models based on if they predict the pricing perfectly. Due to this factor, we determined the accuracy of the model by finding the average percentage difference between the predicted and actual values. We tested the accuracy of the machine learning models by comparing the actual results in the testing data and the predictions made by the models. The linear regression model performed worst, with an average percentage error of 17.46%. The MLP regressor had an average percentage error of 11.45%, and the random forest regressor had an average percentage error of 7.42%Keywords: finance, linear regression model, machine learning model, neural network, stock price
Procedia PDF Downloads 7518692 Estimating Anthropometric Dimensions for Saudi Males Using Artificial Neural Networks
Authors: Waleed Basuliman
Abstract:
Anthropometric dimensions are considered one of the important factors when designing human-machine systems. In this study, the estimation of anthropometric dimensions has been improved by using Artificial Neural Network (ANN) model that is able to predict the anthropometric measurements of Saudi males in Riyadh City. A total of 1427 Saudi males aged 6 to 60 years participated in measuring 20 anthropometric dimensions. These anthropometric measurements are considered important for designing the work and life applications in Saudi Arabia. The data were collected during eight months from different locations in Riyadh City. Five of these dimensions were used as predictors variables (inputs) of the model, and the remaining 15 dimensions were set to be the measured variables (Model’s outcomes). The hidden layers varied during the structuring stage, and the best performance was achieved with the network structure 6-25-15. The results showed that the developed Neural Network model was able to estimate the body dimensions of Saudi male population in Riyadh City. The network's mean absolute percentage error (MAPE) and the root mean squared error (RMSE) were found to be 0.0348 and 3.225, respectively. These results were found less, and then better, than the errors found in the literature. Finally, the accuracy of the developed neural network was evaluated by comparing the predicted outcomes with regression model. The ANN model showed higher coefficient of determination (R2) between the predicted and actual dimensions than the regression model.Keywords: artificial neural network, anthropometric measurements, back-propagation
Procedia PDF Downloads 48718691 A Two Server Poisson Queue Operating under FCFS Discipline with an ‘m’ Policy
Authors: R. Sivasamy, G. Paulraj, S. Kalaimani, N.Thillaigovindan
Abstract:
For profitable businesses, queues are double-edged swords and hence the pain of long wait times in a queue often frustrates customers. This paper suggests a technical way of reducing the pain of lines through a Poisson M/M1, M2/2 queueing system operated by two heterogeneous servers with an objective of minimising the mean sojourn time of customers served under the queue discipline ‘First Come First Served with an ‘m’ policy, i.e. FCFS-m policy’. Arrivals to the system form a Poisson process of rate λ and are served by two exponential servers. The service times of successive customers at server ‘j’ are independent and identically distributed (i.i.d.) random variables and each of it is exponentially distributed with rate parameter μj (j=1, 2). The primary condition for implementing the queue discipline ‘FCFS-m policy’ on these service rates μj (j=1, 2) is that either (m+1) µ2 > µ1> m µ2 or (m+1) µ1 > µ2> m µ1 must be satisfied. Further waiting customers prefer the server-1 whenever it becomes available for service, and the server-2 should be installed if and only if the queue length exceeds the value ‘m’ as a threshold. Steady-state results on queue length and waiting time distributions have been obtained. A simple way of tracing the optimal service rate μ*2 of the server-2 is illustrated in a specific numerical exercise to equalize the average queue length cost with that of the service cost. Assuming that the server-1 has to dynamically adjust the service rates as μ1 during the system size is strictly less than T=(m+2) while μ2=0, and as μ1 +μ2 where μ2>0 if the system size is more than or equal to T, corresponding steady state results of M/M1+M2/1 queues have been deduced from those of M/M1,M2/2 queues. To conclude this investigation has a viable application, results of M/M1+M2/1 queues have been used in processing of those waiting messages into a single computer node and to measure the power consumption by the node.Keywords: two heterogeneous servers, M/M1, M2/2 queue, service cost and queue length cost, M/M1+M2/1 queue
Procedia PDF Downloads 36218690 Effects of Heat Treatment on the Elastic Constants of Cedar Wood
Authors: Tugba Yilmaz Aydin, Ergun Guntekin, Murat Aydin
Abstract:
Effects of heat treatment on the elastic constants of cedar wood (Cedrus libani) were investigated. Specimens were exposed to heat under atmospheric pressure at four different temperatures (120, 150, 180, 210 °C) and three different time levels (2, 5, 8 hours). Three Young’s modulus (EL, ER, ET) and six Poisson ratios (μLR, μLT, μRL, μRT, μTL, μTR) were determined from compression test using bi-axial extensometer at constant moisture content (12 %). Three shear modulus were determined using ultrasound. Six shear wave velocities propagating along the principal axes of anisotropy were measured using EPOCH 650 ultrasonic flaw detector with 1 MHz transverse transducers. The properties of the samples tested were significantly affected by heat treatment by different degree. As a result, softer treatments yielded some amount of increase in Young modulus and shear modulus values, but increase of time and temperature resulted in significant decrease for both values. Poisson ratios seemed insensitive to heat treatment.Keywords: cedar wood, elastic constants, heat treatment, ultrasound
Procedia PDF Downloads 38418689 Electrohydrodynamic Study of Microwave Plasma PECVD Reactor
Authors: Keltoum Bouherine, Olivier Leroy
Abstract:
The present work is dedicated to study a three–dimensional (3D) self-consistent fluid simulation of microwave discharges of argon plasma in PECVD reactor. The model solves the Maxwell’s equations, continuity equations for charged species and the electron energy balance equation, coupled with Poisson’s equation, and Navier-Stokes equations by finite element method, using COMSOL Multiphysics software. In this study, the simulations yield the profiles of plasma components as well as the charge densities and electron temperature, the electric field, the gas velocity, and gas temperature. The results show that the microwave plasma reactor is outside of local thermodynamic equilibrium.The present work is dedicated to study a three–dimensional (3D) self-consistent fluid simulation of microwave discharges of argon plasma in PECVD reactor. The model solves the Maxwell’s equations, continuity equations for charged species and the electron energy balance equation, coupled with Poisson’s equation, and Navier-Stokes equations by finite element method, using COMSOL Multiphysics software. In this study, the simulations yield the profiles of plasma components as well as the charge densities and electron temperature, the electric field, the gas velocity, and gas temperature. The results show that the microwave plasma reactor is outside of local thermodynamic equilibrium.Keywords: electron density, electric field, microwave plasma reactor, gas velocity, non-equilibrium plasma
Procedia PDF Downloads 33118688 Heart Attack Prediction Using Several Machine Learning Methods
Authors: Suzan Anwar, Utkarsh Goyal
Abstract:
Heart rate (HR) is a predictor of cardiovascular, cerebrovascular, and all-cause mortality in the general population, as well as in patients with cardio and cerebrovascular diseases. Machine learning (ML) significantly improves the accuracy of cardiovascular risk prediction, increasing the number of patients identified who could benefit from preventive treatment while avoiding unnecessary treatment of others. This research examines relationship between the individual's various heart health inputs like age, sex, cp, trestbps, thalach, oldpeaketc, and the likelihood of developing heart disease. Machine learning techniques like logistic regression and decision tree, and Python are used. The results of testing and evaluating the model using the Heart Failure Prediction Dataset show the chance of a person having a heart disease with variable accuracy. Logistic regression has yielded an accuracy of 80.48% without data handling. With data handling (normalization, standardscaler), the logistic regression resulted in improved accuracy of 87.80%, decision tree 100%, random forest 100%, and SVM 100%.Keywords: heart rate, machine learning, SVM, decision tree, logistic regression, random forest
Procedia PDF Downloads 13818687 Implementation of a Non-Poissonian Model in a Low-Seismicity Area
Authors: Ludivine Saint-Mard, Masato Nakajima, Gloria Senfaute
Abstract:
In areas with low to moderate seismicity, the probabilistic seismic hazard analysis frequently uses a Poisson approach, which assumes independence in time and space of events to determine the annual probability of earthquake occurrence. Nevertheless, in countries with high seismic rate, such as Japan, it is frequently use non-poissonian model which assumes that next earthquake occurrence depends on the date of previous one. The objective of this paper is to apply a non-poissonian models in a region of low to moderate seismicity to get a feedback on the following questions: can we overcome the lack of data to determine some key parameters?, and can we deal with uncertainties to apply largely this methodology on an industrial context?. The Brownian-Passage-Time model was applied to a fault located in France and conclude that even if the lack of data can be overcome with some calculations, the amount of uncertainties and number of scenarios leads to a numerous branches in PSHA, making this method difficult to apply on a large scale of low to moderate seismicity areas and in an industrial context.Keywords: probabilistic seismic hazard, non-poissonian model, earthquake occurrence, low seismicity
Procedia PDF Downloads 6218686 Policy Implications of Demographic Impacts on COVID-19, Pneumonia, and Influenza Mortality: A Multivariable Regression Approach to Death Toll Reduction
Authors: Saiakhil Chilaka
Abstract:
Understanding the demographic factors that influence mortality from respiratory diseases like COVID-19, pneumonia, and influenza is crucial for informing public health policy. This study utilizes multivariable regression models to assess the relationship between state, sex, and age group on deaths from these diseases using U.S. data from 2020 to 2023. The analysis reveals that age and sex play significant roles in mortality, while state-level variations are minimal. Although the model’s low R-squared values indicate that additional factors are at play, this paper discusses how these findings, in light of recent research, can inform future public health policy, resource allocation, and intervention strategies.Keywords: COVID-19, multivariable regression, public policy, data science
Procedia PDF Downloads 2118685 Multi-Linear Regression Based Prediction of Mass Transfer by Multiple Plunging Jets
Abstract:
The paper aims to compare the performance of vertical and inclined multiple plunging jets and to model and predict their mass transfer capacity by multi-linear regression based approach. The multiple vertical plunging jets have jet impact angle of θ = 90O; whereas, multiple inclined plunging jets have jet impact angle of θ = 600. The results of the study suggests that mass transfer is higher for multiple jets, and inclined multiple plunging jets have up to 1.6 times higher mass transfer than vertical multiple plunging jets under similar conditions. The derived relationship, based on multi-linear regression approach, has successfully predicted the volumetric mass transfer coefficient (KLa) from operational parameters of multiple plunging jets with a correlation coefficient of 0.973, root mean square error of 0.002 and coefficient of determination of 0.946. The results suggests that predicted overall mass transfer coefficient is in good agreement with actual experimental values; thereby suggesting the utility of derived relationship based on multi-linear regression based approach and can be successfully employed in modelling mass transfer by multiple plunging jets.Keywords: mass transfer, multiple plunging jets, multi-linear regression, earth sciences
Procedia PDF Downloads 46118684 Factors Related with Self-Care Behaviors among Iranian Type 2 Diabetic Patients: An Application of Health Belief Model
Authors: Ali Soroush, Mehdi Mirzaei Alavijeh, Touraj Ahmadi Jouybari, Fazel Zinat-Motlagh, Abbas Aghaei, Mari Ataee
Abstract:
Diabetes is a disease with long cardiovascular, renal, ophthalmic and neural complications. It is prevalent all around the world including Iran, and its prevalence is increasing. The aim of this study was to determine the factors related to self-care behavior based on health belief model among sample of Iranian diabetic patients. This cross-sectional study was conducted among 301 type 2 diabetic patients in Gachsaran, Iran. Data collection was based on an interview and the data were analyzed by SPSS version 20 using ANOVA, t-tests, Pearson correlation, and linear regression statistical tests at 95% significant level. Linear regression analyses showed the health belief model variables accounted for 29% of the variation in self-care behavior; and perceived severity and perceived self-efficacy are more influential predictors on self-care behavior among diabetic patients.Keywords: diabetes, patients, self-care behaviors, health belief model
Procedia PDF Downloads 46818683 Copula-Based Estimation of Direct and Indirect Effects in Path Analysis Models
Authors: Alam Ali, Ashok Kumar Pathak
Abstract:
Path analysis is a statistical technique used to evaluate the direct and indirect effects of variables in path models. One or more structural regression equations are used to estimate a series of parameters in path models to find the better fit of data. However, sometimes the assumptions of classical regression models, such as ordinary least squares (OLS), are violated by the nature of the data, resulting in insignificant direct and indirect effects of exogenous variables. This article aims to explore the effectiveness of a copula-based regression approach as an alternative to classical regression, specifically when variables are linked through an elliptical copula.Keywords: path analysis, copula-based regression models, direct and indirect effects, k-fold cross validation technique
Procedia PDF Downloads 4118682 Factors for Entry Timing Choices Using Principal Axis Factorial Analysis and Logistic Regression Model
Authors: C. M. Mat Isa, H. Mohd Saman, S. R. Mohd Nasir, A. Jaapar
Abstract:
International market expansion involves a strategic process of market entry decision through which a firm expands its operation from domestic to the international domain. Hence, entry timing choices require the needs to balance the early entry risks and the problems in losing opportunities as a result of late entry into a new market. Questionnaire surveys administered to 115 Malaysian construction firms operating in 51 countries worldwide have resulted in 39.1 percent response rate. Factor analysis was used to determine the most significant factors affecting entry timing choices of the firms to penetrate the international market. A logistic regression analysis used to examine the firms’ entry timing choices, indicates that the model has correctly classified 89.5 per cent of cases as late movers. The findings reveal that the most significant factor influencing the construction firms’ choices as late movers was the firm factor related to the firm’s international experience, resources, competencies and financing capacity. The study also offers valuable information to construction firms with intention to internationalize their businesses.Keywords: factors, early movers, entry timing choices, late movers, logistic regression model, principal axis factorial analysis, Malaysian construction firms
Procedia PDF Downloads 37818681 A Regression Model for Predicting Sugar Crystal Size in a Fed-Batch Vacuum Evaporative Crystallizer
Authors: Sunday B. Alabi, Edikan P. Felix, Aniediong M. Umo
Abstract:
Crystal size distribution is of great importance in the sugar factories. It determines the market value of granulated sugar and also influences the cost of production of sugar crystals. Typically, sugar is produced using fed-batch vacuum evaporative crystallizer. The crystallization quality is examined by crystal size distribution at the end of the process which is quantified by two parameters: the average crystal size of the distribution in the mean aperture (MA) and the width of the distribution of the coefficient of variation (CV). Lack of real-time measurement of the sugar crystal size hinders its feedback control and eventual optimisation of the crystallization process. An attractive alternative is to use a soft sensor (model-based method) for online estimation of the sugar crystal size. Unfortunately, the available models for sugar crystallization process are not suitable as they do not contain variables that can be measured easily online. The main contribution of this paper is the development of a regression model for estimating the sugar crystal size as a function of input variables which are easy to measure online. This has the potential to provide real-time estimates of crystal size for its effective feedback control. Using 7 input variables namely: initial crystal size (Lo), temperature (T), vacuum pressure (P), feed flowrate (Ff), steam flowrate (Fs), initial super-saturation (S0) and crystallization time (t), preliminary studies were carried out using Minitab 14 statistical software. Based on the existing sugar crystallizer models, and the typical ranges of these 7 input variables, 128 datasets were obtained from a 2-level factorial experimental design. These datasets were used to obtain a simple but online-implementable 6-input crystal size model. It seems the initial crystal size (Lₒ) does not play a significant role. The goodness of the resulting regression model was evaluated. The coefficient of determination, R² was obtained as 0.994, and the maximum absolute relative error (MARE) was obtained as 4.6%. The high R² (~1.0) and the reasonably low MARE values are an indication that the model is able to predict sugar crystal size accurately as a function of the 6 easy-to-measure online variables. Thus, the model can be used as a soft sensor to provide real-time estimates of sugar crystal size during sugar crystallization process in a fed-batch vacuum evaporative crystallizer.Keywords: crystal size, regression model, soft sensor, sugar, vacuum evaporative crystallizer
Procedia PDF Downloads 20818680 Economic Analysis of Cowpea (Unguiculata spp) Production in Northern Nigeria: A Case Study of Kano Katsina and Jigawa States
Authors: Yakubu Suleiman, S. A. Musa
Abstract:
Nigeria is the largest cowpea producer in the world, accounting for about 45%, followed by Brazil with about 17%. Cowpea is grown in Kano, Bauchi, Katsina, Borno in the north, Oyo in the west, and to the lesser extent in Enugu in the east. This study was conducted to determine the input–output relationship of Cowpea production in Kano, Katsina, and Jigawa states of Nigeria. The data were collected with the aid of 1000 structured questionnaires that were randomly distributed to Cowpea farmers in the three states mentioned above of the study area. The data collected were analyzed using regression analysis (Cobb–Douglass production function model). The result of the regression analysis revealed the coefficient of multiple determinations, R2, to be 72.5% and the F ration to be 106.20 and was found to be significant (P < 0.01). The regression coefficient of constant is 0.5382 and is significant (P < 0.01). The regression coefficient with respect to labor and seeds were 0.65554 and 0.4336, respectively, and they are highly significant (P < 0.01). The regression coefficient with respect to fertilizer is 0.26341 which is significant (P < 0.05). This implies that a unit increase of any one of the variable inputs used while holding all other variables inputs constants, will significantly increase the total Cowpea output by their corresponding coefficient. This indicated that farmers in the study area are operating in stage II of the production function. The result revealed that Cowpea farmer in Kano, Jigawa and Katsina States realized a profit of N15,997, N34,016 and N19,788 per hectare respectively. It is hereby recommended that more attention should be given to Cowpea production by government and research institutions.Keywords: coefficient, constant, inputs, regression
Procedia PDF Downloads 41018679 Climate Changes in Albania and Their Effect on Cereal Yield
Authors: Lule Basha, Eralda Gjika
Abstract:
This study is focused on analyzing climate change in Albania and its potential effects on cereal yields. Initially, monthly temperature and rainfalls in Albania were studied for the period 1960-2021. Climacteric variables are important variables when trying to model cereal yield behavior, especially when significant changes in weather conditions are observed. For this purpose, in the second part of the study, linear and nonlinear models explaining cereal yield are constructed for the same period, 1960-2021. The multiple linear regression analysis and lasso regression method are applied to the data between cereal yield and each independent variable: average temperature, average rainfall, fertilizer consumption, arable land, land under cereal production, and nitrous oxide emissions. In our regression model, heteroscedasticity is not observed, data follow a normal distribution, and there is a low correlation between factors, so we do not have the problem of multicollinearity. Machine-learning methods, such as random forest, are used to predict cereal yield responses to climacteric and other variables. Random Forest showed high accuracy compared to the other statistical models in the prediction of cereal yield. We found that changes in average temperature negatively affect cereal yield. The coefficients of fertilizer consumption, arable land, and land under cereal production are positively affecting production. Our results show that the Random Forest method is an effective and versatile machine-learning method for cereal yield prediction compared to the other two methods.Keywords: cereal yield, climate change, machine learning, multiple regression model, random forest
Procedia PDF Downloads 9118678 Fuzzy Logic Classification Approach for Exponential Data Set in Health Care System for Predication of Future Data
Authors: Manish Pandey, Gurinderjit Kaur, Meenu Talwar, Sachin Chauhan, Jagbir Gill
Abstract:
Health-care management systems are a unit of nice connection as a result of the supply a straightforward and fast management of all aspects relating to a patient, not essentially medical. What is more, there are unit additional and additional cases of pathologies during which diagnosing and treatment may be solely allotted by victimization medical imaging techniques. With associate ever-increasing prevalence, medical pictures area unit directly acquired in or regenerate into digital type, for his or her storage additionally as sequent retrieval and process. Data Mining is the process of extracting information from large data sets through using algorithms and Techniques drawn from the field of Statistics, Machine Learning and Data Base Management Systems. Forecasting may be a prediction of what's going to occur within the future, associated it's an unsure method. Owing to the uncertainty, the accuracy of a forecast is as vital because the outcome foretold by foretelling the freelance variables. A forecast management should be wont to establish if the accuracy of the forecast is within satisfactory limits. Fuzzy regression strategies have normally been wont to develop shopper preferences models that correlate the engineering characteristics with shopper preferences relating to a replacement product; the patron preference models offer a platform, wherever by product developers will decide the engineering characteristics so as to satisfy shopper preferences before developing the merchandise. Recent analysis shows that these fuzzy regression strategies area units normally will not to model client preferences. We tend to propose a Testing the strength of Exponential Regression Model over regression toward the mean Model.Keywords: health-care management systems, fuzzy regression, data mining, forecasting, fuzzy membership function
Procedia PDF Downloads 27918677 A Model for Diagnosis and Prediction of Coronavirus Using Neural Network
Authors: Sajjad Baghernezhad
Abstract:
Meta-heuristic and hybrid algorithms have high adeer in modeling medical problems. In this study, a neural network was used to predict covid-19 among high-risk and low-risk patients. This study was conducted to collect the applied method and its target population consisting of 550 high-risk and low-risk patients from the Kerman University of medical sciences medical center to predict the coronavirus. In this study, the memetic algorithm, which is a combination of a genetic algorithm and a local search algorithm, has been used to update the weights of the neural network and develop the accuracy of the neural network. The initial study showed that the accuracy of the neural network was 88%. After updating the weights, the memetic algorithm increased by 93%. For the proposed model, sensitivity, specificity, positive predictivity value, value/accuracy to 97.4, 92.3, 95.8, 96.2, and 0.918, respectively; for the genetic algorithm model, 87.05, 9.20 7, 89.45, 97.30 and 0.967 and for logistic regression model were 87.40, 95.20, 93.79, 0.87 and 0.916. Based on the findings of this study, neural network models have a lower error rate in the diagnosis of patients based on individual variables and vital signs compared to the regression model. The findings of this study can help planners and health care providers in signing programs and early diagnosis of COVID-19 or Corona.Keywords: COVID-19, decision support technique, neural network, genetic algorithm, memetic algorithm
Procedia PDF Downloads 6618676 Analysis of Spatial Heterogeneity of Residential Prices in Guangzhou: An Actual Study Based on Point of Interest Geographically Weighted Regression Model
Authors: Zichun Guo
Abstract:
Guangzhou's house price has long been lower than the other three major cities; with the gradual increase in Guangzhou's house price, the influencing factors of house price have gradually been paid attention to; this paper tries to use house price data and POI (Point of Interest) data, and explores the distribution of house price and influencing factors by applying the Kriging spatial interpolation method and geographically weighted regression model in ArcGIS. The results show that the interpolation result of house price has a significant relationship with the economic development and development potential of the region and that different POI types have different impacts on the growth of house prices in different regions.Keywords: POI, house price, spatial heterogeneity, Guangzhou
Procedia PDF Downloads 5518675 An Epsilon Hierarchical Fuzzy Twin Support Vector Regression
Authors: Arindam Chaudhuri
Abstract:
The research presents epsilon- hierarchical fuzzy twin support vector regression (epsilon-HFTSVR) based on epsilon-fuzzy twin support vector regression (epsilon-FTSVR) and epsilon-twin support vector regression (epsilon-TSVR). Epsilon-FTSVR is achieved by incorporating trapezoidal fuzzy numbers to epsilon-TSVR which takes care of uncertainty existing in forecasting problems. Epsilon-FTSVR determines a pair of epsilon-insensitive proximal functions by solving two related quadratic programming problems. The structural risk minimization principle is implemented by introducing regularization term in primal problems of epsilon-FTSVR. This yields dual stable positive definite problems which improves regression performance. Epsilon-FTSVR is then reformulated as epsilon-HFTSVR consisting of a set of hierarchical layers each containing epsilon-FTSVR. Experimental results on both synthetic and real datasets reveal that epsilon-HFTSVR has remarkable generalization performance with minimum training time.Keywords: regression, epsilon-TSVR, epsilon-FTSVR, epsilon-HFTSVR
Procedia PDF Downloads 37518674 Minimizing the Impact of Covariate Detection Limit in Logistic Regression
Authors: Shahadut Hossain, Jacek Wesolowski, Zahirul Hoque
Abstract:
In many epidemiological and environmental studies covariate measurements are subject to the detection limit. In most applications, covariate measurements are usually truncated from below which is known as left-truncation. Because the measuring device, which we use to measure the covariate, fails to detect values falling below the certain threshold. In regression analyses, it causes inflated bias and inaccurate mean squared error (MSE) to the estimators. This paper suggests a response-based regression calibration method to correct the deleterious impact introduced by the covariate detection limit in the estimators of the parameters of simple logistic regression model. Compared to the maximum likelihood method, the proposed method is computationally simpler, and hence easier to implement. It is robust to the violation of distributional assumption about the covariate of interest. In producing correct inference, the performance of the proposed method compared to the other competing methods has been investigated through extensive simulations. A real-life application of the method is also shown using data from a population-based case-control study of non-Hodgkin lymphoma.Keywords: environmental exposure, detection limit, left truncation, bias, ad-hoc substitution
Procedia PDF Downloads 23618673 Non-Methane Hydrocarbons Emission during the Photocopying Process
Authors: Kiurski S. Jelena, Aksentijević M. Snežana, Kecić S. Vesna, Oros B. Ivana
Abstract:
The prosperity of electronic equipment in photocopying environment not only has improved work efficiency, but also has changed indoor air quality. Considering the number of photocopying employed, indoor air quality might be worse than in general office environments. Determining the contribution from any type of equipment to indoor air pollution is a complex matter. Non-methane hydrocarbons are known to have an important role of air quality due to their high reactivity. The presence of hazardous pollutants in indoor air has been detected in one photocopying shop in Novi Sad, Serbia. Air samples were collected and analyzed for five days, during 8-hr working time in three-time intervals, whereas three different sampling points were determined. Using multiple linear regression model and software package STATISTICA 10 the concentrations of occupational hazards and micro-climates parameters were mutually correlated. Based on the obtained multiple coefficients of determination (0.3751, 0.2389, and 0.1975), a weak positive correlation between the observed variables was determined. Small values of parameter F indicated that there was no statistically significant difference between the concentration levels of non-methane hydrocarbons and micro-climates parameters. The results showed that variable could be presented by the general regression model: y = b0 + b1xi1+ b2xi2. Obtained regression equations allow to measure the quantitative agreement between the variation of variables and thus obtain more accurate knowledge of their mutual relations.Keywords: non-methane hydrocarbons, photocopying process, multiple regression analysis, indoor air quality, pollutant emission
Procedia PDF Downloads 37818672 Free Fatty Acid Assessment of Crude Palm Oil Using a Non-Destructive Approach
Authors: Siti Nurhidayah Naqiah Abdull Rani, Herlina Abdul Rahim, Rashidah Ghazali, Noramli Abdul Razak
Abstract:
Near infrared (NIR) spectroscopy has always been of great interest in the food and agriculture industries. The development of prediction models has facilitated the estimation process in recent years. In this study, 110 crude palm oil (CPO) samples were used to build a free fatty acid (FFA) prediction model. 60% of the collected data were used for training purposes and the remaining 40% used for testing. The visible peaks on the NIR spectrum were at 1725 nm and 1760 nm, indicating the existence of the first overtone of C-H bands. Principal component regression (PCR) was applied to the data in order to build this mathematical prediction model. The optimal number of principal components was 10. The results showed R2=0.7147 for the training set and R2=0.6404 for the testing set.Keywords: palm oil, fatty acid, NIRS, regression
Procedia PDF Downloads 50718671 Numerical Investigation of Entropy Signatures in Fluid Turbulence: Poisson Equation for Pressure Transformation from Navier-Stokes Equation
Authors: Samuel Ahamefula Mba
Abstract:
Fluid turbulence is a complex and nonlinear phenomenon that occurs in various natural and industrial processes. Understanding turbulence remains a challenging task due to its intricate nature. One approach to gain insights into turbulence is through the study of entropy, which quantifies the disorder or randomness of a system. This research presents a numerical investigation of entropy signatures in fluid turbulence. The work is to develop a numerical framework to describe and analyse fluid turbulence in terms of entropy. This decomposes the turbulent flow field into different scales, ranging from large energy-containing eddies to small dissipative structures, thus establishing a correlation between entropy and other turbulence statistics. This entropy-based framework provides a powerful tool for understanding the underlying mechanisms driving turbulence and its impact on various phenomena. This work necessitates the derivation of the Poisson equation for pressure transformation of Navier-Stokes equation and using Chebyshev-Finite Difference techniques to effectively resolve it. To carry out the mathematical analysis, consider bounded domains with smooth solutions and non-periodic boundary conditions. To address this, a hybrid computational approach combining direct numerical simulation (DNS) and Large Eddy Simulation with Wall Models (LES-WM) is utilized to perform extensive simulations of turbulent flows. The potential impact ranges from industrial process optimization and improved prediction of weather patterns.Keywords: turbulence, Navier-Stokes equation, Poisson pressure equation, numerical investigation, Chebyshev-finite difference, hybrid computational approach, large Eddy simulation with wall models, direct numerical simulation
Procedia PDF Downloads 9418670 Multicollinearity and MRA in Sustainability: Application of the Raise Regression
Authors: Claudia García-García, Catalina B. García-García, Román Salmerón-Gómez
Abstract:
Much economic-environmental research includes the analysis of possible interactions by using Moderated Regression Analysis (MRA), which is a specific application of multiple linear regression analysis. This methodology allows analyzing how the effect of one of the independent variables is moderated by a second independent variable by adding a cross-product term between them as an additional explanatory variable. Due to the very specification of the methodology, the moderated factor is often highly correlated with the constitutive terms. Thus, great multicollinearity problems arise. The appearance of strong multicollinearity in a model has important consequences. Inflated variances of the estimators may appear, there is a tendency to consider non-significant regressors that they probably are together with a very high coefficient of determination, incorrect signs of our coefficients may appear and also the high sensibility of the results to small changes in the dataset. Finally, the high relationship among explanatory variables implies difficulties in fixing the individual effects of each one on the model under study. These consequences shifted to the moderated analysis may imply that it is not worth including an interaction term that may be distorting the model. Thus, it is important to manage the problem with some methodology that allows for obtaining reliable results. After a review of those works that applied the MRA among the ten top journals of the field, it is clear that multicollinearity is mostly disregarded. Less than 15% of the reviewed works take into account potential multicollinearity problems. To overcome the issue, this work studies the possible application of recent methodologies to MRA. Particularly, the raised regression is analyzed. This methodology mitigates collinearity from a geometrical point of view: the collinearity problem arises because the variables under study are very close geometrically, so by separating both variables, the problem can be mitigated. Raise regression maintains the available information and modifies the problematic variables instead of deleting variables, for example. Furthermore, the global characteristics of the initial model are also maintained (sum of squared residuals, estimated variance, coefficient of determination, global significance test and prediction). The proposal is implemented to data from countries of the European Union during the last year available regarding greenhouse gas emissions, per capita GDP and a dummy variable that represents the topography of the country. The use of a dummy variable as the moderator is a special variant of MRA, sometimes called “subgroup regression analysis.” The main conclusion of this work is that applying new techniques to the field can improve in a substantial way the results of the analysis. Particularly, the use of raised regression mitigates great multicollinearity problems, so the researcher is able to rely on the interaction term when interpreting the results of a particular study.Keywords: multicollinearity, MRA, interaction, raise
Procedia PDF Downloads 10418669 Using the Bootstrap for Problems Statistics
Authors: Brahim Boukabcha, Amar Rebbouh
Abstract:
The bootstrap method based on the idea of exploiting all the information provided by the initial sample, allows us to study the properties of estimators. In this article we will present a theoretical study on the different methods of bootstrapping and using the technique of re-sampling in statistics inference to calculate the standard error of means of an estimator and determining a confidence interval for an estimated parameter. We apply these methods tested in the regression models and Pareto model, giving the best approximations.Keywords: bootstrap, error standard, bias, jackknife, mean, median, variance, confidence interval, regression models
Procedia PDF Downloads 38018668 A Machine Learning Model for Predicting Students’ Academic Performance in Higher Institutions
Authors: Emmanuel Osaze Oshoiribhor, Adetokunbo MacGregor John-Otumu
Abstract:
There has been a need in recent years to predict student academic achievement prior to graduation. This is to assist them in improving their grades, especially for those who have struggled in the past. The purpose of this research is to use supervised learning techniques to create a model that predicts student academic progress. Many scholars have developed models that predict student academic achievement based on characteristics including smoking, demography, culture, social media, parent educational background, parent finances, and family background, to mention a few. This element, as well as the model used, could have misclassified the kids in terms of their academic achievement. As a prerequisite to predicting if the student will perform well in the future on related courses, this model is built using a logistic regression classifier with basic features such as the previous semester's course score, attendance to class, class participation, and the total number of course materials or resources the student is able to cover per semester. With a 96.7 percent accuracy, the model outperformed other classifiers such as Naive bayes, Support vector machine (SVM), Decision Tree, Random forest, and Adaboost. This model is offered as a desktop application with user-friendly interfaces for forecasting student academic progress for both teachers and students. As a result, both students and professors are encouraged to use this technique to predict outcomes better.Keywords: artificial intelligence, ML, logistic regression, performance, prediction
Procedia PDF Downloads 10918667 Binary Logistic Regression Model in Predicting the Employability of Senior High School Graduates
Authors: Cromwell F. Gopo, Joy L. Picar
Abstract:
This study aimed to predict the employability of senior high school graduates for S.Y. 2018- 2019 in the Davao del Norte Division through quantitative research design using the descriptive status and predictive approaches among the indicated parameters, namely gender, school type, academics, academic award recipient, skills, values, and strand. The respondents of the study were the 33 secondary schools offering senior high school programs identified through simple random sampling, which resulted in 1,530 cases of graduates’ secondary data, which were analyzed using frequency, percentage, mean, standard deviation, and binary logistic regression. Results showed that the majority of the senior high school graduates who come from large schools were females. Further, less than half of these graduates received any academic award in any semester. In general, the graduates’ performance in academics, skills, and values were proficient. Moreover, less than half of the graduates were not employed. Then, those who were employed were either contractual, casual, or part-time workers dominated by GAS graduates. Further, the predictors of employability were gender and the Information and Communications Technology (ICT) strand, while the remaining variables did not add significantly to the model. The null hypothesis had been rejected as the coefficients of the predictors in the binary logistic regression equation did not take the value of 0. After utilizing the model, it was concluded that Technical-Vocational-Livelihood (TVL) graduates except ICT had greater estimates of employability.Keywords: employability, senior high school graduates, Davao del Norte, Philippines
Procedia PDF Downloads 15218666 Model Averaging in a Multiplicative Heteroscedastic Model
Authors: Alan Wan
Abstract:
In recent years, the body of literature on frequentist model averaging in statistics has grown significantly. Most of this work focuses on models with different mean structures but leaves out the variance consideration. In this paper, we consider a regression model with multiplicative heteroscedasticity and develop a model averaging method that combines maximum likelihood estimators of unknown parameters in both the mean and variance functions of the model. Our weight choice criterion is based on a minimisation of a plug-in estimator of the model average estimator's squared prediction risk. We prove that the new estimator possesses an asymptotic optimality property. Our investigation of finite-sample performance by simulations demonstrates that the new estimator frequently exhibits very favourable properties compared to some existing heteroscedasticity-robust model average estimators. The model averaging method hedges against the selection of very bad models and serves as a remedy to variance function misspecification, which often discourages practitioners from modeling heteroscedasticity altogether. The proposed model average estimator is applied to the analysis of two real data sets.Keywords: heteroscedasticity-robust, model averaging, multiplicative heteroscedasticity, plug-in, squared prediction risk
Procedia PDF Downloads 384