Search results for: Tobit regression model
18656 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison
Authors: Xiangtuo Chen, Paul-Henry Cournéde
Abstract:
Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find out efficient way to predict the yield of corn based meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to the different modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with their environment as dynamical systems. But the calibration process of the dynamic system comes up with much difficulty, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of datasets (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process. But it has some strict requirements about the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods, methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-folds cross-validation process and two accuracy metrics: root mean square error of prediction(RMSEP), mean absolute error of prediction(MAEP) were used to evaluate the crop prediction capacity. The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method to calibrate the mechanistic model from dataset easy to access offers several side-perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.Keywords: crop yield prediction, crop model, sensitivity analysis, paramater estimation, particle swarm optimization, random forest
Procedia PDF Downloads 23118655 Qsar Studies of Certain Novel Heterocycles Derived From bis-1, 2, 4 Triazoles as Anti-Tumor Agents
Authors: Madhusudan Purohit, Stephen Philip, Bharathkumar Inturi
Abstract:
In this paper we report the quantitative structure activity relationship of novel bis-triazole derivatives for predicting the activity profile. The full model encompassed a dataset of 46 Bis- triazoles. Tripos Sybyl X 2.0 program was used to conduct CoMSIA QSAR modeling. The Partial Least-Squares (PLS) analysis method was used to conduct statistical analysis and to derive a QSAR model based on the field values of CoMSIA descriptor. The compounds were divided into test and training set. The compounds were evaluated by various CoMSIA parameters to predict the best QSAR model. An optimum numbers of components were first determined separately by cross-validation regression for CoMSIA model, which were then applied in the final analysis. A series of parameters were used for the study and the best fit model was obtained using donor, partition coefficient and steric parameters. The CoMSIA models demonstrated good statistical results with regression coefficient (r2) and the cross-validated coefficient (q2) of 0.575 and 0.830 respectively. The standard error for the predicted model was 0.16322. In the CoMSIA model, the steric descriptors make a marginally larger contribution than the electrostatic descriptors. The finding that the steric descriptor is the largest contributor for the CoMSIA QSAR models is consistent with the observation that more than half of the binding site area is occupied by steric regions.Keywords: 3D QSAR, CoMSIA, triazoles, novel heterocycles
Procedia PDF Downloads 44418654 Modelling the Indonesian Goverment Securities Yield Curve Using Nelson-Siegel-Svensson and Support Vector Regression
Authors: Jamilatuzzahro, Rezzy Eko Caraka
Abstract:
The yield curve is the plot of the yield to maturity of zero-coupon bonds against maturity. In practice, the yield curve is not observed but must be extracted from observed bond prices for a set of (usually) incomplete maturities. There exist many methodologies and theory to analyze of yield curve. We use two methods (the Nelson-Siegel Method, the Svensson Method, and the SVR method) in order to construct and compare our zero-coupon yield curves. The objectives of this research were: (i) to study the adequacy of NSS model and SVR to Indonesian government bonds data, (ii) to choose the best optimization or estimation method for NSS model and SVR. To obtain that objective, this research was done by the following steps: data preparation, cleaning or filtering data, modeling, and model evaluation.Keywords: support vector regression, Nelson-Siegel-Svensson, yield curve, Indonesian government
Procedia PDF Downloads 24418653 Long-Term Indoor Air Monitoring for Students with Emphasis on Particulate Matter (PM2.5) Exposure
Authors: Seyedtaghi Mirmohammadi, Jamshid Yazdani, Syavash Etemadi Nejad
Abstract:
One of the main indoor air parameters in classrooms is dust pollution and it depends on the particle size and exposure duration. However, there is a lake of data about the exposure level to PM2.5 concentrations in rural area classrooms. The objective of the current study was exposure assessment for PM2.5 for students in the classrooms. One year monitoring was carried out for fifteen schools by time-series sampling to evaluate the indoor air PM2.5 in the rural district of Sari city, Iran. A hygrometer and thermometer were used to measure some psychrometric parameters (temperature, relative humidity, and wind speed) and Real-Time Dust Monitor, (MicroDust Pro, Casella, UK) was used to monitor particulate matters (PM2.5) concentration. The results show the mean indoor PM2.5 concentration in the studied classrooms was 135µg/m3. The regression model indicated that a positive correlation between indoor PM2.5 concentration and relative humidity, also with distance from city center and classroom size. Meanwhile, the regression model revealed that the indoor PM2.5 concentration, the relative humidity, and dry bulb temperature was significant at 0.05, 0.035, and 0.05 levels, respectively. A statistical predictive model was obtained from multiple regressions modeling for indoor PM2.5 concentration and indoor psychrometric parameters conditions.Keywords: classrooms, concentration, humidity, particulate matters, regression
Procedia PDF Downloads 33518652 Exploration and Evaluation of the Effect of Multiple Countermeasures on Road Safety
Authors: Atheer Al-Nuaimi, Harry Evdorides
Abstract:
Every day many people die or get disabled or injured on roads around the world, which necessitates more specific treatments for transportation safety issues. International road assessment program (iRAP) model is one of the comprehensive road safety models which accounting for many factors that affect road safety in a cost-effective way in low and middle income countries. In iRAP model road safety has been divided into five star ratings from 1 star (the lowest level) to 5 star (the highest level). These star ratings are based on star rating score which is calculated by iRAP methodology depending on road attributes, traffic volumes and operating speeds. The outcome of iRAP methodology are the treatments that can be used to improve road safety and reduce fatalities and serious injuries (FSI) numbers. These countermeasures can be used separately as a single countermeasure or mix as multiple countermeasures for a location. There is general agreement that the adequacy of a countermeasure is liable to consistent losses when it is utilized as a part of mix with different countermeasures. That is, accident diminishment appraisals of individual countermeasures cannot be easily added together. The iRAP model philosophy makes utilization of a multiple countermeasure adjustment factors to predict diminishments in the effectiveness of road safety countermeasures when more than one countermeasure is chosen. A multiple countermeasure correction factors are figured for every 100-meter segment and for every accident type. However, restrictions of this methodology incorporate a presumable over-estimation in the predicted crash reduction. This study aims to adjust this correction factor by developing new models to calculate the effect of using multiple countermeasures on the number of fatalities for a location or an entire road. Regression models have been used to establish relationships between crash frequencies and the factors that affect their rates. Multiple linear regression, negative binomial regression, and Poisson regression techniques were used to develop models that can address the effectiveness of using multiple countermeasures. Analyses are conducted using The R Project for Statistical Computing showed that a model developed by negative binomial regression technique could give more reliable results of the predicted number of fatalities after the implementation of road safety multiple countermeasures than the results from iRAP model. The results also showed that the negative binomial regression approach gives more precise results in comparison with multiple linear and Poisson regression techniques because of the overdispersion and standard error issues.Keywords: international road assessment program, negative binomial, road multiple countermeasures, road safety
Procedia PDF Downloads 24018651 A Research on Tourism Market Forecast and Its Evaluation
Authors: Min Wei
Abstract:
The traditional prediction methods of the forecast for tourism market are paid more attention to the accuracy of the forecasts, ignoring the results of the feasibility of forecasting and predicting operability, which had made it difficult to predict the results of scientific testing. With the application of Linear Regression Model, this paper attempts to construct a scientific evaluation system for predictive value, both to ensure the accuracy, stability of the predicted value, and to ensure the feasibility of forecasting and predicting the results of operation. The findings show is that a scientific evaluation system can implement the scientific concept of development, the harmonious development of man and nature co-ordinate.Keywords: linear regression model, tourism market, forecast, tourism economics
Procedia PDF Downloads 33218650 The Impact of Corporate Governance Mechanisms on Dividend Policy
Authors: Tahar Tayachi, Ahlam Alrehaili
Abstract:
Purpose: The purpose of this paper is to investigate the relationship between the corporate board characteristics and the dividend policy among firms on the Saudi Stock Exchange. Design/Methodology/Approach: This paper uses a sample of 103 nonfinancial firms over a time period of 4 years from 2015 to 2018. To investigate how corporate governance mechanisms such as board independence, the board size, frequency of meetings, and free cash flow impact dividends, the study uses Logit and Tobit models. Findings: This paper finds that board size, board independence, and frequency of board meetings have no influence on a firm’s decision to pay dividends, while board size has a significantly positive impact on the levels of cash dividends paid to investors. This study also finds that the level of free cash flows has a positively significant influence on both the decision to pay dividends and the magnitude of dividend payouts. Research Limitations/Implications: This paper attempts to study the effectiveness of dividend policy among some firms on the Saudi Stock Exchange. Practical Implications: The findings reveal that board characteristics, which represent one of the crucial mechanisms of corporate governance, were found to be complementary to corporate laws and regulations imposed on the Saudi market in 2015. The findings also imply that capital market authorities should revise their corporate regulations and ensure that protection laws are adequate and strong enough to protect the interests of all shareholders. Originality/Value: This paper is among the few studies focusing on dividend policy in Saudi Arabia. Finally, these findings suggest that the improvements in corporate laws in Saudi Arabia led to such an outcome, and it has become prevalent in dividend policy decisions and behaviors of Saudi firms.Keywords: agency theory, Tobit, corporate governance, dividend payout, Logit
Procedia PDF Downloads 20418649 A Statistical Approach to Predict and Classify the Commercial Hatchability of Chickens Using Extrinsic Parameters of Breeders and Eggs
Authors: M. S. Wickramarachchi, L. S. Nawarathna, C. M. B. Dematawewa
Abstract:
Hatchery performance is critical for the profitability of poultry breeder operations. Some extrinsic parameters of eggs and breeders cause to increase or decrease the hatchability. This study aims to identify the affecting extrinsic parameters on the commercial hatchability of local chicken's eggs and determine the most efficient classification model with a hatchability rate greater than 90%. In this study, seven extrinsic parameters were considered: egg weight, moisture loss, breeders age, number of fertilised eggs, shell width, shell length, and shell thickness. Multiple linear regression was performed to determine the most influencing variable on hatchability. First, the correlation between each parameter and hatchability were checked. Then a multiple regression model was developed, and the accuracy of the fitted model was evaluated. Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), k-Nearest Neighbors (kNN), Support Vector Machines (SVM) with a linear kernel, and Random Forest (RF) algorithms were applied to classify the hatchability. This grouping process was conducted using binary classification techniques. Hatchability was negatively correlated with egg weight, breeders' age, shell width, shell length, and positive correlations were identified with moisture loss, number of fertilised eggs, and shell thickness. Multiple linear regression models were more accurate than single linear models regarding the highest coefficient of determination (R²) with 94% and minimum AIC and BIC values. According to the classification results, RF, CART, and kNN had performed the highest accuracy values 0.99, 0.975, and 0.972, respectively, for the commercial hatchery process. Therefore, the RF is the most appropriate machine learning algorithm for classifying the breeder outcomes, which are economically profitable or not, in a commercial hatchery.Keywords: classification models, egg weight, fertilised eggs, multiple linear regression
Procedia PDF Downloads 8718648 Forecasting Equity Premium Out-of-Sample with Sophisticated Regression Training Techniques
Authors: Jonathan Iworiso
Abstract:
Forecasting the equity premium out-of-sample is a major concern to researchers in finance and emerging markets. The quest for a superior model that can forecast the equity premium with significant economic gains has resulted in several controversies on the choice of variables and suitable techniques among scholars. This research focuses mainly on the application of Regression Training (RT) techniques to forecast monthly equity premium out-of-sample recursively with an expanding window method. A broad category of sophisticated regression models involving model complexity was employed. The RT models include Ridge, Forward-Backward (FOBA) Ridge, Least Absolute Shrinkage and Selection Operator (LASSO), Relaxed LASSO, Elastic Net, and Least Angle Regression were trained and used to forecast the equity premium out-of-sample. In this study, the empirical investigation of the RT models demonstrates significant evidence of equity premium predictability both statistically and economically relative to the benchmark historical average, delivering significant utility gains. They seek to provide meaningful economic information on mean-variance portfolio investment for investors who are timing the market to earn future gains at minimal risk. Thus, the forecasting models appeared to guarantee an investor in a market setting who optimally reallocates a monthly portfolio between equities and risk-free treasury bills using equity premium forecasts at minimal risk.Keywords: regression training, out-of-sample forecasts, expanding window, statistical predictability, economic significance, utility gains
Procedia PDF Downloads 10718647 Employee Aggression, Labeling and Emotional Intelligence
Authors: Martin Popescu D. Dana Maria
Abstract:
The aims of this research are to broaden the study on the relationship between emotional intelligence and counterproductive work behavior (CWB). The study sample consisted in 441 Romanian employees from companies all over the country. Data has been collected through web surveys and processed with SPSS. The results indicated an average correlation between the two constructs and their sub variables, employees with a high level of emotional intelligence tend to be less aggressive. In addition, labeling was considered an individual difference which has the power to influence the level of employee aggression. A regression model was used to underline the importance of emotional intelligence together with labeling as predictors of CWB. Results have shown that this regression model enforces the assumption that labeling and emotional intelligence, taken together, predict CWB. Employees, who label themselves as victims and have a low degree of emotional intelligence, have a higher level of CWB.Keywords: aggression, CWB, emotional intelligence, labeling
Procedia PDF Downloads 47318646 Applying the Regression Technique for Prediction of the Acute Heart Attack
Authors: Paria Soleimani, Arezoo Neshati
Abstract:
Myocardial infarction is one of the leading causes of death in the world. Some of these deaths occur even before the patient reaches the hospital. Myocardial infarction occurs as a result of impaired blood supply. Because the most of these deaths are due to coronary artery disease, hence the awareness of the warning signs of a heart attack is essential. Some heart attacks are sudden and intense, but most of them start slowly, with mild pain or discomfort, then early detection and successful treatment of these symptoms is vital to save them. Therefore, importance and usefulness of a system designing to assist physicians in the early diagnosis of the acute heart attacks is obvious. The purpose of this study is to determine how well a predictive model would perform based on the only patient-reportable clinical history factors, without using diagnostic tests or physical exams. This type of the prediction model might have application outside of the hospital setting to give accurate advice to patients to influence them to seek care in appropriate situations. For this purpose, the data were collected on 711 heart patients in Iran hospitals. 28 attributes of clinical factors can be reported by patients; were studied. Three logistic regression models were made on the basis of the 28 features to predict the risk of heart attacks. The best logistic regression model in terms of performance had a C-index of 0.955 and with an accuracy of 94.9%. The variables, severe chest pain, back pain, cold sweats, shortness of breath, nausea, and vomiting were selected as the main features.Keywords: Coronary heart disease, Acute heart attacks, Prediction, Logistic regression
Procedia PDF Downloads 44918645 Modelling Agricultural Commodity Price Volatility with Markov-Switching Regression, Single Regime GARCH and Markov-Switching GARCH Models: Empirical Evidence from South Africa
Authors: Yegnanew A. Shiferaw
Abstract:
Background: commodity price volatility originating from excessive commodity price fluctuation has been a global problem especially after the recent financial crises. Volatility is a measure of risk or uncertainty in financial analysis. It plays a vital role in risk management, portfolio management, and pricing equity. Objectives: the core objective of this paper is to examine the relationship between the prices of agricultural commodities with oil price, gas price, coal price and exchange rate (USD/Rand). In addition, the paper tries to fit an appropriate model that best describes the log return price volatility and estimate Value-at-Risk and expected shortfall. Data and methods: the data used in this study are the daily returns of agricultural commodity prices from 02 January 2007 to 31st October 2016. The data sets consists of the daily returns of agricultural commodity prices namely: white maize, yellow maize, wheat, sunflower, soya, corn, and sorghum. The paper applies the three-state Markov-switching (MS) regression, the standard single-regime GARCH and the two regime Markov-switching GARCH (MS-GARCH) models. Results: to choose the best fit model, the log-likelihood function, Akaike information criterion (AIC), Bayesian information criterion (BIC) and deviance information criterion (DIC) are employed under three distributions for innovations. The results indicate that: (i) the price of agricultural commodities was found to be significantly associated with the price of coal, price of natural gas, price of oil and exchange rate, (ii) for all agricultural commodities except sunflower, k=3 had higher log-likelihood values and lower AIC and BIC values. Thus, the three-state MS regression model outperformed the two-state MS regression model (iii) MS-GARCH(1,1) with generalized error distribution (ged) innovation performs best for white maize and yellow maize; MS-GARCH(1,1) with student-t distribution (std) innovation performs better for sorghum; MS-gjrGARCH(1,1) with ged innovation performs better for wheat, sunflower and soya and MS-GARCH(1,1) with std innovation performs better for corn. In conclusion, this paper provided a practical guide for modelling agricultural commodity prices by MS regression and MS-GARCH processes. This paper can be good as a reference when facing modelling agricultural commodity price problems.Keywords: commodity prices, MS-GARCH model, MS regression model, South Africa, volatility
Procedia PDF Downloads 20218644 Institutional Capacity and Corruption: Evidence from Brazil
Authors: Dalson Figueiredo, Enivaldo Rocha, Ranulfo Paranhos, José Alexandre
Abstract:
This paper analyzes the effects of institutional capacity on corruption. Methodologically, the research design combines both descriptive and multivariate statistics to examine two original datasets based on secondary data. In particular, we employ a principal component model to estimate an indicator of institutional capacity for both state audit institutions and subnational judiciary courts. Then, we estimate the effect of institutional capacity on two dependent variables: (1) incidence of administrative irregularities and (2) time elapsed to judge corruption cases. The preliminary results using ordinary least squares, negative binomial and Tobit models suggest the same conclusions: higher the institutional audit capacity, higher is the probability of detecting a corruption case. On the other hand, higher the institutional capacity of state judiciary, the lower is the time to judge corruption cases.Keywords: institutional capacity, corruption, state level institutions, evidence from Brazil
Procedia PDF Downloads 37218643 Non-Parametric Regression over Its Parametric Couterparts with Large Sample Size
Authors: Jude Opara, Esemokumo Perewarebo Akpos
Abstract:
This paper is on non-parametric linear regression over its parametric counterparts with large sample size. Data set on anthropometric measurement of primary school pupils was taken for the analysis. The study used 50 randomly selected pupils for the study. The set of data was subjected to normality test, and it was discovered that the residuals are not normally distributed (i.e. they do not follow a Gaussian distribution) for the commonly used least squares regression method for fitting an equation into a set of (x,y)-data points using the Anderson-Darling technique. The algorithms for the nonparametric Theil’s regression are stated in this paper as well as its parametric OLS counterpart. The use of a programming language software known as “R Development” was used in this paper. From the analysis, the result showed that there exists a significant relationship between the response and the explanatory variable for both the parametric and non-parametric regression. To know the efficiency of one method over the other, the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) are used, and it is discovered that the nonparametric regression performs better than its parametric regression counterparts due to their lower values in both the AIC and BIC. The study however recommends that future researchers should study a similar work by examining the presence of outliers in the data set, and probably expunge it if detected and re-analyze to compare results.Keywords: Theil’s regression, Bayesian information criterion, Akaike information criterion, OLS
Procedia PDF Downloads 30518642 Efficiency of Secondary Schools by ICT Intervention in Sylhet Division of Bangladesh
Authors: Azizul Baten, Kamrul Hossain, Abdullah-Al-Zabir
Abstract:
The objective of this study is to develop an appropriate stochastic frontier secondary schools efficiency model by ICT Intervention and to examine the impact of ICT challenges on secondary schools efficiency in the Sylhet division in Bangladesh using stochastic frontier analysis. The Translog stochastic frontier model was found an appropriate than the Cobb-Douglas model in secondary schools efficiency by ICT Intervention. Based on the results of the Cobb-Douglas model, it is found that the coefficient of the number of teachers, the number of students, and teaching ability had a positive effect on increasing the level of efficiency. It indicated that these are related to technical efficiency. In the case of inefficiency effects for both Cobb-Douglas and Translog models, the coefficient of the ICT lab decreased secondary school inefficiency, but the online class in school was found to increase the level of inefficiency. The coefficients of teacher’s preference for ICT tools like multimedia projectors played a contributor role in decreasing the secondary school inefficiency in the Sylhet division of Bangladesh. The interaction effects of the number of teachers and the classrooms, and the number of students and the number of classrooms, the number of students and teaching ability, and the classrooms and teaching ability of the teachers were recorded with the positive values and these have a positive impact on increasing the secondary school efficiency. The overall mean efficiency of urban secondary schools was found at 84.66% for the Translog model, while it was 83.63% for the Cobb-Douglas model. The overall mean efficiency of rural secondary schools was found at 80.98% for the Translog model, while it was 81.24% for the Cobb-Douglas model. So, the urban secondary schools performed better than the rural secondary schools in the Sylhet division. It is observed from the results of the Tobit model that the teacher-student ratio had a positive influence on secondary school efficiency. The teaching experiences of those who have 1 to 5 years and 10 years above, MPO type school, conventional teaching method have had a negative and significant influence on secondary school efficiency. The estimated value of σ-square (0.0625) was different from Zero, indicating a good fit. The value of γ (0.9872) was recorded as positive and it can be interpreted as follows: 98.72 percent of random variation around in secondary school outcomes due to inefficiency.Keywords: efficiency, secondary schools, ICT, stochastic frontier analysis
Procedia PDF Downloads 15118641 Association Between Short-term NOx Exposure and Asthma Exacerbations in East London: A Time Series Regression Model
Authors: Hajar Hajmohammadi, Paul Pfeffer, Anna De Simoni, Jim Cole, Chris Griffiths, Sally Hull, Benjamin Heydecker
Abstract:
Background: There is strong interest in the relationship between short-term air pollution exposure and human health. Most studies in this field focus on serious health effects such as death or hospital admission, but air pollution exposure affects many people with less severe impacts, such as exacerbations of respiratory conditions. A lack of quantitative analysis and inconsistent findings suggest improved methodology is needed to understand these effectsmore fully. Method: We developed a time series regression model to quantify the relationship between daily NOₓ concentration and Asthma exacerbations requiring oral steroids from primary care settings. Explanatory variables include daily NOₓ concentration measurements extracted from 8 available background and roadside monitoring stations in east London and daily ambient temperature extracted for London City Airport, located in east London. Lags of NOx concentrations up to 21 days (3 weeks) were used in the model. The dependent variable was the daily number of oral steroid courses prescribed for GP registered patients with asthma in east London. A mixed distribution model was then fitted to the significant lags of the regression model. Result: Results of the time series modelling showed a significant relationship between NOₓconcentrations on each day and the number of oral steroid courses prescribed in the following three weeks. In addition, the model using only roadside stations performs better than the model with a mixture of roadside and background stations.Keywords: air pollution, time series modeling, public health, road transport
Procedia PDF Downloads 14218640 Statistical Analysis with Prediction Models of User Satisfaction in Software Project Factors
Authors: Katawut Kaewbanjong
Abstract:
We analyzed a volume of data and found significant user satisfaction in software project factors. A statistical significance analysis (logistic regression) and collinearity analysis determined the significance factors from a group of 71 pre-defined factors from 191 software projects in ISBSG Release 12. The eight prediction models used for testing the prediction potential of these factors were Neural network, k-NN, Naïve Bayes, Random forest, Decision tree, Gradient boosted tree, linear regression and logistic regression prediction model. Fifteen pre-defined factors were truly significant in predicting user satisfaction, and they provided 82.71% prediction accuracy when used with a neural network prediction model. These factors were client-server, personnel changes, total defects delivered, project inactive time, industry sector, application type, development type, how methodology was acquired, development techniques, decision making process, intended market, size estimate approach, size estimate method, cost recording method, and effort estimate method. These findings may benefit software development managers considerably.Keywords: prediction model, statistical analysis, software project, user satisfaction factor
Procedia PDF Downloads 12418639 The Strengths and Limitations of the Statistical Modeling of Complex Social Phenomenon: Focusing on SEM, Path Analysis, or Multiple Regression Models
Authors: Jihye Jeon
Abstract:
This paper analyzes the conceptual framework of three statistical methods, multiple regression, path analysis, and structural equation models. When establishing research model of the statistical modeling of complex social phenomenon, it is important to know the strengths and limitations of three statistical models. This study explored the character, strength, and limitation of each modeling and suggested some strategies for accurate explaining or predicting the causal relationships among variables. Especially, on the studying of depression or mental health, the common mistakes of research modeling were discussed.Keywords: multiple regression, path analysis, structural equation models, statistical modeling, social and psychological phenomenon
Procedia PDF Downloads 65218638 An Assessment into the Drift in Direction of International Migration of Labor: Changing Aspirations for Religiosity and Cultural Assimilation
Authors: Syed Toqueer Akhter, Rabia Zulfiqar
Abstract:
This paper attempts to trace the determining factor- as far as individual preferences and expectations are concerned- of what causes the direction of international migration to drift in certain ways owing to factors such as Religiosity and Cultural Assimilation. The narrative on migration has graduated from the age long ‘push/pull’ debate to that of complex factors that may vary across each individual. We explore the longstanding factor of religiosity widely acknowledged in mentioned literature as a key variable in the assessment of migration, wherein the impact of religiosity in the form of a drift into the intent of migration has been analyzed. A more conventional factor cultural assimilation is used in a contemporary way to estimate how it plays a role in affecting the drift in direction. In particular what our research aims at achieving is to isolate the effect our key variables: Cultural Assimilation and Religiosity have on direction of migration, and to explore how they interplay as a composite unit- and how we may be able to justify the change in behavior displayed by these key variables. In order to establish a true sense of what drives individual choices we employ the method of survey research and use a questionnaire to conduct primary research. The questionnaire was divided into six sections covering subjects including household characteristics, perceptions and inclinations of the respondents relevant to our study. Religiosity was quantified using a proxy of Migration Network that utilized secondary data to estimate religious hubs in recipient countries. To estimate the relationship between Intent of Migration and its variants three competing econometric models namely: the Ordered Probit Model, the Ordered Logit Model and the Tobit Model were employed. For every model that included our key variables, a highly significant relationship with the intent of migration was estimated.Keywords: international migration, drift in direction, cultural assimilation, religiosity, ordered probit model
Procedia PDF Downloads 30718637 Predicting Options Prices Using Machine Learning
Authors: Krishang Surapaneni
Abstract:
The goal of this project is to determine how to predict important aspects of options, including the ask price. We want to compare different machine learning models to learn the best model and the best hyperparameters for that model for this purpose and data set. Option pricing is a relatively new field, and it can be very complicated and intimidating, especially to inexperienced people, so we want to create a machine learning model that can predict important aspects of an option stock, which can aid in future research. We tested multiple different models and experimented with hyperparameter tuning, trying to find some of the best parameters for a machine-learning model. We tested three different models: a Random Forest Regressor, a linear regressor, and an MLP (multi-layer perceptron) regressor. The most important feature in this experiment is the ask price; this is what we were trying to predict. In the field of stock pricing prediction, there is a large potential for error, so we are unable to determine the accuracy of the models based on if they predict the pricing perfectly. Due to this factor, we determined the accuracy of the model by finding the average percentage difference between the predicted and actual values. We tested the accuracy of the machine learning models by comparing the actual results in the testing data and the predictions made by the models. The linear regression model performed worst, with an average percentage error of 17.46%. The MLP regressor had an average percentage error of 11.45%, and the random forest regressor had an average percentage error of 7.42%Keywords: finance, linear regression model, machine learning model, neural network, stock price
Procedia PDF Downloads 7518636 Estimating Anthropometric Dimensions for Saudi Males Using Artificial Neural Networks
Authors: Waleed Basuliman
Abstract:
Anthropometric dimensions are considered one of the important factors when designing human-machine systems. In this study, the estimation of anthropometric dimensions has been improved by using Artificial Neural Network (ANN) model that is able to predict the anthropometric measurements of Saudi males in Riyadh City. A total of 1427 Saudi males aged 6 to 60 years participated in measuring 20 anthropometric dimensions. These anthropometric measurements are considered important for designing the work and life applications in Saudi Arabia. The data were collected during eight months from different locations in Riyadh City. Five of these dimensions were used as predictors variables (inputs) of the model, and the remaining 15 dimensions were set to be the measured variables (Model’s outcomes). The hidden layers varied during the structuring stage, and the best performance was achieved with the network structure 6-25-15. The results showed that the developed Neural Network model was able to estimate the body dimensions of Saudi male population in Riyadh City. The network's mean absolute percentage error (MAPE) and the root mean squared error (RMSE) were found to be 0.0348 and 3.225, respectively. These results were found less, and then better, than the errors found in the literature. Finally, the accuracy of the developed neural network was evaluated by comparing the predicted outcomes with regression model. The ANN model showed higher coefficient of determination (R2) between the predicted and actual dimensions than the regression model.Keywords: artificial neural network, anthropometric measurements, back-propagation
Procedia PDF Downloads 48718635 Heart Attack Prediction Using Several Machine Learning Methods
Authors: Suzan Anwar, Utkarsh Goyal
Abstract:
Heart rate (HR) is a predictor of cardiovascular, cerebrovascular, and all-cause mortality in the general population, as well as in patients with cardio and cerebrovascular diseases. Machine learning (ML) significantly improves the accuracy of cardiovascular risk prediction, increasing the number of patients identified who could benefit from preventive treatment while avoiding unnecessary treatment of others. This research examines relationship between the individual's various heart health inputs like age, sex, cp, trestbps, thalach, oldpeaketc, and the likelihood of developing heart disease. Machine learning techniques like logistic regression and decision tree, and Python are used. The results of testing and evaluating the model using the Heart Failure Prediction Dataset show the chance of a person having a heart disease with variable accuracy. Logistic regression has yielded an accuracy of 80.48% without data handling. With data handling (normalization, standardscaler), the logistic regression resulted in improved accuracy of 87.80%, decision tree 100%, random forest 100%, and SVM 100%.Keywords: heart rate, machine learning, SVM, decision tree, logistic regression, random forest
Procedia PDF Downloads 13818634 Policy Implications of Demographic Impacts on COVID-19, Pneumonia, and Influenza Mortality: A Multivariable Regression Approach to Death Toll Reduction
Authors: Saiakhil Chilaka
Abstract:
Understanding the demographic factors that influence mortality from respiratory diseases like COVID-19, pneumonia, and influenza is crucial for informing public health policy. This study utilizes multivariable regression models to assess the relationship between state, sex, and age group on deaths from these diseases using U.S. data from 2020 to 2023. The analysis reveals that age and sex play significant roles in mortality, while state-level variations are minimal. Although the model’s low R-squared values indicate that additional factors are at play, this paper discusses how these findings, in light of recent research, can inform future public health policy, resource allocation, and intervention strategies.Keywords: COVID-19, multivariable regression, public policy, data science
Procedia PDF Downloads 2118633 Multi-Linear Regression Based Prediction of Mass Transfer by Multiple Plunging Jets
Abstract:
The paper aims to compare the performance of vertical and inclined multiple plunging jets and to model and predict their mass transfer capacity by multi-linear regression based approach. The multiple vertical plunging jets have jet impact angle of θ = 90O; whereas, multiple inclined plunging jets have jet impact angle of θ = 600. The results of the study suggests that mass transfer is higher for multiple jets, and inclined multiple plunging jets have up to 1.6 times higher mass transfer than vertical multiple plunging jets under similar conditions. The derived relationship, based on multi-linear regression approach, has successfully predicted the volumetric mass transfer coefficient (KLa) from operational parameters of multiple plunging jets with a correlation coefficient of 0.973, root mean square error of 0.002 and coefficient of determination of 0.946. The results suggests that predicted overall mass transfer coefficient is in good agreement with actual experimental values; thereby suggesting the utility of derived relationship based on multi-linear regression based approach and can be successfully employed in modelling mass transfer by multiple plunging jets.Keywords: mass transfer, multiple plunging jets, multi-linear regression, earth sciences
Procedia PDF Downloads 46118632 Robust Shrinkage Principal Component Parameter Estimator for Combating Multicollinearity and Outliers’ Problems in a Poisson Regression Model
Authors: Arum Kingsley Chinedu, Ugwuowo Fidelis Ifeanyi, Oranye Henrietta Ebele
Abstract:
The Poisson regression model (PRM) is a nonlinear model that belongs to the exponential family of distribution. PRM is suitable for studying count variables using appropriate covariates and sometimes experiences the problem of multicollinearity in the explanatory variables and outliers on the response variable. This study aims to address the problem of multicollinearity and outliers jointly in a Poisson regression model. We developed an estimator called the robust modified jackknife PCKL parameter estimator by combining the principal component estimator, modified jackknife KL and transformed M-estimator estimator to address both problems in a PRM. The superiority conditions for this estimator were established, and the properties of the estimator were also derived. The estimator inherits the characteristics of the combined estimators, thereby making it efficient in addressing both problems. And will also be of immediate interest to the research community and advance this study in terms of novelty compared to other studies undertaken in this area. The performance of the estimator (robust modified jackknife PCKL) with other existing estimators was compared using mean squared error (MSE) as a performance evaluation criterion through a Monte Carlo simulation study and the use of real-life data. The results of the analytical study show that the estimator outperformed other existing estimators compared with by having the smallest MSE across all sample sizes, different levels of correlation, percentages of outliers and different numbers of explanatory variables.Keywords: jackknife modified KL, outliers, multicollinearity, principal component, transformed M-estimator.
Procedia PDF Downloads 6618631 Factors Related with Self-Care Behaviors among Iranian Type 2 Diabetic Patients: An Application of Health Belief Model
Authors: Ali Soroush, Mehdi Mirzaei Alavijeh, Touraj Ahmadi Jouybari, Fazel Zinat-Motlagh, Abbas Aghaei, Mari Ataee
Abstract:
Diabetes is a disease with long cardiovascular, renal, ophthalmic and neural complications. It is prevalent all around the world including Iran, and its prevalence is increasing. The aim of this study was to determine the factors related to self-care behavior based on health belief model among sample of Iranian diabetic patients. This cross-sectional study was conducted among 301 type 2 diabetic patients in Gachsaran, Iran. Data collection was based on an interview and the data were analyzed by SPSS version 20 using ANOVA, t-tests, Pearson correlation, and linear regression statistical tests at 95% significant level. Linear regression analyses showed the health belief model variables accounted for 29% of the variation in self-care behavior; and perceived severity and perceived self-efficacy are more influential predictors on self-care behavior among diabetic patients.Keywords: diabetes, patients, self-care behaviors, health belief model
Procedia PDF Downloads 46818630 Copula-Based Estimation of Direct and Indirect Effects in Path Analysis Models
Authors: Alam Ali, Ashok Kumar Pathak
Abstract:
Path analysis is a statistical technique used to evaluate the direct and indirect effects of variables in path models. One or more structural regression equations are used to estimate a series of parameters in path models to find the better fit of data. However, sometimes the assumptions of classical regression models, such as ordinary least squares (OLS), are violated by the nature of the data, resulting in insignificant direct and indirect effects of exogenous variables. This article aims to explore the effectiveness of a copula-based regression approach as an alternative to classical regression, specifically when variables are linked through an elliptical copula.Keywords: path analysis, copula-based regression models, direct and indirect effects, k-fold cross validation technique
Procedia PDF Downloads 4118629 Factors for Entry Timing Choices Using Principal Axis Factorial Analysis and Logistic Regression Model
Authors: C. M. Mat Isa, H. Mohd Saman, S. R. Mohd Nasir, A. Jaapar
Abstract:
International market expansion involves a strategic process of market entry decision through which a firm expands its operation from domestic to the international domain. Hence, entry timing choices require the needs to balance the early entry risks and the problems in losing opportunities as a result of late entry into a new market. Questionnaire surveys administered to 115 Malaysian construction firms operating in 51 countries worldwide have resulted in 39.1 percent response rate. Factor analysis was used to determine the most significant factors affecting entry timing choices of the firms to penetrate the international market. A logistic regression analysis used to examine the firms’ entry timing choices, indicates that the model has correctly classified 89.5 per cent of cases as late movers. The findings reveal that the most significant factor influencing the construction firms’ choices as late movers was the firm factor related to the firm’s international experience, resources, competencies and financing capacity. The study also offers valuable information to construction firms with intention to internationalize their businesses.Keywords: factors, early movers, entry timing choices, late movers, logistic regression model, principal axis factorial analysis, Malaysian construction firms
Procedia PDF Downloads 37818628 A Regression Model for Predicting Sugar Crystal Size in a Fed-Batch Vacuum Evaporative Crystallizer
Authors: Sunday B. Alabi, Edikan P. Felix, Aniediong M. Umo
Abstract:
Crystal size distribution is of great importance in the sugar factories. It determines the market value of granulated sugar and also influences the cost of production of sugar crystals. Typically, sugar is produced using fed-batch vacuum evaporative crystallizer. The crystallization quality is examined by crystal size distribution at the end of the process which is quantified by two parameters: the average crystal size of the distribution in the mean aperture (MA) and the width of the distribution of the coefficient of variation (CV). Lack of real-time measurement of the sugar crystal size hinders its feedback control and eventual optimisation of the crystallization process. An attractive alternative is to use a soft sensor (model-based method) for online estimation of the sugar crystal size. Unfortunately, the available models for sugar crystallization process are not suitable as they do not contain variables that can be measured easily online. The main contribution of this paper is the development of a regression model for estimating the sugar crystal size as a function of input variables which are easy to measure online. This has the potential to provide real-time estimates of crystal size for its effective feedback control. Using 7 input variables namely: initial crystal size (Lo), temperature (T), vacuum pressure (P), feed flowrate (Ff), steam flowrate (Fs), initial super-saturation (S0) and crystallization time (t), preliminary studies were carried out using Minitab 14 statistical software. Based on the existing sugar crystallizer models, and the typical ranges of these 7 input variables, 128 datasets were obtained from a 2-level factorial experimental design. These datasets were used to obtain a simple but online-implementable 6-input crystal size model. It seems the initial crystal size (Lₒ) does not play a significant role. The goodness of the resulting regression model was evaluated. The coefficient of determination, R² was obtained as 0.994, and the maximum absolute relative error (MARE) was obtained as 4.6%. The high R² (~1.0) and the reasonably low MARE values are an indication that the model is able to predict sugar crystal size accurately as a function of the 6 easy-to-measure online variables. Thus, the model can be used as a soft sensor to provide real-time estimates of sugar crystal size during sugar crystallization process in a fed-batch vacuum evaporative crystallizer.Keywords: crystal size, regression model, soft sensor, sugar, vacuum evaporative crystallizer
Procedia PDF Downloads 20818627 Economic Analysis of Cowpea (Unguiculata spp) Production in Northern Nigeria: A Case Study of Kano Katsina and Jigawa States
Authors: Yakubu Suleiman, S. A. Musa
Abstract:
Nigeria is the largest cowpea producer in the world, accounting for about 45%, followed by Brazil with about 17%. Cowpea is grown in Kano, Bauchi, Katsina, Borno in the north, Oyo in the west, and to the lesser extent in Enugu in the east. This study was conducted to determine the input–output relationship of Cowpea production in Kano, Katsina, and Jigawa states of Nigeria. The data were collected with the aid of 1000 structured questionnaires that were randomly distributed to Cowpea farmers in the three states mentioned above of the study area. The data collected were analyzed using regression analysis (Cobb–Douglass production function model). The result of the regression analysis revealed the coefficient of multiple determinations, R2, to be 72.5% and the F ration to be 106.20 and was found to be significant (P < 0.01). The regression coefficient of constant is 0.5382 and is significant (P < 0.01). The regression coefficient with respect to labor and seeds were 0.65554 and 0.4336, respectively, and they are highly significant (P < 0.01). The regression coefficient with respect to fertilizer is 0.26341 which is significant (P < 0.05). This implies that a unit increase of any one of the variable inputs used while holding all other variables inputs constants, will significantly increase the total Cowpea output by their corresponding coefficient. This indicated that farmers in the study area are operating in stage II of the production function. The result revealed that Cowpea farmer in Kano, Jigawa and Katsina States realized a profit of N15,997, N34,016 and N19,788 per hectare respectively. It is hereby recommended that more attention should be given to Cowpea production by government and research institutions.Keywords: coefficient, constant, inputs, regression
Procedia PDF Downloads 410