Search results for: least squares regression
3341 Vertical and Horizontal Mismatches in Thailand and the Wage Penalty
Authors: Potjana Chunthanom
Abstract:
The Thai labor market experiences increasing challenges due to disruptive technologies and demographic shifts, including the COVID-19 pandemic. Consequently, there is a widening gap between the skills that firms seek and the skills that employees possess. This study aims to examine the incidence of vertical and horizontal mismatches and their impact on wages in Thailand before and during the COVID-19 pandemic using data from the third quarter of 2018 to 2021 from Thailand's National Labor Force Survey. This paper applies three methods: ordinary least squares (OLS), pooled ordinary least squares (Pooled OLS), and counterfactual decomposition. The findings suggest that the incidence of overeducation and field-of-study mismatch continues to increase during the COVID-19 pandemic in comparison to the two previous years. In contrast, there is a notable decline in the percentage of undereducated workers during the same period. Additionally, overeducated workers earn wage premiums, whereas undereducated and horizontally mismatched workers face wage penalties. The result also indicates that the COVID-19 pandemic has significant negative (positive) effects on overeducated (undereducated) workers.Keywords: COVID-19, horizontal mismatch, overeducation, undereducation
Procedia PDF Downloads 373340 Generalized Additive Model for Estimating Propensity Score
Authors: Tahmidul Islam
Abstract:
Propensity Score Matching (PSM) technique has been widely used for estimating causal effect of treatment in observational studies. One major step of implementing PSM is estimating the propensity score (PS). Logistic regression model with additive linear terms of covariates is most used technique in many studies. Logistics regression model is also used with cubic splines for retaining flexibility in the model. However, choosing the functional form of the logistic regression model has been a question since the effectiveness of PSM depends on how accurately the PS been estimated. In many situations, the linearity assumption of linear logistic regression may not hold and non-linear relation between the logit and the covariates may be appropriate. One can estimate PS using machine learning techniques such as random forest, neural network etc for more accuracy in non-linear situation. In this study, an attempt has been made to compare the efficacy of Generalized Additive Model (GAM) in various linear and non-linear settings and compare its performance with usual logistic regression. GAM is a non-parametric technique where functional form of the covariates can be unspecified and a flexible regression model can be fitted. In this study various simple and complex models have been considered for treatment under several situations (small/large sample, low/high number of treatment units) and examined which method leads to more covariate balance in the matched dataset. It is found that logistic regression model is impressively robust against inclusion quadratic and interaction terms and reduces mean difference in treatment and control set equally efficiently as GAM does. GAM provided no significantly better covariate balance than logistic regression in both simple and complex models. The analysis also suggests that larger proportion of controls than treatment units leads to better balance for both of the methods.Keywords: accuracy, covariate balances, generalized additive model, logistic regression, non-linearity, propensity score matching
Procedia PDF Downloads 3653339 Econometric Analysis of Organic Vegetable Production in Turkey
Authors: Ersin Karakaya, Halit Tutar
Abstract:
Reliable foods must be consumed in terms of healthy nutrition. The production and dissemination of diatom products in Turkey is rapidly evolving on the basis of preserving ecological balance, ensuring sustainability in agriculture and offering quality, reliable products to consumers. In this study, year in Turkey as (2002- 2015) to determine values of such as cultivated land of organic vegetable production, production levels, production quantity, number of products, number of farmers. It is intended to make the econometric analysis of the factors affecting the production of organic vegetable production (Number of products, Number of farmers and cultivated land). The main material of the study has created secondary data in relation to the 2002-2015 period as organic vegetable production in Turkey and regression analysis of the factors affecting the value of production of organic vegetable is determined by the Least Squares Method with EViews statistical software package.Keywords: number of farmers, cultivated land, Eviews, Turkey
Procedia PDF Downloads 3053338 A Comparison of Neural Network and DOE-Regression Analysis for Predicting Resource Consumption of Manufacturing Processes
Authors: Frank Kuebler, Rolf Steinhilper
Abstract:
Artificial neural networks (ANN) as well as Design of Experiments (DOE) based regression analysis (RA) are mainly used for modeling of complex systems. Both methodologies are commonly applied in process and quality control of manufacturing processes. Due to the fact that resource efficiency has become a critical concern for manufacturing companies, these models needs to be extended to predict resource-consumption of manufacturing processes. This paper describes an approach to use neural networks as well as DOE based regression analysis for predicting resource consumption of manufacturing processes and gives a comparison of the achievable results based on an industrial case study of a turning process.Keywords: artificial neural network, design of experiments, regression analysis, resource efficiency, manufacturing process
Procedia PDF Downloads 5233337 Logistic Regression Model versus Additive Model for Recurrent Event Data
Authors: Entisar A. Elgmati
Abstract:
Recurrent infant diarrhea is studied using daily data collected in Salvador, Brazil over one year and three months. A logistic regression model is fitted instead of Aalen's additive model using the same covariates that were used in the analysis with the additive model. The model gives reasonably similar results to that using additive regression model. In addition, the problem with the estimated conditional probabilities not being constrained between zero and one in additive model is solved here. Also martingale residuals that have been used to judge the goodness of fit for the additive model are shown to be useful for judging the goodness of fit of the logistic model.Keywords: additive model, cumulative probabilities, infant diarrhoea, recurrent event
Procedia PDF Downloads 6333336 Identifying Factors Contributing to the Spread of Lyme Disease: A Regression Analysis of Virginia’s Data
Authors: Fatemeh Valizadeh Gamchi, Edward L. Boone
Abstract:
This research focuses on Lyme disease, a widespread infectious condition in the United States caused by the bacterium Borrelia burgdorferi sensu stricto. It is critical to identify environmental and economic elements that are contributing to the spread of the disease. This study examined data from Virginia to identify a subset of explanatory variables significant for Lyme disease case numbers. To identify relevant variables and avoid overfitting, linear poisson, and regularization regression methods such as a ridge, lasso, and elastic net penalty were employed. Cross-validation was performed to acquire tuning parameters. The methods proposed can automatically identify relevant disease count covariates. The efficacy of the techniques was assessed using four criteria on three simulated datasets. Finally, using the Virginia Department of Health’s Lyme disease data set, the study successfully identified key factors, and the results were consistent with previous studies.Keywords: lyme disease, Poisson generalized linear model, ridge regression, lasso regression, elastic net regression
Procedia PDF Downloads 1353335 An Analysis of the Effect of Sharia Financing and Work Relation Founding towards Non-Performing Financing in Islamic Banks in Indonesia
Authors: Muhammad Bahrul Ilmi
Abstract:
The purpose of this research is to analyze the influence of Islamic financing and work relation founding simultaneously and partially towards non-performing financing in Islamic banks. This research was regression quantitative field research, and had been done in Muammalat Indonesia Bank and Islamic Danamon Bank in 3 months. The populations of this research were 15 account officers of Muammalat Indonesia Bank and Islamic Danamon Bank in Surakarta, Indonesia. The techniques of collecting data used in this research were documentation, questionnaire, literary study and interview. Regression analysis result shows that Islamic financing and work relation founding simultaneously has positive and significant effect towards non performing financing of two Islamic Banks. It is obtained with probability value 0.003 which is less than 0.05 and F value 9.584. The analysis result of Islamic financing regression towards non performing financing shows the significant effect. It is supported by double linear regression analysis with probability value 0.001 which is less than 0.05. The regression analysis of work relation founding effect towards non-performing financing shows insignificant effect. This is shown in the double linear regression analysis with probability value 0.161 which is bigger than 0.05.Keywords: Syariah financing, work relation founding, non-performing financing (NPF), Islamic Bank
Procedia PDF Downloads 4303334 A Kolmogorov-Smirnov Type Goodness-Of-Fit Test of Multinomial Logistic Regression Model in Case-Control Studies
Authors: Chen Li-Ching
Abstract:
The multinomial logistic regression model is used popularly for inferring the relationship of risk factors and disease with multiple categories. This study based on the discrepancy between the nonparametric maximum likelihood estimator and semiparametric maximum likelihood estimator of the cumulative distribution function to propose a Kolmogorov-Smirnov type test statistic to assess adequacy of the multinomial logistic regression model for case-control data. A bootstrap procedure is presented to calculate the critical value of the proposed test statistic. Empirical type I error rates and powers of the test are performed by simulation studies. Some examples will be illustrated the implementation of the test.Keywords: case-control studies, goodness-of-fit test, Kolmogorov-Smirnov test, multinomial logistic regression
Procedia PDF Downloads 4553333 A Study on Inference from Distance Variables in Hedonic Regression
Authors: Yan Wang, Yasushi Asami, Yukio Sadahiro
Abstract:
In urban area, several landmarks may affect housing price and rents, hedonic analysis should employ distance variables corresponding to each landmarks. Unfortunately, the effects of distances to landmarks on housing prices are generally not consistent with the true price. These distance variables may cause magnitude error in regression, pointing a problem of spatial multicollinearity. In this paper, we provided some approaches for getting the samples with less bias and method on locating the specific sampling area to avoid the multicollinerity problem in two specific landmarks case.Keywords: landmarks, hedonic regression, distance variables, collinearity, multicollinerity
Procedia PDF Downloads 4513332 Forecasting of Grape Juice Flavor by Using Support Vector Regression
Authors: Ren-Jieh Kuo, Chun-Shou Huang
Abstract:
The research of juice flavor forecasting has become more important in China. Due to the fast economic growth in China, many different kinds of juices have been introduced to the market. If a beverage company can understand their customers’ preference well, the juice can be served more attractively. Thus, this study intends to introduce the basic theory and computing process of grapes juice flavor forecasting based on support vector regression (SVR). Applying SVR, BPN and LR to forecast the flavor of grapes juice in real data, the result shows that SVR is more suitable and effective at predicting performance.Keywords: flavor forecasting, artificial neural networks, Support Vector Regression, China
Procedia PDF Downloads 4923331 Comprehensive Machine Learning-Based Glucose Sensing from Near-Infrared Spectra
Authors: Bitewulign Mekonnen
Abstract:
Context: This scientific paper focuses on the use of near-infrared (NIR) spectroscopy to determine glucose concentration in aqueous solutions accurately and rapidly. The study compares six different machine learning methods for predicting glucose concentration and also explores the development of a deep learning model for classifying NIR spectra. The objective is to optimize the detection model and improve the accuracy of glucose prediction. This research is important because it provides a comprehensive analysis of various machine-learning techniques for estimating aqueous glucose concentrations. Research Aim: The aim of this study is to compare and evaluate different machine-learning methods for predicting glucose concentration from NIR spectra. Additionally, the study aims to develop and assess a deep-learning model for classifying NIR spectra. Methodology: The research methodology involves the use of machine learning and deep learning techniques. Six machine learning regression models, including support vector machine regression, partial least squares regression, extra tree regression, random forest regression, extreme gradient boosting, and principal component analysis-neural network, are employed to predict glucose concentration. The NIR spectra data is randomly divided into train and test sets, and the process is repeated ten times to increase generalization ability. In addition, a convolutional neural network is developed for classifying NIR spectra. Findings: The study reveals that the SVMR, ETR, and PCA-NN models exhibit excellent performance in predicting glucose concentration, with correlation coefficients (R) > 0.99 and determination coefficients (R²)> 0.985. The deep learning model achieves high macro-averaging scores for precision, recall, and F1-measure. These findings demonstrate the effectiveness of machine learning and deep learning methods in optimizing the detection model and improving glucose prediction accuracy. Theoretical Importance: This research contributes to the field by providing a comprehensive analysis of various machine-learning techniques for estimating glucose concentrations from NIR spectra. It also explores the use of deep learning for the classification of indistinguishable NIR spectra. The findings highlight the potential of machine learning and deep learning in enhancing the prediction accuracy of glucose-relevant features. Data Collection and Analysis Procedures: The NIR spectra and corresponding references for glucose concentration are measured in increments of 20 mg/dl. The data is randomly divided into train and test sets, and the models are evaluated using regression analysis and classification metrics. The performance of each model is assessed based on correlation coefficients, determination coefficients, precision, recall, and F1-measure. Question Addressed: The study addresses the question of whether machine learning and deep learning methods can optimize the detection model and improve the accuracy of glucose prediction from NIR spectra. Conclusion: The research demonstrates that machine learning and deep learning methods can effectively predict glucose concentration from NIR spectra. The SVMR, ETR, and PCA-NN models exhibit superior performance, while the deep learning model achieves high classification scores. These findings suggest that machine learning and deep learning techniques can be used to improve the prediction accuracy of glucose-relevant features. Further research is needed to explore their clinical utility in analyzing complex matrices, such as blood glucose levels.Keywords: machine learning, signal processing, near-infrared spectroscopy, support vector machine, neural network
Procedia PDF Downloads 923330 Estimate of Maximum Expected Intensity of One-Half-Wave Lines Dancing
Authors: A. Bekbaev, M. Dzhamanbaev, R. Abitaeva, A. Karbozova, G. Nabyeva
Abstract:
In this paper, the regression dependence of dancing intensity from wind speed and length of span was established due to the statistic data obtained from multi-year observations on line wires dancing accumulated by power systems of Kazakhstan and the Russian Federation. The lower and upper limitations of the equations parameters were estimated, as well as the adequacy of the regression model. The constructed model will be used in research of dancing phenomena for the development of methods and means of protection against dancing and for zoning plan of the territories of line wire dancing.Keywords: power lines, line wire dancing, dancing intensity, regression equation, dancing area intensity
Procedia PDF Downloads 3103329 Incorporating Anomaly Detection in a Digital Twin Scenario Using Symbolic Regression
Authors: Manuel Alves, Angelica Reis, Armindo Lobo, Valdemar Leiras
Abstract:
In industry 4.0, it is common to have a lot of sensor data. In this deluge of data, hints of possible problems are difficult to spot. The digital twin concept aims to help answer this problem, but it is mainly used as a monitoring tool to handle the visualisation of data. Failure detection is of paramount importance in any industry, and it consumes a lot of resources. Any improvement in this regard is of tangible value to the organisation. The aim of this paper is to add the ability to forecast test failures, curtailing detection times. To achieve this, several anomaly detection algorithms were compared with a symbolic regression approach. To this end, Isolation Forest, One-Class SVM and an auto-encoder have been explored. For the symbolic regression PySR library was used. The first results show that this approach is valid and can be added to the tools available in this context as a low resource anomaly detection method since, after training, the only requirement is the calculation of a polynomial, a useful feature in the digital twin context.Keywords: anomaly detection, digital twin, industry 4.0, symbolic regression
Procedia PDF Downloads 1183328 Spatial Characters Adapted to Rainwater Natural Circulation in Residential Landscape
Authors: Yun Zhang
Abstract:
Urban housing in China is typified by residential districts that occupy 25 to 40 percentage of the urban land. In residential districts, squares, roads, and building facades, as well as plants, usually form a four-grade spatial structure: district entrances, central landscapes, housing cluster entrances, green spaces between dwellings. This spatial structure and its elements not only compose the visible residential landscape but also play a major role of carrying rain water. These elements, therefore, imply ecological significance to urban fitness. Based upon theories of landscape ecology, residential landscape can be understood as a pattern typified by minor soft patch of planted area and major hard patch of buildings and squares, as well as hard corridors of roads. Use five landscape districts in Hangzhou as examples; this paper finds that the size, shape and slope direction of soft patch, the bend of roads, and the form of the four-grade spatial structure are influential for adapting to natural rainwater circulation.Keywords: Hangzhou China, rainwater, residential landscape, spatial character, urban housing
Procedia PDF Downloads 3223327 Impact of Infrastructural Development on Socio-Economic Growth: An Empirical Investigation in India
Authors: Jonardan Koner
Abstract:
The study attempts to find out the impact of infrastructural investment on state economic growth in India. It further tries to determine the magnitude of the impact of infrastructural investment on economic indicator, i.e., per-capita income (PCI) in Indian States. The study uses panel regression technique to measure the impact of infrastructural investment on per-capita income (PCI) in Indian States. Panel regression technique helps incorporate both the cross-section and time-series aspects of the dataset. In order to analyze the difference in impact of the explanatory variables on the explained variables across states, the study uses Fixed Effect Panel Regression Model. The conclusions of the study are that infrastructural investment has a desirable impact on economic development and that the impact is different for different states in India. We analyze time series data (annual frequency) ranging from 1991 to 2010. The study reveals that the infrastructural investment significantly explains the variation of economic indicators.Keywords: infrastructural investment, multiple regression, panel regression techniques, economic development, fixed effect dummy variable model
Procedia PDF Downloads 3703326 A Quadratic Model to Early Predict the Blastocyst Stage with a Time Lapse Incubator
Authors: Cecile Edel, Sandrine Giscard D'Estaing, Elsa Labrune, Jacqueline Lornage, Mehdi Benchaib
Abstract:
Introduction: The use of incubator equipped with time-lapse technology in Artificial Reproductive Technology (ART) allows a continuous surveillance. With morphocinetic parameters, algorithms are available to predict the potential outcome of an embryo. However, the different proposed time-lapse algorithms do not take account the missing data, and then some embryos could not be classified. The aim of this work is to construct a predictive model even in the case of missing data. Materials and methods: Patients: A retrospective study was performed, in biology laboratory of reproduction at the hospital ‘Femme Mère Enfant’ (Lyon, France) between 1 May 2013 and 30 April 2015. Embryos (n= 557) obtained from couples (n=108) were cultured in a time-lapse incubator (Embryoscope®, Vitrolife, Goteborg, Sweden). Time-lapse incubator: The morphocinetic parameters obtained during the three first days of embryo life were used to build the predictive model. Predictive model: A quadratic regression was performed between the number of cells and time. N = a. T² + b. T + c. N: number of cells at T time (T in hours). The regression coefficients were calculated with Excel software (Microsoft, Redmond, WA, USA), a program with Visual Basic for Application (VBA) (Microsoft) was written for this purpose. The quadratic equation was used to find a value that allows to predict the blastocyst formation: the synthetize value. The area under the curve (AUC) obtained from the ROC curve was used to appreciate the performance of the regression coefficients and the synthetize value. A cut-off value has been calculated for each regression coefficient and for the synthetize value to obtain two groups where the difference of blastocyst formation rate according to the cut-off values was maximal. The data were analyzed with SPSS (IBM, Il, Chicago, USA). Results: Among the 557 embryos, 79.7% had reached the blastocyst stage. The synthetize value corresponds to the value calculated with time value equal to 99, the highest AUC was then obtained. The AUC for regression coefficient ‘a’ was 0.648 (p < 0.001), 0.363 (p < 0.001) for the regression coefficient ‘b’, 0.633 (p < 0.001) for the regression coefficient ‘c’, and 0.659 (p < 0.001) for the synthetize value. The results are presented as follow: blastocyst formation rate under cut-off value versus blastocyst rate formation above cut-off value. For the regression coefficient ‘a’ the optimum cut-off value was -1.14.10-3 (61.3% versus 84.3%, p < 0.001), 0.26 for the regression coefficient ‘b’ (83.9% versus 63.1%, p < 0.001), -4.4 for the regression coefficient ‘c’ (62.2% versus 83.1%, p < 0.001) and 8.89 for the synthetize value (58.6% versus 85.0%, p < 0.001). Conclusion: This quadratic regression allows to predict the outcome of an embryo even in case of missing data. Three regression coefficients and a synthetize value could represent the identity card of an embryo. ‘a’ regression coefficient represents the acceleration of cells division, ‘b’ regression coefficient represents the speed of cell division. We could hypothesize that ‘c’ regression coefficient could represent the intrinsic potential of an embryo. This intrinsic potential could be dependent from oocyte originating the embryo. These hypotheses should be confirmed by studies analyzing relationship between regression coefficients and ART parameters.Keywords: ART procedure, blastocyst formation, time-lapse incubator, quadratic model
Procedia PDF Downloads 3053325 Exploring the Impact of Domestic Credit Extension, Government Claims, Inflation, Exchange Rates, and Interest Rates on Manufacturing Output: A Financial Analysis.
Authors: Ojo Johnson Adelakun
Abstract:
This study explores the long-term relationships between manufacturing output (MO) and several economic determinants, interest rate (IR), inflation rate (INF), exchange rate (EX), credit to the private sector (CPSM), gross claims on the government sector (GCGS), using monthly data from March 1966 to December 2023. Employing advanced econometric techniques including Fully Modified Ordinary Least Squares (FMOLS), Dynamic Ordinary Least Squares (DOLS), and Canonical Cointegrating Regression (CCR), the analysis provides several key insights. The findings reveal a positive association between interest rates and manufacturing output, which diverges from traditional economic theory that predicts a negative correlation due to increased borrowing costs. This outcome is attributed to the financial resilience of large enterprises, allowing them to sustain investment in production despite higher interest rates. In addition, inflation demonstrates a positive relationship with manufacturing output, suggesting that stable inflation within target ranges creates a favourable environment for investment in productivity-enhancing technologies. Conversely, the exchange rate shows a negative relationship with manufacturing output, reflecting the adverse effects of currency depreciation on the cost of imported raw materials. The negative impact of CPSM underscores the importance of directing credit efficiently towards productive sectors rather than speculative ventures. Moreover, increased government borrowing appears to crowd out private sector credit, negatively affecting manufacturing output. Overall, the study highlights the need for a coordinated policy approach integrating monetary, fiscal, and financial sector strategies. Policymakers should account for the differential impacts of interest rates, inflation, exchange rates, and credit allocation on various sectors. Ensuring stable inflation, efficient credit distribution, and mitigating exchange rate volatility are critical for supporting manufacturing output and promoting sustainable economic growth. This research provides valuable insights into the economic dynamics influencing manufacturing output and offers policy recommendations tailored to South Africa’s economic context.Keywords: domestic credit, government claims, financial variables, manufacturing output, financial analysis
Procedia PDF Downloads 173324 Two-Phase Sampling for Estimating a Finite Population Total in Presence of Missing Values
Authors: Daniel Fundi Murithi
Abstract:
Missing data is a real bane in many surveys. To overcome the problems caused by missing data, partial deletion, and single imputation methods, among others, have been proposed. However, problems such as discarding usable data and inaccuracy in reproducing known population parameters and standard errors are associated with them. For regression and stochastic imputation, it is assumed that there is a variable with complete cases to be used as a predictor in estimating missing values in the other variable, and the relationship between the two variables is linear, which might not be realistic in practice. In this project, we estimate population total in presence of missing values in two-phase sampling. Instead of regression or stochastic models, non-parametric model based regression model is used in imputing missing values. Empirical study showed that nonparametric model-based regression imputation is better in reproducing variance of population total estimate obtained when there were no missing values compared to mean, median, regression, and stochastic imputation methods. Although regression and stochastic imputation were better than nonparametric model-based imputation in reproducing population total estimates obtained when there were no missing values in one of the sample sizes considered, nonparametric model-based imputation may be used when the relationship between outcome and predictor variables is not linear.Keywords: finite population total, missing data, model-based imputation, two-phase sampling
Procedia PDF Downloads 1283323 Deep Learning for Qualitative and Quantitative Grain Quality Analysis Using Hyperspectral Imaging
Authors: Ole-Christian Galbo Engstrøm, Erik Schou Dreier, Birthe Møller Jespersen, Kim Steenstrup Pedersen
Abstract:
Grain quality analysis is a multi-parameterized problem that includes a variety of qualitative and quantitative parameters such as grain type classification, damage type classification, and nutrient regression. Currently, these parameters require human inspection, a multitude of instruments employing a variety of sensor technologies, and predictive model types or destructive and slow chemical analysis. This paper investigates the feasibility of applying near-infrared hyperspectral imaging (NIR-HSI) to grain quality analysis. For this study two datasets of NIR hyperspectral images in the wavelength range of 900 nm - 1700 nm have been used. Both datasets contain images of sparsely and densely packed grain kernels. The first dataset contains ~87,000 image crops of bulk wheat samples from 63 harvests where protein value has been determined by the FOSS Infratec NOVA which is the golden industry standard for protein content estimation in bulk samples of cereal grain. The second dataset consists of ~28,000 image crops of bulk grain kernels from seven different wheat varieties and a single rye variety. In the first dataset, protein regression analysis is the problem to solve while variety classification analysis is the problem to solve in the second dataset. Deep convolutional neural networks (CNNs) have the potential to utilize spatio-spectral correlations within a hyperspectral image to simultaneously estimate the qualitative and quantitative parameters. CNNs can autonomously derive meaningful representations of the input data reducing the need for advanced preprocessing techniques required for classical chemometric model types such as artificial neural networks (ANNs) and partial least-squares regression (PLS-R). A comparison between different CNN architectures utilizing 2D and 3D convolution is conducted. These results are compared to the performance of ANNs and PLS-R. Additionally, a variety of preprocessing techniques from image analysis and chemometrics are tested. These include centering, scaling, standard normal variate (SNV), Savitzky-Golay (SG) filtering, and detrending. The results indicate that the combination of NIR-HSI and CNNs has the potential to be the foundation for an automatic system unifying qualitative and quantitative grain quality analysis within a single sensor technology and predictive model type.Keywords: deep learning, grain analysis, hyperspectral imaging, preprocessing techniques
Procedia PDF Downloads 983322 Infrastructural Investment and Economic Growth in Indian States: A Panel Data Analysis
Authors: Jonardan Koner, Basabi Bhattacharya, Avinash Purandare
Abstract:
The study is focused to find out the impact of infrastructural investment on economic development in Indian states. The study uses panel data analysis to measure the impact of infrastructural investment on Real Gross Domestic Product in Indian States. Panel data analysis incorporates Unit Root Test, Cointegration Teat, Pooled Ordinary Least Squares, Fixed Effect Approach, Random Effect Approach, Hausman Test. The study analyzes panel data (annual in frequency) ranging from 1991 to 2012 and concludes that infrastructural investment has a desirable impact on economic development in Indian. Finally, the study reveals that the infrastructural investment significantly explains the variation of economic indicator.Keywords: infrastructural investment, real GDP, unit root test, cointegration teat, pooled ordinary least squares, fixed effect approach, random effect approach, Hausman test
Procedia PDF Downloads 4013321 Evaluation of Best-Fit Probability Distribution for Prediction of Extreme Hydrologic Phenomena
Authors: Karim Hamidi Machekposhti, Hossein Sedghi
Abstract:
The probability distributions are the best method for forecasting of extreme hydrologic phenomena such as rainfall and flood flows. In this research, in order to determine suitable probability distribution for estimating of annual extreme rainfall and flood flows (discharge) series with different return periods, precipitation with 40 and discharge with 58 years time period had been collected from Karkheh River at Iran. After homogeneity and adequacy tests, data have been analyzed by Stormwater Management and Design Aid (SMADA) software and residual sum of squares (R.S.S). The best probability distribution was Log Pearson Type III with R.S.S value (145.91) and value (13.67) for peak discharge and Log Pearson Type III with R.S.S values (141.08) and (8.95) for maximum discharge in Jelogir Majin and Pole Zal stations, respectively. The best distribution for maximum precipitation in Jelogir Majin and Pole Zal stations was Log Pearson Type III distribution with R.S.S values (1.74&1.90) and then Pearson Type III distribution with R.S.S values (1.53&1.69). Overall, the Log Pearson Type III distributions are acceptable distribution types for representing statistics of extreme hydrologic phenomena in Karkheh River at Iran with the Pearson Type III distribution as a potential alternative.Keywords: Karkheh River, Log Pearson Type III, probability distribution, residual sum of squares
Procedia PDF Downloads 1963320 A Novel Approach towards Test Case Prioritization Technique
Authors: Kamna Solanki, Yudhvir Singh, Sandeep Dalal
Abstract:
Software testing is a time and cost intensive process. A scrutiny of the code and rigorous testing is required to identify and rectify the putative bugs. The process of bug identification and its consequent correction is continuous in nature and often some of the bugs are removed after the software has been launched in the market. This process of code validation of the altered software during the maintenance phase is termed as Regression testing. Regression testing ubiquitously considers resource constraints; therefore, the deduction of an appropriate set of test cases, from the ensemble of the entire gamut of test cases, is a critical issue for regression test planning. This paper presents a novel method for designing a suitable prioritization process to optimize fault detection rate and performance of regression test on predefined constraints. The proposed method for test case prioritization m-ACO alters the food source selection criteria of natural ants and is basically a modified version of Ant Colony Optimization (ACO). The proposed m-ACO approach has been coded in 'Perl' language and results are validated using three examples by computation of Average Percentage of Faults Detected (APFD) metric.Keywords: regression testing, software testing, test case prioritization, test suite optimization
Procedia PDF Downloads 3363319 Exploring Re-Configuration of Ordinary Spaces into Recreation and Leisure Space in Compact Unplanned Settlements: Experience from Manzese Informal Settlement-Dar Es Salaam Tanzania
Authors: Edson Ephraim Sanga
Abstract:
This paper stems to explore possible places used for recreation in unplanned settlements in order to avail knowledge on how to create and shape urban spaces essential for recreation and leisure. The context of unplanned settlements is spatially characterized compactness and congestions of buildings developed by residents without professional inputs. These characteristics surpass greenery landscapes such as parks and squares essential for health, happiness and wellbeing. The lack of recreational greenery landscape arises a question on how possible can recreation take places in the settlements? This study used qualitative methods mainly observation and in-depth interview to explore the recreational situation in Manzese informal settlements as an instrumental case and found that ordinary spaces are re-configured into recreational spaces and used as ‘parks’ and ‘squares’ in the settlements. The spaces are diverse and complex as they possess different spatial characteristics based on their physical attributes and the way they are used and interpreted by respective users. This paper argues that the re-configuration processes of ordinary spaces should not be taken for granted because they portray the appropriation of spaces from quotidian dimensions in a particular context.Keywords: ordinary spaces, recreation, unplanned settlement, urban spaces
Procedia PDF Downloads 2743318 Prediction of the Thermodynamic Properties of Hydrocarbons Using Gaussian Process Regression
Authors: N. Alhazmi
Abstract:
Knowing the thermodynamics properties of hydrocarbons is vital when it comes to analyzing the related chemical reaction outcomes and understanding the reaction process, especially in terms of petrochemical industrial applications, combustions, and catalytic reactions. However, measuring the thermodynamics properties experimentally is time-consuming and costly. In this paper, Gaussian process regression (GPR) has been used to directly predict the main thermodynamic properties - standard enthalpy of formation, standard entropy, and heat capacity -for more than 360 cyclic and non-cyclic alkanes, alkenes, and alkynes. A simple workflow has been proposed that can be applied to directly predict the main properties of any hydrocarbon by knowing its descriptors and chemical structure and can be generalized to predict the main properties of any material. The model was evaluated by calculating the statistical error R², which was more than 0.9794 for all the predicted properties.Keywords: thermodynamic, Gaussian process regression, hydrocarbons, regression, supervised learning, entropy, enthalpy, heat capacity
Procedia PDF Downloads 2203317 Solving Single Machine Total Weighted Tardiness Problem Using Gaussian Process Regression
Authors: Wanatchapong Kongkaew
Abstract:
This paper proposes an application of probabilistic technique, namely Gaussian process regression, for estimating an optimal sequence of the single machine with total weighted tardiness (SMTWT) scheduling problem. In this work, the Gaussian process regression (GPR) model is utilized to predict an optimal sequence of the SMTWT problem, and its solution is improved by using an iterated local search based on simulated annealing scheme, called GPRISA algorithm. The results show that the proposed GPRISA method achieves a very good performance and a reasonable trade-off between solution quality and time consumption. Moreover, in the comparison of deviation from the best-known solution, the proposed mechanism noticeably outperforms the recently existing approaches.Keywords: Gaussian process regression, iterated local search, simulated annealing, single machine total weighted tardiness
Procedia PDF Downloads 3083316 The Profit Trend of Cosmetics Products Using Bootstrap Edgeworth Approximation
Authors: Edlira Donefski, Lorenc Ekonomi, Tina Donefski
Abstract:
Edgeworth approximation is one of the most important statistical methods that has a considered contribution in the reduction of the sum of standard deviation of the independent variables’ coefficients in a Quantile Regression Model. This model estimates the conditional median or other quantiles. In this paper, we have applied approximating statistical methods in an economical problem. We have created and generated a quantile regression model to see how the profit gained is connected with the realized sales of the cosmetic products in a real data, taken from a local business. The Linear Regression of the generated profit and the realized sales was not free of autocorrelation and heteroscedasticity, so this is the reason that we have used this model instead of Linear Regression. Our aim is to analyze in more details the relation between the variables taken into study: the profit and the finalized sales and how to minimize the standard errors of the independent variable involved in this study, the level of realized sales. The statistical methods that we have applied in our work are Edgeworth Approximation for Independent and Identical distributed (IID) cases, Bootstrap version of the Model and the Edgeworth approximation for Bootstrap Quantile Regression Model. The graphics and the results that we have presented here identify the best approximating model of our study.Keywords: bootstrap, edgeworth approximation, IID, quantile
Procedia PDF Downloads 1583315 Harnessing Entrepreneurial Opportunities for National Security
Authors: Itiola Kehinde Adeniran
Abstract:
This paper investigated the influence of harnessing entrepreneurial opportunities on the national security in Nigeria with a specific focus on the security situation of the post-amnesty programmes of the Federal Government in Ondo State. The self-administered structured questionnaire was employed to collect data from one hundred and twenty participants through purposive sampling method. Inferential statistics was used to analyze the data, specifically; ordinary least squares linear regression method was employed with the aid of statistical package for social science (SPSS) version 20 in order to determine the influence of independent variable (entrepreneurial opportunities) on dependent variable (national security). The result showed that business opportunities have a significant influence on the rate of criminal activities. The study also revealed that entrepreneurial opportunity creation and discovery as well as providing a model on how these entrepreneurial opportunities could be effectively and efficiently utilized jointly predict better national security, which counted for 69% variance of crime rate reduction. The paper, therefore, recommended that citizens should be encouraged to develop an interest in the skill-based activities in order to change their mindset towards self-employment which can motivate them in identify entrepreneurial opportunities.Keywords: entrepreneurship, entrepreneurial opportunities, national security, unemployment
Procedia PDF Downloads 3263314 Sea Level Rise and Sediment Supply Explain Large-Scale Patterns of Saltmarsh Expansion and Erosion
Authors: Cai J. T. Ladd, Mollie F. Duggan-Edwards, Tjeerd J. Bouma, Jordi F. Pages, Martin W. Skov
Abstract:
Salt marshes are valued for their role in coastal flood protection, carbon storage, and for supporting biodiverse ecosystems. As a biogeomorphic landscape, marshes evolve through the complex interactions between sea level rise, sediment supply and wave/current forcing, as well as and socio-economic factors. Climate change and direct human modification could lead to a global decline marsh extent if left unchecked. Whilst the processes of saltmarsh erosion and expansion are well understood, empirical evidence on the key drivers of long-term lateral marsh dynamics is lacking. In a GIS, saltmarsh areal extent in 25 estuaries across Great Britain was calculated from historical maps and aerial photographs, at intervals of approximately 30 years between 1846 and 2016. Data on the key perceived drivers of lateral marsh change (namely sea level rise rates, suspended sediment concentration, bedload sediment flux rates, and frequency of both river flood and storm events) were collated from national monitoring centres. Continuous datasets did not extend beyond 1970, therefore predictor variables that best explained rate change of marsh extent between 1970 and 2016 was calculated using a Partial Least Squares Regression model. Information about the spread of Spartina anglica (an invasive marsh plant responsible for marsh expansion around the globe) and coastal engineering works that may have impacted on marsh extent, were also recorded from historical documents and their impacts assessed on long-term, large-scale marsh extent change. Results showed that salt marshes in the northern regions of Great Britain expanded an average of 2.0 ha/yr, whilst marshes in the south eroded an average of -5.3 ha/yr. Spartina invasion and coastal engineering works could not explain these trends since a trend of either expansion or erosion preceded these events. Results from the Partial Least Squares Regression model indicated that the rate of relative sea level rise (RSLR) and availability of suspended sediment concentration (SSC) best explained the patterns of marsh change. RSLR increased from 1.6 to 2.8 mm/yr, as SSC decreased from 404.2 to 78.56 mg/l along the north-to-south gradient of Great Britain, resulting in the shift from marsh expansion to erosion. Regional differences in RSLR and SSC are due to isostatic rebound since deglaciation, and tidal amplitudes respectively. Marshes exposed to low RSLR and high SSC likely leads to sediment accumulation at the coast suitable for colonisation by marsh plants and thus lateral expansion. In contrast, high RSLR with are likely not offset deposition under low SSC, thus average water depth at the marsh edge increases, allowing larger wind-waves to trigger marsh erosion. Current global declines in sediment flux to the coast are likely to diminish the resilience of salt marshes to RSLR. Monitoring and managing suspended sediment supply is not common-place, but may be critical to mitigating coastal impacts from climate change.Keywords: lateral saltmarsh dynamics, sea level rise, sediment supply, wave forcing
Procedia PDF Downloads 1343313 Time Series Regression with Meta-Clusters
Authors: Monika Chuchro
Abstract:
This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain a subgroups of time series data with normal distribution from inflow into waste water treatment plant data which Composed of several groups differing by mean value. Two simple algorithms: K-mean and EM were chosen as a clustering method. The rand index was used to measure the similarity. After simple meta-clustering, regression model was performed for each subgroups. The final model was a sum of subgroups models. The quality of obtained model was compared with the regression model made using the same explanatory variables but with no clustering of data. Results were compared by determination coefficient (R2), measure of prediction accuracy mean absolute percentage error (MAPE) and comparison on linear chart. Preliminary results allows to foresee the potential of the presented technique.Keywords: clustering, data analysis, data mining, predictive models
Procedia PDF Downloads 4643312 Economic Analysis of Cowpea (Unguiculata spp) Production in Northern Nigeria: A Case Study of Kano Katsina and Jigawa States
Authors: Yakubu Suleiman, S. A. Musa
Abstract:
Nigeria is the largest cowpea producer in the world, accounting for about 45%, followed by Brazil with about 17%. Cowpea is grown in Kano, Bauchi, Katsina, Borno in the north, Oyo in the west, and to the lesser extent in Enugu in the east. This study was conducted to determine the input–output relationship of Cowpea production in Kano, Katsina, and Jigawa states of Nigeria. The data were collected with the aid of 1000 structured questionnaires that were randomly distributed to Cowpea farmers in the three states mentioned above of the study area. The data collected were analyzed using regression analysis (Cobb–Douglass production function model). The result of the regression analysis revealed the coefficient of multiple determinations, R2, to be 72.5% and the F ration to be 106.20 and was found to be significant (P < 0.01). The regression coefficient of constant is 0.5382 and is significant (P < 0.01). The regression coefficient with respect to labor and seeds were 0.65554 and 0.4336, respectively, and they are highly significant (P < 0.01). The regression coefficient with respect to fertilizer is 0.26341 which is significant (P < 0.05). This implies that a unit increase of any one of the variable inputs used while holding all other variables inputs constants, will significantly increase the total Cowpea output by their corresponding coefficient. This indicated that farmers in the study area are operating in stage II of the production function. The result revealed that Cowpea farmer in Kano, Jigawa and Katsina States realized a profit of N15,997, N34,016 and N19,788 per hectare respectively. It is hereby recommended that more attention should be given to Cowpea production by government and research institutions.Keywords: coefficient, constant, inputs, regression
Procedia PDF Downloads 408