Search results for: least squares regression
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3461

Search results for: least squares regression

3341 Vertical and Horizontal Mismatches in Thailand and the Wage Penalty

Authors: Potjana Chunthanom

Abstract:

The Thai labor market experiences increasing challenges due to disruptive technologies and demographic shifts, including the COVID-19 pandemic. Consequently, there is a widening gap between the skills that firms seek and the skills that employees possess. This study aims to examine the incidence of vertical and horizontal mismatches and their impact on wages in Thailand before and during the COVID-19 pandemic using data from the third quarter of 2018 to 2021 from Thailand's National Labor Force Survey. This paper applies three methods: ordinary least squares (OLS), pooled ordinary least squares (Pooled OLS), and counterfactual decomposition. The findings suggest that the incidence of overeducation and field-of-study mismatch continues to increase during the COVID-19 pandemic in comparison to the two previous years. In contrast, there is a notable decline in the percentage of undereducated workers during the same period. Additionally, overeducated workers earn wage premiums, whereas undereducated and horizontally mismatched workers face wage penalties. The result also indicates that the COVID-19 pandemic has significant negative (positive) effects on overeducated (undereducated) workers.

Keywords: COVID-19, horizontal mismatch, overeducation, undereducation

Procedia PDF Downloads 37
3340 Generalized Additive Model for Estimating Propensity Score

Authors: Tahmidul Islam

Abstract:

Propensity Score Matching (PSM) technique has been widely used for estimating causal effect of treatment in observational studies. One major step of implementing PSM is estimating the propensity score (PS). Logistic regression model with additive linear terms of covariates is most used technique in many studies. Logistics regression model is also used with cubic splines for retaining flexibility in the model. However, choosing the functional form of the logistic regression model has been a question since the effectiveness of PSM depends on how accurately the PS been estimated. In many situations, the linearity assumption of linear logistic regression may not hold and non-linear relation between the logit and the covariates may be appropriate. One can estimate PS using machine learning techniques such as random forest, neural network etc for more accuracy in non-linear situation. In this study, an attempt has been made to compare the efficacy of Generalized Additive Model (GAM) in various linear and non-linear settings and compare its performance with usual logistic regression. GAM is a non-parametric technique where functional form of the covariates can be unspecified and a flexible regression model can be fitted. In this study various simple and complex models have been considered for treatment under several situations (small/large sample, low/high number of treatment units) and examined which method leads to more covariate balance in the matched dataset. It is found that logistic regression model is impressively robust against inclusion quadratic and interaction terms and reduces mean difference in treatment and control set equally efficiently as GAM does. GAM provided no significantly better covariate balance than logistic regression in both simple and complex models. The analysis also suggests that larger proportion of controls than treatment units leads to better balance for both of the methods.

Keywords: accuracy, covariate balances, generalized additive model, logistic regression, non-linearity, propensity score matching

Procedia PDF Downloads 365
3339 Econometric Analysis of Organic Vegetable Production in Turkey

Authors: Ersin Karakaya, Halit Tutar

Abstract:

Reliable foods must be consumed in terms of healthy nutrition. The production and dissemination of diatom products in Turkey is rapidly evolving on the basis of preserving ecological balance, ensuring sustainability in agriculture and offering quality, reliable products to consumers. In this study, year in Turkey as (2002- 2015) to determine values of such as cultivated land of organic vegetable production, production levels, production quantity, number of products, number of farmers. It is intended to make the econometric analysis of the factors affecting the production of organic vegetable production (Number of products, Number of farmers and cultivated land). The main material of the study has created secondary data in relation to the 2002-2015 period as organic vegetable production in Turkey and regression analysis of the factors affecting the value of production of organic vegetable is determined by the Least Squares Method with EViews statistical software package.

Keywords: number of farmers, cultivated land, Eviews, Turkey

Procedia PDF Downloads 305
3338 A Comparison of Neural Network and DOE-Regression Analysis for Predicting Resource Consumption of Manufacturing Processes

Authors: Frank Kuebler, Rolf Steinhilper

Abstract:

Artificial neural networks (ANN) as well as Design of Experiments (DOE) based regression analysis (RA) are mainly used for modeling of complex systems. Both methodologies are commonly applied in process and quality control of manufacturing processes. Due to the fact that resource efficiency has become a critical concern for manufacturing companies, these models needs to be extended to predict resource-consumption of manufacturing processes. This paper describes an approach to use neural networks as well as DOE based regression analysis for predicting resource consumption of manufacturing processes and gives a comparison of the achievable results based on an industrial case study of a turning process.

Keywords: artificial neural network, design of experiments, regression analysis, resource efficiency, manufacturing process

Procedia PDF Downloads 523
3337 Logistic Regression Model versus Additive Model for Recurrent Event Data

Authors: Entisar A. Elgmati

Abstract:

Recurrent infant diarrhea is studied using daily data collected in Salvador, Brazil over one year and three months. A logistic regression model is fitted instead of Aalen's additive model using the same covariates that were used in the analysis with the additive model. The model gives reasonably similar results to that using additive regression model. In addition, the problem with the estimated conditional probabilities not being constrained between zero and one in additive model is solved here. Also martingale residuals that have been used to judge the goodness of fit for the additive model are shown to be useful for judging the goodness of fit of the logistic model.

Keywords: additive model, cumulative probabilities, infant diarrhoea, recurrent event

Procedia PDF Downloads 633
3336 Identifying Factors Contributing to the Spread of Lyme Disease: A Regression Analysis of Virginia’s Data

Authors: Fatemeh Valizadeh Gamchi, Edward L. Boone

Abstract:

This research focuses on Lyme disease, a widespread infectious condition in the United States caused by the bacterium Borrelia burgdorferi sensu stricto. It is critical to identify environmental and economic elements that are contributing to the spread of the disease. This study examined data from Virginia to identify a subset of explanatory variables significant for Lyme disease case numbers. To identify relevant variables and avoid overfitting, linear poisson, and regularization regression methods such as a ridge, lasso, and elastic net penalty were employed. Cross-validation was performed to acquire tuning parameters. The methods proposed can automatically identify relevant disease count covariates. The efficacy of the techniques was assessed using four criteria on three simulated datasets. Finally, using the Virginia Department of Health’s Lyme disease data set, the study successfully identified key factors, and the results were consistent with previous studies.

Keywords: lyme disease, Poisson generalized linear model, ridge regression, lasso regression, elastic net regression

Procedia PDF Downloads 135
3335 An Analysis of the Effect of Sharia Financing and Work Relation Founding towards Non-Performing Financing in Islamic Banks in Indonesia

Authors: Muhammad Bahrul Ilmi

Abstract:

The purpose of this research is to analyze the influence of Islamic financing and work relation founding simultaneously and partially towards non-performing financing in Islamic banks. This research was regression quantitative field research, and had been done in Muammalat Indonesia Bank and Islamic Danamon Bank in 3 months. The populations of this research were 15 account officers of Muammalat Indonesia Bank and Islamic Danamon Bank in Surakarta, Indonesia. The techniques of collecting data used in this research were documentation, questionnaire, literary study and interview. Regression analysis result shows that Islamic financing and work relation founding simultaneously has positive and significant effect towards non performing financing of two Islamic Banks. It is obtained with probability value 0.003 which is less than 0.05 and F value 9.584. The analysis result of Islamic financing regression towards non performing financing shows the significant effect. It is supported by double linear regression analysis with probability value 0.001 which is less than 0.05. The regression analysis of work relation founding effect towards non-performing financing shows insignificant effect. This is shown in the double linear regression analysis with probability value 0.161 which is bigger than 0.05.

Keywords: Syariah financing, work relation founding, non-performing financing (NPF), Islamic Bank

Procedia PDF Downloads 430
3334 A Kolmogorov-Smirnov Type Goodness-Of-Fit Test of Multinomial Logistic Regression Model in Case-Control Studies

Authors: Chen Li-Ching

Abstract:

The multinomial logistic regression model is used popularly for inferring the relationship of risk factors and disease with multiple categories. This study based on the discrepancy between the nonparametric maximum likelihood estimator and semiparametric maximum likelihood estimator of the cumulative distribution function to propose a Kolmogorov-Smirnov type test statistic to assess adequacy of the multinomial logistic regression model for case-control data. A bootstrap procedure is presented to calculate the critical value of the proposed test statistic. Empirical type I error rates and powers of the test are performed by simulation studies. Some examples will be illustrated the implementation of the test.

Keywords: case-control studies, goodness-of-fit test, Kolmogorov-Smirnov test, multinomial logistic regression

Procedia PDF Downloads 455
3333 A Study on Inference from Distance Variables in Hedonic Regression

Authors: Yan Wang, Yasushi Asami, Yukio Sadahiro

Abstract:

In urban area, several landmarks may affect housing price and rents, hedonic analysis should employ distance variables corresponding to each landmarks. Unfortunately, the effects of distances to landmarks on housing prices are generally not consistent with the true price. These distance variables may cause magnitude error in regression, pointing a problem of spatial multicollinearity. In this paper, we provided some approaches for getting the samples with less bias and method on locating the specific sampling area to avoid the multicollinerity problem in two specific landmarks case.

Keywords: landmarks, hedonic regression, distance variables, collinearity, multicollinerity

Procedia PDF Downloads 451
3332 Forecasting of Grape Juice Flavor by Using Support Vector Regression

Authors: Ren-Jieh Kuo, Chun-Shou Huang

Abstract:

The research of juice flavor forecasting has become more important in China. Due to the fast economic growth in China, many different kinds of juices have been introduced to the market. If a beverage company can understand their customers’ preference well, the juice can be served more attractively. Thus, this study intends to introduce the basic theory and computing process of grapes juice flavor forecasting based on support vector regression (SVR). Applying SVR, BPN and LR to forecast the flavor of grapes juice in real data, the result shows that SVR is more suitable and effective at predicting performance.

Keywords: flavor forecasting, artificial neural networks, Support Vector Regression, China

Procedia PDF Downloads 492
3331 Comprehensive Machine Learning-Based Glucose Sensing from Near-Infrared Spectra

Authors: Bitewulign Mekonnen

Abstract:

Context: This scientific paper focuses on the use of near-infrared (NIR) spectroscopy to determine glucose concentration in aqueous solutions accurately and rapidly. The study compares six different machine learning methods for predicting glucose concentration and also explores the development of a deep learning model for classifying NIR spectra. The objective is to optimize the detection model and improve the accuracy of glucose prediction. This research is important because it provides a comprehensive analysis of various machine-learning techniques for estimating aqueous glucose concentrations. Research Aim: The aim of this study is to compare and evaluate different machine-learning methods for predicting glucose concentration from NIR spectra. Additionally, the study aims to develop and assess a deep-learning model for classifying NIR spectra. Methodology: The research methodology involves the use of machine learning and deep learning techniques. Six machine learning regression models, including support vector machine regression, partial least squares regression, extra tree regression, random forest regression, extreme gradient boosting, and principal component analysis-neural network, are employed to predict glucose concentration. The NIR spectra data is randomly divided into train and test sets, and the process is repeated ten times to increase generalization ability. In addition, a convolutional neural network is developed for classifying NIR spectra. Findings: The study reveals that the SVMR, ETR, and PCA-NN models exhibit excellent performance in predicting glucose concentration, with correlation coefficients (R) > 0.99 and determination coefficients (R²)> 0.985. The deep learning model achieves high macro-averaging scores for precision, recall, and F1-measure. These findings demonstrate the effectiveness of machine learning and deep learning methods in optimizing the detection model and improving glucose prediction accuracy. Theoretical Importance: This research contributes to the field by providing a comprehensive analysis of various machine-learning techniques for estimating glucose concentrations from NIR spectra. It also explores the use of deep learning for the classification of indistinguishable NIR spectra. The findings highlight the potential of machine learning and deep learning in enhancing the prediction accuracy of glucose-relevant features. Data Collection and Analysis Procedures: The NIR spectra and corresponding references for glucose concentration are measured in increments of 20 mg/dl. The data is randomly divided into train and test sets, and the models are evaluated using regression analysis and classification metrics. The performance of each model is assessed based on correlation coefficients, determination coefficients, precision, recall, and F1-measure. Question Addressed: The study addresses the question of whether machine learning and deep learning methods can optimize the detection model and improve the accuracy of glucose prediction from NIR spectra. Conclusion: The research demonstrates that machine learning and deep learning methods can effectively predict glucose concentration from NIR spectra. The SVMR, ETR, and PCA-NN models exhibit superior performance, while the deep learning model achieves high classification scores. These findings suggest that machine learning and deep learning techniques can be used to improve the prediction accuracy of glucose-relevant features. Further research is needed to explore their clinical utility in analyzing complex matrices, such as blood glucose levels.

Keywords: machine learning, signal processing, near-infrared spectroscopy, support vector machine, neural network

Procedia PDF Downloads 92
3330 Estimate of Maximum Expected Intensity of One-Half-Wave Lines Dancing

Authors: A. Bekbaev, M. Dzhamanbaev, R. Abitaeva, A. Karbozova, G. Nabyeva

Abstract:

In this paper, the regression dependence of dancing intensity from wind speed and length of span was established due to the statistic data obtained from multi-year observations on line wires dancing accumulated by power systems of Kazakhstan and the Russian Federation. The lower and upper limitations of the equations parameters were estimated, as well as the adequacy of the regression model. The constructed model will be used in research of dancing phenomena for the development of methods and means of protection against dancing and for zoning plan of the territories of line wire dancing.

Keywords: power lines, line wire dancing, dancing intensity, regression equation, dancing area intensity

Procedia PDF Downloads 310
3329 Incorporating Anomaly Detection in a Digital Twin Scenario Using Symbolic Regression

Authors: Manuel Alves, Angelica Reis, Armindo Lobo, Valdemar Leiras

Abstract:

In industry 4.0, it is common to have a lot of sensor data. In this deluge of data, hints of possible problems are difficult to spot. The digital twin concept aims to help answer this problem, but it is mainly used as a monitoring tool to handle the visualisation of data. Failure detection is of paramount importance in any industry, and it consumes a lot of resources. Any improvement in this regard is of tangible value to the organisation. The aim of this paper is to add the ability to forecast test failures, curtailing detection times. To achieve this, several anomaly detection algorithms were compared with a symbolic regression approach. To this end, Isolation Forest, One-Class SVM and an auto-encoder have been explored. For the symbolic regression PySR library was used. The first results show that this approach is valid and can be added to the tools available in this context as a low resource anomaly detection method since, after training, the only requirement is the calculation of a polynomial, a useful feature in the digital twin context.

Keywords: anomaly detection, digital twin, industry 4.0, symbolic regression

Procedia PDF Downloads 118
3328 Spatial Characters Adapted to Rainwater Natural Circulation in Residential Landscape

Authors: Yun Zhang

Abstract:

Urban housing in China is typified by residential districts that occupy 25 to 40 percentage of the urban land. In residential districts, squares, roads, and building facades, as well as plants, usually form a four-grade spatial structure: district entrances, central landscapes, housing cluster entrances, green spaces between dwellings. This spatial structure and its elements not only compose the visible residential landscape but also play a major role of carrying rain water. These elements, therefore, imply ecological significance to urban fitness. Based upon theories of landscape ecology, residential landscape can be understood as a pattern typified by minor soft patch of planted area and major hard patch of buildings and squares, as well as hard corridors of roads. Use five landscape districts in Hangzhou as examples; this paper finds that the size, shape and slope direction of soft patch, the bend of roads, and the form of the four-grade spatial structure are influential for adapting to natural rainwater circulation.

Keywords: Hangzhou China, rainwater, residential landscape, spatial character, urban housing

Procedia PDF Downloads 322
3327 Impact of Infrastructural Development on Socio-Economic Growth: An Empirical Investigation in India

Authors: Jonardan Koner

Abstract:

The study attempts to find out the impact of infrastructural investment on state economic growth in India. It further tries to determine the magnitude of the impact of infrastructural investment on economic indicator, i.e., per-capita income (PCI) in Indian States. The study uses panel regression technique to measure the impact of infrastructural investment on per-capita income (PCI) in Indian States. Panel regression technique helps incorporate both the cross-section and time-series aspects of the dataset. In order to analyze the difference in impact of the explanatory variables on the explained variables across states, the study uses Fixed Effect Panel Regression Model. The conclusions of the study are that infrastructural investment has a desirable impact on economic development and that the impact is different for different states in India. We analyze time series data (annual frequency) ranging from 1991 to 2010. The study reveals that the infrastructural investment significantly explains the variation of economic indicators.

Keywords: infrastructural investment, multiple regression, panel regression techniques, economic development, fixed effect dummy variable model

Procedia PDF Downloads 370
3326 A Quadratic Model to Early Predict the Blastocyst Stage with a Time Lapse Incubator

Authors: Cecile Edel, Sandrine Giscard D'Estaing, Elsa Labrune, Jacqueline Lornage, Mehdi Benchaib

Abstract:

Introduction: The use of incubator equipped with time-lapse technology in Artificial Reproductive Technology (ART) allows a continuous surveillance. With morphocinetic parameters, algorithms are available to predict the potential outcome of an embryo. However, the different proposed time-lapse algorithms do not take account the missing data, and then some embryos could not be classified. The aim of this work is to construct a predictive model even in the case of missing data. Materials and methods: Patients: A retrospective study was performed, in biology laboratory of reproduction at the hospital ‘Femme Mère Enfant’ (Lyon, France) between 1 May 2013 and 30 April 2015. Embryos (n= 557) obtained from couples (n=108) were cultured in a time-lapse incubator (Embryoscope®, Vitrolife, Goteborg, Sweden). Time-lapse incubator: The morphocinetic parameters obtained during the three first days of embryo life were used to build the predictive model. Predictive model: A quadratic regression was performed between the number of cells and time. N = a. T² + b. T + c. N: number of cells at T time (T in hours). The regression coefficients were calculated with Excel software (Microsoft, Redmond, WA, USA), a program with Visual Basic for Application (VBA) (Microsoft) was written for this purpose. The quadratic equation was used to find a value that allows to predict the blastocyst formation: the synthetize value. The area under the curve (AUC) obtained from the ROC curve was used to appreciate the performance of the regression coefficients and the synthetize value. A cut-off value has been calculated for each regression coefficient and for the synthetize value to obtain two groups where the difference of blastocyst formation rate according to the cut-off values was maximal. The data were analyzed with SPSS (IBM, Il, Chicago, USA). Results: Among the 557 embryos, 79.7% had reached the blastocyst stage. The synthetize value corresponds to the value calculated with time value equal to 99, the highest AUC was then obtained. The AUC for regression coefficient ‘a’ was 0.648 (p < 0.001), 0.363 (p < 0.001) for the regression coefficient ‘b’, 0.633 (p < 0.001) for the regression coefficient ‘c’, and 0.659 (p < 0.001) for the synthetize value. The results are presented as follow: blastocyst formation rate under cut-off value versus blastocyst rate formation above cut-off value. For the regression coefficient ‘a’ the optimum cut-off value was -1.14.10-3 (61.3% versus 84.3%, p < 0.001), 0.26 for the regression coefficient ‘b’ (83.9% versus 63.1%, p < 0.001), -4.4 for the regression coefficient ‘c’ (62.2% versus 83.1%, p < 0.001) and 8.89 for the synthetize value (58.6% versus 85.0%, p < 0.001). Conclusion: This quadratic regression allows to predict the outcome of an embryo even in case of missing data. Three regression coefficients and a synthetize value could represent the identity card of an embryo. ‘a’ regression coefficient represents the acceleration of cells division, ‘b’ regression coefficient represents the speed of cell division. We could hypothesize that ‘c’ regression coefficient could represent the intrinsic potential of an embryo. This intrinsic potential could be dependent from oocyte originating the embryo. These hypotheses should be confirmed by studies analyzing relationship between regression coefficients and ART parameters.

Keywords: ART procedure, blastocyst formation, time-lapse incubator, quadratic model

Procedia PDF Downloads 305
3325 Exploring the Impact of Domestic Credit Extension, Government Claims, Inflation, Exchange Rates, and Interest Rates on Manufacturing Output: A Financial Analysis.

Authors: Ojo Johnson Adelakun

Abstract:

This study explores the long-term relationships between manufacturing output (MO) and several economic determinants, interest rate (IR), inflation rate (INF), exchange rate (EX), credit to the private sector (CPSM), gross claims on the government sector (GCGS), using monthly data from March 1966 to December 2023. Employing advanced econometric techniques including Fully Modified Ordinary Least Squares (FMOLS), Dynamic Ordinary Least Squares (DOLS), and Canonical Cointegrating Regression (CCR), the analysis provides several key insights. The findings reveal a positive association between interest rates and manufacturing output, which diverges from traditional economic theory that predicts a negative correlation due to increased borrowing costs. This outcome is attributed to the financial resilience of large enterprises, allowing them to sustain investment in production despite higher interest rates. In addition, inflation demonstrates a positive relationship with manufacturing output, suggesting that stable inflation within target ranges creates a favourable environment for investment in productivity-enhancing technologies. Conversely, the exchange rate shows a negative relationship with manufacturing output, reflecting the adverse effects of currency depreciation on the cost of imported raw materials. The negative impact of CPSM underscores the importance of directing credit efficiently towards productive sectors rather than speculative ventures. Moreover, increased government borrowing appears to crowd out private sector credit, negatively affecting manufacturing output. Overall, the study highlights the need for a coordinated policy approach integrating monetary, fiscal, and financial sector strategies. Policymakers should account for the differential impacts of interest rates, inflation, exchange rates, and credit allocation on various sectors. Ensuring stable inflation, efficient credit distribution, and mitigating exchange rate volatility are critical for supporting manufacturing output and promoting sustainable economic growth. This research provides valuable insights into the economic dynamics influencing manufacturing output and offers policy recommendations tailored to South Africa’s economic context.

Keywords: domestic credit, government claims, financial variables, manufacturing output, financial analysis

Procedia PDF Downloads 17
3324 Two-Phase Sampling for Estimating a Finite Population Total in Presence of Missing Values

Authors: Daniel Fundi Murithi

Abstract:

Missing data is a real bane in many surveys. To overcome the problems caused by missing data, partial deletion, and single imputation methods, among others, have been proposed. However, problems such as discarding usable data and inaccuracy in reproducing known population parameters and standard errors are associated with them. For regression and stochastic imputation, it is assumed that there is a variable with complete cases to be used as a predictor in estimating missing values in the other variable, and the relationship between the two variables is linear, which might not be realistic in practice. In this project, we estimate population total in presence of missing values in two-phase sampling. Instead of regression or stochastic models, non-parametric model based regression model is used in imputing missing values. Empirical study showed that nonparametric model-based regression imputation is better in reproducing variance of population total estimate obtained when there were no missing values compared to mean, median, regression, and stochastic imputation methods. Although regression and stochastic imputation were better than nonparametric model-based imputation in reproducing population total estimates obtained when there were no missing values in one of the sample sizes considered, nonparametric model-based imputation may be used when the relationship between outcome and predictor variables is not linear.

Keywords: finite population total, missing data, model-based imputation, two-phase sampling

Procedia PDF Downloads 128
3323 Deep Learning for Qualitative and Quantitative Grain Quality Analysis Using Hyperspectral Imaging

Authors: Ole-Christian Galbo Engstrøm, Erik Schou Dreier, Birthe Møller Jespersen, Kim Steenstrup Pedersen

Abstract:

Grain quality analysis is a multi-parameterized problem that includes a variety of qualitative and quantitative parameters such as grain type classification, damage type classification, and nutrient regression. Currently, these parameters require human inspection, a multitude of instruments employing a variety of sensor technologies, and predictive model types or destructive and slow chemical analysis. This paper investigates the feasibility of applying near-infrared hyperspectral imaging (NIR-HSI) to grain quality analysis. For this study two datasets of NIR hyperspectral images in the wavelength range of 900 nm - 1700 nm have been used. Both datasets contain images of sparsely and densely packed grain kernels. The first dataset contains ~87,000 image crops of bulk wheat samples from 63 harvests where protein value has been determined by the FOSS Infratec NOVA which is the golden industry standard for protein content estimation in bulk samples of cereal grain. The second dataset consists of ~28,000 image crops of bulk grain kernels from seven different wheat varieties and a single rye variety. In the first dataset, protein regression analysis is the problem to solve while variety classification analysis is the problem to solve in the second dataset. Deep convolutional neural networks (CNNs) have the potential to utilize spatio-spectral correlations within a hyperspectral image to simultaneously estimate the qualitative and quantitative parameters. CNNs can autonomously derive meaningful representations of the input data reducing the need for advanced preprocessing techniques required for classical chemometric model types such as artificial neural networks (ANNs) and partial least-squares regression (PLS-R). A comparison between different CNN architectures utilizing 2D and 3D convolution is conducted. These results are compared to the performance of ANNs and PLS-R. Additionally, a variety of preprocessing techniques from image analysis and chemometrics are tested. These include centering, scaling, standard normal variate (SNV), Savitzky-Golay (SG) filtering, and detrending. The results indicate that the combination of NIR-HSI and CNNs has the potential to be the foundation for an automatic system unifying qualitative and quantitative grain quality analysis within a single sensor technology and predictive model type.

Keywords: deep learning, grain analysis, hyperspectral imaging, preprocessing techniques

Procedia PDF Downloads 98
3322 Infrastructural Investment and Economic Growth in Indian States: A Panel Data Analysis

Authors: Jonardan Koner, Basabi Bhattacharya, Avinash Purandare

Abstract:

The study is focused to find out the impact of infrastructural investment on economic development in Indian states. The study uses panel data analysis to measure the impact of infrastructural investment on Real Gross Domestic Product in Indian States. Panel data analysis incorporates Unit Root Test, Cointegration Teat, Pooled Ordinary Least Squares, Fixed Effect Approach, Random Effect Approach, Hausman Test. The study analyzes panel data (annual in frequency) ranging from 1991 to 2012 and concludes that infrastructural investment has a desirable impact on economic development in Indian. Finally, the study reveals that the infrastructural investment significantly explains the variation of economic indicator.

Keywords: infrastructural investment, real GDP, unit root test, cointegration teat, pooled ordinary least squares, fixed effect approach, random effect approach, Hausman test

Procedia PDF Downloads 401
3321 Evaluation of Best-Fit Probability Distribution for Prediction of Extreme Hydrologic Phenomena

Authors: Karim Hamidi Machekposhti, Hossein Sedghi

Abstract:

The probability distributions are the best method for forecasting of extreme hydrologic phenomena such as rainfall and flood flows. In this research, in order to determine suitable probability distribution for estimating of annual extreme rainfall and flood flows (discharge) series with different return periods, precipitation with 40 and discharge with 58 years time period had been collected from Karkheh River at Iran. After homogeneity and adequacy tests, data have been analyzed by Stormwater Management and Design Aid (SMADA) software and residual sum of squares (R.S.S). The best probability distribution was Log Pearson Type III with R.S.S value (145.91) and value (13.67) for peak discharge and Log Pearson Type III with R.S.S values (141.08) and (8.95) for maximum discharge in Jelogir Majin and Pole Zal stations, respectively. The best distribution for maximum precipitation in Jelogir Majin and Pole Zal stations was Log Pearson Type III distribution with R.S.S values (1.74&1.90) and then Pearson Type III distribution with R.S.S values (1.53&1.69). Overall, the Log Pearson Type III distributions are acceptable distribution types for representing statistics of extreme hydrologic phenomena in Karkheh River at Iran with the Pearson Type III distribution as a potential alternative.

Keywords: Karkheh River, Log Pearson Type III, probability distribution, residual sum of squares

Procedia PDF Downloads 196
3320 A Novel Approach towards Test Case Prioritization Technique

Authors: Kamna Solanki, Yudhvir Singh, Sandeep Dalal

Abstract:

Software testing is a time and cost intensive process. A scrutiny of the code and rigorous testing is required to identify and rectify the putative bugs. The process of bug identification and its consequent correction is continuous in nature and often some of the bugs are removed after the software has been launched in the market. This process of code validation of the altered software during the maintenance phase is termed as Regression testing. Regression testing ubiquitously considers resource constraints; therefore, the deduction of an appropriate set of test cases, from the ensemble of the entire gamut of test cases, is a critical issue for regression test planning. This paper presents a novel method for designing a suitable prioritization process to optimize fault detection rate and performance of regression test on predefined constraints. The proposed method for test case prioritization m-ACO alters the food source selection criteria of natural ants and is basically a modified version of Ant Colony Optimization (ACO). The proposed m-ACO approach has been coded in 'Perl' language and results are validated using three examples by computation of Average Percentage of Faults Detected (APFD) metric.

Keywords: regression testing, software testing, test case prioritization, test suite optimization

Procedia PDF Downloads 336
3319 Exploring Re-Configuration of Ordinary Spaces into Recreation and Leisure Space in Compact Unplanned Settlements: Experience from Manzese Informal Settlement-Dar Es Salaam Tanzania

Authors: Edson Ephraim Sanga

Abstract:

This paper stems to explore possible places used for recreation in unplanned settlements in order to avail knowledge on how to create and shape urban spaces essential for recreation and leisure. The context of unplanned settlements is spatially characterized compactness and congestions of buildings developed by residents without professional inputs. These characteristics surpass greenery landscapes such as parks and squares essential for health, happiness and wellbeing. The lack of recreational greenery landscape arises a question on how possible can recreation take places in the settlements? This study used qualitative methods mainly observation and in-depth interview to explore the recreational situation in Manzese informal settlements as an instrumental case and found that ordinary spaces are re-configured into recreational spaces and used as ‘parks’ and ‘squares’ in the settlements. The spaces are diverse and complex as they possess different spatial characteristics based on their physical attributes and the way they are used and interpreted by respective users. This paper argues that the re-configuration processes of ordinary spaces should not be taken for granted because they portray the appropriation of spaces from quotidian dimensions in a particular context.

Keywords: ordinary spaces, recreation, unplanned settlement, urban spaces

Procedia PDF Downloads 274
3318 Prediction of the Thermodynamic Properties of Hydrocarbons Using Gaussian Process Regression

Authors: N. Alhazmi

Abstract:

Knowing the thermodynamics properties of hydrocarbons is vital when it comes to analyzing the related chemical reaction outcomes and understanding the reaction process, especially in terms of petrochemical industrial applications, combustions, and catalytic reactions. However, measuring the thermodynamics properties experimentally is time-consuming and costly. In this paper, Gaussian process regression (GPR) has been used to directly predict the main thermodynamic properties - standard enthalpy of formation, standard entropy, and heat capacity -for more than 360 cyclic and non-cyclic alkanes, alkenes, and alkynes. A simple workflow has been proposed that can be applied to directly predict the main properties of any hydrocarbon by knowing its descriptors and chemical structure and can be generalized to predict the main properties of any material. The model was evaluated by calculating the statistical error R², which was more than 0.9794 for all the predicted properties.

Keywords: thermodynamic, Gaussian process regression, hydrocarbons, regression, supervised learning, entropy, enthalpy, heat capacity

Procedia PDF Downloads 220
3317 Solving Single Machine Total Weighted Tardiness Problem Using Gaussian Process Regression

Authors: Wanatchapong Kongkaew

Abstract:

This paper proposes an application of probabilistic technique, namely Gaussian process regression, for estimating an optimal sequence of the single machine with total weighted tardiness (SMTWT) scheduling problem. In this work, the Gaussian process regression (GPR) model is utilized to predict an optimal sequence of the SMTWT problem, and its solution is improved by using an iterated local search based on simulated annealing scheme, called GPRISA algorithm. The results show that the proposed GPRISA method achieves a very good performance and a reasonable trade-off between solution quality and time consumption. Moreover, in the comparison of deviation from the best-known solution, the proposed mechanism noticeably outperforms the recently existing approaches.

Keywords: Gaussian process regression, iterated local search, simulated annealing, single machine total weighted tardiness

Procedia PDF Downloads 308
3316 The Profit Trend of Cosmetics Products Using Bootstrap Edgeworth Approximation

Authors: Edlira Donefski, Lorenc Ekonomi, Tina Donefski

Abstract:

Edgeworth approximation is one of the most important statistical methods that has a considered contribution in the reduction of the sum of standard deviation of the independent variables’ coefficients in a Quantile Regression Model. This model estimates the conditional median or other quantiles. In this paper, we have applied approximating statistical methods in an economical problem. We have created and generated a quantile regression model to see how the profit gained is connected with the realized sales of the cosmetic products in a real data, taken from a local business. The Linear Regression of the generated profit and the realized sales was not free of autocorrelation and heteroscedasticity, so this is the reason that we have used this model instead of Linear Regression. Our aim is to analyze in more details the relation between the variables taken into study: the profit and the finalized sales and how to minimize the standard errors of the independent variable involved in this study, the level of realized sales. The statistical methods that we have applied in our work are Edgeworth Approximation for Independent and Identical distributed (IID) cases, Bootstrap version of the Model and the Edgeworth approximation for Bootstrap Quantile Regression Model. The graphics and the results that we have presented here identify the best approximating model of our study.

Keywords: bootstrap, edgeworth approximation, IID, quantile

Procedia PDF Downloads 158
3315 Harnessing Entrepreneurial Opportunities for National Security

Authors: Itiola Kehinde Adeniran

Abstract:

This paper investigated the influence of harnessing entrepreneurial opportunities on the national security in Nigeria with a specific focus on the security situation of the post-amnesty programmes of the Federal Government in Ondo State. The self-administered structured questionnaire was employed to collect data from one hundred and twenty participants through purposive sampling method. Inferential statistics was used to analyze the data, specifically; ordinary least squares linear regression method was employed with the aid of statistical package for social science (SPSS) version 20 in order to determine the influence of independent variable (entrepreneurial opportunities) on dependent variable (national security). The result showed that business opportunities have a significant influence on the rate of criminal activities. The study also revealed that entrepreneurial opportunity creation and discovery as well as providing a model on how these entrepreneurial opportunities could be effectively and efficiently utilized jointly predict better national security, which counted for 69% variance of crime rate reduction. The paper, therefore, recommended that citizens should be encouraged to develop an interest in the skill-based activities in order to change their mindset towards self-employment which can motivate them in identify entrepreneurial opportunities.

Keywords: entrepreneurship, entrepreneurial opportunities, national security, unemployment

Procedia PDF Downloads 326
3314 Sea Level Rise and Sediment Supply Explain Large-Scale Patterns of Saltmarsh Expansion and Erosion

Authors: Cai J. T. Ladd, Mollie F. Duggan-Edwards, Tjeerd J. Bouma, Jordi F. Pages, Martin W. Skov

Abstract:

Salt marshes are valued for their role in coastal flood protection, carbon storage, and for supporting biodiverse ecosystems. As a biogeomorphic landscape, marshes evolve through the complex interactions between sea level rise, sediment supply and wave/current forcing, as well as and socio-economic factors. Climate change and direct human modification could lead to a global decline marsh extent if left unchecked. Whilst the processes of saltmarsh erosion and expansion are well understood, empirical evidence on the key drivers of long-term lateral marsh dynamics is lacking. In a GIS, saltmarsh areal extent in 25 estuaries across Great Britain was calculated from historical maps and aerial photographs, at intervals of approximately 30 years between 1846 and 2016. Data on the key perceived drivers of lateral marsh change (namely sea level rise rates, suspended sediment concentration, bedload sediment flux rates, and frequency of both river flood and storm events) were collated from national monitoring centres. Continuous datasets did not extend beyond 1970, therefore predictor variables that best explained rate change of marsh extent between 1970 and 2016 was calculated using a Partial Least Squares Regression model. Information about the spread of Spartina anglica (an invasive marsh plant responsible for marsh expansion around the globe) and coastal engineering works that may have impacted on marsh extent, were also recorded from historical documents and their impacts assessed on long-term, large-scale marsh extent change. Results showed that salt marshes in the northern regions of Great Britain expanded an average of 2.0 ha/yr, whilst marshes in the south eroded an average of -5.3 ha/yr. Spartina invasion and coastal engineering works could not explain these trends since a trend of either expansion or erosion preceded these events. Results from the Partial Least Squares Regression model indicated that the rate of relative sea level rise (RSLR) and availability of suspended sediment concentration (SSC) best explained the patterns of marsh change. RSLR increased from 1.6 to 2.8 mm/yr, as SSC decreased from 404.2 to 78.56 mg/l along the north-to-south gradient of Great Britain, resulting in the shift from marsh expansion to erosion. Regional differences in RSLR and SSC are due to isostatic rebound since deglaciation, and tidal amplitudes respectively. Marshes exposed to low RSLR and high SSC likely leads to sediment accumulation at the coast suitable for colonisation by marsh plants and thus lateral expansion. In contrast, high RSLR with are likely not offset deposition under low SSC, thus average water depth at the marsh edge increases, allowing larger wind-waves to trigger marsh erosion. Current global declines in sediment flux to the coast are likely to diminish the resilience of salt marshes to RSLR. Monitoring and managing suspended sediment supply is not common-place, but may be critical to mitigating coastal impacts from climate change.

Keywords: lateral saltmarsh dynamics, sea level rise, sediment supply, wave forcing

Procedia PDF Downloads 134
3313 Time Series Regression with Meta-Clusters

Authors: Monika Chuchro

Abstract:

This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain a subgroups of time series data with normal distribution from inflow into waste water treatment plant data which Composed of several groups differing by mean value. Two simple algorithms: K-mean and EM were chosen as a clustering method. The rand index was used to measure the similarity. After simple meta-clustering, regression model was performed for each subgroups. The final model was a sum of subgroups models. The quality of obtained model was compared with the regression model made using the same explanatory variables but with no clustering of data. Results were compared by determination coefficient (R2), measure of prediction accuracy mean absolute percentage error (MAPE) and comparison on linear chart. Preliminary results allows to foresee the potential of the presented technique.

Keywords: clustering, data analysis, data mining, predictive models

Procedia PDF Downloads 464
3312 Economic Analysis of Cowpea (Unguiculata spp) Production in Northern Nigeria: A Case Study of Kano Katsina and Jigawa States

Authors: Yakubu Suleiman, S. A. Musa

Abstract:

Nigeria is the largest cowpea producer in the world, accounting for about 45%, followed by Brazil with about 17%. Cowpea is grown in Kano, Bauchi, Katsina, Borno in the north, Oyo in the west, and to the lesser extent in Enugu in the east. This study was conducted to determine the input–output relationship of Cowpea production in Kano, Katsina, and Jigawa states of Nigeria. The data were collected with the aid of 1000 structured questionnaires that were randomly distributed to Cowpea farmers in the three states mentioned above of the study area. The data collected were analyzed using regression analysis (Cobb–Douglass production function model). The result of the regression analysis revealed the coefficient of multiple determinations, R2, to be 72.5% and the F ration to be 106.20 and was found to be significant (P < 0.01). The regression coefficient of constant is 0.5382 and is significant (P < 0.01). The regression coefficient with respect to labor and seeds were 0.65554 and 0.4336, respectively, and they are highly significant (P < 0.01). The regression coefficient with respect to fertilizer is 0.26341 which is significant (P < 0.05). This implies that a unit increase of any one of the variable inputs used while holding all other variables inputs constants, will significantly increase the total Cowpea output by their corresponding coefficient. This indicated that farmers in the study area are operating in stage II of the production function. The result revealed that Cowpea farmer in Kano, Jigawa and Katsina States realized a profit of N15,997, N34,016 and N19,788 per hectare respectively. It is hereby recommended that more attention should be given to Cowpea production by government and research institutions.

Keywords: coefficient, constant, inputs, regression

Procedia PDF Downloads 408