Search results for: regression test
11810 Segmentation of Piecewise Polynomial Regression Model by Using Reversible Jump MCMC Algorithm
Authors: Suparman
Abstract:
Piecewise polynomial regression model is very flexible model for modeling the data. If the piecewise polynomial regression model is matched against the data, its parameters are not generally known. This paper studies the parameter estimation problem of piecewise polynomial regression model. The method which is used to estimate the parameters of the piecewise polynomial regression model is Bayesian method. Unfortunately, the Bayes estimator cannot be found analytically. Reversible jump MCMC algorithm is proposed to solve this problem. Reversible jump MCMC algorithm generates the Markov chain that converges to the limit distribution of the posterior distribution of piecewise polynomial regression model parameter. The resulting Markov chain is used to calculate the Bayes estimator for the parameters of piecewise polynomial regression model.Keywords: piecewise regression, bayesian, reversible jump MCMC, segmentation
Procedia PDF Downloads 37311809 HIV Disclosure Status and Factors among Women to Their Sexual Partner in Victory plus, Yogyakarta, Indonesia
Authors: Dwi Kartika Rukmi, Miftafu Darussalam
Abstract:
Background: The disclosure of women’s HIV status toward their sexual partners is an important issue that should be regarded as one of the efforts to prevent and control the spread of HIV. Research on the disclosure of seropositive HIV status as well as women-related factors in Indonesia, especially Yogyakarta is only a few. Methods: This is a correlational descriptive research along with its cross-sectional approach on 329 women with HIV/AIDS at the Victory Plus NGO from June to July 2016. This research used a purposive sampling method and a questionnaire as the data collection technique. The bivariate analysis test was undertaken by using a chi-square and multivariate test along with a logistic regression. Result: The multivariate analysis and logistic regression show five independent variables related to the disclosure of seropositive HIV status of women with HIV/AIDS toward their sexual partners, namely ethnicity (aOR = 36,859; 95% CI; (6,544-207,616)) religion (aOR =0,255; 95%CI; (0,075-0,868)), discussion with partners prior to the HIV test (aOR =0,069; 95%CI; (0,065-0,438)) , types of sexual partners (aOR = 0.191; 95% CI; (0.082-0,445)) and knowledge on the partners’ HIV status (aOR = 0.036; 95% CI; (0.008-0.160)). The highest level of reason for seropositive HIV women not to be open about their partners’ status is the fear of being rejected by their partners and the environmental stigma of HIV AIDS disease. Conclusion: The disclosure of seropositive HIV status in women with HIV/AIDS in the Victory Plus NGO of Yogyakarta was 79.4% or classified as a high category with some related factors such as ethnicity, religion, discussion with partners prior to the HIV test, types of partners and knowledge on the partners’ HIV status.Keywords: women, HIV, disclosure, sexual partner
Procedia PDF Downloads 26111808 A Statistical Model for the Geotechnical Parameters of Cement-Stabilised Hightown’s Soft Soil: A Case Stufy of Liverpool, UK
Authors: Hassnen M. Jafer, Khalid S. Hashim, W. Atherton, Ali W. Alattabi
Abstract:
This study investigates the effect of two important parameters (length of curing period and percentage of the added binder) on the strength of soil treated with OPC. An intermediate plasticity silty clayey soil with medium organic content was used in this study. This soft soil was treated with different percentages of a commercially available cement type 32.5-N. laboratory experiments were carried out on the soil treated with 0, 1.5, 3, 6, 9, and 12% OPC by the dry weight to determine the effect of OPC on the compaction parameters, consistency limits, and the compressive strength. Unconfined compressive strength (UCS) test was carried out on cement-treated specimens after exposing them to different curing periods (1, 3, 7, 14, 28, and 90 days). The results of UCS test were used to develop a non-linear multi-regression model to find the relationship between the predicted and the measured maximum compressive strength of the treated soil (qu). The results indicated that there was a significant improvement in the index of plasticity (IP) by treating with OPC; IP was decreased from 20.2 to 14.1 by using 12% of OPC; this percentage was enough to increase the UCS of the treated soil up to 1362 kPa after 90 days of curing. With respect to the statistical model of the predicted qu, the results showed that the regression coefficients (R2) was equal to 0.8534 which indicates a good reproducibility for the constructed model.Keywords: cement admixtures, soft soil stabilisation, geotechnical parameters, multi-regression model
Procedia PDF Downloads 36611807 The Theory behind Logistic Regression
Authors: Jan Henrik Wosnitza
Abstract:
The logistic regression has developed into a standard approach for estimating conditional probabilities in a wide range of applications including credit risk prediction. The article at hand contributes to the current literature on logistic regression fourfold: First, it is demonstrated that the binary logistic regression automatically meets its model assumptions under very general conditions. This result explains, at least in part, the logistic regression's popularity. Second, the requirement of homoscedasticity in the context of binary logistic regression is theoretically substantiated. The variances among the groups of defaulted and non-defaulted obligors have to be the same across the level of the aggregated default indicators in order to achieve linear logits. Third, this article sheds some light on the question why nonlinear logits might be superior to linear logits in case of a small amount of data. Fourth, an innovative methodology for estimating correlations between obligor-specific log-odds is proposed. In order to crystallize the key ideas, this paper focuses on the example of credit risk prediction. However, the results presented in this paper can easily be transferred to any other field of application.Keywords: correlation, credit risk estimation, default correlation, homoscedasticity, logistic regression, nonlinear logistic regression
Procedia PDF Downloads 42711806 Use of Regression Analysis in Determining the Length of Plastic Hinge in Reinforced Concrete Columns
Authors: Mehmet Alpaslan Köroğlu, Musa Hakan Arslan, Muslu Kazım Körez
Abstract:
Basic objective of this study is to create a regression analysis method that can estimate the length of a plastic hinge which is an important design parameter, by making use of the outcomes of (lateral load-lateral displacement hysteretic curves) the experimental studies conducted for the reinforced square concrete columns. For this aim, 170 different square reinforced concrete column tests results have been collected from the existing literature. The parameters which are thought affecting the plastic hinge length such as cross-section properties, features of material used, axial loading level, confinement of the column, longitudinal reinforcement bars in the columns etc. have been obtained from these 170 different square reinforced concrete column tests. In the study, when determining the length of plastic hinge, using the experimental test results, a regression analysis have been separately tested and compared with each other. In addition, the outcome of mentioned methods on determination of plastic hinge length of the reinforced concrete columns has been compared to other methods available in the literature.Keywords: columns, plastic hinge length, regression analysis, reinforced concrete
Procedia PDF Downloads 48011805 Detecting Trends in Annual Discharge and Precipitation in the Chott Melghir Basin in Southeastern Algeria
Authors: M. T. Bouziane, A. Benkhaled, B. Achour
Abstract:
In this study, data from 30 catchments in the Chott Melghir basin in the semiarid region of southern East Algeria were analyzed to investigate changes in annual discharge, annual precipitation over the 1965-2005 period. These data were analyzed with the aid of Kendall test trend and regression analysis. The results indicate that the major variations in all catchments discharge in Chott Melghir correspond well to the precipitation. Changes in total annual discharge of Chott Melghir were lower than changes in annual precipitation. Annual precipitation decreased by 66 percent and annual discharge decreased by 4 percent. No significant trend is detected for annual discharge and precipitation at major catchments up to 95% confidence level. The decreasing trend in Chott Melghir discharge is mainly attributed to the decrease of precipitation.Keywords: trends, climate change, precipitation, discharge, Kendall test, regression analysis, Chott Melghir catchments
Procedia PDF Downloads 30511804 The Factors of Supply Chain Collaboration
Authors: Ghada Soltane
Abstract:
The objective of this study was to identify factors impacting supply chain collaboration. a quantitative study was carried out on a sample of 84 Tunisian industrial companies. To verify the research hypotheses and test the direct effect of these factors on supply chain collaboration a multiple regression method was used using SPSS 26 software. The results show that there are four factors direct effects that affect supply chain collaboration in a meaningful and positive way, including: trust, engagement, information sharing and information qualityKeywords: supply chain collaboration, factors of collaboration, principal component analysis, multiple regression
Procedia PDF Downloads 5111803 Detecting Earnings Management via Statistical and Neural Networks Techniques
Authors: Mohammad Namazi, Mohammad Sadeghzadeh Maharluie
Abstract:
Predicting earnings management is vital for the capital market participants, financial analysts and managers. The aim of this research is attempting to respond to this query: Is there a significant difference between the regression model and neural networks’ models in predicting earnings management, and which one leads to a superior prediction of it? In approaching this question, a Linear Regression (LR) model was compared with two neural networks including Multi-Layer Perceptron (MLP), and Generalized Regression Neural Network (GRNN). The population of this study includes 94 listed companies in Tehran Stock Exchange (TSE) market from 2003 to 2011. After the results of all models were acquired, ANOVA was exerted to test the hypotheses. In general, the summary of statistical results showed that the precision of GRNN did not exhibit a significant difference in comparison with MLP. In addition, the mean square error of the MLP and GRNN showed a significant difference with the multi variable LR model. These findings support the notion of nonlinear behavior of the earnings management. Therefore, it is more appropriate for capital market participants to analyze earnings management based upon neural networks techniques, and not to adopt linear regression models.Keywords: earnings management, generalized linear regression, neural networks multi-layer perceptron, Tehran stock exchange
Procedia PDF Downloads 42411802 Comparison of Direct and Indirect Tensile Strength of Brittle Materials and Accurate Estimate of Tensile Strength
Authors: M. Etezadi, A. Fahimifar
Abstract:
In many geotechnical designs in rocks and rock masses, tensile strength of rock and rock mass is needed. The difficulties associated with performing a direct uniaxial tensile test on a rock specimen have led to a number of indirect methods for assessing the tensile strength that in the meantime the Brazilian test is more popular. Brazilian test is widely applied in rock engineering because specimens are easy to prepare, the test is easy to conduct and uniaxial compression test machines are quite common. This study compares experimental results of direct and Brazilian tensile tests carried out on two rock types and three concrete types using 39 cylindrical and 28 disc specimens. The tests are performed using Servo-Control device. The relationship between direct and indirect tensile strength of specimens is extracted using linear regression. In the following, tensile strength of direct and indirect test is evaluated using finite element analysis. The results are analyzed and effective factors on results are studied. According to the experimental results Brazilian test is shown higher tensile strength than direct test. Because of decreasing the contact surface of grains and increasing the uniformity in concrete specimens with fine aggregate (largest grain size= 6mm), higher tensile strength in direct test is shown. The experimental and numerical results of tensile strength are compared and empirical relationship witch is obtained from experimental tests is validated.Keywords: tensile strength, brittle materials, direct and indirect tensile test, numerical modeling
Procedia PDF Downloads 55011801 Model Averaging for Poisson Regression
Authors: Zhou Jianhong
Abstract:
Model averaging is a desirable approach to deal with model uncertainty, which, however, has rarely been explored for Poisson regression. In this paper, we propose a model averaging procedure based on an unbiased estimator of the expected Kullback-Leibler distance for the Poisson regression. Simulation study shows that the proposed model average estimator outperforms some other commonly used model selection and model average estimators in some situations. Our proposed methods are further applied to a real data example and the advantage of this method is demonstrated again.Keywords: model averaging, poission regression, Kullback-Leibler distance, statistics
Procedia PDF Downloads 52011800 Establishment of the Regression Uncertainty of the Critical Heat Flux Power Correlation for an Advanced Fuel Bundle
Authors: L. Q. Yuan, J. Yang, A. Siddiqui
Abstract:
A new regression uncertainty analysis methodology was applied to determine the uncertainties of the critical heat flux (CHF) power correlation for an advanced 43-element bundle design, which was developed by Canadian Nuclear Laboratories (CNL) to achieve improved economics, resource utilization and energy sustainability. The new methodology is considered more appropriate than the traditional methodology in the assessment of the experimental uncertainty associated with regressions. The methodology was first assessed using both the Monte Carlo Method (MCM) and the Taylor Series Method (TSM) for a simple linear regression model, and then extended successfully to a non-linear CHF power regression model (CHF power as a function of inlet temperature, outlet pressure and mass flow rate). The regression uncertainty assessed by MCM agrees well with that by TSM. An equation to evaluate the CHF power regression uncertainty was developed and expressed as a function of independent variables that determine the CHF power.Keywords: CHF experiment, CHF correlation, regression uncertainty, Monte Carlo Method, Taylor Series Method
Procedia PDF Downloads 41711799 Assessment of Association Between Microalbuminuria and Lung Function Test Among the Community of Jimma Town
Authors: Diriba Dereje
Abstract:
Background: Cardiac and renal disease are the most prevalent chronic non-communicable diseases (CNCD) affecting the community in a significant manner. The best and recommended method in halting CNCD is by working on prevention as early as possible. This is only possible if early surrogate markers are identified. As part of the stated solution, this study will identify an association between microalbuminuria (an early surrogate marker of renal and cardiac disease) and lung function test among adult in the community. Objective: The main aim of this study was to assess an association between microalbuminuria (an early surrogate marker of renal and cardiac disease) and lung function test among adult in the community. Methodology: Community based cross sectional study was conducted among 384 adult in Jimma town. A systematic sampling technique was used in selecting participants to the study. In searching for the possible association, binary and multivariate logistic regression and t-test was conducted. Finally, the association between microalbuminuria and lung function test was well stated in the form of figures and written description. Result and Conclusion: A significant association was found between microalbuminuria and different lung function test parameters.Keywords: microalbuminuria, lung function, association, test
Procedia PDF Downloads 19411798 Modeling Aeration of Sharp Crested Weirs by Using Support Vector Machines
Authors: Arun Goel
Abstract:
The present paper attempts to investigate the prediction of air entrainment rate and aeration efficiency of a free over-fall jets issuing from a triangular sharp crested weir by using regression based modelling. The empirical equations, support vector machine (polynomial and radial basis function) models and the linear regression techniques were applied on the triangular sharp crested weirs relating the air entrainment rate and the aeration efficiency to the input parameters namely drop height, discharge, and vertex angle. It was observed that there exists a good agreement between the measured values and the values obtained using empirical equations, support vector machine (Polynomial and rbf) models, and the linear regression techniques. The test results demonstrated that the SVM based (Poly & rbf) model also provided acceptable prediction of the measured values with reasonable accuracy along with empirical equations and linear regression techniques in modelling the air entrainment rate and the aeration efficiency of a free over-fall jets issuing from triangular sharp crested weir. Further sensitivity analysis has also been performed to study the impact of input parameter on the output in terms of air entrainment rate and aeration efficiency.Keywords: air entrainment rate, dissolved oxygen, weir, SVM, regression
Procedia PDF Downloads 43611797 Comparison of Multivariate Adaptive Regression Splines and Random Forest Regression in Predicting Forced Expiratory Volume in One Second
Authors: P. V. Pramila , V. Mahesh
Abstract:
Pulmonary Function Tests are important non-invasive diagnostic tests to assess respiratory impairments and provides quantifiable measures of lung function. Spirometry is the most frequently used measure of lung function and plays an essential role in the diagnosis and management of pulmonary diseases. However, the test requires considerable patient effort and cooperation, markedly related to the age of patients esulting in incomplete data sets. This paper presents, a nonlinear model built using Multivariate adaptive regression splines and Random forest regression model to predict the missing spirometric features. Random forest based feature selection is used to enhance both the generalization capability and the model interpretability. In the present study, flow-volume data are recorded for N= 198 subjects. The ranked order of feature importance index calculated by the random forests model shows that the spirometric features FVC, FEF 25, PEF,FEF 25-75, FEF50, and the demographic parameter height are the important descriptors. A comparison of performance assessment of both models prove that, the prediction ability of MARS with the `top two ranked features namely the FVC and FEF 25 is higher, yielding a model fit of R2= 0.96 and R2= 0.99 for normal and abnormal subjects. The Root Mean Square Error analysis of the RF model and the MARS model also shows that the latter is capable of predicting the missing values of FEV1 with a notably lower error value of 0.0191 (normal subjects) and 0.0106 (abnormal subjects). It is concluded that combining feature selection with a prediction model provides a minimum subset of predominant features to train the model, yielding better prediction performance. This analysis can assist clinicians with a intelligence support system in the medical diagnosis and improvement of clinical care.Keywords: FEV, multivariate adaptive regression splines pulmonary function test, random forest
Procedia PDF Downloads 31111796 Use of Multistage Transition Regression Models for Credit Card Income Prediction
Authors: Denys Osipenko, Jonathan Crook
Abstract:
Because of the variety of the card holders’ behaviour types and income sources each consumer account can be transferred to a variety of states. Each consumer account can be inactive, transactor, revolver, delinquent, defaulted and requires an individual model for the income prediction. The estimation of transition probabilities between statuses at the account level helps to avoid the memorylessness of the Markov Chains approach. This paper investigates the transition probabilities estimation approaches to credit cards income prediction at the account level. The key question of empirical research is which approach gives more accurate results: multinomial logistic regression or multistage conditional logistic regression with binary target. Both models have shown moderate predictive power. Prediction accuracy for conditional logistic regression depends on the order of stages for the conditional binary logistic regression. On the other hand, multinomial logistic regression is easier for usage and gives integrate estimations for all states without priorities. Thus further investigations can be concentrated on alternative modeling approaches such as discrete choice models.Keywords: multinomial regression, conditional logistic regression, credit account state, transition probability
Procedia PDF Downloads 48711795 Semiparametric Regression Of Truncated Spline Biresponse On Farmer Loyalty And Attachment Modeling
Authors: Adji Achmad Rinaldo Fernandes
Abstract:
Regression analysis is a statistical method that is able to describe and predict causal relationships between individuals. Not all relationships have a known curve shape; often, there are relationship patterns that cannot be known in the shape of the curve; besides that, a cause can have an impact on more than one effect, so that between effects can also have a close relationship in it. Regression analysis that can be done to find out the relationship can be brought closer to the semiparametric regression of truncated spline biresponse. The purpose of this study is to examine the function estimator and determine the best model of truncated spline biresponse semiparametric regression. The results of the secondary data study showed that the best model with the highest order of quadratic and a maximum of two knots with a Goodness of fit value in the form of Adjusted R2 of 88.5%.Keywords: biresponse, farmer attachment, farmer loyalty, truncated spline
Procedia PDF Downloads 4111794 Internet Purchases in European Union Countries: Multiple Linear Regression Approach
Authors: Ksenija Dumičić, Anita Čeh Časni, Irena Palić
Abstract:
This paper examines economic and Information and Communication Technology (ICT) development influence on recently increasing Internet purchases by individuals for European Union member states. After a growing trend for Internet purchases in EU27 was noticed, all possible regression analysis was applied using nine independent variables in 2011. Finally, two linear regression models were studied in detail. Conducted simple linear regression analysis confirmed the research hypothesis that the Internet purchases in analysed EU countries is positively correlated with statistically significant variable Gross Domestic Product per capita (GDPpc). Also, analysed multiple linear regression model with four regressors, showing ICT development level, indicates that ICT development is crucial for explaining the Internet purchases by individuals, confirming the research hypothesis.Keywords: European union, Internet purchases, multiple linear regression model, outlier
Procedia PDF Downloads 30311793 Copula-Based Estimation of Direct and Indirect Effects in Path Analysis Models
Authors: Alam Ali, Ashok Kumar Pathak
Abstract:
Path analysis is a statistical technique used to evaluate the direct and indirect effects of variables in path models. One or more structural regression equations are used to estimate a series of parameters in path models to find the better fit of data. However, sometimes the assumptions of classical regression models, such as ordinary least squares (OLS), are violated by the nature of the data, resulting in insignificant direct and indirect effects of exogenous variables. This article aims to explore the effectiveness of a copula-based regression approach as an alternative to classical regression, specifically when variables are linked through an elliptical copula.Keywords: path analysis, copula-based regression models, direct and indirect effects, k-fold cross validation technique
Procedia PDF Downloads 4311792 Determination of Small Shear Modulus of Clayey Sand Using Bender Element Test
Authors: R. Sadeghzadegan, S. A. Naeini, A. Mirzaii
Abstract:
In this article, the results of a series of carefully conducted laboratory test program were represented to determine the small strain shear modulus of sand mixed with a range of kaolinite including zero to 30%. This was experimentally achieved using a triaxial cell equipped with bender element. Results indicate that small shear modulus tends to increase, while clay content decreases and effective confining pressure increases. The exponent of stress in the power model regression analysis was not sensitive to the amount of clay content for all sand clay mixtures, while coefficient A was directly affected by change in clay content.Keywords: small shear modulus, bender element test, plastic fines, sand
Procedia PDF Downloads 47411791 The Relationship between Coping Styles and Internet Addiction among High School Students
Authors: Adil Kaval, Digdem Muge Siyez
Abstract:
With the negative effects of internet use in a person's life, the use of the Internet has become an issue. This subject was mostly considered as internet addiction, and it was investigated. In literature, it is noteworthy that some theoretical models have been proposed to explain the reasons for internet addiction. In addition to these theoretical models, it may be thought that the coping style for stressing events can be a predictor of internet addiction. It was aimed to test with logistic regression the effect of high school students' coping styles on internet addiction levels. Sample of the study consisted of 770 Turkish adolescents (471 girls, 299 boys) selected from high schools in the 2017-2018 academic year in İzmir province. Internet Addiction Test, Coping Scale for Child and Adolescents and a demographic information form were used in this study. The results of the logistic regression analysis indicated that the model of coping styles predicted internet addiction provides a statistically significant prediction of internet addiction. Gender does not predict whether or not to be addicted to the internet. The active coping style is not effective on internet addiction levels, while the avoiding and negative coping style are effective on internet addiction levels. With this model, % 79.1 of internet addiction in high school is estimated. The Negelkerke pseudo R2 indicated that the model accounted for %35 of the total variance. The results of this study on Turkish adolescents are similar to the results of other studies in the literature. It can be argued that avoiding and negative coping styles are important risk factors in the development of internet addiction.Keywords: adolescents, coping, internet addiction, regression analysis
Procedia PDF Downloads 17611790 Optimization of Slider Crank Mechanism Using Design of Experiments and Multi-Linear Regression
Authors: Galal Elkobrosy, Amr M. Abdelrazek, Bassuny M. Elsouhily, Mohamed E. Khidr
Abstract:
Crank shaft length, connecting rod length, crank angle, engine rpm, cylinder bore, mass of piston and compression ratio are the inputs that can control the performance of the slider crank mechanism and then its efficiency. Several combinations of these seven inputs are used and compared. The throughput engine torque predicted by the simulation is analyzed through two different regression models, with and without interaction terms, developed according to multi-linear regression using LU decomposition to solve system of algebraic equations. These models are validated. A regression model in seven inputs including their interaction terms lowered the polynomial degree from 3rd degree to 1st degree and suggested valid predictions and stable explanations.Keywords: design of experiments, regression analysis, SI engine, statistical modeling
Procedia PDF Downloads 18611789 An Epsilon Hierarchical Fuzzy Twin Support Vector Regression
Authors: Arindam Chaudhuri
Abstract:
The research presents epsilon- hierarchical fuzzy twin support vector regression (epsilon-HFTSVR) based on epsilon-fuzzy twin support vector regression (epsilon-FTSVR) and epsilon-twin support vector regression (epsilon-TSVR). Epsilon-FTSVR is achieved by incorporating trapezoidal fuzzy numbers to epsilon-TSVR which takes care of uncertainty existing in forecasting problems. Epsilon-FTSVR determines a pair of epsilon-insensitive proximal functions by solving two related quadratic programming problems. The structural risk minimization principle is implemented by introducing regularization term in primal problems of epsilon-FTSVR. This yields dual stable positive definite problems which improves regression performance. Epsilon-FTSVR is then reformulated as epsilon-HFTSVR consisting of a set of hierarchical layers each containing epsilon-FTSVR. Experimental results on both synthetic and real datasets reveal that epsilon-HFTSVR has remarkable generalization performance with minimum training time.Keywords: regression, epsilon-TSVR, epsilon-FTSVR, epsilon-HFTSVR
Procedia PDF Downloads 37611788 Nonparametric Truncated Spline Regression Model on the Data of Human Development Index in Indonesia
Authors: Kornelius Ronald Demu, Dewi Retno Sari Saputro, Purnami Widyaningsih
Abstract:
Human Development Index (HDI) is a standard measurement for a country's human development. Several factors may have influenced it, such as life expectancy, gross domestic product (GDP) based on the province's annual expenditure, the number of poor people, and the percentage of an illiterate people. The scatter plot between HDI and the influenced factors show that the plot does not follow a specific pattern or form. Therefore, the HDI's data in Indonesia can be applied with a nonparametric regression model. The estimation of the regression curve in the nonparametric regression model is flexible because it follows the shape of the data pattern. One of the nonparametric regression's method is a truncated spline. Truncated spline regression is one of the nonparametric approach, which is a modification of the segmented polynomial functions. The estimator of a truncated spline regression model was affected by the selection of the optimal knots point. Knot points is a focus point of spline truncated functions. The optimal knots point was determined by the minimum value of generalized cross validation (GCV). In this article were applied the data of Human Development Index with a truncated spline nonparametric regression model. The results of this research were obtained the best-truncated spline regression model to the HDI's data in Indonesia with the combination of optimal knots point 5-5-5-4. Life expectancy and the percentage of an illiterate people were the significant factors depend to the HDI in Indonesia. The coefficient of determination is 94.54%. This means the regression model is good enough to applied on the data of HDI in Indonesia.Keywords: generalized cross validation (GCV), Human Development Index (HDI), knots point, nonparametric regression, truncated spline
Procedia PDF Downloads 34211787 Regression Model Evaluation on Depth Camera Data for Gaze Estimation
Authors: James Purnama, Riri Fitri Sari
Abstract:
We investigate the machine learning algorithm selection problem in the term of a depth image based eye gaze estimation, with respect to its essential difficulty in reducing the number of required training samples and duration time of training. Statistics based prediction accuracy are increasingly used to assess and evaluate prediction or estimation in gaze estimation. This article evaluates Root Mean Squared Error (RMSE) and R-Squared statistical analysis to assess machine learning methods on depth camera data for gaze estimation. There are 4 machines learning methods have been evaluated: Random Forest Regression, Regression Tree, Support Vector Machine (SVM), and Linear Regression. The experiment results show that the Random Forest Regression has the lowest RMSE and the highest R-Squared, which means that it is the best among other methods.Keywords: gaze estimation, gaze tracking, eye tracking, kinect, regression model, orange python
Procedia PDF Downloads 53911786 Developing an Advanced Algorithm Capable of Classifying News, Articles and Other Textual Documents Using Text Mining Techniques
Authors: R. B. Knudsen, O. T. Rasmussen, R. A. Alphinas
Abstract:
The reason for conducting this research is to develop an algorithm that is capable of classifying news articles from the automobile industry, according to the competitive actions that they entail, with the use of Text Mining (TM) methods. It is needed to test how to properly preprocess the data for this research by preparing pipelines which fits each algorithm the best. The pipelines are tested along with nine different classification algorithms in the realm of regression, support vector machines, and neural networks. Preliminary testing for identifying the optimal pipelines and algorithms resulted in the selection of two algorithms with two different pipelines. The two algorithms are Logistic Regression (LR) and Artificial Neural Network (ANN). These algorithms are optimized further, where several parameters of each algorithm are tested. The best result is achieved with the ANN. The final model yields an accuracy of 0.79, a precision of 0.80, a recall of 0.78, and an F1 score of 0.76. By removing three of the classes that created noise, the final algorithm is capable of reaching an accuracy of 94%.Keywords: Artificial Neural network, Competitive dynamics, Logistic Regression, Text classification, Text mining
Procedia PDF Downloads 12211785 Evaluation of the CRISP-DM Business Understanding Step: An Approach for Assessing the Predictive Power of Regression versus Classification for the Quality Prediction of Hydraulic Test Results
Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter
Abstract:
Digitalisation in production technology is a driver for the application of machine learning methods. Through the application of predictive quality, the great potential for saving necessary quality control can be exploited through the data-based prediction of product quality and states. However, the serial use of machine learning applications is often prevented by various problems. Fluctuations occur in real production data sets, which are reflected in trends and systematic shifts over time. To counteract these problems, data preprocessing includes rule-based data cleaning, the application of dimensionality reduction techniques, and the identification of comparable data subsets to extract stable features. Successful process control of the target variables aims to centre the measured values around a mean and minimise variance. Competitive leaders claim to have mastered their processes. As a result, much of the real data has a relatively low variance. For the training of prediction models, the highest possible generalisability is required, which is at least made more difficult by this data availability. The implementation of a machine learning application can be interpreted as a production process. The CRoss Industry Standard Process for Data Mining (CRISP-DM) is a process model with six phases that describes the life cycle of data science. As in any process, the costs to eliminate errors increase significantly with each advancing process phase. For the quality prediction of hydraulic test steps of directional control valves, the question arises in the initial phase whether a regression or a classification is more suitable. In the context of this work, the initial phase of the CRISP-DM, the business understanding, is critically compared for the use case at Bosch Rexroth with regard to regression and classification. The use of cross-process production data along the value chain of hydraulic valves is a promising approach to predict the quality characteristics of workpieces. Suitable methods for leakage volume flow regression and classification for inspection decision are applied. Impressively, classification is clearly superior to regression and achieves promising accuracies.Keywords: classification, CRISP-DM, machine learning, predictive quality, regression
Procedia PDF Downloads 14511784 Generalized Extreme Value Regression with Binary Dependent Variable: An Application for Predicting Meteorological Drought Probabilities
Authors: Retius Chifurira
Abstract:
Logistic regression model is the most used regression model to predict meteorological drought probabilities. When the dependent variable is extreme, the logistic model fails to adequately capture drought probabilities. In order to adequately predict drought probabilities, we use the generalized linear model (GLM) with the quantile function of the generalized extreme value distribution (GEVD) as the link function. The method maximum likelihood estimation is used to estimate the parameters of the generalized extreme value (GEV) regression model. We compare the performance of the logistic and the GEV regression models in predicting drought probabilities for Zimbabwe. The performance of the regression models are assessed using the goodness-of-fit tests, namely; relative root mean square error (RRMSE) and relative mean absolute error (RMAE). Results show that the GEV regression model performs better than the logistic model, thereby providing a good alternative candidate for predicting drought probabilities. This paper provides the first application of GLM derived from extreme value theory to predict drought probabilities for a drought-prone country such as Zimbabwe.Keywords: generalized extreme value distribution, general linear model, mean annual rainfall, meteorological drought probabilities
Procedia PDF Downloads 20111783 Analysis of the Result for the Accelerated Life Cycle Test of the Motor for Washing Machine by Using Acceleration Factor
Authors: Youn-Sung Kim, Jin-Ho Jo, Mi-Sung Kim, Jae-Kun Lee
Abstract:
Accelerated life cycle test is applied to various products or components in order to reduce the time of life cycle test in industry. It must be considered for many test conditions according to the product characteristics for the test and the selection of acceleration parameter is especially very important. We have carried out the general life cycle test and the accelerated life cycle test by applying the acceleration factor (AF) considering the characteristics of brushless DC (BLDC) motor for washing machine. The final purpose of this study is to verify the validity by analyzing the results of the general life cycle test and the accelerated life cycle test. It will make it possible to reduce the life test time through the reasonable accelerated life cycle test.Keywords: accelerated life cycle test, reliability test, motor for washing machine, brushless dc motor test
Procedia PDF Downloads 61311782 Equipment Design for Lunar Lander Landing-Impact Test
Authors: Xiaohuan Li, Wangmin Yi, Xinghui Wu
Abstract:
In order to verify the performance of lunar lander structure, landing-impact test is urgently needed. Moreover, the test equipment is necessary for the test. The functions and the key points of the equipment is presented to satisfy the requirements of the test,and the design scheme is proposed. The composition, the major function and the critical parts’ design of the equipment are introduced. By the load test of releasing device and single-beam hoist, and the compatibility test of landing-impact testing system, the rationality and reliability of the equipment is proved.Keywords: landing-impact test, lunar lander, releasing device, test equipment
Procedia PDF Downloads 62311781 Principal Component Regression in Amylose Content on the Malaysian Market Rice Grains Using Near Infrared Reflectance Spectroscopy
Authors: Syahira Ibrahim, Herlina Abdul Rahim
Abstract:
The amylose content is an essential element in determining the texture and taste of rice grains. This paper evaluates the use of VIS-SWNIRS in estimating the amylose content for seven varieties of rice grains available in the Malaysian market. Each type consists of 30 samples and all the samples are scanned using the spectroscopy to obtain a range of values between 680-1000nm. The Savitzky-Golay (SG) smoothing filter is applied to each sample’s data before the Principal Component Regression (PCR) technique is used to examine the data and produce a single value for each sample. This value is then compared with reference values obtained from the standard iodine colorimetric test in terms of its coefficient of determination, R2. Results show that this technique produced low R2 values of less than 0.50. In order to improve the result, the range should include a wavelength range of 1100-2500nm and the number of samples processed should also be increased.Keywords: amylose content, diffuse reflectance, Malaysia rice grain, principal component regression (PCR), Visible and Shortwave near-infrared spectroscopy (VIS-SWNIRS)
Procedia PDF Downloads 382