Search results for: partial least square regression
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 5591

Search results for: partial least square regression

5591 QSRR Analysis of 17-Picolyl and 17-Picolinylidene Androstane Derivatives Based on Partial Least Squares and Principal Component Regression

Authors: Sanja Podunavac-Kuzmanović, Strahinja Kovačević, Lidija Jevrić, Evgenija Djurendić, Jovana Ajduković

Abstract:

There are several methods for determination of the lipophilicity of biologically active compounds, however chromatography has been shown as a very suitable method for this purpose. Chromatographic (C18-RP-HPLC) analysis of a series of 24 17-picolyl and 17-picolinylidene androstane derivatives was carried out. The obtained retention indices (logk, methanol (90%) / water (10%)) were correlated with calculated physicochemical and lipophilicity descriptors. The QSRR analysis was carried out applying principal component regression (PCR) and partial least squares regression (PLS). The PCR and PLS model were selected on the basis of the highest variance and the lowest root mean square error of cross-validation. The obtained PCR and PLS model successfully correlate the calculated molecular descriptors with logk parameter indicating the significance of the lipophilicity of compounds in chromatographic process. On the basis of the obtained results it can be concluded that the obtained logk parameters of the analyzed androstane derivatives can be considered as their chromatographic lipophilicity. These results are the part of the project No. 114-451-347/2015-02, financially supported by the Provincial Secretariat for Science and Technological Development of Vojvodina and CMST COST Action CM1105.

Keywords: androstane derivatives, chromatography, molecular structure, principal component regression, partial least squares regression

Procedia PDF Downloads 237
5590 Partial Least Square Regression for High-Dimentional and High-Correlated Data

Authors: Mohammed Abdullah Alshahrani

Abstract:

The research focuses on investigating the use of partial least squares (PLS) methodology for addressing challenges associated with high-dimensional correlated data. Recent technological advancements have led to experiments producing data characterized by a large number of variables compared to observations, with substantial inter-variable correlations. Such data patterns are common in chemometrics, where near-infrared (NIR) spectrometer calibrations record chemical absorbance levels across hundreds of wavelengths, and in genomics, where thousands of genomic regions' copy number alterations (CNA) are recorded from cancer patients. PLS serves as a widely used method for analyzing high-dimensional data, functioning as a regression tool in chemometrics and a classification method in genomics. It handles data complexity by creating latent variables (components) from original variables. However, applying PLS can present challenges. The study investigates key areas to address these challenges, including unifying interpretations across three main PLS algorithms and exploring unusual negative shrinkage factors encountered during model fitting. The research presents an alternative approach to addressing the interpretation challenge of predictor weights associated with PLS. Sparse estimation of predictor weights is employed using a penalty function combining a lasso penalty for sparsity and a Cauchy distribution-based penalty to account for variable dependencies. The results demonstrate sparse and grouped weight estimates, aiding interpretation and prediction tasks in genomic data analysis. High-dimensional data scenarios, where predictors outnumber observations, are common in regression analysis applications. Ordinary least squares regression (OLS), the standard method, performs inadequately with high-dimensional and highly correlated data. Copy number alterations (CNA) in key genes have been linked to disease phenotypes, highlighting the importance of accurate classification of gene expression data in bioinformatics and biology using regularized methods like PLS for regression and classification.

Keywords: partial least square regression, genetics data, negative filter factors, high dimensional data, high correlated data

Procedia PDF Downloads 0
5589 Effects of Temperature and Cysteine Addition on Formation of Flavor from Maillard Reaction Using Xylose and Rapeseed Meal Peptide

Authors: Zuoyong Zhang, Min Yu, Jinlong Zhao, Shudong He

Abstract:

The Maillard reaction can produce the flavor enhancing substance through the chemical crosslinking between free amino group of the protein or polypeptide with the carbonyl of the reducing sugar. In this research, solutions of rapeseed meal peptide and D-xylose with or without L-cysteine (RXC or RX) were heated over a range of temperatures (80-140 °C) for 2 h. It was observed that RXs had a severe browning,while RXCs accompanied by more pH decrement with the temperature increasing. Then the correlation among data of quantitative sensory descriptive analysis, free amino acid (FAA) and GC–MS of RXCs and RXs were analyzed using the partial least square regression method. Results suggested that the Maillard reaction product (MRPs) with cysteine formed at 120 °C (RXC-120) had greater sensory properties especially meat-like flavor compared to other MRPs. Meanwhile, it revealed that glutamic and glycine not only had a positive contribution to meaty aroma but also showed a significant and positive influence on umami taste of RXs based on the FAA data. Moreover, the sulfur-containing compounds showed a significant positive correlation with the meat-like flavor of RXCs, while RXs depended on furans and nitrogenous-containing compounds with more caramel-like flavor. Therefore, a MRP with strong meaty flavor could be obtained at 120 °C by addition of cysteine.

Keywords: rapeseed meal, Maillard reaction, sensory characteristics, FAA, GC–MS, partial least square regression

Procedia PDF Downloads 234
5588 CO₂ Absorption Studies Using Amine Solvents with Fourier Transform Infrared Analysis

Authors: Avoseh Funmilola, Osman Khalid, Wayne Nelson, Paramespri Naidoo, Deresh Ramjugernath

Abstract:

The increasing global atmospheric temperature is of great concern and this has led to the development of technologies to reduce the emission of greenhouse gases into the atmosphere. Flue gas emissions from fossil fuel combustion are major sources of greenhouse gases. One of the ways to reduce the emission of CO₂ from flue gases is by post combustion capture process and this can be done by absorbing the gas into suitable chemical solvents before emitting the gas into the atmosphere. Alkanolamines are promising solvents for this capture process. Vapour liquid equilibrium of CO₂-alkanolamine systems is often represented by CO₂ loading and partial pressure of CO₂ without considering the liquid phase. The liquid phase of this system is a complex one comprising of 9 species. Online analysis of the process is important to monitor the concentrations of the liquid phase reacting and product species. Liquid phase analysis of CO₂-diethanolamine (DEA) solution was performed by attenuated total reflection Fourier transform infrared (ATR-FTIR) spectroscopy. A robust Calibration was performed for the CO₂-aqueous DEA system prior to an online monitoring experiment. The partial least square regression method was used for the analysis of the calibration spectra obtained. The models obtained were used for prediction of DEA and CO₂ concentrations in the online monitoring experiment. The experiment was performed with a newly built recirculating experimental set up in the laboratory. The set up consist of a 750 ml equilibrium cell and ATR-FTIR liquid flow cell. Measurements were performed at 400°C. The results obtained indicated that the FTIR spectroscopy combined with Partial least square method is an effective tool for online monitoring of speciation.

Keywords: ATR-FTIR, CO₂ capture, online analysis, PLS regression

Procedia PDF Downloads 167
5587 Use of Regression Analysis in Determining the Length of Plastic Hinge in Reinforced Concrete Columns

Authors: Mehmet Alpaslan Köroğlu, Musa Hakan Arslan, Muslu Kazım Körez

Abstract:

Basic objective of this study is to create a regression analysis method that can estimate the length of a plastic hinge which is an important design parameter, by making use of the outcomes of (lateral load-lateral displacement hysteretic curves) the experimental studies conducted for the reinforced square concrete columns. For this aim, 170 different square reinforced concrete column tests results have been collected from the existing literature. The parameters which are thought affecting the plastic hinge length such as cross-section properties, features of material used, axial loading level, confinement of the column, longitudinal reinforcement bars in the columns etc. have been obtained from these 170 different square reinforced concrete column tests. In the study, when determining the length of plastic hinge, using the experimental test results, a regression analysis have been separately tested and compared with each other. In addition, the outcome of mentioned methods on determination of plastic hinge length of the reinforced concrete columns has been compared to other methods available in the literature.

Keywords: columns, plastic hinge length, regression analysis, reinforced concrete

Procedia PDF Downloads 445
5586 Analytical Authentication of Butter Using Fourier Transform Infrared Spectroscopy Coupled with Chemometrics

Authors: M. Bodner, M. Scampicchio

Abstract:

Fourier Transform Infrared (FT-IR) spectroscopy coupled with chemometrics was used to distinguish between butter samples and non-butter samples. Further, quantification of the content of margarine in adulterated butter samples was investigated. Fingerprinting region (1400-800 cm–1) was used to develop unsupervised pattern recognition (Principal Component Analysis, PCA), supervised modeling (Soft Independent Modelling by Class Analogy, SIMCA), classification (Partial Least Squares Discriminant Analysis, PLS-DA) and regression (Partial Least Squares Regression, PLS-R) models. PCA of the fingerprinting region shows a clustering of the two sample types. All samples were classified in their rightful class by SIMCA approach; however, nine adulterated samples (between 1% and 30% w/w of margarine) were classified as belonging both at the butter class and at the non-butter one. In the two-class PLS-DA model’s (R2 = 0.73, RMSEP, Root Mean Square Error of Prediction = 0.26% w/w) sensitivity was 71.4% and Positive Predictive Value (PPV) 100%. Its threshold was calculated at 7% w/w of margarine in adulterated butter samples. Finally, PLS-R model (R2 = 0.84, RMSEP = 16.54%) was developed. PLS-DA was a suitable classification tool and PLS-R a proper quantification approach. Results demonstrate that FT-IR spectroscopy combined with PLS-R can be used as a rapid, simple and safe method to identify pure butter samples from adulterated ones and to determine the grade of adulteration of margarine in butter samples.

Keywords: adulterated butter, margarine, PCA, PLS-DA, PLS-R, SIMCA

Procedia PDF Downloads 112
5585 Application of Regularized Spatio-Temporal Models to the Analysis of Remote Sensing Data

Authors: Salihah Alghamdi, Surajit Ray

Abstract:

Space-time data can be observed over irregularly shaped manifolds, which might have complex boundaries or interior gaps. Most of the existing methods do not consider the shape of the data, and as a result, it is difficult to model irregularly shaped data accommodating the complex domain. We used a method that can deal with space-time data that are distributed over non-planner shaped regions. The method is based on partial differential equations and finite element analysis. The model can be estimated using a penalized least squares approach with a regularization term that controls the over-fitting. The model is regularized using two roughness penalties, which consider the spatial and temporal regularities separately. The integrated square of the second derivative of the basis function is used as temporal penalty. While the spatial penalty consists of the integrated square of Laplace operator, which is integrated exclusively over the domain of interest that is determined using finite element technique. In this paper, we applied a spatio-temporal regression model with partial differential equations regularization (ST-PDE) approach to analyze a remote sensing data measuring the greenness of vegetation, measure by an index called enhanced vegetation index (EVI). The EVI data consist of measurements that take values between -1 and 1 reflecting the level of greenness of some region over a period of time. We applied (ST-PDE) approach to irregular shaped region of the EVI data. The approach efficiently accommodates the irregular shaped regions taking into account the complex boundaries rather than smoothing across the boundaries. Furthermore, the approach succeeds in capturing the temporal variation in the data.

Keywords: irregularly shaped domain, partial differential equations, finite element analysis, complex boundray

Procedia PDF Downloads 116
5584 Development of an Index for Asset Class in Ex-Ante Portfolio Management

Authors: Miang Hong Ngerng, Noor Diyana Jasme, May Jin Theong

Abstract:

Volatile market environment is inevitable. Fund managers are struggling to choose the right strategy to survive and overcome uncertainties and adverse market movement. Therefore, finding certainty in the mist of uncertainty future is one of the key performance objectives for fund managers. Current available theoretical results are not practical due to strong reliance on the investment assumption made. This paper is to identify the component that can be forecasted in Ex-ante setting which is the realistic situation facing a fund manager in the actual execution of asset allocation in portfolio management. Partial lease square method was used to generate an index with 10 years accounting data from 191 companies listed in KLSE. The result shows that the index reflects the inner nature of the business and up to 30% of the stock return can be explained by the index.

Keywords: active portfolio management, asset allocation ex-ante investment, asset class, partial lease square

Procedia PDF Downloads 245
5583 Design of Reconfigurable Fixed-Point LMS Adaptive FIR Filter

Authors: S. Padmapriya, V. Lakshmi Prabha

Abstract:

In this paper, an efficient reconfigurable fixed-point Least Mean Square Adaptive FIR filter is proposed. The proposed architecture has two methods of operation: one is area efficient design and the other is optimized power. Pipelining of the adder blocks and partial product generator are used to achieve low area and reversible logic is used to obtain low power design. Depending upon the input samples and filter coefficients, one of the techniques is chosen. Least-Mean-Square adaptation is performed to update the weights. The architecture is coded using Verilog and synthesized in cadence encounter 0.18μm technology. The synthesized results show that the area reduction ratio of the proposed when compared with conventional technique is about 1.2%.

Keywords: adaptive filter, carry select adder, least mean square algorithm, reversible logic

Procedia PDF Downloads 297
5582 Behind Fuzzy Regression Approach: An Exploration Study

Authors: Lavinia B. Dulla

Abstract:

The exploration study of the fuzzy regression approach attempts to present that fuzzy regression can be used as a possible alternative to classical regression. It likewise seeks to assess the differences and characteristics of simple linear regression and fuzzy regression using the width of prediction interval, mean absolute deviation, and variance of residuals. Based on the simple linear regression model, the fuzzy regression approach is worth considering as an alternative to simple linear regression when the sample size is between 10 and 20. As the sample size increases, the fuzzy regression approach is not applicable to use since the assumption regarding large sample size is already operating within the framework of simple linear regression. Nonetheless, it can be suggested for a practical alternative when decisions often have to be made on the basis of small data.

Keywords: fuzzy regression approach, minimum fuzziness criterion, interval regression, prediction interval

Procedia PDF Downloads 255
5581 Study on Bending Characteristics of Square Tube Using Energy Absorption Part

Authors: Shigeyuki Haruyama, Zefry Darmawan, Ken Kaminishi

Abstract:

In the square tube subjected to the bending load, the rigidity of the entire square tube is reduced when a collapse occurs due to local stress concentration. Therefore, in this research, the influence of bending load on the square tube with attached energy absorbing part was examined and reported. The analysis was conducted by using Finite Element Method (FEM) to produced bending deflection and buckling points. Energy absorption was compared from rigidity of attached part and square tube body. Buckling point was influenced by the rigidity of attached part and the thickness rate of square tube.

Keywords: energy absorber, square tube, bending, rigidity

Procedia PDF Downloads 219
5580 On the Relation between λ-Symmetries and μ-Symmetries of Partial Differential Equations

Authors: Teoman Ozer, Ozlem Orhan

Abstract:

This study deals with symmetry group properties and conservation laws of partial differential equations. We give a geometrical interpretation of notion of μ-prolongations of vector fields and of the related concept of μ-symmetry for partial differential equations. We show that these are in providing symmetry reduction of partial differential equations and systems and invariant solutions.

Keywords: λ-symmetry, μ-symmetry, classification, invariant solution

Procedia PDF Downloads 278
5579 The Role of Human Capital, Structural Capital, and Relation Capital towards Company Performance Using Partial Least Square

Authors: Novawiguna Kemalasari, Ahmad Badawi Saluy

Abstract:

Recent economic developments are more dependent on the value created by intangible assets than tangible company's assets. Intangible assets in question is intellectual capital that is recognized as the basis of individual, organizational, and general competition in the 21st century. The rapid global economy and technological innovations that have led to tough competition in the business world, make IC creation, management, measurement, and evaluation an important indicator in improving company performance that will affect the value of the company in the future. This study aims to determine the strong influence of intellectual capital on corporate performance, and how the influence of human capital on structural capital and relation capital. By distributing questionnaires to 100 employees of banking companies in Jakarta with middle and upper positions. Approach method used is Partial Least Square (PLS) Based on research that has been done, it can be concluded that human capital has influence on relation capital and structural capital. Similarly, the influence on the performance of the company turned out to human capital and relation capital has a significant influence, but structural capital has a non-significant effect on company performance.

Keywords: human capital, structural capital, relation capital, corporate performance

Procedia PDF Downloads 160
5578 Generalized Extreme Value Regression with Binary Dependent Variable: An Application for Predicting Meteorological Drought Probabilities

Authors: Retius Chifurira

Abstract:

Logistic regression model is the most used regression model to predict meteorological drought probabilities. When the dependent variable is extreme, the logistic model fails to adequately capture drought probabilities. In order to adequately predict drought probabilities, we use the generalized linear model (GLM) with the quantile function of the generalized extreme value distribution (GEVD) as the link function. The method maximum likelihood estimation is used to estimate the parameters of the generalized extreme value (GEV) regression model. We compare the performance of the logistic and the GEV regression models in predicting drought probabilities for Zimbabwe. The performance of the regression models are assessed using the goodness-of-fit tests, namely; relative root mean square error (RRMSE) and relative mean absolute error (RMAE). Results show that the GEV regression model performs better than the logistic model, thereby providing a good alternative candidate for predicting drought probabilities. This paper provides the first application of GLM derived from extreme value theory to predict drought probabilities for a drought-prone country such as Zimbabwe.

Keywords: generalized extreme value distribution, general linear model, mean annual rainfall, meteorological drought probabilities

Procedia PDF Downloads 158
5577 Analysis of Active Compounds in Thai Herbs by near Infrared Spectroscopy

Authors: Chaluntorn Vichasilp, Sutee Wangtueai

Abstract:

This study aims to develop a new method to detect active compounds in Thai herbs (1-deoxynojirimycin (DNJ) in mulberry leave, anthocyanin in Mao and curcumin in turmeric) using near infrared spectroscopy (NIRs). NIRs is non-destructive technique that rapid, non-chemical involved and low-cost determination. By NIRs and chemometrics technique, it was found that the DNJ prediction equation conducted with partial least square regression with cross-validation had low accuracy R2 (0.42) and SEP (31.87 mg/100g). On the other hand, the anthocyanin prediction equation showed moderate good results (R2 and SEP of 0.78 and 0.51 mg/g) with Multiplication scattering correction at wavelength of 2000-2200 nm. The high absorption could be observed at wavelength of 2047 nm and this model could be used as screening level. For curcumin prediction, the good result was obtained when applied original spectra with smoothing technique. The wavelength of 1400-2500 nm was created regression model with R2 (0.68) and SEP (0.17 mg/g). This model had high NIRs absorption at a wavelength of 1476, 1665, 1986 and 2395 nm, respectively. NIRs showed prospective technique for detection of some active compounds in Thai herbs.

Keywords: anthocyanin, curcumin, 1-deoxynojirimycin (DNJ), near infrared spectroscopy (NIRs)

Procedia PDF Downloads 349
5576 Arabic Character Recognition Using Regression Curves with the Expectation Maximization Algorithm

Authors: Abdullah A. AlShaher

Abstract:

In this paper, we demonstrate how regression curves can be used to recognize 2D non-rigid handwritten shapes. Each shape is represented by a set of non-overlapping uniformly distributed landmarks. The underlying models utilize 2nd order of polynomials to model shapes within a training set. To estimate the regression models, we need to extract the required coefficients which describe the variations for a set of shape class. Hence, a least square method is used to estimate such modes. We then proceed by training these coefficients using the apparatus Expectation Maximization algorithm. Recognition is carried out by finding the least error landmarks displacement with respect to the model curves. Handwritten isolated Arabic characters are used to evaluate our approach.

Keywords: character recognition, regression curves, handwritten Arabic letters, expectation maximization algorithm

Procedia PDF Downloads 113
5575 Survival and Hazard Maximum Likelihood Estimator with Covariate Based on Right Censored Data of Weibull Distribution

Authors: Al Omari Mohammed Ahmed

Abstract:

This paper focuses on Maximum Likelihood Estimator with Covariate. Covariates are incorporated into the Weibull model. Under this regression model with regards to maximum likelihood estimator, the parameters of the covariate, shape parameter, survival function and hazard rate of the Weibull regression distribution with right censored data are estimated. The mean square error (MSE) and absolute bias are used to compare the performance of Weibull regression distribution. For the simulation comparison, the study used various sample sizes and several specific values of the Weibull shape parameter.

Keywords: weibull regression distribution, maximum likelihood estimator, survival function, hazard rate, right censoring

Procedia PDF Downloads 411
5574 Solving SPDEs by Least Squares Method

Authors: Hassan Manouzi

Abstract:

We present in this paper a useful strategy to solve stochastic partial differential equations (SPDEs) involving stochastic coefficients. Using the Wick-product of higher order and the Wiener-Itˆo chaos expansion, the SPDEs is reformulated as a large system of deterministic partial differential equations. To reduce the computational complexity of this system, we shall use a decomposition-coordination method. To obtain the chaos coefficients in the corresponding deterministic equations, we use a least square formulation. Once this approximation is performed, the statistics of the numerical solution can be easily evaluated.

Keywords: least squares, wick product, SPDEs, finite element, wiener chaos expansion, gradient method

Procedia PDF Downloads 385
5573 Characteristics of Different Solar PV Modules under Partial Shading

Authors: Hla Hla Khaing, Yit Jian Liang, Nant Nyein Moe Htay, Jiang Fan

Abstract:

Partial shadowing is one of the problems that are always faced in terrestrial applications of solar photovoltaic (PV). The effects of partial shadow on the energy yield of conventional mono-crystalline and multi-crystalline PV modules have been researched for a long time. With deployment of new thin-film solar PV modules in the market, it is important to understand the performance of new PV modules operating under the partial shadow in the tropical zone. This paper addresses the impacts of different partial shadowing on the operating characteristics of four different types of solar PV modules that include multi-crystalline, amorphous thin-film, CdTe thin-film and CIGS thin-film PV modules.

Keywords: partial shade, CdTe, CIGS, multi-crystalline (mc-Si), amorphous silicon (a-Si), bypass diode

Procedia PDF Downloads 413
5572 A Comparative Study of Additive and Nonparametric Regression Estimators and Variable Selection Procedures

Authors: Adriano Z. Zambom, Preethi Ravikumar

Abstract:

One of the biggest challenges in nonparametric regression is the curse of dimensionality. Additive models are known to overcome this problem by estimating only the individual additive effects of each covariate. However, if the model is misspecified, the accuracy of the estimator compared to the fully nonparametric one is unknown. In this work the efficiency of completely nonparametric regression estimators such as the Loess is compared to the estimators that assume additivity in several situations, including additive and non-additive regression scenarios. The comparison is done by computing the oracle mean square error of the estimators with regards to the true nonparametric regression function. Then, a backward elimination selection procedure based on the Akaike Information Criteria is proposed, which is computed from either the additive or the nonparametric model. Simulations show that if the additive model is misspecified, the percentage of time it fails to select important variables can be higher than that of the fully nonparametric approach. A dimension reduction step is included when nonparametric estimator cannot be computed due to the curse of dimensionality. Finally, the Boston housing dataset is analyzed using the proposed backward elimination procedure and the selected variables are identified.

Keywords: additive model, nonparametric regression, variable selection, Akaike Information Criteria

Procedia PDF Downloads 238
5571 Nadler's Fixed Point Theorem on Partial Metric Spaces and its Application to a Homotopy Result

Authors: Hemant Kumar Pathak

Abstract:

In 1994, Matthews (S.G. Matthews, Partial metric topology, in: Proc. 8th Summer Conference on General Topology and Applications, in: Ann. New York Acad. Sci., vol. 728, 1994, pp. 183-197) introduced the concept of a partial metric as a part of the study of denotational semantics of data flow networks. He gave a modified version of the Banach contraction principle, more suitable in this context. In fact, (complete) partial metric spaces constitute a suitable framework to model several distinguished examples of the theory of computation and also to model metric spaces via domain theory. In this paper, we introduce the concept of almost partial Hausdorff metric. We prove a fixed point theorem for multi-valued mappings on partial metric space using the concept of almost partial Hausdorff metric and prove an analogous to the well-known Nadler’s fixed point theorem. In the sequel, we derive a homotopy result as an application of our main result.

Keywords: fixed point, partial metric space, homotopy, physical sciences

Procedia PDF Downloads 406
5570 Non-Destructive Prediction System Using near Infrared Spectroscopy for Crude Palm Oil

Authors: Siti Nurhidayah Naqiah Abdull Rani, Herlina Abdul Rahim

Abstract:

Near infrared (NIR) spectroscopy has always been of great interest in the food and agriculture industries. The development of predictive models has facilitated the estimation process in recent years. In this research, 176 crude palm oil (CPO) samples acquired from Felda Johor Bulker Sdn Bhd were studied. A FOSS NIRSystem was used to tak e absorbance measurements from the sample. The wavelength range for the spectral measurement is taken at 1600nm to 1900nm. Partial Least Square Regression (PLSR) prediction model with 50 optimal number of principal components was implemented to study the relationship between the measured Free Fatty Acid (FFA) values and the measured spectral absorption. PLSR showed predictive ability of FFA values with correlative coefficient (R) of 0.9808 for the training set and 0.9684 for the testing set.

Keywords: palm oil, fatty acid, NIRS, PLSR

Procedia PDF Downloads 179
5569 Optimization of Machine Learning Regression Results: An Application on Health Expenditures

Authors: Songul Cinaroglu

Abstract:

Machine learning regression methods are recommended as an alternative to classical regression methods in the existence of variables which are difficult to model. Data for health expenditure is typically non-normal and have a heavily skewed distribution. This study aims to compare machine learning regression methods by hyperparameter tuning to predict health expenditure per capita. A multiple regression model was conducted and performance results of Lasso Regression, Random Forest Regression and Support Vector Machine Regression recorded when different hyperparameters are assigned. Lambda (λ) value for Lasso Regression, number of trees for Random Forest Regression, epsilon (ε) value for Support Vector Regression was determined as hyperparameters. Study results performed by using 'k' fold cross validation changed from 5 to 50, indicate the difference between machine learning regression results in terms of R², RMSE and MAE values that are statistically significant (p < 0.001). Study results reveal that Random Forest Regression (R² ˃ 0.7500, RMSE ≤ 0.6000 ve MAE ≤ 0.4000) outperforms other machine learning regression methods. It is highly advisable to use machine learning regression methods for modelling health expenditures.

Keywords: machine learning, lasso regression, random forest regression, support vector regression, hyperparameter tuning, health expenditure

Procedia PDF Downloads 187
5568 Quantitative Structure-Activity Relationship Study of Some Quinoline Derivatives as Antimalarial Agents

Authors: M. Ouassaf, S. Belaid

Abstract:

A series of quinoline derivatives with antimalarial activity were subjected to two-dimensional quantitative structure-activity relationship (2D-QSAR) studies. Three models were implemented using multiple regression linear MLR, a regression partial least squares (PLS), nonlinear regression (MNLR), to see which descriptors are closely related to the activity biologic. We relied on a principal component analysis (PCA). Based on our results, a comparison of the quality of, MLR, PLS, and MNLR models shows that the MNLR (R = 0.914 and R² = 0.835, RCV= 0.853) models have substantially better predictive capability because the MNLR approach gives better results than MLR (R = 0.835 and R² = 0,752, RCV=0.601)), PLS (R = 0.742 and R² = 0.552, RCV=0.550) The model of MNLR gave statistically significant results and showed good stability to data variation in leave-one-out cross-validation. The obtained results suggested that our proposed model MNLR may be useful to predict the biological activity of derivatives of quinoline.

Keywords: antimalarial, quinoline, QSAR, PCA, MLR , MNLR, MLR

Procedia PDF Downloads 121
5567 Examining Bulling Rates among Youth with Intellectual Disabilities

Authors: Kaycee L. Bills

Abstract:

Adolescents and youth who are members of a minority group are more likely to experience higher rates of bullying in comparison to other student demographics. Specifically, adolescents with intellectual disabilities are a minority population that is more susceptible to experience unfair treatment in social settings. This study employs the 2015 Wave of the National Crime Victimization Survey – School Crime Supplement (NCVS/SCS) longitudinal dataset to explore bullying rates experienced among adolescents with intellectual disabilities. This study uses chi-square testing and a logistic regression to analyze if having a disability influences the likelihood of being bullied in comparison to other student demographics. Results of the chi-square testing and the logistic regression indicate that adolescent students who were identified as having a disability were approximately four times more likely to experience higher bullying rates in comparison to all other majority and minority student populations. Thus, it means having a disability resulted in higher bullying rates in comparison to all student groups.

Keywords: disability, bullying, social work, school bullying

Procedia PDF Downloads 107
5566 Study on Optimal Control Strategy of PM2.5 in Wuhan, China

Authors: Qiuling Xie, Shanliang Zhu, Zongdi Sun

Abstract:

In this paper, we analyzed the correlation relationship among PM2.5 from other five Air Quality Indices (AQIs) based on the grey relational degree, and built a multivariate nonlinear regression equation model of PM2.5 and the five monitoring indexes. For the optimal control problem of PM2.5, we took the partial large Cauchy distribution of membership equation as satisfaction function. We established a nonlinear programming model with the goal of maximum performance to price ratio. And the optimal control scheme is given.

Keywords: grey relational degree, multiple linear regression, membership function, nonlinear programming

Procedia PDF Downloads 260
5565 A Comparison of Smoothing Spline Method and Penalized Spline Regression Method Based on Nonparametric Regression Model

Authors: Autcha Araveeporn

Abstract:

This paper presents a study about a nonparametric regression model consisting of a smoothing spline method and a penalized spline regression method. We also compare the techniques used for estimation and prediction of nonparametric regression model. We tried both methods with crude oil prices in dollars per barrel and the Stock Exchange of Thailand (SET) index. According to the results, it is concluded that smoothing spline method performs better than that of penalized spline regression method.

Keywords: nonparametric regression model, penalized spline regression method, smoothing spline method, Stock Exchange of Thailand (SET)

Procedia PDF Downloads 394
5564 Two-Phase Sampling for Estimating a Finite Population Total in Presence of Missing Values

Authors: Daniel Fundi Murithi

Abstract:

Missing data is a real bane in many surveys. To overcome the problems caused by missing data, partial deletion, and single imputation methods, among others, have been proposed. However, problems such as discarding usable data and inaccuracy in reproducing known population parameters and standard errors are associated with them. For regression and stochastic imputation, it is assumed that there is a variable with complete cases to be used as a predictor in estimating missing values in the other variable, and the relationship between the two variables is linear, which might not be realistic in practice. In this project, we estimate population total in presence of missing values in two-phase sampling. Instead of regression or stochastic models, non-parametric model based regression model is used in imputing missing values. Empirical study showed that nonparametric model-based regression imputation is better in reproducing variance of population total estimate obtained when there were no missing values compared to mean, median, regression, and stochastic imputation methods. Although regression and stochastic imputation were better than nonparametric model-based imputation in reproducing population total estimates obtained when there were no missing values in one of the sample sizes considered, nonparametric model-based imputation may be used when the relationship between outcome and predictor variables is not linear.

Keywords: finite population total, missing data, model-based imputation, two-phase sampling

Procedia PDF Downloads 102
5563 Assessment of Solid Insulating Material Using Partial Discharge Characteristics

Authors: Qasim Khan, Furkan Ahmad, Asfar A. Khan, M. Saad Alam, Faiz Ahmad

Abstract:

In this paper, partial discharge analysis is performed in cavities artificially created in insulation. The setup is according with Cigre-II Method. Circular Samples created from Perspex Sheet with different configuration with changing number of cavities. Assessment of insulation health can be performed by Partial Discharge measurement as this has been found to be important means of condition monitoring. The experiments are done using MPD 540, which is a modern partial discharge measurement system. By analyzing the PD activity obtained for various voids/cavities, it is observed that the PD voltages show variation for cavity’s diameter, depth even for its ratios. This can be employed for scrutiny of insulation system.

Keywords: partial discharges, condition monitoring, insulation defects, degradation and corrosion, PMMA

Procedia PDF Downloads 479
5562 Tax Competition and Partial Tax Coordination under Fiscal Decentralization

Authors: Patricia Sanz-Cordoba, Bernd Theilen

Abstract:

This article analyzes the conditions where decentralization and partial tax harmonization in a coalition of asymmetric jurisdictions plays a role in the fight of fiscal competition (i.e. the race to bottom). Starting from a centralized economies, we use the ZM-W model to analyze the fiscal competition and coordination among three countries. We find that the asymmetry of jurisdictions facilitates partial tax harmonization between jurisdictions when these asymmetries are not too large. Furthermore, when the asymmetries are large enough, the level of labor tax plays an important role in the decision of decentralize capital tax. Accordingly, decentralization is achievable when labor tax is low. This result indicates that decentralization and partial tax harmonization between jurisdictions can be possible results in order to fight the negative externalities from fiscal competition, and more in the European Union countries where the asymmetries are substantial.

Keywords: centralization, decentralization, fiscal competition, partial tax harmonization

Procedia PDF Downloads 218