Search results for: step-wise linear regression
5716 Prediction of Coronary Artery Stenosis Severity Based on Machine Learning Algorithms
Authors: Yu-Jia Jian, Emily Chia-Yu Su, Hui-Ling Hsu, Jian-Jhih Chen
Abstract:
Coronary artery is the major supplier of myocardial blood flow. When fat and cholesterol are deposit in the coronary arterial wall, narrowing and stenosis of the artery occurs, which may lead to myocardial ischemia and eventually infarction. According to the World Health Organization (WHO), estimated 740 million people have died of coronary heart disease in 2015. According to Statistics from Ministry of Health and Welfare in Taiwan, heart disease (except for hypertensive diseases) ranked the second among the top 10 causes of death from 2013 to 2016, and it still shows a growing trend. According to American Heart Association (AHA), the risk factors for coronary heart disease including: age (> 65 years), sex (men to women with 2:1 ratio), obesity, diabetes, hypertension, hyperlipidemia, smoking, family history, lack of exercise and more. We have collected a dataset of 421 patients from a hospital located in northern Taiwan who received coronary computed tomography (CT) angiography. There were 300 males (71.26%) and 121 females (28.74%), with age ranging from 24 to 92 years, and a mean age of 56.3 years. Prior to coronary CT angiography, basic data of the patients, including age, gender, obesity index (BMI), diastolic blood pressure, systolic blood pressure, diabetes, hypertension, hyperlipidemia, smoking, family history of coronary heart disease and exercise habits, were collected and used as input variables. The output variable of the prediction module is the degree of coronary artery stenosis. The output variable of the prediction module is the narrow constriction of the coronary artery. In this study, the dataset was randomly divided into 80% as training set and 20% as test set. Four machine learning algorithms, including logistic regression, stepwise regression, neural network and decision tree, were incorporated to generate prediction results. We used area under curve (AUC) / accuracy (Acc.) to compare the four models, the best model is neural network, followed by stepwise logistic regression, decision tree, and logistic regression, with 0.68 / 79 %, 0.68 / 74%, 0.65 / 78%, and 0.65 / 74%, respectively. Sensitivity of neural network was 27.3%, specificity was 90.8%, stepwise Logistic regression sensitivity was 18.2%, specificity was 92.3%, decision tree sensitivity was 13.6%, specificity was 100%, logistic regression sensitivity was 27.3%, specificity 89.2%. From the result of this study, we hope to improve the accuracy by improving the module parameters or other methods in the future and we hope to solve the problem of low sensitivity by adjusting the imbalanced proportion of positive and negative data.Keywords: decision support, computed tomography, coronary artery, machine learning
Procedia PDF Downloads 2025715 Ketones Emission during Pad Printing Process
Authors: Kiurski S. Jelena, Aksentijević M. Snežana, Oros B. Ivana, Kecić S. Vesna, Djogo Z. Maja
Abstract:
The paper investigates the effect of light intensity on the formation of two ketones, acetone and methyl ethyl ketone, in working premises of five pad printing departments in Novi Sad, Serbia. Multiple linear regression analysis examined the form of interdependency concentrations of methyl ethyl ketone, acetone and light intensity in five printing presses at seven sampling points, using Statistica software package version 10th. The results show an average stacking variation investigated variable and can be presented by the general regression model: y = b0 + b1xi1 + b2xi2.Keywords: acetone, methyl ethyl ketone, multiple linear regression analysis, pad printing
Procedia PDF Downloads 3885714 Blood Pressure and Anthropometric Measurements: A Correlational Study
Authors: Abdul-Monim Batiha, Manar AlAzzam, Mohammed ALBashtawy, Loai Tawalbeh, Ahmad Tubaishat, Fadwa N. Alhalaiqa
Abstract:
Background: Obesity is the major modifiable risk factor for many chronic illnesses especially high blood pressure. Objectives: To evaluate the relationship between anthropometric indices and high blood pressure, and which one was most strongly correlated with high blood pressure in Jordanian population. Methods: A cross-sectional study was conducted with a total 622 students and workers from three Jordanian universities. Results: Nearly half of the participant are overweight (34.7%) and obese (15.4%) and hypertension was detected among 138 (22.2%) of the participants. Linear correlation was significant (p<0.01) between both systolic blood pressure and diastolic blood pressure for all anthropometric indices, except for A body shape index and diastolic blood pressure was significant at p< 0.05. Stepwise multiple linear regression analysis was used to assess the influence of age and anthropometric measurements. Conclusions: The waist circumference was the only independent predictor of hypertension, showing that this simple measurement may be an importance marker of high blood pressure in Jordanian population.Keywords: anthropometric indices, Jordan, blood pressure, cross-sectional study, obesity, hypertension, waist circumference
Procedia PDF Downloads 2615713 Predicting Bridge Pier Scour Depth with SVM
Authors: Arun Goel
Abstract:
Prediction of maximum local scour is necessary for the safety and economical design of the bridges. A number of equations have been developed over the years to predict local scour depth using laboratory data and a few pier equations have also been proposed using field data. Most of these equations are empirical in nature as indicated by the past publications. In this paper, attempts have been made to compute local depth of scour around bridge pier in dimensional and non-dimensional form by using linear regression, simple regression and SVM (Poly and Rbf) techniques along with few conventional empirical equations. The outcome of this study suggests that the SVM (Poly and Rbf) based modeling can be employed as an alternate to linear regression, simple regression and the conventional empirical equations in predicting scour depth of bridge piers. The results of present study on the basis of non-dimensional form of bridge pier scour indicates the improvement in the performance of SVM (Poly and Rbf) in comparison to dimensional form of scour.Keywords: modeling, pier scour, regression, prediction, SVM (Poly and Rbf kernels)
Procedia PDF Downloads 4235712 The Predictors of Student Engagement: Instructional Support vs Emotional Support
Authors: Tahani Salman Alangari
Abstract:
Student success can be impacted by internal factors such as their emotional well-being and external factors such as organizational support and instructional support in the classroom. This study is to identify at least one factor that forecasts student engagement. It is a cross-sectional, conducted on 6206 teachers and encompassed three years of data collection and observations of math instruction in approximately 50 schools and 300 classrooms. A multiple linear regression revealed that a model predicting student engagement from emotional support, classroom organization, and instructional support was significant. Four linear regression models were tested using hierarchical regression to examine the effects of independent variables: emotional support was the highest predictor of student engagement while instructional support was the lowest.Keywords: student engagement, emotional support, organizational support, instructional support, well-being
Procedia PDF Downloads 505711 Machine Vision System for Measuring the Quality of Bulk Sun-dried Organic Raisins
Authors: Navab Karimi, Tohid Alizadeh
Abstract:
An intelligent vision-based system was designed to measure the quality and purity of raisins. A machine vision setup was utilized to capture the images of bulk raisins in ranges of 5-50% mixed pure-impure berries. The textural features of bulk raisins were extracted using Grey-level Histograms, Co-occurrence Matrix, and Local Binary Pattern (a total of 108 features). Genetic Algorithm and neural network regression were used for selecting and ranking the best features (21 features). As a result, the GLCM features set was found to have the highest accuracy (92.4%) among the other sets. Followingly, multiple feature combinations of the previous stage were fed into the second regression (linear regression) to increase accuracy, wherein a combination of 16 features was found to be the optimum. Finally, a Support Vector Machine (SVM) classifier was used to differentiate the mixtures, producing the best efficiency and accuracy of 96.2% and 97.35%, respectively.Keywords: sun-dried organic raisin, genetic algorithm, feature extraction, ann regression, linear regression, support vector machine, south azerbaijan.
Procedia PDF Downloads 415710 A Statistical Approach to Predict and Classify the Commercial Hatchability of Chickens Using Extrinsic Parameters of Breeders and Eggs
Authors: M. S. Wickramarachchi, L. S. Nawarathna, C. M. B. Dematawewa
Abstract:
Hatchery performance is critical for the profitability of poultry breeder operations. Some extrinsic parameters of eggs and breeders cause to increase or decrease the hatchability. This study aims to identify the affecting extrinsic parameters on the commercial hatchability of local chicken's eggs and determine the most efficient classification model with a hatchability rate greater than 90%. In this study, seven extrinsic parameters were considered: egg weight, moisture loss, breeders age, number of fertilised eggs, shell width, shell length, and shell thickness. Multiple linear regression was performed to determine the most influencing variable on hatchability. First, the correlation between each parameter and hatchability were checked. Then a multiple regression model was developed, and the accuracy of the fitted model was evaluated. Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), k-Nearest Neighbors (kNN), Support Vector Machines (SVM) with a linear kernel, and Random Forest (RF) algorithms were applied to classify the hatchability. This grouping process was conducted using binary classification techniques. Hatchability was negatively correlated with egg weight, breeders' age, shell width, shell length, and positive correlations were identified with moisture loss, number of fertilised eggs, and shell thickness. Multiple linear regression models were more accurate than single linear models regarding the highest coefficient of determination (R²) with 94% and minimum AIC and BIC values. According to the classification results, RF, CART, and kNN had performed the highest accuracy values 0.99, 0.975, and 0.972, respectively, for the commercial hatchery process. Therefore, the RF is the most appropriate machine learning algorithm for classifying the breeder outcomes, which are economically profitable or not, in a commercial hatchery.Keywords: classification models, egg weight, fertilised eggs, multiple linear regression
Procedia PDF Downloads 565709 Mapping of Urban Micro-Climate in Lyon (France) by Integrating Complementary Predictors at Different Scales into Multiple Linear Regression Models
Authors: Lucille Alonso, Florent Renard
Abstract:
The characterizations of urban heat island (UHI) and their interactions with climate change and urban climates are the main research and public health issue, due to the increasing urbanization of the population. These solutions require a better knowledge of the UHI and micro-climate in urban areas, by combining measurements and modelling. This study is part of this topic by evaluating microclimatic conditions in dense urban areas in the Lyon Metropolitan Area (France) using a combination of data traditionally used such as topography, but also from LiDAR (Light Detection And Ranging) data, Landsat 8 satellite observation and Sentinel and ground measurements by bike. These bicycle-dependent weather data collections are used to build the database of the variable to be modelled, the air temperature, over Lyon’s hyper-center. This study aims to model the air temperature, measured during 6 mobile campaigns in Lyon in clear weather, using multiple linear regressions based on 33 explanatory variables. They are of various categories such as meteorological parameters from remote sensing, topographic variables, vegetation indices, the presence of water, humidity, bare soil, buildings, radiation, urban morphology or proximity and density to various land uses (water surfaces, vegetation, bare soil, etc.). The acquisition sources are multiple and come from the Landsat 8 and Sentinel satellites, LiDAR points, and cartographic products downloaded from an open data platform in Greater Lyon. Regarding the presence of low, medium, and high vegetation, the presence of buildings and ground, several buffers close to these factors were tested (5, 10, 20, 25, 50, 100, 200 and 500m). The buffers with the best linear correlations with air temperature for ground are 5m around the measurement points, for low and medium vegetation, and for building 50m and for high vegetation is 100m. The explanatory model of the dependent variable is obtained by multiple linear regression of the remaining explanatory variables (Pearson correlation matrix with a |r| < 0.7 and VIF with < 5) by integrating a stepwise sorting algorithm. Moreover, holdout cross-validation is performed, due to its ability to detect over-fitting of multiple regression, although multiple regression provides internal validation and randomization (80% training, 20% testing). Multiple linear regression explained, on average, 72% of the variance for the study days, with an average RMSE of only 0.20°C. The impact on the model of surface temperature in the estimation of air temperature is the most important variable. Other variables are recurrent such as distance to subway stations, distance to water areas, NDVI, digital elevation model, sky view factor, average vegetation density, or building density. Changing urban morphology influences the city's thermal patterns. The thermal atmosphere in dense urban areas can only be analysed on a microscale to be able to consider the local impact of trees, streets, and buildings. There is currently no network of fixed weather stations sufficiently deployed in central Lyon and most major urban areas. Therefore, it is necessary to use mobile measurements, followed by modelling to characterize the city's multiple thermal environments.Keywords: air temperature, LIDAR, multiple linear regression, surface temperature, urban heat island
Procedia PDF Downloads 1065708 Detecting Earnings Management via Statistical and Neural Networks Techniques
Authors: Mohammad Namazi, Mohammad Sadeghzadeh Maharluie
Abstract:
Predicting earnings management is vital for the capital market participants, financial analysts and managers. The aim of this research is attempting to respond to this query: Is there a significant difference between the regression model and neural networks’ models in predicting earnings management, and which one leads to a superior prediction of it? In approaching this question, a Linear Regression (LR) model was compared with two neural networks including Multi-Layer Perceptron (MLP), and Generalized Regression Neural Network (GRNN). The population of this study includes 94 listed companies in Tehran Stock Exchange (TSE) market from 2003 to 2011. After the results of all models were acquired, ANOVA was exerted to test the hypotheses. In general, the summary of statistical results showed that the precision of GRNN did not exhibit a significant difference in comparison with MLP. In addition, the mean square error of the MLP and GRNN showed a significant difference with the multi variable LR model. These findings support the notion of nonlinear behavior of the earnings management. Therefore, it is more appropriate for capital market participants to analyze earnings management based upon neural networks techniques, and not to adopt linear regression models.Keywords: earnings management, generalized linear regression, neural networks multi-layer perceptron, Tehran stock exchange
Procedia PDF Downloads 3935707 Identifying Factors Contributing to the Spread of Lyme Disease: A Regression Analysis of Virginia’s Data
Authors: Fatemeh Valizadeh Gamchi, Edward L. Boone
Abstract:
This research focuses on Lyme disease, a widespread infectious condition in the United States caused by the bacterium Borrelia burgdorferi sensu stricto. It is critical to identify environmental and economic elements that are contributing to the spread of the disease. This study examined data from Virginia to identify a subset of explanatory variables significant for Lyme disease case numbers. To identify relevant variables and avoid overfitting, linear poisson, and regularization regression methods such as a ridge, lasso, and elastic net penalty were employed. Cross-validation was performed to acquire tuning parameters. The methods proposed can automatically identify relevant disease count covariates. The efficacy of the techniques was assessed using four criteria on three simulated datasets. Finally, using the Virginia Department of Health’s Lyme disease data set, the study successfully identified key factors, and the results were consistent with previous studies.Keywords: lyme disease, Poisson generalized linear model, ridge regression, lasso regression, elastic net regression
Procedia PDF Downloads 905706 Factors Related to Teachers’ Analysis of Classroom Assessments
Authors: Hussain A. Alkharusi, Said S. Aldhafri, Hilal Z. Alnabhani, Muna Alkalbani
Abstract:
Analysing classroom assessments is one of the responsibilities of the teacher. It aims improving teacher’s instruction and assessment as well as student learning. The present study investigated factors that might explain variation in teachers’ practices regarding analysis of classroom assessments. The factors considered in the investigation included gender, in-service assessment training, teaching load, teaching experience, knowledge in assessment, attitude towards quantitative aspects of assessment, and self-perceived competence in analysing assessments. Participants were 246 in-service teachers in Oman. Results of a stepwise multiple linear regression analysis revealed that self-perceived competence was the only significant factor explaining the variance in teachers’ analysis of assessments. Implications for research and practice are discussed.Keywords: analysis of assessment, classroom assessment, in-service teachers, self-competence
Procedia PDF Downloads 3005705 Modeling Aeration of Sharp Crested Weirs by Using Support Vector Machines
Authors: Arun Goel
Abstract:
The present paper attempts to investigate the prediction of air entrainment rate and aeration efficiency of a free over-fall jets issuing from a triangular sharp crested weir by using regression based modelling. The empirical equations, support vector machine (polynomial and radial basis function) models and the linear regression techniques were applied on the triangular sharp crested weirs relating the air entrainment rate and the aeration efficiency to the input parameters namely drop height, discharge, and vertex angle. It was observed that there exists a good agreement between the measured values and the values obtained using empirical equations, support vector machine (Polynomial and rbf) models, and the linear regression techniques. The test results demonstrated that the SVM based (Poly & rbf) model also provided acceptable prediction of the measured values with reasonable accuracy along with empirical equations and linear regression techniques in modelling the air entrainment rate and the aeration efficiency of a free over-fall jets issuing from triangular sharp crested weir. Further sensitivity analysis has also been performed to study the impact of input parameter on the output in terms of air entrainment rate and aeration efficiency.Keywords: air entrainment rate, dissolved oxygen, weir, SVM, regression
Procedia PDF Downloads 4035704 Supervised-Component-Based Generalised Linear Regression with Multiple Explanatory Blocks: THEME-SCGLR
Authors: Bry X., Trottier C., Mortier F., Cornu G., Verron T.
Abstract:
We address component-based regularization of a Multivariate Generalized Linear Model (MGLM). A set of random responses Y is assumed to depend, through a GLM, on a set X of explanatory variables, as well as on a set T of additional covariates. X is partitioned into R conceptually homogeneous blocks X1, ... , XR , viewed as explanatory themes. Variables in each Xr are assumed many and redundant. Thus, Generalised Linear Regression (GLR) demands regularization with respect to each Xr. By contrast, variables in T are assumed selected so as to demand no regularization. Regularization is performed searching each Xr for an appropriate number of orthogonal components that both contribute to model Y and capture relevant structural information in Xr. We propose a very general criterion to measure structural relevance (SR) of a component in a block, and show how to take SR into account within a Fisher-scoring-type algorithm in order to estimate the model. We show how to deal with mixed-type explanatory variables. The method, named THEME-SCGLR, is tested on simulated data.Keywords: Component-Model, Fisher Scoring Algorithm, GLM, PLS Regression, SCGLR, SEER, THEME
Procedia PDF Downloads 3715703 Liquid Chromatography Microfluidics for Detection and Quantification of Urine Albumin Using Linear Regression Method
Authors: Patricia B. Cruz, Catrina Jean G. Valenzuela, Analyn N. Yumang
Abstract:
Nearly a hundred per million of the Filipino population is diagnosed with Chronic Kidney Disease (CKD). The early stage of CKD has no symptoms and can only be discovered once the patient undergoes urinalysis. Over the years, different methods were discovered and used for the quantification of the urinary albumin such as the immunochemical assays where most of these methods require large machinery that has a high cost in maintenance and resources, and a dipstick test which is yet to be proven and is still debated as a reliable method in detecting early stages of microalbuminuria. This research study involves the use of the liquid chromatography concept in microfluidic instruments with biosensor as a means of separation and detection respectively, and linear regression to quantify human urinary albumin. The researchers’ main objective was to create a miniature system that quantifies and detect patients’ urinary albumin while reducing the amount of volume used per five test samples. For this study, 30 urine samples of unknown albumin concentrations were tested using VITROS Analyzer and the microfluidic system for comparison. Based on the data shared by both methods, the actual vs. predicted regression were able to create a positive linear relationship with an R2 of 0.9995 and a linear equation of y = 1.09x + 0.07, indicating that the predicted values and actual values are approximately equal. Furthermore, the microfluidic instrument uses 75% less in total volume – sample and reagents combined, compared to the VITROS Analyzer per five test samples.Keywords: Chronic Kidney Disease, Linear Regression, Microfluidics, Urinary Albumin
Procedia PDF Downloads 1085702 A DFT-Based QSARs Study of Kovats Retention Indices of Adamantane Derivatives
Authors: Z. Bayat
Abstract:
A quantitative structure–property relationship (QSPR) study was performed to develop models those relate the structures of 65 Kovats retention index (RI) of adamantane derivatives. Molecular descriptors derived solely from 3D structures of the molecular compounds. The usefulness of the quantum chemical descriptors, calculated at the level of the DFT theories using 6-311+G** basis set for QSAR study of adamantane derivatives was examined. The use of descriptors calculated only from molecular structure eliminates the need to experimental determination of properties for use in the correlation and allows for the estimation of RI for molecules not yet synthesized. The prediction results are in good agreement with the experimental value. A multi-parametric equation containing maximum Four descriptors at B3LYP/6-31+G** method with good statistical qualities (R2train=0.913, Ftrain=97.67, R2test=0.770, Ftest=3.21, Q2LOO=0.895, R2adj=0.904, Q2LGO=0.844) was obtained by Multiple Linear Regression using stepwise method.Keywords: DFT, adamantane, QSAR, Kovat
Procedia PDF Downloads 3385701 Modeling Standpipe Pressure Using Multivariable Regression Analysis by Combining Drilling Parameters and a Herschel-Bulkley Model
Authors: Seydou Sinde
Abstract:
The aims of this paper are to formulate mathematical expressions that can be used to estimate the standpipe pressure (SPP). The developed formulas take into account the main factors that, directly or indirectly, affect the behavior of SPP values. Fluid rheology and well hydraulics are some of these essential factors. Mud Plastic viscosity, yield point, flow power, consistency index, flow rate, drillstring, and annular geometries are represented by the frictional pressure (Pf), which is one of the input independent parameters and is calculated, in this paper, using Herschel-Bulkley rheological model. Other input independent parameters include the rate of penetration (ROP), applied load or weight on the bit (WOB), bit revolutions per minute (RPM), bit torque (TRQ), and hole inclination and direction coupled in the hole curvature or dogleg (DL). The technique of repeating parameters and Buckingham PI theorem are used to reduce the number of the input independent parameters into the dimensionless revolutions per minute (RPMd), the dimensionless torque (TRQd), and the dogleg, which is already in the dimensionless form of radians. Multivariable linear and polynomial regression technique using PTC Mathcad Prime 4.0 is used to analyze and determine the exact relationships between the dependent parameter, which is SPP, and the remaining three dimensionless groups. Three models proved sufficiently satisfactory to estimate the standpipe pressure: multivariable linear regression model 1 containing three regression coefficients for vertical wells; multivariable linear regression model 2 containing four regression coefficients for deviated wells; and multivariable polynomial quadratic regression model containing six regression coefficients for both vertical and deviated wells. Although that the linear regression model 2 (with four coefficients) is relatively more complex and contains an additional term over the linear regression model 1 (with three coefficients), the former did not really add significant improvements to the later except for some minor values. Thus, the effect of the hole curvature or dogleg is insignificant and can be omitted from the input independent parameters without significant losses of accuracy. The polynomial quadratic regression model is considered the most accurate model due to its relatively higher accuracy for most of the cases. Data of nine wells from the Middle East were used to run the developed models with satisfactory results provided by all of them, even if the multivariable polynomial quadratic regression model gave the best and most accurate results. Development of these models is useful not only to monitor and predict, with accuracy, the values of SPP but also to early control and check for the integrity of the well hydraulics as well as to take the corrective actions should any unexpected problems appear, such as pipe washouts, jet plugging, excessive mud losses, fluid gains, kicks, etc.Keywords: standpipe, pressure, hydraulics, nondimensionalization, parameters, regression
Procedia PDF Downloads 565700 Housing Price Prediction Using Machine Learning Algorithms: The Case of Melbourne City, Australia
Authors: The Danh Phan
Abstract:
House price forecasting is a main topic in the real estate market research. Effective house price prediction models could not only allow home buyers and real estate agents to make better data-driven decisions but may also be beneficial for the property policymaking process. This study investigates the housing market by using machine learning techniques to analyze real historical house sale transactions in Australia. It seeks useful models which could be deployed as an application for house buyers and sellers. Data analytics show a high discrepancy between the house price in the most expensive suburbs and the most affordable suburbs in the city of Melbourne. In addition, experiments demonstrate that the combination of Stepwise and Support Vector Machine (SVM), based on the Mean Squared Error (MSE) measurement, consistently outperforms other models in terms of prediction accuracy.Keywords: house price prediction, regression trees, neural network, support vector machine, stepwise
Procedia PDF Downloads 1855699 An Analysis of the Effect of Sharia Financing and Work Relation Founding towards Non-Performing Financing in Islamic Banks in Indonesia
Authors: Muhammad Bahrul Ilmi
Abstract:
The purpose of this research is to analyze the influence of Islamic financing and work relation founding simultaneously and partially towards non-performing financing in Islamic banks. This research was regression quantitative field research, and had been done in Muammalat Indonesia Bank and Islamic Danamon Bank in 3 months. The populations of this research were 15 account officers of Muammalat Indonesia Bank and Islamic Danamon Bank in Surakarta, Indonesia. The techniques of collecting data used in this research were documentation, questionnaire, literary study and interview. Regression analysis result shows that Islamic financing and work relation founding simultaneously has positive and significant effect towards non performing financing of two Islamic Banks. It is obtained with probability value 0.003 which is less than 0.05 and F value 9.584. The analysis result of Islamic financing regression towards non performing financing shows the significant effect. It is supported by double linear regression analysis with probability value 0.001 which is less than 0.05. The regression analysis of work relation founding effect towards non-performing financing shows insignificant effect. This is shown in the double linear regression analysis with probability value 0.161 which is bigger than 0.05.Keywords: Syariah financing, work relation founding, non-performing financing (NPF), Islamic Bank
Procedia PDF Downloads 4035698 Pattern Synthesis of Nonuniform Linear Arrays Including Mutual Coupling Effects Based on Gaussian Process Regression and Genetic Algorithm
Authors: Ming Su, Ziqiang Mu
Abstract:
This paper proposes a synthesis method for nonuniform linear antenna arrays that combine Gaussian process regression (GPR) and genetic algorithm (GA). In this method, the GPR model can be used to calculate the array radiation pattern in the presence of mutual coupling effects, and then the GA is used to optimize the excitations and locations of the elements so as to generate the desired radiation pattern. In this paper, taking a 9-element nonuniform linear array as an example and the desired radiation pattern corresponding to a Chebyshev distribution as the optimization objective, optimize the excitations and locations of the elements. Finally, the optimization results are verified by electromagnetic simulation software CST, which shows that the method is effective.Keywords: nonuniform linear antenna arrays, GPR, GA, mutual coupling effects, active element pattern
Procedia PDF Downloads 805697 The Profit Trend of Cosmetics Products Using Bootstrap Edgeworth Approximation
Authors: Edlira Donefski, Lorenc Ekonomi, Tina Donefski
Abstract:
Edgeworth approximation is one of the most important statistical methods that has a considered contribution in the reduction of the sum of standard deviation of the independent variables’ coefficients in a Quantile Regression Model. This model estimates the conditional median or other quantiles. In this paper, we have applied approximating statistical methods in an economical problem. We have created and generated a quantile regression model to see how the profit gained is connected with the realized sales of the cosmetic products in a real data, taken from a local business. The Linear Regression of the generated profit and the realized sales was not free of autocorrelation and heteroscedasticity, so this is the reason that we have used this model instead of Linear Regression. Our aim is to analyze in more details the relation between the variables taken into study: the profit and the finalized sales and how to minimize the standard errors of the independent variable involved in this study, the level of realized sales. The statistical methods that we have applied in our work are Edgeworth Approximation for Independent and Identical distributed (IID) cases, Bootstrap version of the Model and the Edgeworth approximation for Bootstrap Quantile Regression Model. The graphics and the results that we have presented here identify the best approximating model of our study.Keywords: bootstrap, edgeworth approximation, IID, quantile
Procedia PDF Downloads 1265696 Chemometric QSRR Evaluation of Behavior of s-Triazine Pesticides in Liquid Chromatography
Authors: Lidija R. Jevrić, Sanja O. Podunavac-Kuzmanović, Strahinja Z. Kovačević
Abstract:
This study considers the selection of the most suitable in silico molecular descriptors that could be used for s-triazine pesticides characterization. Suitable descriptors among topological, geometrical and physicochemical are used for quantitative structure-retention relationships (QSRR) model establishment. Established models were obtained using linear regression (LR) and multiple linear regression (MLR) analysis. In this paper, MLR models were established avoiding multicollinearity among the selected molecular descriptors. Statistical quality of established models was evaluated by standard and cross-validation statistical parameters. For detection of similarity or dissimilarity among investigated s-triazine pesticides and their classification, principal component analysis (PCA) and hierarchical cluster analysis (HCA) were used and gave similar grouping. This study is financially supported by COST action TD1305.Keywords: chemometrics, classification analysis, molecular descriptors, pesticides, regression analysis
Procedia PDF Downloads 3545695 Two-Phase Sampling for Estimating a Finite Population Total in Presence of Missing Values
Authors: Daniel Fundi Murithi
Abstract:
Missing data is a real bane in many surveys. To overcome the problems caused by missing data, partial deletion, and single imputation methods, among others, have been proposed. However, problems such as discarding usable data and inaccuracy in reproducing known population parameters and standard errors are associated with them. For regression and stochastic imputation, it is assumed that there is a variable with complete cases to be used as a predictor in estimating missing values in the other variable, and the relationship between the two variables is linear, which might not be realistic in practice. In this project, we estimate population total in presence of missing values in two-phase sampling. Instead of regression or stochastic models, non-parametric model based regression model is used in imputing missing values. Empirical study showed that nonparametric model-based regression imputation is better in reproducing variance of population total estimate obtained when there were no missing values compared to mean, median, regression, and stochastic imputation methods. Although regression and stochastic imputation were better than nonparametric model-based imputation in reproducing population total estimates obtained when there were no missing values in one of the sample sizes considered, nonparametric model-based imputation may be used when the relationship between outcome and predictor variables is not linear.Keywords: finite population total, missing data, model-based imputation, two-phase sampling
Procedia PDF Downloads 1025694 Competition between Regression Technique and Statistical Learning Models for Predicting Credit Risk Management
Authors: Chokri Slim
Abstract:
The objective of this research is attempting to respond to this question: Is there a significant difference between the regression model and statistical learning models in predicting credit risk management? A Multiple Linear Regression (MLR) model was compared with neural networks including Multi-Layer Perceptron (MLP), and a Support vector regression (SVR). The population of this study includes 50 listed Banks in Tunis Stock Exchange (TSE) market from 2000 to 2016. Firstly, we show the factors that have significant effect on the quality of loan portfolios of banks in Tunisia. Secondly, it attempts to establish that the systematic use of objective techniques and methods designed to apprehend and assess risk when considering applications for granting credit, has a positive effect on the quality of loan portfolios of banks and their future collectability. Finally, we will try to show that the bank governance has an impact on the choice of methods and techniques for analyzing and measuring the risks inherent in the banking business, including the risk of non-repayment. The results of empirical tests confirm our claims.Keywords: credit risk management, multiple linear regression, principal components analysis, artificial neural networks, support vector machines
Procedia PDF Downloads 1205693 Regression Analysis of Travel Indicators and Public Transport Usage in Urban Areas
Authors: Mehdi Moeinaddini, Zohreh Asadi-Shekari, Muhammad Zaly Shah, Amran Hamzah
Abstract:
Currently, planners try to have more green travel options to decrease economic, social and environmental problems. Therefore, this study tries to find significant urban travel factors to be used to increase the usage of alternative urban travel modes. This paper attempts to identify the relationship between prominent urban mobility indicators and daily trips by public transport in 30 cities from various parts of the world. Different travel modes, infrastructures and cost indicators were evaluated in this research as mobility indicators. The results of multi-linear regression analysis indicate that there is a significant relationship between mobility indicators and the daily usage of public transport.Keywords: green travel modes, urban travel indicators, daily trips by public transport, multi-linear regression analysis
Procedia PDF Downloads 5195692 Count Regression Modelling on Number of Migrants in Households
Authors: Tsedeke Lambore Gemecho, Ayele Taye Goshu
Abstract:
The main objective of this study is to identify the determinants of the number of international migrants in a household and to compare regression models for count response. This study is done by collecting data from total of 2288 household heads of 16 randomly sampled districts in Hadiya and Kembata-Tembaro zones of Southern Ethiopia. The Poisson mixed models, as special cases of the generalized linear mixed model, is explored to determine effects of the predictors: age of household head, farm land size, and household size. Two ethnicities Hadiya and Kembata are included in the final model as dummy variables. Stepwise variable selection has indentified four predictors: age of head, farm land size, family size and dummy variable ethnic2 (0=other, 1=Kembata). These predictors are significant at 5% significance level with count response number of migrant. The Poisson mixed model consisting of the four predictors with random effects districts. Area specific random effects are significant with the variance of about 0.5105 and standard deviation of 0.7145. The results show that the number of migrant increases with heads age, family size, and farm land size. In conclusion, there is a significantly high number of international migration per household in the area. Age of household head, family size, and farm land size are determinants that increase the number of international migrant in households. Community-based intervention is needed so as to monitor and regulate the international migration for the benefits of the society.Keywords: Poisson regression, GLM, number of migrant, Hadiya and Kembata Tembaro zones
Procedia PDF Downloads 2605691 Multiple Linear Regression for Rapid Estimation of Subsurface Resistivity from Apparent Resistivity Measurements
Authors: Sabiu Bala Muhammad, Rosli Saad
Abstract:
Multiple linear regression (MLR) models for fast estimation of true subsurface resistivity from apparent resistivity field measurements are developed and assessed in this study. The parameters investigated were apparent resistivity (ρₐ), horizontal location (X) and depth (Z) of measurement as the independent variables; and true resistivity (ρₜ) as the dependent variable. To achieve linearity in both resistivity variables, datasets were first transformed into logarithmic domain following diagnostic checks of normality of the dependent variable and heteroscedasticity to ensure accurate models. Four MLR models were developed based on hierarchical combination of the independent variables. The generated MLR coefficients were applied to another data set to estimate ρₜ values for validation. Contours of the estimated ρₜ values were plotted and compared to the observed data plots at the colour scale and blanking for visual assessment. The accuracy of the models was assessed using coefficient of determination (R²), standard error (SE) and weighted mean absolute percentage error (wMAPE). It is concluded that the MLR models can estimate ρₜ for with high level of accuracy.Keywords: apparent resistivity, depth, horizontal location, multiple linear regression, true resistivity
Procedia PDF Downloads 2445690 Non-Parametric Regression over Its Parametric Couterparts with Large Sample Size
Authors: Jude Opara, Esemokumo Perewarebo Akpos
Abstract:
This paper is on non-parametric linear regression over its parametric counterparts with large sample size. Data set on anthropometric measurement of primary school pupils was taken for the analysis. The study used 50 randomly selected pupils for the study. The set of data was subjected to normality test, and it was discovered that the residuals are not normally distributed (i.e. they do not follow a Gaussian distribution) for the commonly used least squares regression method for fitting an equation into a set of (x,y)-data points using the Anderson-Darling technique. The algorithms for the nonparametric Theil’s regression are stated in this paper as well as its parametric OLS counterpart. The use of a programming language software known as “R Development” was used in this paper. From the analysis, the result showed that there exists a significant relationship between the response and the explanatory variable for both the parametric and non-parametric regression. To know the efficiency of one method over the other, the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) are used, and it is discovered that the nonparametric regression performs better than its parametric regression counterparts due to their lower values in both the AIC and BIC. The study however recommends that future researchers should study a similar work by examining the presence of outliers in the data set, and probably expunge it if detected and re-analyze to compare results.Keywords: Theil’s regression, Bayesian information criterion, Akaike information criterion, OLS
Procedia PDF Downloads 2755689 Measurement Errors and Misclassifications in Covariates in Logistic Regression: Bayesian Adjustment of Main and Interaction Effects and the Sample Size Implications
Authors: Shahadut Hossain
Abstract:
Measurement errors in continuous covariates and/or misclassifications in categorical covariates are common in epidemiological studies. Regression analysis ignoring such mismeasurements seriously biases the estimated main and interaction effects of covariates on the outcome of interest. Thus, adjustments for such mismeasurements are necessary. In this research, we propose a Bayesian parametric framework for eliminating deleterious impacts of covariate mismeasurements in logistic regression. The proposed adjustment method is unified and thus can be applied to any generalized linear and non-linear regression models. Furthermore, adjustment for covariate mismeasurements requires validation data usually in the form of either gold standard measurements or replicates of the mismeasured covariates on a subset of the study population. Initial investigation shows that adequacy of such adjustment depends on the sizes of main and validation samples, especially when prevalences of the categorical covariates are low. Thus, we investigate the impact of main and validation sample sizes on the adjusted estimates, and provide a general guideline about these sample sizes based on simulation studies.Keywords: measurement errors, misclassification, mismeasurement, validation sample, Bayesian adjustment
Procedia PDF Downloads 3825688 Statistical Model of Water Quality in Estero El Macho, Machala-El Oro
Authors: Rafael Zhindon Almeida
Abstract:
Surface water quality is an important concern for the evaluation and prediction of water quality conditions. The objective of this study is to develop a statistical model that can accurately predict the water quality of the El Macho estuary in the city of Machala, El Oro province. The methodology employed in this study is of a basic type that involves a thorough search for theoretical foundations to improve the understanding of statistical modeling for water quality analysis. The research design is correlational, using a multivariate statistical model involving multiple linear regression and principal component analysis. The results indicate that water quality parameters such as fecal coliforms, biochemical oxygen demand, chemical oxygen demand, iron and dissolved oxygen exceed the allowable limits. The water of the El Macho estuary is determined to be below the required water quality criteria. The multiple linear regression model, based on chemical oxygen demand and total dissolved solids, explains 99.9% of the variance of the dependent variable. In addition, principal component analysis shows that the model has an explanatory power of 86.242%. The study successfully developed a statistical model to evaluate the water quality of the El Macho estuary. The estuary did not meet the water quality criteria, with several parameters exceeding the allowable limits. The multiple linear regression model and principal component analysis provide valuable information on the relationship between the various water quality parameters. The findings of the study emphasize the need for immediate action to improve the water quality of the El Macho estuary to ensure the preservation and protection of this valuable natural resource.Keywords: statistical modeling, water quality, multiple linear regression, principal components, statistical models
Procedia PDF Downloads 465687 Agriculture Yield Prediction Using Predictive Analytic Techniques
Authors: Nagini Sabbineni, Rajini T. V. Kanth, B. V. Kiranmayee
Abstract:
India’s economy primarily depends on agriculture yield growth and their allied agro industry products. The agriculture yield prediction is the toughest task for agricultural departments across the globe. The agriculture yield depends on various factors. Particularly countries like India, majority of agriculture growth depends on rain water, which is highly unpredictable. Agriculture growth depends on different parameters, namely Water, Nitrogen, Weather, Soil characteristics, Crop rotation, Soil moisture, Surface temperature and Rain water etc. In our paper, lot of Explorative Data Analysis is done and various predictive models were designed. Further various regression models like Linear, Multiple Linear, Non-linear models are tested for the effective prediction or the forecast of the agriculture yield for various crops in Andhra Pradesh and Telangana states.Keywords: agriculture yield growth, agriculture yield prediction, explorative data analysis, predictive models, regression models
Procedia PDF Downloads 272