Search results for: regression trees
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 934

Search results for: regression trees

814 Comparative Studies of Support Vector Regression between Reproducing Kernel and Gaussian Kernel

Authors: Wei Zhang, Su-Yan Tang, Yi-Fan Zhu, Wei-Ping Wang

Abstract:

Support vector regression (SVR) has been regarded as a state-of-the-art method for approximation and regression. The importance of kernel function, which is so-called admissible support vector kernel (SV kernel) in SVR, has motivated many studies on its composition. The Gaussian kernel (RBF) is regarded as a “best" choice of SV kernel used by non-expert in SVR, whereas there is no evidence, except for its superior performance on some practical applications, to prove the statement. Its well-known that reproducing kernel (R.K) is also a SV kernel which possesses many important properties, e.g. positive definiteness, reproducing property and composing complex R.K by simpler ones. However, there are a limited number of R.Ks with explicit forms and consequently few quantitative comparison studies in practice. In this paper, two R.Ks, i.e. SV kernels, composed by the sum and product of a translation invariant kernel in a Sobolev space are proposed. An exploratory study on the performance of SVR based general R.K is presented through a systematic comparison to that of RBF using multiple criteria and synthetic problems. The results show that the R.K is an equivalent or even better SV kernel than RBF for the problems with more input variables (more than 5, especially more than 10) and higher nonlinearity.

Keywords: admissible support vector kernel, reproducing kernel, reproducing kernel Hilbert space, support vector regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1596
813 Developing Pedotransfer Functions for Estimating Some Soil Properties using Artificial Neural Network and Multivariate Regression Approaches

Authors: Fereydoon Sarmadian, Ali Keshavarzi

Abstract:

Study of soil properties like field capacity (F.C.) and permanent wilting point (P.W.P.) play important roles in study of soil moisture retention curve. Although these parameters can be measured directly, their measurement is difficult and expensive. Pedotransfer functions (PTFs) provide an alternative by estimating soil parameters from more readily available soil data. In this investigation, 70 soil samples were collected from different horizons of 15 soil profiles located in the Ziaran region, Qazvin province, Iran. The data set was divided into two subsets for calibration (80%) and testing (20%) of the models and their normality were tested by Kolmogorov-Smirnov method. Both multivariate regression and artificial neural network (ANN) techniques were employed to develop the appropriate PTFs for predicting soil parameters using easily measurable characteristics of clay, silt, O.C, S.P, B.D and CaCO3. The performance of the multivariate regression and ANN models was evaluated using an independent test data set. In order to evaluate the models, root mean square error (RMSE) and R2 were used. The comparison of RSME for two mentioned models showed that the ANN model gives better estimates of F.C and P.W.P than the multivariate regression model. The value of RMSE and R2 derived by ANN model for F.C and P.W.P were (2.35, 0.77) and (2.83, 0.72), respectively. The corresponding values for multivariate regression model were (4.46, 0.68) and (5.21, 0.64), respectively. Results showed that ANN with five neurons in hidden layer had better performance in predicting soil properties than multivariate regression.

Keywords: Artificial neural network, Field capacity, Permanentwilting point, Pedotransfer functions.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1821
812 Time Series Regression with Meta-Clusters

Authors: Monika Chuchro

Abstract:

This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain subgroups of time series data with normal distribution from the inflow into wastewater treatment plant data, composed of several groups differing by mean value. Two simple algorithms, K-mean and EM, were chosen as a clustering method. The Rand index was used to measure the similarity. After simple meta-clustering, a regression model was performed for each subgroups. The final model was a sum of the subgroups models. The quality of the obtained model was compared with the regression model made using the same explanatory variables, but with no clustering of data. Results were compared using determination coefficient (R2), measure of prediction accuracy- mean absolute percentage error (MAPE) and comparison on a linear chart. Preliminary results allow us to foresee the potential of the presented technique.

Keywords: Clustering, Data analysis, Data mining, Predictive models.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1951
811 Determination of Water Pollution and Water Quality with Decision Trees

Authors: Çiğdem Bakır, Mecit Yüzkat

Abstract:

With the increasing emphasis on water quality worldwide, the search for and expanding the market for new and intelligent monitoring systems has increased. The current method is the laboratory process, where samples are taken from bodies of water, and tests are carried out in laboratories. This method is time-consuming, a waste of manpower and uneconomical. To solve this problem, we used machine learning methods to detect water pollution in our study. We created decision trees with the Orange3 software used in the study and tried to determine all the factors that cause water pollution. An automatic prediction model based on water quality was developed by taking many model inputs such as water temperature, pH, transparency, conductivity, dissolved oxygen, and ammonia nitrogen with machine learning methods. The proposed approach consists of three stages: Preprocessing of the data used, feature detection and classification. We tried to determine the success of our study with different accuracy metrics and the results were presented comparatively. In addition, we achieved approximately 98% success with the decision tree.

Keywords: Decision tree, water quality, water pollution, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 266
810 The Profit Trend of Cosmetics Products Using Bootstrap Edgeworth Approximation

Authors: Edlira Donefski, Lorenc Ekonomi, Tina Donefski

Abstract:

Edgeworth approximation is one of the most important statistical methods that has a considered contribution in the reduction of the sum of standard deviation of the independent variables’ coefficients in a Quantile Regression Model. This model estimates the conditional median or other quantiles. In this paper, we have applied approximating statistical methods in an economical problem. We have created and generated a quantile regression model to see how the profit gained is connected with the realized sales of the cosmetic products in a real data, taken from a local business. The Linear Regression of the generated profit and the realized sales was not free of autocorrelation and heteroscedasticity, so this is the reason that we have used this model instead of Linear Regression. Our aim is to analyze in more details the relation between the variables taken into study: the profit and the finalized sales and how to minimize the standard errors of the independent variable involved in this study, the level of realized sales. The statistical methods that we have applied in our work are Edgeworth Approximation for Independent and Identical distributed (IID) cases, Bootstrap version of the Model and the Edgeworth approximation for Bootstrap Quantile Regression Model. The graphics and the results that we have presented here identify the best approximating model of our study.

Keywords: Bootstrap, Edgeworth approximation, independent and Identical distributed, quantile.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 446
809 Estimation of Functional Response Model by Supervised Functional Principal Component Analysis

Authors: Hyon I. Paek, Sang Rim Kim, Hyon A. Ryu

Abstract:

In functional linear regression, one typical problem is to reduce dimension. Compared with multivariate linear regression, functional linear regression is regarded as an infinite-dimensional case, and the main task is to reduce dimensions of functional response and functional predictors. One common approach is to adapt functional principal component analysis (FPCA) on functional predictors and then use a few leading functional principal components (FPC) to predict the functional model. The leading FPCs estimated by the typical FPCA explain a major variation of the functional predictor, but these leading FPCs may not be mostly correlated with the functional response, so they may not be significant in the prediction for response. In this paper, we propose a supervised FPCA method for a functional response model with FPCs obtained by considering the correlation of the functional response. Our method would have a better prediction accuracy than the typical FPCA method.

Keywords: Supervised, functional principal component analysis, functional response, functional linear regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 27
808 PM10 Prediction and Forecasting Using CART: A Case Study for Pleven, Bulgaria

Authors: Snezhana G. Gocheva-Ilieva, Maya P. Stoimenova

Abstract:

Ambient air pollution with fine particulate matter (PM10) is a systematic permanent problem in many countries around the world. The accumulation of a large number of measurements of both the PM10 concentrations and the accompanying atmospheric factors allow for their statistical modeling to detect dependencies and forecast future pollution. This study applies the classification and regression trees (CART) method for building and analyzing PM10 models. In the empirical study, average daily air data for the city of Pleven, Bulgaria for a period of 5 years are used. Predictors in the models are seven meteorological variables, time variables, as well as lagged PM10 variables and some lagged meteorological variables, delayed by 1 or 2 days with respect to the initial time series, respectively. The degree of influence of the predictors in the models is determined. The selected best CART models are used to forecast future PM10 concentrations for two days ahead after the last date in the modeling procedure and show very accurate results.

Keywords: Cross-validation, decision tree, lagged variables, short-term forecasting.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 740
807 Methods for Data Selection in Medical Databases: The Binary Logistic Regression -Relations with the Calculated Risks

Authors: Cristina G. Dascalu, Elena Mihaela Carausu, Daniela Manuc

Abstract:

The medical studies often require different methods for parameters selection, as a second step of processing, after the database-s designing and filling with information. One common task is the selection of fields that act as risk factors using wellknown methods, in order to find the most relevant risk factors and to establish a possible hierarchy between them. Different methods are available in this purpose, one of the most known being the binary logistic regression. We will present the mathematical principles of this method and a practical example of using it in the analysis of the influence of 10 different psychiatric diagnostics over 4 different types of offences (in a database made from 289 psychiatric patients involved in different types of offences). Finally, we will make some observations about the relation between the risk factors hierarchy established through binary logistic regression and the individual risks, as well as the results of Chi-squared test. We will show that the hierarchy built using the binary logistic regression doesn-t agree with the direct order of risk factors, even if it was naturally to assume this hypothesis as being always true.

Keywords: Databases, risk factors, binary logisticregression, hierarchy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1329
806 Second Order Admissibilities in Multi-parameter Logistic Regression Model

Authors: Chie Obayashi, Hidekazu Tanaka, Yoshiji Takagi

Abstract:

In multi-parameter family of distributions, conditions for a modified maximum likelihood estimator to be second order admissible are given. Applying these results to the multi-parameter logistic regression model, it is shown that the maximum likelihood estimator is always second order inadmissible. Also, conditions for the Berkson estimator to be second order admissible are given.

Keywords: Berkson estimator, modified maximum likelihood estimator, Multi-parameter logistic regression model, second order admissibility.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1616
805 Analyzing the Factors Influencing Exclusive Breastfeeding Using the Generalized Poisson Regression Model

Authors: Cheika Jahangeer, Naushad Mamode Khan, Maleika Heenaye-Mamode Khan

Abstract:

Exclusive breastfeeding is the feeding of a baby on no other milk apart from breast milk. Exclusive breastfeeding during the first 6 months of life is of fundamental importance because it supports optimal growth and development during infancy and reduces the risk of obliterating diseases and problems. Moreover, in developed countries, exclusive breastfeeding has decreased the incidence and/or severity of diarrhea, lower respiratory infection and urinary tract infection. In this paper, we study the factors that influence exclusive breastfeeding and use the Generalized Poisson regression model to analyze the practices of exclusive breastfeeding in Mauritius. We develop two sets of quasi-likelihood equations (QLE)to estimate the parameters.

Keywords: Exclusive breastfeeding, Regression model, Quasilikelihood.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1802
804 Speaker Independent Quranic Recognizer Basedon Maximum Likelihood Linear Regression

Authors: Ehab Mourtaga, Ahmad Sharieh, Mousa Abdallah

Abstract:

An automatic speech recognition system for the formal Arabic language is needed. The Quran is the most formal spoken book in Arabic, it is spoken all over the world. In this research, an automatic speech recognizer for Quranic based speakerindependent was developed and tested. The system was developed based on the tri-phone Hidden Markov Model and Maximum Likelihood Linear Regression (MLLR). The MLLR computes a set of transformations which reduces the mismatch between an initial model set and the adaptation data. It uses the regression class tree, as well as, estimates a set of linear transformations for the mean and variance parameters of a Gaussian mixture HMM system. The 30th Chapter of the Quran, with five of the most famous readers of the Quran, was used for the training and testing of the data. The chapter includes about 2000 distinct words. The advantages of using the Quranic verses as the database in this developed recognizer are the uniqueness of the words and the high level of orderliness between verses. The level of accuracy from the tested data ranged 68 to 85%.

Keywords: Hidden Markov Model (HMM), MaximumLikelihood Linear Regression (MLLR), Quran, Regression ClassTree, Speech Recognition, Speaker-independent.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1915
803 Analyzing Data on Breastfeeding Using Dispersed Statistical Models

Authors: Naushad Mamode Khan, Cheika Jahangeer, Maleika Heenaye-Mamode Khan

Abstract:

Exclusive breastfeeding is the feeding of a baby on no other milk apart from breast milk. Exclusive breastfeeding during the first 6 months of life is very important as it supports optimal growth and development during infancy and reduces the risk of obliterating diseases and problems. Moreover, it helps to reduce the incidence and/or severity of diarrhea, lower respiratory infection and urinary tract infection. In this paper, we make a survey of the factors that influence exclusive breastfeeding and use two dispersed statistical models to analyze data. The models are the Generalized Poisson regression model and the Com-Poisson regression models.

Keywords: Exclusive breastfeeding, regression model, generalized poisson, com-poisson.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1564
802 Prediction of Post Underwater Shock Properties of Polymer - Clay/Silica Hybrid Nanocomposites through Regression Models

Authors: D. Lingaraju, K. Ramji, M. Pramiladevi, U. Rajyalakshmi

Abstract:

Exploding concentrated underwater charges to damage underwater structures such as ship hulls is a part of naval warfare strategies. Adding small amounts of foreign particles (like clay or silica) of nanosize significantly improves the engineering properties of the polymers. In the present work the clay in terms 1, 2 and 3 percent by weight was surface treated with a suitable silane agent. The hybrid nanocomposite was prepared by the hand lay-up technique. Mathematical regression models have been employed for theoretical prediction. This will result in considerable savings in terms of project time, effort and cost.

Keywords: ANOVA, clay, halloysite, nanocomposites, underwater shock, regression, silica.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2193
801 Multi-Linear Regression Based Prediction of Mass Transfer by Multiple Plunging Jets

Authors: S. Deswal, M. Pal

Abstract:

The paper aims to compare the performance of vertical and inclined multiple plunging jets and to model and predict their mass transfer capacity by multi-linear regression based approach. The multiple vertical plunging jets have jet impact angle of θ = 90O; whereas, multiple inclined plunging jets have jet impact angle of θ = 60O. The results of the study suggests that mass transfer is higher for multiple jets, and inclined multiple plunging jets have up to 1.6 times higher mass transfer than vertical multiple plunging jets under similar conditions. The derived relationship, based on multi-linear regression approach, has successfully predicted the volumetric mass transfer coefficient (KLa) from operational parameters of multiple plunging jets with a correlation coefficient of 0.973, root mean square error of 0.002 and coefficient of determination of 0.946. The results suggests that predicted overall mass transfer coefficient is in good agreement with actual experimental values; thereby, suggesting the utility of derived relationship based on multi-linear regression based approach and can be successfully employed in modeling mass transfer by multiple plunging jets.

Keywords: Mass transfer, multiple plunging jets, multi-linear regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2201
800 Comparison of Artificial Neural Network and Multivariate Regression Methods in Prediction of Soil Cation Exchange Capacity

Authors: Ali Keshavarzi, Fereydoon Sarmadian

Abstract:

Investigation of soil properties like Cation Exchange Capacity (CEC) plays important roles in study of environmental reaserches as the spatial and temporal variability of this property have been led to development of indirect methods in estimation of this soil characteristic. Pedotransfer functions (PTFs) provide an alternative by estimating soil parameters from more readily available soil data. 70 soil samples were collected from different horizons of 15 soil profiles located in the Ziaran region, Qazvin province, Iran. Then, multivariate regression and neural network model (feedforward back propagation network) were employed to develop a pedotransfer function for predicting soil parameter using easily measurable characteristics of clay and organic carbon. The performance of the multivariate regression and neural network model was evaluated using a test data set. In order to evaluate the models, root mean square error (RMSE) was used. The value of RMSE and R2 derived by ANN model for CEC were 0.47 and 0.94 respectively, while these parameters for multivariate regression model were 0.65 and 0.88 respectively. Results showed that artificial neural network with seven neurons in hidden layer had better performance in predicting soil cation exchange capacity than multivariate regression.

Keywords: Easily measurable characteristics, Feed-forwardback propagation, Pedotransfer functions, CEC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2213
799 Ordinal Regression with Fenton-Wilkinson Order Statistics: A Case Study of an Orienteering Race

Authors: Joonas Pääkkönen

Abstract:

In sports, individuals and teams are typically interested in final rankings. Final results, such as times or distances, dictate these rankings, also known as places. Places can be further associated with ordered random variables, commonly referred to as order statistics. In this work, we introduce a simple, yet accurate order statistical ordinal regression function that predicts relay race places with changeover-times. We call this function the Fenton-Wilkinson Order Statistics model. This model is built on the following educated assumption: individual leg-times follow log-normal distributions. Moreover, our key idea is to utilize Fenton-Wilkinson approximations of changeover-times alongside an estimator for the total number of teams as in the notorious German tank problem. This original place regression function is sigmoidal and thus correctly predicts the existence of a small number of elite teams that significantly outperform the rest of the teams. Our model also describes how place increases linearly with changeover-time at the inflection point of the log-normal distribution function. With real-world data from Jukola 2019, a massive orienteering relay race, the model is shown to be highly accurate even when the size of the training set is only 5% of the whole data set. Numerical results also show that our model exhibits smaller place prediction root-mean-square-errors than linear regression, mord regression and Gaussian process regression.

Keywords: Fenton-Wilkinson approximation, German tank problem, log-normal distribution, order statistics, ordinal regression, orienteering, sports analytics, sports modeling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 839
798 Drainage Prediction for Dam using Fuzzy Support Vector Regression

Authors: S. Wiriyarattanakun, A. Ruengsiriwatanakun, S. Noimanee

Abstract:

The drainage Estimating is an important factor in dam management. In this paper, we use fuzzy support vector regression (FSVR) to predict the drainage of the Sirikrit Dam at Uttaradit province, Thailand. The results show that the FSVR is a suitable method in drainage estimating.

Keywords: Drainage Estimation, Prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1273
797 On Estimating the Headcount Index by Using the Logistic Regression Estimator

Authors: Encarnación Álvarez, Rosa M. García-Fernández, Juan F. Muñoz, Francisco J. Blanco-Encomienda

Abstract:

The problem of estimating a proportion has important applications in the field of economics, and in general, in many areas such as social sciences. A common application in economics is the estimation of the headcount index. In this paper, we define the general headcount index as a proportion. Furthermore, we introduce a new quantitative method for estimating the headcount index. In particular, we suggest to use the logistic regression estimator for the problem of estimating the headcount index. Assuming a real data set, results derived from Monte Carlo simulation studies indicate that the logistic regression estimator can be more accurate than the traditional estimator of the headcount index.

Keywords: Poverty line, poor, risk of poverty, sample, Monte Carlo simulations.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2100
796 Architectural Stratification and Woody Species Diversity of a Subtropical Forest Grown in a Limestone Habitat in Okinawa Island, Japan

Authors: S. M. Feroz, K. Yoshimura, A. Hagihara

Abstract:

The forest stand consisted of four layers. The species composition between the third and the bottom layers was almost similar, whereas it was almost exclusive between the top and the lower three layers. The values of Shannon-s index H' and Pielou-s index J ' tended to increase from the bottom layer upward, except for H' -value of the top layer. The values of H' and J ' were 4.21 bit and 0.73, respectively, for the total stand. High woody species diversity of the forest depended on large trees in the upper layers, which trend was different from a subtropical evergreen broadleaf forest grown in silicate habitat in the northern part of Okinawa Island. The spatial distribution of trees was overlapped between the third and the bottom layers, whereas it was independent or slightly exclusive between the top and the lower three layers. Mean tree weight of each layer decreased from the top toward the bottom layer, whereas the corresponding tree density increased from the top downward. This relationship was analogous to the process of self-thinning plant populations.

Keywords: Canopy multi-layering, limestone habitat, mean tree weight-density relationship, species diversity, subtropical forest.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1229
795 System Identification Based on Stepwise Regression for Dynamic Market Representation

Authors: Alexander Efremov

Abstract:

A system for market identification (SMI) is presented. The resulting representations are multivariable dynamic demand models. The market specifics are analyzed. Appropriate models and identification techniques are chosen. Multivariate static and dynamic models are used to represent the market behavior. The steps of the first stage of SMI, named data preprocessing, are mentioned. Next, the second stage, which is the model estimation, is considered in more details. Stepwise linear regression (SWR) is used to determine the significant cross-effects and the orders of the model polynomials. The estimates of the model parameters are obtained by a numerically stable estimator. Real market data is used to analyze SMI performance. The main conclusion is related to the applicability of multivariate dynamic models for representation of market systems.

Keywords: market identification, dynamic models, stepwise regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1620
794 Two New Relative Efficiencies of Linear Weighted Regression

Authors: Shuimiao Wan, Chao Yuan, Baoguang Tian

Abstract:

In statistics parameter theory, usually the parameter estimations have two kinds, one is the least-square estimation (LSE), and the other is the best linear unbiased estimation (BLUE). Due to the determining theorem of minimum variance unbiased estimator (MVUE), the parameter estimation of BLUE in linear model is most ideal. But since the calculations are complicated or the covariance is not given, people are hardly to get the solution. Therefore, people prefer to use LSE rather than BLUE. And this substitution will take some losses. To quantize the losses, many scholars have presented many kinds of different relative efficiencies in different views. For the linear weighted regression model, this paper discusses the relative efficiencies of LSE of β to BLUE of β. It also defines two new relative efficiencies and gives their lower bounds.

Keywords: Linear weighted regression, Relative efficiency, Lower bound, Parameter estimation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2120
793 The Strengths and Limitations of the Statistical Modeling of Complex Social Phenomenon: Focusing on SEM, Path Analysis, or Multiple Regression Models

Authors: Jihye Jeon

Abstract:

This paper analyzes the conceptual framework of three statistical methods, multiple regression, path analysis, and structural equation models. When establishing research model of the statistical modeling of complex social phenomenon, it is important to know the strengths and limitations of three statistical models. This study explored the character, strength, and limitation of each modeling and suggested some strategies for accurate explaining or predicting the causal relationships among variables. Especially, on the studying of depression or mental health, the common mistakes of research modeling were discussed.

Keywords: Multiple regression, path analysis, structural equation models, statistical modeling, social and psychological phenomenon.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9254
792 Predicting Protein-Protein Interactions from Protein Sequences Using Phylogenetic Profiles

Authors: Omer Nebil Yaveroglu, Tolga Can

Abstract:

In this study, a high accuracy protein-protein interaction prediction method is developed. The importance of the proposed method is that it only uses sequence information of proteins while predicting interaction. The method extracts phylogenetic profiles of proteins by using their sequence information. Combining the phylogenetic profiles of two proteins by checking existence of homologs in different species and fitting this combined profile into a statistical model, it is possible to make predictions about the interaction status of two proteins. For this purpose, we apply a collection of pattern recognition techniques on the dataset of combined phylogenetic profiles of protein pairs. Support Vector Machines, Feature Extraction using ReliefF, Naive Bayes Classification, K-Nearest Neighborhood Classification, Decision Trees, and Random Forest Classification are the methods we applied for finding the classification method that best predicts the interaction status of protein pairs. Random Forest Classification outperformed all other methods with a prediction accuracy of 76.93%

Keywords: Protein Interaction Prediction, Phylogenetic Profile, SVM , ReliefF, Decision Trees, Random Forest Classification

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1614
791 Ensembling Adaptively Constructed Polynomial Regression Models

Authors: Gints Jekabsons

Abstract:

The approach of subset selection in polynomial regression model building assumes that the chosen fixed full set of predefined basis functions contains a subset that is sufficient to describe the target relation sufficiently well. However, in most cases the necessary set of basis functions is not known and needs to be guessed – a potentially non-trivial (and long) trial and error process. In our research we consider a potentially more efficient approach – Adaptive Basis Function Construction (ABFC). It lets the model building method itself construct the basis functions necessary for creating a model of arbitrary complexity with adequate predictive performance. However, there are two issues that to some extent plague the methods of both the subset selection and the ABFC, especially when working with relatively small data samples: the selection bias and the selection instability. We try to correct these issues by model post-evaluation using Cross-Validation and model ensembling. To evaluate the proposed method, we empirically compare it to ABFC methods without ensembling, to a widely used method of subset selection, as well as to some other well-known regression modeling methods, using publicly available data sets.

Keywords: Basis function construction, heuristic search, modelensembles, polynomial regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1674
790 Harmonics Elimination in Multilevel Inverter Using Linear Fuzzy Regression

Authors: A. K. Al-Othman, H. A. Al-Mekhaizim

Abstract:

Multilevel inverters supplied from equal and constant dc sources almost don-t exist in practical applications. The variation of the dc sources affects the values of the switching angles required for each specific harmonic profile, as well as increases the difficulty of the harmonic elimination-s equations. This paper presents an extremely fast optimal solution of harmonic elimination of multilevel inverters with non-equal dc sources using Tanaka's fuzzy linear regression formulation. A set of mathematical equations describing the general output waveform of the multilevel inverter with nonequal dc sources is formulated. Fuzzy linear regression is then employed to compute the optimal solution set of switching angles.

Keywords: Multilevel converters, harmonics, pulse widthmodulation (PWM), optimal control.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1798
789 Burning Rate Response of Solid Fuels in Laminar Boundary Layer

Authors: A. M. Tahsini

Abstract:

Solid fuel transient burning behavior under oxidizer gas flow is numerically investigated. It is done using analysis of the regression rate responses to the imposed sudden and oscillatory variation at inflow properties. The conjugate problem is considered by simultaneous solution of flow and solid phase governing equations to compute the fuel regression rate. The advection upstream splitting method is used as flow computational scheme in finite volume method. The ignition phase is completely simulated to obtain the exact initial condition for response analysis. The results show that the transient burning effects which lead to the combustion instabilities and intermittent extinctions could be observed in solid fuels as the solid propellants.

Keywords: Extinction, Oscillation, Regression rate, Response, Transient burning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2366
788 A Model for Test Case Selection in the Software-Development Life Cycle

Authors: Adtha Lawanna

Abstract:

Software maintenance is one of the essential processes of Software-Development Life Cycle. The main philosophies of retaining software concern the improvement of errors, the revision of codes, the inhibition of future errors, and the development in piece and capacity. While the adjustment has been employing, the software structure has to be retested to an upsurge a level of assurance that it will be prepared due to the requirements. According to this state, the test cases must be considered for challenging the revised modules and the whole software. A concept of resolving this problem is ongoing by regression test selection such as the retest-all selections, random/ad-hoc selection and the safe regression test selection. Particularly, the traditional techniques concern a mapping between the test cases in a test suite and the lines of code it executes. However, there are not only the lines of code as one of the requirements that can affect the size of test suite but including the number of functions and faulty versions. Therefore, a model for test case selection is developed to cover those three requirements by the integral technique which can produce the smaller size of the test cases when compared with the traditional regression selection techniques.

Keywords: Software maintenance, regression test selection, test case.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1698
787 A Model for Test Case Selection in the Software-Development Life Cycle

Authors: Adtha Lawanna

Abstract:

Software maintenance is one of the essential processes of Software-Development Life Cycle. The main philosophies of retaining software concern the improvement of errors, the revision of codes, the inhibition of future errors, and the development in piece and capacity. While the adjustment has been employing, the software structure has to be retested to an upsurge a level of assurance that it will be prepared due to the requirements. According to this state, the test cases must be considered for challenging the revised modules and the whole software. A concept of resolving this problem is ongoing by regression test selection such as the retest-all selections, random/ad-hoc selection and the safe regression test selection. Particularly, the traditional techniques concern a mapping between the test cases in a test suite and the lines of code it executes. However, there are not only the lines of code as one of the requirements that can affect the size of test suite but including the number of functions and faulty versions. Therefore, a model for test case selection is developed to cover those three requirements by the integral technique which can produce the smaller size of the test cases when compared with the traditional regression selection techniques.

Keywords: Software maintenance, regression test selection, test case.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1599
786 Mathematical Modeling to Predict Surface Roughness in CNC Milling

Authors: Ab. Rashid M.F.F., Gan S.Y., Muhammad N.Y.

Abstract:

Surface roughness (Ra) is one of the most important requirements in machining process. In order to obtain better surface roughness, the proper setting of cutting parameters is crucial before the process take place. This research presents the development of mathematical model for surface roughness prediction before milling process in order to evaluate the fitness of machining parameters; spindle speed, feed rate and depth of cut. 84 samples were run in this study by using FANUC CNC Milling α-Τ14ιE. Those samples were randomly divided into two data sets- the training sets (m=60) and testing sets(m=24). ANOVA analysis showed that at least one of the population regression coefficients was not zero. Multiple Regression Method was used to determine the correlation between a criterion variable and a combination of predictor variables. It was established that the surface roughness is most influenced by the feed rate. By using Multiple Regression Method equation, the average percentage deviation of the testing set was 9.8% and 9.7% for training data set. This showed that the statistical model could predict the surface roughness with about 90.2% accuracy of the testing data set and 90.3% accuracy of the training data set.

Keywords: Surface roughness, regression analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2133
785 An Enhanced Distributed System to improve theTime Complexity of Binary Indexed Trees

Authors: Ahmed M. Elhabashy, A. Baes Mohamed, Abou El Nasr Mohamad

Abstract:

Distributed Computing Systems are usually considered the most suitable model for practical solutions of many parallel algorithms. In this paper an enhanced distributed system is presented to improve the time complexity of Binary Indexed Trees (BIT). The proposed system uses multi-uniform processors with identical architectures and a specially designed distributed memory system. The analysis of this system has shown that it has reduced the time complexity of the read query to O(Log(Log(N))), and the update query to constant complexity, while the naive solution has a time complexity of O(Log(N)) for both queries. The system was implemented and simulated using VHDL and Verilog Hardware Description Languages, with xilinx ISE 10.1, as the development environment and ModelSim 6.1c, similarly as the simulation tool. The simulation has shown that the overhead resulting by the wiring and communication between the system fragments could be fairly neglected, which makes it applicable to practically reach the maximum speed up offered by the proposed model.

Keywords: Binary Index Tree (BIT), Least Significant Bit (LSB), Parallel Adder (PA), Very High Speed Integrated Circuits HardwareDescription Language (VHDL), Distributed Parallel Computing System(DPCS).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1773