Search results for: stepwise regression analysis.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8996

Search results for: stepwise regression analysis.

8936 The Relative Efficiency of Parameter Estimation in Linear Weighted Regression

Authors: Baoguang Tian, Nan Chen

Abstract:

A new relative efficiency in linear model in reference is instructed into the linear weighted regression, and its upper and lower bound are proposed. In the linear weighted regression model, for the best linear unbiased estimation of mean matrix respect to the least-squares estimation, two new relative efficiencies are given, and their upper and lower bounds are also studied.

Keywords: Linear weighted regression, Relative efficiency, Mean matrix, Trace.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2458
8935 Identifying Factors Contributing to the Spread of Lyme Disease: A Regression Analysis of Virginia’s Data

Authors: Fatemeh Valizadeh Gamchi, Edward L. Boone

Abstract:

This research focuses on Lyme disease, a widespread infectious condition in the United States caused by the bacterium Borrelia burgdorferi sensu stricto. It is critical to identify environmental and economic elements that are contributing to the spread of the disease. This study examined data from Virginia to identify a subset of explanatory variables significant for Lyme disease case numbers. To identify relevant variables and avoid overfitting, linear poisson, and regularization regression methods such as ridge, lasso, and elastic net penalty were employed. Cross-validation was performed to acquire tuning parameters. The methods proposed can automatically identify relevant disease count covariates. The efficacy of the techniques was assessed using four criteria on three simulated datasets. Finally, using the Virginia Department of Health’s Lyme disease dataset, the study successfully identified key factors, and the results were consistent with previous studies.

Keywords: Lyme disease, Poisson generalized linear model, Ridge regression, Lasso Regression, elastic net regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 91
8934 A Study on a Research and Development Cost-Estimation Model in Korea

Authors: Babakina Alexandra, Yong Soo Kim

Abstract:

In this study, we analyzed the factors that affect research funds using linear regression analysis to increase the effectiveness of investments in national research projects. We collected 7,916 items of data on research projects that were in the process of being finished or were completed between 2010 and 2011. Data pre-processing and visualization were performed to derive statistically significant results. We identified factors that affected funding using analysis of fit distributions and estimated increasing or decreasing tendencies based on these factors.

Keywords: R&D funding, Cost estimation, Linear regression, Preliminary feasibility study.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2234
8933 Interrelationships between Physicochemical Water Pollution Indicators: A Case Study of River Pandu

Authors: Sunita Verma , Divya Tiwari, Ajay Verma

Abstract:

Water samples were collected from river Pandu at six stations where human and animal activities were high. Composite samples were analyzed for dissolved oxygen (DO), biochemical oxygen demand (BOD), chemical oxygen demand (COD) , pH values during dry and wet seasons as well as the harmattan period. The total data points were used to establish relationships between the parameters and data were also subjected to statistical analysis and expressed as mean ± standard error of mean (SEM) at a level of significance of p<0.05. Regression analysis was carried out to establish relationships if any between studied parameters and relationships in form of scatter plots were obtained between DO/BOD, COD/DO, BOD/COD, COD/pH, BOD/pH and DO/pH. The high to moderate correlation coefficient observed, R2 ranged from 0.68 to 0.15 between these parameters.

Keywords: BOD, DO, COD, pH, Regression analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2115
8932 A Study of Panel Logit Model and Adaptive Neuro-Fuzzy Inference System in the Prediction of Financial Distress Periods

Authors: Ε. Giovanis

Abstract:

The purpose of this paper is to present two different approaches of financial distress pre-warning models appropriate for risk supervisors, investors and policy makers. We examine a sample of the financial institutions and electronic companies of Taiwan Security Exchange (TSE) market from 2002 through 2008. We present a binary logistic regression with paned data analysis. With the pooled binary logistic regression we build a model including more variables in the regression than with random effects, while the in-sample and out-sample forecasting performance is higher in random effects estimation than in pooled regression. On the other hand we estimate an Adaptive Neuro-Fuzzy Inference System (ANFIS) with Gaussian and Generalized Bell (Gbell) functions and we find that ANFIS outperforms significant Logit regressions in both in-sample and out-of-sample periods, indicating that ANFIS is a more appropriate tool for financial risk managers and for the economic policy makers in central banks and national statistical services.

Keywords: ANFIS, Binary logistic regression, Financialdistress, Panel data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2332
8931 Predictive Analysis for Big Data: Extension of Classification and Regression Trees Algorithm

Authors: Ameur Abdelkader, Abed Bouarfa Hafida

Abstract:

Since its inception, predictive analysis has revolutionized the IT industry through its robustness and decision-making facilities. It involves the application of a set of data processing techniques and algorithms in order to create predictive models. Its principle is based on finding relationships between explanatory variables and the predicted variables. Past occurrences are exploited to predict and to derive the unknown outcome. With the advent of big data, many studies have suggested the use of predictive analytics in order to process and analyze big data. Nevertheless, they have been curbed by the limits of classical methods of predictive analysis in case of a large amount of data. In fact, because of their volumes, their nature (semi or unstructured) and their variety, it is impossible to analyze efficiently big data via classical methods of predictive analysis. The authors attribute this weakness to the fact that predictive analysis algorithms do not allow the parallelization and distribution of calculation. In this paper, we propose to extend the predictive analysis algorithm, Classification And Regression Trees (CART), in order to adapt it for big data analysis. The major changes of this algorithm are presented and then a version of the extended algorithm is defined in order to make it applicable for a huge quantity of data.

Keywords: Predictive analysis, big data, predictive analysis algorithms. CART algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1055
8930 Predictors of Academic Achievement of Student ICT Teachers with Different Learning Styles

Authors: Deniz Deryakulu, Şener Büyüköztürk Hüseyin Özçınar

Abstract:

The main purpose of this study was to determine the predictors of academic achievement of student Information and Communications Technologies (ICT) teachers with different learning styles. Participants were 148 student ICT teachers from Ankara University. Participants were asked to fill out a personal information sheet, the Turkish version of Kolb-s Learning Style Inventory, Weinstein-s Learning and Study Strategies Inventory, Schommer's Epistemological Beliefs Questionnaire, and Eysenck-s Personality Questionnaire. Stepwise regression analyses showed that the statistically significant predictors of the academic achievement of the accommodators were attitudes and high school GPAs; of the divergers was anxiety; of the convergers were gender, epistemological beliefs, and motivation; and of the assimilators were gender, personality, and test strategies. Implications for ICT teaching-learning processes and teacher education are discussed.

Keywords: Academic achievement, student ICT teachers, Kolb learning styles, experiential learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2584
8929 The Relationship between Iranian EFL Learners' Multiple Intelligences and Their Performance on Grammar Tests

Authors: Rose Shayeghi, Pejman Hosseinioun

Abstract:

The Multiple Intelligences theory characterizes human intelligence as a multifaceted entity that exists in all human beings with varying degrees. The most important contribution of this theory to the field of English Language Teaching (ELT) is its role in identifying individual differences and designing more learnercentered programs. The present study aims at investigating the relationship between different elements of multiple intelligence and grammar scores. To this end, 63 female Iranian EFL learner selected from among intermediate students participated in the study. The instruments employed were a Nelson English language test, Michigan Grammar Test, and Teele Inventory for Multiple Intelligences (TIMI). The results of Pearson Product-Moment Correlation revealed a significant positive correlation between grammatical accuracy and linguistic as well as interpersonal intelligence. The results of Stepwise Multiple Regression indicated that linguistic intelligence contributed to the prediction of grammatical accuracy.

Keywords: Multiple intelligence, grammar, ELT, EFL, TIMI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2400
8928 Enhancing Predictive Accuracy in Pharmaceutical Sales Through an Ensemble Kernel Gaussian Process Regression Approach

Authors: Shahin Mirshekari, Mohammadreza Moradi, Hossein Jafari, Mehdi Jafari, Mohammad Ensaf

Abstract:

This research employs Gaussian Process Regression (GPR) with an ensemble kernel, integrating Exponential Squared, Revised Matérn, and Rational Quadratic kernels to analyze pharmaceutical sales data. Bayesian optimization was used to identify optimal kernel weights: 0.76 for Exponential Squared, 0.21 for Revised Matérn, and 0.13 for Rational Quadratic. The ensemble kernel demonstrated superior performance in predictive accuracy, achieving an R² score near 1.0, and significantly lower values in MSE, MAE, and RMSE. These findings highlight the efficacy of ensemble kernels in GPR for predictive analytics in complex pharmaceutical sales datasets.

Keywords: Gaussian Process Regression, Ensemble Kernels, Bayesian Optimization, Pharmaceutical Sales Analysis, Time Series Forecasting, Data Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 80
8927 Extended Least Squares LS–SVM

Authors: József Valyon, Gábor Horváth

Abstract:

Among neural models the Support Vector Machine (SVM) solutions are attracting increasing attention, mostly because they eliminate certain crucial questions involved by neural network construction. The main drawback of standard SVM is its high computational complexity, therefore recently a new technique, the Least Squares SVM (LS–SVM) has been introduced. In this paper we present an extended view of the Least Squares Support Vector Regression (LS–SVR), which enables us to develop new formulations and algorithms to this regression technique. Based on manipulating the linear equation set -which embodies all information about the regression in the learning process- some new methods are introduced to simplify the formulations, speed up the calculations and/or provide better results.

Keywords: Function estimation, Least–Squares Support VectorMachines, Regression, System Modeling

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1993
8926 Time Series Regression with Meta-Clusters

Authors: Monika Chuchro

Abstract:

This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain subgroups of time series data with normal distribution from the inflow into wastewater treatment plant data, composed of several groups differing by mean value. Two simple algorithms, K-mean and EM, were chosen as a clustering method. The Rand index was used to measure the similarity. After simple meta-clustering, a regression model was performed for each subgroups. The final model was a sum of the subgroups models. The quality of the obtained model was compared with the regression model made using the same explanatory variables, but with no clustering of data. Results were compared using determination coefficient (R2), measure of prediction accuracy- mean absolute percentage error (MAPE) and comparison on a linear chart. Preliminary results allow us to foresee the potential of the presented technique.

Keywords: Clustering, Data analysis, Data mining, Predictive models.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1938
8925 Numerical Simulation of a Pressure Regulated Valve to Find Out the Characteristics of Passive Control Circuit

Authors: Binod Kumar Saha

Abstract:

The objective of the present paper is a numerical analysis of the flow forces acting on spool surfaces of a pressure regulated valve. The transient, compressible and turbulent flow structures inside the valve are simulated using ANSYS FLUENT coupled with a special UDF. Here, valve inlet pressure is varied in a stepwise manner. For every value of inlet pressure, transient analysis leads to a quasi-static flow through the valve. Spool forces are calculated based on different pressures at inlet. From this information of spool forces, pressure characteristic of the passive control circuit has been derived.

Keywords: Pressure Regulating Valve, Spool Opening, Spool Movement, Force Balance, CFD.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3848
8924 Estimating Regression Parameters in Linear Regression Model with a Censored Response Variable

Authors: Jesus Orbe, Vicente Nunez-Anton

Abstract:

In this work we study the effect of several covariates X on a censored response variable T with unknown probability distribution. In this context, most of the studies in the literature can be located in two possible general classes of regression models: models that study the effect the covariates have on the hazard function; and models that study the effect the covariates have on the censored response variable. Proposals in this paper are in the second class of models and, more specifically, on least squares based model approach. Thus, using the bootstrap estimate of the bias, we try to improve the estimation of the regression parameters by reducing their bias, for small sample sizes. Simulation results presented in the paper show that, for reasonable sample sizes and censoring levels, the bias is always smaller for the new proposals.

Keywords: Censored response variable, regression, bias.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1461
8923 Adjusted Ratio and Regression Type Estimators for Estimation of Population Mean when some Observations are missing

Authors: Nuanpan Nangsue

Abstract:

Ratio and regression type estimators have been used by previous authors to estimate a population mean for the principal variable from samples in which both auxiliary x and principal y variable data are available. However, missing data are a common problem in statistical analyses with real data. Ratio and regression type estimators have also been used for imputing values of missing y data. In this paper, six new ratio and regression type estimators are proposed for imputing values for any missing y data and estimating a population mean for y from samples with missing x and/or y data. A simulation study has been conducted to compare the six ratio and regression type estimators with a previous estimator of Rueda. Two population sizes N = 1,000 and 5,000 have been considered with sample sizes of 10% and 30% and with correlation coefficients between population variables X and Y of 0.5 and 0.8. In the simulations, 10 and 40 percent of sample y values and 10 and 40 percent of sample x values were randomly designated as missing. The new ratio and regression type estimators give similar mean absolute percentage errors that are smaller than the Rueda estimator for all cases. The new estimators give a large reduction in errors for the case of 40% missing y values and sampling fraction of 30%.

Keywords: Auxiliary variable, missing data, ratio and regression type estimators.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1719
8922 Methods for Data Selection in Medical Databases: The Binary Logistic Regression -Relations with the Calculated Risks

Authors: Cristina G. Dascalu, Elena Mihaela Carausu, Daniela Manuc

Abstract:

The medical studies often require different methods for parameters selection, as a second step of processing, after the database-s designing and filling with information. One common task is the selection of fields that act as risk factors using wellknown methods, in order to find the most relevant risk factors and to establish a possible hierarchy between them. Different methods are available in this purpose, one of the most known being the binary logistic regression. We will present the mathematical principles of this method and a practical example of using it in the analysis of the influence of 10 different psychiatric diagnostics over 4 different types of offences (in a database made from 289 psychiatric patients involved in different types of offences). Finally, we will make some observations about the relation between the risk factors hierarchy established through binary logistic regression and the individual risks, as well as the results of Chi-squared test. We will show that the hierarchy built using the binary logistic regression doesn-t agree with the direct order of risk factors, even if it was naturally to assume this hypothesis as being always true.

Keywords: Databases, risk factors, binary logisticregression, hierarchy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1313
8921 Acute Coronary Syndrome Prediction Using Data Mining Techniques- An Application

Authors: Tahseen A. Jilani, Huda Yasin, Madiha Yasin, C. Ardil

Abstract:

In this paper we use data mining techniques to investigate factors that contribute significantly to enhancing the risk of acute coronary syndrome. We assume that the dependent variable is diagnosis – with dichotomous values showing presence or  absence of disease. We have applied binary regression to the factors affecting the dependent variable. The data set has been taken from two different cardiac hospitals of Karachi, Pakistan. We have total sixteen variables out of which one is assumed dependent and other 15 are independent variables. For better performance of the regression model in predicting acute coronary syndrome, data reduction techniques like principle component analysis is applied. Based on results of data reduction, we have considered only 14 out of sixteen factors.

Keywords: Acute coronary syndrome (ACS), binary logistic regression analyses, myocardial ischemia (MI), principle component analysis, unstable angina (U.A.).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2097
8920 Support Vector Regression for Retrieval of Soil Moisture Using Bistatic Scatterometer Data at X-Band

Authors: Dileep Kumar Gupta, Rajendra Prasad, Pradeep Kumar, Varun Narayan Mishra, Ajeet Kumar Vishwakarma, Prashant Kumar Srivastava

Abstract:

An approach was evaluated for the retrieval of soil moisture of bare soil surface using bistatic scatterometer data in the angular range of 200 to 700 at VV- and HH- polarization. The microwave data was acquired by specially designed X-band (10 GHz) bistatic scatterometer. The linear regression analysis was done between scattering coefficients and soil moisture content to select the suitable incidence angle for retrieval of soil moisture content. The 250 incidence angle was found more suitable. The support vector regression analysis was used to approximate the function described by the input output relationship between the scattering coefficient and corresponding measured values of the soil moisture content. The performance of support vector regression algorithm was evaluated by comparing the observed and the estimated soil moisture content by statistical performance indices %Bias, root mean squared error (RMSE) and Nash-Sutcliffe Efficiency (NSE). The values of %Bias, root mean squared error (RMSE) and Nash-Sutcliffe Efficiency (NSE) were found 2.9451, 1.0986 and 0.9214 respectively at HHpolarization. At VV- polarization, the values of %Bias, root mean squared error (RMSE) and Nash-Sutcliffe Efficiency (NSE) were found 3.6186, 0.9373 and 0.9428 respectively.

Keywords: Bistatic scatterometer, soil moisture, support vector regression, RMSE, %Bias, NSE.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3210
8919 Factors Determining Intention to Pursue Genetic Testing for People in Taiwan

Authors: Ju-Chun Chien

Abstract:

The Ottawa Charter for Health Promotion proposed that the role of health services should shift the focus from cure to prevention. Nowadays, besides having physical examinations, people could also conduct genetic tests to provide important information for diagnosing, treating, and/or preventing illnesses. However, because of the incompletion of the Chinese Genetic Database, people in Taiwan were still unfamiliar with genetic testing. The purposes of the present study were to: (1) Figure out people’s attitudes towards genetic testing. (2) Examine factors that influence people’s intention to pursue genetic testing by means of the Health Belief Model (HBM). A pilot study was conducted on 249 Taiwanese in 2017 to test the feasibility of the self-developed instrument. The reliability and construct validity of scores on the self-developed questionnaire revealed that this HBM-based questionnaire with 40 items was a well-developed instrument. A total of 542 participants were recruited and the valid participants were 535 (99%) between the ages of 20 and 86. Descriptive statistics, one-way ANOVA, two-way contingency table analysis, Pearson’s correlation, and stepwise multiple regression analysis were used in this study. The main results were that only 32 participants (6%) had already undergone genetic testing; moreover, their attitude towards genetic testing was more positive than those who did not have the experience. Compared with people who never underwent genetic tests, those who had gone for genetic testing had higher self-efficacy, greater intention to pursue genetic testing, had academic majors in health-related fields, had chronic and genetic diseases, possessed Catastrophic Illness Cards, and all of them had heard about genetic testing. The variables that best predicted people’s intention to pursue genetic testing were cues to action, self-efficacy, and perceived benefits (the three variables all correlated with one another positively at high magnitudes). To sum up, the HBM could be effective in designing and identifying the needs and priorities of the target population to pursue genetic testing.

Keywords: Genetic testing, intention to pursue genetic testing, Taiwan, health belief model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 689
8918 Effects of Hidden Unit Sizes and Autoregressive Features in Mental Task Classification

Authors: Ramaswamy Palaniappan, Nai-Jen Huan

Abstract:

Classification of electroencephalogram (EEG) signals extracted during mental tasks is a technique that is actively pursued for Brain Computer Interfaces (BCI) designs. In this paper, we compared the classification performances of univariateautoregressive (AR) and multivariate autoregressive (MAR) models for representing EEG signals that were extracted during different mental tasks. Multilayer Perceptron (MLP) neural network (NN) trained by the backpropagation (BP) algorithm was used to classify these features into the different categories representing the mental tasks. Classification performances were also compared across different mental task combinations and 2 sets of hidden units (HU): 2 to 10 HU in steps of 2 and 20 to 100 HU in steps of 20. Five different mental tasks from 4 subjects were used in the experimental study and combinations of 2 different mental tasks were studied for each subject. Three different feature extraction methods with 6th order were used to extract features from these EEG signals: AR coefficients computed with Burg-s algorithm (ARBG), AR coefficients computed with stepwise least square algorithm (ARLS) and MAR coefficients computed with stepwise least square algorithm. The best results were obtained with 20 to 100 HU using ARBG. It is concluded that i) it is important to choose the suitable mental tasks for different individuals for a successful BCI design, ii) higher HU are more suitable and iii) ARBG is the most suitable feature extraction method.

Keywords: Autoregressive, Brain-Computer Interface, Electroencephalogram, Neural Network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1793
8917 The Discriminate Analysis and Relevant Model for Mapping Export Potential

Authors: Jana Gutierrez Chvalkovská, Michal Mejstřík, Matěj Urban

Abstract:

There are pending discussions over the mapping of country export potential in order to refocus export strategy of firms and its evidence-based promotion by the Export Credit Agencies (ECAs) and other permitted vehicles of governments. In this paper we develop our version of an applied model that offers “stepwise” elimination of unattractive markets. We modify and calibrate the model for the particular features of the Czech Republic and specific pilot cases where we apply an individual approach to each sector.

Keywords: Export strategy, Modeling export, Calibration, Export promotion.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2411
8916 Reductions of Control Flow Graphs

Authors: Robert Gold

Abstract:

Control flow graphs are a well-known representation of the sequential control flow structure of programs with a multitude of applications. Not only single functions but also sets of functions or complete programs can be modeled by control flow graphs. In this case the size of the graphs can grow considerably and thus makes it difficult for software engineers to analyze the control flow. Graph reductions are helpful in this situation. In this paper we define reductions to subsets of nodes. Since executions of programs are represented by paths through the control flow graphs, paths should be preserved. Furthermore, the composition of reductions makes a stepwise analysis approach possible.

Keywords: Control flow graph, graph reduction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3482
8915 Level of Concentration in Banking Markets and Length of EU Membership

Authors: Ivan Pavic, Fran Galetic, Tomislava Pavic Kramaric

Abstract:

The purpose of this article is to analyze the degree of concentration in the banking market in EU member states as well as to determine the impact of the length of EU membership on the degree of concentration. In that sense several analysis were conducted, specifically, panel analysis, calculation of correlation coefficient and regression analysis of the impact of the length of EU membership on the degree of concentration. Panel analysis was conducted to determine whether there is a similar trend of concentration in three groups of countries - countries with a low, moderate and high level of concentration. The conducted panel analysis showed that in EU countries with a moderate level of concentration, the level of concentration decreases. The calculation of correlation showed that, to some extent, with other influential factors, the length of EU membership negatively affects the market concentration of the banking market. Using the regression analysis for investigation of the influence of the length of EU membership on the level of concentration in the banking sector in a particular country, the results reveal that there is a negative effect of the length in EU membership on market concentration, although it is not significantly influential variable.

Keywords: Banking sector, concentration, EU

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1851
8914 Estimating Bridge Deterioration for Small Data Sets Using Regression and Markov Models

Authors: Yina F. Muñoz, Alexander Paz, Hanns De La Fuente-Mella, Joaquin V. Fariña, Guilherme M. Sales

Abstract:

The primary approach for estimating bridge deterioration uses Markov-chain models and regression analysis. Traditional Markov models have problems in estimating the required transition probabilities when a small sample size is used. Often, reliable bridge data have not been taken over large periods, thus large data sets may not be available. This study presents an important change to the traditional approach by using the Small Data Method to estimate transition probabilities. The results illustrate that the Small Data Method and traditional approach both provide similar estimates; however, the former method provides results that are more conservative. That is, Small Data Method provided slightly lower than expected bridge condition ratings compared with the traditional approach. Considering that bridges are critical infrastructures, the Small Data Method, which uses more information and provides more conservative estimates, may be more appropriate when the available sample size is small. In addition, regression analysis was used to calculate bridge deterioration. Condition ratings were determined for bridge groups, and the best regression model was selected for each group. The results obtained were very similar to those obtained when using Markov chains; however, it is desirable to use more data for better results.

Keywords: Concrete bridges, deterioration, Markov chains, probability matrix.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1424
8913 Data Mining Classification Methods Applied in Drug Design

Authors: Mária Stachová, Lukáš Sobíšek

Abstract:

Data mining incorporates a group of statistical methods used to analyze a set of information, or a data set. It operates with models and algorithms, which are powerful tools with the great potential. They can help people to understand the patterns in certain chunk of information so it is obvious that the data mining tools have a wide area of applications. For example in the theoretical chemistry data mining tools can be used to predict moleculeproperties or improve computer-assisted drug design. Classification analysis is one of the major data mining methodologies. The aim of thecontribution is to create a classification model, which would be able to deal with a huge data set with high accuracy. For this purpose logistic regression, Bayesian logistic regression and random forest models were built using R software. TheBayesian logistic regression in Latent GOLD software was created as well. These classification methods belong to supervised learning methods. It was necessary to reduce data matrix dimension before construct models and thus the factor analysis (FA) was used. Those models were applied to predict the biological activity of molecules, potential new drug candidates.

Keywords: data mining, classification, drug design, QSAR

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2836
8912 Quality of Service Evaluation using a Combination of Fuzzy C-Means and Regression Model

Authors: Aboagela Dogman, Reza Saatchi, Samir Al-Khayatt

Abstract:

In this study, a network quality of service (QoS) evaluation system was proposed. The system used a combination of fuzzy C-means (FCM) and regression model to analyse and assess the QoS in a simulated network. Network QoS parameters of multimedia applications were intelligently analysed by FCM clustering algorithm. The QoS parameters for each FCM cluster centre were then inputted to a regression model in order to quantify the overall QoS. The proposed QoS evaluation system provided valuable information about the network-s QoS patterns and based on this information, the overall network-s QoS was effectively quantified.

Keywords: Fuzzy C-means; regression model, network quality of service

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1706
8911 A Research on Inference from Multiple Distance Variables in Hedonic Regression – Focus on Three Variables

Authors: Yan Wang, Yasushi Asami, Yukio Sadahiro

Abstract:

In urban context, urban nodes such as amenity or hazard will certainly affect house price, while classic hedonic analysis will employ distance variables measured from each urban nodes. However, effects from distances to facilities on house prices generally do not represent the true price of the property. Distance variables measured on the same surface are suffering a problem called multicollinearity, which is usually presented as magnitude variance and mean value in regression, errors caused by instability. In this paper, we provided a theoretical framework to identify and gather the data with less bias, and also provided specific sampling method on locating the sample region to avoid the spatial multicollinerity problem in three distance variable’s case.

Keywords: Hedonic regression, urban node, distance variables, multicollinerity, collinearity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1979
8910 Effects of Video Games and Online Chat on Mathematics Performance in High School: An Approach of Multivariate Data Analysis

Authors: Lina Wu, Wenyi Lu, Ye Li

Abstract:

Regarding heavy video game players for boys and super online chat lovers for girls as a symbolic phrase in the current adolescent culture, this project of data analysis verifies the displacement effect on deteriorating mathematics performance. To evaluate correlation or regression coefficients between a factor of playing video games or chatting online and mathematics performance compared with other factors, we use multivariate analysis technique and take gender difference into account. We find the most important reason for the negative sign of the displacement effect on mathematics performance due to students’ poor academic background. Statistical analysis methods in this project could be applied to study internet users’ academic performance from the high school education to the college education.

Keywords: Correlation coefficients, displacement effect, gender difference, multivariate analysis technique, regression coefficients.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2158
8909 Factors for Entry Timing Choices Using Principal Axis Factorial Analysis and Logistic Regression Model

Authors: Mat Isa, C. M., Mohd Saman, H., Mohd Nasir, S. R., Jaapar, A.

Abstract:

International market expansion involves a strategic process of market entry decision through which a firm expands its operation from domestic to the international domain. Hence, entry timing choices require the needs to balance the early entry risks and the problems in losing opportunities as a result of late entry into a new market. Questionnaire surveys administered to 115 Malaysian construction firms operating in 51 countries worldwide have resulted in 39.1 percent response rate. Factor analysis was used to determine the most significant factors affecting entry timing choices of the firms to penetrate the international market. A logistic regression analysis used to examine the firms’ entry timing choices, indicates that the model has correctly classified 89.5 per cent of cases as late movers. The findings reveal that the most significant factor influencing the construction firms’ choices as late movers was the firm factor related to the firm’s international experience, resources, competencies and financing capacity. The study also offers valuable information to construction firms with intention to internationalize their businesses.

Keywords: Factors, early movers, entry timing choices, late movers, Logistic Regression Model, Principal Axis Factorial Analysis, Malaysian construction firms.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2220
8908 Density Estimation using Generalized Linear Model and a Linear Combination of Gaussians

Authors: Aly Farag, Ayman El-Baz, Refaat Mohamed

Abstract:

In this paper we present a novel approach for density estimation. The proposed approach is based on using the logistic regression model to get initial density estimation for the given empirical density. The empirical data does not exactly follow the logistic regression model, so, there will be a deviation between the empirical density and the density estimated using logistic regression model. This deviation may be positive and/or negative. In this paper we use a linear combination of Gaussian (LCG) with positive and negative components as a model for this deviation. Also, we will use the expectation maximization (EM) algorithm to estimate the parameters of LCG. Experiments on real images demonstrate the accuracy of our approach.

Keywords: Logistic regression model, Expectationmaximization, Segmentation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1722
8907 Multiple Regression based Graphical Modeling for Images

Authors: Pavan S., Sridhar G., Sridhar V.

Abstract:

Super resolution is one of the commonly referred inference problems in computer vision. In the case of images, this problem is generally addressed using a graphical model framework wherein each node represents a portion of the image and the edges between the nodes represent the statistical dependencies. However, the large dimensionality of images along with the large number of possible states for a node makes the inference problem computationally intractable. In this paper, we propose a representation wherein each node can be represented as acombination of multiple regression functions. The proposed approach achieves a tradeoff between the computational complexity and inference accuracy by varying the number of regression functions for a node.

Keywords: Belief propagation, Graphical model, Regression, Super resolution.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1534