Search results for: panel data regression models

9437 Optimal Calculation of Partial Transmission Ratios of Four-Step Helical Gearboxes for Getting Minimal Gearbox Length

Abstract:

This paper presents a new study on the applications of optimization and regression analysis techniques for optimal calculation of partial ratios of four-step helical gearboxes for getting minimal gearbox length. In the paper, basing on the moment equilibrium condition of a mechanic system including four gear units and their regular resistance condition, models for determination of the partial ratios of the gearboxes are proposed. In particular, explicit models for calculation of the partial ratios are proposed by using regression analysis. Using these models, the determination of the partial ratios is accurate and simple.

Keywords: Gearbox design; optimal design; helical gearbox, transmission ratio.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2090

9436 Geometric Modeling of Illumination on the TFT-LCD Panel using Bezier Surface

Authors: Kyong-min Lee, Moon Soo Chang, PooGyeon Park

Abstract:

In this paper, we propose a geometric modeling of illumination on the patterned image containing etching transistor. This image is captured by a commercial camera during the inspection of a TFT-LCD panel. Inspection of defect is an important process in the production of LCD panel, but the regional difference in brightness, which has a negative effect on the inspection, is due to the uneven illumination environment. In order to solve this problem, we present a geometric modeling of illumination consisting of an interpolation using the least squares method and 3D modeling using bezier surface. Our computational time, by using the sampling method, is shorter than the previous methods. Moreover, it can be further used to correct brightness in every patterned image.

Keywords: Bezier, defect, geometric modeling, illumination, inspection, LCD, panel.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1855

9435 Analyzing the Impact of Spatio-Temporal Climate Variations on the Rice Crop Calendar in Pakistan

Authors: Muhammad Imran, Iqra Basit, Mobushir Riaz Khan, Sajid Rasheed Ahmad

Abstract:

The present study investigates the space-time impact of climate change on the rice crop calendar in tropical Gujranwala, Pakistan. The climate change impact was quantified through the climatic variables, whereas the existing calendar of the rice crop was compared with the phonological stages of the crop, depicted through the time series of the Normalized Difference Vegetation Index (NDVI) derived from Landsat data for the decade 2005-2015. Local maxima were applied on the time series of NDVI to compute the rice phonological stages. Panel models with fixed and cross-section fixed effects were used to establish the relation between the climatic parameters and the time-series of NDVI across villages and across rice growing periods. Results show that the climatic parameters have significant impact on the rice crop calendar. Moreover, the fixed effect model is a significant improvement over cross-sectional fixed effect models (R-squared equal to 0.673 vs. 0.0338). We conclude that high inter-annual variability of climatic variables cause high variability of NDVI, and thus, a shift in the rice crop calendar. Moreover, inter-annual (temporal) variability of the rice crop calendar is high compared to the inter-village (spatial) variability. We suggest the local rice farmers to adapt this change in the rice crop calendar.

Keywords: Landsat NDVI, panel models, temperature, rainfall.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 911

9434 Research on Static and Dynamic Behavior of New Combination of Aluminum Honeycomb Panel and Rod Single-layer Latticed Shell

Authors: Xu Chen, Zhao Caiqi

Abstract:

In addition to the advantages of light weight, resistant corrosion and ease of processing, aluminum is also applied to the long-span spatial structures. However, the elastic modulus of aluminum is lower than that of the steel. This paper combines the high performance aluminum honeycomb panel with the aluminum latticed shell, forming a new panel-and-rod composite shell structure. Through comparative analysis between the static and dynamic performance, the conclusion that the structure of composite shell is noticeably superior to the structure combined before.

Keywords: Combination of aluminum honeycomb panel and rod latticed shell, dynamic performance, response spectrum analysis, seismic properties.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1944

9433 An Analysis of Classification of Imbalanced Datasets by Using Synthetic Minority Over-Sampling Technique

Authors: Ghada A. Alfattni

Abstract:

Analysing unbalanced datasets is one of the challenges that practitioners in machine learning field face. However, many researches have been carried out to determine the effectiveness of the use of the synthetic minority over-sampling technique (SMOTE) to address this issue. The aim of this study was therefore to compare the effectiveness of the SMOTE over different models on unbalanced datasets. Three classification models (Logistic Regression, Support Vector Machine and Nearest Neighbour) were tested with multiple datasets, then the same datasets were oversampled by using SMOTE and applied again to the three models to compare the differences in the performances. Results of experiments show that the highest number of nearest neighbours gives lower values of error rates.

Keywords: Imbalanced datasets, SMOTE, machine learning, logistic regression, support vector machine, nearest neighbour.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1314

9432 Zero Inflated Models for Overdispersed Count Data

Authors: Y. N. Phang, E. F. Loh

Abstract:

The zero inflated models are usually used in modeling count data with excess zeros where the existence of the excess zeros could be structural zeros or zeros which occur by chance. These type of data are commonly found in various disciplines such as finance, insurance, biomedical, econometrical, ecology, and health sciences which involve sex and health dental epidemiology. The most popular zero inflated models used by many researchers are zero inflated Poisson and zero inflated negative binomial models. In addition, zero inflated generalized Poisson and zero inflated double Poisson models are also discussed and found in some literature. Recently zero inflated inverse trinomial model and zero inflated strict arcsine models are advocated and proven to serve as alternative models in modeling overdispersed count data caused by excessive zeros and unobserved heterogeneity. The purpose of this paper is to review some related literature and provide a variety of examples from different disciplines in the application of zero inflated models. Different model selection methods used in model comparison are discussed.

Keywords: Overdispersed count data, model selection methods, likelihood ratio, AIC, BIC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4532

9431 Granger Causal Nexus between Financial Development and Energy Consumption: Evidence from Cross Country Panel Data

Authors: Rudra P. Pradhan

Abstract:

This paper examines the Granger causal nexus between financial development and energy consumption in the group of 35 Financial Action Task Force (FATF) Countries over the period 1988-2012. The study uses two financial development indicators such as private sector credit and stock market capitalization and seven energy consumption indicators such as coal, oil, gas, electricity, hydro-electrical, nuclear and biomass. Using panel cointegration tests, the study finds that financial development and energy consumption are cointegrated, indicating the presence of a long-run relationship between the two. Using a panel vector error correction model (VECM), the study detects both bidirectional and unidirectional causality between financial development and energy consumption. The variation of this causality is due to the use of different proxies for both financial development and energy consumption. The policy implication of this study is that economic policies should recognize the differences in the financial development-energy consumption nexus in order to maintain sustainable development in the selected 35 FATF countries.

Keywords: Financial development, energy consumption, Panel VECM, FATF countries.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1513

9430 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find out efficient way to predict the yield of corn based meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to the different modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with their environment as dynamical systems. But the calibration process of the dynamic system comes up with much difficulty, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of datasets (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process. But it has some strict requirements about the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods, methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-folds cross-validation process and two accuracy metrics: root mean square error of prediction(RMSEP), mean absolute error of prediction(MAEP) were used to evaluate the crop prediction capacity. The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method to calibrate the mechanistic model from dataset easy to access offers several side-perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.

Keywords: Crop yield prediction, crop model, sensitivity analysis, paramater estimation, particle swarm optimization, random forest.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1174

9429 Migration and Unemployment Duration: The Case of the OECD Countries

Authors: Vincent Fromentin

Abstract:

This paper examines whether or not immigration has a positive influence on the duration of unemployment, in a macroeconomic perspective. We analyse also whether the degree of labor market integration can influence migration. The integration of immigrants into the labor market is a recurrence theme in the work on the economic consequences of immigration. However, to our knowledge, no researchers have studied the impact of immigration on unemployment duration, and vice versa. With two methodology of research (panel estimations (OLS and 2SLS) and panel cointegration techniques), we show that migration seems to influence positively the short-term unemployment and negatively long-term unemployment, for 14 OECD destination countries. In addition, immigration seems to be conditioned by the structural and institutional characteristics of the labour market.

Keywords: international migration, unemployment duration, OECD countries, panel data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1669

9428 A Study on Optimal Determination of Partial Transmission Ratios of Helical Gearboxes with Second-Step Double Gear-Sets

Authors: Vu Ngoc Pi

Abstract:

In this paper, a study on the applications of the optimization and regression techniques for optimal calculation of partial ratios of helical gearboxes with second-step double gear-sets for minimal cross section dimension is introduced. From the condition of the moment equilibrium of a mechanic system including three gear units and their regular resistance condition, models for calculation of the partial ratios of helical gearboxes with second-step double gear-sets were given. Especially, by regression analysis, explicit models for calculation of the partial ratios are introduced. These models allow determining the partial ratios accurately and simply.

Keywords: Gearbox design, optimal design, helical gearbox, transmission ratio.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1641

9427 The Impact of Trade on Social Development

Authors: Umut Gunduz, Mehtap Hisarciklilar, Tolga Kaya

Abstract:

Studies revealing the positive relationship between trade and income are often criticized with the argument that “development should mean more than rising incomes". Taking this argument as a base and utilizing panel data, Davies and Quinlivan [1] have demonstrated that increases in trade are positively associated with future increases in social welfare as measured by the Human Development Index (HDI). The purpose of this study is twofold: Firstly, utilizing an income based country classification; it is aimed to investigate whether the positive association between foreign trade and HDI is valid within all country groups. Secondly, keeping the same categorization as a base; it is aimed to reveal whether the positive link between trade and HDI still exists when the income components of the index are excluded. Employing a panel data framework of 106 countries, this study reveals that the positive link between trade and human development is valid only for high and medium income countries. Moreover, the positive link between trade and human development diminishes in lower-medium income countries when only non-income components of the index are taken into consideration.

Keywords: HDI, foreign trade, development, panel data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2714

9426 Predictive Models for Compressive Strength of High Performance Fly Ash Cement Concrete for Pavements

Authors: S. M. Gupta, Vanita Aggarwal, Som Nath Sachdeva

Abstract:

The work reported through this paper is an experimental work conducted on High Performance Concrete (HPC) with super plasticizer with the aim to develop some models suitable for prediction of compressive strength of HPC mixes. In this study, the effect of varying proportions of fly ash (0% to 50% @ 10% increment) on compressive strength of high performance concrete has been evaluated. The mix designs studied were M30, M40 and M50 to compare the effect of fly ash addition on the properties of these concrete mixes. In all eighteen concrete mixes that have been designed, three were conventional concretes for three grades under discussion and fifteen were HPC with fly ash with varying percentages of fly ash. The concrete mix designing has been done in accordance with Indian standard recommended guidelines. All the concrete mixes have been studied in terms of compressive strength at 7 days, 28 days, 90 days, and 365 days. All the materials used have been kept same throughout the study to get a perfect comparison of values of results. The models for compressive strength prediction have been developed using Linear Regression method (LR), Artificial Neural Network (ANN) and Leave-One-Out Validation (LOOV) methods.

Keywords: ANN, concrete mixes, compressive strength, fly ash, high performance concrete, linear regression, strength prediction models.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2077

9425 A Fuzzy Nonlinear Regression Model for Interval Type-2 Fuzzy Sets

Authors: O. Poleshchuk, E.Komarov

Abstract:

This paper presents a regression model for interval type-2 fuzzy sets based on the least squares estimation technique. Unknown coefficients are assumed to be triangular fuzzy numbers. The basic idea is to determine aggregation intervals for type-1 fuzzy sets, membership functions of whose are low membership function and upper membership function of interval type-2 fuzzy set. These aggregation intervals were called weighted intervals. Low and upper membership functions of input and output interval type-2 fuzzy sets for developed regression models are considered as piecewise linear functions.

Keywords: Interval type-2 fuzzy sets, fuzzy regression, weighted interval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2218

9424 Designing Social Care Plans Considering Cause-Effect Relationships: A Study in Scotland

Authors: Sotirios N. Raptis

Abstract:

The paper links social needs to social classes by the creation of cohorts of public services matched as causes to other ones as effects using cause-effect (CE) models. It then compares these associations using CE and typical regression methods (LR, ARMA). The paper discusses such public service groupings offered in Scotland in the long term to estimate the risk of multiple causes or effects that can ultimately reduce the healthcare cost by linking the next services to the likely causes of them. The same generic goal can be achieved using LR or ARMA and differences are discussed. The work uses Health and Social Care (H&Sc) public services data from 11 service packs offered by Public Health Services (PHS) Scotland that boil down to 110 single-attribute year series, called ’factors’. The study took place at Macmillan Cancer Support, UK and Abertay University, Dundee, from 2020 to 2023. The paper discusses CE relationships as a main method and compares sample findings with Linear Regression (LR), ARMA, to see how the services are linked. Relationships found were between smoking-related healthcare provision, mental-health-related services, and epidemiological weight in Primary-1-Education Body-Mass-Index (BMI) in children as CE models. Insurance companies and public policymakers can pack CE-linked services in plans such as those for the elderly, low-income people, in the long term. The linkage of services was confirmed allowing more accurate resource planning.

Keywords: Probability, regression, cause-effect cohorts, data frames, services, prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 57

9423 Electron-Impact Excitation of Kr 5s, 5p Levels

Authors: Alla A. Mityureva

Abstract:

The available data on the cross sections of electronimpact excitation of krypton 5s and 5p configuration levels out of the ground state are represented in convenient and compact form. The results are obtained by regression through all known published data related to this process.

Keywords: Cross section, electron excitation, krypton, regression

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1087

9422 Performance Comparison of Cooperative Banks in the EU, USA and Canada

Authors: Matěj Kuc

Abstract:

This paper compares different types of profitability measures of cooperative banks from two developed regions: the European Union and the United States of America together with Canada. We created balanced dataset of more than 200 cooperative banks covering 2011-2016 period. We made series of tests and run Random Effects estimation on panel data. We found that American and Canadian cooperatives are more profitable in terms of return on assets (ROA) and return on equity (ROE). There is no significant difference in net interest margin (NIM). Our results show that the North American cooperative banks accommodated better to the current market environment.

Keywords: Cooperative banking, panel data, profitability measures, random effects.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 650

9421 Zero Inflated Strict Arcsine Regression Model

Authors: Y. N. Phang, E. F. Loh

Abstract:

Zero inflated strict arcsine model is a newly developed model which is found to be appropriate in modeling overdispersed count data. In this study, we extend zero inflated strict arcsine model to zero inflated strict arcsine regression model by taking into consideration the extra variability caused by extra zeros and covariates in count data. Maximum likelihood estimation method is used in estimating the parameters for this zero inflated strict arcsine regression model.

Keywords: Overdispersed count data, maximum likelihood estimation, simulated annealing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1755

9420 Free Fatty Acid Assessment of Crude Palm Oil Using a Non-Destructive Approach

Authors: Siti Nurhidayah Naqiah Abdull Rani, Herlina Abdul Rahim, Rashidah Ghazali, Noramli Abdul Razak

Abstract:

Near infrared (NIR) spectroscopy has always been of great interest in the food and agriculture industries. The development of prediction models has facilitated the estimation process in recent years. In this study, 110 crude palm oil (CPO) samples were used to build a free fatty acid (FFA) prediction model. 60% of the collected data were used for training purposes and the remaining 40% used for testing. The visible peaks on the NIR spectrum were at 1725 nm and 1760 nm, indicating the existence of the first overtone of C-H bands. Principal component regression (PCR) was applied to the data in order to build this mathematical prediction model. The optimal number of principal components was 10. The results showed R2=0.7147 for the training set and R2=0.6404 for the testing set.

Keywords: Palm oil, fatty acid, NIRS, regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4370

9419 Modeling Aeration of Sharp Crested Weirs by Using Support Vector Machines

Authors: Arun Goel

Abstract:

The present paper attempts to investigate the prediction of air entrainment rate and aeration efficiency of a free overfall jets issuing from a triangular sharp crested weir by using regression based modelling. The empirical equations, Support vector machine (polynomial and radial basis function) models and the linear regression techniques were applied on the triangular sharp crested weirs relating the air entrainment rate and the aeration efficiency to the input parameters namely drop height, discharge, and vertex angle. It was observed that there exists a good agreement between the measured values and the values obtained using empirical equations, Support vector machine (Polynomial and rbf) models and the linear regression techniques. The test results demonstrated that the SVM based (Poly & rbf) model also provided acceptable prediction of the measured values with reasonable accuracy along with empirical equations and linear regression techniques in modelling the air entrainment rate and the aeration efficiency of a free overfall jets issuing from triangular sharp crested weir. Further sensitivity analysis has also been performed to study the impact of input parameter on the output in terms of air entrainment rate and aeration efficiency.

Keywords: Air entrainment rate, dissolved oxygen, regression, SVM, weir.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1956

9418 An ensemble of Weighted Support Vector Machines for Ordinal Regression

Authors: Willem Waegeman, Luc Boullart

Abstract:

Instead of traditional (nominal) classification we investigate the subject of ordinal classification or ranking. An enhanced method based on an ensemble of Support Vector Machines (SVM-s) is proposed. Each binary classifier is trained with specific weights for each object in the training data set. Experiments on benchmark datasets and synthetic data indicate that the performance of our approach is comparable to state of the art kernel methods for ordinal regression. The ensemble method, which is straightforward to implement, provides a very good sensitivity-specificity trade-off for the highest and lowest rank.

Keywords: Ordinal regression, support vector machines, ensemblelearning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1642

9417 Enhancing Temporal Extrapolation of Wind Speed Using a Hybrid Technique: A Case Study in West Coast of Denmark

Authors: B. Elshafei, X. Mao

Abstract:

The demand for renewable energy is significantly increasing, major investments are being supplied to the wind power generation industry as a leading source of clean energy. The wind energy sector is entirely dependable and driven by the prediction of wind speed, which by the nature of wind is very stochastic and widely random. This s0tudy employs deep multi-fidelity Gaussian process regression, used to predict wind speeds for medium term time horizons. Data of the RUNE experiment in the west coast of Denmark were provided by the Technical University of Denmark, which represent the wind speed across the study area from the period between December 2015 and March 2016. The study aims to investigate the effect of pre-processing the data by denoising the signal using empirical wavelet transform (EWT) and engaging the vector components of wind speed to increase the number of input data layers for data fusion using deep multi-fidelity Gaussian process regression (GPR). The outcomes were compared using root mean square error (RMSE) and the results demonstrated a significant increase in the accuracy of predictions which demonstrated that using vector components of the wind speed as additional predictors exhibits more accurate predictions than strategies that ignore them, reflecting the importance of the inclusion of all sub data and pre-processing signals for wind speed forecasting models.

Keywords: Data fusion, Gaussian process regression, signal denoise, temporal extrapolation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 501

9416 Statistical Models of Network Traffic

Authors: Barath Kumar, Oliver Niggemann, Juergen Jasperneite

Abstract:

Model-based approaches have been applied successfully to a wide range of tasks such as specification, simulation, testing, and diagnosis. But one bottleneck often prevents the introduction of these ideas: Manual modeling is a non-trivial, time-consuming task. Automatically deriving models by observing and analyzing running systems is one possible way to amend this bottleneck. To derive a model automatically, some a-priori knowledge about the model structure–i.e. about the system–must exist. Such a model formalism would be used as follows: (i) By observing the network traffic, a model of the long-term system behavior could be generated automatically, (ii) Test vectors can be generated from the model, (iii) While the system is running, the model could be used to diagnose non-normal system behavior. The main contribution of this paper is the introduction of a model formalism called 'probabilistic regression automaton' suitable for the tasks mentioned above.

Keywords: Model-based approach, Probabilistic regression automata, Statistical models and Timed automata.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1539

9415 Profitability Assessment of Granite Aggregate Production and the Development of a Profit Assessment Model

Authors: Melodi Mbuyi Mata, Blessing Olamide Taiwo, Afolabi Ayodele David

Abstract:

The purpose of this research is to create empirical models for assessing the profitability of granite aggregate production in Akure, Ondo state aggregate quarries. In addition, an Artificial Neural Network (ANN) model and multivariate predicting models for granite profitability were developed in the study. A formal survey questionnaire was used to collect data for the study. The data extracted from the case study mine for this study include granite marketing operations, royalty, production costs, and mine production information. The following methods were used to achieve the goal of this study: descriptive statistics, MATLAB 2017, and SPSS16.0 software in analyzing and modeling the data collected from granite traders in the study areas. The ANN and Multi Variant Regression models' prediction accuracy was compared using a coefficient of determination (R2), Root Mean Square Error (RMSE), and mean square error (MSE). Due to the high prediction error, the model evaluation indices revealed that the ANN model was suitable for predicting generated profit in a typical quarry. More quarries in Nigeria's southwest region and other geopolitical zones should be considered to improve ANN prediction accuracy.

Keywords: National development, granite, profitability assessment, ANN models.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 82

9414 Comparison of Imputation Techniques for Efficient Prediction of Software Fault Proneness in Classes

Authors: Geeta Sikka, Arvinder Kaur Takkar, Moin Uddin

Abstract:

Missing data is a persistent problem in almost all areas of empirical research. The missing data must be treated very carefully, as data plays a fundamental role in every analysis. Improper treatment can distort the analysis or generate biased results. In this paper, we compare and contrast various imputation techniques on missing data sets and make an empirical evaluation of these methods so as to construct quality software models. Our empirical study is based on NASA-s two public dataset. KC4 and KC1. The actual data sets of 125 cases and 2107 cases respectively, without any missing values were considered. The data set is used to create Missing at Random (MAR) data Listwise Deletion(LD), Mean Substitution(MS), Interpolation, Regression with an error term and Expectation-Maximization (EM) approaches were used to compare the effects of the various techniques.

Keywords: Missing data, Imputation, Missing Data Techniques.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1667

9413 Modeling of Reinforcement in Concrete Beams Using Machine Learning Tools

Authors: Yogesh Aggarwal

Abstract:

The paper discusses the results obtained to predict reinforcement in singly reinforced beam using Neural Net (NN), Support Vector Machines (SVM-s) and Tree Based Models. Major advantage of SVM-s over NN is of minimizing a bound on the generalization error of model rather than minimizing a bound on mean square error over the data set as done in NN. Tree Based approach divides the problem into a small number of sub problems to reach at a conclusion. Number of data was created for different parameters of beam to calculate the reinforcement using limit state method for creation of models and validation. The results from this study suggest a remarkably good performance of tree based and SVM-s models. Further, this study found that these two techniques work well and even better than Neural Network methods. A comparison of predicted values with actual values suggests a very good correlation coefficient with all four techniques.

Keywords: Linear Regression, M5 Model Tree, Neural Network, Support Vector Machines.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2035

9412 Comparison of Artificial Neural Network Architectures in the Task of Tourism Time Series Forecast

Authors: João Paulo Teixeira, Paula Odete Fernandes

Abstract:

The authors have been developing several models based on artificial neural networks, linear regression models, Box- Jenkins methodology and ARIMA models to predict the time series of tourism. The time series consist in the “Monthly Number of Guest Nights in the Hotels" of one region. Several comparisons between the different type models have been experimented as well as the features used at the entrance of the models. The Artificial Neural Network (ANN) models have always had their performance at the top of the best models. Usually the feed-forward architecture was used due to their huge application and results. In this paper the author made a comparison between different architectures of the ANNs using simply the same input. Therefore, the traditional feed-forward architecture, the cascade forwards, a recurrent Elman architecture and a radial based architecture were discussed and compared based on the task of predicting the mentioned time series.

Keywords: Artificial Neural Network Architectures, time series forecast, tourism.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1885

9411 Modeling Default Probabilities of the Chosen Czech Banks in the Time of the Financial Crisis

Authors: Petr Gurný

Abstract:

One of the most important tasks in the risk management is the correct determination of probability of default (PD) of particular financial subjects. In this paper a possibility of determination of financial institution’s PD according to the creditscoring models is discussed. The paper is divided into the two parts. The first part is devoted to the estimation of the three different models (based on the linear discriminant analysis, logit regression and probit regression) from the sample of almost three hundred US commercial banks. Afterwards these models are compared and verified on the control sample with the view to choose the best one. The second part of the paper is aimed at the application of the chosen model on the portfolio of three key Czech banks to estimate their present financial stability. However, it is not less important to be able to estimate the evolution of PD in the future. For this reason, the second task in this paper is to estimate the probability distribution of the future PD for the Czech banks. So, there are sampled randomly the values of particular indicators and estimated the PDs’ distribution, while it’s assumed that the indicators are distributed according to the multidimensional subordinated Lévy model (Variance Gamma model and Normal Inverse Gaussian model, particularly). Although the obtained results show that all banks are relatively healthy, there is still high chance that “a financial crisis” will occur, at least in terms of probability. This is indicated by estimation of the various quantiles in the estimated distributions. Finally, it should be noted that the applicability of the estimated model (with respect to the used data) is limited to the recessionary phase of the financial market.

Keywords: Credit-scoring Models, Multidimensional Subordinated Lévy Model, Probability of Default.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1919

9410 MMU Simulation in Hardware Simulator Based-on State Transition Models

Authors: Zhang Xiuping, Yang Guowu, Zheng Desheng

Abstract:

Embedded hardware simulator is a valuable computeraided tool for embedded application development. This paper focuses on the ARM926EJ-S MMU, builds state transition models and formally verifies critical properties for the models. The state transition models include loading instruction model, reading data model, and writing data model. The properties of the models are described by CTL specification language, and they are verified in VIS. The results obtained in VIS demonstrate that the critical properties of MMU are satisfied in the state transition models. The correct models can be used to implement the MMU component in our simulator. In the end of this paper, the experimental results show that the MMU can successfully accomplish memory access requests from CPU.

Keywords: MMU, State transition, Model, Simulation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1617

9409 Churn Prediction: Does Technology Matter?

Authors: John Hadden, Ashutosh Tiwari, Rajkumar Roy, Dymitr Ruta

Abstract:

The aim of this paper is to identify the most suitable model for churn prediction based on three different techniques. The paper identifies the variables that affect churn in reverence of customer complaints data and provides a comparative analysis of neural networks, regression trees and regression in their capabilities of predicting customer churn.

Keywords: Churn, Decision Trees, Neural Networks, Regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3301

9408 Using Historical Data for Stock Prediction of a Tech Company

Authors: Sofia Stoica

Abstract:

In this paper, we use historical data to predict the stock price of a tech company. To this end, we use a dataset consisting of the stock prices over the past five years of 10 major tech companies: Adobe, Amazon, Apple, Facebook, Google, Microsoft, Netflix, Oracle, Salesforce, and Tesla. We implemented and tested three models – a linear regressor model, a k-nearest neighbor model (KNN), and a sequential neural network – and two algorithms – Multiplicative Weight Update and AdaBoost. We found that the sequential neural network performed the best, with a testing error of 0.18%. Interestingly, the linear model performed the second best with a testing error of 0.73%. These results show that using historical data is enough to obtain high accuracies, and a simple algorithm like linear regression has a performance similar to more sophisticated models while taking less time and resources to implement.

Keywords: Finance, machine learning, opening price, stock market.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 660