Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 6

bootstrap Related Publications

6 The Classification Performance in Parametric and Nonparametric Discriminant Analysis for a Class-Unbalanced Data of Diabetes Risk Groups

Authors: Lily Ingsrisawang, Tasanee Nacharoen

Abstract:

Problems arising from unbalanced data sets commonly appear in real-world applications. Because of the unequal class distribution, many researchers have found that the performance of existing classifiers tends to be biased towards the majority class. The k-nearest neighbors nonparametric discriminant analysis is a method that was proposed for classifying unbalanced classes with good performance. In this study, discriminant analysis methods are used to investigate misclassification error rates for class-imbalanced data of three diabetes risk groups. The purpose of this study was to compare the classification performance between parametric discriminant analysis and nonparametric discriminant analysis in a three-class classification of class-imbalanced data of diabetes risk groups. Data from a project on maintaining healthy conditions for 599 employees of a government hospital in Bangkok were obtained for the classification problem. The employees were divided into three diabetes risk groups: non-risk (90%), risk (5%), and diabetic (5%). The original data, including the variables diabetes risk group, age, gender, blood glucose, and BMI, were analyzed and bootstrapped into 50 and 100 samples of 599 observations each for additional estimation of the misclassification error rate. Each data set was examined for departure from multivariate normality and for equality of the covariance matrices of the three risk groups. Both the original data and the bootstrap samples showed nonnormality and unequal covariance matrices. The parametric linear discriminant function, the quadratic discriminant function, and the nonparametric k-nearest neighbors discriminant function were applied to the 50 and 100 bootstrap samples and to the original data. In searching for the optimal classification rule, the prior probabilities were set to both equal proportions (0.33:0.33:0.33) and unequal proportions (0.90:0.05:0.05), (0.80:0.10:0.10), and (0.70:0.15:0.15).
The results from 50 and 100 bootstrap samples indicated that the k-nearest neighbors approach with k=3 or k=4 and prior probabilities of non-risk:risk:diabetic defined as 0.90:0.05:0.05 or 0.80:0.10:0.10 gave the smallest misclassification error rate. The k-nearest neighbors approach is therefore suggested for classifying three-class imbalanced data of diabetes risk groups.
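
To make the bootstrap error-rate estimation in this abstract concrete, here is a minimal sketch. It is not the authors' procedure. The data are synthetic, the helper names (`make_data`, `knn_predict`, `bootstrap_error`) are hypothetical, each bootstrap sample is assumed to be scored on its out-of-bag observations, and the classifier is a plain majority-vote k-nearest neighbors rule without the prior probabilities discussed above.

```python
import random
from collections import Counter

def make_data(n, rng):
    """Hypothetical synthetic data: two features, three classes at roughly 90/5/5."""
    centers = {"non-risk": 0.0, "risk": 3.0, "diabetic": 6.0}
    data = []
    for _ in range(n):
        u = rng.random()
        label = "non-risk" if u < 0.90 else ("risk" if u < 0.95 else "diabetic")
        c = centers[label]
        data.append(((rng.gauss(c, 1.0), rng.gauss(c, 1.0)), label))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def bootstrap_error(data, k, n_boot, rng):
    """Mean out-of-bag misclassification rate over n_boot bootstrap samples."""
    n = len(data)
    errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n observations with replacement
        chosen = set(idx)
        train = [data[i] for i in idx]
        test = [data[i] for i in range(n) if i not in chosen]  # out-of-bag observations
        if not test:
            continue
        wrong = sum(knn_predict(train, x, k) != y for x, y in test)
        errors.append(wrong / len(test))
    return sum(errors) / len(errors)

rng = random.Random(0)
data = make_data(150, rng)
for k in (3, 4):
    print("k =", k, "estimated error rate:", round(bootstrap_error(data, k, 20, rng), 3))
```

Scoring on out-of-bag observations keeps duplicated training points from being classified by their own copies. Other error estimators, such as resubstitution or leave-one-out, would fit the same loop.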

Keywords: bootstrap, error rate, diabetes risk groups, k-nearest neighbors

Downloads: 1702
5 The Use of Degradation Measures to Design Reliability Test Plans

Authors: Stephen V. Crowder, Jonathan W. Lane

Abstract:

With short production development times, there is an increased need to demonstrate product reliability relatively quickly with minimal testing. In such cases there may be few if any observed failures. Thus it may be difficult to assess reliability using the traditional reliability test plans that measure only time (or cycles) to failure. For many components, degradation measures will contain important information about performance and reliability. These measures can be used to design a minimal test plan, in terms of number of units placed on test and duration of the test, necessary to demonstrate a reliability goal. In this work we present a case study involving an electronic component subject to degradation. The data, consisting of 42 degradation paths of cycles to failure, are first used to estimate a reliability function. Bootstrapping techniques are then used to perform power studies and develop a minimal reliability test plan for future production of this component. 

Keywords: degradation measure, time to failure distribution, bootstrap

Downloads: 1523
4 Bootstrap and MLS Methods-based Individual Bioequivalence Assessment

Authors: Kongsheng Zhang, Li Ge

Abstract:

Assessing bioequivalence is a one-sided hypothesis testing process. Bootstrap and modified large-sample (MLS) methods are considered for studying individual bioequivalence (IBE); the type I error and power of the hypothesis tests are simulated and compared with FDA (2001). The results show that the modified large-sample method is equivalent to the method of FDA (2001).

Keywords: bootstrap, Individual bioequivalence, Bayesian bootstrap, modified large-sample

Downloads: 1241
3 Comparison of Alternative Models to Predict Lean Meat Percentage of Lamb Carcasses

Authors: Vasco A. P. Cadavez, Fernando C. Monteiro

Abstract:

The objective of this study was to develop and compare alternative prediction equations of lean meat proportion (LMP) of lamb carcasses. Forty (40) male lambs, 22 of the Churra Galega Bragançana Portuguese local breed and 18 of the Suffolk breed, were used. Lambs were slaughtered, and carcasses were weighed approximately 30 min later in order to obtain hot carcass weight (HCW). After cooling at 4 °C for 24 h, a set of seventeen carcass measurements was recorded. The left side of each carcass was dissected into muscle, subcutaneous fat, inter-muscular fat, bone, and remainder (major blood vessels, ligaments, tendons, and thick connective tissue sheets associated with muscles), and the LMP was evaluated as the dissected muscle percentage. Prediction equations of LMP were developed, and fitting quality was evaluated through the coefficient of determination of estimation (R²_e) and the standard error of estimate (SEE). Model validation was performed by k-fold cross-validation, and the coefficient of determination of prediction (R²_p) and the standard error of prediction (SEP) were computed. The BT2 measurement was the best single predictor and accounted for 37.8% of the LMP variation with a SEP of 2.30%. The prediction of LMP of lamb carcasses can be based on simple models, using as predictors the HCW and one fat thickness measurement.
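
The k-fold validation step mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the data are synthetic, the function names are hypothetical, the model is a one-predictor least squares line, R²_p is assumed to be one minus the ratio of the cross-validated sum of squared prediction errors to the total sum of squares, and SEP is assumed to be the root mean squared prediction error.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def kfold_validate(xs, ys, k, rng):
    """Return (R2_p, SEP) from k-fold cross-validation of the fitted line."""
    n = len(xs)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    press = 0.0  # accumulated squared prediction errors on held-out points
    for fold in folds:
        held = set(fold)
        train_x = [xs[i] for i in idx if i not in held]
        train_y = [ys[i] for i in idx if i not in held]
        intercept, slope = fit_line(train_x, train_y)
        press += sum((ys[i] - (intercept + slope * xs[i])) ** 2 for i in fold)
    my = sum(ys) / n
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / sst, math.sqrt(press / n)

# Hypothetical data: 40 carcasses, one fat thickness predictor (mm), LMP in percent.
rng = random.Random(0)
fat = [rng.uniform(1.0, 6.0) for _ in range(40)]
lmp = [62.0 - 1.5 * f + rng.gauss(0.0, 2.0) for f in fat]
r2_p, sep = kfold_validate(fat, lmp, 5, rng)
print("R2_p:", round(r2_p, 3), "SEP:", round(sep, 3))
```

Because every observation is predicted by a model that never saw it, R²_p and SEP describe prediction quality rather than fit quality, which is the distinction the abstract draws between R²_e/SEE and R²_p/SEP.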

Keywords: bootstrap, carcass, lambs, Lean meat

Downloads: 1277
2 A Decision Boundary based Discretization Technique using Resampling

Authors: Taimur Qureshi, Djamel A Zighed

Abstract:

Many supervised induction algorithms require discrete data, even though real data often come in both discrete and continuous formats. Quality discretization of continuous attributes is an important problem that affects the speed, accuracy, and understandability of the induction models. Usually, discretization and other types of statistical processes are applied to subsets of the population, as the entire population is practically inaccessible. For this reason, we argue that the discretization performed on a sample of the population is only an estimate of that of the entire population. Most of the existing discretization methods partition the attribute range into two or several intervals using a single cut point or a set of cut points. In this paper, we introduce a technique that uses resampling (such as the bootstrap) to generate a set of candidate discretization points, thus improving the discretization quality by providing a better estimate for the entire population. The goal of this paper is to observe whether the resampling technique can lead to better discretization points, which opens up a new paradigm for the construction of soft decision trees.
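
The idea of generating candidate discretization points by resampling can be sketched as below. This is a hedged illustration only: the abstract does not state the splitting criterion, so a single binary cut that minimizes weighted class entropy is assumed here, and the data, labels, and function names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_cut(pairs):
    """Cut point on x minimizing the weighted class entropy of the two sides."""
    pairs = sorted(pairs)
    n = len(pairs)
    best_score, best_point = math.inf, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between identical attribute values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (i * entropy(left) + (n - i) * entropy(right)) / n
        if score < best_score:
            best_score = score
            best_point = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_point

def bootstrap_cuts(pairs, n_boot, rng):
    """Best cut point from each bootstrap resample of the data."""
    n = len(pairs)
    cuts = []
    for _ in range(n_boot):
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        cut = best_cut(sample)
        if cut is not None:
            cuts.append(cut)
    return cuts

# Hypothetical data: the class changes near x = 0.5, with a noisy boundary.
rng = random.Random(0)
xs = [rng.random() for _ in range(80)]
pairs = [(x, "low" if x + rng.gauss(0.0, 0.1) < 0.5 else "high") for x in xs]
cuts = bootstrap_cuts(pairs, 100, rng)
print("single-sample cut:", round(best_cut(pairs), 3))
print("median bootstrap cut:", round(statistics.median(cuts), 3))
print("distinct candidate points:", len(set(cuts)))
```

The spread of the bootstrap cut points gives a picture of how uncertain the boundary is, which is the kind of information a soft decision tree could use in place of one hard threshold.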

Keywords: discretization, Resampling, bootstrap, soft decision trees

Downloads: 1095
1 Small Sample Bootstrap Confidence Intervals for Long-Memory Parameter

Authors: Jesus Orbe, Josu Arteche

Abstract:

The log periodogram regression is widely used in empirical applications because of its simplicity (only a least squares regression is required to estimate the memory parameter, d), its good asymptotic properties, and its robustness to misspecification of the short-term behavior of the series. However, the asymptotic distribution is a poor approximation of the (unknown) finite sample distribution if the sample size is small. Here, the finite sample performance of different nonparametric residual bootstrap procedures is analyzed when they are applied to construct confidence intervals. In particular, in addition to the basic residual bootstrap, the local and block bootstraps, which might adequately replicate the structure that may arise in the errors of the regression, are considered when the series shows weak dependence in addition to the long memory component. A bias-correcting bootstrap to adjust for the bias caused by that structure is also considered. Finally, the performance of the bootstrap in log periodogram regression based confidence intervals is assessed for different types of models, along with how its performance changes as the sample size increases.
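
A minimal sketch of the basic residual bootstrap for the log periodogram regression follows. It assumes the classical form of the regression, in which log I(λ_j) is regressed on -2 log(2 sin(λ_j/2)) over the first m Fourier frequencies so that the slope estimates d. The percentile interval, the white noise test series (true d = 0), the choice m = 16, and all function names are assumptions made for illustration; the local, block, and bias-correcting variants named in the abstract are not shown.

```python
import cmath
import math
import random

def ols(x, y):
    """Least squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

def log_periodogram_design(series, m):
    """Regressor and response of the log periodogram regression at frequencies 1..m."""
    n = len(series)
    xs, ys = [], []
    for j in range(1, m + 1):
        lam = 2 * math.pi * j / n
        dft = sum(v * cmath.exp(-1j * lam * t) for t, v in enumerate(series))
        xs.append(-2 * math.log(2 * math.sin(lam / 2)))
        ys.append(math.log(abs(dft) ** 2 / (2 * math.pi * n)))  # log periodogram
    return xs, ys

def residual_bootstrap_ci(xs, ys, n_boot, rng, level=0.95):
    """Estimate d and a percentile interval by resampling regression residuals."""
    intercept, d_hat = ols(xs, ys)
    resid = [y - (intercept + d_hat * x) for x, y in zip(xs, ys)]
    d_star = []
    for _ in range(n_boot):
        # Rebuild the response from fitted values plus residuals drawn with replacement.
        y_star = [intercept + d_hat * x + rng.choice(resid) for x in xs]
        d_star.append(ols(xs, y_star)[1])
    d_star.sort()
    lower = d_star[int((1 - level) / 2 * n_boot)]
    upper = d_star[int((1 + level) / 2 * n_boot) - 1]
    return d_hat, lower, upper

rng = random.Random(0)
series = [rng.gauss(0.0, 1.0) for _ in range(256)]  # white noise, so the true d is 0
xs, ys = log_periodogram_design(series, 16)
d_hat, lower, upper = residual_bootstrap_ci(xs, ys, 500, rng)
print("d_hat:", round(d_hat, 3), "95% interval:", (round(lower, 3), round(upper, 3)))
```

Resampling residuals independently treats the regression errors as exchangeable, which is the situation the abstract flags as inadequate when the series also shows weak dependence.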

Keywords: bootstrap, confidence interval, long memory, log periodogram regression

Downloads: 1365