Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 18436

Search results for: bootstrap method

18436 An Application of Modified M-out-of-N Bootstrap Method to Heavy-Tailed Distributions

Authors: Hannah F. Opayinka, Adedayo A. Adepoju

Abstract:

This study is an extension of a prior study on the modification of the existing m-out-of-n (moon) bootstrap method for heavy-tailed distributions in which modified m-out-of-n (mmoon) was proposed as an alternative method to the existing moon technique. In this study, both moon and mmoon techniques were applied to two real income datasets which followed Lognormal and Pareto distributions respectively with finite variances. The performances of these two techniques were compared using Standard Error (SE) and Root Mean Square Error (RMSE). The findings showed that mmoon outperformed moon bootstrap in terms of smaller SEs and RMSEs for all the sample sizes considered in the two datasets.

Keywords: Bootstrap, income data, lognormal distribution, Pareto distribution

Procedia PDF Downloads 152

18435 Approximate Confidence Interval for Effect Size Base on Bootstrap Resampling Method

Authors: S. Phanyaem

Abstract:

This paper presents the confidence intervals for the effect size base on bootstrap resampling method. The meta-analytic confidence interval for effect size is proposed that are easy to compute. A Monte Carlo simulation study was conducted to compare the performance of the proposed confidence intervals with the existing confidence intervals. The best confidence interval method will have a coverage probability close to 0.95. Simulation results have shown that our proposed confidence intervals perform well in terms of coverage probability and expected length.

Keywords: effect size, confidence interval, bootstrap method, resampling

Procedia PDF Downloads 564

18434 On the Bootstrap P-Value Method in Identifying out of Control Signals in Multivariate Control Chart

Authors: O. Ikpotokin

Abstract:

In any production process, every product is aimed to attain a certain standard, but the presence of assignable cause of variability affects our process, thereby leading to low quality of product. The ability to identify and remove this type of variability reduces its overall effect, thereby improving the quality of the product. In case of a univariate control chart signal, it is easy to detect the problem and give a solution since it is related to a single quality characteristic. However, the problems involved in the use of multivariate control chart are the violation of multivariate normal assumption and the difficulty in identifying the quality characteristic(s) that resulted in the out of control signals. The purpose of this paper is to examine the use of non-parametric control chart (the bootstrap approach) for obtaining control limit to overcome the problem of multivariate distributional assumption and the p-value method for detecting out of control signals. Results from a performance study show that the proposed bootstrap method enables the setting of control limit that can enhance the detection of out of control signals when compared, while the p-value method also enhanced in identifying out of control variables.

Keywords: bootstrap control limit, p-value method, out-of-control signals, p-value, quality characteristics

Procedia PDF Downloads 319

18433 Using the Bootstrap for Problems Statistics

Authors: Brahim Boukabcha, Amar Rebbouh

Abstract:

The bootstrap method based on the idea of exploiting all the information provided by the initial sample, allows us to study the properties of estimators. In this article we will present a theoretical study on the different methods of bootstrapping and using the technique of re-sampling in statistics inference to calculate the standard error of means of an estimator and determining a confidence interval for an estimated parameter. We apply these methods tested in the regression models and Pareto model, giving the best approximations.

Keywords: bootstrap, error standard, bias, jackknife, mean, median, variance, confidence interval, regression models

Procedia PDF Downloads 352

18432 The Contribution of Edgeworth, Bootstrap and Monte Carlo Methods in Financial Data

Authors: Edlira Donefski, Tina Donefski, Lorenc Ekonomi

Abstract:

Edgeworth Approximation, Bootstrap, and Monte Carlo Simulations have considerable impacts on achieving certain results related to different problems taken into study. In our paper, we have treated a financial case related to the effect that has the components of a cash-flow of one of the most successful businesses in the world, as the financial activity, operational activity, and investment activity to the cash and cash equivalents at the end of the three-months period. To have a better view of this case, we have created a vector autoregression model, and after that, we have generated the impulse responses in the terms of asymptotic analysis (Edgeworth Approximation), Monte Carlo Simulations, and residual bootstrap based on the standard errors of every series created. The generated results consisted of the common tendencies for the three methods applied that consequently verified the advantage of the three methods in the optimization of the model that contains many variants.

Keywords: autoregression, bootstrap, edgeworth expansion, Monte Carlo method

Procedia PDF Downloads 112

18431 Asymptotic Spectral Theory for Nonlinear Random Fields

Authors: Karima Kimouche

Abstract:

In this paper, we consider the asymptotic problems in spectral analysis of stationary causal random fields. We impose conditions only involving (conditional) moments, which are easily verifiable for a variety of nonlinear random fields. Limiting distributions of periodograms and smoothed periodogram spectral density estimates are obtained and applications to the spectral domain bootstrap are given.

Keywords: spatial nonlinear processes, spectral estimators, GMC condition, bootstrap method

Procedia PDF Downloads 421

18430 Reminiscence Therapy for Alzheimer’s Disease Restrained on Logistic Regression Based Linear Bootstrap Aggregating

Authors: P. S. Jagadeesh Kumar, Mingmin Pan, Xianpei Li, Yanmin Yuan, Tracy Lin Huan

Abstract:

Researchers are doing enchanting research into the inherited features of Alzheimer’s disease and probable consistent therapies. In Alzheimer’s, memories are extinct in reverse order; memories formed lately are more transitory than those from formerly. Reminiscence therapy includes the conversation of past actions, trials and knowledges with another individual or set of people, frequently with the help of perceptible reminders such as photos, household and other acquainted matters from the past, music and collection of tapes. In this manuscript, the competence of reminiscence therapy for Alzheimer’s disease is measured using logistic regression based linear bootstrap aggregating. Logistic regression is used to envisage the experiential features of the patient’s memory through various therapies. Linear bootstrap aggregating shows better stability and accuracy of reminiscence therapy used in statistical classification and regression of memories related to validation therapy, supportive psychotherapy, sensory integration and simulated presence therapy.

Keywords: Alzheimer’s disease, linear bootstrap aggregating, logistic regression, reminiscence therapy

Procedia PDF Downloads 272

18429 Subfamilial Relationships within Solanaceae as Inferred from atpB-rbcL Intergenic Spacer

Authors: Syeda Qamarunnisa, Ishrat Jamil, Abid Azhar, Zabta K. Shinwari, Syed Irtifaq Ali

Abstract:

A phylogenetic analysis of family Solanaceae was conducted using sequence data from the chloroplast intergenic atpB-rbcL spacer. Sequence data was generated from 17 species representing 09 out of 14 genera of Solanaceae from Pakistan. Cladogram was constructed using maximum parsimony method and results indicate that Solanaceae is mainly divided into two subfamilies; Solanoideae and Cestroideae. Four major clades within Solanoideae represent tribes; Physaleae, Capsiceae, Datureae and Solaneae are supported by high bootstrap value and the relationships among them are not corroborating with the previous studies. The findings established that subfamily Cestroideae comprised of three genera; Cestrum, Lycium, and Nicotiana with high bootstrap support. Position of Nicotiana inferred with atpB-rbcL sequence is congruent with traditional classification, which placed the taxa in Cestroideae. In the current study Lycium unexpectedly nested with Nicotiana with 100% bootstrap support and identified as a member of tribe Nicotianeae. Expanded sampling of other genera from Pakistan could be valuable towards improving our understanding of intrafamilial relationships within Solanaceae.

Keywords: systematics, solanaceae, phylogenetics, intergenic spacer, tribes

Procedia PDF Downloads 436

18428 The Profit Trend of Cosmetics Products Using Bootstrap Edgeworth Approximation

Authors: Edlira Donefski, Lorenc Ekonomi, Tina Donefski

Abstract:

Edgeworth approximation is one of the most important statistical methods that has a considered contribution in the reduction of the sum of standard deviation of the independent variables’ coefficients in a Quantile Regression Model. This model estimates the conditional median or other quantiles. In this paper, we have applied approximating statistical methods in an economical problem. We have created and generated a quantile regression model to see how the profit gained is connected with the realized sales of the cosmetic products in a real data, taken from a local business. The Linear Regression of the generated profit and the realized sales was not free of autocorrelation and heteroscedasticity, so this is the reason that we have used this model instead of Linear Regression. Our aim is to analyze in more details the relation between the variables taken into study: the profit and the finalized sales and how to minimize the standard errors of the independent variable involved in this study, the level of realized sales. The statistical methods that we have applied in our work are Edgeworth Approximation for Independent and Identical distributed (IID) cases, Bootstrap version of the Model and the Edgeworth approximation for Bootstrap Quantile Regression Model. The graphics and the results that we have presented here identify the best approximating model of our study.

Keywords: bootstrap, edgeworth approximation, IID, quantile

Procedia PDF Downloads 126

18427 Probability Sampling in Matched Case-Control Study in Drug Abuse

Authors: Surya R. Niraula, Devendra B Chhetry, Girish K. Singh, S. Nagesh, Frederick A. Connell

Abstract:

Background: Although random sampling is generally considered to be the gold standard for population-based research, the majority of drug abuse research is based on non-random sampling despite the well-known limitations of this kind of sampling. Method: We compared the statistical properties of two surveys of drug abuse in the same community: one using snowball sampling of drug users who then identified “friend controls” and the other using a random sample of non-drug users (controls) who then identified “friend cases.” Models to predict drug abuse based on risk factors were developed for each data set using conditional logistic regression. We compared the precision of each model using bootstrapping method and the predictive properties of each model using receiver operating characteristics (ROC) curves. Results: Analysis of 100 random bootstrap samples drawn from the snowball-sample data set showed a wide variation in the standard errors of the beta coefficients of the predictive model, none of which achieved statistical significance. One the other hand, bootstrap analysis of the random-sample data set showed less variation, and did not change the significance of the predictors at the 5% level when compared to the non-bootstrap analysis. Comparison of the area under the ROC curves using the model derived from the random-sample data set was similar when fitted to either data set (0.93, for random-sample data vs. 0.91 for snowball-sample data, p=0.35); however, when the model derived from the snowball-sample data set was fitted to each of the data sets, the areas under the curve were significantly different (0.98 vs. 0.83, p < .001). Conclusion: The proposed method of random sampling of controls appears to be superior from a statistical perspective to snowball sampling and may represent a viable alternative to snowball sampling.

Keywords: drug abuse, matched case-control study, non-probability sampling, probability sampling

Procedia PDF Downloads 465

18426 Finite Sample Inferences for Weak Instrument Models

Authors: Gubhinder Kundhi, Paul Rilstone

Abstract:

It is well established that Instrumental Variable (IV) estimators in the presence of weak instruments can be poorly behaved, in particular, be quite biased in finite samples. Finite sample approximations to the distributions of these estimators are obtained using Edgeworth and Saddlepoint expansions. Departures from normality of the distributions of these estimators are analyzed using higher order analytical corrections in these expansions. In a Monte-Carlo experiment, the performance of these expansions is compared to the first order approximation and other methods commonly used in finite samples such as the bootstrap.

Keywords: bootstrap, Instrumental Variable, Edgeworth expansions, Saddlepoint expansions

Procedia PDF Downloads 282

18425 Confidence Intervals for Quantiles in the Two-Parameter Exponential Distributions with Type II Censored Data

Authors: Ayman Baklizi

Abstract:

Based on type II censored data, we consider interval estimation of the quantiles of the two-parameter exponential distribution and the difference between the quantiles of two independent two-parameter exponential distributions. We derive asymptotic intervals, Bayesian, as well as intervals based on the generalized pivot variable. We also include some bootstrap intervals in our comparisons. The performance of these intervals is investigated in terms of their coverage probabilities and expected lengths.

Keywords: asymptotic intervals, Bayes intervals, bootstrap, generalized pivot variables, two-parameter exponential distribution, quantiles

Procedia PDF Downloads 382

18424 Efficiency, Effectiveness, and Technological Change in Armed Forces: Indonesian Case

Authors: Citra Pertiwi, Muhammad Fikruzzaman Rahawarin

Abstract:

Government of Indonesia had committed to increasing its national defense the budget up to 1,5 percent of GDP. However, the budget increase does not necessarily allocate efficiently and effectively. Using Data Envelopment Analysis (DEA), the operational units of Indonesian Armed Forces are considered as a proxy to measure those two aspects. The bootstrap technique is being used as well to reduce uncertainty in the estimation. Additionally, technological change is being measured as a nonstationary component. Nearly half of the units are being estimated as fully efficient, with less than a third is considered as effective. Longer and larger sets of data might increase the robustness of the estimation in the future.

Keywords: bootstrap, effectiveness, efficiency, DEA, military, Malmquist, technological change

Procedia PDF Downloads 276

18423 The Classification Performance in Parametric and Nonparametric Discriminant Analysis for a Class- Unbalanced Data of Diabetes Risk Groups

Authors: Lily Ingsrisawang, Tasanee Nacharoen

Abstract:

Introduction: The problems of unbalanced data sets generally appear in real world applications. Due to unequal class distribution, many research papers found that the performance of existing classifier tends to be biased towards the majority class. The k -nearest neighbors’ nonparametric discriminant analysis is one method that was proposed for classifying unbalanced classes with good performance. Hence, the methods of discriminant analysis are of interest to us in investigating misclassification error rates for class-imbalanced data of three diabetes risk groups. Objective: The purpose of this study was to compare the classification performance between parametric discriminant analysis and nonparametric discriminant analysis in a three-class classification application of class-imbalanced data of diabetes risk groups. Methods: Data from a healthy project for 599 staffs in a government hospital in Bangkok were obtained for the classification problem. The staffs were diagnosed into one of three diabetes risk groups: non-risk (90%), risk (5%), and diabetic (5%). The original data along with the variables; diabetes risk group, age, gender, cholesterol, and BMI was analyzed and bootstrapped up to 50 and 100 samples, 599 observations per sample, for additional estimation of misclassification error rate. Each data set was explored for the departure of multivariate normality and the equality of covariance matrices of the three risk groups. Both the original data and the bootstrap samples show non-normality and unequal covariance matrices. The parametric linear discriminant function, quadratic discriminant function, and the nonparametric k-nearest neighbors’ discriminant function were performed over 50 and 100 bootstrap samples and applied to the original data. In finding the optimal classification rule, the choices of prior probabilities were set up for both equal proportions (0.33: 0.33: 0.33) and unequal proportions with three choices of (0.90:0.05:0.05), (0.80: 0.10: 0.10) or (0.70, 0.15, 0.15). Results: The results from 50 and 100 bootstrap samples indicated that the k-nearest neighbors approach when k = 3 or k = 4 and the prior probabilities of {non-risk:risk:diabetic} as {0.90:0.05:0.05} or {0.80:0.10:0.10} gave the smallest error rate of misclassification. Conclusion: The k-nearest neighbors approach would be suggested for classifying a three-class-imbalanced data of diabetes risk groups.

Keywords: error rate, bootstrap, diabetes risk groups, k-nearest neighbors

Procedia PDF Downloads 406

18422 Confidence Envelopes for Parametric Model Selection Inference and Post-Model Selection Inference

Authors: I. M. L. Nadeesha Jayaweera, Adao Alex Trindade

Abstract:

In choosing a candidate model in likelihood-based modeling via an information criterion, the practitioner is often faced with the difficult task of deciding just how far up the ranked list to look. Motivated by this pragmatic necessity, we construct an uncertainty band for a generalized (model selection) information criterion (GIC), defined as a criterion for which the limit in probability is identical to that of the normalized log-likelihood. This includes common special cases such as AIC & BIC. The method starts from the asymptotic normality of the GIC for the joint distribution of the candidate models in an independent and identically distributed (IID) data framework and proceeds by deriving the (asymptotically) exact distribution of the minimum. The calculation of an upper quantile for its distribution then involves the computation of multivariate Gaussian integrals, which is amenable to efficient implementation via the R package "mvtnorm". The performance of the methodology is tested on simulated data by checking the coverage probability of nominal upper quantiles and compared to the bootstrap. Both methods give coverages close to nominal for large samples, but the bootstrap is two orders of magnitude slower. The methodology is subsequently extended to two other commonly used model structures: regression and time series. In the regression case, we derive the corresponding asymptotically exact distribution of the minimum GIC invoking Lindeberg-Feller type conditions for triangular arrays and are thus able to similarly calculate upper quantiles for its distribution via multivariate Gaussian integration. The bootstrap once again provides a default competing procedure, and we find that similar comparison performance metrics hold as for the IID case. The time series case is complicated by far more intricate asymptotic regime for the joint distribution of the model GIC statistics. Under a Gaussian likelihood, the default in most packages, one needs to derive the limiting distribution of a normalized quadratic form for a realization from a stationary series. Under conditions on the process satisfied by ARMA models, a multivariate normal limit is once again achieved. The bootstrap can, however, be employed for its computation, whence we are once again in the multivariate Gaussian integration paradigm for upper quantile evaluation. Comparisons of this bootstrap-aided semi-exact method with the full-blown bootstrap once again reveal a similar performance but faster computation speeds. One of the most difficult problems in contemporary statistical methodological research is to be able to account for the extra variability introduced by model selection uncertainty, the so-called post-model selection inference (PMSI). We explore ways in which the GIC uncertainty band can be inverted to make inferences on the parameters. This is being attempted in the IID case by pivoting the CDF of the asymptotically exact distribution of the minimum GIC. For inference one parameter at a time and a small number of candidate models, this works well, whence the attained PMSI confidence intervals are wider than the MLE-based Wald, as expected.

Keywords: model selection inference, generalized information criteria, post model selection, Asymptotic Theory

Procedia PDF Downloads 60

18421 An Aspiring Solution to the Man in the Middle Bootstrap Vulnerability

Authors: Mouad Zouina, Benaceur Outtaj

Abstract:

The proposed work falls within the context of improving data security for m-commerce systems. In this context we have placed under the light some flaws encountered in HTTPS the most used m-commerce protocol, particularly the man in the middle attack, shortly MITM. The man in the middle attack is an active listening attack. The idea of this attack is to target the handshake phase of the HTTPS protocol which is the transition from a non-secure connection to a secure connection in our case HTTP to HTTPS. This paper proposes a solution to fix those flaws based on the upgrade of HSTS standard handshake sequence using the DNSSEC standard.

Keywords: m-commerce, HTTPS, HSTS, DNSSEC, MITM bootstrap vulnerability

Procedia PDF Downloads 368

18420 Traffic Sign Recognition System Using Convolutional Neural NetworkDevineni

Authors: Devineni Vijay Bhaskar, Yendluri Raja

Abstract:

We recommend a model for traffic sign detection stranded on Convolutional Neural Networks (CNN). We first renovate the unique image into the gray scale image through with support vector machines, then use convolutional neural networks with fixed and learnable layers for revealing and understanding. The permanent layer can reduction the amount of attention areas to notice and crop the limits very close to the boundaries of traffic signs. The learnable coverings can rise the accuracy of detection significantly. Besides, we use bootstrap procedures to progress the accuracy and avoid overfitting problem. In the German Traffic Sign Detection Benchmark, we obtained modest results, with an area under the precision-recall curve (AUC) of 99.49% in the group “Risk”, and an AUC of 96.62% in the group “Obligatory”.

Keywords: convolutional neural network, support vector machine, detection, traffic signs, bootstrap procedures, precision-recall curve

Procedia PDF Downloads 84

18419 Evaluation of Transfer Capability Considering Uncertainties of System Operating Condition and System Cascading Collapse

Authors: Nur Ashida Salim, Muhammad Murtadha Othman, Ismail Musirin, Mohd Salleh Serwan

Abstract:

Over the past few decades, the power system industry in many developing and developed countries has gone through a restructuring process of the industry where they are moving towards a deregulated power industry. This situation will lead to competition among the generation and distribution companies to achieve a certain objective which is to provide quality and efficient production of electric energy, which will reduce the price of electricity. Therefore it is important to obtain an accurate value of the Available Transfer Capability (ATC) and Transmission Reliability Margin (TRM) in order to ensure the effective power transfer between areas during the occurrence of uncertainties in the system. In this paper, the TRM and ATC is determined by taking into consideration the uncertainties of the system operating condition and system cascading collapse by applying the bootstrap technique. A case study of the IEEE RTS-79 is employed to verify the robustness of the technique proposed in the determination of TRM and ATC.

Keywords: available transfer capability, bootstrap technique, cascading collapse, transmission reliability margin

Procedia PDF Downloads 372

18418 Statistical Time-Series and Neural Architecture of Malaria Patients Records in Lagos, Nigeria

Authors: Akinbo Razak Yinka, Adesanya Kehinde Kazeem, Oladokun Oluwagbenga Peter

Abstract:

Time series data are sequences of observations collected over a period of time. Such data can be used to predict health outcomes, such as disease progression, mortality, hospitalization, etc. The Statistical approach is based on mathematical models that capture the patterns and trends of the data, such as autocorrelation, seasonality, and noise, while Neural methods are based on artificial neural networks, which are computational models that mimic the structure and function of biological neurons. This paper compared both parametric and non-parametric time series models of patients treated for malaria in Maternal and Child Health Centres in Lagos State, Nigeria. The forecast methods considered linear regression, Integrated Moving Average, ARIMA and SARIMA Modeling for the parametric approach, while Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM) Network were used for the non-parametric model. The performance of each method is evaluated using the Mean Absolute Error (MAE), R-squared (R2) and Root Mean Square Error (RMSE) as criteria to determine the accuracy of each model. The study revealed that the best performance in terms of error was found in MLP, followed by the LSTM and ARIMA models. In addition, the Bootstrap Aggregating technique was used to make robust forecasts when there are uncertainties in the data.

Keywords: ARIMA, bootstrap aggregation, MLP, LSTM, SARIMA, time-series analysis

Procedia PDF Downloads 31

18417 A Heteroskedasticity Robust Test for Contemporaneous Correlation in Dynamic Panel Data Models

Authors: Andreea Halunga, Chris D. Orme, Takashi Yamagata

Abstract:

This paper proposes a heteroskedasticity-robust Breusch-Pagan test of the null hypothesis of zero cross-section (or contemporaneous) correlation in linear panel-data models, without necessarily assuming independence of the cross-sections. The procedure allows for either fixed, strictly exogenous and/or lagged dependent regressor variables, as well as quite general forms of both non-normality and heteroskedasticity in the error distribution. The asymptotic validity of the test procedure is predicated on the number of time series observations, T, being large relative to the number of cross-section units, N, in that: (i) either N is fixed as T→∞; or, (ii) N²/T→0, as both T and N diverge, jointly, to infinity. Given this, it is not expected that asymptotic theory would provide an adequate guide to finite sample performance when T/N is "small". Because of this, we also propose and establish asymptotic validity of, a number of wild bootstrap schemes designed to provide improved inference when T/N is small. Across a variety of experimental designs, a Monte Carlo study suggests that the predictions from asymptotic theory do, in fact, provide a good guide to the finite sample behaviour of the test when T is large relative to N. However, when T and N are of similar orders of magnitude, discrepancies between the nominal and empirical significance levels occur as predicted by the first-order asymptotic analysis. On the other hand, for all the experimental designs, the proposed wild bootstrap approximations do improve agreement between nominal and empirical significance levels, when T/N is small, with a recursive-design wild bootstrap scheme performing best, in general, and providing quite close agreement between the nominal and empirical significance levels of the test even when T and N are of similar size. Moreover, in comparison with the wild bootstrap "version" of the original Breusch-Pagan test our experiments indicate that the corresponding version of the heteroskedasticity-robust Breusch-Pagan test appears reliable. As an illustration, the proposed tests are applied to a dynamic growth model for a panel of 20 OECD countries.

Keywords: cross-section correlation, time-series heteroskedasticity, dynamic panel data, heteroskedasticity robust Breusch-Pagan test

Procedia PDF Downloads 403

18416 Emotion Dysregulation as Mediator between Child Abuse and Opiate Use Motives

Authors: Usha Barahmand, Ali Khazaee, Goudarz Sadeghi Hashjin

Abstract:

Coping motives are considered to be indicators of problematic substance use. The present investigation examined a model with emotional abuse as an antecedent and emotional dysregulation as a mediator leading to substance use. The intent of this study was to examine the associations between various types of childhood maltreatment and motives for substance use. The sample consisted of 72 male opiate users recruited from those enrolled for Methadone Maintenance treatment. Participants responded to measures of childhood maltreatment, emotion dysregulation, and motives for opiate use. All data were analyzed using Pearson's correlation coefficients and bootstrap analysis of mediation. Results supported the hypothesis that the experience of emotional abuse in childhood is associated with problems in regulating emotions which in turn correlates with opiate use as a way to cope with negative affect, to enhance positive effect or to obtain social rewards. Bootstrap analysis confirmed the mediating role of emotion dysregulation. Findings support the potential utility of further research into emotion dysregulation and motives as antecedents of problematic opiate use.

Keywords: childhood abuse, emotion dysregulation, motives, substance use

Procedia PDF Downloads 419

18415 Consistent Testing for an Implication of Supermodular Dominance with an Application to Verifying the Effect of Geographic Knowledge Spillover

Authors: Chung Danbi, Linton Oliver, Whang Yoon-Jae

Abstract:

Supermodularity, or complementarity, is a popular concept in economics which can characterize many objective functions such as utility, social welfare, and production functions. Further, supermodular dominance captures a preference for greater interdependence among inputs of those functions, and it can be applied to examine which input set would produce higher expected utility, social welfare, or production. Therefore, we propose and justify a consistent testing for a useful implication of supermodular dominance. We also conduct Monte Carlo simulations to explore the finite sample performance of our test, with critical values obtained from the recentered bootstrap method, with and without the selective recentering, and the subsampling method. Under various parameter settings, we confirmed that our test has reasonably good size and power performance. Finally, we apply our test to compare the geographic and distant knowledge spillover in terms of their effects on social welfare using the National Bureau of Economic Research (NBER) patent data. We expect localized citing to supermodularly dominate distant citing if the geographic knowledge spillover engenders greater social welfare than distant knowledge spillover. Taking subgroups based on firm and patent characteristics, we found that there is industry-wise and patent subclass-wise difference in the pattern of supermodular dominance between localized and distant citing. We also compare the results from analyzing different time periods to see if the development of Internet and communication technology has changed the pattern of the dominance. In addition, to appropriately deal with the sparse nature of the data, we apply high-dimensional methods to efficiently select relevant data.

Keywords: supermodularity, supermodular dominance, stochastic dominance, Monte Carlo simulation, bootstrap, subsampling

Procedia PDF Downloads 89

18414 Cross-Validation of the Data Obtained for ω-6 Linoleic and ω-3 α-Linolenic Acids Concentration of Hemp Oil Using Jackknife and Bootstrap Resampling

Authors: Vibha Devi, Shabina Khanam

Abstract:

Hemp (Cannabis sativa) possesses a rich content of ω-6 linoleic and ω-3 linolenic essential fatty acid in the ratio of 3:1, which is a rare and most desired ratio that enhances the quality of hemp oil. These components are beneficial for the development of cell and body growth, strengthen the immune system, possess anti-inflammatory action, lowering the risk of heart problem owing to its anti-clotting property and a remedy for arthritis and various disorders. The present study employs supercritical fluid extraction (SFE) approach on hemp seed at various conditions of parameters; temperature (40 - 80) °C, pressure (200 - 350) bar, flow rate (5 - 15) g/min, particle size (0.430 - 1.015) mm and amount of co-solvent (0 - 10) % of solvent flow rate through central composite design (CCD). CCD suggested 32 sets of experiments, which was carried out. As SFE process includes large number of variables, the present study recommends the application of resampling techniques for cross-validation of the obtained data. Cross-validation refits the model on each data to achieve the information regarding the error, variability, deviation etc. Bootstrap and jackknife are the most popular resampling techniques, which create a large number of data through resampling from the original dataset and analyze these data to check the validity of the obtained data. Jackknife resampling is based on the eliminating one observation from the original sample of size N without replacement. For jackknife resampling, the sample size is 31 (eliminating one observation), which is repeated by 32 times. Bootstrap is the frequently used statistical approach for estimating the sampling distribution of an estimator by resampling with replacement from the original sample. For bootstrap resampling, the sample size is 32, which was repeated by 100 times. Estimands for these resampling techniques are considered as mean, standard deviation, variation coefficient and standard error of the mean. For ω-6 linoleic acid concentration, mean value was approx. 58.5 for both resampling methods, which is the average (central value) of the sample mean of all data points. Similarly, for ω-3 linoleic acid concentration, mean was observed as 22.5 through both resampling. Variance exhibits the spread out of the data from its mean. Greater value of variance exhibits the large range of output data, which is 18 for ω-6 linoleic acid (ranging from 48.85 to 63.66 %) and 6 for ω-3 linoleic acid (ranging from 16.71 to 26.2 %). Further, low value of standard deviation (approx. 1 %), low standard error of the mean (< 0.8) and low variance coefficient (< 0.2) reflect the accuracy of the sample for prediction. All the estimator value of variance coefficients, standard deviation and standard error of the mean are found within the 95 % of confidence interval.

Keywords: resampling, supercritical fluid extraction, hemp oil, cross-validation

Procedia PDF Downloads 119

18413 Machine Learning Analysis of Eating Disorders Risk, Physical Activity and Psychological Factors in Adolescents: A Community Sample Study

Authors: Marc Toutain, Pascale Leconte, Antoine Gauthier

Abstract:

Introduction: Eating Disorders (ED), such as anorexia, bulimia, and binge eating, are psychiatric illnesses that mostly affect young people. The main symptoms concern eating (restriction, excessive food intake) and weight control behaviors (laxatives, vomiting). Psychological comorbidities (depression, executive function disorders, etc.) and problematic behaviors toward physical activity (PA) are commonly associated with ED. Acquaintances on ED risk factors are still lacking, and more community sample studies are needed to improve prevention and early detection. To our knowledge, studies are needed to specifically investigate the link between ED risk level, PA, and psychological risk factors in a community sample of adolescents. The aim of this study is to assess the relation between ED risk level, exercise (type, frequency, and motivations for engaging in exercise), and psychological factors based on the Jacobi risk factors model. We suppose that a high risk of ED will be associated with the practice of high caloric cost PA, motivations oriented to weight and shape control, and psychological disturbances. Method: An online survey destined for students has been sent to several middle schools and colleges in northwest France. This survey combined several questionnaires, the Eating Attitude Test-26 assessing ED risk; the Exercise Motivation Inventory–2 assessing motivations toward PA; the Hospital Anxiety and Depression Scale assessing anxiety and depression, the Contour Drawing Rating Scale; and the Body Esteem Scale assessing body dissatisfaction, Rosenberg Self-esteem Scale assessing self-esteem, the Exercise Dependence Scale-Revised assessing PA dependence, the Multidimensional Assessment of Interoceptive Awareness assessing interoceptive awareness and the Frost Multidimensional Perfectionism Scale assessing perfectionism. Machine learning analysis will be performed in order to constitute groups with a tree-based model clustering method, extract risk profile(s) with a bootstrap method comparison, and predict ED risk with a prediction method based on a decision tree-based model. Expected results: 1044 complete records have already been collected, and the survey will be closed at the end of May 2022. Records will be analyzed with a clustering method and a bootstrap method in order to reveal risk profile(s). Furthermore, a predictive tree decision method will be done to extract an accurate predictive model of ED risk. This analysis will confirm typical main risk factors and will give more data on presumed strong risk factors such as exercise motivations and interoceptive deficit. Furthermore, it will enlighten particular risk profiles with a strong level of proof and greatly contribute to improving the early detection of ED and contribute to a better understanding of ED risk factors.

Keywords: eating disorders, risk factors, physical activity, machine learning

Procedia PDF Downloads 54

18412 Refined Procedures for Second Order Asymptotic Theory

Authors: Gubhinder Kundhi, Paul Rilstone

Abstract:

Refined procedures for higher-order asymptotic theory for non-linear models are developed. These include a new method for deriving stochastic expansions of arbitrary order, new methods for evaluating the moments of polynomials of sample averages, a new method for deriving the approximate moments of the stochastic expansions; an application of these techniques to gather improved inferences with the weak instruments problem is considered. It is well established that Instrumental Variable (IV) estimators in the presence of weak instruments can be poorly behaved, in particular, be quite biased in finite samples. In our application, finite sample approximations to the distributions of these estimators are obtained using Edgeworth and Saddlepoint expansions. Departures from normality of the distributions of these estimators are analyzed using higher order analytical corrections in these expansions. In a Monte-Carlo experiment, the performance of these expansions is compared to the first order approximation and other methods commonly used in finite samples such as the bootstrap.

Keywords: edgeworth expansions, higher order asymptotics, saddlepoint expansions, weak instruments

Procedia PDF Downloads 249

18411 Determining Optimal Number of Trees in Random Forests

Authors: Songul Cinaroglu

Abstract:

Background: Random Forest is an efficient, multi-class machine learning method using for classification, regression and other tasks. This method is operating by constructing each tree using different bootstrap sample of the data. Determining the number of trees in random forests is an open question in the literature for studies about improving classification performance of random forests. Aim: The aim of this study is to analyze whether there is an optimal number of trees in Random Forests and how performance of Random Forests differ according to increase in number of trees using sample health data sets in R programme. Method: In this study we analyzed the performance of Random Forests as the number of trees grows and doubling the number of trees at every iteration using “random forest” package in R programme. For determining minimum and optimal number of trees we performed Mc Nemar test and Area Under ROC Curve respectively. Results: At the end of the analysis it was found that as the number of trees grows, it does not always means that the performance of the forest is better than forests which have fever trees. In other words larger number of trees only increases computational costs but not increases performance results. Conclusion: Despite general practice in using random forests is to generate large number of trees for having high performance results, this study shows that increasing number of trees doesn’t always improves performance. Future studies can compare different kinds of data sets and different performance measures to test whether Random Forest performance results change as number of trees increase or not.

Keywords: classification methods, decision trees, number of trees, random forest

Procedia PDF Downloads 370

18410 Two-Stage Hospital Efficiency Analysis Including Qualitative Evidence: A Greek Case

Authors: Panos Xenos, Milton Nektarios, John Yfantopoulos

Abstract:

Background: Policy makers, professional organizations and payers have introduced a variety of initiatives and reforms for the health systems worldwide, aimed at improving hospital efficiency. Their efforts are concentrated in two main categories: to constrain increasing healthcare costs and to enhance quality of services provided. Research Objectives: This study examines the efficiency of 112 Greek public hospitals for the year 2009, evaluates the importance of bootstrapping techniques and investigates the effect of contextual factors on hospital efficiency. Furthermore, the effect of qualitative evidence, on hospital efficiency is explored using data from 28 large hospitals. Methods: We applied Data Envelopment Analysis, augmented by bootstrapping techniques, to estimate efficiency scores. In order to measure the effect of environmental factors on hospital efficiency we used Tobit regression analysis. The significance of our models is evaluated using statistical tests to compare distributions. Results: The Kolmogorov-Smirnov test between the original and the bootstrap-corrected efficiency indicates that their distributions are significantly different (p-value<0.01). The environmental factors, that seem to influence efficiency, are Occupancy Rating and the ratio between Outpatient Visits and Inpatient Days. Results indicate that the inclusion of the quality variable in DEA modelling generates statistically significant variations in efficiency scores (p-value<0.05). Conclusions: The inclusion of quality variables and the use of bootstrap resampling in efficiency analysis impose a statistically significant effect on the distribution of efficiency scores. As a policy conclusion we highlight the importance of these methods on hospital efficiency analysis and, by implication, on healthcare resource allocation.

Keywords: hospitals, efficiency, quality, data envelopment analysis, Greek public hospital sector

Procedia PDF Downloads 276

18409 The Impact of Artificial Intelligence on Qualty Conrol and Quality

Authors: Mary Moner Botros Fanawel

Abstract:

Many companies use the statistical tool named as statistical quality control, and which can have a high cost for the companies interested on these statistical tools. The evaluation of the quality of products and services is an important topic, but the reduction of the cost of the implantation of the statistical quality control also has important benefits for the companies. For this reason, it is important to implement a economic design for the various steps included into the statistical quality control. In this paper, we describe some relevant aspects related to the economic design of a quality control chart for the proportion of defective items. They are very important because the suggested issues can reduce the cost of implementing a quality control chart for the proportion of defective items. Note that the main purpose of this chart is to evaluate and control the proportion of defective items of a production process.

Keywords: model predictive control, hierarchical control structure, genetic algorithm, water quality with DBPs objectives proportion, type I error, economic plan, distribution function bootstrap control limit, p-value method, out-of-control signals, p-value, quality characteristics

Procedia PDF Downloads 10

18408 Re-Stating the Origin of Tetrapod Using Measures of Phylogenetic Support for Phylogenomic Data

Authors: Yunfeng Shan, Xiaoliang Wang, Youjun Zhou

Abstract:

Whole-genome data from two lungfish species, along with other species, present a valuable opportunity to re-investigate the longstanding debate regarding the evolutionary relationships among tetrapods, lungfishes, and coelacanths. However, the use of bootstrap support has become outdated for large-scale phylogenomic data. Without robust phylogenetic support, the phylogenetic trees become meaningless. Therefore, it is necessary to re-evaluate the phylogenies of tetrapods, lungfishes, and coelacanths using novel measures of phylogenetic support specifically designed for phylogenomic data, as the previous phylogenies were based on 100% bootstrap support. Our findings consistently provide strong evidence favoring lungfish as the closest living relative of tetrapods. This conclusion is based on high internode certainty, relative gene support, and high gene concordance factor. The evidence stems from five previous datasets derived from lungfish transcriptomes. These results yield fresh insights into the three hypotheses regarding the phylogenies of tetrapods, lungfishes, and coelacanths. Importantly, these hypotheses are not mere conjectures but are substantiated by a significant number of genes. Analyzing real biological data further demonstrates that the inclusion of additional taxa leads to more diverse tree topologies. Consequently, gene trees and species trees may not be identical even when whole-genome sequencing data is utilized. However, it is worth noting that many gene trees can accurately reflect the species tree if an appropriate number of taxa, typically ranging from six to ten, are sampled. Therefore, it is crucial to carefully select the number of taxa and an appropriate outgroup, such as slow-evolving species, while excluding fast-evolving taxa as outgroups to mitigate the adverse effects of long-branch attraction and achieve an accurate reconstruction of the species tree. This is particularly important as more whole-genome sequencing data becomes available.

Keywords: novel measures of phylogenetic support for phylogenomic data, gene concordance factor confidence, relative gene support, internode certainty, origin of tetrapods

Procedia PDF Downloads 27

18407 Measures of Phylogenetic Support for Phylogenomic and the Whole Genomes of Two Lungfish Restate Lungfish and Origin of Land Vertebrates

Authors: Yunfeng Shan, Xiaoliang Wang, Youjun Zhou

Abstract:

Whole-genome data from two lungfish species, along with other species, present a valuable opportunity to reassess the longstanding debate regarding the evolutionary relationships among tetrapods, lungfishes, and coelacanths. However, the use of bootstrap support has become outdated for large-scale phylogenomic data. Without robust phylogenetic support, the phylogenetic trees become meaningless. Therefore, it is necessary to re-evaluate the phylogenies of tetrapods, lungfishes, and coelacanths using novel measures of phylogenetic support specifically designed for phylogenomic data, as the previous phylogenies were based on 100% bootstrap support. Our findings consistently provide strong evidence favoring lungfish as the closest living relative of tetrapods. This conclusion is based on high gene support confidence with confidence intervals exceeding 95%, high internode certainty, and high gene concordance factor. The evidence stems from two datasets containing recently deciphered whole genomes of two lungfish species, as well as five previous datasets derived from lungfish transcriptomes. These results yield fresh insights into the three hypotheses regarding the phylogenies of tetrapods, lungfishes, and coelacanths. Importantly, these hypotheses are not mere conjectures but are substantiated by a significant number of genes. Analyzing real biological data further demonstrates that the inclusion of additional taxa diminishes the number of orthologues and leads to more diverse tree topologies. Consequently, gene trees and species trees may not be identical even when whole-genome sequencing data is utilized. However, it is worth noting that many gene trees can accurately reflect the species tree if an appropriate number of taxa, typically ranging from six to ten, are sampled. Therefore, it is crucial to carefully select the number of taxa and an appropriate outgroup while excluding fast-evolving taxa as outgroups to mitigate the adverse effects of long-branch attraction (LBA) and achieve an accurate reconstruction of the species tree. This is particularly important as more whole-genome sequencing data becomes available.

Keywords: gene support confidence (GSC), origin of land vertebrates, coelacanth, two whole genomes of lungfishes, confidence intervals

Procedia PDF Downloads 47