Search results for: bootstrap analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 26750

Search results for: bootstrap analysis

26750 The Contribution of Edgeworth, Bootstrap and Monte Carlo Methods in Financial Data

Authors: Edlira Donefski, Tina Donefski, Lorenc Ekonomi

Abstract:

Edgeworth Approximation, Bootstrap, and Monte Carlo Simulations have considerable impacts on achieving certain results related to different problems taken into study. In our paper, we have treated a financial case related to the effect that has the components of a cash-flow of one of the most successful businesses in the world, as the financial activity, operational activity, and investment activity to the cash and cash equivalents at the end of the three-months period. To have a better view of this case, we have created a vector autoregression model, and after that, we have generated the impulse responses in the terms of asymptotic analysis (Edgeworth Approximation), Monte Carlo Simulations, and residual bootstrap based on the standard errors of every series created. The generated results consisted of the common tendencies for the three methods applied that consequently verified the advantage of the three methods in the optimization of the model that contains many variants.

Keywords: autoregression, bootstrap, edgeworth expansion, Monte Carlo method

Procedia PDF Downloads 112
26749 An Application of Modified M-out-of-N Bootstrap Method to Heavy-Tailed Distributions

Authors: Hannah F. Opayinka, Adedayo A. Adepoju

Abstract:

This study is an extension of a prior study on the modification of the existing m-out-of-n (moon) bootstrap method for heavy-tailed distributions in which modified m-out-of-n (mmoon) was proposed as an alternative method to the existing moon technique. In this study, both moon and mmoon techniques were applied to two real income datasets which followed Lognormal and Pareto distributions respectively with finite variances. The performances of these two techniques were compared using Standard Error (SE) and Root Mean Square Error (RMSE). The findings showed that mmoon outperformed moon bootstrap in terms of smaller SEs and RMSEs for all the sample sizes considered in the two datasets.

Keywords: Bootstrap, income data, lognormal distribution, Pareto distribution

Procedia PDF Downloads 150
26748 Using the Bootstrap for Problems Statistics

Authors: Brahim Boukabcha, Amar Rebbouh

Abstract:

The bootstrap method based on the idea of exploiting all the information provided by the initial sample, allows us to study the properties of estimators. In this article we will present a theoretical study on the different methods of bootstrapping and using the technique of re-sampling in statistics inference to calculate the standard error of means of an estimator and determining a confidence interval for an estimated parameter. We apply these methods tested in the regression models and Pareto model, giving the best approximations.

Keywords: bootstrap, error standard, bias, jackknife, mean, median, variance, confidence interval, regression models

Procedia PDF Downloads 351
26747 Asymptotic Spectral Theory for Nonlinear Random Fields

Authors: Karima Kimouche

Abstract:

In this paper, we consider the asymptotic problems in spectral analysis of stationary causal random fields. We impose conditions only involving (conditional) moments, which are easily verifiable for a variety of nonlinear random fields. Limiting distributions of periodograms and smoothed periodogram spectral density estimates are obtained and applications to the spectral domain bootstrap are given.

Keywords: spatial nonlinear processes, spectral estimators, GMC condition, bootstrap method

Procedia PDF Downloads 421
26746 On the Bootstrap P-Value Method in Identifying out of Control Signals in Multivariate Control Chart

Authors: O. Ikpotokin

Abstract:

In any production process, every product is aimed to attain a certain standard, but the presence of assignable cause of variability affects our process, thereby leading to low quality of product. The ability to identify and remove this type of variability reduces its overall effect, thereby improving the quality of the product. In case of a univariate control chart signal, it is easy to detect the problem and give a solution since it is related to a single quality characteristic. However, the problems involved in the use of multivariate control chart are the violation of multivariate normal assumption and the difficulty in identifying the quality characteristic(s) that resulted in the out of control signals. The purpose of this paper is to examine the use of non-parametric control chart (the bootstrap approach) for obtaining control limit to overcome the problem of multivariate distributional assumption and the p-value method for detecting out of control signals. Results from a performance study show that the proposed bootstrap method enables the setting of control limit that can enhance the detection of out of control signals when compared, while the p-value method also enhanced in identifying out of control variables.

Keywords: bootstrap control limit, p-value method, out-of-control signals, p-value, quality characteristics

Procedia PDF Downloads 317
26745 Reminiscence Therapy for Alzheimer’s Disease Restrained on Logistic Regression Based Linear Bootstrap Aggregating

Authors: P. S. Jagadeesh Kumar, Mingmin Pan, Xianpei Li, Yanmin Yuan, Tracy Lin Huan

Abstract:

Researchers are doing enchanting research into the inherited features of Alzheimer’s disease and probable consistent therapies. In Alzheimer’s, memories are extinct in reverse order; memories formed lately are more transitory than those from formerly. Reminiscence therapy includes the conversation of past actions, trials and knowledges with another individual or set of people, frequently with the help of perceptible reminders such as photos, household and other acquainted matters from the past, music and collection of tapes. In this manuscript, the competence of reminiscence therapy for Alzheimer’s disease is measured using logistic regression based linear bootstrap aggregating. Logistic regression is used to envisage the experiential features of the patient’s memory through various therapies. Linear bootstrap aggregating shows better stability and accuracy of reminiscence therapy used in statistical classification and regression of memories related to validation therapy, supportive psychotherapy, sensory integration and simulated presence therapy.

Keywords: Alzheimer’s disease, linear bootstrap aggregating, logistic regression, reminiscence therapy

Procedia PDF Downloads 272
26744 Approximate Confidence Interval for Effect Size Base on Bootstrap Resampling Method

Authors: S. Phanyaem

Abstract:

This paper presents the confidence intervals for the effect size base on bootstrap resampling method. The meta-analytic confidence interval for effect size is proposed that are easy to compute. A Monte Carlo simulation study was conducted to compare the performance of the proposed confidence intervals with the existing confidence intervals. The best confidence interval method will have a coverage probability close to 0.95. Simulation results have shown that our proposed confidence intervals perform well in terms of coverage probability and expected length.

Keywords: effect size, confidence interval, bootstrap method, resampling

Procedia PDF Downloads 564
26743 The Profit Trend of Cosmetics Products Using Bootstrap Edgeworth Approximation

Authors: Edlira Donefski, Lorenc Ekonomi, Tina Donefski

Abstract:

Edgeworth approximation is one of the most important statistical methods that has a considered contribution in the reduction of the sum of standard deviation of the independent variables’ coefficients in a Quantile Regression Model. This model estimates the conditional median or other quantiles. In this paper, we have applied approximating statistical methods in an economical problem. We have created and generated a quantile regression model to see how the profit gained is connected with the realized sales of the cosmetic products in a real data, taken from a local business. The Linear Regression of the generated profit and the realized sales was not free of autocorrelation and heteroscedasticity, so this is the reason that we have used this model instead of Linear Regression. Our aim is to analyze in more details the relation between the variables taken into study: the profit and the finalized sales and how to minimize the standard errors of the independent variable involved in this study, the level of realized sales. The statistical methods that we have applied in our work are Edgeworth Approximation for Independent and Identical distributed (IID) cases, Bootstrap version of the Model and the Edgeworth approximation for Bootstrap Quantile Regression Model. The graphics and the results that we have presented here identify the best approximating model of our study.

Keywords: bootstrap, edgeworth approximation, IID, quantile

Procedia PDF Downloads 126
26742 Efficiency, Effectiveness, and Technological Change in Armed Forces: Indonesian Case

Authors: Citra Pertiwi, Muhammad Fikruzzaman Rahawarin

Abstract:

Government of Indonesia had committed to increasing its national defense the budget up to 1,5 percent of GDP. However, the budget increase does not necessarily allocate efficiently and effectively. Using Data Envelopment Analysis (DEA), the operational units of Indonesian Armed Forces are considered as a proxy to measure those two aspects. The bootstrap technique is being used as well to reduce uncertainty in the estimation. Additionally, technological change is being measured as a nonstationary component. Nearly half of the units are being estimated as fully efficient, with less than a third is considered as effective. Longer and larger sets of data might increase the robustness of the estimation in the future.

Keywords: bootstrap, effectiveness, efficiency, DEA, military, Malmquist, technological change

Procedia PDF Downloads 272
26741 Subfamilial Relationships within Solanaceae as Inferred from atpB-rbcL Intergenic Spacer

Authors: Syeda Qamarunnisa, Ishrat Jamil, Abid Azhar, Zabta K. Shinwari, Syed Irtifaq Ali

Abstract:

A phylogenetic analysis of family Solanaceae was conducted using sequence data from the chloroplast intergenic atpB-rbcL spacer. Sequence data was generated from 17 species representing 09 out of 14 genera of Solanaceae from Pakistan. Cladogram was constructed using maximum parsimony method and results indicate that Solanaceae is mainly divided into two subfamilies; Solanoideae and Cestroideae. Four major clades within Solanoideae represent tribes; Physaleae, Capsiceae, Datureae and Solaneae are supported by high bootstrap value and the relationships among them are not corroborating with the previous studies. The findings established that subfamily Cestroideae comprised of three genera; Cestrum, Lycium, and Nicotiana with high bootstrap support. Position of Nicotiana inferred with atpB-rbcL sequence is congruent with traditional classification, which placed the taxa in Cestroideae. In the current study Lycium unexpectedly nested with Nicotiana with 100% bootstrap support and identified as a member of tribe Nicotianeae. Expanded sampling of other genera from Pakistan could be valuable towards improving our understanding of intrafamilial relationships within Solanaceae.

Keywords: systematics, solanaceae, phylogenetics, intergenic spacer, tribes

Procedia PDF Downloads 434
26740 The Classification Performance in Parametric and Nonparametric Discriminant Analysis for a Class- Unbalanced Data of Diabetes Risk Groups

Authors: Lily Ingsrisawang, Tasanee Nacharoen

Abstract:

Introduction: The problems of unbalanced data sets generally appear in real world applications. Due to unequal class distribution, many research papers found that the performance of existing classifier tends to be biased towards the majority class. The k -nearest neighbors’ nonparametric discriminant analysis is one method that was proposed for classifying unbalanced classes with good performance. Hence, the methods of discriminant analysis are of interest to us in investigating misclassification error rates for class-imbalanced data of three diabetes risk groups. Objective: The purpose of this study was to compare the classification performance between parametric discriminant analysis and nonparametric discriminant analysis in a three-class classification application of class-imbalanced data of diabetes risk groups. Methods: Data from a healthy project for 599 staffs in a government hospital in Bangkok were obtained for the classification problem. The staffs were diagnosed into one of three diabetes risk groups: non-risk (90%), risk (5%), and diabetic (5%). The original data along with the variables; diabetes risk group, age, gender, cholesterol, and BMI was analyzed and bootstrapped up to 50 and 100 samples, 599 observations per sample, for additional estimation of misclassification error rate. Each data set was explored for the departure of multivariate normality and the equality of covariance matrices of the three risk groups. Both the original data and the bootstrap samples show non-normality and unequal covariance matrices. The parametric linear discriminant function, quadratic discriminant function, and the nonparametric k-nearest neighbors’ discriminant function were performed over 50 and 100 bootstrap samples and applied to the original data. In finding the optimal classification rule, the choices of prior probabilities were set up for both equal proportions (0.33: 0.33: 0.33) and unequal proportions with three choices of (0.90:0.05:0.05), (0.80: 0.10: 0.10) or (0.70, 0.15, 0.15). Results: The results from 50 and 100 bootstrap samples indicated that the k-nearest neighbors approach when k = 3 or k = 4 and the prior probabilities of {non-risk:risk:diabetic} as {0.90:0.05:0.05} or {0.80:0.10:0.10} gave the smallest error rate of misclassification. Conclusion: The k-nearest neighbors approach would be suggested for classifying a three-class-imbalanced data of diabetes risk groups.

Keywords: error rate, bootstrap, diabetes risk groups, k-nearest neighbors

Procedia PDF Downloads 404
26739 Emotion Dysregulation as Mediator between Child Abuse and Opiate Use Motives

Authors: Usha Barahmand, Ali Khazaee, Goudarz Sadeghi Hashjin

Abstract:

Coping motives are considered to be indicators of problematic substance use. The present investigation examined a model with emotional abuse as an antecedent and emotional dysregulation as a mediator leading to substance use. The intent of this study was to examine the associations between various types of childhood maltreatment and motives for substance use. The sample consisted of 72 male opiate users recruited from those enrolled for Methadone Maintenance treatment. Participants responded to measures of childhood maltreatment, emotion dysregulation, and motives for opiate use. All data were analyzed using Pearson's correlation coefficients and bootstrap analysis of mediation. Results supported the hypothesis that the experience of emotional abuse in childhood is associated with problems in regulating emotions which in turn correlates with opiate use as a way to cope with negative affect, to enhance positive effect or to obtain social rewards. Bootstrap analysis confirmed the mediating role of emotion dysregulation. Findings support the potential utility of further research into emotion dysregulation and motives as antecedents of problematic opiate use.

Keywords: childhood abuse, emotion dysregulation, motives, substance use

Procedia PDF Downloads 418
26738 Finite Sample Inferences for Weak Instrument Models

Authors: Gubhinder Kundhi, Paul Rilstone

Abstract:

It is well established that Instrumental Variable (IV) estimators in the presence of weak instruments can be poorly behaved, in particular, be quite biased in finite samples. Finite sample approximations to the distributions of these estimators are obtained using Edgeworth and Saddlepoint expansions. Departures from normality of the distributions of these estimators are analyzed using higher order analytical corrections in these expansions. In a Monte-Carlo experiment, the performance of these expansions is compared to the first order approximation and other methods commonly used in finite samples such as the bootstrap.

Keywords: bootstrap, Instrumental Variable, Edgeworth expansions, Saddlepoint expansions

Procedia PDF Downloads 280
26737 Probability Sampling in Matched Case-Control Study in Drug Abuse

Authors: Surya R. Niraula, Devendra B Chhetry, Girish K. Singh, S. Nagesh, Frederick A. Connell

Abstract:

Background: Although random sampling is generally considered to be the gold standard for population-based research, the majority of drug abuse research is based on non-random sampling despite the well-known limitations of this kind of sampling. Method: We compared the statistical properties of two surveys of drug abuse in the same community: one using snowball sampling of drug users who then identified “friend controls” and the other using a random sample of non-drug users (controls) who then identified “friend cases.” Models to predict drug abuse based on risk factors were developed for each data set using conditional logistic regression. We compared the precision of each model using bootstrapping method and the predictive properties of each model using receiver operating characteristics (ROC) curves. Results: Analysis of 100 random bootstrap samples drawn from the snowball-sample data set showed a wide variation in the standard errors of the beta coefficients of the predictive model, none of which achieved statistical significance. One the other hand, bootstrap analysis of the random-sample data set showed less variation, and did not change the significance of the predictors at the 5% level when compared to the non-bootstrap analysis. Comparison of the area under the ROC curves using the model derived from the random-sample data set was similar when fitted to either data set (0.93, for random-sample data vs. 0.91 for snowball-sample data, p=0.35); however, when the model derived from the snowball-sample data set was fitted to each of the data sets, the areas under the curve were significantly different (0.98 vs. 0.83, p < .001). Conclusion: The proposed method of random sampling of controls appears to be superior from a statistical perspective to snowball sampling and may represent a viable alternative to snowball sampling.

Keywords: drug abuse, matched case-control study, non-probability sampling, probability sampling

Procedia PDF Downloads 463
26736 Confidence Intervals for Quantiles in the Two-Parameter Exponential Distributions with Type II Censored Data

Authors: Ayman Baklizi

Abstract:

Based on type II censored data, we consider interval estimation of the quantiles of the two-parameter exponential distribution and the difference between the quantiles of two independent two-parameter exponential distributions. We derive asymptotic intervals, Bayesian, as well as intervals based on the generalized pivot variable. We also include some bootstrap intervals in our comparisons. The performance of these intervals is investigated in terms of their coverage probabilities and expected lengths.

Keywords: asymptotic intervals, Bayes intervals, bootstrap, generalized pivot variables, two-parameter exponential distribution, quantiles

Procedia PDF Downloads 378
26735 An Aspiring Solution to the Man in the Middle Bootstrap Vulnerability

Authors: Mouad Zouina, Benaceur Outtaj

Abstract:

The proposed work falls within the context of improving data security for m-commerce systems. In this context we have placed under the light some flaws encountered in HTTPS the most used m-commerce protocol, particularly the man in the middle attack, shortly MITM. The man in the middle attack is an active listening attack. The idea of this attack is to target the handshake phase of the HTTPS protocol which is the transition from a non-secure connection to a secure connection in our case HTTP to HTTPS. This paper proposes a solution to fix those flaws based on the upgrade of HSTS standard handshake sequence using the DNSSEC standard.

Keywords: m-commerce, HTTPS, HSTS, DNSSEC, MITM bootstrap vulnerability

Procedia PDF Downloads 365
26734 A Heteroskedasticity Robust Test for Contemporaneous Correlation in Dynamic Panel Data Models

Authors: Andreea Halunga, Chris D. Orme, Takashi Yamagata

Abstract:

This paper proposes a heteroskedasticity-robust Breusch-Pagan test of the null hypothesis of zero cross-section (or contemporaneous) correlation in linear panel-data models, without necessarily assuming independence of the cross-sections. The procedure allows for either fixed, strictly exogenous and/or lagged dependent regressor variables, as well as quite general forms of both non-normality and heteroskedasticity in the error distribution. The asymptotic validity of the test procedure is predicated on the number of time series observations, T, being large relative to the number of cross-section units, N, in that: (i) either N is fixed as T→∞; or, (ii) N²/T→0, as both T and N diverge, jointly, to infinity. Given this, it is not expected that asymptotic theory would provide an adequate guide to finite sample performance when T/N is "small". Because of this, we also propose and establish asymptotic validity of, a number of wild bootstrap schemes designed to provide improved inference when T/N is small. Across a variety of experimental designs, a Monte Carlo study suggests that the predictions from asymptotic theory do, in fact, provide a good guide to the finite sample behaviour of the test when T is large relative to N. However, when T and N are of similar orders of magnitude, discrepancies between the nominal and empirical significance levels occur as predicted by the first-order asymptotic analysis. On the other hand, for all the experimental designs, the proposed wild bootstrap approximations do improve agreement between nominal and empirical significance levels, when T/N is small, with a recursive-design wild bootstrap scheme performing best, in general, and providing quite close agreement between the nominal and empirical significance levels of the test even when T and N are of similar size. Moreover, in comparison with the wild bootstrap "version" of the original Breusch-Pagan test our experiments indicate that the corresponding version of the heteroskedasticity-robust Breusch-Pagan test appears reliable. As an illustration, the proposed tests are applied to a dynamic growth model for a panel of 20 OECD countries.

Keywords: cross-section correlation, time-series heteroskedasticity, dynamic panel data, heteroskedasticity robust Breusch-Pagan test

Procedia PDF Downloads 401
26733 Two-Stage Hospital Efficiency Analysis Including Qualitative Evidence: A Greek Case

Authors: Panos Xenos, Milton Nektarios, John Yfantopoulos

Abstract:

Background: Policy makers, professional organizations and payers have introduced a variety of initiatives and reforms for the health systems worldwide, aimed at improving hospital efficiency. Their efforts are concentrated in two main categories: to constrain increasing healthcare costs and to enhance quality of services provided. Research Objectives: This study examines the efficiency of 112 Greek public hospitals for the year 2009, evaluates the importance of bootstrapping techniques and investigates the effect of contextual factors on hospital efficiency. Furthermore, the effect of qualitative evidence, on hospital efficiency is explored using data from 28 large hospitals. Methods: We applied Data Envelopment Analysis, augmented by bootstrapping techniques, to estimate efficiency scores. In order to measure the effect of environmental factors on hospital efficiency we used Tobit regression analysis. The significance of our models is evaluated using statistical tests to compare distributions. Results: The Kolmogorov-Smirnov test between the original and the bootstrap-corrected efficiency indicates that their distributions are significantly different (p-value<0.01). The environmental factors, that seem to influence efficiency, are Occupancy Rating and the ratio between Outpatient Visits and Inpatient Days. Results indicate that the inclusion of the quality variable in DEA modelling generates statistically significant variations in efficiency scores (p-value<0.05). Conclusions: The inclusion of quality variables and the use of bootstrap resampling in efficiency analysis impose a statistically significant effect on the distribution of efficiency scores. As a policy conclusion we highlight the importance of these methods on hospital efficiency analysis and, by implication, on healthcare resource allocation.

Keywords: hospitals, efficiency, quality, data envelopment analysis, Greek public hospital sector

Procedia PDF Downloads 273
26732 Traffic Sign Recognition System Using Convolutional Neural NetworkDevineni

Authors: Devineni Vijay Bhaskar, Yendluri Raja

Abstract:

We recommend a model for traffic sign detection stranded on Convolutional Neural Networks (CNN). We first renovate the unique image into the gray scale image through with support vector machines, then use convolutional neural networks with fixed and learnable layers for revealing and understanding. The permanent layer can reduction the amount of attention areas to notice and crop the limits very close to the boundaries of traffic signs. The learnable coverings can rise the accuracy of detection significantly. Besides, we use bootstrap procedures to progress the accuracy and avoid overfitting problem. In the German Traffic Sign Detection Benchmark, we obtained modest results, with an area under the precision-recall curve (AUC) of 99.49% in the group “Risk”, and an AUC of 96.62% in the group “Obligatory”.

Keywords: convolutional neural network, support vector machine, detection, traffic signs, bootstrap procedures, precision-recall curve

Procedia PDF Downloads 83
26731 Evaluation of Transfer Capability Considering Uncertainties of System Operating Condition and System Cascading Collapse

Authors: Nur Ashida Salim, Muhammad Murtadha Othman, Ismail Musirin, Mohd Salleh Serwan

Abstract:

Over the past few decades, the power system industry in many developing and developed countries has gone through a restructuring process of the industry where they are moving towards a deregulated power industry. This situation will lead to competition among the generation and distribution companies to achieve a certain objective which is to provide quality and efficient production of electric energy, which will reduce the price of electricity. Therefore it is important to obtain an accurate value of the Available Transfer Capability (ATC) and Transmission Reliability Margin (TRM) in order to ensure the effective power transfer between areas during the occurrence of uncertainties in the system. In this paper, the TRM and ATC is determined by taking into consideration the uncertainties of the system operating condition and system cascading collapse by applying the bootstrap technique. A case study of the IEEE RTS-79 is employed to verify the robustness of the technique proposed in the determination of TRM and ATC.

Keywords: available transfer capability, bootstrap technique, cascading collapse, transmission reliability margin

Procedia PDF Downloads 370
26730 Confidence Envelopes for Parametric Model Selection Inference and Post-Model Selection Inference

Authors: I. M. L. Nadeesha Jayaweera, Adao Alex Trindade

Abstract:

In choosing a candidate model in likelihood-based modeling via an information criterion, the practitioner is often faced with the difficult task of deciding just how far up the ranked list to look. Motivated by this pragmatic necessity, we construct an uncertainty band for a generalized (model selection) information criterion (GIC), defined as a criterion for which the limit in probability is identical to that of the normalized log-likelihood. This includes common special cases such as AIC & BIC. The method starts from the asymptotic normality of the GIC for the joint distribution of the candidate models in an independent and identically distributed (IID) data framework and proceeds by deriving the (asymptotically) exact distribution of the minimum. The calculation of an upper quantile for its distribution then involves the computation of multivariate Gaussian integrals, which is amenable to efficient implementation via the R package "mvtnorm". The performance of the methodology is tested on simulated data by checking the coverage probability of nominal upper quantiles and compared to the bootstrap. Both methods give coverages close to nominal for large samples, but the bootstrap is two orders of magnitude slower. The methodology is subsequently extended to two other commonly used model structures: regression and time series. In the regression case, we derive the corresponding asymptotically exact distribution of the minimum GIC invoking Lindeberg-Feller type conditions for triangular arrays and are thus able to similarly calculate upper quantiles for its distribution via multivariate Gaussian integration. The bootstrap once again provides a default competing procedure, and we find that similar comparison performance metrics hold as for the IID case. The time series case is complicated by far more intricate asymptotic regime for the joint distribution of the model GIC statistics. Under a Gaussian likelihood, the default in most packages, one needs to derive the limiting distribution of a normalized quadratic form for a realization from a stationary series. Under conditions on the process satisfied by ARMA models, a multivariate normal limit is once again achieved. The bootstrap can, however, be employed for its computation, whence we are once again in the multivariate Gaussian integration paradigm for upper quantile evaluation. Comparisons of this bootstrap-aided semi-exact method with the full-blown bootstrap once again reveal a similar performance but faster computation speeds. One of the most difficult problems in contemporary statistical methodological research is to be able to account for the extra variability introduced by model selection uncertainty, the so-called post-model selection inference (PMSI). We explore ways in which the GIC uncertainty band can be inverted to make inferences on the parameters. This is being attempted in the IID case by pivoting the CDF of the asymptotically exact distribution of the minimum GIC. For inference one parameter at a time and a small number of candidate models, this works well, whence the attained PMSI confidence intervals are wider than the MLE-based Wald, as expected.

Keywords: model selection inference, generalized information criteria, post model selection, Asymptotic Theory

Procedia PDF Downloads 57
26729 Statistical Time-Series and Neural Architecture of Malaria Patients Records in Lagos, Nigeria

Authors: Akinbo Razak Yinka, Adesanya Kehinde Kazeem, Oladokun Oluwagbenga Peter

Abstract:

Time series data are sequences of observations collected over a period of time. Such data can be used to predict health outcomes, such as disease progression, mortality, hospitalization, etc. The Statistical approach is based on mathematical models that capture the patterns and trends of the data, such as autocorrelation, seasonality, and noise, while Neural methods are based on artificial neural networks, which are computational models that mimic the structure and function of biological neurons. This paper compared both parametric and non-parametric time series models of patients treated for malaria in Maternal and Child Health Centres in Lagos State, Nigeria. The forecast methods considered linear regression, Integrated Moving Average, ARIMA and SARIMA Modeling for the parametric approach, while Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM) Network were used for the non-parametric model. The performance of each method is evaluated using the Mean Absolute Error (MAE), R-squared (R2) and Root Mean Square Error (RMSE) as criteria to determine the accuracy of each model. The study revealed that the best performance in terms of error was found in MLP, followed by the LSTM and ARIMA models. In addition, the Bootstrap Aggregating technique was used to make robust forecasts when there are uncertainties in the data.

Keywords: ARIMA, bootstrap aggregation, MLP, LSTM, SARIMA, time-series analysis

Procedia PDF Downloads 28
26728 Cross-Validation of the Data Obtained for ω-6 Linoleic and ω-3 α-Linolenic Acids Concentration of Hemp Oil Using Jackknife and Bootstrap Resampling

Authors: Vibha Devi, Shabina Khanam

Abstract:

Hemp (Cannabis sativa) possesses a rich content of ω-6 linoleic and ω-3 linolenic essential fatty acid in the ratio of 3:1, which is a rare and most desired ratio that enhances the quality of hemp oil. These components are beneficial for the development of cell and body growth, strengthen the immune system, possess anti-inflammatory action, lowering the risk of heart problem owing to its anti-clotting property and a remedy for arthritis and various disorders. The present study employs supercritical fluid extraction (SFE) approach on hemp seed at various conditions of parameters; temperature (40 - 80) °C, pressure (200 - 350) bar, flow rate (5 - 15) g/min, particle size (0.430 - 1.015) mm and amount of co-solvent (0 - 10) % of solvent flow rate through central composite design (CCD). CCD suggested 32 sets of experiments, which was carried out. As SFE process includes large number of variables, the present study recommends the application of resampling techniques for cross-validation of the obtained data. Cross-validation refits the model on each data to achieve the information regarding the error, variability, deviation etc. Bootstrap and jackknife are the most popular resampling techniques, which create a large number of data through resampling from the original dataset and analyze these data to check the validity of the obtained data. Jackknife resampling is based on the eliminating one observation from the original sample of size N without replacement. For jackknife resampling, the sample size is 31 (eliminating one observation), which is repeated by 32 times. Bootstrap is the frequently used statistical approach for estimating the sampling distribution of an estimator by resampling with replacement from the original sample. For bootstrap resampling, the sample size is 32, which was repeated by 100 times. Estimands for these resampling techniques are considered as mean, standard deviation, variation coefficient and standard error of the mean. For ω-6 linoleic acid concentration, mean value was approx. 58.5 for both resampling methods, which is the average (central value) of the sample mean of all data points. Similarly, for ω-3 linoleic acid concentration, mean was observed as 22.5 through both resampling. Variance exhibits the spread out of the data from its mean. Greater value of variance exhibits the large range of output data, which is 18 for ω-6 linoleic acid (ranging from 48.85 to 63.66 %) and 6 for ω-3 linoleic acid (ranging from 16.71 to 26.2 %). Further, low value of standard deviation (approx. 1 %), low standard error of the mean (< 0.8) and low variance coefficient (< 0.2) reflect the accuracy of the sample for prediction. All the estimator value of variance coefficients, standard deviation and standard error of the mean are found within the 95 % of confidence interval.

Keywords: resampling, supercritical fluid extraction, hemp oil, cross-validation

Procedia PDF Downloads 117
26727 A Study of the Depression Status of Asian American Adolescents

Authors: Selina Lin, Justin M Fan, Vincent Zhang, Cindy Chen, Daniel Lam, Jason Yan, Ning Zhang

Abstract:

Depression is one of the most common mental disorders in the United States, and past studies have shown a concerning increase in the rates of depression in youth populations over time. Furthermore, depression is an especially important issue for Asian Americans because of the anti-Asian violence taking place during the COVID-19 pandemic. While Asian American adolescents are reluctant to seek help for mental health issues, past research has found a prevalence of depressive symptoms in them that have yet to be fully investigated. There have been studies conducted to understand and observe the impacts of multifarious factors influencing the mental well-being of Asian American adolescents; however, they have been generally limited to qualitative investigation, and very few have attempted to quantitatively evaluate the relationship between depression levels and a comprehensive list of factors for those levels at the same time. To better quantify these relationships, this project investigated the prevalence of depression in Asian American teenagers mainly from the Greater Philadelphia Region, aged 12 to 19, and, with an anonymous survey, asked participants 48 multiple-choice questions pertaining to demographic information, daily behaviors, school life, family life, depression levels (quantified by the PHQ-9 assessment), school and family support against depression. Each multiple-choice question was assigned as a factor and variable for statistical and dominance analysis to determine the most influential factors on depression levels of Asian American adolescents. The results were validated via Bootstrap analysis and t-tests. While certain influential factors identified in this survey are consistent with the literature, such as parent-child relationship and peer pressure, several dominant factors were relatively overlooked in the past. These factors include the parents’ relationship with each other, the satisfaction with body image, sex identity, support from the family and support from the school. More than 25% of participants desired more support from their families and schools in handling depression issues. This study implied that it is beneficial for Asian American parents and adolescents to take programs on parents’ relationships with each other, parent-child communication, mental health, and sexual identity. A culturally inclusive school environment and more accessible mental health services would be helpful for Asian American adolescents to combat depression. This survey-based study paved the way for further investigation of effective approaches for helping Asian American adolescents against depression.

Keywords: Asian American adolescents, depression, dominance analysis, t-test, bootstrap analysis

Procedia PDF Downloads 102
26726 In silico Comparative Analysis of Chloroplast Genome (cpDNA) and Some Individual Genes (rbcL and trnH-psbA) in Pooideae Subfamily Members

Authors: Ibrahim Ilker Ozyigit, Ertugrul Filiz, Ilhan Dogan

Abstract:

An in silico analysis of Brachypodium distachyon, Triticum aestivum, Festuca arundinacea, Lolium perenne, Hordeum vulgare subsp. vulgare of the Pooideaea was performed based on complete chloroplast genomes including rbcL coding and trnH-psbA intergenic spacer regions alone to compare phylogenetic resolving power. Neighbor-joining, Minimum Evolution, and Unweighted Pair Group Method with arithmetic mean methods were used to reconstruct phylogenies with the highest bootstrap supported the obtained data from whole chloroplast genome sequence. The highest and lowest values from nucleotide diversity (π) analysis were found to be 0.315813 and 0.043495 in rbcL coding region in chloroplast genome and complete chloroplast genome, respectively. The highest transition/transversion bias (R) value was recorded as 1.384 in complete chloroplast genomes. F. arudinacea-L. perenne clade was uncovered in all phylogenies. Sequences of rbcL and trnH-psbA regions were not able to resolve the Pooideae phylogenies due to lack of genetic variation.

Keywords: chloroplast DNA, Pooideae, phylogenetic analysis, rbcL, trnH-psbA

Procedia PDF Downloads 347
26725 Machine Learning Analysis of Eating Disorders Risk, Physical Activity and Psychological Factors in Adolescents: A Community Sample Study

Authors: Marc Toutain, Pascale Leconte, Antoine Gauthier

Abstract:

Introduction: Eating Disorders (ED), such as anorexia, bulimia, and binge eating, are psychiatric illnesses that mostly affect young people. The main symptoms concern eating (restriction, excessive food intake) and weight control behaviors (laxatives, vomiting). Psychological comorbidities (depression, executive function disorders, etc.) and problematic behaviors toward physical activity (PA) are commonly associated with ED. Acquaintances on ED risk factors are still lacking, and more community sample studies are needed to improve prevention and early detection. To our knowledge, studies are needed to specifically investigate the link between ED risk level, PA, and psychological risk factors in a community sample of adolescents. The aim of this study is to assess the relation between ED risk level, exercise (type, frequency, and motivations for engaging in exercise), and psychological factors based on the Jacobi risk factors model. We suppose that a high risk of ED will be associated with the practice of high caloric cost PA, motivations oriented to weight and shape control, and psychological disturbances. Method: An online survey destined for students has been sent to several middle schools and colleges in northwest France. This survey combined several questionnaires, the Eating Attitude Test-26 assessing ED risk; the Exercise Motivation Inventory–2 assessing motivations toward PA; the Hospital Anxiety and Depression Scale assessing anxiety and depression, the Contour Drawing Rating Scale; and the Body Esteem Scale assessing body dissatisfaction, Rosenberg Self-esteem Scale assessing self-esteem, the Exercise Dependence Scale-Revised assessing PA dependence, the Multidimensional Assessment of Interoceptive Awareness assessing interoceptive awareness and the Frost Multidimensional Perfectionism Scale assessing perfectionism. Machine learning analysis will be performed in order to constitute groups with a tree-based model clustering method, extract risk profile(s) with a bootstrap method comparison, and predict ED risk with a prediction method based on a decision tree-based model. Expected results: 1044 complete records have already been collected, and the survey will be closed at the end of May 2022. Records will be analyzed with a clustering method and a bootstrap method in order to reveal risk profile(s). Furthermore, a predictive tree decision method will be done to extract an accurate predictive model of ED risk. This analysis will confirm typical main risk factors and will give more data on presumed strong risk factors such as exercise motivations and interoceptive deficit. Furthermore, it will enlighten particular risk profiles with a strong level of proof and greatly contribute to improving the early detection of ED and contribute to a better understanding of ED risk factors.

Keywords: eating disorders, risk factors, physical activity, machine learning

Procedia PDF Downloads 52
26724 Consistent Testing for an Implication of Supermodular Dominance with an Application to Verifying the Effect of Geographic Knowledge Spillover

Authors: Chung Danbi, Linton Oliver, Whang Yoon-Jae

Abstract:

Supermodularity, or complementarity, is a popular concept in economics which can characterize many objective functions such as utility, social welfare, and production functions. Further, supermodular dominance captures a preference for greater interdependence among inputs of those functions, and it can be applied to examine which input set would produce higher expected utility, social welfare, or production. Therefore, we propose and justify a consistent testing for a useful implication of supermodular dominance. We also conduct Monte Carlo simulations to explore the finite sample performance of our test, with critical values obtained from the recentered bootstrap method, with and without the selective recentering, and the subsampling method. Under various parameter settings, we confirmed that our test has reasonably good size and power performance. Finally, we apply our test to compare the geographic and distant knowledge spillover in terms of their effects on social welfare using the National Bureau of Economic Research (NBER) patent data. We expect localized citing to supermodularly dominate distant citing if the geographic knowledge spillover engenders greater social welfare than distant knowledge spillover. Taking subgroups based on firm and patent characteristics, we found that there is industry-wise and patent subclass-wise difference in the pattern of supermodular dominance between localized and distant citing. We also compare the results from analyzing different time periods to see if the development of Internet and communication technology has changed the pattern of the dominance. In addition, to appropriately deal with the sparse nature of the data, we apply high-dimensional methods to efficiently select relevant data.

Keywords: supermodularity, supermodular dominance, stochastic dominance, Monte Carlo simulation, bootstrap, subsampling

Procedia PDF Downloads 89
26723 Shiva's Dance: Crisis, Local Institutions, and Private Firms

Authors: João Pereira Dos Santos

Abstract:

The uneven spatial distribution of start-ups and their respective survival may reflect comparative advantages resulting from the local institutional background. For the first time, we explore this idea using Data Envelopment Analysis (DEA) to assess relative efficiency of Portuguese municipalities in this specific context. We depart from the related literature where expenditure is perceived as a desirable input by choosing a measure of fiscal responsibility and infrastructural variables in the first stage. Comparing results for 2006 and 2010, we find that mean performance decreased substantially with 1) the effects of the Global Financial Crisis; 2) as municipal population increases and 3) as financial independence decreases. A second stage is then computed employing a double-bootstrap procedure to evaluate how the regional context outside the control of local authorities (e.g. demographic characteristics and political preferences) impacts on efficiency.

Keywords: entrepreneurship, political economy, public finance, accountability, crisis, efficiency, Portuguese municipalities

Procedia PDF Downloads 467
26722 Re-Stating the Origin of Tetrapod Using Measures of Phylogenetic Support for Phylogenomic Data

Authors: Yunfeng Shan, Xiaoliang Wang, Youjun Zhou

Abstract:

Whole-genome data from two lungfish species, along with other species, present a valuable opportunity to re-investigate the longstanding debate regarding the evolutionary relationships among tetrapods, lungfishes, and coelacanths. However, the use of bootstrap support has become outdated for large-scale phylogenomic data. Without robust phylogenetic support, the phylogenetic trees become meaningless. Therefore, it is necessary to re-evaluate the phylogenies of tetrapods, lungfishes, and coelacanths using novel measures of phylogenetic support specifically designed for phylogenomic data, as the previous phylogenies were based on 100% bootstrap support. Our findings consistently provide strong evidence favoring lungfish as the closest living relative of tetrapods. This conclusion is based on high internode certainty, relative gene support, and high gene concordance factor. The evidence stems from five previous datasets derived from lungfish transcriptomes. These results yield fresh insights into the three hypotheses regarding the phylogenies of tetrapods, lungfishes, and coelacanths. Importantly, these hypotheses are not mere conjectures but are substantiated by a significant number of genes. Analyzing real biological data further demonstrates that the inclusion of additional taxa leads to more diverse tree topologies. Consequently, gene trees and species trees may not be identical even when whole-genome sequencing data is utilized. However, it is worth noting that many gene trees can accurately reflect the species tree if an appropriate number of taxa, typically ranging from six to ten, are sampled. Therefore, it is crucial to carefully select the number of taxa and an appropriate outgroup, such as slow-evolving species, while excluding fast-evolving taxa as outgroups to mitigate the adverse effects of long-branch attraction and achieve an accurate reconstruction of the species tree. This is particularly important as more whole-genome sequencing data becomes available.

Keywords: novel measures of phylogenetic support for phylogenomic data, gene concordance factor confidence, relative gene support, internode certainty, origin of tetrapods

Procedia PDF Downloads 27
26721 Measures of Phylogenetic Support for Phylogenomic and the Whole Genomes of Two Lungfish Restate Lungfish and Origin of Land Vertebrates

Authors: Yunfeng Shan, Xiaoliang Wang, Youjun Zhou

Abstract:

Whole-genome data from two lungfish species, along with other species, present a valuable opportunity to reassess the longstanding debate regarding the evolutionary relationships among tetrapods, lungfishes, and coelacanths. However, the use of bootstrap support has become outdated for large-scale phylogenomic data. Without robust phylogenetic support, the phylogenetic trees become meaningless. Therefore, it is necessary to re-evaluate the phylogenies of tetrapods, lungfishes, and coelacanths using novel measures of phylogenetic support specifically designed for phylogenomic data, as the previous phylogenies were based on 100% bootstrap support. Our findings consistently provide strong evidence favoring lungfish as the closest living relative of tetrapods. This conclusion is based on high gene support confidence with confidence intervals exceeding 95%, high internode certainty, and high gene concordance factor. The evidence stems from two datasets containing recently deciphered whole genomes of two lungfish species, as well as five previous datasets derived from lungfish transcriptomes. These results yield fresh insights into the three hypotheses regarding the phylogenies of tetrapods, lungfishes, and coelacanths. Importantly, these hypotheses are not mere conjectures but are substantiated by a significant number of genes. Analyzing real biological data further demonstrates that the inclusion of additional taxa diminishes the number of orthologues and leads to more diverse tree topologies. Consequently, gene trees and species trees may not be identical even when whole-genome sequencing data is utilized. However, it is worth noting that many gene trees can accurately reflect the species tree if an appropriate number of taxa, typically ranging from six to ten, are sampled. Therefore, it is crucial to carefully select the number of taxa and an appropriate outgroup while excluding fast-evolving taxa as outgroups to mitigate the adverse effects of long-branch attraction (LBA) and achieve an accurate reconstruction of the species tree. This is particularly important as more whole-genome sequencing data becomes available.

Keywords: gene support confidence (GSC), origin of land vertebrates, coelacanth, two whole genomes of lungfishes, confidence intervals

Procedia PDF Downloads 43