Search results for: quantile
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 50

20 Identification of Outliers in Flood Frequency Analysis: Comparison of Original and Multiple Grubbs-Beck Test

Authors: Ayesha S. Rahman, Khaled Haddad, Ataur Rahman

Abstract:

At-site flood frequency analysis is used to estimate flood quantiles when at-site record length is reasonably long. In Australia, FLIKE software has been introduced for at-site flood frequency analysis. The advantage of FLIKE is that, for a given application, the user can compare a number of most commonly adopted probability distributions and parameter estimation methods relatively quickly using a windows interface. The new version of FLIKE has been incorporated with the multiple Grubbs and Beck test which can identify multiple numbers of potentially influential low flows. This paper presents a case study considering six catchments in eastern Australia which compares two outlier identification tests (original Grubbs and Beck test and multiple Grubbs and Beck test) and two commonly applied probability distributions (Generalized Extreme Value (GEV) and Log Pearson type 3 (LP3)) using FLIKE software. It has been found that the multiple Grubbs and Beck test when used with LP3 distribution provides more accurate flood quantile estimates than when LP3 distribution is used with the original Grubbs and Beck test. Between these two methods, the differences in flood quantile estimates have been found to be up to 61% for the six study catchments. It has also been found that GEV distribution (with L moments) and LP3 distribution with the multiple Grubbs and Beck test provide quite similar results in most of the cases; however, a difference up to 38% has been noted for flood quantiles for annual exceedance probability (AEP) of 1 in 100 for one catchment. These findings need to be confirmed with a greater number of stations across other Australian states.

Keywords: floods, FLIKE, probability distributions, flood frequency, outlier

Procedia PDF Downloads 411
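
As a rough illustration of the flood quantile estimates compared in the abstract above, the sketch below evaluates the GEV quantile function at a given annual exceedance probability (AEP). The function name `gev_quantile` and the parameter values are made up for illustration and are not taken from the paper. The shape convention (`xi > 0` meaning a heavy upper tail) is an assumption; FLIKE may parameterize the distribution differently.

```python
import math


def gev_quantile(aep, mu, sigma, xi):
    """Flood magnitude exceeded with annual probability `aep` under a GEV fit.

    mu, sigma, xi are location, scale and shape. Assumed convention:
    xi > 0 gives a heavy upper tail and xi == 0 is the Gumbel case.
    """
    if not 0.0 < aep < 1.0:
        raise ValueError("aep must lie strictly between 0 and 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    y = -math.log(1.0 - aep)  # -ln(F), F being the non-exceedance probability
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)


# Hypothetical parameters for a single catchment (illustration only).
q100 = gev_quantile(0.01, mu=250.0, sigma=90.0, xi=0.1)  # AEP of 1 in 100
q10 = gev_quantile(0.10, mu=250.0, sigma=90.0, xi=0.1)   # AEP of 1 in 10
```

Percentage differences such as those quoted in the abstract would then be computed between quantiles of this kind obtained from two different fits.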
19 Load Forecasting in Microgrid Systems with R and Cortana Intelligence Suite

Authors: F. Lazzeri, I. Reiter

Abstract:

Energy production optimization has traditionally been very important for utilities in order to improve resource consumption. However, load forecasting is a challenging task, as there are a large number of relevant variables that must be considered, and several strategies have been used to deal with this complex problem. This is especially true in microgrids, where many elements have to adjust their performance depending on the future generation and consumption conditions. The goal of this paper is to present a solution for short-term load forecasting in microgrids, based on three machine learning experiments developed in R and web services built and deployed with different components of Cortana Intelligence Suite: Azure Machine Learning, a fully managed cloud service that makes it easy to build, deploy, and share predictive analytics solutions; SQL database, a Microsoft database service for app developers; and PowerBI, a suite of business analytics tools to analyze data and share insights. Our results show that Boosted Decision Tree and Fast Forest Quantile regression methods can be very useful to predict hourly short-term consumption in microgrids; moreover, we found that for these types of forecasting models, weather data (temperature, wind, humidity and dew point) can play a crucial role in improving the accuracy of the forecasting solution. Data cleaning and feature engineering methods performed in R and different types of machine learning algorithms (Boosted Decision Tree, Fast Forest Quantile and ARIMA) will be presented, and results and performance metrics discussed.

Keywords: time-series, features engineering methods for forecasting, energy demand forecasting, Azure Machine Learning

Procedia PDF Downloads 273
18 A New Distribution and Application on the Lifetime Data

Authors: Gamze Ozel, Selen Cakmakyapan

Abstract:

We introduce a new model called the Marshall-Olkin Rayleigh distribution, which extends the Rayleigh distribution using the Marshall-Olkin transformation and has increasing and decreasing shapes for the hazard rate function. Various structural properties of the new distribution are derived, including explicit expressions for the moments, generating and quantile functions, some entropy measures, and order statistics. The model parameters are estimated by the method of maximum likelihood and the observed information matrix is determined. The potential of the new model is illustrated by means of a real-life data set.

Keywords: Marshall-Olkin distribution, Rayleigh distribution, estimation, maximum likelihood

Procedia PDF Downloads 466
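
To make the "explicit expression for the quantile function" in the abstract above concrete, here is a minimal sketch. It assumes the usual Marshall-Olkin construction, with survival function `alpha * S(x) / (1 - (1 - alpha) * S(x))`, applied to a Rayleigh survival function `S(x) = exp(-x^2 / (2 sigma^2))`. The paper's exact parameterization is not given in the abstract, so the names `alpha` and `sigma` and both functions are illustrative assumptions.

```python
import math


def mo_rayleigh_cdf(x, alpha, sigma):
    """CDF of a Marshall-Olkin extended Rayleigh distribution (assumed form)."""
    if x <= 0.0:
        return 0.0
    s = math.exp(-x * x / (2.0 * sigma * sigma))  # Rayleigh survival function
    return 1.0 - alpha * s / (1.0 - (1.0 - alpha) * s)


def mo_rayleigh_quantile(u, alpha, sigma):
    """Closed-form inverse of mo_rayleigh_cdf for 0 <= u < 1."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    if alpha <= 0.0 or sigma <= 0.0:
        raise ValueError("alpha and sigma must be positive")
    # Solving CDF(x) = u for the Rayleigh survival value gives S = 1 / ratio.
    ratio = (1.0 - (1.0 - alpha) * u) / (1.0 - u)
    # ratio >= 1 in exact arithmetic; max() guards against rounding below 1.
    return sigma * math.sqrt(2.0 * math.log(max(ratio, 1.0)))
```

With `alpha = 1` the expression reduces to the ordinary Rayleigh quantile `sigma * sqrt(-2 ln(1 - u))`, which is a quick sanity check on the algebra.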
17 Understanding Consumption Planning Behaviors

Authors: Gaosheng Ju

Abstract:

Our empirical evidence supports a model of consumption planning behaviors with the following two characteristics. First, households formulate a rational consumption target based on their desired target, displaying a diminishing sensitivity to the discrepancy between them. Second, the established target is a reference point for their planned consumption. The diminishing sensitivity leads to opposite reactions in higher and lower quantiles of both consumption targets and consumption growth to changes in economic conditions. This phenomenon accounts for the perplexingly low correlation between consumption and other macroeconomic variables. Furthermore, the opposing movements of consumption targets offer new insights into consumption-based asset pricing.

Keywords: consumption planning, reference point, diminishing sensitivity, quantile regression, asset pricing puzzles

Procedia PDF Downloads 40
16 The Normal-Generalized Hyperbolic Secant Distribution: Properties and Applications

Authors: Hazem M. Al-Mofleh

Abstract:

In this paper, a new four-parameter univariate continuous distribution called the Normal-Generalized Hyperbolic Secant Distribution (NGHS) is defined and studied. Some general and structural distributional properties are investigated and discussed, including: central and non-central n-th moments and incomplete moments, quantile and generating functions, hazard function, Rényi and Shannon entropies, shapes: skewed right, skewed left, and symmetric, modality regions: unimodal and bimodal, maximum likelihood (MLE) estimators for the parameters. Finally, two real data sets are used to demonstrate empirically its flexibility and prove the strength of the new distribution.

Keywords: bimodality, estimation, hazard function, moments, Shannon’s entropy

Procedia PDF Downloads 310
15 Predicting Data Center Resource Usage Using Quantile Regression to Conserve Energy While Fulfilling the Service Level Agreement

Authors: Ahmed I. Alutabi, Naghmeh Dezhabad, Sudhakar Ganti

Abstract:

Data centers have been growing in size and demand continuously over the last two decades. Planning for the deployment of resources has been shallow and has always resorted to over-provisioning. Data center operators try to maximize the availability of their services by allocating multiples of the needed resources. One resource that has been wasted, with little thought, has been energy. In recent years, programmable resource allocation has paved the way to allow for more efficient and robust data centers. In this work, we examine the predictability of resource usage in a data center environment. We use a number of models that cover a wide spectrum of machine learning categories. Then we establish a framework to guarantee the client service level agreement (SLA). Our results show that using prediction can cut energy loss by up to 55%.

Keywords: machine learning, artificial intelligence, prediction, data center, resource allocation, green computing

Procedia PDF Downloads 77
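
The title of the entry above names quantile regression, which fits a model by minimizing the pinball (quantile) loss rather than squared error. The sketch below shows that loss and checks, on a toy sample, that the best constant prediction under it is a sample quantile. The usage figures are hypothetical and the helper names are illustrative; the abstract does not describe the authors' models in this detail.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average pinball loss at quantile level tau in (0, 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction costs tau per unit, over-prediction costs 1 - tau.
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def best_constant_quantile(y, tau):
    """Observed value that minimizes pinball loss as a constant prediction."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda q: pinball_loss(y, [q] * len(y), tau))


# Hypothetical resource usage readings (arbitrary units).
usage = [1, 2, 3, 4, 5, 6, 7, 8, 9]
median_level = best_constant_quantile(usage, 0.5)
upper_level = best_constant_quantile(usage, 0.8)
```

Provisioning against an upper quantile rather than a multiple of the mean is one way a prediction of this kind could reduce over-allocation while still covering most of the demand.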
14 Modeling of System Availability and Bayesian Analysis of Bivariate Distribution

Authors: Muhammad Farooq, Ahtasham Gul

Abstract:

To meet the desired standard, it is important to monitor and analyze different engineering processes to get the desired output. Bivariate distributions have received a lot of attention in recent years for describing the randomness of natural as well as artificial mechanisms. In this article, a bivariate model is constructed using two independent models developed by the nesting approach to study the effect of each component on reliability for better understanding. Further, the Bayes analysis of system availability is studied by considering prior parametric variations in the failure time and repair time distributions. Basic statistical characteristics of the marginal distribution, like the mean, median, and quantile function, are discussed. We use an inverse Gamma prior to study its frequentist properties by conducting a Monte Carlo Markov Chain (MCMC) sampling scheme.

Keywords: reliability, system availability, Weibull, inverse Lomax, Monte Carlo Markov Chain, Bayesian

Procedia PDF Downloads 48
13 Measuring Tail-Risk Spillover in the International Banking Industry

Authors: Lidia Sanchis-Marco, Antonio Rubia

Abstract:

In this paper we analyze the state-dependent risk spillover in different economic areas. To this end, we apply the quantile regression-based methodology developed by Adams, Füss and Gropp to examine the spillover in conditional tails of daily returns of indices of the banking industry in the US, BRICs, Peripheral EMU, Core EMU, Scandinavia, the UK and Emerging Markets. This methodology allows us to characterize the size, direction and strength of financial contagion in a network of bilateral exposures to address cross-border vulnerabilities under different states of the economy. The general evidence shows that the spillover effects are higher and more significant in volatile periods than in tranquil ones. There is evidence of tail spillovers, much of which is attributable to a spillover from the US to the rest of the analyzed regions, especially European countries. In sharp contrast, the US banking system shows more financial resilience against foreign shocks.

Keywords: spillover effects, Bank Contagion, SDSVaR, expected shortfall, VaR, expectiles

Procedia PDF Downloads 470
12 Digital Transformation, Financing Microstructures, and Impact on Well-Being and Income Inequality

Authors: Koffi Sodokin

Abstract:

Financing microstructures are increasingly seen as a means of financial inclusion and improving overall well-being in developing countries. In practice, digital transformation in finance can accelerate the optimal functioning of financing microstructures, such as access by households to microfinance and microinsurance. Large households' access to finance can lead to a reduction in income inequality and an overall improvement in well-being. This paper explores the impact of access to digital finance and financing microstructures on household well-being and the reduction of income inequality. To this end, we use the propensity score matching, the double difference, and the smooth instrumental quantile regression as estimation methods with two periods of survey data. The paper uses the FinScope consumer data (2016) and the Harmonized Living Standards Measurement Study (2018) from Togo in a comparative perspective. The results indicate that access to digital finance, as a cultural game changer, and to financing microstructures improves overall household well-being and contributes significantly to reducing income inequality.

Keywords: financing microstructure, microinsurance, microfinance, digital finance, well-being, income inequality

Procedia PDF Downloads 60
11 Confidence Envelopes for Parametric Model Selection Inference and Post-Model Selection Inference

Authors: I. M. L. Nadeesha Jayaweera, Adao Alex Trindade

Abstract:

In choosing a candidate model in likelihood-based modeling via an information criterion, the practitioner is often faced with the difficult task of deciding just how far up the ranked list to look. Motivated by this pragmatic necessity, we construct an uncertainty band for a generalized (model selection) information criterion (GIC), defined as a criterion for which the limit in probability is identical to that of the normalized log-likelihood. This includes common special cases such as AIC & BIC. The method starts from the asymptotic normality of the GIC for the joint distribution of the candidate models in an independent and identically distributed (IID) data framework and proceeds by deriving the (asymptotically) exact distribution of the minimum. The calculation of an upper quantile for its distribution then involves the computation of multivariate Gaussian integrals, which is amenable to efficient implementation via the R package "mvtnorm". The performance of the methodology is tested on simulated data by checking the coverage probability of nominal upper quantiles and compared to the bootstrap. Both methods give coverages close to nominal for large samples, but the bootstrap is two orders of magnitude slower. The methodology is subsequently extended to two other commonly used model structures: regression and time series. In the regression case, we derive the corresponding asymptotically exact distribution of the minimum GIC invoking Lindeberg-Feller type conditions for triangular arrays and are thus able to similarly calculate upper quantiles for its distribution via multivariate Gaussian integration. The bootstrap once again provides a default competing procedure, and we find that similar comparison performance metrics hold as for the IID case. The time series case is complicated by far more intricate asymptotic regime for the joint distribution of the model GIC statistics. 
Under a Gaussian likelihood, the default in most packages, one needs to derive the limiting distribution of a normalized quadratic form for a realization from a stationary series. Under conditions on the process satisfied by ARMA models, a multivariate normal limit is once again achieved. The bootstrap can, however, be employed for its computation, whence we are once again in the multivariate Gaussian integration paradigm for upper quantile evaluation. Comparisons of this bootstrap-aided semi-exact method with the full-blown bootstrap once again reveal a similar performance but faster computation speeds. One of the most difficult problems in contemporary statistical methodological research is to be able to account for the extra variability introduced by model selection uncertainty, the so-called post-model selection inference (PMSI). We explore ways in which the GIC uncertainty band can be inverted to make inferences on the parameters. This is being attempted in the IID case by pivoting the CDF of the asymptotically exact distribution of the minimum GIC. For inference one parameter at a time and a small number of candidate models, this works well, whence the attained PMSI confidence intervals are wider than the MLE-based Wald, as expected.

Keywords: model selection inference, generalized information criteria, post model selection, Asymptotic Theory

Procedia PDF Downloads 62
10 Pricing the Risk Associated to Weather of Variable Renewable Energy Generation

Authors: Jorge M. Uribe

Abstract:

We propose a methodology for setting the price of an insurance contract targeted to manage the risk associated with weather conditions that affect variable renewable energy generation. The methodology relies on conditional quantile regressions to estimate the weather risk of a solar panel. It is illustrated using real daily radiation and weather data for three cities in Spain (Valencia, Barcelona and Madrid) from February 2, 2004 to January 22, 2019. We also adapt the concepts of value at risk and expected shortfall from finance to this context, to provide a complete panorama of what we label as weather risk. The methodology is easy to implement and can be used by insurance companies to price a contract with the aforementioned characteristics when data about similar projects and accurate cash flow projections are lacking. Our methodology assigns a higher price to an insurance product with the stated characteristics in Madrid, compared to Valencia and Barcelona. This is consistent with Madrid showing the largest interquartile range of operational deficits and it is unrelated to the average value deficit, which illustrates the importance of our proposal.

Keywords: insurance, weather, vre, risk

Procedia PDF Downloads 120
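
The entry above borrows value at risk (VaR) and expected shortfall (ES) from finance. The sketch below computes empirical versions of both from a list of deficits, treating larger values as worse. The function names, the rank-based quantile rule and the sample values are illustrative assumptions; the paper estimates these quantities through conditional quantile regression, which is not reproduced here.

```python
import math


def value_at_risk(losses, alpha):
    """Empirical VaR: smallest observed loss with at least a fraction
    alpha of the sample at or below it."""
    if not losses:
        raise ValueError("losses must not be empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(losses)
    rank = max(math.ceil(alpha * len(ordered)), 1)  # 1-based rank
    return ordered[rank - 1]


def expected_shortfall(losses, alpha):
    """Average of the losses at or beyond the empirical VaR."""
    var = value_at_risk(losses, alpha)
    tail = [x for x in losses if x >= var]
    return sum(tail) / len(tail)


# Hypothetical daily operational deficits (arbitrary units).
deficits = [1, 2, 3, 4, 5, 6, 7, 8]
var_75 = value_at_risk(deficits, 0.75)
es_75 = expected_shortfall(deficits, 0.75)
```

ES is never smaller than VaR at the same level, because it averages only the outcomes in the tail that VaR marks off.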
9 Generalized Extreme Value Regression with Binary Dependent Variable: An Application for Predicting Meteorological Drought Probabilities

Authors: Retius Chifurira

Abstract:

The logistic regression model is the most widely used regression model to predict meteorological drought probabilities. When the dependent variable is extreme, the logistic model fails to adequately capture drought probabilities. In order to adequately predict drought probabilities, we use the generalized linear model (GLM) with the quantile function of the generalized extreme value distribution (GEVD) as the link function. The method of maximum likelihood estimation is used to estimate the parameters of the generalized extreme value (GEV) regression model. We compare the performance of the logistic and the GEV regression models in predicting drought probabilities for Zimbabwe. The performance of the regression models is assessed using goodness-of-fit measures, namely relative root mean square error (RRMSE) and relative mean absolute error (RMAE). Results show that the GEV regression model performs better than the logistic model, thereby providing a good alternative candidate for predicting drought probabilities. This paper provides the first application of GLM derived from extreme value theory to predict drought probabilities for a drought-prone country such as Zimbabwe.

Keywords: generalized extreme value distribution, general linear model, mean annual rainfall, meteorological drought probabilities

Procedia PDF Downloads 161
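
Using the GEV quantile function as a link, as the abstract above describes, means the predicted probability is the GEV distribution function evaluated at the linear predictor. The sketch below shows one such link and its inverse. It assumes a standard GEV (location 0, scale 1) with shape `xi`; the paper's exact specification, including any sign conventions, is not given in the abstract, and the names `gev_link`, `gev_cdf` and `drought_probability` are hypothetical.

```python
import math


def gev_cdf(eta, xi):
    """Standard GEV distribution function at eta (inverse link)."""
    if xi == 0.0:
        return math.exp(-math.exp(-eta))  # Gumbel limit
    t = 1.0 + xi * eta
    if t <= 0.0:
        # Outside the support: below it when xi > 0, above it when xi < 0.
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_link(p, xi):
    """Standard GEV quantile function: maps a probability to the predictor."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y = -math.log(p)
    if xi == 0.0:
        return -math.log(y)
    return (y ** (-xi) - 1.0) / xi


def drought_probability(covariates, beta, xi):
    """Predicted probability for one observation under the assumed link."""
    eta = sum(b * x for b, x in zip(beta, covariates))
    return gev_cdf(eta, xi)
```

Unlike the logistic link, this response curve is asymmetric around probability 0.5, which is the property that makes it attractive when one outcome is rare.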
8 A GIS Based Approach in District Peshawar, Pakistan for Groundwater Vulnerability Assessment Using DRASTIC Model

Authors: Syed Adnan, Javed Iqbal

Abstract:

In urban and rural areas, groundwater is the most economical natural source of drinking water. Groundwater resources of Pakistan are degraded due to high population growth and increased industrial development. A study was conducted in district Peshawar to assess groundwater vulnerable zones using a GIS-based DRASTIC model. Six input parameters (groundwater depth, groundwater recharge, aquifer material, soil type, slope and hydraulic conductivity) were used in the DRASTIC model to generate the groundwater vulnerable zones. Each parameter was divided into different ranges or media types and a subjective rating from 1-10 was assigned to each factor, where 1 represented very low impact on pollution potential and 10 represented very high impact. A weight multiplier from 1-5 was used to balance and enhance the importance of each factor. The DRASTIC model scores obtained varied from 47 to 147. Using a quantile classification scheme, these values were reclassified into three zones, i.e. low, moderate and high vulnerability zones. The areas of these zones were calculated. The final result indicated that about 400 km², 506 km², and 375 km² were classified as low, moderate, and high vulnerability areas, respectively. It is recommended that the most vulnerable zones be treated as first priority to secure drinking water for the inhabitants.

Keywords: DRASTIC model, groundwater vulnerability, GIS in groundwater, drinking sources

Procedia PDF Downloads 422
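
The quantile classification step in the entry above can be sketched as follows: cut points are placed at the tertiles of the score distribution so that each class receives roughly the same number of cells. The scores below are hypothetical values inside the reported 47 to 147 range, and `quantile_classes` is an illustrative helper, not the GIS tool the authors used.

```python
from bisect import bisect_left
from statistics import quantiles


def quantile_classes(scores, labels=("low", "moderate", "high")):
    """Assign each score to a class bounded by equal-count quantile cuts."""
    cuts = quantiles(scores, n=len(labels))  # len(labels) - 1 cut points
    # bisect_left places a score equal to a cut point in the lower class.
    return [labels[bisect_left(cuts, s)] for s in scores]


# Hypothetical DRASTIC index values for nine raster cells.
drastic_scores = [47, 60, 75, 90, 100, 110, 125, 140, 147]
zones = quantile_classes(drastic_scores)
```

Because the cuts follow the data rather than fixed score thresholds, the class boundaries change whenever the study area or the score distribution changes.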
7 Corporate Socially Responsible and Financial Performance in the Tourism-Related Industries

Authors: Yu Shan Wang

Abstract:

Different from other industries, the structure of the tourism industry depends to a large degree on environmental and cultural resources. The industry has to undertake social responsibilities for its commercial behaviour. This paper refers to the seven dimensions of the KLD STATS in 1991-2011 as the indicator of CSR practices. The purpose is to investigate which CSR activities create significant impacts on accounting-based financials and firm values by delving into different CSR dimensions. Meanwhile, this paper takes into consideration S&P 500 and control variables (firm sizes and financial leverage). In fact, the commercial behavior of the tourism-related industry may result in negative impacts on the economy and society. Therefore, this paper classifies a positive set of CSR elements and a negative set of CSR elements for the tourism-related industry in order to examine their respective effects on short-term profitability and long-term firm values. This can shed light on which CSR dimensions exhibit significant impacts on CFP better than holistic CSR indicators, and hence provide more useful information to investors and corporates. This paper uses quantile regressions to avoid the impact of outliers in the data set. This helps to offer specific information so that companies can make informed decisions.

Keywords: corporate social responsibility, CSR, firm value, tourism, corporate financial performance, CFP

Procedia PDF Downloads 252
6 Frequency Analysis Using Multiple Parameter Probability Distributions for Rainfall to Determine Suitable Probability Distribution in Pakistan

Authors: Tasir Khan, Yejuan Wang

Abstract:

The study of extreme rainfall events is very important for flood management in river basins and the design of water conservancy infrastructure. Evaluation of quantiles of annual maximum rainfall (AMRF) is required in different environmental fields, agriculture operations, renewable energy sources, climatology, and the design of different structures. Therefore, a frequency analysis of annual maximum rainfall (AMRF) was performed at different stations in Pakistan. Multiple probability distributions, log normal (LN), generalized extreme value (GEV), Gumbel (max), and Pearson type 3 (P3), were used to find the most appropriate distributions at different stations. The L-moments method was used to evaluate the distribution parameters. The Anderson-Darling test, Kolmogorov-Smirnov test, and chi-square test showed that two distributions, namely GUM (max) and LN, were the most appropriate distributions. The quantile estimate of a multi-parameter PD describes extreme rainfall at a specific location and is therefore important for decision-makers and planners who design and construct different structures. This result provides an indication of these multi-parameter distribution consequences for the study of sites and peak flow prediction and the design of hydrological maps. Therefore, this discovery can support hydraulic structure and flood management.

Keywords: RAMSE, multiple frequency analysis, annual maximum rainfall, L-moments

Procedia PDF Downloads 53
5 Modelling Hydrological Time Series Using Wakeby Distribution

Authors: Ilaria Lucrezia Amerise

Abstract:

The statistical modelling of precipitation data for a given portion of territory is fundamental for the monitoring of climatic conditions and for Hydrogeological Management Plans (HMP). This modelling is rendered particularly complex by the changes taking place in the frequency and intensity of precipitation, presumably attributable to global climate change. This paper applies the Wakeby distribution (with 5 parameters) as a theoretical reference model. The number and the quality of the parameters indicate that this distribution may be the appropriate choice for the interpolations of the hydrological variables and, moreover, the Wakeby is particularly suitable for describing phenomena producing heavy tails. The proposed estimation methods for determining the value of the Wakeby parameters are the same as those used for density functions with heavy tails. The commonly used procedure is the classic method of probability weighted moments (PWM), although this has often shown difficulty of convergence, or rather, convergence to a configuration of inappropriate parameters. In this paper, we analyze the problem of the likelihood estimation of a random variable expressed through its quantile function. The method of maximum likelihood, in this case, is more demanding than in more usual estimation settings. The reasons for pursuing it lie in the sampling and asymptotic properties of the maximum likelihood estimators, which supplement the estimates obtained with indications of their variability and, therefore, their accuracy and reliability. These features are highly appreciated in contexts where poor decisions, attributable to an inefficient or incomplete information base, can cause serious damage.

Keywords: generalized extreme values, likelihood estimation, precipitation data, Wakeby distribution

Procedia PDF Downloads 111
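
The Wakeby distribution in the entry above has no closed-form density, which is why likelihood estimation "through the quantile function" is more demanding. The sketch below shows one way the log-likelihood can be evaluated: invert the quantile function numerically for each observation, then use the fact that the density at x(F) is the reciprocal of dx/dF. It assumes a commonly used five-parameter form, x(F) = xi + (alpha/beta)(1 - (1-F)^beta) - (gamma/delta)(1 - (1-F)^(-delta)), with nonzero beta and delta and an increasing x(F). The paper's own parameterization and algorithm are not given in the abstract, so every name here is illustrative.

```python
import math


def wakeby_quantile(f, xi, alpha, beta, gamma, delta):
    """Wakeby quantile function x(F) in the assumed five-parameter form."""
    u = 1.0 - f
    return (xi + alpha / beta * (1.0 - u ** beta)
            - gamma / delta * (1.0 - u ** (-delta)))


def wakeby_quantile_density(f, alpha, beta, gamma, delta):
    """Derivative dx/dF; the probability density at x(F) is its reciprocal."""
    u = 1.0 - f
    return alpha * u ** (beta - 1.0) + gamma * u ** (-delta - 1.0)


def wakeby_cdf(x, params, tol=1e-12):
    """Solve x(F) = x for F by bisection, assuming x(F) is increasing."""
    lo, hi = 0.0, 1.0 - 1e-12
    if x <= wakeby_quantile(lo, *params):
        return lo
    if x >= wakeby_quantile(hi, *params):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if wakeby_quantile(mid, *params) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def wakeby_log_likelihood(data, params):
    """Sum of log densities, each obtained as -log(dx/dF) at the solved F."""
    _, alpha, beta, gamma, delta = params
    total = 0.0
    for x in data:
        f = wakeby_cdf(x, params)
        total -= math.log(wakeby_quantile_density(f, alpha, beta, gamma, delta))
    return total
```

A maximum likelihood fit would pass `wakeby_log_likelihood` to a numerical optimizer over the five parameters. Each evaluation requires one root-finding step per observation, which illustrates the extra cost the abstract refers to.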
4 Prenatal Lead Exposure and Postpartum Depression: An Exploratory Study of Women in Mexico

Authors: Nia McRae, Robert Wright, Ghalib Bello

Abstract:

Introduction: Postpartum depression is a prevalent mood disorder that is detrimental to the mental and physical health of mothers and their newborns. Lead (Pb) is a toxic metal that is associated with hormonal imbalance and mental impairments. The hormone changes that accompany pregnancy and childbirth may be exacerbated by Pb and increase new mothers’ susceptibility to postpartum depression. To the best of the author’s knowledge, this is the only study that investigates the association between prenatal Pb exposure and postpartum depression. Identifying risk factors can contribute to improved prevention and treatment strategies for postpartum depression. Methods: Data were derived from the Programming Research in Obesity, Growth, Environment and Social Stress (PROGRESS) study, which is an ongoing longitudinal birth cohort. Postpartum depression was identified by a score of 13 or above on the 10-item Edinburgh Postnatal Depression Scale (EPDS) at 6 months and 12 months postpartum. Pb was measured in the blood (BPb) in the second and third trimester and in the tibia and patella 1 month postpartum. Quantile regression models were used to assess the relationship between BPb and postpartum depression. Results: BPb in the second trimester was negatively associated with the 80th percentile of depression 6 months postpartum (β: -0.26; 95% CI: -0.51, -0.01). No significant association was found between BPb in the third trimester and depression 6 months postpartum. BPb in the third trimester exhibited an inverse relationship with the 60th percentile (β: -0.23; 95% CI: -0.41, -0.06), 70th percentile (β: -0.31; 95% CI: -0.52, -0.10), and 90th percentile of depression 12 months postpartum (β: -0.36; 95% CI: -0.69, -0.03). There was no significant association between BPb in the second trimester and depression 12 months postpartum. Bone Pb concentrations were not significantly associated with postpartum depression.
Conclusion: The negative association between BPb and postpartum depression may support research which demonstrates lead is a nontherapeutic stimulant. Further research is needed to verify these results and identify effect modifiers.

Keywords: depression, lead, postpartum, prenatal

Procedia PDF Downloads 196
3 The role of Financial Development and Institutional Quality in Promoting Sustainable Development through Tourism Management

Authors: Hashim Zameer

Abstract:

Effective tourism management plays a vital role in promoting sustainability and supporting ecosystems. A common principle that has been in practice over the years is “first pollute and then clean,” indicating that countries need financial resources to promote sustainability. Financial development and tourism management both seem very important to promoting sustainable development. However, without institutional support, it is very difficult to succeed. In this context, it is important to explore how institutional quality, tourism development, and financial development could promote sustainable development. In the past, no research explored the role of tourism development in sustainable development. Moreover, the role of financial development, natural resources, and institutional quality in sustainable development has also been ignored. In this regard, this paper aims to investigate the role of tourism development, natural resources, financial development, and institutional quality in sustainable development in China. The study used time-series data from 2000–2021 and employed the Bayesian linear regression model because it is suitable for small data sets. The robustness of the findings was checked using a quantile regression approach. The results reveal that an increase in tourism expenditures stimulates the economy, creates jobs, encourages cultural exchange, and supports sustainability initiatives. Moreover, financial development and institutional quality have a positive effect on sustainable development. However, reliance on natural resources can result in negative economic, social, and environmental outcomes, highlighting the need for resource diversification and management to reinforce sustainable development. These results highlight the significance of financial development, strong institutions, sustainable tourism, and careful utilization of natural resources for long-term sustainability.
The study holds vital insights for policy formulation to promote sustainable tourism.

Keywords: sustainability, tourism development, financial development, institutional quality

Procedia PDF Downloads 47
2 Comparing Xbar Charts: Conventional versus Reweighted Robust Estimation Methods for Univariate Data Sets

Authors: Ece Cigdem Mutlu, Burak Alakent

Abstract:

Maintaining the quality of manufactured products at a desired level depends on the stability of process dispersion and location parameters and detection of perturbations in these parameters as promptly as possible. The Shewhart control chart is the most widely used technique in statistical process monitoring to monitor the quality of products and control process mean and variability. In the application of Xbar control charts, sample standard deviation and sample mean are known to be the most efficient conventional estimators in determining process dispersion and location parameters, respectively, based on the assumption of independent and normally distributed datasets. On the other hand, there is no guarantee that real-world data will be normally distributed. When process parameters are estimated from Phase I data clouded with outliers, the efficiency of traditional estimators is significantly reduced, and the performance of Xbar charts is undesirably low; e.g., occasional outliers in the rational subgroups in the Phase I data set may considerably affect the sample mean and standard deviation, resulting in a serious delay in detection of inferior products in Phase II. For more efficient application of control charts, it is necessary to use estimators that are robust against contaminations which may exist in Phase I. In the current study, we present a simple approach to construct robust Xbar control charts using average distance to the median, Qn-estimator of scale, M-estimator of scale with logistic psi-function in the estimation of process dispersion parameter, and Harrell-Davis qth quantile estimator, Hodges-Lehmann estimator and M-estimator of location with Huber psi-function and logistic psi-function in the estimation of process location parameter.
Phase I efficiency of proposed estimators and Phase II performance of Xbar charts constructed from these estimators are compared with the conventional mean and standard deviation statistics both under normality and against diffuse-localized and symmetric-asymmetric contaminations using 50,000 Monte Carlo simulations on MATLAB. Consequently, it is found that robust estimators yield parameter estimates with higher efficiency against all types of contaminations, and Xbar charts constructed using robust estimators have higher power in detecting disturbances, compared to conventional methods. Additionally, utilizing individuals charts to screen outlier subgroups and employing different combination of dispersion and location estimators on subgroups and individual observations are found to improve the performance of Xbar charts.

Keywords: average run length, M-estimators, quality control, robust estimators

Procedia PDF Downloads 168
1 High School Gain Analytics From National Assessment Program – Literacy and Numeracy and Australian Tertiary Admission Rankin Linkage

Authors: Andrew Laming, John Hattie, Mark Wilson

Abstract:

Nine Queensland Independent high schools provided deidentified student-matched ATAR and NAPLAN data for all 1217 ATAR graduates since 2020 who also sat NAPLAN at the school. Graduating cohorts from the nine schools contained a mean of 100 ATAR graduates with previous NAPLAN data from their school. Excluded were vocational students (mean=27) and any ATAR graduates without NAPLAN data (mean=20). Based on Index of Community Socio-Educational Access (ICSEA) prediction, all schools had larger than predicted proportions of their students graduating with ATARs. There were an additional 173 students not releasing their ATARs to their school (14%), requiring this data to be inferred by schools. Gain was established by first converting each student’s strongest NAPLAN domain to a statewide percentile, then subtracting this result from final ATAR. The resulting ‘percentile shift’ was corrected for plausible ATAR participation at each NAPLAN level. Strongest NAPLAN domain had the highest correlation with ATAR (R2=0.58). RESULTS School mean NAPLAN scores fitted ICSEA closely (R2=0.97). Schools achieved a mean cohort gain of two ATAR rankings, but only 66% of students gained. This ranged from 46% of top-NAPLAN decile students gaining, rising to 75% achieving gains outside the top decile. The 54% of top-decile students whose ATAR fell short of prediction lost a mean of 4.0 percentiles (or 6.2 percentiles prior to correction for regression to the mean). 71% of students in smaller schools gained, compared to 63% in larger schools. NAPLAN variability in each of the 13 ICSEA1100 cohorts was 17%, with both intra-school and inter-school variation of these values extremely low (0.3% to 1.8%). Mean ATAR change between years in each school was just 1.1 ATAR ranks. This suggests consecutive school cohorts and ICSEA-similar schools share very similar distributions and outcomes over time.
Quantile analysis of the NAPLAN/ATAR relationship revealed heteroscedasticity, but splines offered little additional benefit over simple linear regression. The NAPLAN/ATAR R2 was 0.33. DISCUSSION Standardised data like NAPLAN and ATAR offer educators a simple no-cost progression metric to analyse performance in conjunction with their internal test results. Change is expressed in percentiles, or ATAR shift per student, which is intuitive to a layperson. Findings may also reduce ATAR/vocational stream mismatch, reveal proportions of cohorts meeting or falling short of expectation and demonstrate by how much. Finally, ‘crashed’ ATARs well below expectation are revealed, which schools can reasonably work to minimise. The percentile shift method is neither value-add nor a growth percentile. In the absence of exit NAPLAN testing, this metric is unable to discriminate academic gain from legitimate ATAR-maximizing strategies. But by controlling for ICSEA, ATAR proportion variation and student mobility, it uncovers progression to ATAR metrics which are not currently publicly available. However achieved, ATAR maximisation is a sought-after private good. So long as standardised nationwide data is available, this analysis offers useful analytics for educators and reasonable predictivity when counselling subsequent cohorts about their ATAR prospects.
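
The uncorrected ‘percentile shift’ described in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary layout, and the statewide reference samples are hypothetical, the percentile is taken as the share of reference scores at or below the student's score, and the "strongest" domain is taken to be the one with the highest statewide percentile. The abstract's correction for plausible ATAR participation at each NAPLAN level is not specified there and is not modelled here.

```python
from bisect import bisect_right


def percentile_rank(score, reference_sorted):
    """Percentage of reference scores that are at or below `score`.
    `reference_sorted` must already be sorted in ascending order."""
    return 100.0 * bisect_right(reference_sorted, score) / len(reference_sorted)


def percentile_shift(atar, domain_scores, statewide):
    """ATAR minus the statewide percentile of the strongest NAPLAN domain.

    domain_scores: {domain name: student's score} (hypothetical layout)
    statewide:     {domain name: list of statewide scores} (hypothetical data)
    """
    strongest = max(
        percentile_rank(score, sorted(statewide[domain]))
        for domain, score in domain_scores.items()
    )
    return atar - strongest
```

A positive result means the student's ATAR sits above the percentile of their strongest NAPLAN domain, and a negative result means it falls short of it.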

Keywords: NAPLAN, ATAR, analytics, measurement, gain, performance, data, percentile, value-added, high school, numeracy, reading comprehension, variability, regression to the mean

Procedia PDF Downloads 37