Search results for: multivariate models.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 6958

Search results for: multivariate models.

6958 Regression for Doubly Inflated Multivariate Poisson Distributions

Authors: Ishapathik Das, Sumen Sen, N. Rao Chaganty, Pooja Sengupta

Abstract:

Dependent multivariate count data occur in several research studies. These data can be modeled by a multivariate Poisson or Negative binomial distribution constructed using copulas. However, when some of the counts are inflated, that is, the number of observations in some cells are much larger than other cells, then the copula based multivariate Poisson (or Negative binomial) distribution may not fit well and it is not an appropriate statistical model for the data. There is a need to modify or adjust the multivariate distribution to account for the inflated frequencies. In this article, we consider the situation where the frequencies of two cells are higher compared to the other cells, and develop a doubly inflated multivariate Poisson distribution function using multivariate Gaussian copula. We also discuss procedures for regression on covariates for the doubly inflated multivariate count data. For illustrating the proposed methodologies, we present a real data containing bivariate count observations with inflations in two cells. Several models and linear predictors with log link functions are considered, and we discuss maximum likelihood estimation to estimate unknown parameters of the models.

Keywords: copula, Gaussian copula, multivariate distributions, inflated distributios

Procedia PDF Downloads 129
6957 Multivariate Genome-Wide Association Studies for Identifying Additional Loci for Myopia

Authors: Qiao Fan, Xiaobo Guo, Junxian Zhu, Xiaohu Ding, Ching-Yu Cheng, Tien-Yin Wong, Mingguang He, Heping Zhang, Xueqin Wang

Abstract:

A systematic, simultaneous analysis of multiple phenotypes in genome-wide association studies (GWASs) draws a great attention to integrate the signals from single phenotypes with increased power. However, lacking an interpretable and efficient multivariate GWAS analysis impede the application of such approach. In this study, we propose to decompose the multivariate model into a series of simple univariate models. This transformation illuminates what exactly the individual trait contributes to the significant signals from the multivariate analyses. By employing our approach in the analysis of three myopia-related endophenotypes from the Singapore Malay Eye Study (SIMES), we identify novel candidate loci which were successfully validated in an independent Guangzhou Twin Eye Study (GTES).

Keywords: GWAS multivariate, multiple traits, myopia, association

Procedia PDF Downloads 193
6956 Selection of Designs in Ordinal Regression Models under Linear Predictor Misspecification

Authors: Ishapathik Das

Abstract:

The purpose of this article is to find a method of comparing designs for ordinal regression models using quantile dispersion graphs in the presence of linear predictor misspecification. The true relationship between response variable and the corresponding control variables are usually unknown. Experimenter assumes certain form of the linear predictor of the ordinal regression models. The assumed form of the linear predictor may not be correct always. Thus, the maximum likelihood estimates (MLE) of the unknown parameters of the model may be biased due to misspecification of the linear predictor. In this article, the uncertainty in the linear predictor is represented by an unknown function. An algorithm is provided to estimate the unknown function at the design points where observations are available. The unknown function is estimated at all points in the design region using multivariate parametric kriging. The comparison of the designs are based on a scalar valued function of the mean squared error of prediction (MSEP) matrix, which incorporates both variance and bias of the prediction caused by the misspecification in the linear predictor. The designs are compared using quantile dispersion graphs approach. The graphs also visually depict the robustness of the designs on the changes in the parameter values. Numerical examples are presented to illustrate the proposed methodology.

Keywords: model misspecification, multivariate kriging, multivariate logistic link, ordinal response models, quantile dispersion graphs

Procedia PDF Downloads 356
6955 A Cohort and Empirical Based Multivariate Mortality Model

Authors: Jeffrey Tzu-Hao Tsai, Yi-Shan Wong

Abstract:

This article proposes a cohort-age-period (CAP) model to characterize multi-population mortality processes using cohort, age, and period variables. Distinct from the factor-based Lee-Carter-type decomposition mortality model, this approach is empirically based and includes the age, period, and cohort variables into the equation system. The model not only provides a fruitful intuition for explaining multivariate mortality change rates but also has a better performance in forecasting future patterns. Using the US and the UK mortality data and performing ten-year out-of-sample tests, our approach shows smaller mean square errors in both countries compared to the models in the literature.

Keywords: longevity risk, stochastic mortality model, multivariate mortality rate, risk management

Procedia PDF Downloads 22
6954 Fast Bayesian Inference of Multivariate Block-Nearest Neighbor Gaussian Process (NNGP) Models for Large Data

Authors: Carlos Gonzales, Zaida Quiroz, Marcos Prates

Abstract:

Several spatial variables collected at the same location that share a common spatial distribution can be modeled simultaneously through a multivariate geostatistical model that takes into account the correlation between these variables and the spatial autocorrelation. The main goal of this model is to perform spatial prediction of these variables in the region of study. Here we focus on a geostatistical multivariate formulation that relies on sharing common spatial random effect terms. In particular, the first response variable can be modeled by a mean that incorporates a shared random spatial effect, while the other response variables depend on this shared spatial term, in addition to specific random spatial effects. Each spatial random effect is defined through a Gaussian process with a valid covariance function, but in order to improve the computational efficiency when the data are large, each Gaussian process is approximated to a Gaussian random Markov field (GRMF), specifically to the block nearest neighbor Gaussian process (Block-NNGP). This approach involves dividing the spatial domain into several dependent blocks under certain constraints, where the cross blocks allow capturing the spatial dependence on a large scale, while each individual block captures the spatial dependence on a smaller scale. The multivariate geostatistical model belongs to the class of Latent Gaussian Models; thus, to achieve fast Bayesian inference, it is used the integrated nested Laplace approximation (INLA) method. The good performance of the proposed model is shown through simulations and applications for massive data.

Keywords: Block-NNGP, geostatistics, gaussian process, GRMF, INLA, multivariate models.

Procedia PDF Downloads 58
6953 Simultaneous Determination of Six Characterizing/Quality Parameters of Biodiesels via 1H NMR and Multivariate Calibration

Authors: Gustavo G. Shimamoto, Matthieu Tubino

Abstract:

The characterization and the quality of biodiesel samples are checked by determining several parameters. Considering a large number of analysis to be performed, as well as the disadvantages of the use of toxic solvents and waste generation, multivariate calibration is suggested to reduce the number of tests. In this work, hydrogen nuclear magnetic resonance (1H NMR) spectra were used to build multivariate models, from partial least squares (PLS) regression, in order to determine simultaneously six important characterizing and/or quality parameters of biodiesels: density at 20 ºC, kinematic viscosity at 40 ºC, iodine value, acid number, oxidative stability, and water content. Biodiesels from twelve different oils sources were used in this study: babassu, brown flaxseed, canola, corn, cottonseed, macauba almond, microalgae, palm kernel, residual frying, sesame, soybean, and sunflower. 1H NMR reflects the structures of the compounds present in biodiesel samples and showed suitable correlations with the six parameters. The PLS models were constructed with latent variables between 5 and 7, the obtained values of r(cal) and r(val) were greater than 0.994 and 0.989, respectively. In addition, the models were considered suitable to predict all the six parameters for external samples, taking into account the analytical speed to perform it. Thus, the alliance between 1H NMR and PLS showed to be appropriate to characterize and evaluate the quality of biodiesels, reducing significantly analysis time, the consumption of reagents/solvents, and waste generation. Therefore, the proposed methods can be considered to adhere to the principles of green chemistry.

Keywords: biodiesel, multivariate calibration, nuclear magnetic resonance, quality parameters

Procedia PDF Downloads 496
6952 Predicting Returns Volatilities and Correlations of Stock Indices Using Multivariate Conditional Autoregressive Range and Return Models

Authors: Shay Kee Tan, Kok Haur Ng, Jennifer So-Kuen Chan

Abstract:

This paper extends the conditional autoregressive range (CARR) model to multivariate CARR (MCARR) model and further to the two-stage MCARR-return model to model and forecast volatilities, correlations and returns of multiple financial assets. The first stage model fits the scaled realised Parkinson volatility measures using individual series and their pairwise sums of indices to the MCARR model to obtain in-sample estimates and forecasts of volatilities for these individual and pairwise sum series. Then covariances are calculated to construct the fitted variance-covariance matrix of returns which are imputed into the stage-two return model to capture the heteroskedasticity of assets’ returns. We investigate different choices of mean functions to describe the volatility dynamics. Empirical applications are based on the Standard and Poor 500, Dow Jones Industrial Average and Dow Jones United States Financial Service Indices. Results show that the stage-one MCARR models using asymmetric mean functions give better in-sample model fits than those based on symmetric mean functions. They also provide better out-of-sample volatility forecasts than those using CARR models based on two robust loss functions with the scaled realised open-to-close volatility measure as the proxy for the unobserved true volatility. We also find that the stage-two return models with constant means and multivariate Student-t errors give better in-sample fits than the Baba, Engle, Kraft, and Kroner type of generalized autoregressive conditional heteroskedasticity (BEKK-GARCH) models. The estimates and forecasts of value-at-risk (VaR) and conditional VaR based on the best MCARR-return models for each asset are provided and tested using Kupiec test to confirm the accuracy of the VaR forecasts.

Keywords: range-based volatility, correlation, multivariate CARR-return model, value-at-risk, conditional value-at-risk

Procedia PDF Downloads 67
6951 Spatial Time Series Models for Rice and Cassava Yields Based on Bayesian Linear Mixed Models

Authors: Panudet Saengseedam, Nanthachai Kantanantha

Abstract:

This paper proposes a linear mixed model (LMM) with spatial effects to forecast rice and cassava yields in Thailand at the same time. A multivariate conditional autoregressive (MCAR) model is assumed to present the spatial effects. A Bayesian method is used for parameter estimation via Gibbs sampling Markov Chain Monte Carlo (MCMC). The model is applied to the rice and cassava yields monthly data which have been extracted from the Office of Agricultural Economics, Ministry of Agriculture and Cooperatives of Thailand. The results show that the proposed model has better performance in most provinces in both fitting part and validation part compared to the simple exponential smoothing and conditional auto regressive models (CAR) from our previous study.

Keywords: Bayesian method, linear mixed model, multivariate conditional autoregressive model, spatial time series

Procedia PDF Downloads 372
6950 The Modality of Multivariate Skew Normal Mixture

Authors: Bader Alruwaili, Surajit Ray

Abstract:

Finite mixtures are a flexible and powerful tool that can be used for univariate and multivariate distributions, and a wide range of research analysis has been conducted based on the multivariate normal mixture and multivariate of a t-mixture. Determining the number of modes is an important activity that, in turn, allows one to determine the number of homogeneous groups in a population. Our work currently being carried out relates to the study of the modality of the skew normal distribution in the univariate and multivariate cases. For the skew normal distribution, the aims are associated with studying the modality of the skew normal distribution and providing the ridgeline, the ridgeline elevation function, the $\Pi$ function, and the curvature function, and this will be conducive to an exploration of the number and location of mode when mixing the two components of skew normal distribution. The subsequent objective is to apply these results to the application of real world data sets, such as flow cytometry data.

Keywords: mode, modality, multivariate skew normal, finite mixture, number of mode

Procedia PDF Downloads 459
6949 Multivariate Dependent Frequency-Severity Modeling of Insurance Claims: A Vine Copula Approach

Authors: Islem Kedidi, Rihab Bedoui Bensalem, Faysal Manssouri

Abstract:

In traditional models of insurance data, the number and size of claims are assumed to be independent. Relaxing the independence assumption, this article explores the Vine copula to model dependence structure between multivariate frequency and average severity of insurance claim. To illustrate this approach, we use the Wisconsin local government property insurance fund which offers several insurance protections for motor vehicles, property and contractor’s equipment claims. Results show that the C-vine copula can better characterize the multivariate dependence structure between frequency and severity. Furthermore, we find significant dependencies especially between frequency and average severity among different coverage types.

Keywords: dependency modeling, government insurance, insurance claims, vine copula

Procedia PDF Downloads 171
6948 The Use of Boosted Multivariate Trees in Medical Decision-Making for Repeated Measurements

Authors: Ebru Turgal, Beyza Doganay Erdogan

Abstract:

Machine learning aims to model the relationship between the response and features. Medical decision-making researchers would like to make decisions about patients’ course and treatment, by examining the repeated measurements over time. Boosting approach is now being used in machine learning area for these aims as an influential tool. The aim of this study is to show the usage of multivariate tree boosting in this field. The main reason for utilizing this approach in the field of decision-making is the ease solutions of complex relationships. To show how multivariate tree boosting method can be used to identify important features and feature-time interaction, we used the data, which was collected retrospectively from Ankara University Chest Diseases Department records. Dataset includes repeated PF ratio measurements. The follow-up time is planned for 120 hours. A set of different models is tested. In conclusion, main idea of classification with weighed combination of classifiers is a reliable method which was shown with simulations several times. Furthermore, time varying variables will be taken into consideration within this concept and it could be possible to make accurate decisions about regression and survival problems.

Keywords: boosted multivariate trees, longitudinal data, multivariate regression tree, panel data

Procedia PDF Downloads 171
6947 On the Impact of Oil Price Fluctuations on Stock Markets: A Multivariate Long-Memory GARCH Framework

Authors: Manel Youssef, Lotfi Belkacem

Abstract:

This paper employs multivariate long memory GARCH models to simultaneously estimate mean and conditional variance spillover effects between oil prices and different financial markets. Since different financial assets are traded based on these market sector returns, it’s important for financial market participants to understand the volatility transmission mechanism over time and across these series in order to make optimal portfolio allocation decisions. We examine weekly returns from January 1, 2003 to November 30, 2012 and find evidence of significant transmission of shocks and volatilities between oil prices and some of the examined financial markets. The findings support the idea of cross-market hedging and sharing of common information by investors.

Keywords: oil prices, stock indices returns, oil volatility, contagion, DCC-multivariate (FI) GARCH

Procedia PDF Downloads 498
6946 Analyzing the Influence of Hydrometeorlogical Extremes, Geological Setting, and Social Demographic on Public Health

Authors: Irfan Ahmad Afip

Abstract:

This main research objective is to accurately identify the possibility for a Leptospirosis outbreak severity of a certain area based on its input features into a multivariate regression model. The research question is the possibility of an outbreak in a specific area being influenced by this feature, such as social demographics and hydrometeorological extremes. If the occurrence of an outbreak is being subjected to these features, then the epidemic severity for an area will be different depending on its environmental setting because the features will influence the possibility and severity of an outbreak. Specifically, this research objective was three-fold, namely: (a) to identify the relevant multivariate features and visualize the patterns data, (b) to develop a multivariate regression model based from the selected features and determine the possibility for Leptospirosis outbreak in an area, and (c) to compare the predictive ability of multivariate regression model and machine learning algorithms. Several secondary data features were collected locations in the state of Negeri Sembilan, Malaysia, based on the possibility it would be relevant to determine the outbreak severity in the area. The relevant features then will become an input in a multivariate regression model; a linear regression model is a simple and quick solution for creating prognostic capabilities. A multivariate regression model has proven more precise prognostic capabilities than univariate models. The expected outcome from this research is to establish a correlation between the features of social demographic and hydrometeorological with Leptospirosis bacteria; it will also become a contributor for understanding the underlying relationship between the pathogen and the ecosystem. The relationship established can be beneficial for the health department or urban planner to inspect and prepare for future outcomes in event detection and system health monitoring.

Keywords: geographical information system, hydrometeorological, leptospirosis, multivariate regression

Procedia PDF Downloads 80
6945 Landslide Susceptibility Mapping: A Comparison between Logistic Regression and Multivariate Adaptive Regression Spline Models in the Municipality of Oudka, Northern of Morocco

Authors: S. Benchelha, H. C. Aoudjehane, M. Hakdaoui, R. El Hamdouni, H. Mansouri, T. Benchelha, M. Layelmam, M. Alaoui

Abstract:

The logistic regression (LR) and multivariate adaptive regression spline (MarSpline) are applied and verified for analysis of landslide susceptibility map in Oudka, Morocco, using geographical information system. From spatial database containing data such as landslide mapping, topography, soil, hydrology and lithology, the eight factors related to landslides such as elevation, slope, aspect, distance to streams, distance to road, distance to faults, lithology map and Normalized Difference Vegetation Index (NDVI) were calculated or extracted. Using these factors, landslide susceptibility indexes were calculated by the two mentioned methods. Before the calculation, this database was divided into two parts, the first for the formation of the model and the second for the validation. The results of the landslide susceptibility analysis were verified using success and prediction rates to evaluate the quality of these probabilistic models. The result of this verification was that the MarSpline model is the best model with a success rate (AUC = 0.963) and a prediction rate (AUC = 0.951) higher than the LR model (success rate AUC = 0.918, rate prediction AUC = 0.901).

Keywords: landslide susceptibility mapping, regression logistic, multivariate adaptive regression spline, Oudka, Taounate

Procedia PDF Downloads 159
6944 A Posteriori Trading-Inspired Model-Free Time Series Segmentation

Authors: Plessen Mogens Graf

Abstract:

Within the context of multivariate time series segmentation, this paper proposes a method inspired by a posteriori optimal trading. After a normalization step, time series are treated channelwise as surrogate stock prices that can be traded optimally a posteriori in a virtual portfolio holding either stock or cash. Linear transaction costs are interpreted as hyperparameters for noise filtering. Trading signals, as well as trading signals obtained on the reversed time series, are used for unsupervised channelwise labeling before a consensus over all channels is reached that determines the final segmentation time instants. The method is model-free such that no model prescriptions for segments are made. Benefits of proposed approach include simplicity, computational efficiency, and adaptability to a wide range of different shapes of time series. Performance is demonstrated on synthetic and real-world data, including a large-scale dataset comprising a multivariate time series of dimension 1000 and length 2709. Proposed method is compared to a popular model-based bottom-up approach fitting piecewise affine models and to a recent model-based top-down approach fitting Gaussian models and found to be consistently faster while producing more intuitive results in the sense of segmenting time series at peaks and valleys.

Keywords: time series segmentation, model-free, trading-inspired, multivariate data

Procedia PDF Downloads 105
6943 On-Line Data-Driven Multivariate Statistical Prediction Approach to Production Monitoring

Authors: Hyun-Woo Cho

Abstract:

Detection of incipient abnormal events in production processes is important to improve safety and reliability of manufacturing operations and reduce losses caused by failures. The construction of calibration models for predicting faulty conditions is quite essential in making decisions on when to perform preventive maintenance. This paper presents a multivariate calibration monitoring approach based on the statistical analysis of process measurement data. The calibration model is used to predict faulty conditions from historical reference data. This approach utilizes variable selection techniques, and the predictive performance of several prediction methods are evaluated using real data. The results shows that the calibration model based on supervised probabilistic model yielded best performance in this work. By adopting a proper variable selection scheme in calibration models, the prediction performance can be improved by excluding non-informative variables from their model building steps.

Keywords: calibration model, monitoring, quality improvement, feature selection

Procedia PDF Downloads 329
6942 Developing and Evaluating Clinical Risk Prediction Models for Coronary Artery Bypass Graft Surgery

Authors: Mohammadreza Mohebbi, Masoumeh Sanagou

Abstract:

The ability to predict clinical outcomes is of great importance to physicians and clinicians. A number of different methods have been used in an effort to accurately predict these outcomes. These methods include the development of scoring systems based on multivariate statistical modelling, and models involving the use of classification and regression trees. The process usually consists of two consecutive phases, namely model development and external validation. The model development phase consists of building a multivariate model and evaluating its predictive performance by examining calibration and discrimination, and internal validation. External validation tests the predictive performance of a model by assessing its calibration and discrimination in different but plausibly related patients. A motivate example focuses on prediction modeling using a sample of patients undergone coronary artery bypass graft (CABG) has been used for illustrative purpose and a set of primary considerations for evaluating prediction model studies using specific quality indicators as criteria to help stakeholders evaluate the quality of a prediction model study has been proposed.

Keywords: clinical prediction models, clinical decision rule, prognosis, external validation, model calibration, biostatistics

Procedia PDF Downloads 264
6941 Modelling Operational Risk Using Extreme Value Theory and Skew t-Copulas via Bayesian Inference

Authors: Betty Johanna Garzon Rozo, Jonathan Crook, Fernando Moreira

Abstract:

Operational risk losses are heavy tailed and are likely to be asymmetric and extremely dependent among business lines/event types. We propose a new methodology to assess, in a multivariate way, the asymmetry and extreme dependence between severity distributions, and to calculate the capital for Operational Risk. This methodology simultaneously uses (i) several parametric distributions and an alternative mix distribution (the Lognormal for the body of losses and the Generalized Pareto Distribution for the tail) via extreme value theory using SAS®, (ii) the multivariate skew t-copula applied for the first time for operational losses and (iii) Bayesian theory to estimate new n-dimensional skew t-copula models via Markov chain Monte Carlo (MCMC) simulation. This paper analyses a newly operational loss data set, SAS Global Operational Risk Data [SAS OpRisk], to model operational risk at international financial institutions. All the severity models are constructed in SAS® 9.2. We implement the procedure PROC SEVERITY and PROC NLMIXED. This paper focuses in describing this implementation.

Keywords: operational risk, loss distribution approach, extreme value theory, copulas

Procedia PDF Downloads 557
6940 Copper Price Prediction Model for Various Economic Situations

Authors: Haidy S. Ghali, Engy Serag, A. Samer Ezeldin

Abstract:

Copper is an essential raw material used in the construction industry. During the year 2021 and the first half of 2022, the global market suffered from a significant fluctuation in copper raw material prices due to the aftermath of both the COVID-19 pandemic and the Russia-Ukraine war, which exposed its consumers to an unexpected financial risk. Thereto, this paper aims to develop two ANN-LSTM price prediction models, using Python, that can forecast the average monthly copper prices traded in the London Metal Exchange; the first model is a multivariate model that forecasts the copper price of the next 1-month and the second is a univariate model that predicts the copper prices of the upcoming three months. Historical data of average monthly London Metal Exchange copper prices are collected from January 2009 till July 2022, and potential external factors are identified and employed in the multivariate model. These factors lie under three main categories: energy prices and economic indicators of the three major exporting countries of copper, depending on the data availability. Before developing the LSTM models, the collected external parameters are analyzed with respect to the copper prices using correlation and multicollinearity tests in R software; then, the parameters are further screened to select the parameters that influence the copper prices. Then, the two LSTM models are developed, and the dataset is divided into training, validation, and testing sets. The results show that the performance of the 3-Month prediction model is better than the 1-Month prediction model, but still, both models can act as predicting tools for diverse economic situations.

Keywords: copper prices, prediction model, neural network, time series forecasting

Procedia PDF Downloads 76
6939 An AK-Chart for the Non-Normal Data

Authors: Chia-Hau Liu, Tai-Yue Wang

Abstract:

Traditional multivariate control charts assume that measurement from manufacturing processes follows a multivariate normal distribution. However, this assumption may not hold or may be difficult to verify because not all the measurement from manufacturing processes are normal distributed in practice. This study develops a new multivariate control chart for monitoring the processes with non-normal data. We propose a mechanism based on integrating the one-class classification method and the adaptive technique. The adaptive technique is used to improve the sensitivity to small shift on one-class classification in statistical process control. In addition, this design provides an easy way to allocate the value of type I error so it is easier to be implemented. Finally, the simulation study and the real data from industry are used to demonstrate the effectiveness of the propose control charts.

Keywords: multivariate control chart, statistical process control, one-class classification method, non-normal data

Procedia PDF Downloads 391
6938 Diagonal Vector Autoregressive Models and Their Properties

Authors: Usoro Anthony E., Udoh Emediong

Abstract:

Diagonal Vector Autoregressive Models are special classes of the general vector autoregressive models identified under certain conditions, where parameters are restricted to the diagonal elements in the coefficient matrices. Variance, autocovariance, and autocorrelation properties of the upper and lower diagonal VAR models are derived. The new set of VAR models is verified with empirical data and is found to perform favourably with the general VAR models. The advantage of the diagonal models over the existing models is that the new models are parsimonious, given the reduction in the interactive coefficients of the general VAR models.

Keywords: VAR models, diagonal VAR models, variance, autocovariance, autocorrelations

Procedia PDF Downloads 80
6937 Multivariate Analysis of Spectroscopic Data for Agriculture Applications

Authors: Asmaa M. Hussein, Amr Wassal, Ahmed Farouk Al-Sadek, A. F. Abd El-Rahman

Abstract:

In this study, a multivariate analysis of potato spectroscopic data was presented to detect the presence of brown rot disease or not. Near-Infrared (NIR) spectroscopy (1,350-2,500 nm) combined with multivariate analysis was used as a rapid, non-destructive technique for the detection of brown rot disease in potatoes. Spectral measurements were performed in 565 samples, which were chosen randomly at the infection place in the potato slice. In this study, 254 infected and 311 uninfected (brown rot-free) samples were analyzed using different advanced statistical analysis techniques. The discrimination performance of different multivariate analysis techniques, including classification, pre-processing, and dimension reduction, were compared. Applying a random forest algorithm classifier with different pre-processing techniques to raw spectra had the best performance as the total classification accuracy of 98.7% was achieved in discriminating infected potatoes from control.

Keywords: Brown rot disease, NIR spectroscopy, potato, random forest

Procedia PDF Downloads 154
6936 The Influence of Covariance Hankel Matrix Dimension on Algorithms for VARMA Models

Authors: Celina Pestano-Gabino, Concepcion Gonzalez-Concepcion, M. Candelaria Gil-Fariña

Abstract:

Some estimation methods for VARMA models, and Multivariate Time Series Models in general, rely on the use of a Hankel matrix. It is known that if the data sample is populous enough and the dimension of the Hankel matrix is unnecessarily large, this may result in an unnecessary number of computations as well as in numerical problems. In this sense, the aim of this paper is two-fold. First, we provide some theoretical results for these matrices which translate into a lower dimension for the matrices normally used in the algorithms. This contribution thus serves to improve those methods from a numerical and, presumably, statistical point of view. Second, we have chosen an estimation algorithm to illustrate in practice our improvements. The results we obtained in a simulation of VARMA models show that an increase in the size of the Hankel matrix beyond the theoretical bound proposed as valid does not necessarily lead to improved practical results. Therefore, for future research, we propose conducting similar studies using any of the linear system estimation methods that depend on Hankel matrices.

Keywords: covariances Hankel matrices, Kronecker indices, system identification, VARMA models

Procedia PDF Downloads 214
6935 The Moment of the Optimal Average Length of the Multivariate Exponentially Weighted Moving Average Control Chart for Equally Correlated Variables

Authors: Edokpa Idemudia Waziri, Salisu S. Umar

Abstract:

The Hotellng’s T^2 is a well-known statistic for detecting a shift in the mean vector of a multivariate normal distribution. Control charts based on T have been widely used in statistical process control for monitoring a multivariate process. Although it is a powerful tool, the T statistic is deficient when the shift to be detected in the mean vector of a multivariate process is small and consistent. The Multivariate Exponentially Weighted Moving Average (MEWMA) control chart is one of the control statistics used to overcome the drawback of the Hotellng’s T statistic. In this paper, the probability distribution of the Average Run Length (ARL) of the MEWMA control chart when the quality characteristics exhibit substantial cross correlation and when the process is in-control and out-of-control was derived using the Markov Chain algorithm. The derivation of the probability functions and the moments of the run length distribution were also obtained and they were consistent with some existing results for the in-control and out-of-control situation. By simulation process, the procedure identified a class of ARL for the MEWMA control when the process is in-control and out-of-control. From our study, it was observed that the MEWMA scheme is quite adequate for detecting a small shift and a good way to improve the quality of goods and services in a multivariate situation. It was also observed that as the in-control average run length ARL0¬ or the number of variables (p) increases, the optimum value of the ARL0pt increases asymptotically and as the magnitude of the shift σ increases, the optimal ARLopt decreases. Finally, we use the example from the literature to illustrate our method and demonstrate its efficiency.

Keywords: average run length, markov chain, multivariate exponentially weighted moving average, optimal smoothing parameter

Procedia PDF Downloads 389
6934 Introduction of Robust Multivariate Process Capability Indices

Authors: Behrooz Khalilloo, Hamid Shahriari, Emad Roghanian

Abstract:

Process capability indices (PCIs) are important concepts of statistical quality control and measure the capability of processes and how much processes are meeting certain specifications. An important issue in statistical quality control is parameter estimation. Under the assumption of multivariate normality, the distribution parameters, mean vector and variance-covariance matrix must be estimated, when they are unknown. Classic estimation methods like method of moment estimation (MME) or maximum likelihood estimation (MLE) makes good estimation of the population parameters when data are not contaminated. But when outliers exist in the data, MME and MLE make weak estimators of the population parameters. So we need some estimators which have good estimation in the presence of outliers. In this work robust M-estimators for estimating these parameters are used and based on robust parameter estimators, robust process capability indices are introduced. The performances of these robust estimators in the presence of outliers and their effects on process capability indices are evaluated by real and simulated multivariate data. The results indicate that the proposed robust capability indices perform much better than the existing process capability indices.

Keywords: multivariate process capability indices, robust M-estimator, outlier, multivariate quality control, statistical quality control

Procedia PDF Downloads 249
6933 A Bayesian Multivariate Microeconometric Model for Estimation of Price Elasticity of Demand

Authors: Jefferson Hernandez, Juan Padilla

Abstract:

Estimation of price elasticity of demand is a valuable tool for the task of price settling. Given its relevance, it is an active field for microeconomic and statistical research. Price elasticity in the industry of oil and gas, in particular for fuels sold in gas stations, has shown to be a challenging topic given the market and state restrictions, and underlying correlations structures between the types of fuels sold by the same gas station. This paper explores the Lotka-Volterra model for the problem for price elasticity estimation in the context of fuels; in addition, it is introduced multivariate random effects with the purpose of dealing with errors, e.g., measurement or missing data errors. In order to model the underlying correlation structures, the Inverse-Wishart, Hierarchical Half-t and LKJ distributions are studied. Here, the Bayesian paradigm through Markov Chain Monte Carlo (MCMC) algorithms for model estimation is considered. Simulation studies covering a wide range of situations were performed in order to evaluate parameter recovery for the proposed models and algorithms. Results revealed that the proposed algorithms recovered quite well all model parameters. Also, a real data set analysis was performed in order to illustrate the proposed approach.

Keywords: price elasticity, volume, correlation structures, Bayesian models

Procedia PDF Downloads 123
6932 A Non-parametric Clustering Approach for Multivariate Geostatistical Data

Authors: Francky Fouedjio

Abstract:

Multivariate geostatistical data have become omnipresent in the geosciences and pose substantial analysis challenges. One of them is the grouping of data locations into spatially contiguous clusters so that data locations within the same cluster are more similar while clusters are different from each other, in some sense. Spatially contiguous clusters can significantly improve the interpretation that turns the resulting clusters into meaningful geographical subregions. In this paper, we develop an agglomerative hierarchical clustering approach that takes into account the spatial dependency between observations. It relies on a dissimilarity matrix built from a non-parametric kernel estimator of the spatial dependence structure of data. It integrates existing methods to find the optimal cluster number and to evaluate the contribution of variables to the clustering. The capability of the proposed approach to provide spatially compact, connected and meaningful clusters is assessed using bivariate synthetic dataset and multivariate geochemical dataset. The proposed clustering method gives satisfactory results compared to other similar geostatistical clustering methods.

Keywords: clustering, geostatistics, multivariate data, non-parametric

Procedia PDF Downloads 453
6931 On the Bootstrap P-Value Method in Identifying out of Control Signals in Multivariate Control Chart

Authors: O. Ikpotokin

Abstract:

In any production process, every product is aimed to attain a certain standard, but the presence of assignable cause of variability affects our process, thereby leading to low quality of product. The ability to identify and remove this type of variability reduces its overall effect, thereby improving the quality of the product. In case of a univariate control chart signal, it is easy to detect the problem and give a solution since it is related to a single quality characteristic. However, the problems involved in the use of multivariate control chart are the violation of multivariate normal assumption and the difficulty in identifying the quality characteristic(s) that resulted in the out of control signals. The purpose of this paper is to examine the use of non-parametric control chart (the bootstrap approach) for obtaining control limit to overcome the problem of multivariate distributional assumption and the p-value method for detecting out of control signals. Results from a performance study show that the proposed bootstrap method enables the setting of control limit that can enhance the detection of out of control signals when compared, while the p-value method also enhanced in identifying out of control variables.

Keywords: bootstrap control limit, p-value method, out-of-control signals, p-value, quality characteristics

Procedia PDF Downloads 319
6930 Combining the Dynamic Conditional Correlation and Range-GARCH Models to Improve Covariance Forecasts

Authors: Piotr Fiszeder, Marcin Fałdziński, Peter Molnár

Abstract:

The dynamic conditional correlation model of Engle (2002) is one of the most popular multivariate volatility models. However, this model is based solely on closing prices. It has been documented in the literature that the high and low price of the day can be used in an efficient volatility estimation. We, therefore, suggest a model which incorporates high and low prices into the dynamic conditional correlation framework. Empirical evaluation of this model is conducted on three datasets: currencies, stocks, and commodity exchange-traded funds. The utilisation of realized variances and covariances as proxies for true variances and covariances allows us to reach a strong conclusion that our model outperforms not only the standard dynamic conditional correlation model but also a competing range-based dynamic conditional correlation model.

Keywords: volatility, DCC model, high and low prices, range-based models, covariance forecasting

Procedia PDF Downloads 147
6929 Volatility Spillover and Hedging Effectiveness between Gold and Stock Markets: Evidence for BRICS Countries

Authors: Walid Chkili

Abstract:

This paper investigates the dynamic relationship between gold and stock markets using data for BRICS counties. For this purpose, we estimate three multivariate GARCH models (namely CCC, DCC and BEKK) for weekly stock and gold data. Our main objective is to examine time variations in conditional correlations between the two assets and to check the effectiveness use of gold as a hedge for equity markets. Empirical results reveal that dynamic conditional correlations switch between positive and negative values over the period under study. This correlation is negative during the major financial crises suggesting that gold can act as a safe haven during the major stress period of stock markets. We also evaluate the implications for portfolio diversification and hedging effectiveness for the pair gold/stock. Our findings suggest that adding gold in the stock portfolio enhance its risk-adjusted return.

Keywords: gold, financial markets, hedge, multivariate GARCH

Procedia PDF Downloads 437