Search results for: multivariate failure-time data
25395 Regression for Doubly Inflated Multivariate Poisson Distributions
Authors: Ishapathik Das, Sumen Sen, N. Rao Chaganty, Pooja Sengupta
Abstract:
Dependent multivariate count data occur in several research studies. These data can be modeled by a multivariate Poisson or Negative binomial distribution constructed using copulas. However, when some of the counts are inflated, that is, the number of observations in some cells are much larger than other cells, then the copula based multivariate Poisson (or Negative binomial) distribution may not fit well and it is not an appropriate statistical model for the data. There is a need to modify or adjust the multivariate distribution to account for the inflated frequencies. In this article, we consider the situation where the frequencies of two cells are higher compared to the other cells, and develop a doubly inflated multivariate Poisson distribution function using multivariate Gaussian copula. We also discuss procedures for regression on covariates for the doubly inflated multivariate count data. For illustrating the proposed methodologies, we present a real data containing bivariate count observations with inflations in two cells. Several models and linear predictors with log link functions are considered, and we discuss maximum likelihood estimation to estimate unknown parameters of the models.Keywords: copula, Gaussian copula, multivariate distributions, inflated distributios
Procedia PDF Downloads 15625394 The Modality of Multivariate Skew Normal Mixture
Authors: Bader Alruwaili, Surajit Ray
Abstract:
Finite mixtures are a flexible and powerful tool that can be used for univariate and multivariate distributions, and a wide range of research analysis has been conducted based on the multivariate normal mixture and multivariate of a t-mixture. Determining the number of modes is an important activity that, in turn, allows one to determine the number of homogeneous groups in a population. Our work currently being carried out relates to the study of the modality of the skew normal distribution in the univariate and multivariate cases. For the skew normal distribution, the aims are associated with studying the modality of the skew normal distribution and providing the ridgeline, the ridgeline elevation function, the $\Pi$ function, and the curvature function, and this will be conducive to an exploration of the number and location of mode when mixing the two components of skew normal distribution. The subsequent objective is to apply these results to the application of real world data sets, such as flow cytometry data.Keywords: mode, modality, multivariate skew normal, finite mixture, number of mode
Procedia PDF Downloads 48825393 Classification of Generative Adversarial Network Generated Multivariate Time Series Data Featuring Transformer-Based Deep Learning Architecture
Authors: Thrivikraman Aswathi, S. Advaith
Abstract:
As there can be cases where the use of real data is somehow limited, such as when it is hard to get access to a large volume of real data, we need to go for synthetic data generation. This produces high-quality synthetic data while maintaining the statistical properties of a specific dataset. In the present work, a generative adversarial network (GAN) is trained to produce multivariate time series (MTS) data since the MTS is now being gathered more often in various real-world systems. Furthermore, the GAN-generated MTS data is fed into a transformer-based deep learning architecture that carries out the data categorization into predefined classes. Further, the model is evaluated across various distinct domains by generating corresponding MTS data.Keywords: GAN, transformer, classification, multivariate time series
Procedia PDF Downloads 13025392 An AK-Chart for the Non-Normal Data
Authors: Chia-Hau Liu, Tai-Yue Wang
Abstract:
Traditional multivariate control charts assume that measurement from manufacturing processes follows a multivariate normal distribution. However, this assumption may not hold or may be difficult to verify because not all the measurement from manufacturing processes are normal distributed in practice. This study develops a new multivariate control chart for monitoring the processes with non-normal data. We propose a mechanism based on integrating the one-class classification method and the adaptive technique. The adaptive technique is used to improve the sensitivity to small shift on one-class classification in statistical process control. In addition, this design provides an easy way to allocate the value of type I error so it is easier to be implemented. Finally, the simulation study and the real data from industry are used to demonstrate the effectiveness of the propose control charts.Keywords: multivariate control chart, statistical process control, one-class classification method, non-normal data
Procedia PDF Downloads 42225391 A Non-parametric Clustering Approach for Multivariate Geostatistical Data
Authors: Francky Fouedjio
Abstract:
Multivariate geostatistical data have become omnipresent in the geosciences and pose substantial analysis challenges. One of them is the grouping of data locations into spatially contiguous clusters so that data locations within the same cluster are more similar while clusters are different from each other, in some sense. Spatially contiguous clusters can significantly improve the interpretation that turns the resulting clusters into meaningful geographical subregions. In this paper, we develop an agglomerative hierarchical clustering approach that takes into account the spatial dependency between observations. It relies on a dissimilarity matrix built from a non-parametric kernel estimator of the spatial dependence structure of data. It integrates existing methods to find the optimal cluster number and to evaluate the contribution of variables to the clustering. The capability of the proposed approach to provide spatially compact, connected and meaningful clusters is assessed using bivariate synthetic dataset and multivariate geochemical dataset. The proposed clustering method gives satisfactory results compared to other similar geostatistical clustering methods.Keywords: clustering, geostatistics, multivariate data, non-parametric
Procedia PDF Downloads 47725390 Model of Optimal Centroids Approach for Multivariate Data Classification
Authors: Pham Van Nha, Le Cam Binh
Abstract:
Particle swarm optimization (PSO) is a population-based stochastic optimization algorithm. PSO was inspired by the natural behavior of birds and fish in migration and foraging for food. PSO is considered as a multidisciplinary optimization model that can be applied in various optimization problems. PSO’s ideas are simple and easy to understand but PSO is only applied in simple model problems. We think that in order to expand the applicability of PSO in complex problems, PSO should be described more explicitly in the form of a mathematical model. In this paper, we represent PSO in a mathematical model and apply in the multivariate data classification. First, PSOs general mathematical model (MPSO) is analyzed as a universal optimization model. Then, Model of Optimal Centroids (MOC) is proposed for the multivariate data classification. Experiments were conducted on some benchmark data sets to prove the effectiveness of MOC compared with several proposed schemes.Keywords: analysis of optimization, artificial intelligence based optimization, optimization for learning and data analysis, global optimization
Procedia PDF Downloads 20825389 Introduction of Robust Multivariate Process Capability Indices
Authors: Behrooz Khalilloo, Hamid Shahriari, Emad Roghanian
Abstract:
Process capability indices (PCIs) are important concepts of statistical quality control and measure the capability of processes and how much processes are meeting certain specifications. An important issue in statistical quality control is parameter estimation. Under the assumption of multivariate normality, the distribution parameters, mean vector and variance-covariance matrix must be estimated, when they are unknown. Classic estimation methods like method of moment estimation (MME) or maximum likelihood estimation (MLE) makes good estimation of the population parameters when data are not contaminated. But when outliers exist in the data, MME and MLE make weak estimators of the population parameters. So we need some estimators which have good estimation in the presence of outliers. In this work robust M-estimators for estimating these parameters are used and based on robust parameter estimators, robust process capability indices are introduced. The performances of these robust estimators in the presence of outliers and their effects on process capability indices are evaluated by real and simulated multivariate data. The results indicate that the proposed robust capability indices perform much better than the existing process capability indices.Keywords: multivariate process capability indices, robust M-estimator, outlier, multivariate quality control, statistical quality control
Procedia PDF Downloads 28325388 Multivariate Analysis of Spectroscopic Data for Agriculture Applications
Authors: Asmaa M. Hussein, Amr Wassal, Ahmed Farouk Al-Sadek, A. F. Abd El-Rahman
Abstract:
In this study, a multivariate analysis of potato spectroscopic data was presented to detect the presence of brown rot disease or not. Near-Infrared (NIR) spectroscopy (1,350-2,500 nm) combined with multivariate analysis was used as a rapid, non-destructive technique for the detection of brown rot disease in potatoes. Spectral measurements were performed in 565 samples, which were chosen randomly at the infection place in the potato slice. In this study, 254 infected and 311 uninfected (brown rot-free) samples were analyzed using different advanced statistical analysis techniques. The discrimination performance of different multivariate analysis techniques, including classification, pre-processing, and dimension reduction, were compared. Applying a random forest algorithm classifier with different pre-processing techniques to raw spectra had the best performance as the total classification accuracy of 98.7% was achieved in discriminating infected potatoes from control.Keywords: Brown rot disease, NIR spectroscopy, potato, random forest
Procedia PDF Downloads 19025387 The Use of Boosted Multivariate Trees in Medical Decision-Making for Repeated Measurements
Authors: Ebru Turgal, Beyza Doganay Erdogan
Abstract:
Machine learning aims to model the relationship between the response and features. Medical decision-making researchers would like to make decisions about patients’ course and treatment, by examining the repeated measurements over time. Boosting approach is now being used in machine learning area for these aims as an influential tool. The aim of this study is to show the usage of multivariate tree boosting in this field. The main reason for utilizing this approach in the field of decision-making is the ease solutions of complex relationships. To show how multivariate tree boosting method can be used to identify important features and feature-time interaction, we used the data, which was collected retrospectively from Ankara University Chest Diseases Department records. Dataset includes repeated PF ratio measurements. The follow-up time is planned for 120 hours. A set of different models is tested. In conclusion, main idea of classification with weighed combination of classifiers is a reliable method which was shown with simulations several times. Furthermore, time varying variables will be taken into consideration within this concept and it could be possible to make accurate decisions about regression and survival problems.Keywords: boosted multivariate trees, longitudinal data, multivariate regression tree, panel data
Procedia PDF Downloads 20325386 Multi-scale Spatial and Unified Temporal Feature-fusion Network for Multivariate Time Series Anomaly Detection
Authors: Hang Yang, Jichao Li, Kewei Yang, Tianyang Lei
Abstract:
Multivariate time series anomaly detection is a significant research topic in the field of data mining, encompassing a wide range of applications across various industrial sectors such as traffic roads, financial logistics, and corporate production. The inherent spatial dependencies and temporal characteristics present in multivariate time series introduce challenges to the anomaly detection task. Previous studies have typically been based on the assumption that all variables belong to the same spatial hierarchy, neglecting the multi-level spatial relationships. To address this challenge, this paper proposes a multi-scale spatial and unified temporal feature fusion network, denoted as MSUT-Net, for multivariate time series anomaly detection. The proposed model employs a multi-level modeling approach, incorporating both temporal and spatial modules. The spatial module is designed to capture the spatial characteristics of multivariate time series data, utilizing an adaptive graph structure learning model to identify the multi-level spatial relationships between data variables and their attributes. The temporal module consists of a unified temporal processing module, which is tasked with capturing the temporal features of multivariate time series. This module is capable of simultaneously identifying temporal dependencies among different variables. Extensive testing on multiple publicly available datasets confirms that MSUT-Net achieves superior performance on the majority of datasets. Our method is able to model and accurately detect systems data with multi-level spatial relationships from a spatial-temporal perspective, providing a novel perspective for anomaly detection analysis.Keywords: data mining, industrial system, multivariate time series, anomaly detection
Procedia PDF Downloads 1525385 A Cohort and Empirical Based Multivariate Mortality Model
Authors: Jeffrey Tzu-Hao Tsai, Yi-Shan Wong
Abstract:
This article proposes a cohort-age-period (CAP) model to characterize multi-population mortality processes using cohort, age, and period variables. Distinct from the factor-based Lee-Carter-type decomposition mortality model, this approach is empirically based and includes the age, period, and cohort variables into the equation system. The model not only provides a fruitful intuition for explaining multivariate mortality change rates but also has a better performance in forecasting future patterns. Using the US and the UK mortality data and performing ten-year out-of-sample tests, our approach shows smaller mean square errors in both countries compared to the models in the literature.Keywords: longevity risk, stochastic mortality model, multivariate mortality rate, risk management
Procedia PDF Downloads 5325384 Prediction of Marine Ecosystem Changes Based on the Integrated Analysis of Multivariate Data Sets
Authors: Prozorkevitch D., Mishurov A., Sokolov K., Karsakov L., Pestrikova L.
Abstract:
The current body of knowledge about the marine environment and the dynamics of marine ecosystems includes a huge amount of heterogeneous data collected over decades. It generally includes a wide range of hydrological, biological and fishery data. Marine researchers collect these data and analyze how and why the ecosystem changes from past to present. Based on these historical records and linkages between the processes it is possible to predict future changes. Multivariate analysis of trends and their interconnection in the marine ecosystem may be used as an instrument for predicting further ecosystem evolution. A wide range of information about the components of the marine ecosystem for more than 50 years needs to be used to investigate how these arrays can help to predict the future.Keywords: barents sea ecosystem, abiotic, biotic, data sets, trends, prediction
Procedia PDF Downloads 11625383 Analyzing the Influence of Hydrometeorlogical Extremes, Geological Setting, and Social Demographic on Public Health
Authors: Irfan Ahmad Afip
Abstract:
This main research objective is to accurately identify the possibility for a Leptospirosis outbreak severity of a certain area based on its input features into a multivariate regression model. The research question is the possibility of an outbreak in a specific area being influenced by this feature, such as social demographics and hydrometeorological extremes. If the occurrence of an outbreak is being subjected to these features, then the epidemic severity for an area will be different depending on its environmental setting because the features will influence the possibility and severity of an outbreak. Specifically, this research objective was three-fold, namely: (a) to identify the relevant multivariate features and visualize the patterns data, (b) to develop a multivariate regression model based from the selected features and determine the possibility for Leptospirosis outbreak in an area, and (c) to compare the predictive ability of multivariate regression model and machine learning algorithms. Several secondary data features were collected locations in the state of Negeri Sembilan, Malaysia, based on the possibility it would be relevant to determine the outbreak severity in the area. The relevant features then will become an input in a multivariate regression model; a linear regression model is a simple and quick solution for creating prognostic capabilities. A multivariate regression model has proven more precise prognostic capabilities than univariate models. The expected outcome from this research is to establish a correlation between the features of social demographic and hydrometeorological with Leptospirosis bacteria; it will also become a contributor for understanding the underlying relationship between the pathogen and the ecosystem. The relationship established can be beneficial for the health department or urban planner to inspect and prepare for future outcomes in event detection and system health monitoring.Keywords: geographical information system, hydrometeorological, leptospirosis, multivariate regression
Procedia PDF Downloads 11525382 Multivariate Genome-Wide Association Studies for Identifying Additional Loci for Myopia
Authors: Qiao Fan, Xiaobo Guo, Junxian Zhu, Xiaohu Ding, Ching-Yu Cheng, Tien-Yin Wong, Mingguang He, Heping Zhang, Xueqin Wang
Abstract:
A systematic, simultaneous analysis of multiple phenotypes in genome-wide association studies (GWASs) draws a great attention to integrate the signals from single phenotypes with increased power. However, lacking an interpretable and efficient multivariate GWAS analysis impede the application of such approach. In this study, we propose to decompose the multivariate model into a series of simple univariate models. This transformation illuminates what exactly the individual trait contributes to the significant signals from the multivariate analyses. By employing our approach in the analysis of three myopia-related endophenotypes from the Singapore Malay Eye Study (SIMES), we identify novel candidate loci which were successfully validated in an independent Guangzhou Twin Eye Study (GTES).Keywords: GWAS multivariate, multiple traits, myopia, association
Procedia PDF Downloads 22425381 Dissimilarity-Based Coloring for Symbolic and Multivariate Data Visualization
Authors: K. Umbleja, M. Ichino, H. Yaguchi
Abstract:
In this paper, we propose a coloring method for multivariate data visualization by using parallel coordinates based on dissimilarity and tree structure information gathered during hierarchical clustering. The proposed method is an extension for proximity-based coloring that suffers from a few undesired side effects if hierarchical tree structure is not balanced tree. We describe the algorithm by assigning colors based on dissimilarity information, show the application of proposed method on three commonly used datasets, and compare the results with proximity-based coloring. We found our proposed method to be especially beneficial for symbolic data visualization where many individual objects have already been aggregated into a single symbolic object.Keywords: data visualization, dissimilarity-based coloring, proximity-based coloring, symbolic data
Procedia PDF Downloads 17025380 Multivariate Dependent Frequency-Severity Modeling of Insurance Claims: A Vine Copula Approach
Authors: Islem Kedidi, Rihab Bedoui Bensalem, Faysal Manssouri
Abstract:
In traditional models of insurance data, the number and size of claims are assumed to be independent. Relaxing the independence assumption, this article explores the Vine copula to model dependence structure between multivariate frequency and average severity of insurance claim. To illustrate this approach, we use the Wisconsin local government property insurance fund which offers several insurance protections for motor vehicles, property and contractor’s equipment claims. Results show that the C-vine copula can better characterize the multivariate dependence structure between frequency and severity. Furthermore, we find significant dependencies especially between frequency and average severity among different coverage types.Keywords: dependency modeling, government insurance, insurance claims, vine copula
Procedia PDF Downloads 20825379 Fast Bayesian Inference of Multivariate Block-Nearest Neighbor Gaussian Process (NNGP) Models for Large Data
Authors: Carlos Gonzales, Zaida Quiroz, Marcos Prates
Abstract:
Several spatial variables collected at the same location that share a common spatial distribution can be modeled simultaneously through a multivariate geostatistical model that takes into account the correlation between these variables and the spatial autocorrelation. The main goal of this model is to perform spatial prediction of these variables in the region of study. Here we focus on a geostatistical multivariate formulation that relies on sharing common spatial random effect terms. In particular, the first response variable can be modeled by a mean that incorporates a shared random spatial effect, while the other response variables depend on this shared spatial term, in addition to specific random spatial effects. Each spatial random effect is defined through a Gaussian process with a valid covariance function, but in order to improve the computational efficiency when the data are large, each Gaussian process is approximated to a Gaussian random Markov field (GRMF), specifically to the block nearest neighbor Gaussian process (Block-NNGP). This approach involves dividing the spatial domain into several dependent blocks under certain constraints, where the cross blocks allow capturing the spatial dependence on a large scale, while each individual block captures the spatial dependence on a smaller scale. The multivariate geostatistical model belongs to the class of Latent Gaussian Models; thus, to achieve fast Bayesian inference, it is used the integrated nested Laplace approximation (INLA) method. The good performance of the proposed model is shown through simulations and applications for massive data.Keywords: Block-NNGP, geostatistics, gaussian process, GRMF, INLA, multivariate models.
Procedia PDF Downloads 9725378 Effects of Video Games and Online Chat on Mathematics Performance in High School: An Approach of Multivariate Data Analysis
Authors: Lina Wu, Wenyi Lu, Ye Li
Abstract:
Regarding heavy video game players for boys and super online chat lovers for girls as a symbolic phrase in the current adolescent culture, this project of data analysis verifies the displacement effect on deteriorating mathematics performance. To evaluate correlation or regression coefficients between a factor of playing video games or chatting online and mathematics performance compared with other factors, we use multivariate analysis technique and take gender difference into account. We find the most important reason for the negative sign of the displacement effect on mathematics performance due to students’ poor academic background. Statistical analysis methods in this project could be applied to study internet users’ academic performance from the high school education to the college education.Keywords: correlation coefficients, displacement effect, multivariate analysis technique, regression coefficients
Procedia PDF Downloads 36425377 Timely Detection and Identification of Abnormalities for Process Monitoring
Authors: Hyun-Woo Cho
Abstract:
The detection and identification of multivariate manufacturing processes are quite important in order to maintain good product quality. Unusual behaviors or events encountered during its operation can have a serious impact on the process and product quality. Thus they should be detected and identified as soon as possible. This paper focused on the efficient representation of process measurement data in detecting and identifying abnormalities. This qualitative method is effective in representing fault patterns of process data. In addition, it is quite sensitive to measurement noise so that reliable outcomes can be obtained. To evaluate its performance a simulation process was utilized, and the effect of adopting linear and nonlinear methods in the detection and identification was tested with different simulation data. It has shown that the use of a nonlinear technique produced more satisfactory and more robust results for the simulation data sets. This monitoring framework can help operating personnel to detect the occurrence of process abnormalities and identify their assignable causes in an on-line or real-time basis.Keywords: detection, monitoring, identification, measurement data, multivariate techniques
Procedia PDF Downloads 23625376 Testing the Change in Correlation Structure across Markets: High-Dimensional Data
Authors: Malay Bhattacharyya, Saparya Suresh
Abstract:
The Correlation Structure associated with a portfolio is subjected to vary across time. Studying the structural breaks in the time-dependent Correlation matrix associated with a collection had been a subject of interest for a better understanding of the market movements, portfolio selection, etc. The current paper proposes a methodology for testing the change in the time-dependent correlation structure of a portfolio in the high dimensional data using the techniques of generalized inverse, singular valued decomposition and multivariate distribution theory which has not been addressed so far. The asymptotic properties of the proposed test are derived. Also, the performance and the validity of the method is tested on a real data set. The proposed test performs well for detecting the change in the dependence of global markets in the context of high dimensional data.Keywords: correlation structure, high dimensional data, multivariate distribution theory, singular valued decomposition
Procedia PDF Downloads 12525375 Multivariate Assessment of Mathematics Test Scores of Students in Qatar
Authors: Ali Rashash Alzahrani, Elizabeth Stojanovski
Abstract:
Data on various aspects of education are collected at the institutional and government level regularly. In Australia, for example, students at various levels of schooling undertake examinations in numeracy and literacy as part of NAPLAN testing, enabling longitudinal assessment of such data as well as comparisons between schools and states within Australia. Another source of educational data collected internationally is via the PISA study which collects data from several countries when students are approximately 15 years of age and enables comparisons in the performance of science, mathematics and English between countries as well as ranking of countries based on performance in these standardised tests. As well as student and school outcomes based on the tests taken as part of the PISA study, there is a wealth of other data collected in the study including parental demographics data and data related to teaching strategies used by educators. Overall, an abundance of educational data is available which has the potential to be used to help improve educational attainment and teaching of content in order to improve learning outcomes. A multivariate assessment of such data enables multiple variables to be considered simultaneously and will be used in the present study to help develop profiles of students based on performance in mathematics using data obtained from the PISA study.Keywords: cluster analysis, education, mathematics, profiles
Procedia PDF Downloads 12625374 The Moment of the Optimal Average Length of the Multivariate Exponentially Weighted Moving Average Control Chart for Equally Correlated Variables
Authors: Edokpa Idemudia Waziri, Salisu S. Umar
Abstract:
The Hotellng’s T^2 is a well-known statistic for detecting a shift in the mean vector of a multivariate normal distribution. Control charts based on T have been widely used in statistical process control for monitoring a multivariate process. Although it is a powerful tool, the T statistic is deficient when the shift to be detected in the mean vector of a multivariate process is small and consistent. The Multivariate Exponentially Weighted Moving Average (MEWMA) control chart is one of the control statistics used to overcome the drawback of the Hotellng’s T statistic. In this paper, the probability distribution of the Average Run Length (ARL) of the MEWMA control chart when the quality characteristics exhibit substantial cross correlation and when the process is in-control and out-of-control was derived using the Markov Chain algorithm. The derivation of the probability functions and the moments of the run length distribution were also obtained and they were consistent with some existing results for the in-control and out-of-control situation. By simulation process, the procedure identified a class of ARL for the MEWMA control when the process is in-control and out-of-control. From our study, it was observed that the MEWMA scheme is quite adequate for detecting a small shift and a good way to improve the quality of goods and services in a multivariate situation. It was also observed that as the in-control average run length ARL0¬ or the number of variables (p) increases, the optimum value of the ARL0pt increases asymptotically and as the magnitude of the shift σ increases, the optimal ARLopt decreases. Finally, we use the example from the literature to illustrate our method and demonstrate its efficiency.Keywords: average run length, markov chain, multivariate exponentially weighted moving average, optimal smoothing parameter
Procedia PDF Downloads 42225373 Multivariate Statistical Process Monitoring of Base Metal Flotation Plant Using Dissimilarity Scale-Based Singular Spectrum Analysis
Authors: Syamala Krishnannair
Abstract:
A multivariate statistical process monitoring methodology using dissimilarity scale-based singular spectrum analysis (SSA) is proposed for the detection and diagnosis of process faults in the base metal flotation plant. Process faults are detected based on the multi-level decomposition of process signals by SSA using the dissimilarity structure of the process data and the subsequent monitoring of the multiscale signals using the unified monitoring index which combines T² with SPE. Contribution plots are used to identify the root causes of the process faults. The overall results indicated that the proposed technique outperformed the conventional multivariate techniques in the detection and diagnosis of the process faults in the flotation plant.Keywords: fault detection, fault diagnosis, process monitoring, dissimilarity scale
Procedia PDF Downloads 20925372 A Data-Driven Monitoring Technique Using Combined Anomaly Detectors
Authors: Fouzi Harrou, Ying Sun, Sofiane Khadraoui
Abstract:
Anomaly detection based on Principal Component Analysis (PCA) was studied intensively and largely applied to multivariate processes with highly cross-correlated process variables. Monitoring metrics such as the Hotelling's T2 and the Q statistics are usually used in PCA-based monitoring to elucidate the pattern variations in the principal and residual subspaces, respectively. However, these metrics are ill suited to detect small faults. In this paper, the Exponentially Weighted Moving Average (EWMA) based on the Q and T statistics, T2-EWMA and Q-EWMA, were developed for detecting faults in the process mean. The performance of the proposed methods was compared with that of the conventional PCA-based fault detection method using synthetic data. The results clearly show the benefit and the effectiveness of the proposed methods over the conventional PCA method, especially for detecting small faults in highly correlated multivariate data.Keywords: data-driven method, process control, anomaly detection, dimensionality reduction
Procedia PDF Downloads 29925371 Discrimination Between Bacillus and Alicyclobacillus Isolates in Apple Juice by Fourier Transform Infrared Spectroscopy and Multivariate Analysis
Authors: Murada Alholy, Mengshi Lin, Omar Alhaj, Mahmoud Abugoush
Abstract:
Alicyclobacillus is a causative agent of spoilage in pasteurized and heat-treated apple juice products. Differentiating between this genus and the closely related Bacillus is crucially important. In this study, Fourier transform infrared spectroscopy (FT-IR) was used to identify and discriminate between four Alicyclobacillus strains and four Bacillus isolates inoculated individually into apple juice. Loading plots over the range of 1350 and 1700 cm-1 reflected the most distinctive biochemical features of Bacillus and Alicyclobacillus. Multivariate statistical methods (e.g. principal component analysis (PCA) and soft independent modeling of class analogy (SIMCA)) were used to analyze the spectral data. Distinctive separation of spectral samples was observed. This study demonstrates that FT-IR spectroscopy in combination with multivariate analysis could serve as a rapid and effective tool for fruit juice industry to differentiate between Bacillus and Alicyclobacillus and to distinguish between species belonging to these two genera.Keywords: alicyclobacillus, bacillus, FT-IR, spectroscopy, PCA
Procedia PDF Downloads 48825370 On-Line Data-Driven Multivariate Statistical Prediction Approach to Production Monitoring
Authors: Hyun-Woo Cho
Abstract:
Detection of incipient abnormal events in production processes is important to improve safety and reliability of manufacturing operations and reduce losses caused by failures. The construction of calibration models for predicting faulty conditions is quite essential in making decisions on when to perform preventive maintenance. This paper presents a multivariate calibration monitoring approach based on the statistical analysis of process measurement data. The calibration model is used to predict faulty conditions from historical reference data. This approach utilizes variable selection techniques, and the predictive performance of several prediction methods are evaluated using real data. The results shows that the calibration model based on supervised probabilistic model yielded best performance in this work. By adopting a proper variable selection scheme in calibration models, the prediction performance can be improved by excluding non-informative variables from their model building steps.Keywords: calibration model, monitoring, quality improvement, feature selection
Procedia PDF Downloads 35625369 On the Bootstrap P-Value Method in Identifying out of Control Signals in Multivariate Control Chart
Authors: O. Ikpotokin
Abstract:
In any production process, every product is aimed to attain a certain standard, but the presence of assignable cause of variability affects our process, thereby leading to low quality of product. The ability to identify and remove this type of variability reduces its overall effect, thereby improving the quality of the product. In case of a univariate control chart signal, it is easy to detect the problem and give a solution since it is related to a single quality characteristic. However, the problems involved in the use of multivariate control chart are the violation of multivariate normal assumption and the difficulty in identifying the quality characteristic(s) that resulted in the out of control signals. The purpose of this paper is to examine the use of non-parametric control chart (the bootstrap approach) for obtaining control limit to overcome the problem of multivariate distributional assumption and the p-value method for detecting out of control signals. Results from a performance study show that the proposed bootstrap method enables the setting of control limit that can enhance the detection of out of control signals when compared, while the p-value method also enhanced in identifying out of control variables.Keywords: bootstrap control limit, p-value method, out-of-control signals, p-value, quality characteristics
Procedia PDF Downloads 34725368 Qualitative Data Analysis for Health Care Services
Authors: Taner Ersoz, Filiz Ersoz
Abstract:
This study was designed enable application of multivariate technique in the interpretation of categorical data for measuring health care services satisfaction in Turkey. The data was collected from a total of 17726 respondents. The establishment of the sample group and collection of the data were carried out by a joint team from The Ministry of Health and Turkish Statistical Institute (Turk Stat) of Turkey. The multiple correspondence analysis (MCA) was used on the data of 2882 respondents who answered the questionnaire in full. The multiple correspondence analysis indicated that, in the evaluation of health services females, public employees, younger and more highly educated individuals were more concerned and complainant than males, private sector employees, older and less educated individuals. Overall 53 % of the respondents were pleased with the improvements in health care services in the past three years. This study demonstrates the public consciousness in health services and health care satisfaction in Turkey. It was found that most the respondents were pleased with the improvements in health care services over the past three years. Awareness of health service quality increases with education levels. Older individuals and males would appear to have lower expectancies in health services.Keywords: multiple correspondence analysis, multivariate categorical data, health care services, health satisfaction survey
Procedia PDF Downloads 24225367 A Semiparametric Approach to Estimate the Mode of Continuous Multivariate Data
Authors: Tiee-Jian Wu, Chih-Yuan Hsu
Abstract:
Mode estimation is an important task, because it has applications to data from a wide variety of sources. We propose a semi-parametric approach to estimate the mode of an unknown continuous multivariate density function. Our approach is based on a weighted average of a parametric density estimate using the Box-Cox transform and a non-parametric kernel density estimate. Our semi-parametric mode estimate improves both the parametric- and non-parametric- mode estimates. Specifically, our mode estimate solves the non-consistency problem of parametric mode estimates (at large sample sizes) and reduces the variability of non-parametric mode estimates (at small sample sizes). The performance of our method at practical sample sizes is demonstrated by simulation examples and two real examples from the fields of climatology and image recognition.Keywords: Box-Cox transform, density estimation, mode seeking, semiparametric method
Procedia PDF Downloads 28525366 Use of Multivariate Statistical Techniques for Water Quality Monitoring Network Assessment, Case of Study: Jequetepeque River Basin
Authors: Jose Flores, Nadia Gamboa
Abstract:
A proper water quality management requires the establishment of a monitoring network. Therefore, evaluation of the efficiency of water quality monitoring networks is needed to ensure high-quality data collection of critical quality chemical parameters. Unfortunately, in some Latin American countries water quality monitoring programs are not sustainable in terms of recording historical data or environmentally representative sites wasting time, money and valuable information. In this study, multivariate statistical techniques, such as principal components analysis (PCA) and hierarchical cluster analysis (HCA), are applied for identifying the most significant monitoring sites as well as critical water quality parameters in the monitoring network of the Jequetepeque River basin, in northern Peru. The Jequetepeque River basin, like others in Peru, shows socio-environmental conflicts due to economical activities developed in this area. Water pollution by trace elements in the upper part of the basin is mainly related with mining activity, and agricultural land lost due to salinization is caused by the extensive use of groundwater in the lower part of the basin. Since the 1980s, the water quality in the basin has been non-continuously assessed by public and private organizations, and recently the National Water Authority had established permanent water quality networks in 45 basins in Peru. Despite many countries use multivariate statistical techniques for assessing water quality monitoring networks, those instruments have never been applied for that purpose in Peru. For this reason, the main contribution of this study is to demonstrate that application of the multivariate statistical techniques could serve as an instrument that allows the optimization of monitoring networks using least number of monitoring sites as well as the most significant water quality parameters, which would reduce costs concerns and improve the water quality management in Peru. Main socio-economical activities developed and the principal stakeholders related to the water management in the basin are also identified. Finally, water quality management programs will also be discussed in terms of their efficiency and sustainability.Keywords: PCA, HCA, Jequetepeque, multivariate statistical
Procedia PDF Downloads 355