Search results for: multivariate geostatistical analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 28042

28042 A Non-parametric Clustering Approach for Multivariate Geostatistical Data

Authors: Francky Fouedjio

Abstract:

Multivariate geostatistical data have become omnipresent in the geosciences and pose substantial analysis challenges. One of them is the grouping of data locations into spatially contiguous clusters so that data locations within the same cluster are similar to one another while different clusters are dissimilar, in some sense. Spatially contiguous clusters can significantly improve interpretation, because they turn the resulting clusters into meaningful geographical subregions. In this paper, we develop an agglomerative hierarchical clustering approach that takes into account the spatial dependency between observations. It relies on a dissimilarity matrix built from a non-parametric kernel estimator of the spatial dependence structure of the data. It integrates existing methods to find the optimal number of clusters and to evaluate the contribution of variables to the clustering. The capability of the proposed approach to provide spatially compact, connected and meaningful clusters is assessed using a bivariate synthetic dataset and a multivariate geochemical dataset. The proposed clustering method gives satisfactory results compared to other similar geostatistical clustering methods.
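
The idea of agglomerative clustering under a spatial contiguity constraint can be illustrated with a minimal sketch. It is not the paper's method: the dissimilarity here is a plain Euclidean distance between attribute vectors with average linkage, not the non-parametric kernel estimator described above, and the function name, the `radius` neighbourhood rule, and the sample data are assumptions made for illustration.

```python
import math

def constrained_agglomerative(coords, values, n_clusters, radius):
    """Merge spatially adjacent clusters until n_clusters remain.

    coords: list of (x, y) locations; values: list of attribute tuples.
    Two locations are neighbours if they lie within `radius` (assumed rule).
    """
    n = len(coords)
    clusters = {i: [i] for i in range(n)}
    adjacent = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= radius:
                adjacent[i].add(j)
                adjacent[j].add(i)

    def avg_dissim(a, b):
        # Average-linkage dissimilarity in attribute space (illustrative choice).
        total = sum(math.dist(values[i], values[j])
                    for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > n_clusters:
        # Only spatially adjacent clusters are merge candidates.
        candidates = [(avg_dissim(a, b), a, b)
                      for a in clusters for b in adjacent[a] if a < b]
        if not candidates:
            break  # no adjacent clusters left to merge
        _, a, b = min(candidates)
        clusters[a].extend(clusters.pop(b))
        # Neighbours of the absorbed cluster become neighbours of the merged one.
        for c in adjacent.pop(b):
            adjacent[c].discard(b)
            if c != a:
                adjacent[c].add(a)
                adjacent[a].add(c)
    return sorted(sorted(members) for members in clusters.values())

if __name__ == "__main__":
    coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
    values = [(0, 0), (0.1, 0), (0.2, 0.1), (5, 5), (5.1, 5), (5.2, 5.1)]
    print(constrained_agglomerative(coords, values, n_clusters=2, radius=1.0))
```

Restricting merge candidates to adjacent clusters is what keeps every resulting cluster spatially connected; the choice of dissimilarity can be swapped without changing that property.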

Keywords: clustering, geostatistics, multivariate data, non-parametric

Procedia PDF Downloads 477
28041 Fast Bayesian Inference of Multivariate Block-Nearest Neighbor Gaussian Process (NNGP) Models for Large Data

Authors: Carlos Gonzales, Zaida Quiroz, Marcos Prates

Abstract:

Several spatial variables collected at the same location that share a common spatial distribution can be modeled simultaneously through a multivariate geostatistical model that takes into account the correlation between these variables and the spatial autocorrelation. The main goal of this model is to perform spatial prediction of these variables in the region of study. Here we focus on a geostatistical multivariate formulation that relies on sharing common spatial random effect terms. In particular, the first response variable can be modeled by a mean that incorporates a shared random spatial effect, while the other response variables depend on this shared spatial term, in addition to specific random spatial effects. Each spatial random effect is defined through a Gaussian process with a valid covariance function, but in order to improve the computational efficiency when the data are large, each Gaussian process is approximated by a Gaussian random Markov field (GRMF), specifically by the block nearest neighbor Gaussian process (Block-NNGP). This approach involves dividing the spatial domain into several dependent blocks under certain constraints, where the cross blocks capture the spatial dependence on a large scale, while each individual block captures the spatial dependence on a smaller scale. The multivariate geostatistical model belongs to the class of Latent Gaussian Models; thus, to achieve fast Bayesian inference, the integrated nested Laplace approximation (INLA) method is used. The good performance of the proposed model is shown through simulations and applications to massive data.

Keywords: Block-NNGP, geostatistics, Gaussian process, GRMF, INLA, multivariate models

Procedia PDF Downloads 97
28040 Geostatistical Analysis of Contamination of Soils in an Urban Area in Ghana

Authors: S. K. Appiah, E. N. Aidoo, D. Asamoah Owusu, M. W. Nuonabuor

Abstract:

Urbanization remains one of the predominant factors linked to the destruction of the urban environment and the associated contamination of soil by heavy metals through natural and anthropogenic activities. These activities are important sources of toxic heavy metals such as arsenic (As), cadmium (Cd), chromium (Cr), copper (Cu), iron (Fe), manganese (Mn), lead (Pb), nickel (Ni), and zinc (Zn). Often, the levels of these heavy metals are elevated in some areas because of atmospheric deposition caused by proximity to industrial plants or the indiscriminate burning of substances. Information gathered on potentially hazardous levels of these heavy metals in soils has serious implications for health and urban agriculture. However, characterization of the spatial variation of soil contamination by heavy metals in Ghana is limited. Kumasi is a metropolitan city in Ghana, West Africa, and is challenged by a recent spate of deteriorating soil quality due to rapid economic development and other human activities such as “Galamsey” (illegal mining operations) within the metropolis. The paper uses both univariate and multivariate geostatistical techniques to assess the spatial distribution of heavy metals in soils and the potential risk associated with ingestion of sources of soil contamination in the metropolis. Geostatistical tools can detect changes in correlation structure, and a good knowledge of the study area can help to explain the different scales of variation detected. To this end, point-referenced data on heavy metals, measured in topsoil samples collected at various locations in a previous study, were used. Linear models of regionalisation and coregionalisation were fitted to all experimental semivariograms to describe the spatial dependence between the topsoil heavy metals at different spatial scales, which led to ordinary kriging and cokriging at unsampled locations and the production of risk maps of soil contamination by these heavy metals. Results obtained from both the univariate and multivariate semivariogram models showed strong spatial dependence, with autocorrelation ranges of 100 to 300 meters. The risk maps produced show strong spatial heterogeneity for almost all the soil heavy metals, with an extremely high risk of contamination found close to areas with commercial and industrial activities. Hence, ongoing pollution interventions should be geared towards these high-risk areas for efficient management of soil contamination to avert further pollution in the metropolis.
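
The experimental semivariograms to which such models are fitted can be sketched with the classical method-of-moments estimator, which averages half the product of paired increments within distance bins. This is an illustrative sketch only: the function name, the binning rule, and the sample values are assumptions, and the abstract does not state which estimator or software the authors used. Passing the same variable twice gives the direct semivariogram; passing two different variables gives the cross-semivariogram used in coregionalisation.

```python
import math

def experimental_variogram(coords, u, v, lag_width, n_lags):
    """Method-of-moments (cross-)semivariogram per distance bin.

    coords: list of (x, y); u, v: measured values at those locations.
    Bin k collects pairs with k * lag_width <= distance < (k + 1) * lag_width.
    Returns one value per bin, or None where a bin holds no pairs.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            k = int(math.dist(coords[i], coords[j]) // lag_width)
            if k < n_lags:
                sums[k] += (u[i] - u[j]) * (v[i] - v[j])
                counts[k] += 1
    return [sums[k] / (2 * counts[k]) if counts[k] else None
            for k in range(n_lags)]

if __name__ == "__main__":
    # Hypothetical sample locations (metres) and concentrations.
    coords = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0), (0.0, 150.0)]
    pb = [35.0, 42.0, 60.0, 38.0]
    zn = [80.0, 95.0, 130.0, 88.0]
    print(experimental_variogram(coords, pb, pb, lag_width=100.0, n_lags=3))
    print(experimental_variogram(coords, pb, zn, lag_width=100.0, n_lags=3))
```

A range of 100 to 300 meters, as reported above, would show up as the lag distance beyond which these binned values level off.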

Keywords: coregionalization, heavy metals, multivariate geostatistical analysis, soil contamination, spatial distribution

Procedia PDF Downloads 300
28039 The Comparison of Joint Simulation and Estimation Methods for the Geometallurgical Modeling

Authors: Farzaneh Khorram

Abstract:

This paper endeavors to construct a block model to assess grinding energy consumption (CCE) and pinpoint blocks with the highest potential for energy usage during the grinding process within a specified region. Leveraging geostatistical techniques, particularly joint estimation, or simulation, based on geometallurgical data from various mineral processing stages, our objective is to forecast CCE across the study area. The dataset encompasses variables obtained from 2754 drill samples and a block model comprising 4680 blocks. The initial analysis encompassed exploratory data examination, variography, multivariate analysis, and the delineation of geological and structural units. Subsequent analysis involved the assessment of contacts between these units and the estimation of CCE via cokriging, considering its correlation with SPI. The selection of blocks exhibiting maximum CCE holds paramount importance for cost estimation, production planning, and risk mitigation. The study conducted exploratory data analysis on lithology, rock type, and failure variables, revealing seamless boundaries between geometallurgical units. Simulation methods, such as Plurigaussian and Turning band, demonstrated more realistic outcomes compared to cokriging, owing to the inherent characteristics of geometallurgical data and the limitations of kriging methods.

Keywords: geometallurgy, multivariate analysis, plurigaussian, turning band method, cokriging

Procedia PDF Downloads 70
28038 Multivariate Genome-Wide Association Studies for Identifying Additional Loci for Myopia

Authors: Qiao Fan, Xiaobo Guo, Junxian Zhu, Xiaohu Ding, Ching-Yu Cheng, Tien-Yin Wong, Mingguang He, Heping Zhang, Xueqin Wang

Abstract:

A systematic, simultaneous analysis of multiple phenotypes in genome-wide association studies (GWASs) has drawn great attention as a way to integrate the signals from single phenotypes with increased power. However, the lack of an interpretable and efficient multivariate GWAS analysis impedes the application of such an approach. In this study, we propose to decompose the multivariate model into a series of simple univariate models. This transformation illuminates exactly what each individual trait contributes to the significant signals from the multivariate analyses. By employing our approach in the analysis of three myopia-related endophenotypes from the Singapore Malay Eye Study (SIMES), we identify novel candidate loci, which were successfully validated in the independent Guangzhou Twin Eye Study (GTES).

Keywords: GWAS multivariate, multiple traits, myopia, association

Procedia PDF Downloads 224
28037 Multivariate Analysis of Spectroscopic Data for Agriculture Applications

Authors: Asmaa M. Hussein, Amr Wassal, Ahmed Farouk Al-Sadek, A. F. Abd El-Rahman

Abstract:

In this study, a multivariate analysis of potato spectroscopic data is presented to detect the presence or absence of brown rot disease. Near-Infrared (NIR) spectroscopy (1,350-2,500 nm) combined with multivariate analysis was used as a rapid, non-destructive technique for the detection of brown rot disease in potatoes. Spectral measurements were performed on 565 samples, which were chosen randomly at the infection site in the potato slice. In total, 254 infected and 311 uninfected (brown rot-free) samples were analyzed using different advanced statistical analysis techniques. The discrimination performance of different multivariate analysis techniques, including classification, pre-processing, and dimension reduction, was compared. Applying a random forest classifier with different pre-processing techniques to the raw spectra gave the best performance, achieving a total classification accuracy of 98.7% in discriminating infected potatoes from controls.

Keywords: Brown rot disease, NIR spectroscopy, potato, random forest

Procedia PDF Downloads 190
28036 The Modality of Multivariate Skew Normal Mixture

Authors: Bader Alruwaili, Surajit Ray

Abstract:

Finite mixtures are a flexible and powerful tool that can be used for univariate and multivariate distributions, and a wide range of research has been conducted based on the multivariate normal mixture and the multivariate t-mixture. Determining the number of modes is an important activity that, in turn, allows one to determine the number of homogeneous groups in a population. Our current work relates to the study of the modality of the skew normal distribution in the univariate and multivariate cases. For the skew normal distribution, the aims are to study its modality and to provide the ridgeline, the ridgeline elevation function, the $\Pi$ function, and the curvature function, which will support an exploration of the number and location of modes when mixing two skew normal components. The subsequent objective is to apply these results to real-world data sets, such as flow cytometry data.

Keywords: mode, modality, multivariate skew normal, finite mixture, number of mode

Procedia PDF Downloads 488
28035 Regression for Doubly Inflated Multivariate Poisson Distributions

Authors: Ishapathik Das, Sumen Sen, N. Rao Chaganty, Pooja Sengupta

Abstract:

Dependent multivariate count data occur in several research studies. These data can be modeled by a multivariate Poisson or negative binomial distribution constructed using copulas. However, when some of the counts are inflated, that is, the number of observations in some cells is much larger than in other cells, the copula-based multivariate Poisson (or negative binomial) distribution may not fit well and is not an appropriate statistical model for the data. There is a need to modify or adjust the multivariate distribution to account for the inflated frequencies. In this article, we consider the situation where the frequencies of two cells are higher than those of the other cells, and develop a doubly inflated multivariate Poisson distribution function using a multivariate Gaussian copula. We also discuss procedures for regression on covariates for the doubly inflated multivariate count data. To illustrate the proposed methodologies, we present a real data set containing bivariate count observations with inflations in two cells. Several models and linear predictors with log link functions are considered, and we discuss maximum likelihood estimation of the unknown parameters of the models.

Keywords: copula, Gaussian copula, multivariate distributions, inflated distributions

Procedia PDF Downloads 156
28034 Assessing the Influence of Station Density on Geostatistical Prediction of Groundwater Levels in a Semi-arid Watershed of Karnataka

Authors: Sakshi Dhumale, Madhushree C., Amba Shetty

Abstract:

The effect of station density on the geostatistical prediction of groundwater levels is of critical importance to ensure accurate and reliable predictions. Monitoring station density directly impacts the accuracy and reliability of geostatistical predictions by influencing the model's ability to capture localized variations and small-scale features in groundwater levels. This is particularly crucial in regions with complex hydrogeological conditions and significant spatial heterogeneity. Insufficient station density can result in larger prediction uncertainties, as the model may struggle to adequately represent the spatial variability and correlation patterns of the data. On the other hand, an optimal distribution of monitoring stations enables effective coverage of the study area and captures the spatial variability of groundwater levels more comprehensively. In this study, we investigate the effect of station density on the predictive performance of groundwater levels using the geostatistical technique of Ordinary Kriging. The research utilizes groundwater level data collected from 121 observation wells within the semi-arid Berambadi watershed, gathered over a six-year period (2010-2015) from the Indian Institute of Science (IISc), Bengaluru. The dataset is partitioned into seven subsets representing varying sampling densities, ranging from 15% (12 wells) to 100% (121 wells) of the total well network. The results obtained from different monitoring networks are compared against the existing groundwater monitoring network established by the Central Ground Water Board (CGWB). The findings of this study demonstrate that higher station densities significantly enhance the accuracy of geostatistical predictions for groundwater levels. The increased number of monitoring stations enables improved interpolation accuracy and captures finer-scale variations in groundwater levels. 
These results shed light on the relationship between station density and the geostatistical prediction of groundwater levels, emphasizing the importance of appropriate station densities to ensure accurate and reliable predictions. The insights gained from this study have practical implications for designing and optimizing monitoring networks, facilitating effective groundwater level assessments, and enabling sustainable management of groundwater resources.
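
Ordinary Kriging, the predictor used in this study, can be sketched as follows: the weights solve a linear system built from a covariance model, with a Lagrange multiplier forcing the weights to sum to one. The exponential covariance, its `sill` and `rng` parameters, the function names, and the well data are all assumptions for illustration; the abstract does not specify the fitted model. The hold-out helper mirrors the idea of scoring a reduced well network against wells left out of it.

```python
import math

def exp_cov(h, sill=1.0, rng=10.0):
    """Exponential covariance model (assumed here; no nugget)."""
    return sill * math.exp(-h / rng)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(wells, target, cov=exp_cov):
    """wells: list of (x, y, level); target: (x, y). Returns (estimate, weights)."""
    n = len(wells)
    # Covariances between wells, bordered by the unbiasedness constraint.
    a = [[cov(math.dist(wells[i][:2], wells[j][:2])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [cov(math.dist(w[:2], target)) for w in wells] + [1.0]
    weights = solve(a, b)[:n]  # last entry is the Lagrange multiplier
    return sum(w * z for w, (_, _, z) in zip(weights, wells)), weights

def holdout_rmse(train, test, cov=exp_cov):
    """RMSE of kriged levels at held-out wells, given a training subset."""
    errs = [(ordinary_kriging(train, (x, y), cov)[0] - z) ** 2 for x, y, z in test]
    return math.sqrt(sum(errs) / len(errs))

if __name__ == "__main__":
    wells = [(0.0, 0.0, 10.0), (10.0, 0.0, 12.0), (0.0, 10.0, 14.0), (10.0, 10.0, 16.0)]
    print(ordinary_kriging(wells, (5.0, 5.0)))
```

Repeating `holdout_rmse` for training subsets of increasing size is one simple way to see how prediction error changes with station density.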

Keywords: station density, geostatistical prediction, groundwater levels, monitoring networks, interpolation accuracy, spatial variability

Procedia PDF Downloads 58
28033 Geostatistical Models to Correct Salinity of Soils from Landsat Satellite Sensor: Application to the Oran Region, Algeria

Authors: Dehni Abdellatif, Lounis Mourad

Abstract:

The application of spatial geostatistics in materials science, precision agriculture, and agricultural statistics has improved the understanding of how to manage and monitor water and groundwater quality in relation to salt-affected soil. Previous experience with data acquisition and spatial preparation of optical and multispectral data has facilitated the integration of correction models for electrical conductivity as a function of soil temperature (soil horizons). For tomographic interpretation, this physical parameter was extracted by calibrating the thermal band (LANDSAT ETM+ band 6) with a radiometric correction. The study area is the Oran region (northwestern Algeria). Several spectral indices are determined, such as the salinity and sodicity indices, the Combined Spectral Reflectance Index (CSRI), the Normalized Difference Vegetation Index (NDVI), emissivity, albedo, and the Sodium Adsorption Ratio (SAR). Geostatistical modeling of electrical conductivity (salinity) appears to be a useful decision-support tool for estimating electrical resistivity corrected for the temperature of surface soils, using conversion models that substitute a reference temperature of 25°C (the condition under which the hydrochemical data are collected). The brightness temperatures extracted from satellite reflectance (LANDSAT ETM+) are used in consistency models to estimate electrical resistivity. The confusion arising from the combined effects of salt stress and water stress is removed, and the geostatistical analysis is then applied seasonally within a Geographic Information System (GIS) to investigate and monitor the variation of electrical conductivity in the alluvial aquifer of Es-Sénia for salt-affected soil.

Keywords: geostatistical modelling, landsat, brightness temperature, conductivity

Procedia PDF Downloads 441
28032 Neonatal Mortality, Infant Mortality, and Under-five Mortality Rates in the Provinces of Zimbabwe: A Geostatistical and Spatial Analysis of Public Health Policy Provisions

Authors: Jevonte Abioye, Dylan Savary

Abstract:

The aim of this research is to present a disaggregated geostatistical analysis of the subnational provincial trends of child mortality variation in Zimbabwe from a child health policy perspective. Soon after gaining independence in 1980, the government embarked on efforts towards promoting equitable health care, namely through the provision of primary health care. Government intervention programmes brought hope and promise, but achieving equity in primary health care coverage was hindered by pre-existing disparities in maternal health care, which was disproportionately concentrated in urban settings to the detriment of rural communities. The article highlights policies and programs adopted by the government during the millennium development goals period between 1990 and 2015 as a response to the inequities that characterised the country’s maternal health care. A longitudinal comparative method for the spatial variation of child mortality rates across provinces is developed based on geostatistical analysis. Cross-sectional and time-series data were extracted from the World Health Organisation (WHO) global health observatory data repository, demographic health survey reports, and previous academic and technical publications. Results suggest that although health care policy was uniform across provinces, not all provinces received the same antenatal and perinatal services. Accordingly, provincial rates of child mortality growth between 1994 and 2015 varied significantly. Evidence on the trends of child mortality rates and maternal health policies in Zimbabwe can be valuable for public child health policy planning and public service delivery design both in Zimbabwe and across developing countries pursuing the sustainable development agenda.

Keywords: antenatal care, perinatal care, infant mortality rate, neonatal mortality rate, under-five mortality rate, millennium development goals, sustainable development agenda

Procedia PDF Downloads 203
28031 Effects of Video Games and Online Chat on Mathematics Performance in High School: An Approach of Multivariate Data Analysis

Authors: Lina Wu, Wenyi Lu, Ye Li

Abstract:

Taking "heavy video game players" among boys and "super online chat lovers" among girls as emblematic of current adolescent culture, this data analysis project verifies the displacement effect on deteriorating mathematics performance. To evaluate correlation or regression coefficients between playing video games or chatting online and mathematics performance, in comparison with other factors, we use multivariate analysis techniques and take gender differences into account. We find that the most important reason for the negative sign of the displacement effect on mathematics performance is students’ poor academic background. The statistical analysis methods in this project could be applied to study internet users’ academic performance from high school through college.

Keywords: correlation coefficients, displacement effect, multivariate analysis technique, regression coefficients

Procedia PDF Downloads 364
28030 Model of Optimal Centroids Approach for Multivariate Data Classification

Authors: Pham Van Nha, Le Cam Binh

Abstract:

Particle swarm optimization (PSO) is a population-based stochastic optimization algorithm. PSO was inspired by the natural behavior of birds and fish in migration and foraging for food. PSO is considered a multidisciplinary optimization model that can be applied to various optimization problems. PSO’s ideas are simple and easy to understand, but PSO has only been applied to simple model problems. We think that in order to expand the applicability of PSO to complex problems, PSO should be described more explicitly in the form of a mathematical model. In this paper, we represent PSO as a mathematical model and apply it to multivariate data classification. First, PSO’s general mathematical model (MPSO) is analyzed as a universal optimization model. Then, the Model of Optimal Centroids (MOC) is proposed for multivariate data classification. Experiments were conducted on some benchmark data sets to demonstrate the effectiveness of MOC compared with several previously proposed schemes.

Keywords: analysis of optimization, artificial intelligence based optimization, optimization for learning and data analysis, global optimization

Procedia PDF Downloads 208
28029 Discrimination Between Bacillus and Alicyclobacillus Isolates in Apple Juice by Fourier Transform Infrared Spectroscopy and Multivariate Analysis

Authors: Murada Alholy, Mengshi Lin, Omar Alhaj, Mahmoud Abugoush

Abstract:

Alicyclobacillus is a causative agent of spoilage in pasteurized and heat-treated apple juice products. Differentiating between this genus and the closely related Bacillus is crucially important. In this study, Fourier transform infrared spectroscopy (FT-IR) was used to identify and discriminate between four Alicyclobacillus strains and four Bacillus isolates inoculated individually into apple juice. Loading plots over the range of 1350 to 1700 cm⁻¹ reflected the most distinctive biochemical features of Bacillus and Alicyclobacillus. Multivariate statistical methods (e.g. principal component analysis (PCA) and soft independent modeling of class analogy (SIMCA)) were used to analyze the spectral data. Distinctive separation of spectral samples was observed. This study demonstrates that FT-IR spectroscopy in combination with multivariate analysis could serve as a rapid and effective tool for the fruit juice industry to differentiate between Bacillus and Alicyclobacillus and to distinguish between species belonging to these two genera.

Keywords: alicyclobacillus, bacillus, FT-IR, spectroscopy, PCA

Procedia PDF Downloads 488
28028 Application of Bayesian Model Averaging and Geostatistical Output Perturbation to Generate Calibrated Ensemble Weather Forecast

Authors: Muhammad Luthfi, Sutikno Sutikno, Purhadi Purhadi

Abstract:

Weather forecasts need to be improved to provide communities with accurate and objective predictions. To address this, numerical weather forecasting has been extensively developed to reduce the subjectivity of forecasts. Yet Numerical Weather Prediction (NWP) outputs are issued without taking dynamical weather behavior and local terrain features into account. Thus, NWP outputs cannot accurately forecast weather quantities, particularly for medium- and long-range forecasts. The aim of this research is to aid and extend the development of ensemble forecasting for the Meteorology, Climatology, and Geophysics Agency of Indonesia. The ensemble method combines various deterministic forecasts to produce a more reliable one. However, such a forecast is biased and uncalibrated due to its underdispersive or overdispersive nature. As one of the parametric methods, Bayesian Model Averaging (BMA) generates a calibrated ensemble forecast and constructs a predictive PDF for a specified period. This method can use an ensemble of any size but does not take spatial correlation into account, whereas spatial dependencies involve the site of interest and nearby sites and are influenced by dynamic weather behavior. Meanwhile, Geostatistical Output Perturbation (GOP) accounts for spatial correlation when generating future weather quantities; although it is built from a single deterministic forecast, it can also generate an ensemble of any size. This research applies both BMA and GOP to generate calibrated ensemble forecasts of daily temperature at a few meteorological sites near Indonesia's international airport.
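
For temperature, the BMA predictive PDF is commonly written as a weighted mixture of normal densities, each centred on a bias-corrected ensemble member. The sketch below evaluates such a mixture under that assumption; the weights, the bias coefficients `a` and `b`, the common standard deviation `sd`, and the forecast values are hypothetical, and the fitting step (typically done by EM over a training window) is omitted.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a normal distribution at y."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def bma_pdf(y, forecasts, weights, sd, a=0.0, b=1.0):
    """Mixture density: sum_k w_k * N(y | a + b * f_k, sd^2).

    A single (a, b) pair is shared by all members here for brevity;
    member-specific coefficients are equally possible.
    """
    return sum(w * normal_pdf(y, a + b * f, sd) for f, w in zip(forecasts, weights))

def bma_mean(forecasts, weights, a=0.0, b=1.0):
    """Mean of the BMA predictive distribution (a deterministic forecast)."""
    return sum(w * (a + b * f) for f, w in zip(forecasts, weights))

if __name__ == "__main__":
    members = [26.0, 27.5, 28.0]   # hypothetical ensemble temperatures (deg C)
    weights = [0.5, 0.3, 0.2]      # hypothetical BMA weights, summing to 1
    print(bma_mean(members, weights))
    print(bma_pdf(27.0, members, weights, sd=1.0))
```

Because each member contributes its own kernel, the mixture can be wider than any single member's spread, which is how BMA corrects an underdispersive ensemble.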

Keywords: Bayesian Model Averaging, ensemble forecast, geostatistical output perturbation, numerical weather prediction, temperature

Procedia PDF Downloads 280
28027 Multivariate Statistical Process Monitoring of Base Metal Flotation Plant Using Dissimilarity Scale-Based Singular Spectrum Analysis

Authors: Syamala Krishnannair

Abstract:

A multivariate statistical process monitoring methodology using dissimilarity scale-based singular spectrum analysis (SSA) is proposed for the detection and diagnosis of process faults in a base metal flotation plant. Process faults are detected through the multi-level decomposition of process signals by SSA using the dissimilarity structure of the process data, followed by monitoring of the multiscale signals with a unified monitoring index that combines T² with SPE. Contribution plots are used to identify the root causes of the process faults. The overall results indicate that the proposed technique outperforms conventional multivariate techniques in the detection and diagnosis of process faults in the flotation plant.

Keywords: fault detection, fault diagnosis, process monitoring, dissimilarity scale

Procedia PDF Downloads 209
28026 Spatial and Geostatistical Analysis of Surficial Soils of the Contiguous United States

Authors: Rachel Hetherington, Chad Deering, Ann Maclean, Snehamoy Chatterjee

Abstract:

The U.S. Geological Survey conducted a soil survey and subsequent mineralogical and geochemical analyses of over 4800 samples taken across the contiguous United States between the years 2007 and 2013. At each location, samples were taken from the top 5 cm, the A-horizon, and the C-horizon. Many studies have looked at the correlation between the mineralogical and geochemical content of soils and influencing factors such as parent lithology, climate, soil type, and age, but it seems little has been done in relation to quantifying and assessing the correlation between elements in the soil on a national scale. GIS was used for the mapping and multivariate interpolation of over 40 major and trace elements for surficial soils (0-5 cm depth). Qualitative analysis of the spatial distribution across the U.S. shows distinct patterns amongst elements both within the same periodic groups and within different periodic groups, and therefore with different behavioural characteristics. Results show the emergence of 4 main patterns of high concentration areas: vertically along the west coast, a C-shape formed through the states around Utah and northern Arizona, a V-shape through the Midwest and connecting to the Appalachians, and along the Appalachians. The Band Collection Statistics tool in GIS was used to quantitatively analyse the geochemical raster datasets and calculate a correlation matrix. Patterns emerged, which were not identified in qualitative analysis, many of which are also amongst elements with very different characteristics. Preliminary results show 41 element pairings with a strong positive correlation ( ≥ 0.75). Both qualitative and quantitative analyses on this scale could increase knowledge on the relationships between element distribution and behaviour in surficial soils of the U.S.
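
The screening of element pairings by correlation strength can be sketched with a plain Pearson coefficient and the 0.75 cut-off quoted above. The study computed its matrix from raster layers with a GIS tool; the sketch below instead works on short lists of made-up concentrations, and the function names are assumptions.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def strong_pairs(columns, threshold=0.75):
    """Return (name_a, name_b, r) for every pair with r >= threshold."""
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if r >= threshold:
            pairs.append((a, b, r))
    return pairs

if __name__ == "__main__":
    # Hypothetical concentrations at four sample sites.
    soil = {
        "Ca": [1.0, 2.0, 3.0, 4.0],
        "Mg": [2.0, 4.0, 6.0, 8.5],
        "Pb": [4.0, 1.0, 3.0, 2.0],
    }
    for a, b, r in strong_pairs(soil):
        print(a, b, round(r, 3))
```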

Keywords: correlation matrix, geochemical analyses, spatial distribution of elements, surficial soils

Procedia PDF Downloads 126
28025 A Multivariate Statistical Approach for Water Quality Assessment of River Hindon, India

Authors: Nida Rizvi, Deeksha Katyal, Varun Joshi

Abstract:

River Hindon is an important river catering to the demands of the highly populated rural and industrial clusters of western Uttar Pradesh, India. The water quality of river Hindon is deteriorating at an alarming rate due to various industrial, municipal and agricultural activities. The present study aimed at identifying the pollution sources and quantifying the degree to which these sources are responsible for the deteriorating water quality of the river. Various water quality parameters, such as pH, temperature, electrical conductivity, total dissolved solids, total hardness, calcium, chloride, nitrate, sulphate, biological oxygen demand, chemical oxygen demand and total alkalinity, were assessed. Water quality data obtained from eight study sites over one year were subjected to two multivariate techniques, namely principal component analysis and cluster analysis. Principal component analysis was applied to find spatial variability and to identify the sources responsible for the water quality of the river. Three varifactors were obtained after varimax rotation of the initial principal components. Cluster analysis was carried out to classify sampling stations by similarity, which grouped the eight sites into two clusters. The study reveals that anthropogenic influence (municipal, industrial, waste water and agricultural runoff) was the major source of river water pollution. Thus, this study illustrates the utility of multivariate statistical techniques for the analysis and elucidation of multifaceted data sets, the recognition of pollution sources/factors and the understanding of temporal/spatial variations in water quality for effective river water quality management.

Keywords: cluster analysis, multivariate statistical techniques, river Hindon, water quality

Procedia PDF Downloads 466
28024 Multi-scale Spatial and Unified Temporal Feature-fusion Network for Multivariate Time Series Anomaly Detection

Authors: Hang Yang, Jichao Li, Kewei Yang, Tianyang Lei

Abstract:

Multivariate time series anomaly detection is a significant research topic in the field of data mining, encompassing a wide range of applications across various industrial sectors such as traffic roads, financial logistics, and corporate production. The inherent spatial dependencies and temporal characteristics present in multivariate time series introduce challenges to the anomaly detection task. Previous studies have typically been based on the assumption that all variables belong to the same spatial hierarchy, neglecting the multi-level spatial relationships. To address this challenge, this paper proposes a multi-scale spatial and unified temporal feature fusion network, denoted as MSUT-Net, for multivariate time series anomaly detection. The proposed model employs a multi-level modeling approach, incorporating both temporal and spatial modules. The spatial module is designed to capture the spatial characteristics of multivariate time series data, utilizing an adaptive graph structure learning model to identify the multi-level spatial relationships between data variables and their attributes. The temporal module consists of a unified temporal processing module, which is tasked with capturing the temporal features of multivariate time series. This module is capable of simultaneously identifying temporal dependencies among different variables. Extensive testing on multiple publicly available datasets confirms that MSUT-Net achieves superior performance on the majority of datasets. Our method is able to model and accurately detect systems data with multi-level spatial relationships from a spatial-temporal perspective, providing a novel perspective for anomaly detection analysis.

Keywords: data mining, industrial system, multivariate time series, anomaly detection

Procedia PDF Downloads 15
28023 An AK-Chart for the Non-Normal Data

Authors: Chia-Hau Liu, Tai-Yue Wang

Abstract:

Traditional multivariate control charts assume that measurements from manufacturing processes follow a multivariate normal distribution. However, this assumption may not hold or may be difficult to verify, because in practice not all measurements from manufacturing processes are normally distributed. This study develops a new multivariate control chart for monitoring processes with non-normal data. We propose a mechanism that integrates a one-class classification method with an adaptive technique. The adaptive technique is used to improve the sensitivity of one-class classification to small shifts in statistical process control. In addition, this design provides an easy way to allocate the type I error, which makes it easier to implement. Finally, a simulation study and real industrial data are used to demonstrate the effectiveness of the proposed control chart.

Keywords: multivariate control chart, statistical process control, one-class classification method, non-normal data

Procedia PDF Downloads 422
28022 Irrigation Water Quality Evaluation Based on Multivariate Statistical Analysis: A Case Study of Jiaokou Irrigation District

Authors: Panpan Xu, Qiying Zhang, Hui Qian

Abstract:

Groundwater is the main source of water supply in the Guanzhong Basin, China. To investigate the quality of groundwater for agricultural purposes in the Jiaokou Irrigation District, located in the east of the Guanzhong Basin, 141 groundwater samples were collected for analysis of major ions (K⁺, Na⁺, Mg²⁺, Ca²⁺, SO₄²⁻, Cl⁻, HCO₃⁻, and CO₃²⁻), pH, and total dissolved solids (TDS). Sodium percentage (Na%), residual sodium carbonate (RSC), magnesium hazard (MH), and potential salinity (PS) were applied for irrigation water quality assessment. In addition, multivariate statistical techniques were used to identify the underlying hydrogeochemical processes. Results show that the TDS content mainly depends on Cl⁻, Na⁺, Mg²⁺, and SO₄²⁻, and that the HCO₃⁻ content is generally high except in the eastern sand area. These patterns result from complex hydrogeochemical processes, such as the dissolution of carbonate minerals (dolomite and calcite), gypsum, halite, and silicate minerals, cation exchange, and evaporation and concentration. The average evaluation levels of Na%, RSC, MH, and PS for irrigation water quality are doubtful, good, unsuitable, and injurious to unsatisfactory, respectively. Therefore, decision makers need to consider the indicators comprehensively in order to evaluate the irrigation water quality reasonably.
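The four irrigation indices named in the abstract can be illustrated with a short sketch. The formulas below are commonly used textbook definitions (all concentrations in meq/L) and are an assumption here, since the abstract does not state which variants the authors applied; the function name `irrigation_indices` and the sample values are hypothetical.

```python
def irrigation_indices(ions):
    """Compute common irrigation-quality indices from ion concentrations.

    `ions` maps ion names to concentrations in meq/L. The formulas are
    widely used definitions, assumed here rather than taken from the paper.
    """
    na, k = ions["Na"], ions["K"]
    ca, mg = ions["Ca"], ions["Mg"]
    hco3, co3 = ions["HCO3"], ions["CO3"]
    cl, so4 = ions["Cl"], ions["SO4"]

    return {
        # Sodium percentage: share of (Na + K) among the major cations.
        "Na%": 100.0 * (na + k) / (ca + mg + na + k),
        # Residual sodium carbonate: carbonate alkalinity minus Ca + Mg.
        "RSC": (hco3 + co3) - (ca + mg),
        # Magnesium hazard: share of Mg in (Ca + Mg).
        "MH": 100.0 * mg / (ca + mg),
        # Potential salinity: chloride plus half of the sulfate.
        "PS": cl + 0.5 * so4,
    }


# Hypothetical sample (meq/L), for illustration only.
sample = {"Na": 4.0, "K": 1.0, "Ca": 3.0, "Mg": 2.0,
          "HCO3": 6.0, "CO3": 0.5, "Cl": 2.0, "SO4": 3.0}
indices = irrigation_indices(sample)
```

Each index is then compared against its own classification thresholds, which is why a single sample can be rated differently by Na%, RSC, MH, and PS.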

Keywords: irrigation water quality, multivariate statistical analysis, groundwater, hydrogeochemical process

Procedia PDF Downloads 141
28021 Geostatistical Simulation of Carcinogenic Industrial Effluent on the Irrigated Soil and Groundwater, District Sheikhupura, Pakistan

Authors: Asma Shaheen, Javed Iqbal

Abstract:

Water resources are being depleted by the intrusion of industrial pollution. Clusters of industries, including leather tanning, textiles, batteries, and chemicals, are causing contamination. These industries use large quantities of water and discharge it with toxic effluents. The penetration of heavy metals from industrial effluent through irrigation has a toxic effect on soil and groundwater. There was a strong, significant positive correlation between all the heavy metals in the three media of industrial effluent, soil, and groundwater (P < 0.001). The metal-to-metal association was supported by dendrograms from cluster analysis. The geospatial variability was assessed using geographically weighted regression (GWR) and a pollution model to simulate carcinogenic elements in soil and groundwater. Principal component analysis identified the source of the metals: factor 1, which accounted for 48.8% of the variation, had significant loadings for sodium (Na), calcium (Ca), magnesium (Mg), iron (Fe), chromium (Cr), nickel (Ni), lead (Pb), and zinc (Zn) from the tannery effluent-based process. In soil and groundwater, the metals had significant loadings on factor 1, which represented more than half of the total variation (51.3% and 53.6%, respectively), showing that pollutants in soil and water were driven by industrial effluent. The cumulative eigenvalues for the three media were also greater than 1, representing significant clustering of related heavy metals. The results showed that toxic trace metals from industrial processes are seeping into the soil and groundwater. These poisonous pollutants have turned fresh groundwater resources into unusable water. The availability of fresh water for irrigation and domestic use is becoming alarmingly limited.

Keywords: groundwater, geostatistical, heavy metals, industrial effluent

Procedia PDF Downloads 229
28020 Applying Multivariate and Univariate Analysis of Variance on Socioeconomic, Health, and Security Variables in Jordan

Authors: Faisal G. Khamis, Ghaleb A. El-Refae

Abstract:

Many researchers have studied socioeconomic, health, and security variables in developed countries; however, very few studies have used multivariate analysis in developing countries. The current study contributes to the scarce literature on the determinants of the variance in socioeconomic, health, and security factors. The questions raised were whether the independent variables (IVs) of governorate and year affect the socioeconomic, health, and security dependent variables (DVs) in Jordan, whether the marginal mean of each DV in each governorate and in each year is significant, which governorates are similar in the mean differences of each DV, and whether these DVs vary. The main objectives were to determine the sources of variance in the DVs, collectively and separately, and to test which governorates are similar and which diverge for each DV. The research design was a time series and cross-sectional analysis. The main hypotheses are that the IVs affect the DVs collectively and separately. Multivariate and univariate analyses of variance were carried out to test these hypotheses. The population comprised the 12 governorates of Jordan, and the available data for 15 years (2000–2015) were drawn from several Jordanian statistical yearbooks. We investigated the effect of the two factors of governorate and year on the four DVs of divorce rate, mortality rate, unemployment percentage, and crime rate. All DVs were transformed to a multivariate normal distribution. We calculated descriptive statistics for each DV. Based on the multivariate analysis of variance, we found a significant effect of the IVs on the DVs with p < .001. Based on the univariate analysis, we found a significant effect of the IVs on each DV with p < .001, except that the effect of the year factor on unemployment was not significant, with p = .642. The grand and marginal means of each DV in each governorate and each year were significant based on a 95% confidence interval. Most governorates are not similar in the DVs, with p < .001. We concluded that the two factors produce significant effects on the DVs, collectively and separately. Based on these findings, the government can distribute its financial and physical resources to governorates more efficiently. Identifying the sources of variance that contribute to the variation in the DVs can provide insights that help inform focused variation prevention efforts.

Keywords: ANOVA, crime, divorce, governorate, hypothesis test, Jordan, MANOVA, means, mortality, unemployment, year

Procedia PDF Downloads 275
28019 The Moment of the Optimal Average Length of the Multivariate Exponentially Weighted Moving Average Control Chart for Equally Correlated Variables

Authors: Edokpa Idemudia Waziri, Salisu S. Umar

Abstract:

Hotelling’s T^2 is a well-known statistic for detecting a shift in the mean vector of a multivariate normal distribution. Control charts based on T^2 have been widely used in statistical process control for monitoring a multivariate process. Although it is a powerful tool, the T^2 statistic is deficient when the shift to be detected in the mean vector of a multivariate process is small and consistent. The Multivariate Exponentially Weighted Moving Average (MEWMA) control chart is one of the control statistics used to overcome this drawback of Hotelling’s T^2 statistic. In this paper, the probability distribution of the Average Run Length (ARL) of the MEWMA control chart, when the quality characteristics exhibit substantial cross-correlation and when the process is in control and out of control, was derived using the Markov chain algorithm. The probability functions and the moments of the run length distribution were also derived, and they were consistent with some existing results for the in-control and out-of-control situations. Through simulation, the procedure identified a class of ARL for the MEWMA control chart when the process is in control and out of control. From our study, it was observed that the MEWMA scheme is quite adequate for detecting a small shift and is a good way to improve the quality of goods and services in a multivariate situation. It was also observed that as the in-control average run length ARL0 or the number of variables (p) increases, the optimal value ARLopt increases asymptotically, and that as the magnitude of the shift σ increases, ARLopt decreases. Finally, we use an example from the literature to illustrate our method and demonstrate its efficiency.
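As a rough illustration of the MEWMA statistic discussed in the abstract, the sketch below uses the standard recursion Z_t = λX_t + (1 − λ)Z_{t−1} and the charting statistic Z_t' Σ_Z^{-1} Z_t with the exact time-dependent covariance Σ_Z = [λ/(2 − λ)][1 − (1 − λ)^{2t}]Σ. It assumes a known in-control covariance matrix and a zero in-control mean; the function names are hypothetical, and the run-length (Markov chain) analysis from the paper is not reproduced.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        residual = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = residual / m[i][i]
    return x


def mewma_statistics(observations, sigma, lam=0.1):
    """Return the MEWMA charting statistic for each observation vector.

    Assumes the in-control mean is zero and `sigma` is the known
    in-control covariance matrix (both are assumptions of this sketch).
    """
    z = [0.0] * len(sigma)
    stats = []
    for t, x in enumerate(observations, start=1):
        # Exponentially weighted moving average of the observation vectors.
        z = [lam * xi + (1 - lam) * zi for xi, zi in zip(x, z)]
        # Exact covariance of z at time t is a scalar multiple of sigma.
        scale = lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        sigma_z = [[scale * v for v in row] for row in sigma]
        w = solve(sigma_z, z)  # w = sigma_z^{-1} z
        stats.append(sum(zi * wi for zi, wi in zip(z, w)))
    return stats


# Equally correlated bivariate example (correlation 0.5), hypothetical data.
sigma = [[1.0, 0.5], [0.5, 1.0]]
stats = mewma_statistics([[0.2, 0.1], [0.4, 0.3], [0.9, 1.1]], sigma, lam=0.2)
```

With λ = 1 the recursion keeps no memory and the statistic reduces to the Hotelling-type quadratic form for each individual observation; smaller λ values pool information over time, which is what makes the chart sensitive to small, persistent shifts. A signal is raised when the statistic exceeds a control limit chosen for the desired in-control ARL.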

Keywords: average run length, markov chain, multivariate exponentially weighted moving average, optimal smoothing parameter

Procedia PDF Downloads 422
28018 Neutral Heavy Scalar Searches via Standard Model Gauge Boson Decays at the Large Hadron Electron Collider with Multivariate Techniques

Authors: Luigi Delle Rose, Oliver Fischer, Ahmed Hammad

Abstract:

In this article, we study the prospects of the proposed Large Hadron electron Collider (LHeC) in the search for heavy neutral scalar particles. We consider a minimal model with one additional complex scalar singlet that interacts with the Standard Model (SM) via mixing with the Higgs doublet, giving rise to an SM-like Higgs boson and a heavy scalar particle. Both scalar particles are produced via vector boson fusion and can be tested via their decays into pairs of SM particles, analogously to the SM Higgs boson. Using multivariate techniques, we show that the LHeC is sensitive to heavy scalars with masses between 200 and 800 GeV down to scalar mixing of order 0.01.

Keywords: beyond the standard model, large hadron electron collider, multivariate analysis, scalar singlet

Procedia PDF Downloads 137
28017 A Cohort and Empirical Based Multivariate Mortality Model

Authors: Jeffrey Tzu-Hao Tsai, Yi-Shan Wong

Abstract:

This article proposes a cohort-age-period (CAP) model to characterize multi-population mortality processes using cohort, age, and period variables. Distinct from the factor-based Lee-Carter-type decomposition mortality model, this approach is empirically based and incorporates the age, period, and cohort variables into the equation system. The model not only provides useful intuition for explaining multivariate mortality change rates but also performs better in forecasting future patterns. Using US and UK mortality data and performing ten-year out-of-sample tests, our approach shows smaller mean square errors in both countries compared to the models in the literature.

Keywords: longevity risk, stochastic mortality model, multivariate mortality rate, risk management

Procedia PDF Downloads 53
28016 Introduction of Robust Multivariate Process Capability Indices

Authors: Behrooz Khalilloo, Hamid Shahriari, Emad Roghanian

Abstract:

Process capability indices (PCIs) are important concepts in statistical quality control; they measure the capability of processes and how well processes meet given specifications. An important issue in statistical quality control is parameter estimation. Under the assumption of multivariate normality, the distribution parameters, namely the mean vector and the variance-covariance matrix, must be estimated when they are unknown. Classical estimation methods such as method of moments estimation (MME) and maximum likelihood estimation (MLE) give good estimates of the population parameters when the data are not contaminated. When outliers exist in the data, however, MME and MLE give poor estimates of the population parameters, so estimators that perform well in the presence of outliers are needed. In this work, robust M-estimators are used to estimate these parameters, and robust process capability indices are introduced based on the robust parameter estimates. The performance of these robust estimators in the presence of outliers and their effects on process capability indices are evaluated using real and simulated multivariate data. The results indicate that the proposed robust capability indices perform much better than the existing process capability indices.
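To make the idea of an M-estimator concrete, the following is a minimal univariate sketch of a Huber M-estimate of location, computed by iteratively reweighted averaging with a fixed MAD-based scale. It is a simplified stand-in, not the multivariate estimator used in the paper; the function name, the tuning constant k = 1.345, and the sample data are assumptions for illustration.

```python
import statistics


def huber_location(data, k=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging."""
    mu = statistics.median(data)
    # Robust scale: median absolute deviation, rescaled for normal data.
    scale = 1.4826 * statistics.median(abs(x - mu) for x in data)
    if scale == 0:
        return mu
    for _ in range(max_iter):
        # Points within k * scale keep full weight; others are down-weighted.
        weights = [
            1.0 if abs(x - mu) <= k * scale else k * scale / abs(x - mu)
            for x in data
        ]
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical measurements with one gross outlier.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 50.0]
robust_center = huber_location(measurements)
plain_mean = statistics.mean(measurements)
```

The outlier pulls the ordinary mean far from the bulk of the data, while the M-estimate stays close to it. A capability index computed from such robust estimates inherits that resistance to contamination.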

Keywords: multivariate process capability indices, robust M-estimator, outlier, multivariate quality control, statistical quality control

Procedia PDF Downloads 283
28015 Simultaneous Determination of Six Characterizing/Quality Parameters of Biodiesels via 1H NMR and Multivariate Calibration

Authors: Gustavo G. Shimamoto, Matthieu Tubino

Abstract:

The characterization and quality of biodiesel samples are checked by determining several parameters. Considering the large number of analyses to be performed, as well as the disadvantages of using toxic solvents and generating waste, multivariate calibration is suggested to reduce the number of tests. In this work, hydrogen nuclear magnetic resonance (1H NMR) spectra were used to build multivariate models by partial least squares (PLS) regression in order to simultaneously determine six important characterizing and/or quality parameters of biodiesels: density at 20 °C, kinematic viscosity at 40 °C, iodine value, acid number, oxidative stability, and water content. Biodiesels from twelve different oil sources were used in this study: babassu, brown flaxseed, canola, corn, cottonseed, macauba almond, microalgae, palm kernel, residual frying, sesame, soybean, and sunflower. 1H NMR reflects the structures of the compounds present in biodiesel samples and showed suitable correlations with the six parameters. The PLS models were constructed with between 5 and 7 latent variables, and the obtained values of r(cal) and r(val) were greater than 0.994 and 0.989, respectively. In addition, the models were considered suitable for predicting all six parameters for external samples, taking into account the analytical speed with which this can be done. Thus, the combination of 1H NMR and PLS proved appropriate for characterizing and evaluating the quality of biodiesels, significantly reducing analysis time, the consumption of reagents/solvents, and waste generation. Therefore, the proposed methods can be considered to adhere to the principles of green chemistry.

Keywords: biodiesel, multivariate calibration, nuclear magnetic resonance, quality parameters

Procedia PDF Downloads 539
28014 Prediction of Marine Ecosystem Changes Based on the Integrated Analysis of Multivariate Data Sets

Authors: Prozorkevitch D., Mishurov A., Sokolov K., Karsakov L., Pestrikova L.

Abstract:

The current body of knowledge about the marine environment and the dynamics of marine ecosystems includes a huge amount of heterogeneous data collected over decades. It generally includes a wide range of hydrological, biological and fishery data. Marine researchers collect these data and analyze how and why the ecosystem changes from past to present. Based on these historical records and linkages between the processes it is possible to predict future changes. Multivariate analysis of trends and their interconnection in the marine ecosystem may be used as an instrument for predicting further ecosystem evolution. A wide range of information about the components of the marine ecosystem for more than 50 years needs to be used to investigate how these arrays can help to predict the future.

Keywords: barents sea ecosystem, abiotic, biotic, data sets, trends, prediction

Procedia PDF Downloads 116
28013 On the Bootstrap P-Value Method in Identifying out of Control Signals in Multivariate Control Chart

Authors: O. Ikpotokin

Abstract:

In any production process, every product is expected to attain a certain standard, but the presence of assignable causes of variability affects the process, leading to low product quality. The ability to identify and remove this type of variability reduces its overall effect, thereby improving the quality of the product. When a univariate control chart signals, it is easy to detect the problem and provide a solution, since the signal relates to a single quality characteristic. However, the problems involved in using a multivariate control chart are the violation of the multivariate normality assumption and the difficulty of identifying the quality characteristic(s) that caused the out-of-control signals. The purpose of this paper is to examine the use of a non-parametric control chart (the bootstrap approach) to obtain control limits that overcome the problem of the multivariate distributional assumption, together with the p-value method for detecting out-of-control signals. Results from a performance study show that the proposed bootstrap method enables the setting of control limits that enhance the detection of out-of-control signals, while the p-value method improves the identification of out-of-control variables.
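One common way to obtain a bootstrap control limit, sketched below, is to resample the in-control monitoring statistics with replacement, take the (1 − α) percentile of each resample, and average those percentiles. This is offered as a generic illustration under that assumption; the abstract does not specify the exact resampling scheme, and the function names, parameter defaults, and sample values are hypothetical.

```python
import random


def percentile(values, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of a list of numbers."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def bootstrap_control_limit(stats, alpha=0.05, n_boot=1000, seed=0):
    """Average the (1 - alpha) percentiles of bootstrap resamples of `stats`.

    `stats` holds in-control monitoring statistics (for example, T^2 values).
    No distributional assumption is made about them.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    limits = []
    for _ in range(n_boot):
        resample = [rng.choice(stats) for _ in stats]
        limits.append(percentile(resample, 1 - alpha))
    return sum(limits) / n_boot


# Hypothetical in-control statistics, for illustration only.
in_control = [0.8, 1.2, 2.5, 0.3, 4.1, 1.9, 3.3, 0.6, 2.2, 5.0]
limit = bootstrap_control_limit(in_control, alpha=0.05)
```

A new observation whose statistic exceeds `limit` would be flagged as out of control; identifying which variable caused the signal is a separate step, which the paper addresses with the p-value method.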

Keywords: bootstrap control limit, p-value method, out-of-control signals, p-value, quality characteristics

Procedia PDF Downloads 347