Search results for: multivariate chemometric
691 Statistical Discrimination of Blue Ballpoint Pen Inks by Diamond Attenuated Total Reflectance (ATR) FTIR
Authors: Mohamed Izzharif Abdul Halim, Niamh Nic Daeid
Abstract:
Determining the source of pen inks used on a variety of documents is impartial for forensic document examiners. The examination of inks is often performed to differentiate between inks in order to evaluate the authenticity of a document. A ballpoint pen ink consists of synthetic dyes in (acidic and/or basic), pigments (organic and/or inorganic) and a range of additives. Inks of similar color may consist of different composition and are frequently the subjects of forensic examinations. This study emphasizes on blue ballpoint pen inks available in the market because it is reported that approximately 80% of questioned documents analysis involving ballpoint pen ink. Analytical techniques such as thin layer chromatography, high-performance liquid chromatography, UV-vis spectroscopy, luminescence spectroscopy and infrared spectroscopy have been used in the analysis of ink samples. In this study, application of Diamond Attenuated Total Reflectance (ATR) FTIR is straightforward but preferable in forensic science as it offers no sample preparation and minimal analysis time. The data obtained from these techniques were further analyzed using multivariate chemometric methods which enable extraction of more information based on the similarities and differences among samples in a dataset. It was indicated that some pens from the same manufactures can be similar in composition, however, discrete types can be significantly different.Keywords: ATR FTIR, ballpoint, multivariate chemometric, PCA
Procedia PDF Downloads 458690 Chemometric Estimation of Inhibitory Activity of Benzimidazole Derivatives by Linear Least Squares and Artificial Neural Networks Modelling
Authors: Sanja O. Podunavac-Kuzmanović, Strahinja Z. Kovačević, Lidija R. Jevrić, Stela Jokić
Abstract:
The subject of this paper is to correlate antibacterial behavior of benzimidazole derivatives with their molecular characteristics using chemometric QSAR (Quantitative Structure–Activity Relationships) approach. QSAR analysis has been carried out on the inhibitory activity of benzimidazole derivatives against Staphylococcus aureus. The data were processed by linear least squares (LLS) and artificial neural network (ANN) procedures. The LLS mathematical models have been developed as a calibration models for prediction of the inhibitory activity. The quality of the models was validated by leave one out (LOO) technique and by using external data set. High agreement between experimental and predicted inhibitory acivities indicated the good quality of the derived models. These results are part of the CMST COST Action No. CM1306 "Understanding Movement and Mechanism in Molecular Machines".Keywords: Antibacterial, benzimidazoles, chemometric, QSAR.
Procedia PDF Downloads 317689 Chemometric-Based Voltammetric Method for Analysis of Vitamins and Heavy Metals in Honey Samples
Authors: Marwa A. A. Ragab, Amira F. El-Yazbi, Amr El-Hawiet
Abstract:
The analysis of heavy metals in honey samples is crucial. When found in honey, they denote environmental pollution. Some of these heavy metals as lead either present at low or high concentrations are considered to be toxic. Other heavy metals, for example, copper and zinc, if present at low concentrations, they considered safe even vital minerals. On the contrary, if they present at high concentrations, they are toxic. Their voltammetric determination in honey represents a challenge due to the presence of other electro-active components as vitamins, which may overlap with the peaks of the metal, hindering their accurate and precise determination. The simultaneous analysis of some vitamins: nicotinic acid (B3) and riboflavin (B2), and heavy metals: lead, cadmium, and zinc, in honey samples, was addressed. The analysis was done in 0.1 M Potassium Chloride (KCl) using a hanging mercury drop electrode (HMDE), followed by chemometric manipulation of the voltammetric data using the derivative method. Then the derivative data were convoluted using discrete Fourier functions. The proposed method allowed the simultaneous analysis of vitamins and metals though their varied responses and sensitivities. Although their peaks were overlapped, the proposed chemometric method allowed their accurate and precise analysis. After the chemometric treatment of the data, metals were successfully quantified at low levels in the presence of vitamins (1: 2000). The heavy metals limit of detection (LOD) values after the chemometric treatment of data decreased by more than 60% than those obtained from the direct voltammetric method. The method applicability was tested by analyzing the selected metals and vitamins in real honey samples obtained from different botanical origins.Keywords: chemometrics, overlapped voltammetric peaks, derivative and convoluted derivative methods, metals and vitamins
Procedia PDF Downloads 151688 Quantitative Structure–Activity Relationship Analysis of Some Benzimidazole Derivatives by Linear Multivariate Method
Authors: Strahinja Z. Kovačević, Lidija R. Jevrić, Sanja O. Podunavac Kuzmanović
Abstract:
The relationship between antibacterial activity of eighteen different substituted benzimidazole derivatives and their molecular characteristics was studied using chemometric QSAR (Quantitative Structure–Activity Relationships) approach. QSAR analysis has been carried out on inhibitory activity towards Staphylococcus aureus, by using molecular descriptors, as well as minimal inhibitory activity (MIC). Molecular descriptors were calculated from the optimized structures. Principal component analysis (PCA) followed by hierarchical cluster analysis (HCA) and multiple linear regression (MLR) was performed in order to select molecular descriptors that best describe the antibacterial behavior of the compounds investigated, and to determine the similarities between molecules. The HCA grouped the molecules in separated clusters which have the similar inhibitory activity. PCA showed very similar classification of molecules as the HCA, and displayed which descriptors contribute to that classification. MLR equations, that represent MIC as a function of the in silico molecular descriptors were established. The statistical significance of the estimated models was confirmed by standard statistical measures and cross-validation parameters (SD = 0.0816, F = 46.27, R = 0.9791, R2CV = 0.8266, R2adj = 0.9379, PRESS = 0.1116). These parameters indicate the possibility of application of the established chemometric models in prediction of the antibacterial behaviour of studied derivatives and structurally very similar compounds.Keywords: antibacterial, benzimidazole, molecular descriptors, QSAR
Procedia PDF Downloads 364687 Regression for Doubly Inflated Multivariate Poisson Distributions
Authors: Ishapathik Das, Sumen Sen, N. Rao Chaganty, Pooja Sengupta
Abstract:
Dependent multivariate count data occur in several research studies. These data can be modeled by a multivariate Poisson or Negative binomial distribution constructed using copulas. However, when some of the counts are inflated, that is, the number of observations in some cells are much larger than other cells, then the copula based multivariate Poisson (or Negative binomial) distribution may not fit well and it is not an appropriate statistical model for the data. There is a need to modify or adjust the multivariate distribution to account for the inflated frequencies. In this article, we consider the situation where the frequencies of two cells are higher compared to the other cells, and develop a doubly inflated multivariate Poisson distribution function using multivariate Gaussian copula. We also discuss procedures for regression on covariates for the doubly inflated multivariate count data. For illustrating the proposed methodologies, we present a real data containing bivariate count observations with inflations in two cells. Several models and linear predictors with log link functions are considered, and we discuss maximum likelihood estimation to estimate unknown parameters of the models.Keywords: copula, Gaussian copula, multivariate distributions, inflated distributios
Procedia PDF Downloads 156686 Process Optimization of Mechanochemical Synthesis for the Production of 4,4 Bipyridine Based MOFS using Twin Screw Extrusion and Multivariate Analysis
Authors: Ahmed Metawea, Rodrigo Soto, Majeida Kharejesh, Gavin Walker, Ahmad B. Albadarin
Abstract:
In this study, towards a green approach, we have investigated the effect of operating conditions of solvent assessed twin-screw extruder (TSE) for the production of 4, 4-bipyridine (1-dimensional coordinated polymer (1D)) based coordinated polymer using cobalt nitrate as a metal precursor with molar ratio 1:1. Different operating parameters such as solvent percentage, screw speed and feeding rate are considered. The resultant product is characterized using offline characterization methods, namely Powder X-ray diffraction (PXRD), Raman spectroscopy and scanning electron microscope (SEM) in order to investigate the product purity and surface morphology. A lower feeding rate increased the product’s quality as more resident time was provided for the reaction to take place. The most important influencing factor was the amount of liquid added. The addition of water helped in facilitating the reaction inside the TSE by increasing the surface area of the reaction for particlesKeywords: MOFS, multivariate analysis, process optimization, chemometric
Procedia PDF Downloads 160685 The Modality of Multivariate Skew Normal Mixture
Authors: Bader Alruwaili, Surajit Ray
Abstract:
Finite mixtures are a flexible and powerful tool that can be used for univariate and multivariate distributions, and a wide range of research analysis has been conducted based on the multivariate normal mixture and multivariate of a t-mixture. Determining the number of modes is an important activity that, in turn, allows one to determine the number of homogeneous groups in a population. Our work currently being carried out relates to the study of the modality of the skew normal distribution in the univariate and multivariate cases. For the skew normal distribution, the aims are associated with studying the modality of the skew normal distribution and providing the ridgeline, the ridgeline elevation function, the $\Pi$ function, and the curvature function, and this will be conducive to an exploration of the number and location of mode when mixing the two components of skew normal distribution. The subsequent objective is to apply these results to the application of real world data sets, such as flow cytometry data.Keywords: mode, modality, multivariate skew normal, finite mixture, number of mode
Procedia PDF Downloads 490684 Chemical Study of Volatile Organic Compounds (VOCS) from Xylopia aromatica (LAM.) Mart (Annonaceae)
Authors: Vanessa G. P. Severino, JOÃO Gabriel M. Junqueira, Michelle N. G. do Nascimento, Francisco W. B. Aquino, João B. Fernandes, Ana P. Terezan
Abstract:
The scientific interest in analyzing VOCs represents a significant modern research field as a result of importance in most branches of the present life and industry. Therefore it is extremely important to investigate, identify and isolate volatile substances, since they can be used in different areas, such as food, medicine, cosmetics, perfumery, aromatherapy, pesticides, repellents and other household products through methods for extracting volatile constituents, such as solid phase microextraction (SPME), hydrodistillation (HD), solvent extraction (SE), Soxhlet extraction, supercritical fluid extraction (SFE), stream distillation (SD) and vacuum distillation (VD). The Chemometrics is an area of chemistry that uses statistical and mathematical tools for the planning and optimization of the experimental conditions, and to extract relevant chemical information multivariate chemical data. In this context, the focus of this work was the study of the chemical VOCs by SPME of the specie X. aromatica, in search of constituents that can be used in the industrial sector as well as in food, cosmetics and perfumery, since these areas industrial has a considerable role. In addition, by chemometric analysis, we sought to maximize the answers of this research, in order to search for the largest number of compounds. The investigation of flowers from X. aromatica in vitro and in alive mode proved consistent, but certain factors supposed influence the composition of metabolites, and the chemometric analysis strengthened the analysis. Thus, the study of the chemical composition of X. aromatica contributed to the VOCs knowledge of the species and a possible application.Keywords: chemometrics, flowers, HS-SPME, Xylopia aromatica
Procedia PDF Downloads 362683 Chemometric Analysis of Raw Milk Quality Originating from Conventional and Organic Dairy Farming in AP Vojvodina, Serbia
Authors: Sanja Podunavac-Kuzmanović, Denis Kučević, Strahinja Kovačević, Milica Karadžić, Lidija Jevrić
Abstract:
The present study describes the application of chemometric methods in analysis of milk samples which were collected in a conventional dairy farm and an organic dairy farm in AP Vojvodina, Republic of Serbia. The chemometric analysis included the application of univariate regression modeling and Analysis of Variance (ANOVA) method. The ANOVA was used in order to determine the differences in fatty acids content in the milk samples from conventional and organic farm. The results of the ANOVA testing indicate that there is a highly statistically significant difference between the content of fatty acid (saturated fatty acid vs. unsaturated fatty acids) in different dairy farming. Besides, the linear univariate models have been obtained as a result of modeling the linear relationships between the milk fat content and saturated fatty acids content, and the linear relationships between the milk fat content and unsaturated fatty acids content. The models obtained on the basis of the milk samples which originate from the organic farming are statistically better than the models based on the milk samples from conventional farming.Keywords: hemometrics, milk, organic farming, quality control
Procedia PDF Downloads 240682 Simultaneous Determination of Methotrexate and Aspirin Using Fourier Transform Convolution Emission Data under Non-Parametric Linear Regression Method
Authors: Marwa A. A. Ragab, Hadir M. Maher, Eman I. El-Kimary
Abstract:
Co-administration of methotrexate (MTX) and aspirin (ASP) can cause a pharmacokinetic interaction and a subsequent increase in blood MTX concentrations which may increase the risk of MTX toxicity. Therefore, it is important to develop a sensitive, selective, accurate and precise method for their simultaneous determination in urine. A new hybrid chemometric method has been applied to the emission response data of the two drugs. Spectrofluorimetric method for determination of MTX through measurement of its acid-degradation product, 4-amino-4-deoxy-10-methylpteroic acid (4-AMP), was developed. Moreover, the acid-catalyzed degradation reaction enables the spectrofluorimetric determination of ASP through the formation of its active metabolite salicylic acid (SA). The proposed chemometric method deals with convolution of emission data using 8-points sin xi polynomials (discrete Fourier functions) after the derivative treatment of these emission data. The first and second derivative curves (D1 & D2) were obtained first then convolution of these curves was done to obtain first and second derivative under Fourier functions curves (D1/FF) and (D2/FF). This new application was used for the resolution of the overlapped emission bands of the degradation products of both drugs to allow their simultaneous indirect determination in human urine. Not only this chemometric approach was applied to the emission data but also the obtained data were subjected to non-parametric linear regression analysis (Theil’s method). The proposed method was fully validated according to the ICH guidelines and it yielded linearity ranges as follows: 0.05-0.75 and 0.5-2.5 µg mL-1 for MTX and ASP respectively. It was found that the non-parametric method was superior over the parametric one in the simultaneous determination of MTX and ASP after the chemometric treatment of the emission spectra of their degradation products. The work combines the advantages of derivative and convolution using discrete Fourier function together with the reliability and efficacy of the non-parametric analysis of data. The achieved sensitivity along with the low values of LOD (0.01 and 0.06 µg mL-1) and LOQ (0.04 and 0.2 µg mL-1) for MTX and ASP respectively, by the second derivative under Fourier functions (D2/FF) were promising and guarantee its application for monitoring the two drugs in patients’ urine samples.Keywords: chemometrics, emission curves, derivative, convolution, Fourier transform, human urine, non-parametric regression, Theil’s method
Procedia PDF Downloads 430681 Multivariate Genome-Wide Association Studies for Identifying Additional Loci for Myopia
Authors: Qiao Fan, Xiaobo Guo, Junxian Zhu, Xiaohu Ding, Ching-Yu Cheng, Tien-Yin Wong, Mingguang He, Heping Zhang, Xueqin Wang
Abstract:
A systematic, simultaneous analysis of multiple phenotypes in genome-wide association studies (GWASs) draws a great attention to integrate the signals from single phenotypes with increased power. However, lacking an interpretable and efficient multivariate GWAS analysis impede the application of such approach. In this study, we propose to decompose the multivariate model into a series of simple univariate models. This transformation illuminates what exactly the individual trait contributes to the significant signals from the multivariate analyses. By employing our approach in the analysis of three myopia-related endophenotypes from the Singapore Malay Eye Study (SIMES), we identify novel candidate loci which were successfully validated in an independent Guangzhou Twin Eye Study (GTES).Keywords: GWAS multivariate, multiple traits, myopia, association
Procedia PDF Downloads 225680 Dow Polyols near Infrared Chemometric Model Reduction Based on Clustering: Reducing Thirty Global Hydroxyl Number (OH) Models to Less Than Five
Authors: Wendy Flory, Kazi Czarnecki, Matthijs Mercy, Mark Joswiak, Mary Beth Seasholtz
Abstract:
Polyurethane Materials are present in a wide range of industrial segments such as Furniture, Building and Construction, Composites, Automotive, Electronics, and more. Dow is one of the leaders for the manufacture of the two main raw materials, Isocyanates and Polyols used to produce polyurethane products. Dow is also a key player for the manufacture of Polyurethane Systems/Formulations designed for targeted applications. In 1990, the first analytical chemometric models were developed and deployed for use in the Dow QC labs of the polyols business for the quantification of OH, water, cloud point, and viscosity. Over the years many models have been added; there are now over 140 models for quantification and hundreds for product identification, too many to be reasonable for support. There are 29 global models alone for the quantification of OH across > 70 products at many sites. An attempt was made to consolidate these into a single model. While the consolidated model proved good statistics across the entire range of OH, several products had a bias by ASTM E1655 with individual product validation. This project summary will show the strategy for global model updates for OH, to reduce the number of models for quantification from over 140 to 5 or less using chemometric methods. In order to gain an understanding of the best product groupings, we identify clusters by reducing spectra to a few dimensions via Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP). Results from these cluster analyses and a separate validation set allowed dow to reduce the number of models for predicting OH from 29 to 3 without loss of accuracy.Keywords: hydroxyl, global model, model maintenance, near infrared, polyol
Procedia PDF Downloads 136679 Discriminating Between Energy Drinks and Sports Drinks Based on Their Chemical Properties Using Chemometric Methods
Authors: Robert Cazar, Nathaly Maza
Abstract:
Energy drinks and sports drinks are quite popular among young adults and teenagers worldwide. Some concerns regarding their health effects – particularly those of the energy drinks - have been raised based on scientific findings. Differentiating between these two types of drinks by means of their chemical properties seems to be an instructive task. Chemometrics provides the most appropriate strategy to do so. In this study, a discrimination analysis of the energy and sports drinks has been carried out applying chemometric methods. A set of eleven samples of available commercial brands of drinks – seven energy drinks and four sports drinks – were collected. Each sample was characterized by eight chemical variables (carbohydrates, energy, sugar, sodium, pH, degrees Brix, density, and citric acid). The data set was standardized and examined by exploratory chemometric techniques such as clustering and principal component analysis. As a preliminary step, a variable selection was carried out by inspecting the variable correlation matrix. It was detected that some variables are redundant, so they can be safely removed, leaving only five variables that are sufficient for this analysis. They are sugar, sodium, pH, density, and citric acid. Then, a hierarchical clustering `employing the average – linkage criterion and using the Euclidian distance metrics was performed. It perfectly separates the two types of drinks since the resultant dendogram, cut at the 25% similarity level, assorts the samples in two well defined groups, one of them containing the energy drinks and the other one the sports drinks. Further assurance of the complete discrimination is provided by the principal component analysis. The projection of the data set on the first two principal components – which retain the 71% of the data information – permits to visualize the distribution of the samples in the two groups identified in the clustering stage. Since the first principal component is the discriminating one, the inspection of its loadings consents to characterize such groups. The energy drinks group possesses medium to high values of density, citric acid, and sugar. The sports drinks group, on the other hand, exhibits low values of those variables. In conclusion, the application of chemometric methods on a data set that features some chemical properties of a number of energy and sports drinks provides an accurate, dependable way to discriminate between these two types of beverages.Keywords: chemometrics, clustering, energy drinks, principal component analysis, sports drinks
Procedia PDF Downloads 109678 Evaluation of Newly Synthesized Steroid Derivatives Using In silico Molecular Descriptors and Chemometric Techniques
Authors: Milica Ž. Karadžić, Lidija R. Jevrić, Sanja Podunavac-Kuzmanović, Strahinja Z. Kovačević, Anamarija I. Mandić, Katarina Penov-Gaši, Andrea R. Nikolić, Aleksandar M. Oklješa
Abstract:
This study considered selection of the in silico molecular descriptors and the models for newly synthesized steroid derivatives description and their characterization using chemometric techniques. Multiple linear regression (MLR) models were established and gave the best molecular descriptors for quantitative structure-retention relationship (QSRR) modeling of the retention of the investigated molecules. MLR models were without multicollinearity among the selected molecular descriptors according to the variance inflation factor (VIF) values. Used molecular descriptors were ranked using generalized pair correlation method (GPCM). In this method, the significant difference between independent variables can be noticed regardless almost equal correlation between dependent variable. Generated MLR models were statistically and cross-validated and the best models were kept. Models were ranked using sum of ranking differences (SRD) method. According to this method, the most consistent QSRR model can be found and similarity or dissimilarity between the models could be noticed. In this study, SRD was performed using average values of experimentally observed data as a golden standard. Chemometric analysis was conducted in order to characterize newly synthesized steroid derivatives for further investigation regarding their potential biological activity and further synthesis. This article is based upon work from COST Action (CM1105), supported by COST (European Cooperation in Science and Technology).Keywords: generalized pair correlation method, molecular descriptors, regression analysis, steroids, sum of ranking differences
Procedia PDF Downloads 348677 Development and Validation of First Derivative Method and Artificial Neural Network for Simultaneous Spectrophotometric Determination of Two Closely Related Antioxidant Nutraceuticals in Their Binary Mixture”
Authors: Mohamed Korany, Azza Gazy, Essam Khamis, Marwa Adel, Miranda Fawzy
Abstract:
Background: Two new, simple and specific methods; First, a Zero-crossing first-derivative technique and second, a chemometric-assisted spectrophotometric artificial neural network (ANN) were developed and validated in accordance with ICH guidelines. Both methods were used for the simultaneous estimation of the two closely related antioxidant nutraceuticals ; Coenzyme Q10 (Q) ; also known as Ubidecarenone or Ubiquinone-10, and Vitamin E (E); alpha-tocopherol acetate, in their pharmaceutical binary mixture. Results: For first method: By applying the first derivative, both Q and E were alternatively determined; each at the zero-crossing of the other. The D1 amplitudes of Q and E, at 285 nm and 235 nm respectively, were recorded and correlated to their concentrations. The calibration curve is linear over the concentration range of 10-60 and 5.6-70 μg mL-1 for Q and E, respectively. For second method: ANN (as a multivariate calibration method) was developed and applied for the simultaneous determination of both analytes. A training set (or a concentration set) of 90 different synthetic mixtures containing Q and E, in wide concentration ranges between 0-100 µg/mL and 0-556 µg/mL respectively, were prepared in ethanol. The absorption spectra of the training sets were recorded in the spectral region of 230–300 nm. A Gradient Descend Back Propagation ANN chemometric calibration was computed by relating the concentration sets (x-block) to their corresponding absorption data (y-block). Another set of 45 synthetic mixtures of the two drugs, in defined range, was used to validate the proposed network. Neither chemical separation, preparation stage nor mathematical graphical treatment were required. Conclusions: The proposed methods were successfully applied for the assay of Q and E in laboratory prepared mixtures and combined pharmaceutical tablet with excellent recoveries. The ANN method was superior over the derivative technique as the former determined both drugs in the non-linear experimental conditions. It also offers rapidity, high accuracy, effort and money saving. Moreover, no need for an analyst for its application. Although the ANN technique needed a large training set, it is the method of choice in the routine analysis of Q and E tablet. No interference was observed from common pharmaceutical additives. The results of the two methods were compared togetherKeywords: coenzyme Q10, vitamin E, chemometry, quantitative analysis, first derivative spectrophotometry, artificial neural network
Procedia PDF Downloads 447676 An AK-Chart for the Non-Normal Data
Authors: Chia-Hau Liu, Tai-Yue Wang
Abstract:
Traditional multivariate control charts assume that measurement from manufacturing processes follows a multivariate normal distribution. However, this assumption may not hold or may be difficult to verify because not all the measurement from manufacturing processes are normal distributed in practice. This study develops a new multivariate control chart for monitoring the processes with non-normal data. We propose a mechanism based on integrating the one-class classification method and the adaptive technique. The adaptive technique is used to improve the sensitivity to small shift on one-class classification in statistical process control. In addition, this design provides an easy way to allocate the value of type I error so it is easier to be implemented. Finally, the simulation study and the real data from industry are used to demonstrate the effectiveness of the propose control charts.Keywords: multivariate control chart, statistical process control, one-class classification method, non-normal data
Procedia PDF Downloads 423675 Multivariate Analysis of Spectroscopic Data for Agriculture Applications
Authors: Asmaa M. Hussein, Amr Wassal, Ahmed Farouk Al-Sadek, A. F. Abd El-Rahman
Abstract:
In this study, a multivariate analysis of potato spectroscopic data was presented to detect the presence of brown rot disease or not. Near-Infrared (NIR) spectroscopy (1,350-2,500 nm) combined with multivariate analysis was used as a rapid, non-destructive technique for the detection of brown rot disease in potatoes. Spectral measurements were performed in 565 samples, which were chosen randomly at the infection place in the potato slice. In this study, 254 infected and 311 uninfected (brown rot-free) samples were analyzed using different advanced statistical analysis techniques. The discrimination performance of different multivariate analysis techniques, including classification, pre-processing, and dimension reduction, were compared. Applying a random forest algorithm classifier with different pre-processing techniques to raw spectra had the best performance as the total classification accuracy of 98.7% was achieved in discriminating infected potatoes from control.Keywords: Brown rot disease, NIR spectroscopy, potato, random forest
Procedia PDF Downloads 190674 The Moment of the Optimal Average Length of the Multivariate Exponentially Weighted Moving Average Control Chart for Equally Correlated Variables
Authors: Edokpa Idemudia Waziri, Salisu S. Umar
Abstract:
The Hotellng’s T^2 is a well-known statistic for detecting a shift in the mean vector of a multivariate normal distribution. Control charts based on T have been widely used in statistical process control for monitoring a multivariate process. Although it is a powerful tool, the T statistic is deficient when the shift to be detected in the mean vector of a multivariate process is small and consistent. The Multivariate Exponentially Weighted Moving Average (MEWMA) control chart is one of the control statistics used to overcome the drawback of the Hotellng’s T statistic. In this paper, the probability distribution of the Average Run Length (ARL) of the MEWMA control chart when the quality characteristics exhibit substantial cross correlation and when the process is in-control and out-of-control was derived using the Markov Chain algorithm. The derivation of the probability functions and the moments of the run length distribution were also obtained and they were consistent with some existing results for the in-control and out-of-control situation. By simulation process, the procedure identified a class of ARL for the MEWMA control when the process is in-control and out-of-control. From our study, it was observed that the MEWMA scheme is quite adequate for detecting a small shift and a good way to improve the quality of goods and services in a multivariate situation. It was also observed that as the in-control average run length ARL0¬ or the number of variables (p) increases, the optimum value of the ARL0pt increases asymptotically and as the magnitude of the shift σ increases, the optimal ARLopt decreases. Finally, we use the example from the literature to illustrate our method and demonstrate its efficiency.Keywords: average run length, markov chain, multivariate exponentially weighted moving average, optimal smoothing parameter
Procedia PDF Downloads 422673 A Cohort and Empirical Based Multivariate Mortality Model
Authors: Jeffrey Tzu-Hao Tsai, Yi-Shan Wong
Abstract:
This article proposes a cohort-age-period (CAP) model to characterize multi-population mortality processes using cohort, age, and period variables. Distinct from the factor-based Lee-Carter-type decomposition mortality model, this approach is empirically based and includes the age, period, and cohort variables into the equation system. The model not only provides a fruitful intuition for explaining multivariate mortality change rates but also has a better performance in forecasting future patterns. Using the US and the UK mortality data and performing ten-year out-of-sample tests, our approach shows smaller mean square errors in both countries compared to the models in the literature.Keywords: longevity risk, stochastic mortality model, multivariate mortality rate, risk management
Procedia PDF Downloads 56672 Introduction of Robust Multivariate Process Capability Indices
Authors: Behrooz Khalilloo, Hamid Shahriari, Emad Roghanian
Abstract:
Process capability indices (PCIs) are important concepts of statistical quality control and measure the capability of processes and how much processes are meeting certain specifications. An important issue in statistical quality control is parameter estimation. Under the assumption of multivariate normality, the distribution parameters, mean vector and variance-covariance matrix must be estimated, when they are unknown. Classic estimation methods like method of moment estimation (MME) or maximum likelihood estimation (MLE) makes good estimation of the population parameters when data are not contaminated. But when outliers exist in the data, MME and MLE make weak estimators of the population parameters. So we need some estimators which have good estimation in the presence of outliers. In this work robust M-estimators for estimating these parameters are used and based on robust parameter estimators, robust process capability indices are introduced. The performances of these robust estimators in the presence of outliers and their effects on process capability indices are evaluated by real and simulated multivariate data. The results indicate that the proposed robust capability indices perform much better than the existing process capability indices.Keywords: multivariate process capability indices, robust M-estimator, outlier, multivariate quality control, statistical quality control
Procedia PDF Downloads 284671 Chemometric Regression Analysis of Radical Scavenging Ability of Kombucha Fermented Kefir-Like Products
Authors: Strahinja Kovacevic, Milica Karadzic Banjac, Jasmina Vitas, Stefan Vukmanovic, Radomir Malbasa, Lidija Jevric, Sanja Podunavac-Kuzmanovic
Abstract:
The present study deals with chemometric regression analysis of quality parameters and the radical scavenging ability of kombucha fermented kefir-like products obtained with winter savory (WS), peppermint (P), stinging nettle (SN) and wild thyme tea (WT) kombucha inoculums. Each analyzed sample was described by milk fat content (MF, %), total unsaturated fatty acids content (TUFA, %), monounsaturated fatty acids content (MUFA, %), polyunsaturated fatty acids content (PUFA, %), the ability of free radicals scavenging (RSA Dₚₚₕ, % and RSA.ₒₕ, %) and pH values measured after each hour from the start until the end of fermentation. The aim of the conducted regression analysis was to establish chemometric models which can predict the radical scavenging ability (RSA Dₚₚₕ, % and RSA.ₒₕ, %) of the samples by correlating it with the MF, TUFA, MUFA, PUFA and the pH value at the beginning, in the middle and at the end of fermentation process which lasted between 11 and 17 hours, until pH value of 4.5 was reached. The analysis was carried out applying univariate linear (ULR) and multiple linear regression (MLR) methods on the raw data and the data standardized by the min-max normalization method. The obtained models were characterized by very limited prediction power (poor cross-validation parameters) and weak statistical characteristics. Based on the conducted analysis it can be concluded that the resulting radical scavenging ability cannot be precisely predicted only on the basis of MF, TUFA, MUFA, PUFA content, and pH values, however, other quality parameters should be considered and included in the further modeling. This study is based upon work from project: Kombucha beverages production using alternative substrates from the territory of the Autonomous Province of Vojvodina, 142-451-2400/2019-03, supported by Provincial Secretariat for Higher Education and Scientific Research of AP Vojvodina.Keywords: chemometrics, regression analysis, kombucha, quality control
Procedia PDF Downloads 143670 A Non-parametric Clustering Approach for Multivariate Geostatistical Data
Authors: Francky Fouedjio
Abstract:
Multivariate geostatistical data have become omnipresent in the geosciences and pose substantial analysis challenges. One of them is the grouping of data locations into spatially contiguous clusters so that data locations within the same cluster are more similar while clusters are different from each other, in some sense. Spatially contiguous clusters can significantly improve the interpretation that turns the resulting clusters into meaningful geographical subregions. In this paper, we develop an agglomerative hierarchical clustering approach that takes into account the spatial dependency between observations. It relies on a dissimilarity matrix built from a non-parametric kernel estimator of the spatial dependence structure of data. It integrates existing methods to find the optimal cluster number and to evaluate the contribution of variables to the clustering. The capability of the proposed approach to provide spatially compact, connected and meaningful clusters is assessed using bivariate synthetic dataset and multivariate geochemical dataset. The proposed clustering method gives satisfactory results compared to other similar geostatistical clustering methods.Keywords: clustering, geostatistics, multivariate data, non-parametric
Procedia PDF Downloads 477669 On the Bootstrap P-Value Method in Identifying out of Control Signals in Multivariate Control Chart
Authors: O. Ikpotokin
Abstract:
In any production process, every product is aimed to attain a certain standard, but the presence of assignable cause of variability affects our process, thereby leading to low quality of product. The ability to identify and remove this type of variability reduces its overall effect, thereby improving the quality of the product. In case of a univariate control chart signal, it is easy to detect the problem and give a solution since it is related to a single quality characteristic. However, the problems involved in the use of multivariate control chart are the violation of multivariate normal assumption and the difficulty in identifying the quality characteristic(s) that resulted in the out of control signals. The purpose of this paper is to examine the use of non-parametric control chart (the bootstrap approach) for obtaining control limit to overcome the problem of multivariate distributional assumption and the p-value method for detecting out of control signals. Results from a performance study show that the proposed bootstrap method enables the setting of control limit that can enhance the detection of out of control signals when compared, while the p-value method also enhanced in identifying out of control variables.Keywords: bootstrap control limit, p-value method, out-of-control signals, p-value, quality characteristics
Procedia PDF Downloads 348668 The Use of Boosted Multivariate Trees in Medical Decision-Making for Repeated Measurements
Authors: Ebru Turgal, Beyza Doganay Erdogan
Abstract:
Machine learning aims to model the relationship between the response and features. Medical decision-making researchers would like to make decisions about patients’ course and treatment, by examining the repeated measurements over time. Boosting approach is now being used in machine learning area for these aims as an influential tool. The aim of this study is to show the usage of multivariate tree boosting in this field. The main reason for utilizing this approach in the field of decision-making is the ease solutions of complex relationships. To show how multivariate tree boosting method can be used to identify important features and feature-time interaction, we used the data, which was collected retrospectively from Ankara University Chest Diseases Department records. Dataset includes repeated PF ratio measurements. The follow-up time is planned for 120 hours. A set of different models is tested. In conclusion, main idea of classification with weighed combination of classifiers is a reliable method which was shown with simulations several times. Furthermore, time varying variables will be taken into consideration within this concept and it could be possible to make accurate decisions about regression and survival problems.Keywords: boosted multivariate trees, longitudinal data, multivariate regression tree, panel data
Procedia PDF Downloads 203667 Analyzing the Influence of Hydrometeorlogical Extremes, Geological Setting, and Social Demographic on Public Health
Authors: Irfan Ahmad Afip
Abstract:
This main research objective is to accurately identify the possibility for a Leptospirosis outbreak severity of a certain area based on its input features into a multivariate regression model. The research question is the possibility of an outbreak in a specific area being influenced by this feature, such as social demographics and hydrometeorological extremes. If the occurrence of an outbreak is being subjected to these features, then the epidemic severity for an area will be different depending on its environmental setting because the features will influence the possibility and severity of an outbreak. Specifically, this research objective was three-fold, namely: (a) to identify the relevant multivariate features and visualize the patterns data, (b) to develop a multivariate regression model based from the selected features and determine the possibility for Leptospirosis outbreak in an area, and (c) to compare the predictive ability of multivariate regression model and machine learning algorithms. Several secondary data features were collected locations in the state of Negeri Sembilan, Malaysia, based on the possibility it would be relevant to determine the outbreak severity in the area. The relevant features then will become an input in a multivariate regression model; a linear regression model is a simple and quick solution for creating prognostic capabilities. A multivariate regression model has proven more precise prognostic capabilities than univariate models. The expected outcome from this research is to establish a correlation between the features of social demographic and hydrometeorological with Leptospirosis bacteria; it will also become a contributor for understanding the underlying relationship between the pathogen and the ecosystem. The relationship established can be beneficial for the health department or urban planner to inspect and prepare for future outcomes in event detection and system health monitoring.Keywords: geographical information system, hydrometeorological, leptospirosis, multivariate regression
Procedia PDF Downloads 117666 Classification of Generative Adversarial Network Generated Multivariate Time Series Data Featuring Transformer-Based Deep Learning Architecture
Authors: Thrivikraman Aswathi, S. Advaith
Abstract:
As there can be cases where the use of real data is somehow limited, such as when it is hard to get access to a large volume of real data, we need to go for synthetic data generation. This produces high-quality synthetic data while maintaining the statistical properties of a specific dataset. In the present work, a generative adversarial network (GAN) is trained to produce multivariate time series (MTS) data since the MTS is now being gathered more often in various real-world systems. Furthermore, the GAN-generated MTS data is fed into a transformer-based deep learning architecture that carries out the data categorization into predefined classes. Further, the model is evaluated across various distinct domains by generating corresponding MTS data.Keywords: GAN, transformer, classification, multivariate time series
Procedia PDF Downloads 131665 Multi-scale Spatial and Unified Temporal Feature-fusion Network for Multivariate Time Series Anomaly Detection
Authors: Hang Yang, Jichao Li, Kewei Yang, Tianyang Lei
Abstract:
Multivariate time series anomaly detection is a significant research topic in the field of data mining, encompassing a wide range of applications across various industrial sectors such as traffic roads, financial logistics, and corporate production. The inherent spatial dependencies and temporal characteristics present in multivariate time series introduce challenges to the anomaly detection task. Previous studies have typically been based on the assumption that all variables belong to the same spatial hierarchy, neglecting the multi-level spatial relationships. To address this challenge, this paper proposes a multi-scale spatial and unified temporal feature fusion network, denoted as MSUT-Net, for multivariate time series anomaly detection. The proposed model employs a multi-level modeling approach, incorporating both temporal and spatial modules. The spatial module is designed to capture the spatial characteristics of multivariate time series data, utilizing an adaptive graph structure learning model to identify the multi-level spatial relationships between data variables and their attributes. The temporal module consists of a unified temporal processing module, which is tasked with capturing the temporal features of multivariate time series. This module is capable of simultaneously identifying temporal dependencies among different variables. Extensive testing on multiple publicly available datasets confirms that MSUT-Net achieves superior performance on the majority of datasets. Our method is able to model and accurately detect systems data with multi-level spatial relationships from a spatial-temporal perspective, providing a novel perspective for anomaly detection analysis.Keywords: data mining, industrial system, multivariate time series, anomaly detection
Procedia PDF Downloads 17664 Multivariate Dependent Frequency-Severity Modeling of Insurance Claims: A Vine Copula Approach
Authors: Islem Kedidi, Rihab Bedoui Bensalem, Faysal Manssouri
Abstract:
In traditional models of insurance data, the number and size of claims are assumed to be independent. Relaxing the independence assumption, this article explores the Vine copula to model dependence structure between multivariate frequency and average severity of insurance claim. To illustrate this approach, we use the Wisconsin local government property insurance fund which offers several insurance protections for motor vehicles, property and contractor’s equipment claims. Results show that the C-vine copula can better characterize the multivariate dependence structure between frequency and severity. Furthermore, we find significant dependencies especially between frequency and average severity among different coverage types.Keywords: dependency modeling, government insurance, insurance claims, vine copula
Procedia PDF Downloads 210663 On the Impact of Oil Price Fluctuations on Stock Markets: A Multivariate Long-Memory GARCH Framework
Authors: Manel Youssef, Lotfi Belkacem
Abstract:
This paper employs multivariate long memory GARCH models to simultaneously estimate mean and conditional variance spillover effects between oil prices and different financial markets. Since different financial assets are traded based on these market sector returns, it’s important for financial market participants to understand the volatility transmission mechanism over time and across these series in order to make optimal portfolio allocation decisions. We examine weekly returns from January 1, 2003 to November 30, 2012 and find evidence of significant transmission of shocks and volatilities between oil prices and some of the examined financial markets. The findings support the idea of cross-market hedging and sharing of common information by investors.Keywords: oil prices, stock indices returns, oil volatility, contagion, DCC-multivariate (FI) GARCH
Procedia PDF Downloads 534662 Model of Optimal Centroids Approach for Multivariate Data Classification
Authors: Pham Van Nha, Le Cam Binh
Abstract:
Particle swarm optimization (PSO) is a population-based stochastic optimization algorithm. PSO was inspired by the natural behavior of birds and fish in migration and foraging for food. PSO is considered as a multidisciplinary optimization model that can be applied in various optimization problems. PSO’s ideas are simple and easy to understand but PSO is only applied in simple model problems. We think that in order to expand the applicability of PSO in complex problems, PSO should be described more explicitly in the form of a mathematical model. In this paper, we represent PSO in a mathematical model and apply in the multivariate data classification. First, PSOs general mathematical model (MPSO) is analyzed as a universal optimization model. Then, Model of Optimal Centroids (MOC) is proposed for the multivariate data classification. Experiments were conducted on some benchmark data sets to prove the effectiveness of MOC compared with several proposed schemes.Keywords: analysis of optimization, artificial intelligence based optimization, optimization for learning and data analysis, global optimization
Procedia PDF Downloads 209