Search results for: multivariate data analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 40728

Search results for: multivariate data analysis

40698 Multivariate Control Chart to Determine Efficiency Measurements in Industrial Processes

Authors: J. J. Vargas, N. Prieto, L. A. Toro

Abstract:

Control charts are commonly used to monitor processes involving either variable or attribute of quality characteristics and determining the control limits as a critical task for quality engineers to improve the processes. Nonetheless, in some applications it is necessary to include an estimation of efficiency. In this paper, the ability to define the efficiency of an industrial process was added to a control chart by means of incorporating a data envelopment analysis (DEA) approach. In depth, a Bayesian estimation was performed to calculate the posterior probability distribution of parameters as means and variance and covariance matrix. This technique allows to analyse the data set without the need of using the hypothetical large sample implied in the problem and to be treated as an approximation to the finite sample distribution. A rejection simulation method was carried out to generate random variables from the parameter functions. Each resulting vector was used by stochastic DEA model during several cycles for establishing the distribution of each efficiency measures for each DMU (decision making units). A control limit was calculated with model obtained and if a condition of a low level efficiency of DMU is presented, system efficiency is out of control. In the efficiency calculated a global optimum was reached, which ensures model reliability.

Keywords: data envelopment analysis, DEA, Multivariate control chart, rejection simulation method

Procedia PDF Downloads 353
40697 Fast Bayesian Inference of Multivariate Block-Nearest Neighbor Gaussian Process (NNGP) Models for Large Data

Authors: Carlos Gonzales, Zaida Quiroz, Marcos Prates

Abstract:

Several spatial variables collected at the same location that share a common spatial distribution can be modeled simultaneously through a multivariate geostatistical model that takes into account the correlation between these variables and the spatial autocorrelation. The main goal of this model is to perform spatial prediction of these variables in the region of study. Here we focus on a geostatistical multivariate formulation that relies on sharing common spatial random effect terms. In particular, the first response variable can be modeled by a mean that incorporates a shared random spatial effect, while the other response variables depend on this shared spatial term, in addition to specific random spatial effects. Each spatial random effect is defined through a Gaussian process with a valid covariance function, but in order to improve the computational efficiency when the data are large, each Gaussian process is approximated to a Gaussian random Markov field (GRMF), specifically to the block nearest neighbor Gaussian process (Block-NNGP). This approach involves dividing the spatial domain into several dependent blocks under certain constraints, where the cross blocks allow capturing the spatial dependence on a large scale, while each individual block captures the spatial dependence on a smaller scale. The multivariate geostatistical model belongs to the class of Latent Gaussian Models; thus, to achieve fast Bayesian inference, it is used the integrated nested Laplace approximation (INLA) method. The good performance of the proposed model is shown through simulations and applications for massive data.

Keywords: Block-NNGP, geostatistics, gaussian process, GRMF, INLA, multivariate models.

Procedia PDF Downloads 67
40696 Multivariate Dependent Frequency-Severity Modeling of Insurance Claims: A Vine Copula Approach

Authors: Islem Kedidi, Rihab Bedoui Bensalem, Faysal Manssouri

Abstract:

In traditional models of insurance data, the number and size of claims are assumed to be independent. Relaxing the independence assumption, this article explores the Vine copula to model dependence structure between multivariate frequency and average severity of insurance claim. To illustrate this approach, we use the Wisconsin local government property insurance fund which offers several insurance protections for motor vehicles, property and contractor’s equipment claims. Results show that the C-vine copula can better characterize the multivariate dependence structure between frequency and severity. Furthermore, we find significant dependencies especially between frequency and average severity among different coverage types.

Keywords: dependency modeling, government insurance, insurance claims, vine copula

Procedia PDF Downloads 176
40695 Timely Detection and Identification of Abnormalities for Process Monitoring

Authors: Hyun-Woo Cho

Abstract:

The detection and identification of multivariate manufacturing processes are quite important in order to maintain good product quality. Unusual behaviors or events encountered during its operation can have a serious impact on the process and product quality. Thus they should be detected and identified as soon as possible. This paper focused on the efficient representation of process measurement data in detecting and identifying abnormalities. This qualitative method is effective in representing fault patterns of process data. In addition, it is quite sensitive to measurement noise so that reliable outcomes can be obtained. To evaluate its performance a simulation process was utilized, and the effect of adopting linear and nonlinear methods in the detection and identification was tested with different simulation data. It has shown that the use of a nonlinear technique produced more satisfactory and more robust results for the simulation data sets. This monitoring framework can help operating personnel to detect the occurrence of process abnormalities and identify their assignable causes in an on-line or real-time basis.

Keywords: detection, monitoring, identification, measurement data, multivariate techniques

Procedia PDF Downloads 204
40694 Testing the Change in Correlation Structure across Markets: High-Dimensional Data

Authors: Malay Bhattacharyya, Saparya Suresh

Abstract:

The Correlation Structure associated with a portfolio is subjected to vary across time. Studying the structural breaks in the time-dependent Correlation matrix associated with a collection had been a subject of interest for a better understanding of the market movements, portfolio selection, etc. The current paper proposes a methodology for testing the change in the time-dependent correlation structure of a portfolio in the high dimensional data using the techniques of generalized inverse, singular valued decomposition and multivariate distribution theory which has not been addressed so far. The asymptotic properties of the proposed test are derived. Also, the performance and the validity of the method is tested on a real data set. The proposed test performs well for detecting the change in the dependence of global markets in the context of high dimensional data.

Keywords: correlation structure, high dimensional data, multivariate distribution theory, singular valued decomposition

Procedia PDF Downloads 103
40693 Small Target Recognition Based on Trajectory Information

Authors: Saad Alkentar, Abdulkareem Assalem

Abstract:

Recognizing small targets has always posed a significant challenge in image analysis. Over long distances, the image signal-to-noise ratio tends to be low, limiting the amount of useful information available to detection systems. Consequently, visual target recognition becomes an intricate task to tackle. In this study, we introduce a Track Before Detect (TBD) approach that leverages target trajectory information (coordinates) to effectively distinguish between noise and potential targets. By reframing the problem as a multivariate time series classification, we have achieved remarkable results. Specifically, our TBD method achieves an impressive 97% accuracy in separating target signals from noise within a mere half-second time span (consisting of 10 data points). Furthermore, when classifying the identified targets into our predefined categories—airplane, drone, and bird—we achieve an outstanding classification accuracy of 96% over a more extended period of 1.5 seconds (comprising 30 data points).

Keywords: small targets, drones, trajectory information, TBD, multivariate time series

Procedia PDF Downloads 19
40692 Ranking of Provinces in Iran for Capital Formation in Spatial Planning with Numerical Taxonomy Technique (An Improvement) Case Study: Agriculture Sector

Authors: Farhad Nouparast

Abstract:

For more production we need more capital formation. Capital formation in each country should be based on comparative advantages in different economic sectors due to the different production possibility curves. In regional planning, recognizing the relative advantages and consequently investing in more production requires identifying areas with the necessary capabilities and location of each region compared to other regions. In this article, ranking of Iran's provinces is done according to the specific and given variables as the best investment position in agricultural activity. So we can provide the necessary background for investment analysis in different regions of the country to formulate national and regional planning and execute investment projects. It is used factor analysis technique and numerical taxonomy analysis to do this in thisarticle. At first, the provinces are homogenized and graded according to the variables using cross-sectional data obtained from the agricultural census and population and housing census of Iran as data matrix. The results show that which provinces have the most potential for capital formation in agronomy sub-sector. Taxonomy classifies organisms based on similar genetic traits in biology and botany. Numerical taxonomy using quantitative methods controls large amounts of information and get the number of samples and categories and take them based on inherent characteristics and differences indirectly accommodates. Numerical taxonomy is related to multivariate statistics.

Keywords: Capital Formation, Factor Analysis, Multivariate statistics, Numerical Taxonomy Analysis, Production, Ranking, Spatial Planning

Procedia PDF Downloads 113
40691 A Gauge Repeatability and Reproducibility Study for Multivariate Measurement Systems

Authors: Jeh-Nan Pan, Chung-I Li

Abstract:

Measurement system analysis (MSA) plays an important role in helping organizations to improve their product quality. Generally speaking, the gauge repeatability and reproducibility (GRR) study is performed according to the MSA handbook stated in QS9000 standards. Usually, GRR study for assessing the adequacy of gauge variation needs to be conducted prior to the process capability analysis. Traditional MSA only considers a single quality characteristic. With the advent of modern technology, industrial products have become very sophisticated with more than one quality characteristic. Thus, it becomes necessary to perform multivariate GRR analysis for a measurement system when collecting data with multiple responses. In this paper, we take the correlation coefficients among tolerances into account to revise the multivariate precision-to-tolerance (P/T) ratio as proposed by Majeske (2008). We then compare the performance of our revised P/T ratio with that of the existing ratios. The simulation results show that our revised P/T ratio outperforms others in terms of robustness and proximity to the actual value. Moreover, the optimal allocation of several parameters such as the number of quality characteristics (v), sample size of parts (p), number of operators (o) and replicate measurements (r) is discussed using the confidence interval of the revised P/T ratio. Finally, a standard operating procedure (S.O.P.) to perform the GRR study for multivariate measurement systems is proposed based on the research results. Hopefully, it can be served as a useful reference for quality practitioners when conducting such study in industries. Measurement system analysis (MSA) plays an important role in helping organizations to improve their product quality. Generally speaking, the gauge repeatability and reproducibility (GRR) study is performed according to the MSA handbook stated in QS9000 standards. Usually, GRR study for assessing the adequacy of gauge variation needs to be conducted prior to the process capability analysis. Traditional MSA only considers a single quality characteristic. With the advent of modern technology, industrial products have become very sophisticated with more than one quality characteristic. Thus, it becomes necessary to perform multivariate GRR analysis for a measurement system when collecting data with multiple responses. In this paper, we take the correlation coefficients among tolerances into account to revise the multivariate precision-to-tolerance (P/T) ratio as proposed by Majeske (2008). We then compare the performance of our revised P/T ratio with that of the existing ratios. The simulation results show that our revised P/T ratio outperforms others in terms of robustness and proximity to the actual value. Moreover, the optimal allocation of several parameters such as the number of quality characteristics (v), sample size of parts (p), number of operators (o) and replicate measurements (r) is discussed using the confidence interval of the revised P/T ratio. Finally, a standard operating procedure (S.O.P.) to perform the GRR study for multivariate measurement systems is proposed based on the research results. Hopefully, it can be served as a useful reference for quality practitioners when conducting such study in industries.

Keywords: gauge repeatability and reproducibility, multivariate measurement system analysis, precision-to-tolerance ratio, Gauge repeatability

Procedia PDF Downloads 230
40690 Irrigation Water Quality Evaluation Based on Multivariate Statistical Analysis: A Case Study of Jiaokou Irrigation District

Authors: Panpan Xu, Qiying Zhang, Hui Qian

Abstract:

Groundwater is main source of water supply in the Guanzhong Basin, China. To investigate the quality of groundwater for agricultural purposes in Jiaokou Irrigation District located in the east of the Guanzhong Basin, 141 groundwater samples were collected for analysis of major ions (K+, Na+, Mg2+, Ca2+, SO42-, Cl-, HCO3-, and CO32-), pH, and total dissolved solids (TDS). Sodium percentage (Na%), residual sodium carbonate (RSC), magnesium hazard (MH), and potential salinity (PS) were applied for irrigation water quality assessment. In addition, multivariate statistical techniques were used to identify the underlying hydrogeochemical processes. Results show that the content of TDS mainly depends on Cl-, Na+, Mg2+, and SO42-, and the HCO3- content is generally high except for the eastern sand area. These are responsible for complex hydrogeochemical processes, such as dissolution of carbonate minerals (dolomite and calcite), gypsum, halite, and silicate minerals, the cation exchange, as well as evaporation and concentration. The average evaluation levels of Na%, RSC, MH, and PS for irrigation water quality are doubtful, good, unsuitable, and injurious to unsatisfactory, respectively. Therefore, it is necessary for decision makers to comprehensively consider the indicators and thus reasonably evaluate the irrigation water quality.

Keywords: irrigation water quality, multivariate statistical analysis, groundwater, hydrogeochemical process

Procedia PDF Downloads 120
40689 Electricity Generation from Renewables and Targets: An Application of Multivariate Statistical Techniques

Authors: Filiz Ersoz, Taner Ersoz, Tugrul Bayraktar

Abstract:

Renewable energy is referred to as "clean energy" and common popular support for the use of renewable energy (RE) is to provide electricity with zero carbon dioxide emissions. This study provides useful insight into the European Union (EU) RE, especially, into electricity generation obtained from renewables, and their targets. The objective of this study is to identify groups of European countries, using multivariate statistical analysis and selected indicators. The hierarchical clustering method is used to decide the number of clusters for EU countries. The conducted statistical hierarchical cluster analysis is based on the Ward’s clustering method and squared Euclidean distances. Hierarchical cluster analysis identified eight distinct clusters of European countries. Then, non-hierarchical clustering (k-means) method was applied. Discriminant analysis was used to determine the validity of the results with data normalized by Z score transformation. To explore the relationship between the selected indicators, correlation coefficients were computed. The results of the study reveal the current situation of RE in European Union Member States.

Keywords: share of electricity generation, k-means clustering, discriminant, CO2 emission

Procedia PDF Downloads 395
40688 Neutral Heavy Scalar Searches via Standard Model Gauge Boson Decays at the Large Hadron Electron Collider with Multivariate Techniques

Authors: Luigi Delle Rose, Oliver Fischer, Ahmed Hammad

Abstract:

In this article, we study the prospects of the proposed Large Hadron electron Collider (LHeC) in the search for heavy neutral scalar particles. We consider a minimal model with one additional complex scalar singlet that interacts with the Standard Model (SM) via mixing with the Higgs doublet, giving rise to an SM-like Higgs boson and a heavy scalar particle. Both scalar particles are produced via vector boson fusion and can be tested via their decays into pairs of SM particles, analogously to the SM Higgs boson. Using multivariate techniques, we show that the LHeC is sensitive to heavy scalars with masses between 200 and 800 GeV down to scalar mixing of order 0.01.

Keywords: beyond the standard model, large hadron electron collider, multivariate analysis, scalar singlet

Procedia PDF Downloads 109
40687 Simultaneous Determination of Six Characterizing/Quality Parameters of Biodiesels via 1H NMR and Multivariate Calibration

Authors: Gustavo G. Shimamoto, Matthieu Tubino

Abstract:

The characterization and the quality of biodiesel samples are checked by determining several parameters. Considering a large number of analysis to be performed, as well as the disadvantages of the use of toxic solvents and waste generation, multivariate calibration is suggested to reduce the number of tests. In this work, hydrogen nuclear magnetic resonance (1H NMR) spectra were used to build multivariate models, from partial least squares (PLS) regression, in order to determine simultaneously six important characterizing and/or quality parameters of biodiesels: density at 20 ºC, kinematic viscosity at 40 ºC, iodine value, acid number, oxidative stability, and water content. Biodiesels from twelve different oils sources were used in this study: babassu, brown flaxseed, canola, corn, cottonseed, macauba almond, microalgae, palm kernel, residual frying, sesame, soybean, and sunflower. 1H NMR reflects the structures of the compounds present in biodiesel samples and showed suitable correlations with the six parameters. The PLS models were constructed with latent variables between 5 and 7, the obtained values of r(cal) and r(val) were greater than 0.994 and 0.989, respectively. In addition, the models were considered suitable to predict all the six parameters for external samples, taking into account the analytical speed to perform it. Thus, the alliance between 1H NMR and PLS showed to be appropriate to characterize and evaluate the quality of biodiesels, reducing significantly analysis time, the consumption of reagents/solvents, and waste generation. Therefore, the proposed methods can be considered to adhere to the principles of green chemistry.

Keywords: biodiesel, multivariate calibration, nuclear magnetic resonance, quality parameters

Procedia PDF Downloads 501
40686 The Moment of the Optimal Average Length of the Multivariate Exponentially Weighted Moving Average Control Chart for Equally Correlated Variables

Authors: Edokpa Idemudia Waziri, Salisu S. Umar

Abstract:

The Hotellng’s T^2 is a well-known statistic for detecting a shift in the mean vector of a multivariate normal distribution. Control charts based on T have been widely used in statistical process control for monitoring a multivariate process. Although it is a powerful tool, the T statistic is deficient when the shift to be detected in the mean vector of a multivariate process is small and consistent. The Multivariate Exponentially Weighted Moving Average (MEWMA) control chart is one of the control statistics used to overcome the drawback of the Hotellng’s T statistic. In this paper, the probability distribution of the Average Run Length (ARL) of the MEWMA control chart when the quality characteristics exhibit substantial cross correlation and when the process is in-control and out-of-control was derived using the Markov Chain algorithm. The derivation of the probability functions and the moments of the run length distribution were also obtained and they were consistent with some existing results for the in-control and out-of-control situation. By simulation process, the procedure identified a class of ARL for the MEWMA control when the process is in-control and out-of-control. From our study, it was observed that the MEWMA scheme is quite adequate for detecting a small shift and a good way to improve the quality of goods and services in a multivariate situation. It was also observed that as the in-control average run length ARL0¬ or the number of variables (p) increases, the optimum value of the ARL0pt increases asymptotically and as the magnitude of the shift σ increases, the optimal ARLopt decreases. Finally, we use the example from the literature to illustrate our method and demonstrate its efficiency.

Keywords: average run length, markov chain, multivariate exponentially weighted moving average, optimal smoothing parameter

Procedia PDF Downloads 394
40685 Estimation of Desktop E-Wastes in Delhi Using Multivariate Flow Analysis

Authors: Sumay Bhojwani, Ashutosh Chandra, Mamita Devaburman, Akriti Bhogal

Abstract:

This article uses the Material flow analysis for estimating e-wastes in the Delhi/NCR region. The Material flow analysis is based on sales data obtained from various sources. Much of the data available for the sales is unreliable because of the existence of a huge informal sector. The informal sector in India accounts for more than 90%. Therefore, the scope of this study is only limited to the formal one. Also, for projection of the sales data till 2030, we have used regression (linear) to avoid complexity. The actual sales in the years following 2015 may vary non-linearly but we have assumed a basic linear relation. The purpose of this study was to know an approximate quantity of desktop e-wastes that we will have by the year 2030 so that we start preparing ourselves for the ineluctable investment in the treatment of these ever-rising e-wastes. The results of this study can be used to install a treatment plant for e-wastes in Delhi.

Keywords: e-wastes, Delhi, desktops, estimation

Procedia PDF Downloads 235
40684 Principal Component Analysis of Body Weight and Morphometric Traits of New Zealand Rabbits Raised under Semi-Arid Condition in Nigeria

Authors: Emmanuel Abayomi Rotimi

Abstract:

Context: Rabbits production plays important role in increasing animal protein supply in Nigeria. Rabbit production provides a cheap, affordable, and healthy source of meat. The growth of animals involves an increase in body weight, which can change the conformation of various parts of the body. Live weight and linear measurements are indicators of growth rate in rabbits and other farm animals. Aims: This study aimed to define the body dimensions of New Zealand rabbits and also to investigate the morphometric traits variables that contribute to body conformation by the use of principal component analysis (PCA). Methods: Data were obtained from 80 New Zealand rabbits (40 bucks and 40 does) raised in Livestock Teaching and Research Farm, Federal University Dutsinma. Data were taken on body weight (BWT), body length (BL), ear length (EL), tail length (TL), heart girth (HG) and abdominal circumference (AC). Data collected were subjected to multivariate analysis using SPSS 20.0 statistical package. Key results: The descriptive statistics showed that the mean BWT, BL, EL, TL, HG, and AC were 0.91kg, 27.34cm, 10.24cm, 8.35cm, 19.55cm and 21.30cm respectively. Sex showed significant (P<0.05) effect on all the variables examined, with higher values recorded for does. The phenotypic correlation coefficient values (r) between the morphometric traits were all positive and ranged from r = 0.406 (between EL and BL) to r = 0.909 (between AC and HG). HG is the most correlated with BWT (r = 0.786). The principal component analysis with variance maximizing orthogonal rotation was used to extract the components. Two principal components (PCs) from the factor analysis of morphometric traits explained about 80.42% of the total variance. PC1 accounted for 64.46% while PC2 accounted for 15.97% of the total variances. Three variables, representing body conformation, loaded highest in PC1. PC1 had the highest contribution (64.46%) to the total variance, and it is regarded as body conformation traits. Conclusions: This component could be used as selection criteria for improving body weight of rabbits.

Keywords: conformation, multicollinearity, multivariate, rabbits and principal component analysis

Procedia PDF Downloads 97
40683 Statistical Discrimination of Blue Ballpoint Pen Inks by Diamond Attenuated Total Reflectance (ATR) FTIR

Authors: Mohamed Izzharif Abdul Halim, Niamh Nic Daeid

Abstract:

Determining the source of pen inks used on a variety of documents is impartial for forensic document examiners. The examination of inks is often performed to differentiate between inks in order to evaluate the authenticity of a document. A ballpoint pen ink consists of synthetic dyes in (acidic and/or basic), pigments (organic and/or inorganic) and a range of additives. Inks of similar color may consist of different composition and are frequently the subjects of forensic examinations. This study emphasizes on blue ballpoint pen inks available in the market because it is reported that approximately 80% of questioned documents analysis involving ballpoint pen ink. Analytical techniques such as thin layer chromatography, high-performance liquid chromatography, UV-vis spectroscopy, luminescence spectroscopy and infrared spectroscopy have been used in the analysis of ink samples. In this study, application of Diamond Attenuated Total Reflectance (ATR) FTIR is straightforward but preferable in forensic science as it offers no sample preparation and minimal analysis time. The data obtained from these techniques were further analyzed using multivariate chemometric methods which enable extraction of more information based on the similarities and differences among samples in a dataset. It was indicated that some pens from the same manufactures can be similar in composition, however, discrete types can be significantly different.

Keywords: ATR FTIR, ballpoint, multivariate chemometric, PCA

Procedia PDF Downloads 434
40682 A Statistical Approach to Classification of Agricultural Regions

Authors: Hasan Vural

Abstract:

Turkey is a favorable country to produce a great variety of agricultural products because of her different geographic and climatic conditions which have been used to divide the country into four main and seven sub regions. This classification into seven regions traditionally has been used in order to data collection and publication especially related with agricultural production. Afterwards, nine agricultural regions were considered. Recently, the governmental body which is responsible of data collection and dissemination (Turkish Institute of Statistics-TIS) has used 12 classes which include 11 sub regions and Istanbul province. This study aims to evaluate these classification efforts based on the acreage of ten main crops in a ten years time period (1996-2005). The panel data grouped in 11 subregions has been evaluated by cluster and multivariate statistical methods. It was concluded that from the agricultural production point of view, it will be rather meaningful to consider three main and eight sub-agricultural regions throughout the country.

Keywords: agricultural region, factorial analysis, cluster analysis,

Procedia PDF Downloads 381
40681 Women, Quality of Life, and Infertility: The Mediating Role of Social Support and Hope

Authors: Saeideh Lotfi Nikoo, Azadeh Ghaheri, Reza Omani Samani

Abstract:

Context: In most cultures around the globe, infertility is recognized as a crisis and exposed infertile couples are under psychosocial pressure. Indeed, the quality of life (QoL) for infertile women is lower in comparison with fertile control. Objective, The purpose of this study, was to investigate the impact of social support and hope on QoL in women undergoing infertility treatment. Methods: A cross-sectional study. Patient(s): In this cross-sectional study, 350 infertile women were recruited who were referred to an infertility clinic for the first time and had no history of Assisted Reproductive Techniques (ART) failure. Intervention(s): Questionnaires on the Fertility Quality of Life (FertiQoL), Multi-dimensional Scale of Perceived Social Support (family and friends), and Snyder Hope Scale (pathway and agency) were used to collect data. Data analysis was done by univariate and multivariate analysis. P value <0.05 was considered statistically significant. Result(s): Multivariate analysis indicated that infertile women with a higher score of social support (by family & friends) (b= 0.59 (CI 95%: 0.03, 1.15) (P = 0.040), b= 0.61 (CI 95%: 0.17, 1.04) (P = 0.006)) and hope (pathway & agency) (b= 0.94 (CI 95%: 0.29, 1.59) (P = 0.005), b= 1.13 (CI 95%: 0.45, 1.82) (P = 0.001) respectively) have significantly better Core FertiQoL. The result revealed that social support and hope are significantly and positively associated with other subscales of FertiQoL as well. Conclusions: According to the results, lifestyle interventions such as receiving social support, building a sound family with effective communication, and providing appropriate health education are of crucial importance to address psychological distress and improve the fertility QoL of women experiencing fertility problems.

Keywords: inertility, social support, infertile women, hope

Procedia PDF Downloads 67
40680 Linking Remittances and Household Level Development in India: An Analysis of NSSO 64th Round Data

Authors: Rakesh Mishra, Mukunda Upadhyay, Rajni Singh

Abstract:

This paper attempts to link remittances sent by internal as well as international out-migrants and its domestic preferences of usage in three different dimension of Household level development in India and its states. Investment of remittances in these sectors reveals for mixed choices of preferential among the states from where people have out-migrated. The multivariate analysis implies that among all three indicators of human development, health (Investment in Food and Health) is the one that attracts the major investment followed by capital formation and least on Education. Usage of the remittances has been found to be varying across all the states in India as far as usage in health, capital formation and education are concerned. Orissa, Nagaland, Madhya Pradesh, Jharkhand, Gujarat, D & H Haweli are some of the states and union territory that contributes highest of its international remittances on health, while most of the usage of the internal remittances has second or third preferences of investment on the health except for Uttar Pradesh, D & H Haweli, Arunachal Pradesh and A & N Is. This paper tries to access usage of international remittances as well as internal remittances on the flow of remittances at the micro level and its implications across three basic determinants of Human Development that is Health, Capital formation and Education coupled with the preferences of usage in presence of Several Socio economic and Demographic variable.

Keywords: multivariate analysis, household development, remittances, internal and international migration

Procedia PDF Downloads 419
40679 The Use of Multivariate Statistical and GIS for Characterization Groundwater Quality in Laghouat Region, Algeria

Authors: Rouighi Mustapha, Bouzid Laghaa Souad, Rouighi Tahar

Abstract:

Due to rain Shortage and the increase of population in the last years, wells excavation and groundwater use for different purposes had been increased without any planning. This is a great challenge for our country. Moreover, this scarcity of water resources in this region is unfortunately combined with rapid fresh water resources quality deterioration, due to salinity and contamination processes. Therefore, it is necessary to conduct the studies about groundwater quality in Algeria. In this work consists in the identification of the factors which influence the water quality parameters in Laghouat region by using statistical analysis Principal Component Analysis (PCA), Hierarchical Cluster Analysis (HCA) and geographic information system (GIS) in an attempt to discriminate the sources of the variation of water quality variations. The results of PCA technique indicate that variables responsible for water quality composition are mainly related to soluble salts variables; natural processes and the nature of the rock which modifies significantly the water chemistry. Inferred from the positive correlation between K+ and NO3-, NO3- is believed to be human induced rather than naturally originated. In this study, the multivariate statistical analysis and GIS allows the hydrogeologist to have supplementary tools in the characterization and evaluating of aquifers.

Keywords: cluster, analysis, GIS, groundwater, laghouat, quality

Procedia PDF Downloads 298
40678 Comparison of Multivariate Adaptive Regression Splines and Random Forest Regression in Predicting Forced Expiratory Volume in One Second

Authors: P. V. Pramila , V. Mahesh

Abstract:

Pulmonary Function Tests are important non-invasive diagnostic tests to assess respiratory impairments and provides quantifiable measures of lung function. Spirometry is the most frequently used measure of lung function and plays an essential role in the diagnosis and management of pulmonary diseases. However, the test requires considerable patient effort and cooperation, markedly related to the age of patients esulting in incomplete data sets. This paper presents, a nonlinear model built using Multivariate adaptive regression splines and Random forest regression model to predict the missing spirometric features. Random forest based feature selection is used to enhance both the generalization capability and the model interpretability. In the present study, flow-volume data are recorded for N= 198 subjects. The ranked order of feature importance index calculated by the random forests model shows that the spirometric features FVC, FEF 25, PEF,FEF 25-75, FEF50, and the demographic parameter height are the important descriptors. A comparison of performance assessment of both models prove that, the prediction ability of MARS with the `top two ranked features namely the FVC and FEF 25 is higher, yielding a model fit of R2= 0.96 and R2= 0.99 for normal and abnormal subjects. The Root Mean Square Error analysis of the RF model and the MARS model also shows that the latter is capable of predicting the missing values of FEV1 with a notably lower error value of 0.0191 (normal subjects) and 0.0106 (abnormal subjects). It is concluded that combining feature selection with a prediction model provides a minimum subset of predominant features to train the model, yielding better prediction performance. This analysis can assist clinicians with a intelligence support system in the medical diagnosis and improvement of clinical care.

Keywords: FEV, multivariate adaptive regression splines pulmonary function test, random forest

Procedia PDF Downloads 280
40677 Assessment of Social Vulnerability of Urban Population to Floods – a Case Study of Mumbai

Authors: Sherly M. A., Varsha Vijaykumar, Subhankar Karmakar, Terence Chan, Christian Rau

Abstract:

This study aims at proposing an indicator-based framework for assessing social vulnerability of any coastal megacity to floods. The final set of indicators of social vulnerability are chosen from a set of feasible and available indicators which are prepared using a Geographic Information System (GIS) framework on a smaller scale considering 1-km grid cell to provide an insight into the spatial variability of vulnerability. The optimal weight for each individual indicator is assigned using data envelopment analysis (DEA) as it avoids subjective weights and improves the confidence on the results obtained. In order to de-correlate and reduce the dimension of multivariate data, principal component analysis (PCA) has been applied. The proposed methodology is demonstrated on twenty four wards of Mumbai under the jurisdiction of Municipal Corporation of Greater Mumbai (MCGM). This framework of vulnerability assessment is not limited to the present study area, and may be applied to other urban damage centers.

Keywords: urban floods, vulnerability, data envelopment analysis, principal component analysis

Procedia PDF Downloads 338
40676 On the Bootstrap P-Value Method in Identifying out of Control Signals in Multivariate Control Chart

Authors: O. Ikpotokin

Abstract:

In any production process, every product is aimed to attain a certain standard, but the presence of assignable cause of variability affects our process, thereby leading to low quality of product. The ability to identify and remove this type of variability reduces its overall effect, thereby improving the quality of the product. In case of a univariate control chart signal, it is easy to detect the problem and give a solution since it is related to a single quality characteristic. However, the problems involved in the use of multivariate control chart are the violation of multivariate normal assumption and the difficulty in identifying the quality characteristic(s) that resulted in the out of control signals. The purpose of this paper is to examine the use of non-parametric control chart (the bootstrap approach) for obtaining control limit to overcome the problem of multivariate distributional assumption and the p-value method for detecting out of control signals. Results from a performance study show that the proposed bootstrap method enables the setting of control limit that can enhance the detection of out of control signals when compared, while the p-value method also enhanced in identifying out of control variables.

Keywords: bootstrap control limit, p-value method, out-of-control signals, p-value, quality characteristics

Procedia PDF Downloads 324
40675 Space Telemetry Anomaly Detection Based On Statistical PCA Algorithm

Authors: Bassem Nassar, Wessam Hussein, Medhat Mokhtar

Abstract:

The crucial concern of satellite operations is to ensure the health and safety of satellites. The worst case in this perspective is probably the loss of a mission but the more common interruption of satellite functionality can result in compromised mission objectives. All the data acquiring from the spacecraft are known as Telemetry (TM), which contains the wealth information related to the health of all its subsystems. Each single item of information is contained in a telemetry parameter, which represents a time-variant property (i.e. a status or a measurement) to be checked. As a consequence, there is a continuous improvement of TM monitoring systems in order to reduce the time required to respond to changes in a satellite's state of health. A fast conception of the current state of the satellite is thus very important in order to respond to occurring failures. Statistical multivariate latent techniques are one of the vital learning tools that are used to tackle the aforementioned problem coherently. Information extraction from such rich data sources using advanced statistical methodologies is a challenging task due to the massive volume of data. To solve this problem, in this paper, we present a proposed unsupervised learning algorithm based on Principle Component Analysis (PCA) technique. The algorithm is particularly applied on an actual remote sensing spacecraft. Data from the Attitude Determination and Control System (ADCS) was acquired under two operation conditions: normal and faulty states. The models were built and tested under these conditions and the results shows that the algorithm could successfully differentiate between these operations conditions. Furthermore, the algorithm provides competent information in prediction as well as adding more insight and physical interpretation to the ADCS operation.

Keywords: space telemetry monitoring, multivariate analysis, PCA algorithm, space operations

Procedia PDF Downloads 393
40674 Survival Outcomes Related to Treatment Modalities in Patients with Oropharyngeal Squamous Cell Carcinoma

Authors: Danni Cheng

Abstract:

Purpose:Surgicallyinclusive treatment(SIT)isthemajor treatment fororopharyngealsquamouscellcarcinoma (OPSCC) in Eastern countries, while nonsurgical treatments(NSTs) are the priority treatment in Western countries. The preferred treatmentsforOPSCC patients remaindebated. Methods:Atotalof 153 consecutive OPSCC casesdiagnosed between 2009 and 2019inWCH, and 15,400 OPSCC cases from SEER database (2000-2017) were obtained. Clinical characteristics, treatments, and survival outcomes were retrospectively collected. We conductedKaplan-Meier curves univariate and multivariate analysis to compare the prognosis of OPSCC patients in WCH, SEER Asian, and SEER all ethnic population by different treatment modalities,HPVstatus, ages, and TNM stages. Results: The 5-year overall survival rate was 59% in WCH, 64% in the SEER all ethnic and 67% in SEER Asian group. In both univariate and multivariate analysis, SIT was observed as a consistent benefit factor for OPSCC patients in all three populations when classified by genders, tumor stages, and HPV status. Patients who underwent SIT had significantly better survival outcomes than those who received NSTsin WCH, SEER Asian, and SEER all ethnic groups. HPV positive status was the beneficial factor of OPSCC patients in all three groups. Besides, male patients had worse survival outcomes in both WCH and SEER Asian group, whereas male patients had better outcomes in the SEER all ethnic group. Conclusion: In contrast to nowadaysNSTs are the first-line therapiesfor OPSCC, our ten-year real-world data and SEER data indicated that OPSCC patients who underwent SIT had better prognosis than NSTs.

Keywords: OPSCC, survival outcome, SEER, treatment modalities

Procedia PDF Downloads 144
40673 Multivariate Analytical Insights into Spatial and Temporal Variation in Water Quality of a Major Drinking Water Reservoir

Authors: Azadeh Golshan, Craig Evans, Phillip Geary, Abigail Morrow, Zoe Rogers, Marcel Maeder

Abstract:

22 physicochemical variables have been determined in water samples collected weekly from January to December in 2013 from three sampling stations located within a major drinking water reservoir. Classical Multivariate Curve Resolution Alternating Least Squares (MCR-ALS) analysis was used to investigate the environmental factors associated with the physico-chemical variability of the water samples at each of the sampling stations. Matrix augmentation MCR-ALS (MA-MCR-ALS) was also applied, and the two sets of results were compared for interpretative clarity. Links between these factors, reservoir inflows and catchment land-uses were investigated and interpreted in relation to chemical composition of the water and their resolved geographical distribution profiles. The results suggested that the major factors affecting reservoir water quality were those associated with agricultural runoff, with evidence of influence on algal photosynthesis within the water column. Water quality variability within the reservoir was also found to be strongly linked to physical parameters such as water temperature and the occurrence of thermal stratification. The two methods applied (MCR-ALS and MA-MCR-ALS) led to similar conclusions; however, MA-MCR-ALS appeared to provide results more amenable to interpretation of temporal and geological variation than those obtained through classical MCR-ALS.

Keywords: drinking water reservoir, multivariate analysis, physico-chemical parameters, water quality

Procedia PDF Downloads 260
40672 Prediction of Slaughter Body Weight in Rabbits: Multivariate Approach through Path Coefficient and Principal Component Analysis

Authors: K. A. Bindu, T. V. Raja, P. M. Rojan, A. Siby

Abstract:

The multivariate path coefficient approach was employed to study the effects of various production and reproduction traits on the slaughter body weight of rabbits. Information on 562 rabbits maintained at the university rabbit farm attached to the Centre for Advanced Studies in Animal Genetics, and Breeding, Kerala Veterinary and Animal Sciences University, Kerala State, India was utilized. The manifest variables used in the study were age and weight of dam, birth weight, litter size at birth and weaning, weight at first, second and third months. The linear multiple regression analysis was performed by keeping the slaughter weight as the dependent variable and the remaining as independent variables. The model explained 48.60 percentage of the total variation present in the market weight of the rabbits. Even though the model used was significant, the standardized beta coefficients for the independent variables viz., age and weight of the dam, birth weight and litter sizes at birth and weaning were less than one indicating their negligible influence on the slaughter weight. However, the standardized beta coefficient of the second-month body weight was maximum followed by the first-month weight indicating their major role on the market weight. All the other factors influence indirectly only through these two variables. Hence it was concluded that the slaughter body weight can be predicted using the first and second-month body weights. The principal components were also developed so as to achieve more accuracy in the prediction of market weight of rabbits.

Keywords: component analysis, multivariate, slaughter, regression

Procedia PDF Downloads 135
40671 A Semiparametric Approach to Estimate the Mode of Continuous Multivariate Data

Authors: Tiee-Jian Wu, Chih-Yuan Hsu

Abstract:

Mode estimation is an important task, because it has applications to data from a wide variety of sources. We propose a semi-parametric approach to estimate the mode of an unknown continuous multivariate density function. Our approach is based on a weighted average of a parametric density estimate using the Box-Cox transform and a non-parametric kernel density estimate. Our semi-parametric mode estimate improves both the parametric- and non-parametric- mode estimates. Specifically, our mode estimate solves the non-consistency problem of parametric mode estimates (at large sample sizes) and reduces the variability of non-parametric mode estimates (at small sample sizes). The performance of our method at practical sample sizes is demonstrated by simulation examples and two real examples from the fields of climatology and image recognition.

Keywords: Box-Cox transform, density estimation, mode seeking, semiparametric method

Procedia PDF Downloads 256
40670 Application of Deep Learning in Top Pair and Single Top Quark Production at the Large Hadron Collider

Authors: Ijaz Ahmed, Anwar Zada, Muhammad Waqas, M. U. Ashraf

Abstract:

We demonstrate the performance of a very efficient tagger applies on hadronically decaying top quark pairs as signal based on deep neural network algorithms and compares with the QCD multi-jet background events. A significant enhancement of performance in boosted top quark events is observed with our limited computing resources. We also compare modern machine learning approaches and perform a multivariate analysis of boosted top-pair as well as single top quark production through weak interaction at √s = 14 TeV proton-proton Collider. The most relevant known background processes are incorporated. Through the techniques of Boosted Decision Tree (BDT), likelihood and Multlayer Perceptron (MLP) the analysis is trained to observe the performance in comparison with the conventional cut based and count approach

Keywords: top tagger, multivariate, deep learning, LHC, single top

Procedia PDF Downloads 84
40669 A Bayesian Multivariate Microeconometric Model for Estimation of Price Elasticity of Demand

Authors: Jefferson Hernandez, Juan Padilla

Abstract:

Estimation of price elasticity of demand is a valuable tool for the task of price settling. Given its relevance, it is an active field for microeconomic and statistical research. Price elasticity in the industry of oil and gas, in particular for fuels sold in gas stations, has shown to be a challenging topic given the market and state restrictions, and underlying correlations structures between the types of fuels sold by the same gas station. This paper explores the Lotka-Volterra model for the problem for price elasticity estimation in the context of fuels; in addition, it is introduced multivariate random effects with the purpose of dealing with errors, e.g., measurement or missing data errors. In order to model the underlying correlation structures, the Inverse-Wishart, Hierarchical Half-t and LKJ distributions are studied. Here, the Bayesian paradigm through Markov Chain Monte Carlo (MCMC) algorithms for model estimation is considered. Simulation studies covering a wide range of situations were performed in order to evaluate parameter recovery for the proposed models and algorithms. Results revealed that the proposed algorithms recovered quite well all model parameters. Also, a real data set analysis was performed in order to illustrate the proposed approach.

Keywords: price elasticity, volume, correlation structures, Bayesian models

Procedia PDF Downloads 130