Search results for: principal component analysis
29296 Implementation of a Method of Crater Detection Using Principal Component Analysis in FPGA
Authors: Izuru Nomura, Tatsuya Takino, Yuji Kageyama, Shin Nagata, Hiroyuki Kamata
Abstract:
We propose a method for detecting craters in images of the lunar surface captured by a small space probe. We use principal component analysis (PCA) to detect craters. However, given the severe environment of space, it is impossible to use a general-purpose computer in practice. Accordingly, we have to implement the method on an FPGA. This paper compares an FPGA and a general-purpose computer in terms of the processing time of the PCA-based crater detection method.
Keywords: crater, PCA, eigenvector, strength value, FPGA, processing time
Procedia PDF Downloads 547
29295 Estimation of Functional Response Model by Supervised Functional Principal Component Analysis
Authors: Hyon I. Paek, Sang Rim Kim, Hyon A. Ryu
Abstract:
In functional linear regression, one typical problem is dimension reduction. Compared with multivariate linear regression, functional linear regression is regarded as an infinite-dimensional case, and the main task is to reduce the dimensions of the functional response and functional predictors. One common approach is to apply functional principal component analysis (FPCA) to the functional predictors and then use a few leading functional principal components (FPCs) to predict the functional model. The leading FPCs estimated by the typical FPCA explain a major part of the variation of the functional predictor, but they may not be strongly correlated with the functional response, so they may not be significant in predicting the response. In this paper, we propose a supervised functional principal component analysis method for a functional response model, with FPCs obtained by considering the correlation with the functional response. Our method would have better prediction accuracy than the typical FPCA method.
Keywords: supervised, functional principal component analysis, functional response, functional linear regression
Procedia PDF Downloads 71
29294 Differentiation between Different Rangeland Sites Using Principal Component Analysis in Semi-Arid Areas of Sudan
Authors: Nancy Ibrahim Abdalla, Abdelaziz Karamalla Gaiballa
Abstract:
Rangelands in semi-arid areas provide a good source of feed for huge numbers of animals and are of environmental, economic and social importance; therefore, these areas are considered economically very important for the pastoral sector in Sudan. This paper investigates means of differentiating between rangeland sites according to soil type using principal component analysis, to assist in monitoring and assessment. Three rangeland sites were identified in the study area: a flat sandy site, a sand dune site, and a hard clay site. Principal component analysis (PCA) was used to reduce the number of factors needed to distinguish between rangeland sites and to produce a new set of data containing the most useful spectral information for satellite image processing. It was performed using selected types of data (two vegetation indices, topographic data and vegetation surface reflectance within the three bands of MODIS data). The PCA indicated a relatively high correspondence between vegetation and soil in the total variance of the data set. The results showed that PCA with the selected variables revealed marked differences, reflected in the variance and eigenvalues, and that it can be used to differentiate between range sites.
Keywords: principal component analysis, PCA, rangeland sites, semi-arid areas, soil types
Procedia PDF Downloads 179
29293 Correlation between Electromyographic and Textural Parameters for Different Textured Indian Foods Using Principal Component Analysis
Authors: S. Rustagi, N. S. Sodhi, B. Dhillon, T. Kaur
Abstract:
The objective of this study was to check whether there is any relationship between electromyographic (EMG) and textural parameters during food texture evaluation. A total of eighteen mastication variables were measured by EMG for entire mastication, per-chew mastication and three different stages of mastication (viz. early, middle and late) for five different foods using eight human subjects. Cluster analysis was used to reduce the number of mastication variables from 18 to 5, so that principal component analysis (PCA) could be applied to them. The PCA resulted in two meaningful principal components. The principal component scores for each food were measured and correlated with five textural parameters (viz. hardness, cohesiveness, chewiness, gumminess and adhesiveness). Correlation coefficients were found to be statistically significant (p < 0.10) for cohesiveness and adhesiveness, while at a more lenient significance level (p < 0.20) chewiness also showed correlation with mastication parameters.
Keywords: electromyography, mastication, sensory, texture
Procedia PDF Downloads 338
29292 Optimal Feature Extraction Dimension in Finger Vein Recognition Using Kernel Principal Component Analysis
Authors: Amir Hajian, Sepehr Damavandinejadmonfared
Abstract:
In this paper, the issue of dimensionality reduction is investigated in finger vein recognition systems using kernel Principal Component Analysis (KPCA). One aspect of KPCA is finding the most appropriate kernel function for finger vein recognition, as there are several kernel functions which can be used within PCA-based algorithms. In this paper, however, another side of PCA-based algorithms, particularly KPCA, is investigated. The dimension of the feature vector in PCA-based algorithms is important, especially when it comes to real-world applications of such algorithms. A fixed feature vector dimension has to be set to reduce the dimension of the input and output data and extract the features from them. A classifier is then applied to classify the data and make the final decision. We analyze KPCA (Polynomial, Gaussian, and Laplacian) in detail in this paper and investigate the optimal feature extraction dimension in finger vein recognition using KPCA.
Keywords: biometrics, finger vein recognition, principal component analysis (PCA), kernel principal component analysis (KPCA)
Procedia PDF Downloads 362
29291 On the Estimation of Crime Rate in the Southwest of Nigeria: Principal Component Analysis Approach
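As an illustration of the kind of feature extraction this abstract describes, the following is a minimal sketch of kernel PCA with a Gaussian kernel, written with NumPy only. It is not the authors' implementation: the random data, the kernel width `sigma`, and the number of retained components are all hypothetical choices made for the example.

```python
import numpy as np

def gaussian_kernel(X, sigma):
    """Gaussian (RBF) kernel matrix between all rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def kernel_pca(K, n_components):
    """Project the training points onto the leading kernel principal components."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    # Center the kernel matrix, i.e. center the data in feature space.
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)
    # eigh returns ascending eigenvalues; keep the largest ones.
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Scores of the training points: eigenvector scaled by sqrt(eigenvalue).
    return vecs * np.sqrt(np.maximum(vals, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                 # hypothetical feature vectors
features = kernel_pca(gaussian_kernel(X, sigma=2.0), n_components=3)
```

The `n_components` argument plays the role of the feature extraction dimension discussed in the abstract; in practice it would be tuned against recognition accuracy rather than fixed in advance.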
Authors: Kayode Balogun, Femi Ayoola
Abstract:
Crime is at an alarming rate in this part of the world, and many factors contribute to this antisocial behaviour among both the young and the old. In this work, principal component analysis (PCA) was used as a tool to reduce the dimensionality of the data while retaining as much of the information as possible, and to identify the variables that were crime prone in the study region. Data were collected on twenty-eight crime variables from the National Bureau of Statistics (NBS) databank for a period of fifteen years. We use PCA in this study to determine the number of major variables and contributors to crime in Southwest Nigeria. The results of our analysis revealed that eight principal components were retained using the scree plot and loading plot, which implies that an eight-equation solution is appropriate for the data. The eight components explained 93.81% of the total variation in the data set. We also found that the most commonly committed crimes in Southwestern Nigeria were: assault, grievous harm and wounding, theft/stealing, burglary, house breaking, false pretence, unlawful arms possession and breach of public peace.
Keywords: crime rates, data, Southwest Nigeria, principal component analysis, variables
Procedia PDF Downloads 439
29290 Utilizing the Principal Component Analysis on Multispectral Aerial Imagery for Identification of Underlying Structures
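The idea of retaining just enough components to explain a target share of the total variation can be sketched as follows. This is a minimal NumPy example under stated assumptions: the data are random, the 90% threshold is hypothetical, and the abstract itself relies on a scree plot and loading plot rather than a fixed threshold.

```python
import numpy as np

def pca_explained_variance(X):
    """Fraction of total variance explained by each principal component."""
    # Standardize each column so variables on different scales are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of the standardized data give component variances.
    s = np.linalg.svd(Z, compute_uv=False)
    var = s**2 / (Z.shape[0] - 1)
    return var / var.sum()

def components_needed(ratios, threshold):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    cumulative = np.cumsum(ratios)
    return int(np.searchsorted(cumulative, threshold) + 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(15, 6))           # hypothetical: 15 yearly records, 6 variables
X[:, 1] = X[:, 0] + 0.1 * X[:, 1]      # make two columns strongly correlated
ratios = pca_explained_variance(X)
k = components_needed(ratios, 0.90)    # hypothetical 90% target
```

Plotting `ratios` against the component index gives the scree plot; the cumulative sum is the quantity reported as "percent of total variation explained".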
Authors: Marcos Bosques-Perez, Walter Izquierdo, Harold Martin, Liangdon Deng, Josue Rodriguez, Thony Yan, Mercedes Cabrerizo, Armando Barreto, Naphtali Rishe, Malek Adjouadi
Abstract:
Aerial imagery is a powerful tool when it comes to analyzing temporal changes in ecosystems and extracting valuable information from the observed scene. It allows us to identify and assess various elements such as objects, structures, textures, waterways, and shadows. To extract meaningful information, multispectral cameras capture data across different wavelength bands of the electromagnetic spectrum. In this study, the collected multispectral aerial images were subjected to principal component analysis (PCA) to identify independent and uncorrelated components or features that extend beyond the visible spectrum captured in standard RGB images. The results demonstrate that these principal components contain unique characteristics specific to certain wavebands, enabling effective object identification and image segmentation.
Keywords: big data, image processing, multispectral, principal component analysis
Procedia PDF Downloads 169
29289 Quantitative Ranking Evaluation of Wine Quality
Authors: A. Brunel, A. Kernevez, F. Leclere, J. Trenteseaux
Abstract:
Today, wine quality is evaluated only by wine experts with their own personal tastes, even if they may agree on some common features. Producers therefore have no unbiased way to independently assess the quality of their products. A tool is proposed here to evaluate wine quality through an objective ranking based on the variables entering wine elaboration, analysed with the principal component analysis (PCA) method. Actual climatic data are compared by measuring the relative distance between the wines considered, from which the general ranking is derived.
Keywords: wine, grape, weather conditions, rating, climate, principal component analysis, metric analysis
Procedia PDF Downloads 315
29288 Detection of Cardiac Arrhythmia Using Principal Component Analysis and Xgboost Model
Authors: Sujay Kotwale, Ramasubba Reddy M.
Abstract:
The electrocardiogram (ECG) is a non-invasive technique used to study and analyze various heart diseases. Cardiac arrhythmia is a serious heart disease which can lead to the death of the patient when left untreated. Early detection of cardiac arrhythmia would help doctors provide proper treatment. In the past, various algorithms and machine learning (ML) models have been used for early detection of cardiac arrhythmia, but few of them have achieved good results. To improve performance, this paper implements principal component analysis (PCA) together with an XGBoost model. PCA was applied to the raw ECG signals to suppress redundant information and extract significant features. The significant ECG features obtained were fed into the XGBoost model, and the performance of the model was evaluated. To validate the proposed technique, raw ECG signals obtained from the standard MIT-BIH database were used for the analysis. The results show that the performance of the proposed method is superior to several state-of-the-art techniques.
Keywords: cardiac arrhythmia, electrocardiogram, principal component analysis, XGBoost
Procedia PDF Downloads 115
29287 Identifying Missing Component in the Bechdel Test Using Principal Component Analysis Method
Authors: Raghav Lakhotia, Chandra Kanth Nagesh, Krishna Madgula
Abstract:
A lot has been said and discussed regarding the rationale and significance of the Bechdel score. It became a digital sensation in 2013, when Swedish cinemas began to showcase the Bechdel test score of a film alongside its rating. The test has drawn criticism from experts and the film fraternity regarding its use to rate the female presence in a movie. Critics argue that the score is too simplistic and that the underlying criteria for a film to pass the test, namely that it must include 1) at least two women, 2) who have at least one dialogue, 3) about something other than a man, are egregious. In this research, we consider a few more parameters which highlight how females are represented in film, such as the number of female dialogues in a movie, dialogue genre, and part-of-speech tags in the dialogue. These parameters are missing from the existing criteria used to calculate the Bechdel score. The research analyzes 342 movie scripts to test the hypothesis that these extra parameters, along with the current Bechdel criteria, are significant in calculating the female representation score. The result of the principal component analysis method indicates that female dialogue content is a key component and should be considered when measuring the representation of women in a work of fiction.
Keywords: Bechdel test, dialogue genre, parts of speech tags, principal component analysis
Procedia PDF Downloads 137
29286 Application of Principal Component Analysis and Ordered Logit Model in Diabetic Kidney Disease Progression in People with Type 2 Diabetes
Authors: Mequanent Wale Mekonen, Edoardo Otranto, Angela Alibrandi
Abstract:
Diabetic kidney disease is one of the main microvascular complications caused by diabetes. Several clinical and biochemical variables are reported to be associated with diabetic kidney disease in people with type 2 diabetes. However, their interrelations could distort the effect estimation of these variables for the disease's progression. The objective of the study is to determine how the biochemical and clinical variables in people with type 2 diabetes are interrelated with each other and their effects on kidney disease progression through advanced statistical methods. First, principal component analysis was used to explore how the biochemical and clinical variables intercorrelate with each other, which helped us reduce a set of correlated biochemical variables to a smaller number of uncorrelated variables. Then, ordered logit regression models (cumulative, stage, and adjacent) were employed to assess the effect of biochemical and clinical variables on the order-level response variable (progression of kidney function) by considering the proportionality assumption for more robust effect estimation. This retrospective cross-sectional study retrieved data from a type 2 diabetic cohort in a polyclinic hospital at the University of Messina, Italy. The principal component analysis yielded three uncorrelated components. These are principal component 1, with negative loading of glycosylated haemoglobin, glycemia, and creatinine; principal component 2, with negative loading of total cholesterol and low-density lipoprotein; and principal component 3, with negative loading of high-density lipoprotein and a positive load of triglycerides. The ordered logit models (cumulative, stage, and adjacent) showed that the first component (glycosylated haemoglobin, glycemia, and creatinine) had a significant effect on the progression of kidney disease. 
For instance, the cumulative odds model indicated that the first principal component (linear combination of glycosylated haemoglobin, glycemia, and creatinine) had a strong and significant effect on the progression of kidney disease, with an effect or odds ratio of 0.423 (P value = 0.000). However, this effect was inconsistent across levels of kidney disease because the first principal component did not meet the proportionality assumption. To address the proportionality problem and provide robust effect estimates, alternative ordered logit models, such as the partial cumulative odds model, the partial adjacent category model, and the partial continuation ratio model, were used. These models suggested that clinical variables such as age, sex, body mass index, medication (metformin), and biochemical variables such as glycosylated haemoglobin, glycemia, and creatinine have a significant effect on the progression of kidney disease.
Keywords: diabetic kidney disease, ordered logit model, principal component analysis, type 2 diabetes
Procedia PDF Downloads 34
29285 Sparse Principal Component Analysis: A Least Squares Approximation Approach
Authors: Giovanni Merola
Abstract:
Sparse Principal Components Analysis aims to find principal components with few non-zero loadings. We derive such sparse solutions by adding a genuine sparsity requirement to the original Principal Components Analysis (PCA) objective function. This approach differs from others because it preserves PCA's original optimality: uncorrelatedness of the components and least squares approximation of the data. To identify the best subset of non-zero loadings we propose a branch-and-bound search and an iterative elimination algorithm. This last algorithm finds sparse solutions with large loadings and can be run without specifying the cardinality of the loadings and the number of components to compute in advance. We give thorough comparisons with the existing sparse PCA methods and several examples on real datasets.
Keywords: SPCA, uncorrelated components, branch-and-bound, backward elimination
Procedia PDF Downloads 375
29284 Principal Component Analysis of Body Weight and Morphometric Traits of New Zealand Rabbits Raised under Semi-Arid Condition in Nigeria
Authors: Emmanuel Abayomi Rotimi
Abstract:
Context: Rabbit production plays an important role in increasing animal protein supply in Nigeria. It provides a cheap, affordable, and healthy source of meat. The growth of animals involves an increase in body weight, which can change the conformation of various parts of the body. Live weight and linear measurements are indicators of growth rate in rabbits and other farm animals. Aims: This study aimed to define the body dimensions of New Zealand rabbits and to investigate the morphometric traits that contribute to body conformation by the use of principal component analysis (PCA). Methods: Data were obtained from 80 New Zealand rabbits (40 bucks and 40 does) raised at the Livestock Teaching and Research Farm, Federal University Dutsinma. Data were taken on body weight (BWT), body length (BL), ear length (EL), tail length (TL), heart girth (HG) and abdominal circumference (AC). The data collected were subjected to multivariate analysis using the SPSS 20.0 statistical package. Key results: The descriptive statistics showed that the mean BWT, BL, EL, TL, HG, and AC were 0.91 kg, 27.34 cm, 10.24 cm, 8.35 cm, 19.55 cm and 21.30 cm, respectively. Sex had a significant (P<0.05) effect on all the variables examined, with higher values recorded for does. The phenotypic correlation coefficients (r) between the morphometric traits were all positive and ranged from r = 0.406 (between EL and BL) to r = 0.909 (between AC and HG). HG was the trait most correlated with BWT (r = 0.786). Principal component analysis with variance-maximizing orthogonal rotation was used to extract the components. Two principal components (PCs) from the factor analysis of morphometric traits explained about 80.42% of the total variance. PC1 accounted for 64.46% and PC2 for 15.97% of the total variance. Three variables, representing body conformation, loaded highest on PC1. 
PC1 had the highest contribution (64.46%) to the total variance and is regarded as representing body conformation traits. Conclusions: This component could be used as a selection criterion for improving the body weight of rabbits.
Keywords: conformation, multicollinearity, multivariate, rabbits, principal component analysis
Procedia PDF Downloads 125
29283 Modeling Factors Affecting Fertility Transition in Africa: Case of Kenya
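The "variance maximizing orthogonal rotation" mentioned in this abstract is commonly known as varimax. The sketch below shows one widely used SVD-based iteration for it in NumPy. It is an illustration only, not the SPSS procedure used in the study: the data are random, and the choice of six traits and two retained components simply mirrors the numbers in the abstract.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a loading matrix orthogonally to maximize the varimax criterion."""
    p, k = loadings.shape
    R = np.eye(k)                       # accumulated orthogonal rotation
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient-like target matrix of the varimax criterion.
        target = L**3 - L * (L**2).sum(axis=0) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        R = u @ vt
        d_old, d = d, s.sum()
        if d_old != 0.0 and d / d_old < 1.0 + tol:
            break
    return loadings @ R

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 6))            # hypothetical: 80 animals, 6 traits
X[:, 1:4] += X[:, [0]]                  # induce correlation among some traits
corr = np.corrcoef(X, rowvar=False)
vals, vecs = np.linalg.eigh(corr)
order = np.argsort(vals)[::-1][:2]      # keep the two leading components
loadings = vecs[:, order] * np.sqrt(vals[order])
rotated = varimax(loadings)
```

Because the rotation is orthogonal, each variable's communality (row sum of squared loadings) is unchanged; only the distribution of loadings across components shifts, which is what makes rotated components easier to label.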
Authors: Dennis Okora Amima Ondieki
Abstract:
Fertility transition has been found to be affected by numerous factors. This research aimed to identify the factors that most affect fertility transition in Kenya. These factors were first extracted from the literature and grouped into demographic features, social and economic features, socio-cultural features, reproductive features and modernization features. In total, 23 factors were identified for this study. The data came from the Kenya Demographic and Health Surveys (KDHS) conducted in 1999-2003 and 2003-2008/9. The data were continuous and comprised the mean birth order for the ten periods. Principal component analysis (PCA) was applied to the 23 factors and identified religion, region, education and marital status as the key factors. PC scores were calculated for every point. The identified principal components were used as predictors in a multiple regression model, with the fertility level as the response variable. The four components were found to affect fertility transition differently: fertility is affected positively by region and marital status and negatively by religion and education. These four factors can be considered in policy planning in Kenya and in Africa at large.
Keywords: fertility transition, principal component analysis, Kenya demographic health survey, birth order
Procedia PDF Downloads 86
29282 Effects of Different Meteorological Variables on Reference Evapotranspiration Modeling: Application of Principal Component Analysis
Authors: Akinola Ikudayisi, Josiah Adeyemo
Abstract:
The correct estimation of reference evapotranspiration (ETₒ) is required for effective irrigation water resources planning and management. However, there are some variables that must be considered when estimating and modeling ETₒ. This study therefore performs a multivariate analysis of the correlated variables involved in the estimation and modeling of ETₒ at the Vaalharts irrigation scheme (VIS) in South Africa using the Principal Component Analysis (PCA) technique. Weather and meteorological data between 1994 and 2014 were obtained from both the South African Weather Service (SAWS) and the Agricultural Research Council (ARC) in South Africa. Average monthly data of minimum and maximum temperature (°C), rainfall (mm), relative humidity (%), and wind speed (m/s) were the inputs to the PCA-based model, while ETₒ was the output. The PCA technique was adopted to extract the most important information from the dataset and to analyze the relationship between the five variables and ETₒ, in order to determine the most significant variables affecting ETₒ estimation at VIS. From the model performances, two principal components with a variance of 82.7% were retained after the eigenvector extraction. The results of the two principal components were compared, and the model output shows that minimum temperature, maximum temperature and wind speed are the most important variables in ETₒ estimation and modeling at VIS. In other words, ETₒ increases with temperature and wind speed. Other variables such as rainfall and relative humidity are less important and cannot provide enough information about ETₒ estimation at VIS. The outcome of this study has helped to reduce the input variable dimensionality from five to the three most significant variables in ETₒ modelling at VIS, South Africa.
Keywords: irrigation, principal component analysis, reference evapotranspiration, Vaalharts
Procedia PDF Downloads 248
29281 Application of FT-NIR Spectroscopy and Electronic Nose in On-line Monitoring of Dough Proofing
Authors: Madhuresh Dwivedi, Navneet Singh Deora, Aastha Deswal, H. N. Mishra
Abstract:
FT-NIR spectroscopy and an electronic nose were used to study the kinetics of dough proofing. Spectroscopy was conducted with an optic probe in the diffuse reflectance mode. The dough leavening was carried out at different temperatures (25 and 35°C) and constant RH (80%). Spectra were collected in the wavenumber range from 12,000 to 4,000 cm⁻¹ directly on the samples, every 5 min during proofing, for up to 2 hours. NIR spectra were corrected for scatter effects, and second-order derivatization was applied to transform the spectra. Principal component analysis (PCA) was applied to the leavening process, and the process kinetics were calculated. PCA was performed on the data set and loadings were calculated. For leavening, four absorption zones (8,950-8,850, 7,200-6,800, 5,250-5,150 and 4,700-4,250 cm⁻¹) were involved in describing the process. The electronic nose was used simultaneously to follow the development of odour compounds during fermentation. It was able to differentiate the samples on the basis of aroma generation at different times during fermentation. To rapidly differentiate samples based on odour, a principal component analysis was performed and successfully demonstrated in this study. The results suggest that the electronic nose and FT-NIR spectroscopy can be used for on-line quality control of the fermentation process during leavening of bread dough.
Keywords: FT-NIR, dough, e-nose, proofing, principal component analysis
Procedia PDF Downloads 383
29280 Principal Component Analysis Applied to the Electric Power Systems – Practical Guide; Practical Guide for Algorithms
Authors: John Morales, Eduardo Orduña
Abstract:
Principal Component Analysis (PCA) theory is currently used to develop algorithms for Electric Power Systems (EPS). In this context, this paper presents a practical tutorial on the technique, detailing its concept and its on-line and off-line mathematical foundations, which are necessary and desirable in EPS algorithms. Features of the eigenvectors that are very useful for real-time processing are explained, showing how these parameters can be selected through direct optimization. In addition, to show the application of PCA to off-line and on-line signals, a step-by-step example using Matlab commands is presented. Finally, a list of different approaches using PCA is given, along with some works that could be analyzed using this tutorial.
Keywords: practical guide, on-line, off-line, algorithms, faults
Procedia PDF Downloads 559
29279 Comparison of Power Generation Status of Photovoltaic Systems under Different Weather Conditions
Authors: Zhaojun Wang, Zongdi Sun, Qinqin Cui, Xingwan Ren
Abstract:
Based on multivariate statistical analysis theory, this paper uses the principal component analysis method, the Mahalanobis distance analysis method and a fitting method to establish a photovoltaic health model for evaluating the health of photovoltaic panels. First, according to weather conditions, the photovoltaic panel variable data are classified into five categories: sunny, cloudy, rainy, foggy, and overcast. The health of photovoltaic panels in these five types of weather is studied. Second, a scatterplot of the relationship between the amount of electricity produced under each kind of weather and the other variables was plotted. It was found that the amount of electricity generated by photovoltaic panels has a significant nonlinear relationship with time. The fitting method was used to fit the relationship between the amount of electricity generated and time, and a nonlinear equation was obtained. Then, principal component analysis was used to analyze the independent variables under the five kinds of weather conditions. According to the Kaiser-Meyer-Olkin test, three types of weather (overcast, foggy, and sunny) meet the conditions for factor analysis, while cloudy and rainy weather do not. Through principal component analysis, the main components of overcast weather are temperature, AQI, and PM2.5. The main component of foggy weather is temperature, and the main components of sunny weather are temperature, AQI, and PM2.5. Cloudy and rainy weather require analysis of all of their variables, namely temperature, AQI, PM2.5, solar radiation intensity and time. Finally, taking the variable values in sunny weather as observed values and the main components of cloudy, foggy, overcast and rainy weather as sample data, the Mahalanobis distances between the observed values and these sample values are obtained. 
A comparative analysis of the degree of deviation of the Mahalanobis distance was carried out to determine the health of the photovoltaic panels under different weather conditions. The weather conditions, ordered from the smallest to the largest fluctuation in Mahalanobis distance, were: foggy, cloudy, overcast and rainy.
Keywords: fitting, principal component analysis, Mahalanobis distance, SPSS, MATLAB
Procedia PDF Downloads 140
29278 A Multivariate Statistical Approach for Water Quality Assessment of River Hindon, India
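For readers unfamiliar with the distance used in this abstract, the Mahalanobis distance between an observation x and a sample with mean μ and covariance S is

$$d(x) = \sqrt{(x - \mu)^{\top} S^{-1} (x - \mu)}.$$

A minimal NumPy sketch follows. The sample data, the three variables, and the two test points are hypothetical and are not taken from the study.

```python
import numpy as np

def mahalanobis(x, sample):
    """Mahalanobis distance from observation x to the distribution of sample rows."""
    mu = sample.mean(axis=0)
    cov = np.cov(sample, rowvar=False)      # sample covariance of the columns
    diff = x - mu
    # Solve cov @ z = diff instead of forming the inverse explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

rng = np.random.default_rng(2)
sample = rng.normal(size=(50, 3))           # hypothetical: 50 records, 3 variables
center = sample.mean(axis=0)
d_near = mahalanobis(center + 0.1, sample)  # small offset from the sample mean
d_far = mahalanobis(center + 5.0, sample)   # large offset in the same direction
```

Unlike the Euclidean distance, this measure accounts for the scale and correlation of the variables, which is why it is a common choice for comparing observations against a reference condition.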
Authors: Nida Rizvi, Deeksha Katyal, Varun Joshi
Abstract:
River Hindon is an important river catering to the demand of the highly populated rural and industrial cluster of western Uttar Pradesh, India. The water quality of river Hindon is deteriorating at an alarming rate due to various industrial, municipal and agricultural activities. The present study aimed at identifying the pollution sources and quantifying the degree to which these sources are responsible for the deteriorating water quality of the river. Various water quality parameters, such as pH, temperature, electrical conductivity, total dissolved solids, total hardness, calcium, chloride, nitrate, sulphate, biological oxygen demand, chemical oxygen demand and total alkalinity, were assessed. Water quality data obtained from eight study sites over one year were subjected to two multivariate techniques, namely principal component analysis and cluster analysis. Principal component analysis was applied to find spatial variability and to identify the sources responsible for the water quality of the river. Three varifactors were obtained after varimax rotation of the initial principal components. Cluster analysis was carried out to classify sampling stations by similarity, which grouped the eight sites into two clusters. The study reveals that anthropogenic influence (municipal, industrial, waste water and agricultural runoff) was the major source of river water pollution. Thus, this study illustrates the utility of multivariate statistical techniques for analysis and elucidation of multifaceted data sets, recognition of pollution sources/factors and understanding of temporal/spatial variations in water quality for effective river water quality management.
Keywords: cluster analysis, multivariate statistical techniques, river Hindon, water quality
Procedia PDF Downloads 457
29277 QSRR Analysis of 17-Picolyl and 17-Picolinylidene Androstane Derivatives Based on Partial Least Squares and Principal Component Regression
Authors: Sanja Podunavac-Kuzmanović, Strahinja Kovačević, Lidija Jevrić, Evgenija Djurendić, Jovana Ajduković
Abstract:
There are several methods for determination of the lipophilicity of biologically active compounds, however chromatography has been shown as a very suitable method for this purpose. Chromatographic (C18-RP-HPLC) analysis of a series of 24 17-picolyl and 17-picolinylidene androstane derivatives was carried out. The obtained retention indices (logk, methanol (90%) / water (10%)) were correlated with calculated physicochemical and lipophilicity descriptors. The QSRR analysis was carried out applying principal component regression (PCR) and partial least squares regression (PLS). The PCR and PLS model were selected on the basis of the highest variance and the lowest root mean square error of cross-validation. The obtained PCR and PLS model successfully correlate the calculated molecular descriptors with logk parameter indicating the significance of the lipophilicity of compounds in chromatographic process. On the basis of the obtained results it can be concluded that the obtained logk parameters of the analyzed androstane derivatives can be considered as their chromatographic lipophilicity. These results are the part of the project No. 114-451-347/2015-02, financially supported by the Provincial Secretariat for Science and Technological Development of Vojvodina and CMST COST Action CM1105.Keywords: androstane derivatives, chromatography, molecular structure, principal component regression, partial least squares regression
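Principal component regression and the root mean square error of cross-validation used for model selection can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' workflow: the descriptor matrix and response are synthetic, leave-one-out is assumed as the cross-validation scheme, and only the count of 24 compounds is borrowed from the abstract.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Regress y on the first k principal component scores of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    V = vt[:k].T                            # loadings of the first k components
    scores = Xc @ V
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    return x_mean, y_mean, V @ gamma        # coefficients in descriptor space

def pcr_predict(model, X):
    x_mean, y_mean, beta = model
    return (X - x_mean) @ beta + y_mean

def rmsecv(X, y, k):
    """Leave-one-out root mean square error of cross-validation."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = pcr_fit(X[keep], y[keep], k)
        errors.append(y[i] - pcr_predict(model, X[i:i + 1])[0])
    return float(np.sqrt(np.mean(np.square(errors))))

rng = np.random.default_rng(3)
X = rng.normal(size=(24, 4))                # hypothetical: 24 compounds, 4 descriptors
y = X @ np.array([0.8, -0.3, 0.5, 0.1]) + 0.05 * rng.normal(size=24)
cv_errors = {k: rmsecv(X, y, k) for k in range(1, 5)}
best_k = min(cv_errors, key=cv_errors.get)
```

Choosing `best_k` by the lowest cross-validation error mirrors the selection criterion named in the abstract; with all components retained, PCR reduces to ordinary least squares on the centered descriptors.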
Procedia PDF Downloads 272
29276 Statistical Model of Water Quality in Estero El Macho, Machala-El Oro
Authors: Rafael Zhindon Almeida
Abstract:
Surface water quality is an important concern for the evaluation and prediction of water quality conditions. The objective of this study is to develop a statistical model that can accurately predict the water quality of the El Macho estuary in the city of Machala, El Oro province. The methodology employed in this study is of a basic type that involves a thorough search for theoretical foundations to improve the understanding of statistical modeling for water quality analysis. The research design is correlational, using a multivariate statistical model involving multiple linear regression and principal component analysis. The results indicate that water quality parameters such as fecal coliforms, biochemical oxygen demand, chemical oxygen demand, iron and dissolved oxygen exceed the allowable limits. The water of the El Macho estuary is determined to be below the required water quality criteria. The multiple linear regression model, based on chemical oxygen demand and total dissolved solids, explains 99.9% of the variance of the dependent variable. In addition, principal component analysis shows that the model has an explanatory power of 86.242%. The study successfully developed a statistical model to evaluate the water quality of the El Macho estuary. The estuary did not meet the water quality criteria, with several parameters exceeding the allowable limits. The multiple linear regression model and principal component analysis provide valuable information on the relationship between the various water quality parameters. The findings of the study emphasize the need for immediate action to improve the water quality of the El Macho estuary to ensure the preservation and protection of this valuable natural resource.Keywords: statistical modeling, water quality, multiple linear regression, principal components, statistical models
Procedia PDF Downloads 91
29275 Fuzzy-Machine Learning Models for the Prediction of Fire Outbreak: A Comparative Analysis
Authors: Uduak Umoh, Imo Eyoh, Emmauel Nyoho
Abstract:
This paper compares fuzzy-machine learning algorithms, namely Support Vector Machine (SVM) and K-Nearest Neighbor (KNN), for predicting cases of fire outbreak. The paper uses a fire outbreak dataset with three features (Temperature, Smoke, and Flame). The data is pre-processed using the Interval Type-2 Fuzzy Logic (IT2FL) algorithm, Min-Max Normalization, and Principal Component Analysis (PCA), which are used to predict feature labels in the dataset, normalize the dataset, and select relevant features, respectively. The output of the pre-processing is a dataset with two principal components (PC1 and PC2). The pre-processed dataset is then used to train the aforementioned machine learning models. The K-fold (with K=10) cross-validation method is used to evaluate the performance of the models using the metrics ROC (Receiver Operating Characteristic curve), Specificity, and Sensitivity. The model is also tested with 20% of the dataset. The validation result shows that KNN is the better model for fire outbreak detection with an ROC value of 0.99878, followed by SVM with an ROC value of 0.99753.
Keywords: Machine Learning Algorithms, Interval Type-2 Fuzzy Logic, Fire Outbreak, Support Vector Machine, K-Nearest Neighbour, Principal Component Analysis
Procedia PDF Downloads 177
29274 Genetic Variability and Principal Component Analysis in Eggplant (Solanum melongena)
Authors: M. R. Naroui Rad, A. Ghalandarzehi, J. A. Koohpayegani
Abstract:
Nine advanced cultivars and lines were planted in transplant trays in March 2013. In mid-April 2014, the nine cultivars and lines were taken from the seedling trays and were evaluated and compared in an experiment in the form of a completely randomized block design with three replications at the Agricultural Research Station, Zahak. The results of the analysis of variance showed that there was a significant difference between the studied cultivars in terms of average fruit weight, fruit length, fruit diameter, ratio of fruit length to diameter, the relative number of seeds per fruit, and yield per plant. Sohrab and the Y6 line gave the highest total yields, with averages of 41.9 and 36.7 t/ha, respectively. The results of simple correlation between the analyzed traits showed that the final yield was affected by the average fruit weight due to direct and indirect effects of fruit weight and plant yield on the final yield. The genotypic and heritability values were high for fruit weight, fruit length and number of seeds per fruit. The first two principal components accounted for 81.6% of the total variation among the characters describing the genotypes.
Keywords: eggplant, principal component, variation, path analysis
Procedia PDF Downloads 226
29273 Investigating the Demand of Short-Shelf Life Food Products for SME Wholesalers
Authors: Yamini Raju, Parminder S. Kang, Adam Moroz, Ross Clement, Alistair Duffy, Ashley Hopwell
Abstract:
Accurate prediction of fresh produce demand is one of the challenges faced by Small and Medium Enterprise (SME) wholesalers. Current research in this area has focused on a limited number of factors specific to a single product or business type. This paper gives an overview of the current literature on the variability factors used to predict demand and the existing forecasting techniques for short shelf life products. It then extends this work by adding new factors and investigating whether there is a time lag and a possibility of noise in the orders. It also identifies the most important factors using correlation and Principal Component Analysis (PCA).
Keywords: demand forecasting, deteriorating products, food wholesalers, principal component analysis, variability factors
Procedia PDF Downloads 513
29272 Discriminating Between Energy Drinks and Sports Drinks Based on Their Chemical Properties Using Chemometric Methods
Authors: Robert Cazar, Nathaly Maza
Abstract:
Energy drinks and sports drinks are quite popular among young adults and teenagers worldwide. Some concerns regarding their health effects, particularly those of energy drinks, have been raised based on scientific findings. Differentiating between these two types of drinks by means of their chemical properties seems to be an instructive task. Chemometrics provides the most appropriate strategy to do so. In this study, a discrimination analysis of energy and sports drinks has been carried out by applying chemometric methods. A set of eleven samples of available commercial brands of drinks (seven energy drinks and four sports drinks) was collected. Each sample was characterized by eight chemical variables (carbohydrates, energy, sugar, sodium, pH, degrees Brix, density, and citric acid). The data set was standardized and examined by exploratory chemometric techniques such as clustering and principal component analysis. As a preliminary step, a variable selection was carried out by inspecting the variable correlation matrix. Some variables were found to be redundant, so they can be safely removed, leaving only five variables that are sufficient for this analysis: sugar, sodium, pH, density, and citric acid. Then, a hierarchical clustering employing the average-linkage criterion and the Euclidean distance metric was performed. It perfectly separates the two types of drinks, since the resultant dendrogram, cut at the 25% similarity level, sorts the samples into two well-defined groups, one containing the energy drinks and the other the sports drinks. Further assurance of the complete discrimination is provided by the principal component analysis. The projection of the data set onto the first two principal components, which retain 71% of the data information, makes it possible to visualize the distribution of the samples in the two groups identified in the clustering stage.
Since the first principal component is the discriminating one, inspecting its loadings allows these groups to be characterized. The energy drinks group has medium to high values of density, citric acid, and sugar. The sports drinks group, on the other hand, exhibits low values of those variables. In conclusion, the application of chemometric methods to a data set that features some chemical properties of a number of energy and sports drinks provides an accurate, dependable way to discriminate between these two types of beverages.
Keywords: chemometrics, clustering, energy drinks, principal component analysis, sports drinks
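The standardization and projection steps described in this abstract can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code or measurements; the function names `standardize` and `pca`, the random stand-in data, and the use of NumPy's SVD are assumptions made for the example.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca(X, n_components=2):
    """Return scores, loadings and explained-variance ratios of standardized X."""
    Z = standardize(X)
    # SVD of the standardized data; rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)      # share of total variance per component
    loadings = Vt[:n_components].T       # variables x components
    scores = Z @ loadings                # samples x components
    return scores, loadings, explained[:n_components]

# Synthetic stand-in: 11 samples x 5 variables (hypothetical, not the study data).
rng = np.random.default_rng(0)
X = rng.normal(size=(11, 5))
X[:7, 0] += 3.0  # shift one variable for the first seven samples to form two groups

scores, loadings, explained = pca(X)
print("variance retained by PC1 and PC2:", explained.sum())
print("PC1 loadings:", loadings[:, 0])
```

Plotting the two columns of `scores` against each other gives the kind of projection the abstract refers to, and the first column of `loadings` shows which variables drive the first component.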
Procedia PDF Downloads 103
29271 Poster : Incident Signals Estimation Based on a Modified MCA Learning Algorithm
Authors: Rashid Ahmed, John N. Avaritsiotis
Abstract:
Many signal subspace-based approaches have already been proposed for determining the fixed Direction of Arrival (DOA) of plane waves impinging on an array of sensors. Two procedures for neural-network-based DOA estimation are presented. First, Principal Component Analysis (PCA) is employed to extract the maximum eigenvalue and eigenvector from the signal subspace to estimate the DOA. Second, minor component analysis (MCA) is a statistical method for extracting the eigenvector associated with the smallest eigenvalue of the covariance matrix. In this paper, we modify a Minor Component Analysis (MCA(R)) learning algorithm to enhance its convergence, since convergence is essential for applying MCA algorithms in practice. The learning rate parameter is also presented; it ensures fast convergence of the algorithm because it directly affects both the convergence of the weight vector and the error level. MCA is performed to determine the estimated DOA. Preliminary results are provided to illustrate the convergence achieved.
Keywords: Direction of Arrival, neural networks, Principal Component Analysis, Minor Component Analysis
Procedia PDF Downloads 447
29270 Air Quality Forecast Based on Principal Component Analysis-Genetic Algorithm and Back Propagation Model
Authors: Bin Mu, Site Li, Shijin Yuan
Abstract:
As the environment deteriorates, people are increasingly concerned about environmental quality, especially air quality. As a result, it is of great value to give accurate and timely forecasts of the AQI (air quality index). In order to simplify the factors influencing air quality in a city and forecast the city’s AQI for the next day, this study used MATLAB software and constructed a PCA-GABP mathematical model. Specifically, this study first performed principal component analysis (PCA) on the factors influencing the next day’s AQI, including weather, industrial waste gas, and the current day’s IAQI data. Then, we used a back propagation (BP) neural network model, optimized by a genetic algorithm (GA), to forecast the next day’s AQI. In order to verify the validity and accuracy of the PCA-GABP model’s forecast capability, the study uses two statistical indices to evaluate the AQI forecast results: normalized mean square error and fractional bias. Finally, this study reduces the mean square error by optimizing the individual gene structure in the genetic algorithm and adjusting the parameters of the back propagation model. To conclude, the model’s AQI forecasting performance is comparatively convincing, and the model is expected to be useful for AQI forecasting in the future.
Keywords: AQI forecast, principal component analysis, genetic algorithm, back propagation neural network model
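The two evaluation indices named in this abstract can be sketched in a few lines. The abstract does not give formulas, so the definitions below are an assumption: they follow one widely used convention in air-quality model evaluation, NMSE = mean((O - P)^2) / (mean(O) * mean(P)) and FB = 2 * (mean(O) - mean(P)) / (mean(O) + mean(P)), and the sample values are made up for illustration.

```python
def normalized_mean_square_error(observed, predicted):
    """NMSE under the assumed convention: MSE divided by the product of the means."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    return mse / (mean_o * mean_p)

def fractional_bias(observed, predicted):
    """FB under the assumed convention: positive when the model under-predicts."""
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    return 2.0 * (mean_o - mean_p) / (mean_o + mean_p)

# Hypothetical AQI values, not taken from the study.
observed = [100.0, 100.0]
predicted = [50.0, 50.0]
print(normalized_mean_square_error(observed, predicted))  # 2500 / (100 * 50) = 0.5
print(fractional_bias(observed, predicted))               # 2 * 50 / 150 = 0.666...
```

Both indices are zero for a perfect forecast; under this convention NMSE is undefined if either mean is zero, which is not a concern for strictly positive AQI values.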
Procedia PDF Downloads 221
29269 Potential Ecological Risk Assessment of Selected Heavy Metals in Sediments of Tidal Flat Marsh, the Case Study: Shuangtai Estuary, China
Authors: Chang-Fa Liu, Yi-Ting Wang, Yuan Liu, Hai-Feng Wei, Lei Fang, Jin Li
Abstract:
Heavy metals in sediments can cause adverse ecological effects when they exceed given criteria. The present study investigated sediment environmental quality, pollutant enrichment, ecological risk, and source identification for copper, cadmium, lead, zinc, mercury, and arsenic in sediments collected from the tidal flat marsh of Shuangtai estuary, China. The arithmetic mean integrated pollution index, geometric mean integrated pollution index, fuzzy integrated pollution index, and principal component score were used to characterize sediment environmental quality; fuzzy similarity and the geo-accumulation index were used to evaluate pollutant enrichment; the correlation matrix, principal component analysis, and cluster analysis were used to identify sources of pollution; and the environmental risk index and potential ecological risk index were used to assess ecological risk. The environmental quality of the sediments is classified as a very low degree of contamination or low contamination. By pollutant enrichment analysis, the regions rank in order of similarity to the soil element background of the Liaohe plain as Sanjiaozhou, Honghaitan, Sandaogou, Xiaohe. The source identification indicates that correlations among the metals are significant, except between copper and cadmium. Cadmium, lead, zinc, mercury, and arsenic cluster together as the first principal component, while copper forms the second principal component. The environmental risk assessment level is scaled to no risk in the studied area. The order of potential ecological risk is As > Cd > Hg > Cu > Pb > Zn.
Keywords: ecological risk assessment, heavy metals, sediment, marsh, Shuangtai estuary
Procedia PDF Downloads 342
29268 Detection of Abnormal Process Behavior in Copper Solvent Extraction by Principal Component Analysis
Authors: Kirill Filianin, Satu-Pia Reinikainen, Tuomo Sainio
Abstract:
Frequent measurements of product stream quality create a data overload that becomes more and more difficult to handle. In the current study, plant history data with multiple variables was successfully treated by principal component analysis to detect abnormal process behavior, particularly in copper solvent extraction. The multivariate model is based on the concentration levels of the main process metals recorded by an industrial on-stream x-ray fluorescence analyzer. After mean-centering and normalization of the concentration data set, a two-dimensional multivariate model was constructed using the principal component analysis algorithm. Normal operating conditions were defined through control limits that were assigned to squared score values on the x-axis and to residual values on the y-axis. 80 percent of the data set was taken as the training set, and the multivariate model was tested with the remaining 20 percent of the data. Model testing showed successful application of the control limits to detect abnormal behavior of the copper solvent extraction process as early warnings. Compared to conventional techniques that analyze one variable at a time, the proposed model allows a process failure to be detected on-line using information from all process variables simultaneously. Complex industrial equipment combined with advanced mathematical tools may be used for on-line monitoring of both the composition of process streams and final product quality. Defining normal operating conditions of the process supports reliable decision making in a process control room. Thus, industrial x-ray fluorescence analyzers equipped with an integrated data processing toolbox allow more flexibility in copper plant operation. It is recommended to apply the additional multivariate process control and monitoring procedures separately for the major components and for the impurities.
Principal component analysis may be utilized not only to control the content of major elements in process streams, but also for continuous monitoring of plant feed. The proposed approach has potential in on-line instrumentation, providing a fast, robust and cheap application with automation abilities.
Keywords: abnormal process behavior, failure detection, principal component analysis, solvent extraction
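The monitoring scheme outlined in this abstract (scale the data, fit a two-component PCA model on a training portion, then flag samples whose squared scores or residuals exceed control limits) can be sketched as below. This is not the authors' implementation: the synthetic data, the function names, the Hotelling-type T² and squared-residual (Q) statistics, and the empirical 99% quantile limits are all assumptions chosen for illustration.

```python
import numpy as np

def monitoring_statistics(model, X):
    """Return T^2 (scaled squared scores) and Q (squared residuals) for each row."""
    Z = (X - model["mean"]) / model["std"]
    T = Z @ model["P"]                             # scores in the PCA plane
    t2 = np.sum(T**2 / model["var"], axis=1)       # squared scores, variance-scaled
    residual = Z - T @ model["P"].T                # part not explained by the model
    q = np.sum(residual**2, axis=1)
    return t2, q

def fit_monitor(X_train, n_components=2, quantile=0.99):
    """Fit the PCA model and set empirical control limits on the training data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    model = {
        "mean": mean,
        "std": std,
        "P": Vt[:n_components].T,                      # loadings
        "var": s[:n_components] ** 2 / (len(Z) - 1),   # score variances
    }
    t2, q = monitoring_statistics(model, X_train)
    model["t2_limit"] = np.quantile(t2, quantile)
    model["q_limit"] = np.quantile(q, quantile)
    return model

def is_abnormal(model, X):
    """Flag rows that exceed either control limit."""
    t2, q = monitoring_statistics(model, X)
    return (t2 > model["t2_limit"]) | (q > model["q_limit"])

# Synthetic stand-in for concentration data: 200 samples x 4 correlated variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))
X_train, X_test = X[:160], X[160:]                 # 80 / 20 split, as in the abstract

model = fit_monitor(X_train)
faulty = X_test[:1].copy()
faulty[0, 0] += 50.0                               # inject a large artificial fault
print("fault flagged:", bool(is_abnormal(model, faulty)[0]))
print("test samples flagged:", int(is_abnormal(model, X_test).sum()))
```

Empirical quantiles keep the sketch free of distributional assumptions; a production system would more likely derive the limits from the F and chi-square-type approximations commonly used for these statistics.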
Procedia PDF Downloads 305
29267 Data and Spatial Analysis for Economy and Education of 28 E.U. Member-States for 2014
Authors: Alexiou Dimitra, Fragkaki Maria
Abstract:
The objective of the paper is to study geographic, economic and educational variables and their contribution to determining the position of each member-state among the EU-28 countries, based on the values of seven variables as given by Eurostat. The data analysis methods of Multiple Factorial Correspondence Analysis (MFCA), Principal Component Analysis and Factor Analysis have been used. The cross tabulation tables of data consist of the values of seven variables for the 28 countries for 2014. The data are processed using the CHIC Analysis V 1.1 software package. The results of this program using MFCA and Ascending Hierarchical Classification are given in numerical and graphical form. For comparison, the Factor procedure of the statistical package IBM SPSS 20 has been applied to the same data. The numerical and graphical results, presented in tables and graphs, demonstrate the agreement between the two methods. The most important result is the study of the relation between the 28 countries and the position of each country in groups or clouds, which are formed according to the values of the corresponding variables.
Keywords: Multiple Factorial Correspondence Analysis, Principal Component Analysis, Factor Analysis, E.U.-28 countries, Statistical package IBM SPSS 20, CHIC Analysis V 1.1 Software, Eurostat.eu Statistics
Procedia PDF Downloads 506