Search results for: Anthropometric data
7363 Association Rules Mining and NOSQL Oriented Document in Big Data
Authors: Sarra Senhadji, Imene Benzeguimi, Zohra Yagoub
Abstract:
Big Data represents the recent technology of manipulating voluminous and unstructured data sets over multiple sources. Therefore, NOSQL appears to handle the problem of unstructured data. Association rules mining is one of the popular techniques of data mining to extract hidden relationship from transactional databases. The algorithm for finding association dependencies is well-solved with Map Reduce. The goal of our work is to reduce the time of generating of frequent itemsets by using Map Reduce and NOSQL database oriented document. A comparative study is given to evaluate the performances of our algorithm with the classical algorithm Apriori.
Keywords: Apriori, Association rules mining, Big Data, data mining, Hadoop, Map Reduce, MongoDB, NoSQL.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6947362 Identifying Critical Success Factors for Data Quality Management through a Delphi Study
Authors: Maria Paula Santos, Ana Lucas
Abstract:
Organizations support their operations and decision making on the data they have at their disposal, so the quality of these data is remarkably important and Data Quality (DQ) is currently a relevant issue, the literature being unanimous in pointing out that poor DQ can result in large costs for organizations. The literature review identified and described 24 Critical Success Factors (CSF) for Data Quality Management (DQM) that were presented to a panel of experts, who ordered them according to their degree of importance, using the Delphi method with the Q-sort technique, based on an online questionnaire. The study shows that the five most important CSF for DQM are: definition of appropriate policies and standards, control of inputs, definition of a strategic plan for DQ, organizational culture focused on quality of the data and obtaining top management commitment and support.
Keywords: Critical success factors, data quality, data quality management, Delphi, Q-Sort.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11097361 Secure Data Aggregation Using Clusters in Sensor Networks
Authors: Prakash G L, Thejaswini M, S H Manjula, K R Venugopal, L M Patnaik
Abstract:
Wireless sensor network can be applied to both abominable and military environments. A primary goal in the design of wireless sensor networks is lifetime maximization, constrained by the energy capacity of batteries. One well-known method to reduce energy consumption in such networks is data aggregation. Providing efcient data aggregation while preserving data privacy is a challenging problem in wireless sensor networks research. In this paper, we present privacy-preserving data aggregation scheme for additive aggregation functions. The Cluster-based Private Data Aggregation (CPDA)leverages clustering protocol and algebraic properties of polynomials. It has the advantage of incurring less communication overhead. The goal of our work is to bridge the gap between collaborative data collection by wireless sensor networks and data privacy. We present simulation results of our schemes and compare their performance to a typical data aggregation scheme TAG, where no data privacy protection is provided. Results show the efficacy and efficiency of our schemes.Keywords: Aggregation, Clustering, Query Processing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17347360 Hematologic Inflammatory Markers and Inflammation-Related Hepatokines in Pediatric Obesity
Authors: Mustafa M. Donma, Orkide Donma
Abstract:
Obesity in children particularly draws attention, because it may threaten the individual’s future life due to many chronic diseases it may lead to. Most of these diseases including obesity itself altogether are related to inflammation. For this reason, inflammation-related parameters gain importance. Within this context, complete blood cell counts, ratios or indices derived from these counts have recently found some platform to be used as inflammatory markers. So far, mostly adipokines were investigated within the field of obesity. Metabolic inflammation is closely associated with cellular dysfunction. In this study, hematologic inflammatory markers and cytokines produced predominantly by the liver (fibroblast growth factor-21 (FGF-21) and fetuin A) were investigated in pediatric obesity. Two groups were constituted from 76 obese children based on World Health Organization criteria. Group 1 was composed of children, whose age- and sex-adjusted body mass index (BMI) percentiles were between 95 and 99. Group 2 consists of children, who are above 99th percentile. The first and the latter groups were defined as obese (OB) and morbid obese (MO). Anthropometric measurements of the children were performed. Informed consent forms and the approval of the institutional ethics committee were obtained. Blood cell counts and ratios were determined by automated hematology analyzer. The related ratios and indexes were calculated. Statistical evaluation of the data was performed by SPSS program. There was no statistically significant difference in terms of neutrophil-to lymphocyte ratio, monocyte-to-high density lipoprotein cholesterol ratio and platelet-to-lymphocyte ratio between the groups. Mean platelet volume and platelet distribution width values were decreased (p < 0.05), total platelet count, red cell distribution width (RDW) and systemic immune inflammation index values were increased (p < 0.01) in MO group. Both hepatokines were increased in the same group, however increases were not statistically significant. In this group, also a strong correlation was calculated between FGF-21 and RDW when controlled by age, hematocrit, iron and ferritin (r = 0.425; p < 0.01). In conclusion, the association between RDW, a hematologic inflammatory marker, and FGF-21, an inflammation-related hepatokine, found in MO group is an important finding discriminating between OB and MO children. This association is even more powerful when controlled by age and iron-related parameters.
Keywords: Childhood obesity, fetuin A, fibroblast growth factor-21, hematologic markers, red cell distribution width.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7017359 A New Protocol for Concealed Data Aggregation in Wireless Sensor Networks
Authors: M. Abbasi Dezfouli, S. Mazraeh, M. H. Yektaie
Abstract:
Wireless sensor networks (WSN) consists of many sensor nodes that are placed on unattended environments such as military sites in order to collect important information. Implementing a secure protocol that can prevent forwarding forged data and modifying content of aggregated data and has low delay and overhead of communication, computing and storage is very important. This paper presents a new protocol for concealed data aggregation (CDA). In this protocol, the network is divided to virtual cells, nodes within each cell produce a shared key to send and receive of concealed data with each other. Considering to data aggregation in each cell is locally and implementing a secure authentication mechanism, data aggregation delay is very low and producing false data in the network by malicious nodes is not possible. To evaluate the performance of our proposed protocol, we have presented computational models that show the performance and low overhead in our protocol.Keywords: Wireless Sensor Networks, Security, Concealed Data Aggregation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17357358 Prominent Lipid Parameters Correlated with Trunk-to-Leg and Appendicular Fat Ratios in Severe Pediatric Obesity
Authors: Mustafa M. Donma, Orkide Donma
Abstract:
Alterations in lipid parameters as well as in the fat distribution of the body are noteworthy during the evaluation of obesity stages. Total cholesterol (TC), triglycerides (TRG), low density lipoprotein-cholesterol (LDL-C), high density lipoprotein-cholesterol (HDL-C) are basic lipid fractions. Fat deposited in trunk and extremities may give considerable amount of information. Ratios such as trunk-to-leg fat ratio (TLFR) and trunk-to-appendicular fat ratio (TAFR) are derived from distinct fat distribution in these areas. In this study, lipid fractions and TLFR as well as TAFR were evaluated and the distinctions among healthy, obese (OB) and morbid obese (MO) groups were investigated. Three groups [normal body mass index (N-BMI), OB, MO] were constituted. Ages and sexes of the groups were matched. The study protocol was approved by the Non-interventional Ethics Committee of Tekirdag Namik Kemal University. Written informed consent forms were obtained from the parents of the participants. Anthropometric measurements (height, weight, waist circumference, hip circumference, head circumference, neck circumference) were recorded during the physical examination. BMI values were calculated. Total, trunk, leg and arm fat mass values were obtained by TANITA Bioelectrical Impedance Analysis. These values were used to calculate TLFR and TAFR. Systolic (SBP) and diastolic blood pressures (DBP) were measured. Routine biochemical tests including lipid fractions were performed. Data were evaluated using SPSS software. p value smaller than 0.05 was accepted as significant. There was no difference among the age values and gender ratios of the groups. Any statistically significant difference was not observed in terms of DBP, TLFR as well as serum lipid fractions. Higher SBP values were measured both in OB and MO children than those with N-BMI. TAFR showed a significant difference between N-BMI and OB groups. Statistically significant increases were detected between insulin values of N-BMI group and OB as well as MO groups. There were bivariate correlations between LDL and TLFR as well as TAFR values in MO group. When adjusted for SBP and DBP, partial correlations were calculated for LDL-TLFR as well as LDL-TAFR. Much stronger partial correlations were obtained for the same couples upon controlling for TRG and HDL-C. Much stronger partial correlations observed in MO children emphasize the potential transition from morbid obesity to metabolic syndrome. These findings have concluded that LDL-C may be suggested as a discriminating parameter between OB and MO children.
Keywords: Children, lipid parameters, obesity, trunk-to-leg fat ratio, trunk-to-appendicular fat ratio.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3767357 IMDC: An Image-Mapped Data Clustering Technique for Large Datasets
Authors: Faruq A. Al-Omari, Nabeel I. Al-Fayoumi
Abstract:
In this paper, we present a new algorithm for clustering data in large datasets using image processing approaches. First the dataset is mapped into a binary image plane. The synthesized image is then processed utilizing efficient image processing techniques to cluster the data in the dataset. Henceforth, the algorithm avoids exhaustive search to identify clusters. The algorithm considers only a small set of the data that contains critical boundary information sufficient to identify contained clusters. Compared to available data clustering techniques, the proposed algorithm produces similar quality results and outperforms them in execution time and storage requirements.
Keywords: Data clustering, Data mining, Image-mapping, Pattern discovery, Predictive analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15007356 The New Method of Concealed Data Aggregation in Wireless Sensor: A Case Study
Authors: M. Abbasi Dezfouli, S. Mazraeh, M. H. Yektaie
Abstract:
Wireless sensor networks (WSN) consists of many sensor nodes that are placed on unattended environments such as military sites in order to collect important information. Implementing a secure protocol that can prevent forwarding forged data and modifying content of aggregated data and has low delay and overhead of communication, computing and storage is very important. This paper presents a new protocol for concealed data aggregation (CDA). In this protocol, the network is divided to virtual cells, nodes within each cell produce a shared key to send and receive of concealed data with each other. Considering to data aggregation in each cell is locally and implementing a secure authentication mechanism, data aggregation delay is very low and producing false data in the network by malicious nodes is not possible. To evaluate the performance of our proposed protocol, we have presented computational models that show the performance and low overhead in our protocol.
Keywords: Wireless Sensor Networks, Security, Concealed Data Aggregation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17687355 Peakwise Smoothing of Data Models using Wavelets
Authors: D Sudheer Reddy, N Gopal Reddy, P V Radhadevi, J Saibaba, Geeta Varadan
Abstract:
Smoothing or filtering of data is first preprocessing step for noise suppression in many applications involving data analysis. Moving average is the most popular method of smoothing the data, generalization of this led to the development of Savitzky-Golay filter. Many window smoothing methods were developed by convolving the data with different window functions for different applications; most widely used window functions are Gaussian or Kaiser. Function approximation of the data by polynomial regression or Fourier expansion or wavelet expansion also gives a smoothed data. Wavelets also smooth the data to great extent by thresholding the wavelet coefficients. Almost all smoothing methods destroys the peaks and flatten them when the support of the window is increased. In certain applications it is desirable to retain peaks while smoothing the data as much as possible. In this paper we present a methodology called as peak-wise smoothing that will smooth the data to any desired level without losing the major peak features.Keywords: smoothing, moving average, peakwise smoothing, spatialdensity models, planar shape models, wavelets.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17507354 A New Precautionary Method for Measurement and Improvement the Data Quality
Authors: Seyed Mohammad Hossein Moossavizadeh, Mehran Mohsenzadeh, Nasrin Arshadi
Abstract:
the data quality is a kind of complex and unstructured concept, which is concerned by information systems managers. The reason of this attention is the high amount of Expenses for maintenance and cleaning of the inefficient data. Such a data more than its expenses of lack of quality, cause wrong statistics, analysis and decisions in organizations. Therefor the managers intend to improve the quality of their information systems' data. One of the basic subjects of quality improvement is the evaluation of the amount of it. In this paper, we present a precautionary method, which with its application the data of information systems would have a better quality. Our method would cover different dimensions of data quality; therefor it has necessary integrity. The presented method has tested on three dimensions of accuracy, value-added and believability and the results confirm the improvement and integrity of this method.
Keywords: Data quality, precaution, information system, measurement, improvement.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14687353 An Efficient Data Mining Approach on Compressed Transactions
Authors: Jia-Yu Dai, Don-Lin Yang, Jungpin Wu, Ming-Chuan Hung
Abstract:
In an era of knowledge explosion, the growth of data increases rapidly day by day. Since data storage is a limited resource, how to reduce the data space in the process becomes a challenge issue. Data compression provides a good solution which can lower the required space. Data mining has many useful applications in recent years because it can help users discover interesting knowledge in large databases. However, existing compression algorithms are not appropriate for data mining. In [1, 2], two different approaches were proposed to compress databases and then perform the data mining process. However, they all lack the ability to decompress the data to their original state and improve the data mining performance. In this research a new approach called Mining Merged Transactions with the Quantification Table (M2TQT) was proposed to solve these problems. M2TQT uses the relationship of transactions to merge related transactions and builds a quantification table to prune the candidate itemsets which are impossible to become frequent in order to improve the performance of mining association rules. The experiments show that M2TQT performs better than existing approaches.Keywords: Association rule, data mining, merged transaction, quantification table.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19607352 Weigh-in-Motion Data Analysis Software for Developing Traffic Data for Mechanistic Empirical Pavement Design
Authors: M. A. Hasan, M. R. Islam, R. A. Tarefder
Abstract:
Currently, there are few user friendly Weigh-in- Motion (WIM) data analysis softwares available which can produce traffic input data for the recently developed AASHTOWare pavement Mechanistic-Empirical (ME) design software. However, these softwares have only rudimentary Quality Control (QC) processes. Therefore, they cannot properly deal with erroneous WIM data. As the pavement performance is highly sensible to the quality of WIM data, it is highly recommended to use more refined QC process on raw WIM data to get a good result. This study develops a userfriendly software, which can produce traffic input for the ME design software. This software takes the raw data (Class and Weight data) collected from the WIM station and processes it with a sophisticated QC procedure. Traffic data such as traffic volume, traffic distribution, axle load spectra, etc. can be obtained from this software; which can directly be used in the ME design software.Keywords: Weigh-in-motion, software, axle load spectra, traffic distribution, AASHTOWare.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18967351 Human Growth Curve Estimation through a Combination of Longitudinal and Cross-sectional Data
Authors: Sedigheh Mirzaei S., Debasis Sengupta
Abstract:
Parametric models have been quite popular for studying human growth, particularly in relation to biological parameters such as peak size velocity and age at peak size velocity. Longitudinal data are generally considered to be vital for fittinga parametric model to individual-specific data, and for studying the distribution of these biological parameters in a human population. However, cross-sectional data are easier to obtain than longitudinal data. In this paper, we present a method of combining longitudinal and cross-sectional data for the purpose of estimating the distribution of the biological parameters. We demonstrate, through simulations in the special case ofthePreece Baines model, how estimates based on longitudinal data can be improved upon by harnessing the information contained in cross-sectional data.We study the extent of improvement for different mixes of the two types of data, and finally illustrate the use of the method through data collected by the Indian Statistical Institute.Keywords: Preece-Baines growth model, MCMC method, Mixed effect model
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21397350 Semantic Support for Hypothesis-Based Research from Smart Environment Monitoring and Analysis Technologies
Authors: T. S. Myers, J. Trevathan
Abstract:
Improvements in the data fusion and data analysis phase of research are imperative due to the exponential growth of sensed data. Currently, there are developments in the Semantic Sensor Web community to explore efficient methods for reuse, correlation and integration of web-based data sets and live data streams. This paper describes the integration of remotely sensed data with web-available static data for use in observational hypothesis testing and the analysis phase of research. The Semantic Reef system combines semantic technologies (e.g., well-defined ontologies and logic systems) with scientific workflows to enable hypothesis-based research. A framework is presented for how the data fusion concepts from the Semantic Reef architecture map to the Smart Environment Monitoring and Analysis Technologies (SEMAT) intelligent sensor network initiative. The data collected via SEMAT and the inferred knowledge from the Semantic Reef system are ingested to the Tropical Data Hub for data discovery, reuse, curation and publication.
Keywords: Information architecture, Semantic technologies Sensor networks, Ontologies.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17157349 Data Migration between Document-Oriented and Relational Databases
Authors: Bogdan Walek, Cyril Klimes
Abstract:
Current tools for data migration between documentoriented and relational databases have several disadvantages. We propose a new approach for data migration between documentoriented and relational databases. During data migration the relational schema of the target (relational database) is automatically created from collection of XML documents. Proposed approach is verified on data migration between document-oriented database IBM Lotus/ Notes Domino and relational database implemented in relational database management system (RDBMS) MySQL.Keywords: data migration, database, document-oriented database, XML, relational schema
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 35257348 Identity Verification Using k-NN Classifiers and Autistic Genetic Data
Authors: Fuad M. Alkoot
Abstract:
DNA data have been used in forensics for decades. However, current research looks at using the DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We aim at using gene data that was initially used for autism detection to find if and how accurate is this data for identification applications. Mainly our goal is to find if our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and standard deviation of 1. The classification rate is close to optimal at higher noise standard deviation reaching 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN).
Keywords: Biometrics, identity verification, genetic data, k-nearest neighbor.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11207347 An Index for the Differential Diagnosis of Morbid Obese Children with and without Metabolic Syndrome
Authors: Mustafa M. Donma, Orkide Donma
Abstract:
Metabolic syndrome (MetS) is a severe health problem caused by morbid obesity, the severest form of obesity. The components of MetS are rather stable in adults. However, the diagnosis of MetS in morbid obese (MO) children still constitutes a matter of discussion. The aim of this study was to develop a formula, which facilitated the diagnosis of MetS in MO children and was capable of discriminating MO children with and without MetS findings. The study population comprised MO children. Age and sex-dependent body mass index (BMI) percentiles of the children were above 99. Increased blood pressure, elevated fasting blood glucose (FBG), elevated triglycerides (TRG) and/or decreased high density lipoprotein cholesterol (HDL-C) in addition to central obesity were listed as MetS components for each child. Two groups were constituted. In the first group, there were 42 MO children without MetS components. Second group was composed of 44 MO children with at least two MetS components. Anthropometric measurements including weight, height, waist and hip circumferences were performed during physical examination. BMI and homeostatic model assessment of insulin resistance (HOMA-IR) values were calculated. Informed consent forms were obtained from the parents of the children. Institutional Non-Interventional Clinical Studies Ethics Committee approved the study design. Routine biochemical analyses including FBG, insulin (INS), TRG, HDL-C were performed. The performance and the clinical utility of Diagnostic Obesity Notation Model Assessment Metabolic Syndrome Index (DONMA MetS index) [(INS/FBG)/(HDL-C/TRG)*100] was tested. Appropriate statistical tests were applied to the study data. p value smaller than 0.05 was defined as significant. MetS index values were 41.6 ± 5.1 in MO group and 104.4 ± 12.8 in MetS group. Corresponding values for HDL-C values were 54.5 ± 13.2 mg/dl and 44.2 ± 11.5 mg/dl. There was a statistically significant difference between the groups (p < 0.001). Upon evaluation of the correlations between MetS index and HDL-C values, a much stronger negative correlation was found in MetS group (r = -0.515; p = 0.001) in comparison with the correlation detected in MO group (r = -0.371; p = 0.016). From these findings, it was concluded that the statistical significance degree of the difference between MO and MetS groups was highly acceptable for this recently introduced MetS index. This was due to the involvement of all of the biochemically defined MetS components into the index. This is particularly important because each of these four parameters used in the formula is a cardiac risk factor. Aside from discriminating MO children with and without MetS findings, MetS index introduced in this study is important from the cardiovascular risk point of view in MetS group of children.
Keywords: Fasting blood glucose, high density lipoprotein cholesterol, insulin, metabolic syndrome, morbid obesity, triglycerides.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2557346 Power Saving System in Green Data Center
Authors: Joon-young Jung, Dong-oh Kang, Chang-seok Bae
Abstract:
Power consumption is rapidly increased in data centers because the number of data center is increased and more the scale of data center become larger. Therefore, it is one of key research items to reduce power consumption in data center. The peak power of a typical server is around 250 watts. When a server is idle, it continues to use around 60% of the power consumed when in use, though vendors are putting effort into reducing this “idle" power load. Servers tend to work at only around a 5% to 20% utilization rate, partly because of response time concerns. An average of 10% of servers in their data centers was unused. In those reason, we propose dynamic power management system to reduce power consumption in green data center. Experiment result shows that about 55% power consumption is reduced at idle time.Keywords: Data Center, Green IT, Management Server, Power Saving.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16287345 Spatial Econometric Approaches for Count Data: An Overview and New Directions
Authors: Paula Simões, Isabel Natário
Abstract:
This paper reviews a number of theoretical aspects for implementing an explicit spatial perspective in econometrics for modelling non-continuous data, in general, and count data, in particular. It provides an overview of the several spatial econometric approaches that are available to model data that are collected with reference to location in space, from the classical spatial econometrics approaches to the recent developments on spatial econometrics to model count data, in a Bayesian hierarchical setting. Considerable attention is paid to the inferential framework, necessary for structural consistent spatial econometric count models, incorporating spatial lag autocorrelation, to the corresponding estimation and testing procedures for different assumptions, to the constrains and implications embedded in the various specifications in the literature. This review combines insights from the classical spatial econometrics literature as well as from hierarchical modeling and analysis of spatial data, in order to look for new possible directions on the processing of count data, in a spatial hierarchical Bayesian econometric context.Keywords: Spatial data analysis, spatial econometrics, Bayesian hierarchical models, count data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 27047344 MATLAB-Based Graphical User Interface (GUI) for Data Mining as a Tool for Environment Management
Authors: M. Awawdeh, A. Fedi
Abstract:
The application of data mining to environmental monitoring has become crucial for a number of tasks related to emergency management. Over recent years, many tools have been developed for decision support system (DSS) for emergency management. In this article a graphical user interface (GUI) for environmental monitoring system is presented. This interface allows accomplishing (i) data collection and observation and (ii) extraction for data mining. This tool may be the basis for future development along the line of the open source software paradigm.
Keywords: Data Mining, Environmental data, Mathematical Models, Matlab Graphical User Interface.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 47417343 Principal Component Analysis using Singular Value Decomposition of Microarray Data
Authors: Dong Hoon Lim
Abstract:
A series of microarray experiments produces observations of differential expression for thousands of genes across multiple conditions. Principal component analysis(PCA) has been widely used in multivariate data analysis to reduce the dimensionality of the data in order to simplify subsequent analysis and allow for summarization of the data in a parsimonious manner. PCA, which can be implemented via a singular value decomposition(SVD), is useful for analysis of microarray data. For application of PCA using SVD we use the DNA microarray data for the small round blue cell tumors(SRBCT) of childhood by Khan et al.(2001). To decide the number of components which account for sufficient amount of information we draw scree plot. Biplot, a graphic display associated with PCA, reveals important features that exhibit relationship between variables and also the relationship of variables with observations.
Keywords: Principal component analysis, singular value decomposition, microarray data, SRBCT
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 32507342 Clustering Mixed Data Using Non-normal Regression Tree for Process Monitoring
Authors: Youngji Yoo, Cheong-Sool Park, Jun Seok Kim, Young-Hak Lee, Sung-Shick Kim, Jun-Geol Baek
Abstract:
In the semiconductor manufacturing process, large amounts of data are collected from various sensors of multiple facilities. The collected data from sensors have several different characteristics due to variables such as types of products, former processes and recipes. In general, Statistical Quality Control (SQC) methods assume the normality of the data to detect out-of-control states of processes. Although the collected data have different characteristics, using the data as inputs of SQC will increase variations of data, require wide control limits, and decrease performance to detect outof- control. Therefore, it is necessary to separate similar data groups from mixed data for more accurate process control. In the paper, we propose a regression tree using split algorithm based on Pearson distribution to handle non-normal distribution in parametric method. The regression tree finds similar properties of data from different variables. The experiments using real semiconductor manufacturing process data show improved performance in fault detecting ability.Keywords: Semiconductor, non-normal mixed process data, clustering, Statistical Quality Control (SQC), regression tree, Pearson distribution system.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17807341 Speech Data Compression using Vector Quantization
Authors: H. B. Kekre, Tanuja K. Sarode
Abstract:
Mostly transforms are used for speech data compressions which are lossy algorithms. Such algorithms are tolerable for speech data compression since the loss in quality is not perceived by the human ear. However the vector quantization (VQ) has a potential to give more data compression maintaining the same quality. In this paper we propose speech data compression algorithm using vector quantization technique. We have used VQ algorithms LBG, KPE and FCG. The results table shows computational complexity of these three algorithms. Here we have introduced a new performance parameter Average Fractional Change in Speech Sample (AFCSS). Our FCG algorithm gives far better performance considering mean absolute error, AFCSS and complexity as compared to others.Keywords: Vector Quantization, Data Compression, Encoding, , Speech coding.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24047340 Ontology and CDSS Based Intelligent Health Data Management in Health Care Server
Authors: Eun-Jung Ko, Hyung-Jik Lee, Jeun-Woo Lee
Abstract:
In ubiqutious healthcare environment, user's health data are transfered to the remote healthcare server by the user's wearable system or mobile phone. These collected user's health data should be managed and analyzed in the healthcare server, so that care giver or user can monitor user's physiological state. In this paper, we designed and developed the intelligent Healthcare Server to manage the user's health data using CDSS and ontology. Our system can analyze user's health data semantically using CDSS and ontology, and report the result of user's physiological raw data to the user and care giver.
Keywords: u-healthcare, CDSS, healthcare server, health data, ontology.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22357339 A Genetic Algorithm for Clustering on Image Data
Authors: Qin Ding, Jim Gasvoda
Abstract:
Clustering is the process of subdividing an input data set into a desired number of subgroups so that members of the same subgroup are similar and members of different subgroups have diverse properties. Many heuristic algorithms have been applied to the clustering problem, which is known to be NP Hard. Genetic algorithms have been used in a wide variety of fields to perform clustering, however, the technique normally has a long running time in terms of input set size. This paper proposes an efficient genetic algorithm for clustering on very large data sets, especially on image data sets. The genetic algorithm uses the most time efficient techniques along with preprocessing of the input data set. We test our algorithm on both artificial and real image data sets, both of which are of large size. The experimental results show that our algorithm outperforms the k-means algorithm in terms of running time as well as the quality of the clustering.
Keywords: Clustering, data mining, genetic algorithm, image data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20537338 A Holistic Framework for Unifying Data Security and Management in Modern Enterprises
Authors: Ashly Joseph
Abstract:
Modern businesses struggle significantly to secure and manage their data properly as the volume and complexity of their data both expand exponentially. Through the use of a multi-layered defense strategy, a centralized management platform, and cutting-edge technologies like AI, this research paper presents a comprehensive framework to integrate data security and management. The constraints of current data protection and management strategies, technological advancements, and the evolving threat landscape are all examined in this article. It suggests best practices for putting into practice integrated data security and governance models, placing an emphasis on ongoing adaptation. The advantages mentioned include a strengthened security posture, simpler procedures, lower costs, and reduced complexity. Additionally, issues including skill shortages, antiquated systems, and cultural obstacles are examined. Security executives and Chief Information Security Officers are given practical advice on how to evaluate, plan, and put into place strong data-centric security and management capabilities. The goal of the paper is to provide a thorough study of the data security and management landscape and to arm contemporary businesses with the knowledge they need to be proactive in protecting their data assets.
Keywords: Data security, security management, cloud computing, cybersecurity, data governance, security architecture, data management.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2707337 Post Mining- Discovering Valid Rules from Different Sized Data Sources
Authors: R. Nedunchezhian, K. Anbumani
Abstract:
A big organization may have multiple branches spread across different locations. Processing of data from these branches becomes a huge task when innumerable transactions take place. Also, branches may be reluctant to forward their data for centralized processing but are ready to pass their association rules. Local mining may also generate a large amount of rules. Further, it is not practically possible for all local data sources to be of the same size. A model is proposed for discovering valid rules from different sized data sources where the valid rules are high weighted rules. These rules can be obtained from the high frequency rules generated from each of the data sources. A data source selection procedure is considered in order to efficiently synthesize rules. Support Equalization is another method proposed which focuses on eliminating low frequency rules at the local sites itself thus reducing the rules by a significant amount.
Keywords: Association rules, multiple data stores, synthesizing, valid rules.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14047336 RFID-ready Master Data Management for Reverse Logistics
Authors: Jincheol Han, Hyunsun Ju, Jonghoon Chun
Abstract:
Sharing consistent and correct master data among disparate applications in a reverse-logistics chain has long been recognized as an intricate problem. Although a master data management (MDM) system can surely assume that responsibility, applications that need to co-operate with it must comply with proprietary query interfaces provided by the specific MDM system. In this paper, we present a RFID-ready MDM system which makes master data readily available for any participating applications in a reverse-logistics chain. We propose a RFID-wrapper as a part of our MDM. It acts as a gateway between any data retrieval request and query interfaces that process it. With the RFID-wrapper, any participating applications in a reverse-logistics chain can easily retrieve master data in a way that is analogous to retrieval of any other RFID-based logistics transactional data.Keywords: Reverse Logistics, Master Data Management, RFID.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19747335 Dynamic Models versus Frailty Models for Recurrent Event Data
Authors: Entisar A. Elgmati
Abstract:
Recurrent event data is a special type of multivariate survival data. Dynamic and frailty models are one of the approaches that dealt with this kind of data. A comparison between these two models is studied using the empirical standard deviation of the standardized martingale residual processes as a way of assessing the fit of the two models based on the Aalen additive regression model. Here we found both approaches took heterogeneity into account and produce residual standard deviations close to each other both in the simulation study and in the real data set.Keywords: Dynamic, frailty, misspecification, recurrent events.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23507334 Spexin and Fetuin A in Morbid Obese Children
Authors: Mustafa M. Donma, Orkide Donma
Abstract:
Spexin, expressed in the central nervous system, has attracted much interest in feeding behavior, obesity, diabetes, energy metabolism and cardiovascular functions. Fetuin A is known as the negative acute phase reactant synthesized in the liver. Eosinophils are early indicators of cardiometabolic complications. Patients with elevated platelet count, associated with hypercoagulable state in the body, are also more liable to cardiovascular diseases (CVDs). In this study, the aim is to examine the profiles of spexin and fetuin A concomitant with the course of variations detected in eosinophil as well as platelet counts in morbid obese children. 34 children with normal-body mass index (N-BMI) and 51 morbid obese (MO) children participated in the study. Written-informed consent forms were obtained prior to the study. Institutional ethics committee approved the study protocol. Age- and sex-adjusted BMI percentile tables prepared by World Health Organization were used to classify healthy and obese children. Mean age ± SEM of the children were 9.3 ± 0.6 years and 10.7 ± 0.5 years in N-BMI and MO groups, respectively. Anthropometric measurements of the children were taken. BMI values were calculated from weight and height values. Blood samples were obtained after an overnight fasting. Routine hematologic and biochemical tests were performed. Within this context, fasting blood glucose (FBG), insulin (INS), triglycerides (TRG), high density lipoprotein-cholesterol (HDL-C) concentrations were measured. Homeostatic model assessment for insulin resistance (HOMA-IR) values were calculated. Spexin and fetuin A levels were determined by enzyme-linked immunosorbent assay. Data were evaluated from the statistical point of view. Statistically significant differences were found between groups in terms of BMI, fat mass index, INS, HOMA-IR and HDL-C. In MO group, all parameters increased as HDL-C decreased. Elevated concentrations in MO group were detected in eosinophils (p < 0.05) and platelets (p > 0.05). Fetuin A levels decreased in MO group (p > 0.05). However, decrease was statistically significant in spexin levels for this group (p < 0.05). In conclusion, these results have suggested that increases in eosinophils and platelets exhibit behavior as cardiovascular risk factors. Decreased fetuin A behaved as a risk factor suitable to increased risk for cardiovascular problems associated with the severity of obesity. Along with increased eosinophils, increased platelets and decreased fetuin A, decreased spexin was the parameter, which reflects best its possible participation in the early development of CVD risk in MO children.
Keywords: Cardiovascular diseases, eosinophils, fetuin A, pediatric morbid obesity, platelets, spexin.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 682