Search results for: Anthropometric data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7407

Search results for: Anthropometric data

7377 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, Wang Qun Lin

Abstract:

This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSql), and gives 6 data cleaning methods based on these algorithms.

Keywords: Data cleaning, dependency rules, violation data discovery, data repair.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2574
7376 Coalescing Data Marts

Authors: N. Parimala, P. Pahwa

Abstract:

OLAP uses multidimensional structures, to provide access to data for analysis. Traditionally, OLAP operations are more focused on retrieving data from a single data mart. An exception is the drill across operator. This, however, is restricted to retrieving facts on common dimensions of the multiple data marts. Our concern is to define further operations while retrieving data from multiple data marts. Towards this, we have defined six operations which coalesce data marts. While doing so we consider the common as well as the non-common dimensions of the data marts.

Keywords: Data warehouse, Dimension, OLAP, Star Schema.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1528
7375 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.

Keywords: Mining Big Data, Big Data, Machine learning, Data Streams, Telecommunication.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2433
7374 Comparative Analysis of Diverse Collection of Big Data Analytics Tools

Authors: S. Vidhya, S. Sarumathi, N. Shanthi

Abstract:

Over the past era, there have been a lot of efforts and studies are carried out in growing proficient tools for performing various tasks in big data. Recently big data have gotten a lot of publicity for their good reasons. Due to the large and complex collection of datasets it is difficult to process on traditional data processing applications. This concern turns to be further mandatory for producing various tools in big data. Moreover, the main aim of big data analytics is to utilize the advanced analytic techniques besides very huge, different datasets which contain diverse sizes from terabytes to zettabytes and diverse types such as structured or unstructured and batch or streaming. Big data is useful for data sets where their size or type is away from the capability of traditional relational databases for capturing, managing and processing the data with low-latency. Thus the out coming challenges tend to the occurrence of powerful big data tools. In this survey, a various collection of big data tools are illustrated and also compared with the salient features.

Keywords: Big data, Big data analytics, Business analytics, Data analysis, Data visualization, Data discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3744
7373 Multi-labeled Data Expressed by a Set of Labels

Authors: Tetsuya Furukawa, Masahiro Kuzunishi

Abstract:

Collected data must be organized to be utilized efficiently, and hierarchical classification of data is efficient approach to organize data. When data is classified to multiple categories or annotated with a set of labels, users request multi-labeled data by giving a set of labels. There are several interpretations of the data expressed by a set of labels. This paper discusses which data is expressed by a set of labels by introducing orders for sets of labels and shows that there are four types of orders, which are characterized by whether the labels of expressed data includes every label of the given set of labels within the range of the set. Desirable properties of the orders, data is also expressed by the higher set of labels and different sets of labels express different data, are discussed for the orders.

Keywords: Classification Hierarchies, Multi-labeled Data, Multiple Classificaiton, Orders of Sets of Labels

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1271
7372 Stature Prediction Model Based On Hand Anthropometry

Authors: Arunesh Chandra, Pankaj Chandna, Surinder Deswal, Rajesh Kumar Mishra, Rajender Kumar

Abstract:

The arm length, hand length, hand breadth and middle finger length of 1540 right-handed industrial workers of Haryana state was used to assess the relationship between the upper limb dimensions and stature. Initially, the data were analyzed using basic univariate analysis and independent t-tests; then simple and multiple linear regression models were used to estimate stature using SPSS (version 17). There was a positive correlation between upper limb measurements (hand length, hand breadth, arm length and middle finger length) and stature (p < 0.01), which was highest for hand length. The accuracy of stature prediction ranged from ± 54.897 mm to ± 58.307 mm. The use of multiple regression equations gave better results than simple regression equations. This study provides new forensic standards for stature estimation from the upper limb measurements of male industrial workers of Haryana (India). The results of this research indicate that stature can be determined using hand dimensions with accuracy, when only upper limb is available due to any reasons likewise explosions, train/plane crashes, mutilated bodies, etc. The regression formula derived in this study will be useful for anatomists, archaeologists, anthropologists, design engineers and forensic scientists for fairly prediction of stature using regression equations.

Keywords: Anthropometric dimensions, Forensic identification, Industrial workers, Stature prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2913
7371 Hormones and Mineral Elements Associated with Osteoporosis in Postmenopausal Women in Eastern Slovakia

Authors: M. Mydlárová Blaščáková, J. Poráčová, Z. Tomková, Ľ. Blaščáková, M. Nagy, M. Konečná, E. Petrejčíková, Z. Gogaľová, V. Sedlák, J. Mydlár, M. Zahatňanská, K. Hricová

Abstract:

Osteoporosis is a multifactorial disease that results in reduced quality of life, causes decreased bone strength, and changes in their microarchitecture. Mostly postmenopausal women are at risk. In our study, we measured anthropometric parameters of postmenopausal women (104 women of control group – CG and 105 women of osteoporotic group - OG) and determined TSH hormone levels and PTH as well as mineral elements - Ca, P, Mg and enzyme alkaline phosphatase. Through the correlation analysis in CG, we have found association based on age and BMI, P and Ca, as well as Mg and Ca; in OG we determined interdependence based on an association of age and BMI, age and Ca. Using the Student's t test, we found significantly important differences in biochemical parameters of Mg (p ˂ 0,001) and TSH (p ˂ 0,05) between CG and OG.

Keywords: Factors, bone mass density, Central Europe, biomarkers.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 834
7370 Classification of Potential Biomarkers in Breast Cancer Using Artificial Intelligence Algorithms and Anthropometric Datasets

Authors: Aref Aasi, Sahar Ebrahimi Bajgani, Erfan Aasi

Abstract:

Breast cancer (BC) continues to be the most frequent cancer in females and causes the highest number of cancer-related deaths in women worldwide. Inspired by recent advances in studying the relationship between different patient attributes and features and the disease, in this paper, we have tried to investigate the different classification methods for better diagnosis of BC in the early stages. In this regard, datasets from the University Hospital Centre of Coimbra were chosen, and different machine learning (ML)-based and neural network (NN) classifiers have been studied. For this purpose, we have selected favorable features among the nine provided attributes from the clinical dataset by using a random forest algorithm. This dataset consists of both healthy controls and BC patients, and it was noted that glucose, BMI, resistin, and age have the most importance, respectively. Moreover, we have analyzed these features with various ML-based classifier methods, including Decision Tree (DT), K-Nearest Neighbors (KNN), eXtreme Gradient Boosting (XGBoost), Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machine (SVM) along with NN-based Multi-Layer Perceptron (MLP) classifier. The results revealed that among different techniques, the SVM and MLP classifiers have the most accuracy, with amounts of 96% and 92%, respectively. These results divulged that the adopted procedure could be used effectively for the classification of cancer cells, and also it encourages further experimental investigations with more collected data for other types of cancers.

Keywords: Breast cancer, health diagnosis, Machine Learning, biomarker classification, Neural Network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 246
7369 The Comparison of Data Replication in Distributed Systems

Authors: Iman Zangeneh, Mostafa Moradi, Ali Mokhtarbaf

Abstract:

The necessity of ever-increasing use of distributed data in computer networks is obvious for all. One technique that is performed on the distributed data for increasing of efficiency and reliablity is data rplication. In this paper, after introducing this technique and its advantages, we will examine some dynamic data replication. We will examine their characteristies for some overus scenario and the we will propose some suggestion for their improvement.

Keywords: data replication, data hiding, consistency, dynamicdata replication strategy

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1603
7368 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, more various information can be extracted. In addition, development and dissemination of boards such as Arduino and Raspberry Pie have made it possible to easily test various sensors, and it is possible to collect sensor data directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, there are many difficulties in using the board to collect data, and there are many difficulties in using it when the user is not a computer programmer, or when using it for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: Clustering, data mining, DBSCAN, k-means, k-medoids, sensor data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1971
7367 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity, veracity, and come from a variety of sources. Public Administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data related integrated services, provision of insights to users, and for good governance. Government (Big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services like data storage, hosting services to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationship in the government (big) data ecosystem. We also discuss our research findings. We did not find too much published research articles about the government (big) data ecosystem, including its definition and classification of actors and their roles. Therefore, we lent ideas for the government (big) data ecosystem from numerous areas that include scientific research data, humanitarian data, open government data, industry data, in the literature.

Keywords: Big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2022
7366 Cobalamin, Folate and Metabolic Syndrome Parameters in Pediatric Morbid Obesity and Metabolic Syndrome

Authors: Mustafa M. Donma, Orkide Donma

Abstract:

Obesity is known to be associated with many clinically important diseases including metabolic syndrome (MetS). Vitamin B12 plays essential roles in fat and protein metabolisms and its cooperation with vitamin B9 is well-known. The aim of this study is to investigate the possible contributions as well as associations of these micronutrients upon obesity and MetS during childhood. A total of 128 children admitted to Namik Kemal University, Medical Faculty, Department of Pediatrics Outpatient Clinics were included into the scope of this study. The mean age±SEM of 92 morbid obese (MO) children and 36 with MetS were 118.3±3.8 months and 129.5±6.4 months, respectively (p > 0.05). The study was approved by Namık Kemal University, Medical Faculty Ethics Committee. Written informed consent forms were obtained from the parents. Demographic features and anthropometric measurements were recorded. WHO BMI-for age percentiles were used. The values above 99 percentile were defined as MO. Components of MetS [waist circumference (WC), fasting blood glucose (FBG), triacylglycerol (TRG), high density lipoprotein cholesterol (HDL-Chol), systolic pressure (SP), diastolic pressure (DP)] were determined. Routine laboratory tests were performed. Serum vitamin B12 concentrations were measured using electrochemiluminescence immunoassay. Vitamin B9 was analyzed by an immunoassay analyzer. Values for vitamin B12 < 148 pmol/L, 148-221 pmol/L, > 221 pmol/L were accepted as low, borderline and normal, respectively. Vitamin B9 levels ≤ 4 mcg/L defined deficiency state. Statistical evaluations were performed by SPSSx Version 16.0. p≤0.05 was accepted as statistical significance level. Statistically higher body mass index (BMI), WC, hip circumference (C) and neck C were calculated in MetS group compared to children with MO. No difference was noted for head C. All MetS components differed between the groups (SP, DP p < 0.001; WC, FBG, TRG p < 0.01; HDL-Chol p < 0.05). Significantly decreased vitamin B9 and vitamin B12 levels were detected (p < 0.05) in children with MetS. In both groups percentage of folate deficiency was 5.5%. No cases were below < 148 pmol/L. However, in MO group 14.3% and in MetS group 22.2% of the cases were of borderline status. In MO group B12 levels were negatively correlated with BMI, WC, hip C and head C, but not with neck C. WC, hip C, head C and neck C were all negatively correlated with HDL-Chol. None of these correlations were observed in the group of children with MetS. Strong positive correlation between FBG and insulin as well as strong negative correlation between TRG and HDL-Chol detected in MO children were lost in MetS group. Deficiency state end-products of both B9 and B12 may interfere with the expected profiles of MetS components. In this study, the alterations in MetS components affected vitamin B12 metabolism and also its associations with anthropometric body measurements. Further increases in vitamin B12 and vitamin B9 deficiency in MetS associated with the increased vitamin B12 as well as vitamin B9 deficiency metabolites may add to MetS parameters.

Keywords: Children, cobalamin, folate, metabolic syndrome, obesity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1098
7365 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analyzing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: DNA microarray, feature selection, missing data, bioinformatics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2745
7364 The Effects of Three Months of HIIT on Plasma Adiponectin on Overweight College Men

Authors: M. J. Pourvaghar, M. E. Bahram, M. Sayyah, Sh. Khoshemehry

Abstract:

Adiponectin is a cytokine secreted by the adipose tissue that functions as an anti-inflammatory, antiathrogenic and anti-diabetic substance. Its density is inversely correlated with body mass index. The purpose of this research was to examine the effect of 12 weeks of high intensity interval training (HIIT) with the level of serum adiponectin and some selected adiposity markers in overweight and fat college students. This was a clinical research in which 24 students with BMI between 25 kg/m2 to 30 kg/m2. The sample was purposefully selected and then randomly assigned into two groups of experimental (age =22.7±1.5 yr.; weight = 85.8±3.18 kg and height =178.7±3.29 cm) and control (age =23.1±1.1 yr.; weight = 79.1±2.4 kg and height =181.3±4.6 cm), respectively. The experimental group participated in an aerobic exercise program for 12 weeks, three sessions per weeks at a high intensity between 85% to 95% of maximum heart rate (considering the over load principle). Prior and after the termination of exercise protocol, the level of serum adiponectin, BMI, waist to hip ratio, and body fat percentages were calculated. The data were analyzed by using SPSS: PC 16.0 and statistical procedure such as ANCOVA, was used. The results indicated that 12 weeks of intensive interval training led to the increase of serum adiponectin level and decrease of body weight, body fat percent, body mass index and waist to hip ratio (P < 0.05). Based on the results of this research, it may be concluded that participation in intensive interval training for 12 weeks is a non-invasive treatment to increase the adiponectin level while decreasing some of the anthropometric indices associated with obesity or being overweight.

Keywords: Adiponectin, interval, intensive, overweight, training.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1089
7363 Automatic Real-Patient Medical Data De-Identification for Research Purposes

Authors: Petr Vcelak, Jana Kleckova

Abstract:

Our Medicine-oriented research is based on a medical data set of real patients. It is a security problem to share patient private data with peoples other than clinician or hospital staff. We have to remove person identification information from medical data. The medical data without private data are available after a de-identification process for any research purposes. In this paper, we introduce an universal automatic rule-based de-identification application to do all this stuff on an heterogeneous medical data. A patient private identification is replaced by an unique identification number, even in burnedin annotation in pixel data. The identical identification is used for all patient medical data, so it keeps relationships in a data. Hospital can take an advantage of a research feedback based on results.

Keywords: DASTA, De-identification, DICOM, Health Level Seven, Medical data, OCR, Personal data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1607
7362 Analyzing Multi-Labeled Data Based on the Roll of a Concept against a Semantic Range

Authors: Masahiro Kuzunishi, Tetsuya Furukawa, Ke Lu

Abstract:

Classifying data hierarchically is an efficient approach to analyze data. Data is usually classified into multiple categories, or annotated with a set of labels. To analyze multi-labeled data, such data must be specified by giving a set of labels as a semantic range. There are some certain purposes to analyze data. This paper shows which multi-labeled data should be the target to be analyzed for those purposes, and discusses the role of a label against a set of labels by investigating the change when a label is added to the set of labels. These discussions give the methods for the advanced analysis of multi-labeled data, which are based on the role of a label against a semantic range.

Keywords: Classification Hierarchies, Data Analysis, Multilabeled Data, Orders of Sets of Labels

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1184
7361 Evaluation of Eating Habits among Portuguese University Students: A Preliminary Study

Authors: T. H. Rodrigues, Maria J. Reis Lima, R. P. F. Guiné, E. Teixeira de Lemos

Abstract:

Portuguese diet has been gradually diverging from the basic principles of healthy eating, leading to an unbalanced dietary pattern which, associated with increasing sedentary lifestyle, has a negative impact on public health. The main objective of this work was to characterize the dietary habits of university students in Viseu, Portugal. The study consisted of a sample of 80 university students, aged between 18 and 28 years. Anthropometric data (weight (kg) and height (m)) were collected and Body Mass Index (BMI) was calculated. The dietary habits were assessed through a three-day food record and the software Medpoint was used to convert food into energy and nutrients. The results showed that students present a normal body mass index. Female university students made a higher number of daily meals than male students, and these last skipped breakfast more frequently. The values of average daily intake of energy, macronutrients and calcium were higher in males. The food pattern was characterized by a predominant consumption of meat, cereal, fats and sugar. Dietary intake of dairy products, fruits, vegetables and legumes does not meet the recommendations, revealing inadequate food habits such as hypoglycemic, hyperprotein and hyperlipidemic diet. Our findings suggest that preventive interventions should be focus in promoting healthy eating habits and physical activity in adulthood.

Keywords: Food habits, BMI, fortified foods, nutritional deficiencies, university students.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1973
7360 Steganalysis of Data Hiding via Halftoning and Coordinate Projection

Authors: Woong Hee Kim, Ilhwan Park

Abstract:

Steganography is the art of hiding and transmitting data through apparently innocuous carriers in an effort to conceal the existence of the data. A lot of steganography algorithms have been proposed recently. Many of them use the digital image data as a carrier. In data hiding scheme of halftoning and coordinate projection, still image data is used as a carrier, and the data of carrier image are modified for data embedding. In this paper, we present three features for analysis of data hiding via halftoning and coordinate projection. Also, we present a classifier using the proposed three features.

Keywords: Steganography, steganalysis, digital halftoning, data hiding.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1563
7359 Biological Data Integration using SOA

Authors: Noura Meshaan Al-Otaibi, Amin Yousef Noaman

Abstract:

Nowadays scientific data is inevitably digital and stored in a wide variety of formats in heterogeneous systems. Scientists need to access an integrated view of remote or local heterogeneous data sources with advanced data accessing, analyzing, and visualization tools. This research suggests the use of Service Oriented Architecture (SOA) to integrate biological data from different data sources. This work shows SOA will solve the problems that facing integration process and if the biologist scientists can access the biological data in easier way. There are several methods to implement SOA but web service is the most popular method. The Microsoft .Net Framework used to implement proposed architecture.

Keywords: Bioinformatics, Biological data, Data Integration, SOA and Web Services.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2434
7358 STATISTICA Software: A State of the Art Review

Authors: S. Sarumathi, N. Shanthi, S. Vidhya, P. Ranjetha

Abstract:

Data mining idea is mounting rapidly in admiration and also in their popularity. The foremost aspire of data mining method is to extract data from a huge data set into several forms that could be comprehended for additional use. The data mining is a technology that contains with rich potential resources which could be supportive for industries and businesses that pay attention to collect the necessary information of the data to discover their customer’s performances. For extracting data there are several methods are available such as Classification, Clustering, Association, Discovering, and Visualization… etc., which has its individual and diverse algorithms towards the effort to fit an appropriate model to the data. STATISTICA mostly deals with excessive groups of data that imposes vast rigorous computational constraints. These results trials challenge cause the emergence of powerful STATISTICA Data Mining technologies. In this survey an overview of the STATISTICA software is illustrated along with their significant features.

Keywords: Data Mining, STATISTICA Data Miner, Text Miner, Enterprise Server, Classification, Association, Clustering, Regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2575
7357 Proposal of Data Collection from Probes

Authors: M. Kebisek, L. Spendla, M. Kopcek, T. Skulavik

Abstract:

In our paper we describe the security capabilities of data collection. Data are collected with probes located in the near and distant surroundings of the company. Considering the numerous obstacles e.g. forests, hills, urban areas, the data collection is realized in several ways. The collection of data uses connection via wireless communication, LAN network, GSM network and in certain areas data are collected by using vehicles. In order to ensure the connection to the server most of the probes have ability to communicate in several ways. Collected data are archived and subsequently used in supervisory applications. To ensure the collection of the required data, it is necessary to propose algorithms that will allow the probes to select suitable communication channel.

Keywords: Communication, computer network, data collection, probe.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1758
7356 Linguistic Summarization of Structured Patent Data

Authors: E. Y. Igde, S. Aydogan, F. E. Boran, D. Akay

Abstract:

Patent data have an increasingly important role in economic growth, innovation, technical advantages and business strategies and even in countries competitions. Analyzing of patent data is crucial since patents cover large part of all technological information of the world. In this paper, we have used the linguistic summarization technique to prove the validity of the hypotheses related to patent data stated in the literature.

Keywords: Data mining, fuzzy sets, linguistic summarization, patent data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1175
7355 Metadata Update Mechanism Improvements in Data Grid

Authors: S. Farokhzad, M. Reza Salehnamadi

Abstract:

Grid environments include aggregation of geographical distributed resources. Grid is put forward in three types of computational, data and storage. This paper presents a research on data grid. Data grid is used for covering and securing accessibility to data from among many heterogeneous sources. Users are not worry on the place where data is located in it, provided that, they should get access to the data. Metadata is used for getting access to data in data grid. Presently, application metadata catalogue and SRB middle-ware package are used in data grids for management of metadata. At this paper, possibility of updating, streamlining and searching is provided simultaneously and rapidly through classified table of preserving metadata and conversion of each table to numerous tables. Meanwhile, with regard to the specific application, the most appropriate and best division is set and determined. Concurrency of implementation of some of requests and execution of pipeline is adaptability as a result of this technique.

Keywords: Grids, data grid, metadata, update.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1662
7354 Optimized Approach for Secure Data Sharing in Distributed Database

Authors: Ahmed Mateen, Zhu Qingsheng, Ahmad Bilal

Abstract:

In the current age of technology, information is the most precious asset of a company. Today, companies have a large amount of data. As the data become larger, access to data for some particular information is becoming slower day by day. Faster data processing to shape it in the form of information is the biggest issue. The major problems in distributed databases are the efficiency of data distribution and response time of data distribution. The security of data distribution is also a big issue. For these problems, we proposed a strategy that can maximize the efficiency of data distribution and also increase its response time. This technique gives better results for secure data distribution from multiple heterogeneous sources. The newly proposed technique facilitates the companies for secure data sharing efficiently and quickly.

Keywords: ER-schema, electronic record, P2P framework, API, query formulation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1033
7353 Association of Overweight and Obesity with Breast Cancer

Authors: Amir Ghasemlouei, Alireza Khalaj

Abstract:

Breast cancer is in the top rate of cancer. We analyzed the prevalence of obesity and its association with breast cancer and finally we reviewed 25 article that 320 patient and 320 control which enrolled to our study. The distribution of breast cancer patients and controls with respect to their anthropometric indices in patients with higher weight, which was statistically significant (60.2 ± 10.2 kg) compared with control group (56.1 ± 11.3 kg). The body mass index of patients was (26.06+/-3.42) and significantly higher than the control group (24.1+/-1.7). Obesity leads to increased levels of adipose tissue in the body that can be stored toxins and carcinogens to produce a continuous supply. Due to the high level of fat and the role of estrogen in a woman which is endogenous estrogen of the tumor and regulates the activities of growth steroids, obesity has confirmed as a risk factor for breast cancer. Our study and other studies have shown that obesity is a risk factor for breast cancer. And it can be prevented with a weight loss intervention for breast cancer in the future.

Keywords: Breast cancer, review study, obesity, overweight.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2065
7352 Using Data Clustering in Oral Medicine

Authors: Fahad Shahbaz Khan, Rao Muhammad Anwer, Olof Torgersson

Abstract:

The vast amount of information hidden in huge databases has created tremendous interests in the field of data mining. This paper examines the possibility of using data clustering techniques in oral medicine to identify functional relationships between different attributes and classification of similar patient examinations. Commonly used data clustering algorithms have been reviewed and as a result several interesting results have been gathered.

Keywords: Oral Medicine, Cluto, Data Clustering, Data Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1941
7351 The Application of Data Mining Technology in Building Energy Consumption Data Analysis

Authors: Liang Zhao, Jili Zhang, Chongquan Zhong

Abstract:

Energy consumption data, in particular those involving public buildings, are impacted by many factors: the building structure, climate/environmental parameters, construction, system operating condition, and user behavior patterns. Traditional methods for data analysis are insufficient. This paper delves into the data mining technology to determine its application in the analysis of building energy consumption data including energy consumption prediction, fault diagnosis, and optimal operation. Recent literature are reviewed and summarized, the problems faced by data mining technology in the area of energy consumption data analysis are enumerated, and research points for future studies are given.

Keywords: Data mining, data analysis, prediction, optimization, building operational performance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3666
7350 Query Algebra for Semistuctured Data

Authors: Ei Ei Myat, Ni Lar Thein

Abstract:

With the tremendous growth of World Wide Web (WWW) data, there is an emerging need for effective information retrieval at the document level. Several query languages such as XML-QL, XPath, XQL, Quilt and XQuery are proposed in recent years to provide faster way of querying XML data, but they still lack of generality and efficiency. Our approach towards evolving a framework for querying semistructured documents is based on formal query algebra. Two elements are introduced in the proposed framework: first, a generic and flexible data model for logical representation of semistructured data and second, a set of operators for the manipulation of objects defined in the data model. In additional to accommodating several peculiarities of semistructured data, our model offers novel features such as bidirectional paths for navigational querying and partitions for data transformation that are not available in other proposals.

Keywords: Algebra, Semistructured data, Query Algebra.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1343
7349 Simulation Data Summarization Based on Spatial Histograms

Authors: Jing Zhao, Yoshiharu Ishikawa, Chuan Xiao, Kento Sugiura

Abstract:

In order to analyze large-scale scientific data, research on data exploration and visualization has gained popularity. In this paper, we focus on the exploration and visualization of scientific simulation data, and define a spatial V-Optimal histogram for data summarization. We propose histogram construction algorithms based on a general binary hierarchical partitioning as well as a more specific one, the l-grid partitioning. For effective data summarization and efficient data visualization in scientific data analysis, we propose an optimal algorithm as well as a heuristic algorithm for histogram construction. To verify the effectiveness and efficiency of the proposed methods, we conduct experiments on the massive evacuation simulation data.

Keywords: Simulation data, data summarization, spatial histograms, exploration and visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 712
7348 Joint Use of Factor Analysis (FA) and Data Envelopment Analysis (DEA) for Ranking of Data Envelopment Analysis

Authors: Reza Nadimi, Fariborz Jolai

Abstract:

This article combines two techniques: data envelopment analysis (DEA) and Factor analysis (FA) to data reduction in decision making units (DMU). Data envelopment analysis (DEA), a popular linear programming technique is useful to rate comparatively operational efficiency of decision making units (DMU) based on their deterministic (not necessarily stochastic) input–output data and factor analysis techniques, have been proposed as data reduction and classification technique, which can be applied in data envelopment analysis (DEA) technique for reduction input – output data. Numerical results reveal that the new approach shows a good consistency in ranking with DEA.

Keywords: Effectiveness, Decision Making, Data EnvelopmentAnalysis, Factor Analysis

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2390