Search results for: data association.

7334 Identification of PIP Aquaporin Genes from Wheat

Abstract:

There is strong evidence that water channel proteins 'aquaporins (AQPs)' are central components in plant-water relations as well as a number of other physiological parameters. We had previously reported the isolation of 24 plasma membrane intrinsic protein (PIP) type AQPs. However, the gene numbers in rice and the polyploid nature of bread wheat indicated a high probability of further genes in the latter. The present work focused on identification of further AQP isoforms in bread wheat. With the use of altered primer design, we identified five genes homologous, designated PIP1;5b, PIP2;9b, TaPIP2;2, TaPIP2;2a, TaPIP2;2b. Sequence alignments indicate PIP1;5b, PIP2;9b are likely to be homeologues of two previously reported genes while the other three are new genes and could be homeologs of each other. The results indicate further AQP diversity in wheat and the sequence data will enable physical mapping of these genes to identify their genomes as well as genetic to determine their association with any quantitative trait loci (QTLs) associated with plant-water relation such as salinity or drought tolerance.

Keywords: Aquaporins, homeologues, PIP, wheat

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1981

7333 Customer Segmentation in Foreign Trade based on Clustering Algorithms Case Study: Trade Promotion Organization of Iran

Authors: Samira Malekmohammadi Golsefid, Mehdi Ghazanfari, Somayeh Alizadeh

Abstract:

The goal of this paper is to segment the countries based on the value of export from Iran during 14 years ending at 2005. To measure the dissimilarity among export baskets of different countries, we define Dissimilarity Export Basket (DEB) function and use this distance function in K-means algorithm. The DEB function is defined based on the concepts of the association rules and the value of export group-commodities. In this paper, clustering quality function and clusters intraclass inertia are defined to, respectively, calculate the optimum number of clusters and to compare the functionality of DEB versus Euclidean distance. We have also study the effects of importance weight in DEB function to improve clustering quality. Lastly when segmentation is completed, a designated RFM model is used to analyze the relative profitability of each cluster.

Keywords: Customers segmentation, Customer relationship management, Clustering, Data Mining

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2240

7332 Food Quality Labels and their Perception by Consumers in the Czech Republic

Authors: Sarka Velcovska

Abstract:

The paper deals with quality labels used in the food products market, especially with labels of quality, labels of origin, and labels of organic farming. The aim of the paper is to identify perception of these labels by consumers in the Czech Republic. The first part refers to the definition and specification of food quality labels that are relevant in the Czech Republic. The second part includes the discussion of marketing research results. Data were collected with personal questioning method. Empirical findings on 150 respondents are related to consumer awareness and perception of national and European food quality labels used in the Czech Republic, attitudes to purchases of labelled products, and interest in information regarding the labels. Statistical methods, in the concrete Pearson´s chi-square test of independence, coefficient of contingency, and coefficient of association are used to determinate if significant differences do exist among selected demographic categories of Czech consumers.

Keywords: Food quality labels, quality labels awareness, quality labels perception, marketing research.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2268

7331 Association of Zinc with New Generation Cardiovascular Risk Markers in Childhood Obesity

Authors: Mustafa M. Donma, Orkide Donma

Abstract:

Zinc (Zn) is a vital element required for growth and development particularly in children. It exhibits some protective effects against cardiovascular diseases (CVDs). Zn may be a potential biomarker of cardiovascular health. High sensitive cardiac troponin T (hs-cTnT) and cardiac myosin binding protein C (cMyBP-C) are new generation markers used for prediagnosis, diagnosis and prognosis of CVDs. The aim of this study is to determine Zn as well as new generation cardiac markers’ profiles in children with normal body mass index (N-BMI), obese (OB), morbid obese (MO) children and children with metabolic syndrome (MetS) findings. The association among them will also be investigated. Four study groups were constituted. The study protocol was approved by the institutional Ethics Committee of Tekirdag Namik Kemal University. Parents of the participants filled informed consent forms to participate in the study. Group 1 is composed of 44 children with N-BMI. Group 2 and Group 3 comprised 43 OB and 45 MO children, respectively. 45 MO children with MetS findings were included in Group 4. World Health Organization age- and sex-adjusted BMI percentile tables were used to constitute groups. These values were 15-85, 95-99 and above 99 for N-BMI, OB and MO, respectively. Criteria for MetS findings were determined. Routine biochemical analyses including Zn were performed. hs-cTnT and cMyBP-C concentrations were measured by enzyme-linked immunosorbent assay. Data were analyzed by using SPSS software. p < 0.05 was accepted as significant. Four groups were matched for age and gender. Decreased Zn concentrations were measured in Groups 2, 3 and 4 compared to Group 1. Groups did not differ from one another in terms of hs-cTnT. There were statistically significant differences between cMyBP-C levels of MetS group and N-BMI as well as OB groups. There was an increasing trend going from N-BMI group to MetS group. There were statistically significant negative correlations between Zn and hs-cTnT as well as cMyBP-C concentrations in MetS group. In conclusion, inverse correlations detected between Zn and new generation cardiac markers (hs-TnT and cMyBP-C) have pointed out that decreased levels of Zn accompany increased levels of hs-cTnT as well as cMyBP-C in children with MetS. This finding emphasizes that both Zn and these new generation cardiac markers may be evaluated as biomarkers of cardiovascular health during severe childhood obesity precipitated with MetS findings and also suggested as the messengers of the future risk in the adulthood periods of children with MetS.

Keywords: Cardiac myosin binding protein-C, cardiovascular diseases, children, high sensitive cardiac troponin T, obesity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 437

7330 Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering

Authors: Yunus Doğan, Ahmet Durap

Abstract:

Coastal regions are the one of the most commonly used places by the natural balance and the growing population. In coastal engineering, the most valuable data is wave behaviors. The amount of this data becomes very big because of observations that take place for periods of hours, days and months. In this study, some statistical methods such as the wave spectrum analysis methods and the standard statistical methods have been used. The goal of this study is the discovery profiles of the different coast areas by using these statistical methods, and thus, obtaining an instance based data set from the big data to analysis by using data mining algorithms. In the experimental studies, the six sample data sets about the wave behaviors obtained by 20 minutes of observations from Mersin Bay in Turkey and converted to an instance based form, while different clustering techniques in data mining algorithms were used to discover similar coastal places. Moreover, this study discusses that this summarization approach can be used in other branches collecting big data such as medicine.

Keywords: Clustering algorithms, coastal engineering, data mining, data summarization, statistical methods.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1187

7329 Dimensional Modeling of HIV Data Using Open Source

Authors: Charles D. Otine, Samuel B. Kucel, Lena Trojer

Abstract:

Selecting the data modeling technique for an information system is determined by the objective of the resultant data model. Dimensional modeling is the preferred modeling technique for data destined for data warehouses and data mining, presenting data models that ease analysis and queries which are in contrast with entity relationship modeling. The establishment of data warehouses as components of information system landscapes in many organizations has subsequently led to the development of dimensional modeling. This has been significantly more developed and reported for the commercial database management systems as compared to the open sources thereby making it less affordable for those in resource constrained settings. This paper presents dimensional modeling of HIV patient information using open source modeling tools. It aims to take advantage of the fact that the most affected regions by the HIV virus are also heavily resource constrained (sub-Saharan Africa) whereas having large quantities of HIV data. Two HIV data source systems were studied to identify appropriate dimensions and facts these were then modeled using two open source dimensional modeling tools. Use of open source would reduce the software costs for dimensional modeling and in turn make data warehousing and data mining more feasible even for those in resource constrained settings but with data available.

Keywords: About Database, Data Mining, Data warehouse, Dimensional Modeling, Open Source.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1910

7328 Efficient Lossless Compression of Weather Radar Data

Authors: Wei-hua Ai, Wei Yan, Xiang Li

Abstract:

Data compression is used operationally to reduce bandwidth and storage requirements. An efficient method for achieving lossless weather radar data compression is presented. The characteristics of the data are taken into account and the optical linear prediction is used for the PPI images in the weather radar data in the proposed method. The next PPI image is identical to the current one and a dramatic reduction in source entropy is achieved by using the prediction algorithm. Some lossless compression methods are used to compress the predicted data. Experimental results show that for the weather radar data, the method proposed in this paper outperforms the other methods.

Keywords: Lossless compression, weather radar data, optical linear prediction, PPI image

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2203

7327 Conceptualizing the Knowledge to Manage and Utilize Data Assets in the Context of Digitization: Case Studies of Multinational Industrial Enterprises

Authors: Martin Böhmer, Agatha Dabrowski, Boris Otto

Abstract:

The trend of digitization significantly changes the role of data for enterprises. Data turn from an enabler to an intangible organizational asset that requires management and qualifies as a tradeable good. The idea of a networked economy has gained momentum in the data domain as collaborative approaches for data management emerge. Traditional organizational knowledge consequently needs to be extended by comprehensive knowledge about data. The knowledge about data is vital for organizations to ensure that data quality requirements are met and data can be effectively utilized and sovereignly governed. As this specific knowledge has been paid little attention to so far by academics, the aim of the research presented in this paper is to conceptualize it by proposing a “data knowledge model”. Relevant model entities have been identified based on a design science research (DSR) approach that iteratively integrates insights of various industry case studies and literature research.

Keywords: Data management, digitization, Industry 4.0, knowledge engineering, metamodel.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1393

7326 A Methodology for Data Migration between Different Database Management Systems

Authors: Bogdan Walek, Cyril Klimes

Abstract:

In present days the area of data migration is very topical. Current tools for data migration in the area of relational database have several disadvantages that are presented in this paper. We propose a methodology for data migration of the database tables and their data between various types of relational database systems (RDBMS). The proposed methodology contains an expert system. The expert system contains a knowledge base that is composed of IFTHEN rules and based on the input data suggests appropriate data types of columns of database tables. The proposed tool, which contains an expert system, also includes the possibility of optimizing the data types in the target RDBMS database tables based on processed data of the source RDBMS database tables. The proposed expert system is shown on data migration of selected database of the source RDBMS to the target RDBMS.

Keywords: Expert system, fuzzy, data migration, database, relational database, data type, relational database management system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3405

7325 Opening up Government Datasets for Big Data Analysis to Support Policy Decisions

Authors: K. Hardy, A. Maurushat

Abstract:

Policy makers are increasingly looking to make evidence-based decisions. Evidence-based decisions have historically used rigorous methodologies of empirical studies by research institutes, as well as less reliable immediate survey/polls often with limited sample sizes. As we move into the era of Big Data analytics, policy makers are looking to different methodologies to deliver reliable empirics in real-time. The question is not why did these people do this for the last 10 years, but why are these people doing this now, and if the this is undesirable, and how can we have an impact to promote change immediately. Big data analytics rely heavily on government data that has been released in to the public domain. The open data movement promises greater productivity and more efficient delivery of services; however, Australian government agencies remain reluctant to release their data to the general public. This paper considers the barriers to releasing government data as open data, and how these barriers might be overcome.

Keywords: Big data, open data, productivity, transparency.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1573

7324 Forthcoming Big Data on Smart Buildings and Cities: An Experimental Study on Correlations among Urban Data

Authors: Yu-Mi Song, Sung-Ah Kim, Dongyoun Shin

Abstract:

Cities are complex systems of diverse and inter-tangled activities. These activities and their complex interrelationships create diverse urban phenomena. And such urban phenomena have considerable influences on the lives of citizens. This research aimed to develop a method to reveal the causes and effects among diverse urban elements in order to enable better understanding of urban activities and, therefrom, to make better urban planning strategies. Specifically, this study was conducted to solve a data-recommendation problem found on a Korean public data homepage. First, a correlation analysis was conducted to find the correlations among random urban data. Then, based on the results of that correlation analysis, the weighted data network of each urban data was provided to people. It is expected that the weights of urban data thereby obtained will provide us with insights into cities and show us how diverse urban activities influence each other and induce feedback.

Keywords: Big data, correlation analysis, data recommendation system, urban data network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1055

7323 On the Combination of Patient-Generated Data with Data from a Secure Clinical Network Environment – A Practical Example

Authors: Jeroen S. de Bruin, Karin Schindler, Christian Schuh

Abstract:

With increasingly more mobile health applications appearing due to the popularity of smartphones, the possibility arises that these data can be used to improve the medical diagnostic process, as well as the overall quality of healthcare, while at the same time lowering costs. However, as of yet there have been no reports of a successful combination of patient-generated data from smartphones with data from clinical routine. In this paper we describe how these two types of data can be combined in a secure way without modification to hospital information systems, and how they can together be used in a medical expert system for automatic nutritional classification and triage.

Keywords: Data integration, disease-related malnutrition, expert systems, mobile health.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2156

7322 Economic Factors Affecting Rice Export of Thailand

Authors: Somphoom Sawaengkun

Abstract:

The purpose of this study was primarily assessing how important economic factors namely: The Thai export price of white rice, the exchange rate, and the world rice consumption affect the overall Thai white rice export, using historical data during the period 1989-2013 from the Thai Rice Exporters Association, and Food and Agricultural Organization of the United Nations. The co-integration method, regression analysis, and error correction model were applied to investigate the econometric model. The findings indicated that in the long-run, the world rice consumption, the exchange rate, and the Thai export price of white rice were the important factors affecting the export quantity of Thai white rice respectively, as indicated by their significant coefficients. Meanwhile, the rice export price was an important factor affecting the export quantity of Thai white rice in the short-run. This information is useful in the business, export opportunities, price competitiveness, and policymaker in Thailand.

Keywords: Economic Factors, Rice Export, White Rice.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3424

7321 Comparison of Imputation Techniques for Efficient Prediction of Software Fault Proneness in Classes

Authors: Geeta Sikka, Arvinder Kaur Takkar, Moin Uddin

Abstract:

Missing data is a persistent problem in almost all areas of empirical research. The missing data must be treated very carefully, as data plays a fundamental role in every analysis. Improper treatment can distort the analysis or generate biased results. In this paper, we compare and contrast various imputation techniques on missing data sets and make an empirical evaluation of these methods so as to construct quality software models. Our empirical study is based on NASA-s two public dataset. KC4 and KC1. The actual data sets of 125 cases and 2107 cases respectively, without any missing values were considered. The data set is used to create Missing at Random (MAR) data Listwise Deletion(LD), Mean Substitution(MS), Interpolation, Regression with an error term and Expectation-Maximization (EM) approaches were used to compare the effects of the various techniques.

Keywords: Missing data, Imputation, Missing Data Techniques.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1624

7320 Cluster Analysis for the Statistical Modeling of Aesthetic Judgment Data Related to Comics Artists

Authors: George E. Tsekouras, Evi Sampanikou

Abstract:

We compare three categorical data clustering algorithms with respect to the problem of classifying cultural data related to the aesthetic judgment of comics artists. Such a classification is very important in Comics Art theory since the determination of any classes of similarities in such kind of data will provide to art-historians very fruitful information of Comics Art-s evolution. To establish this, we use a categorical data set and we study it by employing three categorical data clustering algorithms. The performances of these algorithms are compared each other, while interpretations of the clustering results are also given.

Keywords: Aesthetic judgment, comics artists, cluster analysis, categorical data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1601

7319 IoT Device Cost Effective Storage Architecture and Real-Time Data Analysis/Data Privacy Framework

Authors: Femi Elegbeleye, Seani Rananga

Abstract:

This paper focused on cost effective storage architecture using fog and cloud data storage gateway, and presented the design of the framework for the data privacy model and data analytics framework on a real-time analysis when using machine learning method. The paper began with the system analysis, system architecture and its component design, as well as the overall system operations. Several results obtained from this study on data privacy models show that when two or more data privacy models are integrated via a fog storage gateway, we often have more secure data. Our main focus in the study is to design a framework for the data privacy model, data storage, and real-time analytics. This paper also shows the major system components and their framework specification. And lastly, the overall research system architecture was shown, including its structure, and its interrelationships.

Keywords: IoT, fog storage, cloud storage, data analysis, data privacy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 148

7318 Factors Associated with Mammography Screening Behaviors: A Cross-Sectional Descriptive Study of Egyptian Women

Authors: Salwa Hagag Abdelaziz, Naglaa Fathy Youssef, Nadia Abdel Latif Hassan, Rasha Wesam Abdel Rahman

Abstract:

Breast cancer is considered as a substantial health concern and practicing mammography screening [MS] is important in minimizing its related morbidity. So it is essential to have a better understanding of breast cancer screening behaviors of women and factors that influence utilization of them. The aim of this study is to identify the factors that are linked to MS behaviors among the Egyptian women. A cross-sectional descriptive design was carried out to provide a snapshot of the factors that are linked to MS behaviors. A convenience sample of 311 women was utilized and all eligible participants admitted to the Women Imaging Unit who are 40 years of age or above, coming for mammography assessment, not pregnant or breast feeding and who accepted to participate in the study were included. A structured questionnaire was developed by the researchers and contains three parts; Socio-demographic data; Motivating factors associated with MS; and association between MS and model of behavior change. The analyzed data indicated that most of the participated women (66.6%) belonged to the age group of 40- 49.A high proportion of participants (58.1%) of group having previous MS influenced by their neighbors to practice MS, whereas 32.7 % in group not having previous MS were influenced by family members which indicated significant differences (P <0.05). Doctors and media shown to be the least influence of others to practice MS. Women with intention to have a future mammogram had higher OR (1.404) for practicing MS compared with women with no intention. Further studies are needed to examine the relation between Transtheoretical Model [TTM] and practicing MS.

Keywords: Breast cancer, mammography, screening behaviors.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2101

7317 Hematologic Inflammatory Markers and Inflammation-Related Hepatokines in Pediatric Obesity

Authors: Mustafa M. Donma, Orkide Donma

Abstract:

Obesity in children particularly draws attention, because it may threaten the individual’s future life due to many chronic diseases it may lead to. Most of these diseases including obesity itself altogether are related to inflammation. For this reason, inflammation-related parameters gain importance. Within this context, complete blood cell counts, ratios or indices derived from these counts have recently found some platform to be used as inflammatory markers. So far, mostly adipokines were investigated within the field of obesity. Metabolic inflammation is closely associated with cellular dysfunction. In this study, hematologic inflammatory markers and cytokines produced predominantly by the liver (fibroblast growth factor-21 (FGF-21) and fetuin A) were investigated in pediatric obesity. Two groups were constituted from 76 obese children based on World Health Organization criteria. Group 1 was composed of children, whose age- and sex-adjusted body mass index (BMI) percentiles were between 95 and 99. Group 2 consists of children, who are above 99th percentile. The first and the latter groups were defined as obese (OB) and morbid obese (MO). Anthropometric measurements of the children were performed. Informed consent forms and the approval of the institutional ethics committee were obtained. Blood cell counts and ratios were determined by automated hematology analyzer. The related ratios and indexes were calculated. Statistical evaluation of the data was performed by SPSS program. There was no statistically significant difference in terms of neutrophil-to lymphocyte ratio, monocyte-to-high density lipoprotein cholesterol ratio and platelet-to-lymphocyte ratio between the groups. Mean platelet volume and platelet distribution width values were decreased (p < 0.05), total platelet count, red cell distribution width (RDW) and systemic immune inflammation index values were increased (p < 0.01) in MO group. Both hepatokines were increased in the same group, however increases were not statistically significant. In this group, also a strong correlation was calculated between FGF-21 and RDW when controlled by age, hematocrit, iron and ferritin (r = 0.425; p < 0.01). In conclusion, the association between RDW, a hematologic inflammatory marker, and FGF-21, an inflammation-related hepatokine, found in MO group is an important finding discriminating between OB and MO children. This association is even more powerful when controlled by age and iron-related parameters.

Keywords: Childhood obesity, fetuin A, fibroblast growth factor-21, hematologic markers, red cell distribution width.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 620

7316 Integration of Multi-Source Data to Monitor Coral Biodiversity

Authors: K. Jitkue, W. Srisang, C. Yaiprasert, K. Jaroensutasinee, M. Jaroensutasinee

Abstract:

This study aims at using multi-source data to monitor coral biodiversity and coral bleaching. We used coral reef at Racha Islands, Phuket as a study area. There were three sources of data: coral diversity, sensor based data and satellite data.

Keywords: Coral reefs, Remote sensing, Sea surfacetemperatue, Satellite imagery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1501

7315 Decision Support System Based on Data Warehouse

Authors: Yang Bao, LuJing Zhang

Abstract:

Typical Intelligent Decision Support System is 4-based, its design composes of Data Warehouse, Online Analytical Processing, Data Mining and Decision Supporting based on models, which is called Decision Support System Based on Data Warehouse (DSSBDW). This way takes ETL,OLAP and DM as its implementing means, and integrates traditional model-driving DSS and data-driving DSS into a whole. For this kind of problem, this paper analyzes the DSSBDW architecture and DW model, and discusses the following key issues: ETL designing and Realization; metadata managing technology using XML; SQL implementing, optimizing performance, data mapping in OLAP; lastly, it illustrates the designing principle and method of DW in DSSBDW.

Keywords: Decision Support System, Data Warehouse, Data Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3806

7314 Mixed-Mode Study of Rock Fracture Mechanics by using the Modified Arcan Specimen Test

Authors: R. Hasanpour, N. Choupani

Abstract:

This paper studies mixed-mode fracture mechanics in rock based on experimental and numerical analyses. Experiments were performed on sharp-cracked specimens using the modified Arcan specimen test loading device. The modified Arcan specimen test was, in association with a special loading device, an appropriate apparatus for experimental mixed-mode fracture analysis. By varying the loading angle from 0° to 90°, pure mode-I, pure mode-II and a wide range of mixed-mode data were obtained experimentally. Using the finite element results, correction factors applied to the rectangular fracture specimen. By employing experimentally measured critical loads and the aid of the finite element method, mixed-mode fracture toughness for the limestone under consideration determined.

Keywords: Rock Fracture Mechanics, Mixed-mode Loading, Finite Element Analysis, Arcan Test specimen.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2505

7313 Incremental Learning of Independent Topic Analysis

Authors: Takahiro Nishigaki, Katsumi Nitta, Takashi Onoda

Abstract:

In this paper, we present a method of applying Independent Topic Analysis (ITA) to increasing the number of document data. The number of document data has been increasing since the spread of the Internet. ITA was presented as one method to analyze the document data. ITA is a method for extracting the independent topics from the document data by using the Independent Component Analysis (ICA). ICA is a technique in the signal processing; however, it is difficult to apply the ITA to increasing number of document data. Because ITA must use the all document data so temporal and spatial cost is very high. Therefore, we present Incremental ITA which extracts the independent topics from increasing number of document data. Incremental ITA is a method of updating the independent topics when the document data is added after extracted the independent topics from a just previous the data. In addition, Incremental ITA updates the independent topics when the document data is added. And we show the result applied Incremental ITA to benchmark datasets.

Keywords: Text mining, topic extraction, independent, incremental, independent component analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 999

7312 A Framework for Data Mining Based Multi-Agent: An Application to Spatial Data

Authors: H. Baazaoui Zghal, S. Faiz, H. Ben Ghezala

Abstract:

Data mining is an extraordinarily demanding field referring to extraction of implicit knowledge and relationships, which are not explicitly stored in databases. A wide variety of methods of data mining have been introduced (classification, characterization, generalization...). Each one of these methods includes more than algorithm. A system of data mining implies different user categories,, which mean that the user-s behavior must be a component of the system. The problem at this level is to know which algorithm of which method to employ for an exploratory end, which one for a decisional end, and how can they collaborate and communicate. Agent paradigm presents a new way of conception and realizing of data mining system. The purpose is to combine different algorithms of data mining to prepare elements for decision-makers, benefiting from the possibilities offered by the multi-agent systems. In this paper the agent framework for data mining is introduced, and its overall architecture and functionality are presented. The validation is made on spatial data. Principal results will be presented.

Keywords: Databases, data mining, multi-agent, spatial datamart.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2000

7311 Graves’ Disease and Its Related Single Nucleotide Polymorphisms and Genes

Authors: Yuhong Lu

Abstract:

Graves’ Disease (GD), an autoimmune health condition caused by the over reactiveness of the thyroid, affects about 1 in 200 people worldwide. GD is not caused by one specific single nucleotide polymorphism (SNP) or gene mutation, but rather determined by multiple factors, each differing from each other. Malfunction of the genes in Human Leukocyte Antigen (HLA) family tend to play a major role in autoimmune diseases, but other genes, such as LOC101929163, have functions that still remain ambiguous. Currently, little studies were done to study GD, resulting in inconclusive results. This study serves not only to introduce background knowledge about GD, but also to organize and pinpoint the major SNPs and genes that are potentially related to the occurrence of GD in humans. Collected from multiple sources from genome-wide association studies (GWAS) Central, the potential SNPs related to the causes of GD are included in this study. This study has located the genes that are related to those SNPs and closely examines a selected sample. Using the data from this study, scientists will then be able to focus on the most expressed genes in GD patients and develop a treatment for GD.

Keywords: CTLA4, Graves’ Disease, HLA, single nucleotide polymorphism, SNP.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 480

7310 Latent Topic Based Medical Data Classification

Authors: Jian-hua Yeh, Shi-yi Kuo

Abstract:

This paper discusses the classification process for medical data. In this paper, we use the data from ACM KDDCup 2008 to demonstrate our classification process based on latent topic discovery. In this data set, the target set and outliers are quite different in their nature: target set is only 0.6% size in total, while the outliers consist of 99.4% of the data set. We use this data set as an example to show how we dealt with this extremely biased data set with latent topic discovery and noise reduction techniques. Our experiment faces two major challenge: (1) extremely distributed outliers, and (2) positive samples are far smaller than negative ones. We try to propose a suitable process flow to deal with these issues and get a best AUC result of 0.98.

Keywords: classification, latent topics, outlier adjustment, feature scaling

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1607

7309 Data Collection in Hospital Emergencies: A Questionnaire Survey

Authors: Nouha Mhimdi, Wahiba Ben Abdessalem Karaa, Henda Ben Ghezala

Abstract:

Many methods are used to collect data like questionnaires, surveys, focus group interviews. Or the collection of poor-quality data resulting, for example, from poorly designed questionnaires, the absence of good translators or interpreters, and the incorrect recording of data allow conclusions to be drawn that are not supported by the data or to focus only on the average effect of the program or policy. There are several solutions to avoid or minimize the most frequent errors, including obtaining expert advice on the design or adaptation of data collection instruments; or use technologies allowing better "anonymity" in the responses. In this context, and to overcome the aforementioned problems, we suggest in this paper an approach to achieve the collection of relevant data, by carrying out a large-scale questionnaire-based survey. We have been able to collect good quality, consistent and practical data on hospital emergencies to improve emergency services in hospitals, especially in the case of epidemics or pandemics.

Keywords: Data collection, survey, database, data analysis, hospital emergencies.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 559

7308 Data Transformation Services (DTS): Creating Data Mart by Consolidating Multi-Source Enterprise Operational Data

Authors: J. D. D. Daniel, K. N. Goh, S. M. Yusop

Abstract:

Trends in business intelligence, e-commerce and remote access make it necessary and practical to store data in different ways on multiple systems with different operating systems. As business evolve and grow, they require efficient computerized solution to perform data update and to access data from diverse enterprise business applications. The objective of this paper is to demonstrate the capability of DTS [1] as a database solution for automatic data transfer and update in solving business problem. This DTS package is developed for the sales of variety of plants and eventually expanded into commercial supply and landscaping business. Dimension data modeling is used in DTS package to extract, transform and load data from heterogeneous database systems such as MySQL, Microsoft Access and Oracle that consolidates into a Data Mart residing in SQL Server. Hence, the data transfer from various databases is scheduled to run automatically every quarter of the year to review the efficient sales analysis. Therefore, DTS is absolutely an attractive solution for automatic data transfer and update which meeting today-s business needs.

Keywords: Data Transformation Services (DTS), ObjectLinking and Embedding Database (OLEDB), Data Mart, OnlineAnalytical Processing (OLAP), Online Transactional Processing(OLTP).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1978

7307 Extraction of Data from Web Pages: A Vision Based Approach

Authors: P. S. Hiremath, Siddu P. Algur

Abstract:

With the explosive growth of information sources available on the World Wide Web, it has become increasingly difficult to identify the relevant pieces of information, since web pages are often cluttered with irrelevant content like advertisements, navigation-panels, copyright notices etc., surrounding the main content of the web page. Hence, tools for the mining of data regions, data records and data items need to be developed in order to provide value-added services. Currently available automatic techniques to mine data regions from web pages are still unsatisfactory because of their poor performance and tag-dependence. In this paper a novel method to extract data items from the web pages automatically is proposed. It comprises of two steps: (1) Identification and Extraction of the data regions based on visual clues information. (2) Identification of data records and extraction of data items from a data region. For step1, a novel and more effective method is proposed based on visual clues, which finds the data regions formed by all types of tags using visual clues. For step2 a more effective method namely, Extraction of Data Items from web Pages (EDIP), is adopted to mine data items. The EDIP technique is a list-based approach in which the list is a linear data structure. The proposed technique is able to mine the non-contiguous data records and can correctly identify data regions, irrespective of the type of tag in which it is bound. Our experimental results show that the proposed technique performs better than the existing techniques.

Keywords: Web data records, web data regions, web mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1858

7306 Visual-Graphical Methods for Exploring Longitudinal Data

Authors: H. W. Ker

Abstract:

Longitudinal data typically have the characteristics of changes over time, nonlinear growth patterns, between-subjects variability, and the within errors exhibiting heteroscedasticity and dependence. The data exploration is more complicated than that of cross-sectional data. The purpose of this paper is to organize/integrate of various visual-graphical techniques to explore longitudinal data. From the application of the proposed methods, investigators can answer the research questions include characterizing or describing the growth patterns at both group and individual level, identifying the time points where important changes occur and unusual subjects, selecting suitable statistical models, and suggesting possible within-error variance.

Keywords: Data exploration, exploratory analysis, HLMs/LMEs, longitudinal data, visual-graphical methods.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2042

7305 A Materialized Approach to the Integration of XML Documents: the OSIX System

Authors: H. Ahmad, S. Kermanshahani, A. Simonet, M. Simonet

Abstract:

The data exchanged on the Web are of different nature from those treated by the classical database management systems; these data are called semi-structured data since they do not have a regular and static structure like data found in a relational database; their schema is dynamic and may contain missing data or types. Therefore, the needs for developing further techniques and algorithms to exploit and integrate such data, and extract relevant information for the user have been raised. In this paper we present the system OSIX (Osiris based System for Integration of XML Sources). This system has a Data Warehouse model designed for the integration of semi-structured data and more precisely for the integration of XML documents. The architecture of OSIX relies on the Osiris system, a DL-based model designed for the representation and management of databases and knowledge bases. Osiris is a viewbased data model whose indexing system supports semantic query optimization. We show that the problem of query processing on a XML source is optimized by the indexing approach proposed by Osiris.

Keywords: Data integration, semi-structured data, views, XML.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1541