Search results for: data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7587

Search results for: data mining

7557 Data Mining Classification Methods Applied in Drug Design

Authors: Mária Stachová, Lukáš Sobíšek

Abstract:

Data mining incorporates a group of statistical methods used to analyze a set of information, or a data set. It operates with models and algorithms, which are powerful tools with the great potential. They can help people to understand the patterns in certain chunk of information so it is obvious that the data mining tools have a wide area of applications. For example in the theoretical chemistry data mining tools can be used to predict moleculeproperties or improve computer-assisted drug design. Classification analysis is one of the major data mining methodologies. The aim of thecontribution is to create a classification model, which would be able to deal with a huge data set with high accuracy. For this purpose logistic regression, Bayesian logistic regression and random forest models were built using R software. TheBayesian logistic regression in Latent GOLD software was created as well. These classification methods belong to supervised learning methods. It was necessary to reduce data matrix dimension before construct models and thus the factor analysis (FA) was used. Those models were applied to predict the biological activity of molecules, potential new drug candidates.

Keywords: data mining, classification, drug design, QSAR

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2852
7556 Prospects, Problems of Marketing Research and Data Mining in Turkey

Authors: Sema Kurtuluş, Kemal Kurtuluş

Abstract:

The objective of this paper is to review and assess the methodological issues and problems in marketing research, data and knowledge mining in Turkey. As a summary, academic marketing research publications in Turkey have significant problems. The most vital problem seems to be related with modeling. Most of the publications had major weaknesses in modeling. There were also, serious problems regarding measurement and scaling, sampling and analyses. Analyses myopia seems to be the most important problem for young academia in Turkey. Another very important finding is the lack of publications on data and knowledge mining in the academic world.

Keywords: Marketing research, data mining, knowledge mining, research modeling, analyses.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1969
7555 Distributed Data-Mining by Probability-Based Patterns

Authors: M. Kargar, F. Gharbalchi

Abstract:

In this paper a new method is suggested for distributed data-mining by the probability patterns. These patterns use decision trees and decision graphs. The patterns are cared to be valid, novel, useful, and understandable. Considering a set of functions, the system reaches to a good pattern or better objectives. By using the suggested method we will be able to extract the useful information from massive and multi-relational data bases.

Keywords: Data-mining, Decision tree, Decision graph, Pattern, Relationship.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1558
7554 Proposing an Efficient Method for Frequent Pattern Mining

Authors: Vaibhav Kant Singh, Vijay Shah, Yogendra Kumar Jain, Anupam Shukla, A.S. Thoke, Vinay KumarSingh, Chhaya Dule, Vivek Parganiha

Abstract:

Data mining, which is the exploration of knowledge from the large set of data, generated as a result of the various data processing activities. Frequent Pattern Mining is a very important task in data mining. The previous approaches applied to generate frequent set generally adopt candidate generation and pruning techniques for the satisfaction of the desired objective. This paper shows how the different approaches achieve the objective of frequent mining along with the complexities required to perform the job. This paper will also look for hardware approach of cache coherence to improve efficiency of the above process. The process of data mining is helpful in generation of support systems that can help in Management, Bioinformatics, Biotechnology, Medical Science, Statistics, Mathematics, Banking, Networking and other Computer related applications. This paper proposes the use of both upward and downward closure property for the extraction of frequent item sets which reduces the total number of scans required for the generation of Candidate Sets.

Keywords: Data Mining, Candidate Sets, Frequent Item set, Pruning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1685
7553 An Improved Data Mining Method Applied to the Search of Relationship between Metabolic Syndrome and Lifestyles

Authors: Yi Chao Huang, Yu Ling Liao, Chiu Shuang Lin

Abstract:

A data cutting and sorting method (DCSM) is proposed to optimize the performance of data mining. DCSM reduces the calculation time by getting rid of redundant data during the data mining process. In addition, DCSM minimizes the computational units by splitting the database and by sorting data with support counts. In the process of searching for the relationship between metabolic syndrome and lifestyles with the health examination database of an electronics manufacturing company, DCSM demonstrates higher search efficiency than the traditional Apriori algorithm in tests with different support counts.

Keywords: Data mining, Data cutting and sorting method, Apriori algorithm, Metabolic syndrome

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1589
7552 Predictions Using Data Mining and Case-based Reasoning: A Case Study for Retinopathy

Authors: Vimala Balakrishnan, Mohammad R. Shakouri, Hooman Hoodeh, Loo, Huck-Soo

Abstract:

Diabetes is one of the high prevalence diseases worldwide with increased number of complications, with retinopathy as one of the most common one. This paper describes how data mining and case-based reasoning were integrated to predict retinopathy prevalence among diabetes patients in Malaysia. The knowledge base required was built after literature reviews and interviews with medical experts. A total of 140 diabetes patients- data were used to train the prediction system. A voting mechanism selects the best prediction results from the two techniques used. It has been successfully proven that both data mining and case-based reasoning can be used for retinopathy prediction with an improved accuracy of 85%.

Keywords: Case-Based Reasoning, Data Mining, Prediction, Retinopathy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3025
7551 A Data Mining Model for Detecting Financial and Operational Risk Indicators of SMEs

Authors: Ali Serhan Koyuncugil, Nermin Ozgulbas

Abstract:

In this paper, a data mining model to SMEs for detecting financial and operational risk indicators by data mining is presenting. The identification of the risk factors by clarifying the relationship between the variables defines the discovery of knowledge from the financial and operational variables. Automatic and estimation oriented information discovery process coincides the definition of data mining. During the formation of model; an easy to understand, easy to interpret and easy to apply utilitarian model that is far from the requirement of theoretical background is targeted by the discovery of the implicit relationships between the data and the identification of effect level of every factor. In addition, this paper is based on a project which was funded by The Scientific and Technological Research Council of Turkey (TUBITAK).

Keywords: Risk Management, Financial Risk, Operational Risk, Financial Early Warning System, Data Mining, CHAID Decision Tree Algorithm, SMEs.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3125
7550 A Network Traffic Prediction Algorithm Based On Data Mining Technique

Authors: D. Prangchumpol

Abstract:

This paper is a description approach to predict incoming and outgoing data rate in network system by using association rule discover, which is one of the data mining techniques. Information of incoming and outgoing data in each times and network bandwidth are network performance parameters, which needed to solve in the traffic problem. Since congestion and data loss are important network problems. The result of this technique can predicted future network traffic. In addition, this research is useful for network routing selection and network performance improvement.

Keywords: Traffic prediction, association rule, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3672
7549 Application of Association Rule Mining in Supplier Selection Criteria

Authors: A. Haery, N. Salmasi, M. Modarres Yazdi, H. Iranmanesh

Abstract:

In this paper the application of rule mining in order to review the effective factors on supplier selection is reviewed in the following three sections 1) criteria selecting and information gathering 2) performing association rule mining 3) validation and constituting rule base. Afterwards a few of applications of rule base is explained. Then, a numerical example is presented and analyzed by Clementine software. Some of extracted rules as well as the results are presented at the end.

Keywords: Association rule mining, data mining, supplierselection criteria.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1925
7548 Model Discovery and Validation for the Qsar Problem using Association Rule Mining

Authors: Luminita Dumitriu, Cristina Segal, Marian Craciun, Adina Cocu, Lucian P. Georgescu

Abstract:

There are several approaches in trying to solve the Quantitative 1Structure-Activity Relationship (QSAR) problem. These approaches are based either on statistical methods or on predictive data mining. Among the statistical methods, one should consider regression analysis, pattern recognition (such as cluster analysis, factor analysis and principal components analysis) or partial least squares. Predictive data mining techniques use either neural networks, or genetic programming, or neuro-fuzzy knowledge. These approaches have a low explanatory capability or non at all. This paper attempts to establish a new approach in solving QSAR problems using descriptive data mining. This way, the relationship between the chemical properties and the activity of a substance would be comprehensibly modeled.

Keywords: association rules, classification, data mining, Quantitative Structure - Activity Relationship.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1790
7547 Application of Neural Networks in Financial Data Mining

Authors: Defu Zhang, Qingshan Jiang, Xin Li

Abstract:

This paper deals with the application of a well-known neural network technique, multilayer back-propagation (BP) neural network, in financial data mining. A modified neural network forecasting model is presented, and an intelligent mining system is developed. The system can forecast the buying and selling signs according to the prediction of future trends to stock market, and provide decision-making for stock investors. The simulation result of seven years to Shanghai Composite Index shows that the return achieved by this mining system is about three times as large as that achieved by the buy and hold strategy, so it is advantageous to apply neural networks to forecast financial time series, the different investors could benefit from it.

Keywords: Data mining, neural network, stock forecasting.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3591
7546 Revised PLWAP Tree with Non-frequent Items for Mining Sequential Pattern

Authors: R. Vishnu Priya, A. Vadivel

Abstract:

Sequential pattern mining is a challenging task in data mining area with large applications. One among those applications is mining patterns from weblog. Recent times, weblog is highly dynamic and some of them may become absolute over time. In addition, users may frequently change the threshold value during the data mining process until acquiring required output or mining interesting rules. Some of the recently proposed algorithms for mining weblog, build the tree with two scans and always consume large time and space. In this paper, we build Revised PLWAP with Non-frequent Items (RePLNI-tree) with single scan for all items. While mining sequential patterns, the links related to the nonfrequent items are not considered. Hence, it is not required to delete or maintain the information of nodes while revising the tree for mining updated transactions. The algorithm supports both incremental and interactive mining. It is not required to re-compute the patterns each time, while weblog is updated or minimum support changed. The performance of the proposed tree is better, even the size of incremental database is more than 50% of existing one. For evaluation purpose, we have used the benchmark weblog dataset and found that the performance of proposed tree is encouraging compared to some of the recently proposed approaches.

Keywords: Sequential pattern mining, weblog, frequent and non-frequent items, incremental and interactive mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1932
7545 Application and Limitation of Parallel Modelingin Multidimensional Sequential Pattern

Authors: Mahdi Esmaeili, Mansour Tarafdar

Abstract:

The goal of data mining algorithms is to discover useful information embedded in large databases. One of the most important data mining problems is discovery of frequently occurring patterns in sequential data. In a multidimensional sequence each event depends on more than one dimension. The search space is quite large and the serial algorithms are not scalable for very large datasets. To address this, it is necessary to study scalable parallel implementations of sequence mining algorithms. In this paper, we present a model for multidimensional sequence and describe a parallel algorithm based on data parallelism. Simulation experiments show good load balancing and scalable and acceptable speedup over different processors and problem sizes and demonstrate that our approach can works efficiently in a real parallel computing environment.

Keywords: Sequential Patterns, Data Mining, ParallelAlgorithm, Multidimensional Sequence Data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1479
7544 Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering

Authors: Yunus Doğan, Ahmet Durap

Abstract:

Coastal regions are the one of the most commonly used places by the natural balance and the growing population. In coastal engineering, the most valuable data is wave behaviors. The amount of this data becomes very big because of observations that take place for periods of hours, days and months. In this study, some statistical methods such as the wave spectrum analysis methods and the standard statistical methods have been used. The goal of this study is the discovery profiles of the different coast areas by using these statistical methods, and thus, obtaining an instance based data set from the big data to analysis by using data mining algorithms. In the experimental studies, the six sample data sets about the wave behaviors obtained by 20 minutes of observations from Mersin Bay in Turkey and converted to an instance based form, while different clustering techniques in data mining algorithms were used to discover similar coastal places. Moreover, this study discusses that this summarization approach can be used in other branches collecting big data such as medicine.

Keywords: Clustering algorithms, coastal engineering, data mining, data summarization, statistical methods.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1246
7543 Using Data Mining for Learning and Clustering FCM

Authors: Somayeh Alizadeh, Mehdi Ghazanfari, Mohammad Fathian

Abstract:

Fuzzy Cognitive Maps (FCMs) have successfully been applied in numerous domains to show relations between essential components. In some FCM, there are more nodes, which related to each other and more nodes means more complex in system behaviors and analysis. In this paper, a novel learning method used to construct FCMs based on historical data and by using data mining and DEMATEL method, a new method defined to reduce nodes number. This method cluster nodes in FCM based on their cause and effect behaviors.

Keywords: Clustering, Data Mining, Fuzzy Cognitive Map(FCM), Learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2018
7542 Data Mining in Medicine Domain Using Decision Trees and Vector Support Machine

Authors: Djamila Benhaddouche, Abdelkader Benyettou

Abstract:

In this paper, we used data mining to extract biomedical knowledge. In general, complex biomedical data collected in studies of populations are treated by statistical methods, although they are robust, they are not sufficient in themselves to harness the potential wealth of data. For that you used in step two learning algorithms: the Decision Trees and Support Vector Machine (SVM). These supervised classification methods are used to make the diagnosis of thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.

Keywords: A classifier, Algorithms decision tree, knowledge extraction, Support Vector Machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1870
7541 Business-Intelligence Mining of Large Decentralized Multimedia Datasets with a Distributed Multi-Agent System

Authors: Karima Qayumi, Alex Norta

Abstract:

The rapid generation of high volume and a broad variety of data from the application of new technologies pose challenges for the generation of business-intelligence. Most organizations and business owners need to extract data from multiple sources and apply analytical methods for the purposes of developing their business. Therefore, the recently decentralized data management environment is relying on a distributed computing paradigm. While data are stored in highly distributed systems, the implementation of distributed data-mining techniques is a challenge. The aim of this technique is to gather knowledge from every domain and all the datasets stemming from distributed resources. As agent technologies offer significant contributions for managing the complexity of distributed systems, we consider this for next-generation data-mining processes. To demonstrate agent-based business intelligence operations, we use agent-oriented modeling techniques to develop a new artifact for mining massive datasets.

Keywords: Agent-oriented modeling, business Intelligence management, distributed data mining, multi-agent system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1375
7540 Predicting Groundwater Areas Using Data Mining Techniques: Groundwater in Jordan as Case Study

Authors: Faisal Aburub, Wael Hadi

Abstract:

Data mining is the process of extracting useful or hidden information from a large database. Extracted information can be used to discover relationships among features, where data objects are grouped according to logical relationships; or to predict unseen objects to one of the predefined groups. In this paper, we aim to investigate four well-known data mining algorithms in order to predict groundwater areas in Jordan. These algorithms are Support Vector Machines (SVMs), Naïve Bayes (NB), K-Nearest Neighbor (kNN) and Classification Based on Association Rule (CBA). The experimental results indicate that the SVMs algorithm outperformed other algorithms in terms of classification accuracy, precision and F1 evaluation measures using the datasets of groundwater areas that were collected from Jordanian Ministry of Water and Irrigation.

Keywords: Classification, data mining, evaluation measures, groundwater.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2597
7539 Data Mining in Oral Medicine Using Decision Trees

Authors: Fahad Shahbaz Khan, Rao Muhammad Anwer, Olof Torgersson, Göran Falkman

Abstract:

Data mining has been used very frequently to extract hidden information from large databases. This paper suggests the use of decision trees for continuously extracting the clinical reasoning in the form of medical expert-s actions that is inherent in large number of EMRs (Electronic Medical records). In this way the extracted data could be used to teach students of oral medicine a number of orderly processes for dealing with patients who represent with different problems within the practice context over time.

Keywords: Data mining, Oral Medicine, Decision Trees, WEKA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2505
7538 Data Mining Using Learning Automata

Authors: M. R. Aghaebrahimi, S. H. Zahiri, M. Amiri

Abstract:

In this paper a data miner based on the learning automata is proposed and is called LA-miner. The LA-miner extracts classification rules from data sets automatically. The proposed algorithm is established based on the function optimization using learning automata. The experimental results on three benchmarks indicate that the performance of the proposed LA-miner is comparable with (sometimes better than) the Ant-miner (a data miner algorithm based on the Ant Colony optimization algorithm) and CNZ (a well-known data mining algorithm for classification).

Keywords: Data mining, Learning automata, Classification rules, Knowledge discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1936
7537 Analysis of DNA Microarray Data using Association Rules: A Selective Study

Authors: M. Anandhavalli Gauthaman

Abstract:

DNA microarrays allow the measurement of expression levels for a large number of genes, perhaps all genes of an organism, within a number of different experimental samples. It is very much important to extract biologically meaningful information from this huge amount of expression data to know the current state of the cell because most cellular processes are regulated by changes in gene expression. Association rule mining techniques are helpful to find association relationship between genes. Numerous association rule mining algorithms have been developed to analyze and associate this huge amount of gene expression data. This paper focuses on some of the popular association rule mining algorithms developed to analyze gene expression data.

Keywords: DNA microarray, gene expression, association rule mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2150
7536 Using Data Clustering in Oral Medicine

Authors: Fahad Shahbaz Khan, Rao Muhammad Anwer, Olof Torgersson

Abstract:

The vast amount of information hidden in huge databases has created tremendous interests in the field of data mining. This paper examines the possibility of using data clustering techniques in oral medicine to identify functional relationships between different attributes and classification of similar patient examinations. Commonly used data clustering algorithms have been reviewed and as a result several interesting results have been gathered.

Keywords: Oral Medicine, Cluto, Data Clustering, Data Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1981
7535 Explorative Data Mining of Constructivist Learning Experiences and Activities with Multiple Dimensions

Authors: Patrick Wessa, Bart Baesens

Abstract:

This paper discusses the use of explorative data mining tools that allow the educator to explore new relationships between reported learning experiences and actual activities, even if there are multiple dimensions with a large number of measured items. The underlying technology is based on the so-called Compendium Platform for Reproducible Computing (http://www.freestatistics.org) which was built on top the computational R Framework (http://www.wessa.net).

Keywords: Reproducible computing, data mining, explorative data analysis, compendium technology, computer assisted education

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1255
7534 Incremental Mining of Shocking Association Patterns

Authors: Eiad Yafi, Ahmed Sultan Al-Hegami, M. A. Alam, Ranjit Biswas

Abstract:

Association rules are an important problem in data mining. Massively increasing volume of data in real life databases has motivated researchers to design novel and incremental algorithms for association rules mining. In this paper, we propose an incremental association rules mining algorithm that integrates shocking interestingness criterion during the process of building the model. A new interesting measure called shocking measure is introduced. One of the main features of the proposed approach is to capture the user background knowledge, which is monotonically augmented. The incremental model that reflects the changing data and the user beliefs is attractive in order to make the over all KDD process more effective and efficient. We implemented the proposed approach and experiment it with some public datasets and found the results quite promising.

Keywords: Knowledge discovery in databases (KDD), Data mining, Incremental Association rules, Domain knowledge, Interestingness, Shocking rules (SHR).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1867
7533 Modified Data Mining Approach for Defective Diagnosis in Hard Disk Drive Industry

Authors: S. Soommat, S. Patamatamkul, T. Prempridi, M. Sritulyachot, P. Ineure, S. Yimman

Abstract:

Currently, slider process of Hard Disk Drive Industry become more complex, defective diagnosis for yield improvement becomes more complicated and time-consumed. Manufacturing data analysis with data mining approach is widely used for solving that problem. The existing mining approach from combining of the KMean clustering, the machine oriented Kruskal-Wallis test and the multivariate chart were applied for defective diagnosis but it is still be a semiautomatic diagnosis system. This article aims to modify an algorithm to support an automatic decision for the existing approach. Based on the research framework, the new approach can do an automatic diagnosis and help engineer to find out the defective factors faster than the existing approach about 50%.

Keywords: Slider process, Defective diagnosis and Data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1199
7532 FCA-based Conceptual Knowledge Discovery in Folksonomy

Authors: Yu-Kyung Kang, Suk-Hyung Hwang, Kyoung-Mo Yang

Abstract:

The tagging data of (users, tags and resources) constitutes a folksonomy that is the user-driven and bottom-up approach to organizing and classifying information on the Web. Tagging data stored in the folksonomy include a lot of very useful information and knowledge. However, appropriate approach for analyzing tagging data and discovering hidden knowledge from them still remains one of the main problems on the folksonomy mining researches. In this paper, we have proposed a folksonomy data mining approach based on FCA for discovering hidden knowledge easily from folksonomy. Also we have demonstrated how our proposed approach can be applied in the collaborative tagging system through our experiment. Our proposed approach can be applied to some interesting areas such as social network analysis, semantic web mining and so on.

Keywords: Folksonomy data mining, formal concept analysis, collaborative tagging, conceptual knowledge discovery, classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2031
7531 Auto Classification for Search Intelligence

Authors: Lilac A. E. Al-Safadi

Abstract:

This paper proposes an auto-classification algorithm of Web pages using Data mining techniques. We consider the problem of discovering association rules between terms in a set of Web pages belonging to a category in a search engine database, and present an auto-classification algorithm for solving this problem that are fundamentally based on Apriori algorithm. The proposed technique has two phases. The first phase is a training phase where human experts determines the categories of different Web pages, and the supervised Data mining algorithm will combine these categories with appropriate weighted index terms according to the highest supported rules among the most frequent words. The second phase is the categorization phase where a web crawler will crawl through the World Wide Web to build a database categorized according to the result of the data mining approach. This database contains URLs and their categories.

Keywords: Information Processing on the Web, Data Mining, Document Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1619
7530 W3-Miner: Mining Weighted Frequent Subtree Patterns in a Collection of Trees

Authors: R. AliMohammadzadeh, M. Haghir Chehreghani, A. Zarnani, M. Rahgozar

Abstract:

Mining frequent tree patterns have many useful applications in XML mining, bioinformatics, network routing, etc. Most of the frequent subtree mining algorithms (i.e. FREQT, TreeMiner and CMTreeMiner) use anti-monotone property in the phase of candidate subtree generation. However, none of these algorithms have verified the correctness of this property in tree structured data. In this research it is shown that anti-monotonicity does not generally hold, when using weighed support in tree pattern discovery. As a result, tree mining algorithms that are based on this property would probably miss some of the valid frequent subtree patterns in a collection of trees. In this paper, we investigate the correctness of anti-monotone property for the problem of weighted frequent subtree mining. In addition we propose W3-Miner, a new algorithm for full extraction of frequent subtrees. The experimental results confirm that W3-Miner finds some frequent subtrees that the previously proposed algorithms are not able to discover.

Keywords: Semi-Structured Data Mining, Anti-Monotone Property, Trees.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1383
7529 Generator of Hypotheses an Approach of Data Mining Based on Monotone Systems Theory

Authors: Rein Kuusik, Grete Lind

Abstract:

Generator of hypotheses is a new method for data mining. It makes possible to classify the source data automatically and produces a particular enumeration of patterns. Pattern is an expression (in a certain language) describing facts in a subset of facts. The goal is to describe the source data via patterns and/or IF...THEN rules. Used evaluation criteria are deterministic (not probabilistic). The search results are trees - form that is easy to comprehend and interpret. Generator of hypotheses uses very effective algorithm based on the theory of monotone systems (MS) named MONSA (MONotone System Algorithm).

Keywords: data mining, monotone systems, pattern, rule.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1258
7528 An Application of the Data Mining Methods with Decision Rule

Authors: Xun Ge, Jianhua Gong

Abstract:

 

ankings for output of Chinese main agricultural commodity in the world for 1978, 1980, 1990, 2000, 2006, 2007 and 2008 have been released in United Nations FAO Database. Unfortunately, where the ranking of output of Chinese cotton lint in the world for 2008 was missed. This paper uses sequential data mining methods with decision rules filling this gap. This new data mining method will be help to give a further improvement for United Nations FAO Database.

Keywords: Ranking, output of the main agricultural commodity, gross domestic product, decision table, information system, data mining, decision rule

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1711