Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2073

Search results for: Web Content Mining

2073 A Comparative Analysis of Different Web Content Mining Tools

Authors: T. Suresh Kumar, M. Arthanari, N. Shanthi

Abstract:

Nowadays, the Web has become one of the most pervasive platforms for information change and retrieval. It collects the suitable and perfectly fitting information from websites that one requires. Data mining is the form of extracting data’s available in the internet. Web mining is one of the elements of data mining Technique, which relates to various research communities such as information recovery, folder managing system and simulated intellects. In this Paper we have discussed the concepts of Web mining. We contain generally focused on one of the categories of Web mining, specifically the Web Content Mining and its various farm duties. The mining tools are imperative to scanning the many images, text, and HTML documents and then, the result is used by the various search engines. We conclude by presenting a comparative table of these tools based on some pertinent criteria.

Keywords: Data Mining, Web Mining, Web Content Mining, Mining Tools, Information retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3058
2072 Deep Web Content Mining

Authors: Shohreh Ajoudanian, Mohammad Davarpanah Jazi

Abstract:

The rapid expansion of the web is causing the constant growth of information, leading to several problems such as increased difficulty of extracting potentially useful knowledge. Web content mining confronts this problem gathering explicit information from different web sites for its access and knowledge discovery. Query interfaces of web databases share common building blocks. After extracting information with parsing approach, we use a new data mining algorithm to match a large number of schemas in databases at a time. Using this algorithm increases the speed of information matching. In addition, instead of simple 1:1 matching, they do complex (m:n) matching between query interfaces. In this paper we present a novel correlation mining algorithm that matches correlated attributes with smaller cost. This algorithm uses Jaccard measure to distinguish positive and negative correlated attributes. After that, system matches the user query with different query interfaces in special domain and finally chooses the nearest query interface with user query to answer to it.

Keywords: Content mining, complex matching, correlation mining, information extraction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2038
2071 Signed Approach for Mining Web Content Outliers

Authors: G. Poonkuzhali, K.Thiagarajan, K.Sarukesi, G.V.Uma

Abstract:

The emergence of the Internet has brewed the revolution of information storage and retrieval. As most of the data in the web is unstructured, and contains a mix of text, video, audio etc, there is a need to mine information to cater to the specific needs of the users without loss of important hidden information. Thus developing user friendly and automated tools for providing relevant information quickly becomes a major challenge in web mining research. Most of the existing web mining algorithms have concentrated on finding frequent patterns while neglecting the less frequent ones that are likely to contain outlying data such as noise, irrelevant and redundant data. This paper mainly focuses on Signed approach and full word matching on the organized domain dictionary for mining web content outliers. This Signed approach gives the relevant web documents as well as outlying web documents. As the dictionary is organized based on the number of characters in a word, searching and retrieval of documents takes less time and less space.

Keywords: Outliers, Relevant document, , Signed Approach, Web content mining, Web documents..

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2085
2070 A Web Text Mining Flexible Architecture

Authors: M. Castellano, G. Mastronardi, A. Aprile, G. Tarricone

Abstract:

Text Mining is an important step of Knowledge Discovery process. It is used to extract hidden information from notstructured o semi-structured data. This aspect is fundamental because much of the Web information is semi-structured due to the nested structure of HTML code, much of the Web information is linked, much of the Web information is redundant. Web Text Mining helps whole knowledge mining process to mining, extraction and integration of useful data, information and knowledge from Web page contents. In this paper, we present a Web Text Mining process able to discover knowledge in a distributed and heterogeneous multiorganization environment. The Web Text Mining process is based on flexible architecture and is implemented by four steps able to examine web content and to extract useful hidden information through mining techniques. Our Web Text Mining prototype starts from the recovery of Web job offers in which, through a Text Mining process, useful information for fast classification of the same are drawn out, these information are, essentially, job offer place and skills.

Keywords: Web text mining, flexible architecture, knowledgediscovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2364
2069 Elimination of Redundant Links in Web Pages– Mathematical Approach

Authors: G. Poonkuzhali, K.Thiagarajan, K.Sarukesi

Abstract:

With the enormous growth on the web, users get easily lost in the rich hyper structure. Thus developing user friendly and automated tools for providing relevant information without any redundant links to the users to cater to their needs is the primary task for the website owners. Most of the existing web mining algorithms have concentrated on finding frequent patterns while neglecting the less frequent one that are likely to contain the outlying data such as noise, irrelevant and redundant data. This paper proposes new algorithm for mining the web content by detecting the redundant links from the web documents using set theoretical(classical mathematics) such as subset, union, intersection etc,. Then the redundant links is removed from the original web content to get the required information by the user..

Keywords: Web documents, Web content mining, redundantlink, outliers, set theory.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1573
2068 Web Content Mining: A Solution to Consumer's Product Hunt

Authors: Syed Salman Ahmed, Zahid Halim, Rauf Baig, Shariq Bashir

Abstract:

With the rapid growth in business size, today's businesses orient towards electronic technologies. Amazon.com and e-bay.com are some of the major stakeholders in this regard. Unfortunately the enormous size and hugely unstructured data on the web, even for a single commodity, has become a cause of ambiguity for consumers. Extracting valuable information from such an everincreasing data is an extremely tedious task and is fast becoming critical towards the success of businesses. Web content mining can play a major role in solving these issues. It involves using efficient algorithmic techniques to search and retrieve the desired information from a seemingly impossible to search unstructured data on the Internet. Application of web content mining can be very encouraging in the areas of Customer Relations Modeling, billing records, logistics investigations, product cataloguing and quality management. In this paper we present a review of some very interesting, efficient yet implementable techniques from the field of web content mining and study their impact in the area specific to business user needs focusing both on the customer as well as the producer. The techniques we would be reviewing include, mining by developing a knowledge-base repository of the domain, iterative refinement of user queries for personalized search, using a graphbased approach for the development of a web-crawler and filtering information for personalized search using website captions. These techniques have been analyzed and compared on the basis of their execution time and relevance of the result they produced against a particular search.

Keywords: Data mining, web mining, search engines, knowledge discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1819
2067 A Tree Based Association Rule Approach for XML Data with Semantic Integration

Authors: D. Sasikala, K. Premalatha

Abstract:

The use of eXtensible Markup Language (XML) in web, business and scientific databases lead to the development of methods, techniques and systems to manage and analyze XML data. Semi-structured documents suffer due to its heterogeneity and dimensionality. XML structure and content mining represent convergence for research in semi-structured data and text mining. As the information available on the internet grows drastically, extracting knowledge from XML documents becomes a harder task. Certainly, documents are often so large that the data set returned as answer to a query may also be very big to convey the required information. To improve the query answering, a Semantic Tree Based Association Rule (STAR) mining method is proposed. This method provides intentional information by considering the structure, content and the semantics of the content. The method is applied on Reuter’s dataset and the results show that the proposed method outperforms well.

Keywords: Semi--structured Document, Tree based Association Rule (TAR), Semantic Association Rule Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2063
2066 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.

Keywords: Mining Big Data, Big Data, Machine learning, Data Streams, Telecommunication.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1764
2065 A Study of Soil Heavy Metal Pollution in the Manganese Mining in Drama, Greece

Authors: A. Argiri, A. Molla, Tzouvalekas, E. Skoufogianni, N. Danalatos

Abstract:

The release of heavy metals into the environment has increased over the last years. In this study, 25 soil samples (0-15 cm) from the fields near the mining area in Drama region were selected. The samples were analyzed in the laboratory for their physicochemical properties and for seven “pseudo-total’’ heavy metals content, namely Pb, Zn, Cd, Cr, Cu, Ni, and Mn. The total metal concentrations (Pb, Zn, Cd, Cr, Cu, Ni and Mn) in digests were determined by using the atomic absorption spectrophotometer. According to the results, the mean concentration of the listed heavy metals in 25 soil samples are Cd 1.1 mg/kg, Cr 15 mg/kg, Cu 21.7 mg/kg, Ni 30.1 mg/kg, Pd 50.8 mg/kg, Zn 99.5 mg/kg and Mn 815.3 mg/kg. The results show that the heavy metals remain in the soil even if the mining closed many years ago.

Keywords: Greece, heavy metals, mining, pollution

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 85
2064 Integrated Method for Detection of Unknown Steganographic Content

Authors: Magdalena Pejas

Abstract:

This article concerns the presentation of an integrated method for detection of steganographic content embedded by new unknown programs. The method is based on data mining and aggregated hypothesis testing. The article contains the theoretical basics used to deploy the proposed detection system and the description of improvement proposed for the basic system idea. Further main results of experiments and implementation details are collected and described. Finally example results of the tests are presented.

Keywords: Steganography, steganalysis, data embedding, data mining, feature extraction, knowledge base, system learning, hypothesis testing, error estimation, black box program, file structure.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1313
2063 Web Application to Profiling Scientific Institutions through Citation Mining

Authors: Hector D. Cortes, Jesus A. del Rio, Esther O. Garcia, Miguel Robles

Abstract:

Recently the use of data mining to scientific bibliographic data bases has been implemented to analyze the pathways of the knowledge or the core scientific relevances of a laureated novel or a country. This specific case of data mining has been named citation mining, and it is the integration of citation bibliometrics and text mining. In this paper we present an improved WEB implementation of statistical physics algorithms to perform the text mining component of citation mining. In particular we use an entropic like distance between the compression of text as an indicator of the similarity between them. Finally, we have included the recently proposed index h to characterize the scientific production. We have used this web implementation to identify users, applications and impact of the Mexican scientific institutions located in the State of Morelos.

Keywords: Citation Mining, Text Mining, Science Impact

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1479
2062 Survey on Image Mining Using Genetic Algorithm

Authors: Jyoti Dua

Abstract:

One image is worth more than thousand words. Images if analyzed can reveal useful information. Low level image processing deals with the extraction of specific feature from a single image. Now the question arises: What technique should be used to extract patterns of very large and detailed image database? The answer of the question is: “Image Mining”. Image Mining deals with the extraction of image data relationship, implicit knowledge, and another pattern from the collection of images or image database. It is nothing but the extension of Data Mining. In the following paper, not only we are going to scrutinize the current techniques of image mining but also present a new technique for mining images using Genetic Algorithm.

Keywords: Image Mining, Data Mining, Genetic Algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1819
2061 Knowledge Mining in Web-based Learning Environments

Authors: Nittaya Kerdprasop, Kittisak Kerdprasop

Abstract:

The state of the art in instructional design for computer-assisted learning has been strongly influenced by advances in information technology, Internet and Web-based systems. The emphasis of educational systems has shifted from training to learning. The course delivered has also been changed from large inflexible content to sequential small chunks of learning objects. The concepts of learning objects together with the advanced technologies of Web and communications support the reusability, interoperability, and accessibility design criteria currently exploited by most learning systems. These concepts enable just-in-time learning. We propose to extend theses design criteria further to include the learnability concept that will help adapting content to the needs of learners. The learnability concept offers a better personalization leading to the creation and delivery of course content more appropriate to performance and interest of each learner. In this paper we present a new framework of learning environments containing knowledge discovery as a tool to automatically learn patterns of learning behavior from learners' profiles and history.

Keywords: Knowledge mining, Web-based learning, Learning environments.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1494
2060 A Web Designer Agent, Based On Usage Mining Online Behavior of Visitors

Authors: Babak Abedin, Babak Sohrabi

Abstract:

Website plays a significant role in success of an e-business. It is the main start point of any organization and corporation for its customers, so it's important to customize and design it according to the visitors' preferences. Also, websites are a place to introduce services of an organization and highlight new service to the visitors and audiences. In this paper, we will use web usage mining techniques, as a new field of research in data mining and knowledge discovery, in an Iranian government website. Using the results, a framework for web content layour is proposed. An agent is designed to dynamically update and improve web links locations and layout. Then, we will explain how it is used to directly enable top managers of the organization to influence on the arrangement of web contents and also to enhance customization of web site navigation due to online users' behaviors.

Keywords: Web usage mining, website design, agent, website customization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1673
2059 Concurrency in Web Access Patterns Mining

Authors: Jing Lu, Malcolm Keech, Weiru Chen

Abstract:

Web usage mining is an interesting application of data mining which provides insight into customer behaviour on the Internet. An important technique to discover user access and navigation trails is based on sequential patterns mining. One of the key challenges for web access patterns mining is tackling the problem of mining richly structured patterns. This paper proposes a novel model called Web Access Patterns Graph (WAP-Graph) to represent all of the access patterns from web mining graphically. WAP-Graph also motivates the search for new structural relation patterns, i.e. Concurrent Access Patterns (CAP), to identify and predict more complex web page requests. Corresponding CAP mining and modelling methods are proposed and shown to be effective in the search for and representation of concurrency between access patterns on the web. From experiments conducted on large-scale synthetic sequence data as well as real web access data, it is demonstrated that CAP mining provides a powerful method for structural knowledge discovery, which can be visualised through the CAP-Graph model.

Keywords: concurrent access patterns (CAP), CAP mining and modelling, CAP-Graph, web access patterns (WAP), WAP-Graph, Web usage mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1434
2058 A Distributed Approach to Extract High Utility Itemsets from XML Data

Authors: S. Kannimuthu, K. Premalatha

Abstract:

This paper investigates a new data mining capability that entails mining of High Utility Itemsets (HUI) in a distributed environment. Existing research in data mining deals with only presence or absence of an items and do not consider the semantic measures like weight or cost of the items. Thus, HUI mining algorithm has evolved. HUI mining is the one kind of utility mining concept, aims to identify itemsets whose utility satisfies a given threshold. Although, the approach of mining HUIs in a distributed environment and mining of the same from XML data have not explored yet. In this work, a novel approach is proposed to mine HUIs from the XML based data in a distributed environment. This work utilizes Service Oriented Computing (SOC) paradigm which provides Knowledge as a Service (KaaS). The interesting patterns are provided via the web services with the help of knowledge server to answer the queries of the consumers. The performance of the approach is evaluated on various databases using execution time and memory consumption.

Keywords: Data mining, Knowledge as a Service, service oriented computing, utility mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2166
2057 Semantically Enriched Web Usage Mining for Personalization

Authors: Suresh Shirgave, Prakash Kulkarni, José Borges

Abstract:

The continuous growth in the size of the World Wide Web has resulted in intricate Web sites, demanding enhanced user skills and more sophisticated tools to help the Web user to find the desired information. In order to make Web more user friendly, it is necessary to provide personalized services and recommendations to the Web user. For discovering interesting and frequent navigation patterns from Web server logs many Web usage mining techniques have been applied. The recommendation accuracy of usage based techniques can be improved by integrating Web site content and site structure in the personalization process.

Herein, we propose semantically enriched Web Usage Mining method for Personalization (SWUMP), an extension to solely usage based technique. This approach is a combination of the fields of Web Usage Mining and Semantic Web. In the proposed method, we envisage enriching the undirected graph derived from usage data with rich semantic information extracted from the Web pages and the Web site structure. The experimental results show that the SWUMP generates accurate recommendations and is able to achieve 10-20% better accuracy than the solely usage based model. The SWUMP addresses the new item problem inherent to solely usage based techniques.

Keywords: Prediction, Recommendation, Semantic Web Usage Mining, Web Usage Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2754
2056 A Review: Comparative Study of Diverse Collection of Data Mining Tools

Authors: S. Sarumathi, N. Shanthi, S. Vidhya, M. Sharmila

Abstract:

There have been a lot of efforts and researches undertaken in developing efficient tools for performing several tasks in data mining. Due to the massive amount of information embedded in huge data warehouses maintained in several domains, the extraction of meaningful pattern is no longer feasible. This issue turns to be more obligatory for developing several tools in data mining. Furthermore the major aspire of data mining software is to build a resourceful predictive or descriptive model for handling large amount of information more efficiently and user friendly. Data mining mainly contracts with excessive collection of data that inflicts huge rigorous computational constraints. These out coming challenges lead to the emergence of powerful data mining technologies. In this survey a diverse collection of data mining tools are exemplified and also contrasted with the salient features and performance behavior of each tool.

Keywords: Business Analytics, Data Mining, Data Analysis, Machine Learning, Text Mining, Predictive Analytics, Visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3082
2055 Evaluating 8D Reports Using Text-Mining

Authors: Benjamin Kuester, Bjoern Eilert, Malte Stonis, Ludger Overmeyer

Abstract:

Increasing quality requirements make reliable and effective quality management indispensable. This includes the complaint handling in which the 8D method is widely used. The 8D report as a written documentation of the 8D method is one of the key quality documents as it internally secures the quality standards and acts as a communication medium to the customer. In practice, however, the 8D report is mostly faulty and of poor quality. There is no quality control of 8D reports today. This paper describes the use of natural language processing for the automated evaluation of 8D reports. Based on semantic analysis and text-mining algorithms the presented system is able to uncover content and formal quality deficiencies and thus increases the quality of the complaint processing in the long term.

Keywords: 8D report, complaint management, evaluation system, text-mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 736
2054 Content Based Sampling over Transactional Data Streams

Authors: Mansour Tarafdar, Mohammad Saniee Abade

Abstract:

This paper investigates the problem of sampling from transactional data streams. We introduce CFISDS as a content based sampling algorithm that works on a landmark window model of data streams and preserve more informed sample in sample space. This algorithm that work based on closed frequent itemset mining tasks, first initiate a concept lattice using initial data, then update lattice structure using an incremental mechanism.Incremental mechanism insert, update and delete nodes in/from concept lattice in batch manner. Presented algorithm extracts the final samples on demand of user. Experimental results show the accuracy of CFISDS on synthetic and real datasets, despite on CFISDS algorithm is not faster than exist sampling algorithms such as Z and DSS.

Keywords: Sampling, data streams, closed frequent item set mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1457
2053 Powerful Tool to Expand Business Intelligence: Text Mining

Authors: Li Gao, Elizabeth Chang, Song Han

Abstract:

With the extensive inclusion of document, especially text, in the business systems, data mining does not cover the full scope of Business Intelligence. Data mining cannot deliver its impact on extracting useful details from the large collection of unstructured and semi-structured written materials based on natural languages. The most pressing issue is to draw the potential business intelligence from text. In order to gain competitive advantages for the business, it is necessary to develop the new powerful tool, text mining, to expand the scope of business intelligence. In this paper, we will work out the strong points of text mining in extracting business intelligence from huge amount of textual information sources within business systems. We will apply text mining to each stage of Business Intelligence systems to prove that text mining is the powerful tool to expand the scope of BI. After reviewing basic definitions and some related technologies, we will discuss the relationship and the benefits of these to text mining. Some examples and applications of text mining will also be given. The motivation behind is to develop new approach to effective and efficient textual information analysis. Thus we can expand the scope of Business Intelligence using the powerful tool, text mining.

Keywords: Business intelligence, document warehouse, text mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2373
2052 Application of Association Rule Mining in Supplier Selection Criteria

Authors: A. Haery, N. Salmasi, M. Modarres Yazdi, H. Iranmanesh

Abstract:

In this paper the application of rule mining in order to review the effective factors on supplier selection is reviewed in the following three sections 1) criteria selecting and information gathering 2) performing association rule mining 3) validation and constituting rule base. Afterwards a few of applications of rule base is explained. Then, a numerical example is presented and analyzed by Clementine software. Some of extracted rules as well as the results are presented at the end.

Keywords: Association rule mining, data mining, supplierselection criteria.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1646
2051 Preliminary Overview of Data Mining Technology for Knowledge Management System in Institutions of Higher Learning

Authors: Muslihah Wook, Zawiyah M. Yusof, Mohd Zakree Ahmad Nazri

Abstract:

Data mining has been integrated into application systems to enhance the quality of the decision-making process. This study aims to focus on the integration of data mining technology and Knowledge Management System (KMS), due to the ability of data mining technology to create useful knowledge from large volumes of data. Meanwhile, KMS vitally support the creation and use of knowledge. The integration of data mining technology and KMS are popularly used in business for enhancing and sustaining organizational performance. However, there is a lack of studies that applied data mining technology and KMS in the education sector; particularly students- academic performance since this could reflect the IHL performance. Realizing its importance, this study seeks to integrate data mining technology and KMS to promote an effective management of knowledge within IHLs. Several concepts from literature are adapted, for proposing the new integrative data mining technology and KMS framework to an IHL.

Keywords: Data mining, Institutions of Higher Learning, Knowledge Management System, Students' academic performance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1903
2050 A Hybrid Recommendation System Based On Association Rules

Authors: Ahmed Mohammed K. Alsalama

Abstract:

Recommendation systems are widely used in e-commerce applications. The engine of a current recommendation system recommends items to a particular user based on user preferences and previous high ratings. Various recommendation schemes such as collaborative filtering and content-based approaches are used to build a recommendation system. Most of current recommendation systems were developed to fit a certain domain such as books, articles, and movies. We propose1 a hybrid framework recommendation system to be applied on two dimensional spaces (User × Item) with a large number of Users and a small number of Items. Moreover, our proposed framework makes use of both favorite and non-favorite items of a particular user. The proposed framework is built upon the integration of association rules mining and the content-based approach. The results of experiments show that our proposed framework can provide accurate recommendations to users.

Keywords: Data Mining, Association Rules, Recommendation Systems, Hybrid Systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3659
2049 A Multi-Agent Framework for Data Mining

Authors: Kamal Ali Albashiri, Khaled Ahmed Kadouh

Abstract:

A generic and extendible Multi-Agent Data Mining (MADM) framework, MADMF (the Multi-Agent Data Mining Framework) is described. The central feature of the framework is that it avoids the use of agreed meta-language formats by supporting a framework of wrappers. The advantage offered is that the framework is easily extendible, so that further data agents and mining agents can simply be added to the framework. A demonstration MADMF framework is currently available. The paper includes details of the MADMF architecture and the wrapper principle incorporated into it. A full description and evaluation of the framework-s operation is provided by considering two MADM scenarios.

Keywords: Multi-Agent Data Mining (MADM), Frequent Itemsets, Meta ARM, Association Rule Mining, Classifier generator.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1759
2048 A Framework for Data Mining Based Multi-Agent: An Application to Spatial Data

Authors: H. Baazaoui Zghal, S. Faiz, H. Ben Ghezala

Abstract:

Data mining is an extraordinarily demanding field referring to extraction of implicit knowledge and relationships, which are not explicitly stored in databases. A wide variety of methods of data mining have been introduced (classification, characterization, generalization...). Each one of these methods includes more than algorithm. A system of data mining implies different user categories,, which mean that the user-s behavior must be a component of the system. The problem at this level is to know which algorithm of which method to employ for an exploratory end, which one for a decisional end, and how can they collaborate and communicate. Agent paradigm presents a new way of conception and realizing of data mining system. The purpose is to combine different algorithms of data mining to prepare elements for decision-makers, benefiting from the possibilities offered by the multi-agent systems. In this paper the agent framework for data mining is introduced, and its overall architecture and functionality are presented. The validation is made on spatial data. Principal results will be presented.

Keywords: Databases, data mining, multi-agent, spatial datamart.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1813
2047 About Methods of Additional Mining Pressure Figuring while Reconstruction of Tunnels

Authors: M. Moistsrapishvili, I. Ugrekhelidze, T. Baramashvili, D. Malaghuradze

Abstract:

At the end of the 20th century it was actual the development of transport corridors and the improvement of their technical parameters. With this purpose, many countries and Georgia among them manufacture to construct new highways, railways and also reconstruction-modernization of the existing transport infrastructure. It is necessary to explore the artificial structures (bridges and tunnels) on the existing tracks as they are very old. Conference report includes the peculiarities of reconstruction of tunnels, because we think that this theme is important for the modernization of the existing road infrastructure. We must remark that the methods of determining mining pressure of tunnel reconstructions are worked out according to the jobs of new tunnels but it is necessary to foresee additional mining pressure which will be formed during their reconstruction. In this report there are given the methods of figuring the additional mining pressure while reconstruction of tunnels, there was worked out the computer program, it is determined that during reconstruction of tunnels the additional mining pressure is 1/3rd of main mining pressure.

Keywords: Mining pressure, Reconstruction of tunnels.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1416
2046 Object-Centric Process Mining Using Process Cubes

Authors: Anahita Farhang Ghahfarokhi, Alessandro Berti, Wil M.P. van der Aalst

Abstract:

Process mining provides ways to analyze business processes. Common process mining techniques consider the process as a whole. However, in real-life business processes, different behaviors exist that make the overall process too complex to interpret. Process comparison is a branch of process mining that isolates different behaviors of the process from each other by using process cubes. Process cubes organize event data using different dimensions. Each cell contains a set of events that can be used as an input to apply process mining techniques. Existing work on process cubes assume single case notions. However, in real processes, several case notions (e.g., order, item, package, etc.) are intertwined. Object-centric process mining is a new branch of process mining addressing multiple case notions in a process. To make a bridge between object-centric process mining and process comparison, we propose a process cube framework, which supports process cube operations such as slice and dice on object-centric event logs. To facilitate the comparison, the framework is integrated with several object-centric process discovery approaches.

Keywords: Process mining, multidimensional process mining, multi-perspective business processes, OLAP, process cubes, process discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 481
2045 Revised PLWAP Tree with Non-frequent Items for Mining Sequential Pattern

Authors: R. Vishnu Priya, A. Vadivel

Abstract:

Sequential pattern mining is a challenging task in data mining area with large applications. One among those applications is mining patterns from weblog. Recent times, weblog is highly dynamic and some of them may become absolute over time. In addition, users may frequently change the threshold value during the data mining process until acquiring required output or mining interesting rules. Some of the recently proposed algorithms for mining weblog, build the tree with two scans and always consume large time and space. In this paper, we build Revised PLWAP with Non-frequent Items (RePLNI-tree) with single scan for all items. While mining sequential patterns, the links related to the nonfrequent items are not considered. Hence, it is not required to delete or maintain the information of nodes while revising the tree for mining updated transactions. The algorithm supports both incremental and interactive mining. It is not required to re-compute the patterns each time, while weblog is updated or minimum support changed. The performance of the proposed tree is better, even the size of incremental database is more than 50% of existing one. For evaluation purpose, we have used the benchmark weblog dataset and found that the performance of proposed tree is encouraging compared to some of the recently proposed approaches.

Keywords: Sequential pattern mining, weblog, frequent and non-frequent items, incremental and interactive mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1672
2044 Frequent Itemset Mining Using Rough-Sets

Authors: Usman Qamar, Younus Javed

Abstract:

Frequent pattern mining is the process of finding a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining. Frequent pattern mining is used to find inherent regularities in data. What products were often purchased together? Its applications include basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. However, one of the bottlenecks of frequent itemset mining is that as the data increase the amount of time and resources required to mining the data increases at an exponential rate. In this investigation a new algorithm is proposed which can be uses as a pre-processor for frequent itemset mining. FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processor algorithm which utilizes entropy and roughsets to carry out record reduction and feature (attribute) selection respectively. FASTER for frequent itemset mining can produce a speed up of 3.1 times when compared to original algorithm while maintaining an accuracy of 71%.

Keywords: Rough-sets, Classification, Feature Selection, Entropy, Outliers, Frequent itemset mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2174