Search results for: web structure mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3098

Search results for: web structure mining

3098 Bottom Up Text Mining through Hierarchical Document Representation

Authors: Y. Djouadi., F. Souam.

Abstract:

Most of the existing text mining approaches are proposed, keeping in mind, transaction databases model. Thus, the mined dataset is structured using just one concept: the “transaction", whereas the whole dataset is modeled using the “set" abstract type. In such cases, the structure of the whole dataset and the relationships among the transactions themselves are not modeled and consequently, not considered in the mining process. We believe that taking into account structure properties of hierarchically structured information (e.g. textual document, etc ...) in the mining process, can leads to best results. For this purpose, an hierarchical associations rule mining approach for textual documents is proposed in this paper and the classical set-oriented mining approach is reconsidered profits to a Direct Acyclic Graph (DAG) oriented approach. Natural languages processing techniques are used in order to obtain the DAG structure. Based on this graph model, an hierarchical bottom up algorithm is proposed. The main idea is that each node is mined with its parent node.

Keywords: Graph based association rules mining, Hierarchical document structure, Text mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2009
3097 A Comparative Study of Page Ranking Algorithms for Information Retrieval

Authors: Ashutosh Kumar Singh, Ravi Kumar P

Abstract:

This paper gives an introduction to Web mining, then describes Web Structure mining in detail, and explores the data structure used by the Web. This paper also explores different Page Rank algorithms and compare those algorithms used for Information Retrieval. In Web Mining, the basics of Web mining and the Web mining categories are explained. Different Page Rank based algorithms like PageRank (PR), WPR (Weighted PageRank), HITS (Hyperlink-Induced Topic Search), DistanceRank and DirichletRank algorithms are discussed and compared. PageRanks are calculated for PageRank and Weighted PageRank algorithms for a given hyperlink structure. Simulation Program is developed for PageRank algorithm because PageRank is the only ranking algorithm implemented in the search engine (Google). The outputs are shown in a table and chart format.

Keywords: Web Mining, Web Structure, Web Graph, LinkAnalysis, PageRank, Weighted PageRank, HITS, DistanceRank, DirichletRank,

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2770
3096 A New Model for Discovering XML Association Rules from XML Documents

Authors: R. AliMohammadzadeh, M. Rahgozar, A. Zarnani

Abstract:

The inherent flexibilities of XML in both structure and semantics makes mining from XML data a complex task with more challenges compared to traditional association rule mining in relational databases. In this paper, we propose a new model for the effective extraction of generalized association rules form a XML document collection. We directly use frequent subtree mining techniques in the discovery process and do not ignore the tree structure of data in the final rules. The frequent subtrees based on the user provided support are split to complement subtrees to form the rules. We explain our model within multi-steps from data preparation to rule generation.

Keywords: XML, Data Mining, Association Rule Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1586
3095 A Web Text Mining Flexible Architecture

Authors: M. Castellano, G. Mastronardi, A. Aprile, G. Tarricone

Abstract:

Text Mining is an important step of Knowledge Discovery process. It is used to extract hidden information from notstructured o semi-structured data. This aspect is fundamental because much of the Web information is semi-structured due to the nested structure of HTML code, much of the Web information is linked, much of the Web information is redundant. Web Text Mining helps whole knowledge mining process to mining, extraction and integration of useful data, information and knowledge from Web page contents. In this paper, we present a Web Text Mining process able to discover knowledge in a distributed and heterogeneous multiorganization environment. The Web Text Mining process is based on flexible architecture and is implemented by four steps able to examine web content and to extract useful hidden information through mining techniques. Our Web Text Mining prototype starts from the recovery of Web job offers in which, through a Text Mining process, useful information for fast classification of the same are drawn out, these information are, essentially, job offer place and skills.

Keywords: Web text mining, flexible architecture, knowledgediscovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2613
3094 A Comparative Analysis of Different Web Content Mining Tools

Authors: T. Suresh Kumar, M. Arthanari, N. Shanthi

Abstract:

Nowadays, the Web has become one of the most pervasive platforms for information change and retrieval. It collects the suitable and perfectly fitting information from websites that one requires. Data mining is the form of extracting data’s available in the internet. Web mining is one of the elements of data mining Technique, which relates to various research communities such as information recovery, folder managing system and simulated intellects. In this Paper we have discussed the concepts of Web mining. We contain generally focused on one of the categories of Web mining, specifically the Web Content Mining and its various farm duties. The mining tools are imperative to scanning the many images, text, and HTML documents and then, the result is used by the various search engines. We conclude by presenting a comparative table of these tools based on some pertinent criteria.

Keywords: Data Mining, Web Mining, Web Content Mining, Mining Tools, Information retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3505
3093 Large-Dimensional Shells under Mining Tremors from Various Mining Regions in Poland

Authors: Joanna M. Dulińska, Maria Fabijańska

Abstract:

In the paper a detailed analysis of the dynamic response of a cooling tower shell to mining tremors originated from two main regions of mining activity in Poland (Upper Silesian Coal Basin and Legnica-Glogow Copper District) was presented. The representative time histories registered in the both regions were used as ground motion data in calculations of the dynamic response of the structure. It was proved that the dynamic response of the shell is strongly dependent not only on the level of vibration amplitudes but on the dominant frequency range of the mining shock typical for the mining region as well. Also a vertical component of vibrations occurred to have considerable influence on the total dynamic response of the shell. Finally, it turned out that non-uniformity of kinematic excitation resulting from spatial variety of ground motion plays a significant role in dynamic analysis of large-dimensional shells under mining shocks.

Keywords: Cooling towers, dynamic response, mining tremors, non-uniform kinematic excitation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1373
3092 Comparative Study of Universities’ Web Structure Mining

Authors: Z. Abdullah, A. R. Hamdan

Abstract:

This paper is meant to analyze the ranking of University of Malaysia Terengganu, UMT’s website in the World Wide Web. There are only few researches have been done on comparing the ranking of universities’ websites so this research will be able to determine whether the existing UMT’s website is serving its purpose which is to introduce UMT to the world. The ranking is based on hub and authority values which are accordance to the structure of the website. These values are computed using two websearching algorithms, HITS and SALSA. Three other universities’ websites are used as the benchmarks which are UM, Harvard and Stanford. The result is clearly showing that more work has to be done on the existing UMT’s website where important pages according to the benchmarks, do not exist in UMT’s pages. The ranking of UMT’s website will act as a guideline for the web-developer to develop a more efficient website.

Keywords: Algorithm, ranking, website, web structure mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1629
3091 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.

Keywords: Mining Big Data, Big Data, Machine learning, Data Streams, Telecommunication.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2418
3090 Semantically Enriched Web Usage Mining for Personalization

Authors: Suresh Shirgave, Prakash Kulkarni, José Borges

Abstract:

The continuous growth in the size of the World Wide Web has resulted in intricate Web sites, demanding enhanced user skills and more sophisticated tools to help the Web user to find the desired information. In order to make Web more user friendly, it is necessary to provide personalized services and recommendations to the Web user. For discovering interesting and frequent navigation patterns from Web server logs many Web usage mining techniques have been applied. The recommendation accuracy of usage based techniques can be improved by integrating Web site content and site structure in the personalization process.

Herein, we propose semantically enriched Web Usage Mining method for Personalization (SWUMP), an extension to solely usage based technique. This approach is a combination of the fields of Web Usage Mining and Semantic Web. In the proposed method, we envisage enriching the undirected graph derived from usage data with rich semantic information extracted from the Web pages and the Web site structure. The experimental results show that the SWUMP generates accurate recommendations and is able to achieve 10-20% better accuracy than the solely usage based model. The SWUMP addresses the new item problem inherent to solely usage based techniques.

Keywords: Prediction, Recommendation, Semantic Web Usage Mining, Web Usage Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2977
3089 A Tree Based Association Rule Approach for XML Data with Semantic Integration

Authors: D. Sasikala, K. Premalatha

Abstract:

The use of eXtensible Markup Language (XML) in web, business and scientific databases lead to the development of methods, techniques and systems to manage and analyze XML data. Semi-structured documents suffer due to its heterogeneity and dimensionality. XML structure and content mining represent convergence for research in semi-structured data and text mining. As the information available on the internet grows drastically, extracting knowledge from XML documents becomes a harder task. Certainly, documents are often so large that the data set returned as answer to a query may also be very big to convey the required information. To improve the query answering, a Semantic Tree Based Association Rule (STAR) mining method is proposed. This method provides intentional information by considering the structure, content and the semantics of the content. The method is applied on Reuter’s dataset and the results show that the proposed method outperforms well.

Keywords: Semi--structured Document, Tree based Association Rule (TAR), Semantic Association Rule Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2292
3088 Web Application to Profiling Scientific Institutions through Citation Mining

Authors: Hector D. Cortes, Jesus A. del Rio, Esther O. Garcia, Miguel Robles

Abstract:

Recently the use of data mining to scientific bibliographic data bases has been implemented to analyze the pathways of the knowledge or the core scientific relevances of a laureated novel or a country. This specific case of data mining has been named citation mining, and it is the integration of citation bibliometrics and text mining. In this paper we present an improved WEB implementation of statistical physics algorithms to perform the text mining component of citation mining. In particular we use an entropic like distance between the compression of text as an indicator of the similarity between them. Finally, we have included the recently proposed index h to characterize the scientific production. We have used this web implementation to identify users, applications and impact of the Mexican scientific institutions located in the State of Morelos.

Keywords: Citation Mining, Text Mining, Science Impact

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1690
3087 Survey on Image Mining Using Genetic Algorithm

Authors: Jyoti Dua

Abstract:

One image is worth more than thousand words. Images if analyzed can reveal useful information. Low level image processing deals with the extraction of specific feature from a single image. Now the question arises: What technique should be used to extract patterns of very large and detailed image database? The answer of the question is: “Image Mining”. Image Mining deals with the extraction of image data relationship, implicit knowledge, and another pattern from the collection of images or image database. It is nothing but the extension of Data Mining. In the following paper, not only we are going to scrutinize the current techniques of image mining but also present a new technique for mining images using Genetic Algorithm.

Keywords: Image Mining, Data Mining, Genetic Algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2399
3086 Model Discovery and Validation for the Qsar Problem using Association Rule Mining

Authors: Luminita Dumitriu, Cristina Segal, Marian Craciun, Adina Cocu, Lucian P. Georgescu

Abstract:

There are several approaches in trying to solve the Quantitative 1Structure-Activity Relationship (QSAR) problem. These approaches are based either on statistical methods or on predictive data mining. Among the statistical methods, one should consider regression analysis, pattern recognition (such as cluster analysis, factor analysis and principal components analysis) or partial least squares. Predictive data mining techniques use either neural networks, or genetic programming, or neuro-fuzzy knowledge. These approaches have a low explanatory capability or non at all. This paper attempts to establish a new approach in solving QSAR problems using descriptive data mining. This way, the relationship between the chemical properties and the activity of a substance would be comprehensibly modeled.

Keywords: association rules, classification, data mining, Quantitative Structure - Activity Relationship.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1737
3085 Concurrency in Web Access Patterns Mining

Authors: Jing Lu, Malcolm Keech, Weiru Chen

Abstract:

Web usage mining is an interesting application of data mining which provides insight into customer behaviour on the Internet. An important technique to discover user access and navigation trails is based on sequential patterns mining. One of the key challenges for web access patterns mining is tackling the problem of mining richly structured patterns. This paper proposes a novel model called Web Access Patterns Graph (WAP-Graph) to represent all of the access patterns from web mining graphically. WAP-Graph also motivates the search for new structural relation patterns, i.e. Concurrent Access Patterns (CAP), to identify and predict more complex web page requests. Corresponding CAP mining and modelling methods are proposed and shown to be effective in the search for and representation of concurrency between access patterns on the web. From experiments conducted on large-scale synthetic sequence data as well as real web access data, it is demonstrated that CAP mining provides a powerful method for structural knowledge discovery, which can be visualised through the CAP-Graph model.

Keywords: concurrent access patterns (CAP), CAP mining and modelling, CAP-Graph, web access patterns (WAP), WAP-Graph, Web usage mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1675
3084 Discovering User Behaviour Patterns from Web Log Analysis to Enhance the Accessibility and Usability of Website

Authors: Harpreet Singh

Abstract:

Finding relevant information on the World Wide Web is becoming highly challenging day by day. Web usage mining is used for the extraction of relevant and useful knowledge, such as user behaviour patterns, from web access log records. Web access log records all the requests for individual files that the users have requested from the website. Web usage mining is important for Customer Relationship Management (CRM), as it can ensure customer satisfaction as far as the interaction between the customer and the organization is concerned. Web usage mining is helpful in improving website structure or design as per the user’s requirement by analyzing the access log file of a website through a log analyzer tool. The focus of this paper is to enhance the accessibility and usability of a guitar selling web site by analyzing their access log through Deep Log Analyzer tool. The results show that the maximum number of users is from the United States and that they use Opera 9.8 web browser and the Windows XP operating system.

Keywords: Web usage mining, log file, web mining, data mining, deep log analyser.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1013
3083 The Application of Data Mining Technology in Building Energy Consumption Data Analysis

Authors: Liang Zhao, Jili Zhang, Chongquan Zhong

Abstract:

Energy consumption data, in particular those involving public buildings, are impacted by many factors: the building structure, climate/environmental parameters, construction, system operating condition, and user behavior patterns. Traditional methods for data analysis are insufficient. This paper delves into the data mining technology to determine its application in the analysis of building energy consumption data including energy consumption prediction, fault diagnosis, and optimal operation. Recent literature are reviewed and summarized, the problems faced by data mining technology in the area of energy consumption data analysis are enumerated, and research points for future studies are given.

Keywords: Data mining, data analysis, prediction, optimization, building operational performance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3645
3082 A Distributed Approach to Extract High Utility Itemsets from XML Data

Authors: S. Kannimuthu, K. Premalatha

Abstract:

This paper investigates a new data mining capability that entails mining of High Utility Itemsets (HUI) in a distributed environment. Existing research in data mining deals with only presence or absence of an items and do not consider the semantic measures like weight or cost of the items. Thus, HUI mining algorithm has evolved. HUI mining is the one kind of utility mining concept, aims to identify itemsets whose utility satisfies a given threshold. Although, the approach of mining HUIs in a distributed environment and mining of the same from XML data have not explored yet. In this work, a novel approach is proposed to mine HUIs from the XML based data in a distributed environment. This work utilizes Service Oriented Computing (SOC) paradigm which provides Knowledge as a Service (KaaS). The interesting patterns are provided via the web services with the help of knowledge server to answer the queries of the consumers. The performance of the approach is evaluated on various databases using execution time and memory consumption.

Keywords: Data mining, Knowledge as a Service, service oriented computing, utility mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2405
3081 A Review: Comparative Study of Diverse Collection of Data Mining Tools

Authors: S. Sarumathi, N. Shanthi, S. Vidhya, M. Sharmila

Abstract:

There have been a lot of efforts and researches undertaken in developing efficient tools for performing several tasks in data mining. Due to the massive amount of information embedded in huge data warehouses maintained in several domains, the extraction of meaningful pattern is no longer feasible. This issue turns to be more obligatory for developing several tools in data mining. Furthermore the major aspire of data mining software is to build a resourceful predictive or descriptive model for handling large amount of information more efficiently and user friendly. Data mining mainly contracts with excessive collection of data that inflicts huge rigorous computational constraints. These out coming challenges lead to the emergence of powerful data mining technologies. In this survey a diverse collection of data mining tools are exemplified and also contrasted with the salient features and performance behavior of each tool.

Keywords: Business Analytics, Data Mining, Data Analysis, Machine Learning, Text Mining, Predictive Analytics, Visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3324
3080 Response of the Residential Building Structureon Load Technical Seismicity due to Mining Activities

Authors: V. Salajka, Z. Kaláb, J. Kala, P. Hradil

Abstract:

In the territories where high-intensity earthquakes are frequent is paid attention to the solving of the seismic problems. In the paper are described two computational model variants based on finite element method of the construction with different subsoil simulation (rigid or elastic subsoil) is used. For simulation and calculations program system based on method final elements ANSYS was used. Seismic responses calculations of residential building structure were effected on loading characterized by accelerogram for comparing with the responses spectra method.

Keywords: Accelerogram, ANSYS, mining induced seismic, residential building structure, spectra, subsoil.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1480
3079 Modelling of Powered Roof Supports Work

Authors: Marcin Michalak

Abstract:

Due to the increasing efforts on saving our natural environment a change in the structure of energy resources can be observed - an increasing fraction of a renewable energy sources. In many countries traditional underground coal mining loses its significance but there are still countries, like Poland or Germany, in which the coal based technologies have the greatest fraction in a total energy production. This necessitates to make an effort to limit the costs and negative effects of underground coal mining. The longwall complex is as essential part of the underground coal mining. The safety and the effectiveness of the work is strongly dependent of the diagnostic state of powered roof supports. The building of a useful and reliable diagnostic system requires a lot of data. As the acquisition of a data of any possible operating conditions it is important to have a possibility to generate a demanded artificial working characteristics. In this paper a new approach of modelling a leg pressure in the single unit of powered roof support. The model is a result of the analysis of a typical working cycles.

Keywords: Machine modelling, underground mining, coal mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1872
3078 Powerful Tool to Expand Business Intelligence: Text Mining

Authors: Li Gao, Elizabeth Chang, Song Han

Abstract:

With the extensive inclusion of document, especially text, in the business systems, data mining does not cover the full scope of Business Intelligence. Data mining cannot deliver its impact on extracting useful details from the large collection of unstructured and semi-structured written materials based on natural languages. The most pressing issue is to draw the potential business intelligence from text. In order to gain competitive advantages for the business, it is necessary to develop the new powerful tool, text mining, to expand the scope of business intelligence. In this paper, we will work out the strong points of text mining in extracting business intelligence from huge amount of textual information sources within business systems. We will apply text mining to each stage of Business Intelligence systems to prove that text mining is the powerful tool to expand the scope of BI. After reviewing basic definitions and some related technologies, we will discuss the relationship and the benefits of these to text mining. Some examples and applications of text mining will also be given. The motivation behind is to develop new approach to effective and efficient textual information analysis. Thus we can expand the scope of Business Intelligence using the powerful tool, text mining.

Keywords: Business intelligence, document warehouse, text mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2608
3077 Application of Association Rule Mining in Supplier Selection Criteria

Authors: A. Haery, N. Salmasi, M. Modarres Yazdi, H. Iranmanesh

Abstract:

In this paper the application of rule mining in order to review the effective factors on supplier selection is reviewed in the following three sections 1) criteria selecting and information gathering 2) performing association rule mining 3) validation and constituting rule base. Afterwards a few of applications of rule base is explained. Then, a numerical example is presented and analyzed by Clementine software. Some of extracted rules as well as the results are presented at the end.

Keywords: Association rule mining, data mining, supplierselection criteria.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1875
3076 Preliminary Overview of Data Mining Technology for Knowledge Management System in Institutions of Higher Learning

Authors: Muslihah Wook, Zawiyah M. Yusof, Mohd Zakree Ahmad Nazri

Abstract:

Data mining has been integrated into application systems to enhance the quality of the decision-making process. This study aims to focus on the integration of data mining technology and Knowledge Management System (KMS), due to the ability of data mining technology to create useful knowledge from large volumes of data. Meanwhile, KMS vitally support the creation and use of knowledge. The integration of data mining technology and KMS are popularly used in business for enhancing and sustaining organizational performance. However, there is a lack of studies that applied data mining technology and KMS in the education sector; particularly students- academic performance since this could reflect the IHL performance. Realizing its importance, this study seeks to integrate data mining technology and KMS to promote an effective management of knowledge within IHLs. Several concepts from literature are adapted, for proposing the new integrative data mining technology and KMS framework to an IHL.

Keywords: Data mining, Institutions of Higher Learning, Knowledge Management System, Students' academic performance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2095
3075 Influence of Non-Structural Elements on Dynamic Response of Multi-Storey Rc Building to Mining Shock

Authors: Joanna M. Dulińska, Maria Fabijańska

Abstract:

In the paper the results of calculations of the dynamic response of a multi-storey reinforced concrete building to a strong mining shock originated from the main region of mining activity in Poland (i.e. the Legnica-Glogow Copper District) are presented. The representative time histories of accelerations registered in three directions were used as ground motion data in calculations of the dynamic response of the structure. Two variants of a numerical model were applied: the model including only structural elements of the building and the model including both structural and non-structural elements (i.e. partition walls and ventilation ducts made of brick). It turned out that non-structural elements of multi-storey RC buildings have a small impact of about 10 % on natural frequencies of these structures. It was also proved that the dynamic response of building to mining shock obtained in case of inclusion of all non-structural elements in the numerical model is about 20 % smaller than in case of consideration of structural elements only. The principal stresses obtained in calculations of dynamic response of multi-storey building to strong mining shock are situated on the level of about 30% of values obtained from static analysis (dead load).

Keywords: Dynamic characteristics of buildings, mining shocks, dynamic response of buildings, non-structural elements

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1828
3074 Elimination of Redundant Links in Web Pages– Mathematical Approach

Authors: G. Poonkuzhali, K.Thiagarajan, K.Sarukesi

Abstract:

With the enormous growth on the web, users get easily lost in the rich hyper structure. Thus developing user friendly and automated tools for providing relevant information without any redundant links to the users to cater to their needs is the primary task for the website owners. Most of the existing web mining algorithms have concentrated on finding frequent patterns while neglecting the less frequent one that are likely to contain the outlying data such as noise, irrelevant and redundant data. This paper proposes new algorithm for mining the web content by detecting the redundant links from the web documents using set theoretical(classical mathematics) such as subset, union, intersection etc,. Then the redundant links is removed from the original web content to get the required information by the user..

Keywords: Web documents, Web content mining, redundantlink, outliers, set theory.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1961
3073 Generating Concept Trees from Dynamic Self-organizing Map

Authors: Norashikin Ahmad, Damminda Alahakoon

Abstract:

Self-organizing map (SOM) provides both clustering and visualization capabilities in mining data. Dynamic self-organizing maps such as Growing Self-organizing Map (GSOM) has been developed to overcome the problem of fixed structure in SOM to enable better representation of the discovered patterns. However, in mining large datasets or historical data the hierarchical structure of the data is also useful to view the cluster formation at different levels of abstraction. In this paper, we present a technique to generate concept trees from the GSOM. The formation of tree from different spread factor values of GSOM is also investigated and the quality of the trees analyzed. The results show that concept trees can be generated from GSOM, thus, eliminating the need for re-clustering of the data from scratch to obtain a hierarchical view of the data under study.

Keywords: dynamic self-organizing map, concept formation, clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1411
3072 CompPSA: A Component-Based Pairwise RNA Secondary Structure Alignment Algorithm

Authors: Ghada Badr, Arwa Alturki

Abstract:

The biological function of an RNA molecule depends on its structure. The objective of the alignment is finding the homology between two or more RNA secondary structures. Knowing the common functionalities between two RNA structures allows a better understanding and a discovery of other relationships between them. Besides, identifying non-coding RNAs -that is not translated into a protein- is a popular application in which RNA structural alignment is the first step A few methods for RNA structure-to-structure alignment have been developed. Most of these methods are partial structure-to-structure, sequence-to-structure, or structure-to-sequence alignment. Less attention is given in the literature to the use of efficient RNA structure representation and the structure-to-structure alignment methods are lacking. In this paper, we introduce an O(N2) Component-based Pairwise RNA Structure Alignment (CompPSA) algorithm, where structures are given as a component-based representation and where N is the maximum number of components in the two structures. The proposed algorithm compares the two RNA secondary structures based on their weighted component features rather than on their base-pair details. Extensive experiments are conducted illustrating the efficiency of the CompPSA algorithm when compared to other approaches and on different real and simulated datasets. The CompPSA algorithm shows an accurate similarity measure between components. The algorithm gives the flexibility for the user to align the two RNA structures based on their weighted features (position, full length, and/or stem length). Moreover, the algorithm proves scalability and efficiency in time and memory performance.

Keywords: Alignment, RNA secondary structure, pairwise, component-based, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 924
3071 A Multi-Agent Framework for Data Mining

Authors: Kamal Ali Albashiri, Khaled Ahmed Kadouh

Abstract:

A generic and extendible Multi-Agent Data Mining (MADM) framework, MADMF (the Multi-Agent Data Mining Framework) is described. The central feature of the framework is that it avoids the use of agreed meta-language formats by supporting a framework of wrappers. The advantage offered is that the framework is easily extendible, so that further data agents and mining agents can simply be added to the framework. A demonstration MADMF framework is currently available. The paper includes details of the MADMF architecture and the wrapper principle incorporated into it. A full description and evaluation of the framework-s operation is provided by considering two MADM scenarios.

Keywords: Multi-Agent Data Mining (MADM), Frequent Itemsets, Meta ARM, Association Rule Mining, Classifier generator.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2016
3070 A Framework for Data Mining Based Multi-Agent: An Application to Spatial Data

Authors: H. Baazaoui Zghal, S. Faiz, H. Ben Ghezala

Abstract:

Data mining is an extraordinarily demanding field referring to extraction of implicit knowledge and relationships, which are not explicitly stored in databases. A wide variety of methods of data mining have been introduced (classification, characterization, generalization...). Each one of these methods includes more than algorithm. A system of data mining implies different user categories,, which mean that the user-s behavior must be a component of the system. The problem at this level is to know which algorithm of which method to employ for an exploratory end, which one for a decisional end, and how can they collaborate and communicate. Agent paradigm presents a new way of conception and realizing of data mining system. The purpose is to combine different algorithms of data mining to prepare elements for decision-makers, benefiting from the possibilities offered by the multi-agent systems. In this paper the agent framework for data mining is introduced, and its overall architecture and functionality are presented. The validation is made on spatial data. Principal results will be presented.

Keywords: Databases, data mining, multi-agent, spatial datamart.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2000
3069 About Methods of Additional Mining Pressure Figuring while Reconstruction of Tunnels

Authors: M. Moistsrapishvili, I. Ugrekhelidze, T. Baramashvili, D. Malaghuradze

Abstract:

At the end of the 20th century it was actual the development of transport corridors and the improvement of their technical parameters. With this purpose, many countries and Georgia among them manufacture to construct new highways, railways and also reconstruction-modernization of the existing transport infrastructure. It is necessary to explore the artificial structures (bridges and tunnels) on the existing tracks as they are very old. Conference report includes the peculiarities of reconstruction of tunnels, because we think that this theme is important for the modernization of the existing road infrastructure. We must remark that the methods of determining mining pressure of tunnel reconstructions are worked out according to the jobs of new tunnels but it is necessary to foresee additional mining pressure which will be formed during their reconstruction. In this report there are given the methods of figuring the additional mining pressure while reconstruction of tunnels, there was worked out the computer program, it is determined that during reconstruction of tunnels the additional mining pressure is 1/3rd of main mining pressure.

Keywords: Mining pressure, Reconstruction of tunnels.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1619