Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 53

Search results for: Knowledge discovery

53 From Electroencephalogram to Epileptic Seizures Detection by Using Artificial Neural Networks

Authors: Gaetano Zazzaro, Angelo Martone, Roberto V. Montaquila, Luigi Pavone

Abstract:

Seizure is the main factor that affects the quality of life of epileptic patients. The diagnosis of epilepsy, and hence the identification of epileptogenic zone, is commonly made by using continuous Electroencephalogram (EEG) signal monitoring. Seizure identification on EEG signals is made manually by epileptologists and this process is usually very long and error prone. The aim of this paper is to describe an automated method able to detect seizures in EEG signals, using knowledge discovery in database process and data mining methods and algorithms, which can support physicians during the seizure detection process. Our detection method is based on Artificial Neural Network classifier, trained by applying the multilayer perceptron algorithm, and by using a software application, called Training Builder that has been developed for the massive extraction of features from EEG signals. This tool is able to cover all the data preparation steps ranging from signal processing to data analysis techniques, including the sliding window paradigm, the dimensionality reduction algorithms, information theory, and feature selection measures. The final model shows excellent performances, reaching an accuracy of over 99% during tests on data of a single patient retrieved from a publicly available EEG dataset.

Keywords: Signal Processing, Data Mining, Epilepsy, Feature Extraction, Artificial Neural Network, electroencephalogram, seizure detection

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 258
52 Machine Learning Methods for Network Intrusion Detection

Authors: Mouhammad Alkasassbeh, Mohammad Almseidin

Abstract:

Network security engineers work to keep services available all the time by handling intruder attacks. Intrusion Detection System (IDS) is one of the obtainable mechanisms that is used to sense and classify any abnormal actions. Therefore, the IDS must be always up to date with the latest intruder attacks signatures to preserve confidentiality, integrity, and availability of the services. The speed of the IDS is a very important issue as well learning the new attacks. This research work illustrates how the Knowledge Discovery and Data Mining (or Knowledge Discovery in Databases) KDD dataset is very handy for testing and evaluating different Machine Learning Techniques. It mainly focuses on the KDD preprocess part in order to prepare a decent and fair experimental data set. The J48, MLP, and Bayes Network classifiers have been chosen for this study. It has been proven that the J48 classifier has achieved the highest accuracy rate for detecting and classifying all KDD dataset attacks, which are of type DOS, R2L, U2R, and PROBE.

Keywords: IDS, KDD, DDoS, MLP

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 265
51 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning

Authors: Walid Cherif

Abstract:

Data mining has, over recent years, seen big advances because of the spread of internet, which generates everyday a tremendous volume of data, and also the immense advances in technologies which facilitate the analysis of these data. In particular, classification techniques are a subdomain of Data Mining which determines in which group each data instance is related within a given dataset. It is used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or machine learning. Each type of these techniques has its own limits. Nowadays, current data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach which is different from actual algorithms by its reliability computations. Results of the proposed approach exceeded most common classification techniques with an f-measure exceeding 97% on the IRIS Dataset.

Keywords: Data Mining, Knowledge Discovery, Machine Learning, supervised classification, Similarity measurement

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 715
50 Knowledge Discovery and Data Mining Techniques in Textile Industry

Authors: Filiz Ersoz, Taner Ersoz, Erkin Guler

Abstract:

This paper addresses the issues and technique for textile industry using data mining techniques. Data mining has been applied to the stitching of garments products that were obtained from a textile company. Data mining techniques were applied to the data obtained from the CHAID algorithm, CART algorithm, Regression Analysis and, Artificial Neural Networks. Classification technique based analyses were used while data mining and decision model about the production per person and variables affecting about production were found by this method. In the study, the results show that as the daily working time increases, the production per person also decreases. In addition, the relationship between total daily working and production per person shows a negative result and the production per person show the highest and negative relationship.

Keywords: Data Mining, classification, Decision trees, Textile Production

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 769
49 Causal Relation Identification Using Convolutional Neural Networks and Knowledge Based Features

Authors: Tharini N. de Silva, Xiao Zhibo, Zhao Rui, Mao Kezhi

Abstract:

Causal relation identification is a crucial task in information extraction and knowledge discovery. In this work, we present two approaches to causal relation identification. The first is a classification model trained on a set of knowledge-based features. The second is a deep learning based approach training a model using convolutional neural networks to classify causal relations. We experiment with several different convolutional neural networks (CNN) models based on previous work on relation extraction as well as our own research. Our models are able to identify both explicit and implicit causal relations as well as the direction of the causal relation. The results of our experiments show a higher accuracy than previously achieved for causal relation identification tasks.

Keywords: Machine Learning, natural language processing, Convolutional Neural Networks, Causal relation identification

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1472
48 Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

Clustering is a well known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can either contain categorical or numeric data. Each type of data has its own specific clustering algorithm. In this context, two algorithms are proposed: the k-means for clustering numeric datasets and the k-modes for categorical datasets. The main encountered problem in data mining applications is clustering categorical dataset so relevant in the datasets. One main issue to achieve the clustering process on categorical values is to transform the categorical attributes into numeric measures and directly apply the k-means algorithm instead the k-modes. In this paper, it is proposed to experiment an approach based on the previous issue by transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previously method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are experimented. The obtained results show that our proposed method outperforms the binary method in all cases.

Keywords: Pattern Recognition, Knowledge Discovery, Clustering, unsupervised learning, k-means, categorical datasets

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2834
47 Application of Data Mining Techniques for Tourism Knowledge Discovery

Authors: Teklu Urgessa, Wookjae Maeng, Joong Seek Lee

Abstract:

Application of five implementations of three data mining classification techniques was experimented for extracting important insights from tourism data. The aim was to find out the best performing algorithm among the compared ones for tourism knowledge discovery. Knowledge discovery process from data was used as a process model. 10-fold cross validation method is used for testing purpose. Various data preprocessing activities were performed to get the final dataset for model building. Classification models of the selected algorithms were built with different scenarios on the preprocessed dataset. The outperformed algorithm tourism dataset was Random Forest (76%) before applying information gain based attribute selection and J48 (C4.5) (75%) after selection of top relevant attributes to the class (target) attribute. In terms of time for model building, attribute selection improves the efficiency of all algorithms. Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented, which showed intricate, non-trivial knowledge/insight that would otherwise not be discovered by simple statistical analysis with mediocre accuracy of the machine using classification algorithms.

Keywords: Data Mining, Knowledge Discovery, Tourism, Classification Algorithms

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1750
46 Knowledge-Driven Decision Support System Based on Knowledge Warehouse and Data Mining by Improving Apriori Algorithm with Fuzzy Logic

Authors: Pejman Hosseinioun, Hasan Shakeri, Ghasem Ghorbanirostam

Abstract:

In recent years, we have seen an increasing importance of research and study on knowledge source, decision support systems, data mining and procedure of knowledge discovery in data bases and it is considered that each of these aspects affects the others. In this article, we have merged information source and knowledge source to suggest a knowledge based system within limits of management based on storing and restoring of knowledge to manage information and improve decision making and resources. In this article, we have used method of data mining and Apriori algorithm in procedure of knowledge discovery one of the problems of Apriori algorithm is that, a user should specify the minimum threshold for supporting the regularity. Imagine that a user wants to apply Apriori algorithm for a database with millions of transactions. Definitely, the user does not have necessary knowledge of all existing transactions in that database, and therefore cannot specify a suitable threshold. Our purpose in this article is to improve Apriori algorithm. To achieve our goal, we tried using fuzzy logic to put data in different clusters before applying the Apriori algorithm for existing data in the database and we also try to suggest the most suitable threshold to the user automatically.

Keywords: Data Mining, Knowledge Discovery, Fuzzy Logic, Decision Support System, data discovery

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1650
45 A General Framework for Knowledge Discovery Using High Performance Machine Learning Algorithms

Authors: S. Nandagopalan, N. Pradeep

Abstract:

The aim of this paper is to propose a general framework for storing, analyzing, and extracting knowledge from two-dimensional echocardiographic images, color Doppler images, non-medical images, and general data sets. A number of high performance data mining algorithms have been used to carry out this task. Our framework encompasses four layers namely physical storage, object identification, knowledge discovery, user level. Techniques such as active contour model to identify the cardiac chambers, pixel classification to segment the color Doppler echo image, universal model for image retrieval, Bayesian method for classification, parallel algorithms for image segmentation, etc., were employed. Using the feature vector database that have been efficiently constructed, one can perform various data mining tasks like clustering, classification, etc. with efficient algorithms along with image mining given a query image. All these facilities are included in the framework that is supported by state-of-the-art user interface (UI). The algorithms were tested with actual patient data and Coral image database and the results show that their performance is better than the results reported already.

Keywords: Bayesian, active contour, feature vector, Echocardiographic image

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1371
44 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data

Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad

Abstract:

Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars, and sensors consists of important geographical information that can be used for remote sensing applications such as region planning, disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered as a classification problem, this task can be performed using machine-learning techniques. Despite of many machine-learning algorithms, the classification is done using supervised classifiers such as Support Vector Machines (SVM) as the area of interest is known. We proposed a classification method, which considers neighboring pixels in a region for feature extraction and it evaluates classifications precisely according to neighboring classes for semantic interpretation of region of interest (ROI). A dataset has been created for training and testing purpose; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be on image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.

Keywords: Data Mining, Remote Sensing, Object recognition, classification, Feature Extraction, waterbody identification

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1615
43 Improved FP-growth Algorithm with Multiple Minimum Supports Using Maximum Constraints

Authors: Elsayeda M. Elgaml, Dina M. Ibrahim, Elsayed A. Sallam

Abstract:

Association rule mining is one of the most important fields of data mining and knowledge discovery. In this paper, we propose an efficient multiple support frequent pattern growth algorithm which we called “MSFP-growth” that enhancing the FPgrowth algorithm by making infrequent child node pruning step with multiple minimum support using maximum constrains. The algorithm is implemented, and it is compared with other common algorithms: Apriori-multiple minimum supports using maximum constraints and FP-growth. The experimental results show that the rule mining from the proposed algorithm are interesting and our algorithm achieved better performance than other algorithms without scarifying the accuracy. 

Keywords: Association Rules, FP-growth, Weka tool, Multiple minimum supports

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2836
42 What the Future Holds for Social Media Data Analysis

Authors: P. Wlodarczak, J. Soar, M. Ally

Abstract:

The dramatic rise in the use of Social Media (SM) platforms such as Facebook and Twitter provide access to an unprecedented amount of user data. Users may post reviews on products and services they bought, write about their interests, share ideas or give their opinions and views on political issues. There is a growing interest in the analysis of SM data from organisations for detecting new trends, obtaining user opinions on their products and services or finding out about their online reputations. A recent research trend in SM analysis is making predictions based on sentiment analysis of SM. Often indicators of historic SM data are represented as time series and correlated with a variety of real world phenomena like the outcome of elections, the development of financial indicators, box office revenue and disease outbreaks. This paper examines the current state of research in the area of SM mining and predictive analysis and gives an overview of the analysis methods using opinion mining and machine learning techniques.

Keywords: Social Media, Knowledge Discovery, Machine Learning, Text Mining, predictive analysis

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3405
41 Improving University Operations with Data Mining: Predicting Student Performance

Authors: Mladen Dragičević, Mirjana Pejić Bach, Vanja Šimičević

Abstract:

The purpose of this paper is to develop models that would enable predicting student success. These models could improve allocation of students among colleges and optimize the newly introduced model of government subsidies for higher education. For the purpose of collecting data, an anonymous survey was carried out in the last year of undergraduate degree student population using random sampling method. Decision trees were created of which two have been chosen that were most successful in predicting student success based on two criteria: Grade Point Average (GPA) and time that a student needs to finish the undergraduate program (time-to-degree). Decision trees have been shown as a good method of classification student success and they could be even more improved by increasing survey sample and developing specialized decision trees for each type of college. These types of methods have a big potential for use in decision support systems.

Keywords: Data Mining, Student success, knowledge discovery in databases, prediction models

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2059
40 A Proposed Optimized and Efficient Intrusion Detection System for Wireless Sensor Network

Authors: Abdulaziz Alsadhan, Naveed Khan

Abstract:

In recent years intrusions on computer network are the major security threat. Hence, it is important to impede such intrusions. The hindrance of such intrusions entirely relies on its detection, which is primary concern of any security tool like Intrusion detection system (IDS). Therefore, it is imperative to accurately detect network attack. Numerous intrusion detection techniques are available but the main issue is their performance. The performance of IDS can be improved by increasing the accurate detection rate and reducing false positive. The existing intrusion detection techniques have the limitation of usage of raw dataset for classification. The classifier may get jumble due to redundancy, which results incorrect classification. To minimize this problem, Principle component analysis (PCA), Linear Discriminant Analysis (LDA) and Local Binary Pattern (LBP) can be applied to transform raw features into principle features space and select the features based on their sensitivity. Eigen values can be used to determine the sensitivity. To further classify, the selected features greedy search, back elimination, and Particle Swarm Optimization (PSO) can be used to obtain a subset of features with optimal sensitivity and highest discriminatory power. This optimal feature subset is used to perform classification. For classification purpose, Support Vector Machine (SVM) and Multilayer Perceptron (MLP) are used due to its proven ability in classification. The Knowledge Discovery and Data mining (KDD’99) cup dataset was considered as a benchmark for evaluating security detection mechanisms. The proposed approach can provide an optimal intrusion detection mechanism that outperforms the existing approaches and has the capability to minimize the number of features and maximize the detection rates.

Keywords: Particle Swarm Optimization (PSO), Principle Component Analysis (PCA), Linear Discriminant Analysis (LDA), Local Binary Pattern (LBP), Support Vector Machine (SVM), Multilayer Perceptron (MLP)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2397
39 Knowledge Discovery from Production Databases for Hierarchical Process Control

Authors: Pavol Tanuska, Pavel Vazan, Michal Kebisek, Dominika Jurovata

Abstract:

The paper gives the results of the project that was oriented on the usage of knowledge discoveries from production systems for needs of the hierarchical process control. One of the main project goals was the proposal of knowledge discovery model for process control. Specifics data mining methods and techniques was used for defined problems of the process control. The gained knowledge was used on the real production system thus the proposed solution has been verified. The paper documents how is possible to apply the new discovery knowledge to use in the real hierarchical process control. There are specified the opportunities for application of the proposed knowledge discovery model for hierarchical process control.

Keywords: Neural Network, hierarchical process control, knowledge discovery from databases

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1305
38 A New Hybrid K-Mean-Quick Reduct Algorithm for Gene Selection

Authors: E. N. Sathishkumar, K. Thangavel, T. Chandrasekhar

Abstract:

Feature selection is a process to select features which are more informative. It is one of the important steps in knowledge discovery. The problem is that all genes are not important in gene expression data. Some of the genes may be redundant, and others may be irrelevant and noisy. Here a novel approach is proposed Hybrid K-Mean-Quick Reduct (KMQR) algorithm for gene selection from gene expression data. In this study, the entire dataset is divided into clusters by applying K-Means algorithm. Each cluster contains similar genes. The high class discriminated genes has been selected based on their degree of dependence by applying Quick Reduct algorithm to all the clusters. Average Correlation Value (ACV) is calculated for the high class discriminated genes. The clusters which have the ACV value as 1 is determined as significant clusters, whose classification accuracy will be equal or high when comparing to the accuracy of the entire dataset. The proposed algorithm is evaluated using WEKA classifiers and compared. The proposed work shows that the high classification accuracy.

Keywords: Clustering, Rough Sets, Gene Selection, K-Mean-Quick Reduct

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1916
37 A Decision Matrix for the Evaluation of Triplestores for Use in a Virtual Research Environment

Authors: Tristan O’Neill, Trina Myers, Jarrod Trevathan

Abstract:

The Tropical Data Hub (TDH) is a virtual research environment that provides researchers with an e-research infrastructure to congregate significant tropical data sets for data reuse, integration, searching, and correlation. However, researchers often require data and metadata synthesis across disciplines for cross-domain analyses and knowledge discovery. A triplestore offers a semantic layer to achieve a more intelligent method of search to support the synthesis requirements by automating latent linkages in the data and metadata. Presently, the benchmarks to aid the decision of which triplestore is best suited for use in an application environment like the TDH are limited to performance. This paper describes a new evaluation tool developed to analyze both features and performance. The tool comprises a weighted decision matrix to evaluate the interoperability, functionality, performance, and support availability of a range of integrated and native triplestores to rank them according to requirements of the TDH.

Keywords: Semantic Web, Performance Analysis, Virtual research environment, tropical data hub

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1380
36 An Evaluation Model for Semantic Enablement of Virtual Research Environments

Authors: Tristan O'Neill, Trina Myers, Jarrod Trevathan

Abstract:

The Tropical Data Hub (TDH) is a virtual research environment that provides researchers with an e-research infrastructure to congregate significant tropical data sets for data reuse, integration, searching, and correlation. However, researchers often require data and metadata synthesis across disciplines for crossdomain analyses and knowledge discovery. A triplestore offers a semantic layer to achieve a more intelligent method of search to support the synthesis requirements by automating latent linkages in the data and metadata. Presently, the benchmarks to aid the decision of which triplestore is best suited for use in an application environment like the TDH are limited to performance. This paper describes a new evaluation tool developed to analyze both features and performance. The tool comprises a weighted decision matrix to evaluate the interoperability, functionality, performance, and support availability of a range of integrated and native triplestores to rank them according to requirements of the TDH.

Keywords: Semantic Web, Performance Analysis, Virtual research environment, tropical data hub

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1439
35 Organizational Decision Based on Business Intelligence

Authors: Pejman Hosseinioun, Rose Shayeghi, Ghasem Ghorbani Rostam

Abstract:

Nowadays, obtaining traditional statistics and reports is not adequate for the needs of organizational managers. The managers need to analyze and to transform the raw data into knowledge in the world filled with information. Therefore in this regard various processes have been developed. In the meantime the artificial intelligence-based processes are used and the new topics such as business intelligence and knowledge discovery have emerged. In the current paper it is sought to study the business intelligence and its applications in the organizations.

Keywords: Business Intelligence, business processes, business intelligence infrastructures

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1592
34 Data-organization Before Learning Multi-Entity Bayesian Networks Structure

Authors: H. Bouhamed, A. Rebai, T. Lecroq, M. Jaoua

Abstract:

The objective of our work is to develop a new approach for discovering knowledge from a large mass of data, the result of applying this approach will be an expert system that will serve as diagnostic tools of a phenomenon related to a huge information system. We first recall the general problem of learning Bayesian network structure from data and suggest a solution for optimizing the complexity by using organizational and optimization methods of data. Afterward we proposed a new heuristic of learning a Multi-Entities Bayesian Networks structures. We have applied our approach to biological facts concerning hereditary complex illnesses where the literatures in biology identify the responsible variables for those diseases. Finally we conclude on the limits arched by this work.

Keywords: Data-organization, data-optimization, automatic knowledge discovery, Multi-Entities Bayesian networks, score merging

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1193
33 A Hybrid Metaheuristic Framework for Evolving the PROAFTN Classifier

Authors: Feras Al-Obeidat, Nabil Belacel, Juan A. Carretero, Prabhat Mahanti,

Abstract:

In this paper, a new learning algorithm based on a hybrid metaheuristic integrating Differential Evolution (DE) and Reduced Variable Neighborhood Search (RVNS) is introduced to train the classification method PROAFTN. To apply PROAFTN, values of several parameters need to be determined prior to classification. These parameters include boundaries of intervals and relative weights for each attribute. Based on these requirements, the hybrid approach, named DEPRO-RVNS, is presented in this study. In some cases, the major problem when applying DE to some classification problems was the premature convergence of some individuals to local optima. To eliminate this shortcoming and to improve the exploration and exploitation capabilities of DE, such individuals were set to iteratively re-explored using RVNS. Based on the generated results on both training and testing data, it is shown that the performance of PROAFTN is significantly improved. Furthermore, the experimental study shows that DEPRO-RVNS outperforms well-known machine learning classifiers in a variety of problems.

Keywords: Knowledge Discovery, Supervised Learning, Differential Evolution, Multiple Criteria Classification, Reduced Variable Neighborhood Search, PROAFTN

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1144
32 Improving Spatiotemporal Change Detection: A High Level Fusion Approach for Discovering Uncertain Knowledge from Satellite Image Database

Authors: Wadii Boulila, Imed Riadh Farah, Karim Saheb Ettabaa, Basel Solaiman, Henda Ben Ghezala

Abstract:

This paper investigates the problem of tracking spa¬tiotemporal changes of a satellite image through the use of Knowledge Discovery in Database (KDD). The purpose of this study is to help a given user effectively discover interesting knowledge and then build prediction and decision models. Unfortunately, the KDD process for spatiotemporal data is always marked by several types of imperfections. In our paper, we take these imperfections into consideration in order to provide more accurate decisions. To achieve this objective, different KDD methods are used to discover knowledge in satellite image databases. Each method presents a different point of view of spatiotemporal evolution of a query model (which represents an extracted object from a satellite image). In order to combine these methods, we use the evidence fusion theory which considerably improves the spatiotemporal knowledge discovery process and increases our belief in the spatiotemporal model change. Experimental results of satellite images representing the region of Auckland in New Zealand depict the improvement in the overall change detection as compared to using classical methods.

Keywords: Data Mining, Knowledge discovery in satellite databases, knowledge fusion, data imperfection, spatiotemporal change detection

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1215
31 Concurrency in Web Access Patterns Mining

Authors: Jing Lu, Malcolm Keech, Weiru Chen

Abstract:

Web usage mining is an interesting application of data mining which provides insight into customer behaviour on the Internet. An important technique to discover user access and navigation trails is based on sequential patterns mining. One of the key challenges for web access patterns mining is tackling the problem of mining richly structured patterns. This paper proposes a novel model called Web Access Patterns Graph (WAP-Graph) to represent all of the access patterns from web mining graphically. WAP-Graph also motivates the search for new structural relation patterns, i.e. Concurrent Access Patterns (CAP), to identify and predict more complex web page requests. Corresponding CAP mining and modelling methods are proposed and shown to be effective in the search for and representation of concurrency between access patterns on the web. From experiments conducted on large-scale synthetic sequence data as well as real web access data, it is demonstrated that CAP mining provides a powerful method for structural knowledge discovery, which can be visualised through the CAP-Graph model.

Keywords: Web Usage Mining, concurrent access patterns (CAP), CAP mining and modelling, CAP-Graph, web access patterns (WAP), WAP-Graph

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1319
30 Multidimensional Visualization Tools for Analysis of Expression Data

Authors: Urska Cvek, Marjan Trutschl, Randolph Stone II, Zanobia Syed, John L. Clifford, Anita L. Sabichi

Abstract:

Expression data analysis is based mostly on the statistical approaches that are indispensable for the study of biological systems. Large amounts of multidimensional data resulting from the high-throughput technologies are not completely served by biostatistical techniques and are usually complemented with visual, knowledge discovery and other computational tools. In many cases, in biological systems we only speculate on the processes that are causing the changes, and it is the visual explorative analysis of data during which a hypothesis is formed. We would like to show the usability of multidimensional visualization tools and promote their use in life sciences. We survey and show some of the multidimensional visualization tools in the process of data exploration, such as parallel coordinates and radviz and we extend them by combining them with the self-organizing map algorithm. We use a time course data set of transitional cell carcinoma of the bladder in our examples. Analysis of data with these tools has the potential to uncover additional relationships and non-trivial structures.

Keywords: Visualization, Self-Organizing Maps, microarrays, parallel coordinates, radviz

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2120
29 FCA-based Conceptual Knowledge Discovery in Folksonomy

Authors: Yu-Kyung Kang, Suk-Hyung Hwang, Kyoung-Mo Yang

Abstract:

The tagging data of (users, tags and resources) constitutes a folksonomy that is the user-driven and bottom-up approach to organizing and classifying information on the Web. Tagging data stored in the folksonomy include a lot of very useful information and knowledge. However, appropriate approach for analyzing tagging data and discovering hidden knowledge from them still remains one of the main problems on the folksonomy mining researches. In this paper, we have proposed a folksonomy data mining approach based on FCA for discovering hidden knowledge easily from folksonomy. Also we have demonstrated how our proposed approach can be applied in the collaborative tagging system through our experiment. Our proposed approach can be applied to some interesting areas such as social network analysis, semantic web mining and so on.

Keywords: classification, Formal Concept Analysis, Folksonomy data mining, collaborative tagging, conceptual knowledge discovery

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1488
28 Knowledge Discovery Techniques for Talent Forecasting in Human Resource Application

Authors: Hamidah Jantan, Abdul Razak Hamdan, Zulaiha Ali Othman

Abstract:

Human Resource (HR) applications can be used to provide fair and consistent decisions, and to improve the effectiveness of decision making processes. Besides that, among the challenge for HR professionals is to manage organization talents, especially to ensure the right person for the right job at the right time. For that reason, in this article, we attempt to describe the potential to implement one of the talent management tasks i.e. identifying existing talent by predicting their performance as one of HR application for talent management. This study suggests the potential HR system architecture for talent forecasting by using past experience knowledge known as Knowledge Discovery in Database (KDD) or Data Mining. This article consists of three main parts; the first part deals with the overview of HR applications, the prediction techniques and application, the general view of Data mining and the basic concept of talent management in HRM. The second part is to understand the use of Data Mining technique in order to solve one of the talent management tasks, and the third part is to propose the potential HR system architecture for talent forecasting.

Keywords: HR Application, Knowledge Discovery inDatabase (KDD), Talent Forecasting

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3978
27 Data Mining Using Learning Automata

Authors: M. R. Aghaebrahimi, S. H. Zahiri, M. Amiri

Abstract:

In this paper a data miner based on the learning automata is proposed and is called LA-miner. The LA-miner extracts classification rules from data sets automatically. The proposed algorithm is established based on the function optimization using learning automata. The experimental results on three benchmarks indicate that the performance of the proposed LA-miner is comparable with (sometimes better than) the Ant-miner (a data miner algorithm based on the Ant Colony optimization algorithm) and CNZ (a well-known data mining algorithm for classification).

Keywords: Data Mining, Knowledge Discovery, learning automata, Classification rules

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1620
26 Deep Web Content Mining

Authors: Shohreh Ajoudanian, Mohammad Davarpanah Jazi

Abstract:

The rapid expansion of the web is causing the constant growth of information, leading to several problems such as increased difficulty of extracting potentially useful knowledge. Web content mining confronts this problem gathering explicit information from different web sites for its access and knowledge discovery. Query interfaces of web databases share common building blocks. After extracting information with parsing approach, we use a new data mining algorithm to match a large number of schemas in databases at a time. Using this algorithm increases the speed of information matching. In addition, instead of simple 1:1 matching, they do complex (m:n) matching between query interfaces. In this paper we present a novel correlation mining algorithm that matches correlated attributes with smaller cost. This algorithm uses Jaccard measure to distinguish positive and negative correlated attributes. After that, system matches the user query with different query interfaces in special domain and finally chooses the nearest query interface with user query to answer to it.

Keywords: information extraction, Content mining, complex matching, correlation mining

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1929
25 Optimizing Spatial Trend Detection By Artificial Immune Systems

Authors: M. Derakhshanfar, B. Minaei-Bidgoli

Abstract:

Spatial trends are one of the valuable patterns in geo databases. They play an important role in data analysis and knowledge discovery from spatial data. A spatial trend is a regular change of one or more non spatial attributes when spatially moving away from a start object. Spatial trend detection is a graph search problem therefore heuristic methods can be good solution. Artificial immune system (AIS) is a special method for searching and optimizing. AIS is a novel evolutionary paradigm inspired by the biological immune system. The models based on immune system principles, such as the clonal selection theory, the immune network model or the negative selection algorithm, have been finding increasing applications in fields of science and engineering. In this paper, we develop a novel immunological algorithm based on clonal selection algorithm (CSA) for spatial trend detection. We are created neighborhood graph and neighborhood path, then select spatial trends that their affinity is high for antibody. In an evolutionary process with artificial immune algorithm, affinity of low trends is increased with mutation until stop condition is satisfied.

Keywords: Spatial Data Mining, Heuristic Methods, artificial immune system, Spatial Trend Detection, Clonal Selection Algorithm (CSA)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1550
24 Incremental Mining of Shocking Association Patterns

Authors: Eiad Yafi, Ahmed Sultan Al-Hegami, M. A. Alam, Ranjit Biswas

Abstract:

Association rules are an important problem in data mining. Massively increasing volume of data in real life databases has motivated researchers to design novel and incremental algorithms for association rules mining. In this paper, we propose an incremental association rules mining algorithm that integrates shocking interestingness criterion during the process of building the model. A new interesting measure called shocking measure is introduced. One of the main features of the proposed approach is to capture the user background knowledge, which is monotonically augmented. The incremental model that reflects the changing data and the user beliefs is attractive in order to make the over all KDD process more effective and efficient. We implemented the proposed approach and experiment it with some public datasets and found the results quite promising.

Keywords: domain knowledge, data mining, Interestingness, Knowledge Discovery in Databases (KDD), Incremental Association rules, Shocking rules (SHR)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1520