Search results for: Data Clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7634

7574 Efficient Mean Shift Clustering Using Exponential Integral Kernels

Authors: S. Sutor, R. Röhr, G. Pujolle, R. Reda

Abstract:

This paper presents a highly efficient algorithm for detecting and tracking humans and objects in video surveillance sequences. Mean shift clustering is applied to background-differenced image sequences. For efficiency, all calculations are performed on integral images. Correspondingly, novel exponential integral kernels are introduced so that nonuniform kernels can be used for clustering, which dramatically increases robustness without giving up the efficiency of the integral data structures. Experimental results demonstrating the power of this approach are presented.

Keywords: Clustering, Integral Images, Kernels, Person Detection, Person Tracking, Intelligent Video Surveillance.

PDF Downloads: 1527
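As a rough sketch of why integral images make mean shift cheap in the entry above, consider the simplest case of a flat (box) kernel: each mean shift step needs only the sums of w, x·w and y·w over a window, and each of those sums is four lookups in a summed-area table. The code assumes a 2-D weight image (for example, the magnitude of a background difference); it does not reproduce the paper's exponential integral kernels, and all function names are illustrative.

```python
def integral_image(img):
    """Summed-area table S, padded so S[y][x] = sum of img[0..y-1][0..x-1]."""
    h, w = len(img), len(img[0])
    S = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            S[y + 1][x + 1] = S[y][x + 1] + row_sum
    return S


def box_sum(S, x0, y0, x1, y1):
    """Sum of the original image over the inclusive box [x0..x1] x [y0..y1] in O(1)."""
    return S[y1 + 1][x1 + 1] - S[y0][x1 + 1] - S[y1 + 1][x0] + S[y0][x0]


def mean_shift_box(weights, start, half_size, max_iter=50):
    """Mean shift with a flat box kernel; returns the converged (x, y) mode estimate."""
    h, w = len(weights), len(weights[0])
    # One table each for sum(w), sum(x*w), sum(y*w): every window moment is O(1).
    S_w = integral_image(weights)
    S_xw = integral_image([[x * weights[y][x] for x in range(w)] for y in range(h)])
    S_yw = integral_image([[y * weights[y][x] for x in range(w)] for y in range(h)])
    cx, cy = start
    for _ in range(max_iter):
        x0, x1 = max(0, cx - half_size), min(w - 1, cx + half_size)
        y0, y1 = max(0, cy - half_size), min(h - 1, cy + half_size)
        m = box_sum(S_w, x0, y0, x1, y1)
        if m == 0:
            break  # empty window: nothing to shift towards
        nx = round(box_sum(S_xw, x0, y0, x1, y1) / m)
        ny = round(box_sum(S_yw, x0, y0, x1, y1) / m)
        if (nx, ny) == (cx, cy):
            break  # converged to a fixed point
        cx, cy = nx, ny
    return cx, cy
```

The window cost stays constant regardless of `half_size`, which is the property the integral-image formulation is meant to preserve.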
7573 Clustering Unstructured Text Documents Using Fading Function

Authors: Pallav Roxy, Durga Toshniwal

Abstract:

Clustering unstructured text documents is an important issue in the data mining community and has a number of applications, such as document archive filtering, document organization, topic detection, and subject tracing. In the real world, some already clustered documents may no longer be important, while new, more significant documents may appear. Most of the work done so far on clustering unstructured text documents overlooks this aspect. This paper addresses the issue by using a Fading Function. The unstructured text documents are clustered, and for each cluster a statistics structure called the Cluster Profile (CP) is implemented. The cluster profile incorporates the Fading Function, which keeps account of the time-dependent importance of the cluster. The work proposes a novel algorithm, the Clustering n-ary Merge Algorithm (CnMA), for unstructured text documents that uses the Cluster Profile and the Fading Function. Experimental results illustrating the effectiveness of the proposed technique are also included.

Keywords: Clustering, Text Mining, Unstructured Text Documents, Fading Function.

PDF Downloads: 1984
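The abstract above does not give the form of its fading function. A common choice in stream clustering is exponential decay, f(Δt) = 2^(−λ·Δt), and the sketch below assumes that form: a hypothetical cluster profile stores a faded document count and a faded sum of document vectors, so older documents gradually lose influence. The class name, λ (`decay_rate`), and the vector representation are illustrative assumptions, not CnMA itself.

```python
class ClusterProfile:
    """Hypothetical cluster profile whose statistics fade exponentially with time."""

    def __init__(self, dim, decay_rate=0.1):
        self.decay_rate = decay_rate    # lambda in f(dt) = 2 ** (-lambda * dt)
        self.weight = 0.0               # faded count of documents
        self.vector_sum = [0.0] * dim   # faded sum of document vectors
        self.last_update = 0.0

    def fade(self, dt):
        """Fading factor for an elapsed time dt."""
        return 2.0 ** (-self.decay_rate * dt)

    def _decay_to(self, now):
        # Scale all stored statistics by the fading factor for the elapsed time.
        f = self.fade(now - self.last_update)
        self.weight *= f
        self.vector_sum = [v * f for v in self.vector_sum]
        self.last_update = now

    def add(self, doc_vector, now):
        """Fade existing statistics to time `now`, then absorb one new document."""
        self._decay_to(now)
        self.weight += 1.0
        self.vector_sum = [s + d for s, d in zip(self.vector_sum, doc_vector)]

    def importance(self, now):
        """Time-dependent importance (faded weight) at `now`, without mutating state."""
        return self.weight * self.fade(now - self.last_update)

    def centroid(self):
        return [s / self.weight for s in self.vector_sum] if self.weight else None
```

With `decay_rate=1.0`, a cluster that receives no new documents loses half its importance per time unit, so a merge or pruning rule can compare importances directly.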
7572 Design and Implementation a New Energy Efficient Clustering Algorithm using Genetic Algorithm for Wireless Sensor Networks

Authors: Moslem Afrashteh Mehr

Abstract:

Wireless Sensor Networks consist of small battery-powered devices with limited energy resources. Once deployed, the small sensor nodes are usually inaccessible to the user, so replacing their energy source is not feasible. Hence, energy efficiency is one of the most important issues to address in order to improve the lifespan of the network. Much research has been done to overcome this limitation, and clustering is one of the representative approaches: the cluster heads gather data from the nodes and send it to the base station. In this paper, we introduce a dynamic clustering algorithm using a genetic algorithm. This algorithm takes different parameters into consideration to increase the network lifetime. To demonstrate the efficiency of the proposed algorithm, we simulated it in MATLAB and compared it with the LEACH algorithm.

Keywords: Wireless Sensor Networks, Clustering, Genetic algorithm, Energy Consumption

PDF Downloads: 2883
7571 Automatic Clustering of Gene Ontology by Genetic Algorithm

Authors: Razib M. Othman, Safaai Deris, Rosli M. Illias, Zalmiyah Zakaria, Saberi M. Mohamad

Abstract:

Nowadays, the Gene Ontology is widely used by researchers for biological data mining and information retrieval, integration of biological databases, finding genes, and incorporating knowledge in the Gene Ontology for gene clustering. However, the growing size of the Gene Ontology has caused problems in maintaining and processing it. One way to keep it accessible is to cluster it into fragmented groups. Clustering the Gene Ontology is a difficult combinatorial problem and can be modeled as a graph partitioning problem. Additionally, choosing the number k of clusters is not straightforward and is itself a hard algorithmic problem. Therefore, an approach to the automatic clustering of the Gene Ontology is proposed that incorporates a cohesion-and-coupling metric into a hybrid algorithm consisting of a genetic algorithm and a split-and-merge algorithm. Experimental results and an example of a modularized Gene Ontology in RDF/XML format are given to illustrate the effectiveness of the algorithm.

Keywords: Automatic clustering, cohesion-and-coupling metric, gene ontology, genetic algorithm, split-and-merge algorithm.

PDF Downloads: 1954
7570 Data Oriented Model of Image: as a Framework for Image Processing

Authors: A. Habibizad Navin, A. Sadighi, M. Naghian Fesharaki, M. Mirnia, M. Teshnelab, R. Keshmiri

Abstract:

This paper presents a new data-oriented model of images and then introduces a representation of it, ADBT. ADBT supports clustering, segmentation, measuring the similarity of images, etc., with the desired precision, at a speed corresponding to that precision.

Keywords: Data oriented modelling, image, clustering, segmentation, classification, ADBT and image processing.

PDF Downloads: 1798
7569 Improved Wavelet Neural Networks for Early Cancer Diagnosis Using Clustering Algorithms

Authors: Zarita Zainuddin, Ong Pauline

Abstract:

Wavelet neural networks (WNNs) have emerged as a vital alternative to the extensively studied multilayer perceptrons (MLPs) since their first implementation. In this paper, we applied various clustering algorithms, namely K-means (KM), Fuzzy C-means (FCM), symmetry-based K-means (SBKM), symmetry-based Fuzzy C-means (SBFCM), and modified point symmetry-based K-means (MPKM), to choose the translation parameters of a WNN. These modified WNNs are further applied to heterogeneous cancer classification using benchmark microarray data and compared against the conventional WNN with random initialization. Experimental results showed that a WNN classifier with the MPKM algorithm is more precise than the conventional WNN as well as the WNNs with other clustering algorithms.

Keywords: Clustering, microarray, symmetry, wavelet neural networks.

PDF Downloads: 1615
7568 Optimizing of Fuzzy C-Means Clustering Algorithm Using GA

Authors: Mohanad Alata, Mohammad Molhim, Abdullah Ramini

Abstract:

The Fuzzy C-means clustering algorithm (FCM) is a method frequently used in pattern recognition. It has the advantage of giving good modeling results in many cases, although it is not capable of specifying the number of clusters by itself. In the FCM algorithm, most researchers fix the weighting exponent (m) to a conventional value of 2, which might not be appropriate for all applications. Consequently, the main objective of this paper is to use the subtractive clustering algorithm to provide the optimal number of clusters needed by the FCM algorithm, by optimizing the parameters of the subtractive clustering algorithm with an iterative search approach, and then to find an optimal weighting exponent (m) for the FCM algorithm. To obtain an optimal number of clusters, the iterative search approach is used to find the optimal single-output Sugeno-type Fuzzy Inference System (FIS) model by optimizing the parameters of the subtractive clustering algorithm that give the minimum least-squares error between the actual data and the Sugeno fuzzy model. Once the number of clusters is optimized, two approaches are proposed to optimize the weighting exponent (m) in the FCM algorithm, namely the iterative search approach and genetic algorithms. The approach is tested on data generated from the original function, and optimal fuzzy models are obtained with minimum error between the real data and the obtained fuzzy models.

Keywords: Fuzzy clustering, Fuzzy C-Means, Genetic Algorithm, Sugeno fuzzy systems.

PDF Downloads: 3255
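For reference, the standard FCM iteration in which the weighting exponent m appears alternates two updates: memberships u_ik = 1 / Σ_j (d_ik / d_jk)^(2/(m−1)) and centers v_i = Σ_k u_ik^m x_k / Σ_k u_ik^m, with m > 1. The sketch below is plain FCM for a fixed number of clusters c and a fixed m; the paper's subtractive-clustering and GA searches for c and m are not reproduced, and the evenly spaced initialization and the small epsilon guard are illustrative choices.

```python
def fcm(points, c, m=2.0, max_iter=100, tol=1e-6):
    """Fuzzy C-means on a list of tuples; returns (centers, u) with u[i][k] in [0, 1]."""
    n, dim = len(points), len(points[0])
    # Illustrative initialization: c evenly spaced input points as initial centers.
    centers = [list(points[i * n // c]) for i in range(c)]
    eps = 1e-12  # avoids division by zero when a point coincides with a center
    u = [[0.0] * n for _ in range(c)]
    for _ in range(max_iter):
        dist = [[max(eps, sum((p[d] - v[d]) ** 2 for d in range(dim)) ** 0.5)
                 for p in points] for v in centers]
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1))
        for k in range(n):
            for i in range(c):
                u[i][k] = 1.0 / sum((dist[i][k] / dist[j][k]) ** (2.0 / (m - 1.0))
                                    for j in range(c))
        # Center update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        new_centers = []
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            total = sum(w)
            new_centers.append([sum(w[k] * points[k][d] for k in range(n)) / total
                                for d in range(dim)])
        shift = max(abs(a - b) for old, new in zip(centers, new_centers)
                    for a, b in zip(old, new))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

Larger m makes the memberships fuzzier, and as m approaches 1 the updates approach hard K-means assignments, which is why fixing m = 2 is a genuine modeling choice.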
7567 Grid–SVC: An Improvement in SVC Algorithm, Based On Grid Based Clustering

Authors: Farhad Hadinejad, Hasan Saberi, Saeed Kazem

Abstract:

Support vector clustering (SVC) is an important kernel-based clustering algorithm used in many applications. It has two main bottlenecks: high computational cost and the labeling step. In this paper, we present a modified SVC method, named Grid–SVC, that improves the original algorithm computationally. We first normalize the data and then partition the interval on which SVC operates, using a novel grid-based clustering algorithm. The algorithm partitions the intervals based on the density function of the data set and then builds multi-dimensional grids by taking their Cartesian product. After eliminating many outliers and much of the noise during preprocessing, we apply an improved SVC method to each grid cell in parallel. The experimental results show improvements in both time complexity and accuracy.

Keywords: Grid–based clustering, SVC, Density function, Radial basis function.

PDF Downloads: 1744
7566 A K-Means Based Clustering Approach for Finding Faulty Modules in Open Source Software Systems

Authors: Parvinder S. Sandhu, Jagdeep Singh, Vikas Gupta, Mandeep Kaur, Sonia Manhas, Ramandeep Sidhu

Abstract:

Prediction of fault-prone modules provides one way to support software quality engineering. Clustering is used to determine the intrinsic grouping in a set of unlabeled data. Among the various clustering techniques available in the literature, the K-Means approach is the most widely used. This paper introduces a K-Means-based clustering approach for finding the fault-proneness of object-oriented software systems. The contribution of this paper is that it uses metric values of the open-source software JEdit to generate rules for categorizing software modules as faulty or non-faulty, after which an empirical validation is performed. The results are measured in terms of accuracy of prediction, probability of detection, and probability of false alarm.

Keywords: K-Means, Software Fault, Classification, Object-Oriented Metrics.

PDF Downloads: 2303
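The three evaluation measures named in the fault-prediction abstract above have standard confusion-matrix definitions when "faulty" is the positive class: accuracy = (TP + TN) / N, probability of detection PD = TP / (TP + FN), and probability of false alarm PF = FP / (FP + TN). A small helper, assuming boolean labels where True marks a faulty module; the function name and the example labels in the test are illustrative, not data from the paper.

```python
def fault_prediction_metrics(actual, predicted):
    """Return (accuracy, PD, PF) for equal-length boolean sequences.

    True marks a faulty module (the positive class).
    """
    tp = sum(a and p for a, p in zip(actual, predicted))
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    accuracy = (tp + tn) / len(actual)
    pd = tp / (tp + fn) if tp + fn else 0.0  # share of faulty modules detected
    pf = fp / (fp + tn) if fp + tn else 0.0  # share of clean modules flagged as faulty
    return accuracy, pd, pf
```

PD and PF pull in opposite directions as a classifier becomes more or less eager to flag modules, which is why both are usually reported alongside accuracy.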
7565 Analyzing The Effect of Variable Round Time for Clustering Approach in Wireless Sensor Networks

Authors: Vipin Pal, Girdhari Singh, R P Yadav

Abstract:

Because wireless sensor networks are energy-constrained, the energy efficiency of sensor nodes is the main design issue. Clustering of nodes is an energy-efficient approach: it prolongs the lifetime of wireless sensor networks by avoiding long-distance communication. Clustering algorithms operate in rounds, and their performance depends on the round time. A large round time consumes more energy of the cluster heads, while a small round time causes frequent re-clustering. Existing clustering algorithms therefore trade off these effects and calculate the round time from the initial parameters of the network. However, it is not appropriate to use a round time based on initial parameters throughout the network lifetime, because wireless sensor networks are dynamic in nature (nodes can be added to the network, or some nodes may run out of energy). In this paper, a variable round time approach is proposed that calculates the round time depending on the number of active nodes remaining in the field. The proposed approach makes the clustering algorithm adaptive to network dynamics. For simulation, the approach is implemented with LEACH in NS-2, and the results show a 6% increase in network lifetime, a 7% increase in the time until 50% of the nodes die, and a 5% improvement in the data units gathered at the base station.

Keywords: Wireless Sensor Network, Clustering, Energy Efficiency, Round Time.

PDF Downloads: 1786
7564 Towards Clustering of Web-based Document Structures

Authors: Matthias Dehmer, Frank Emmert Streib, Jürgen Kilian, Andreas Zulauf

Abstract:

Methods for organizing web data into groups, in order to analyze web-based hypertext data and facilitate data availability, are very important given the number of documents available online. The task of clustering web-based document structures has many applications, e.g., improving information retrieval on the web, better understanding of user navigation behavior, improving the servicing of web users' requests, and increasing web information accessibility. In this paper, we investigate a new approach for clustering web-based hypertexts on the basis of their graph structures. The hypertexts are represented as so-called generalized trees, which are more general than usual directed rooted trees such as DOM trees. As an important preprocessing step, we measure the structural similarity between the generalized trees on the basis of a similarity measure d. Then we apply agglomerative clustering to the obtained similarity matrix to create clusters of hypertext graph patterns representing navigation structures. In the present paper, we apply our approach to a data set of hypertext structures and obtain good results in Web Structure Mining. Furthermore, we outline the application of our approach to Web Usage Mining as future work.

Keywords: Clustering methods, graph-based patterns, graph similarity, hypertext structures, web structure mining

PDF Downloads: 1505
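The second step of the web-structure pipeline above, agglomerative clustering on a precomputed similarity matrix, can be sketched independently of how the tree similarity d is defined. The version below assumes a symmetric matrix of similarities and uses average linkage (merge the two clusters with the highest mean pairwise similarity) until a requested number of clusters remains; both the linkage rule and the stopping criterion are assumptions, since the abstract does not state them.

```python
def average_linkage(sim, n_clusters):
    """Agglomerative clustering on a symmetric similarity matrix (list of lists).

    Repeatedly merges the two clusters with the highest average pairwise
    similarity until `n_clusters` clusters remain; returns sorted index lists.
    """
    clusters = [[i] for i in range(len(sim))]

    def avg_sim(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the most similar pair of current clusters (naive O(n^2) scan per merge).
        x, y = max(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda pair: avg_sim(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x stays valid
    return sorted(clusters)
```

The naive scan is fine for a few hundred hypertexts; larger collections would call for a priority-queue or library implementation.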
7563 Analysis of Cooperative Learning Behavior Based on the Data of Students' Movement

Authors: Wang Lin, Li Zhiqiang

Abstract:

The purpose of this paper is to analyze cooperative learning behavior patterns based on data on students' movement. The study first reviewed cooperative learning theory and its research status and briefly introduced the k-means clustering algorithm. Then it used the clustering algorithm and mathematical statistics to analyze the activity rhythms of individual students and groups in different functional areas, based on movement data provided by 10 first-year graduate students. It also focused on students' behavior in the learning area and explored the regularities of cooperative learning behavior. The results showed that the proposed cooperative learning behavior analysis method based on movement data is feasible, and that the behavior characteristics of students and their cooperative learning behavior patterns could be identified from the data analysis.

Keywords: Behavior pattern, cooperative learning, data analysis, K-means clustering algorithm.

PDF Downloads: 814
7562 An Energy-Efficient Protocol with Static Clustering for Wireless Sensor Networks

Authors: Amir Sepasi Zahmati, Bahman Abolhassani, Ali Asghar Beheshti Shirazi, Ali Shojaee Bakhtiari

Abstract:

A wireless sensor network with a large number of tiny sensor nodes can be used as an effective tool for gathering data in various situations. One of the major issues in wireless sensor networks is developing an energy-efficient routing protocol, which has a significant impact on the overall lifetime of the sensor network. In this paper, we propose a novel hierarchical routing protocol with static clustering, called the Energy-Efficient Protocol with Static Clustering (EEPSC). EEPSC partitions the network into static clusters, eliminates the overhead of dynamic clustering, and uses temporary cluster heads to distribute the energy load among high-power sensor nodes, thus extending network lifetime. We have conducted simulation-based evaluations to compare the performance of EEPSC against Low-Energy Adaptive Clustering Hierarchy (LEACH). Our experimental results show that EEPSC outperforms LEACH in terms of network lifetime and power consumption minimization.

Keywords: Clustering methods, energy efficiency, routing protocol, wireless sensor networks.

PDF Downloads: 2721
7561 Enhancing K-Means Algorithm with Initial Cluster Centers Derived from Data Partitioning along the Data Axis with the Highest Variance

Authors: S. Deelers, S. Auwatanamongkol

Abstract:

In this paper, we propose an algorithm to compute initial cluster centers for K-means clustering. Data in a cell is partitioned using a cutting plane that divides the cell into two smaller cells. The plane is perpendicular to the data axis with the highest variance and is designed to reduce the sum of squared errors of the two cells as much as possible, while at the same time keeping the two cells as far apart as possible. Cells are partitioned one at a time until the number of cells equals the predefined number of clusters, K. The centers of the K cells become the initial cluster centers for K-means. The experimental results suggest that the proposed algorithm is effective and converges to better clustering results than the random initialization method. The research also indicated that the proposed algorithm would greatly improve the likelihood of every cluster containing some data.

Keywords: Clustering algorithm, K-means algorithm, Data partitioning, Initial cluster centers.

PDF Downloads: 2865
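The cell-splitting procedure in the initialization abstract above can be sketched as follows. Two simplifications are assumptions of this sketch, not the paper's method: the cell split next is the one with the largest sum of squared errors, and the cutting plane passes through the cell mean on the highest-variance axis instead of being optimized to minimize SSE while keeping the two cells apart.

```python
def mean_of(cell):
    """Coordinate-wise mean of a list of points."""
    return [sum(coords) / len(cell) for coords in zip(*cell)]


def sse(cell):
    """Sum of squared distances from the points in `cell` to their mean."""
    m = mean_of(cell)
    return sum(sum((x - mx) ** 2 for x, mx in zip(p, m)) for p in cell)


def variance_split_centers(points, k):
    """Initial K-means centers from repeated cuts along the highest-variance axis."""
    cells = [list(points)]
    while len(cells) < k:
        candidates = [i for i, c in enumerate(cells) if len(c) > 1]
        if not candidates:
            raise ValueError("k is larger than the number of points")
        # Assumption: split the cell with the largest SSE next.
        cell = cells.pop(max(candidates, key=lambda i: sse(cells[i])))
        m = mean_of(cell)
        # Per-axis sum of squared deviations (proportional to the variance).
        spread = [sum((p[d] - m[d]) ** 2 for p in cell) for d in range(len(m))]
        axis = spread.index(max(spread))
        # Cutting plane perpendicular to `axis`, placed at the cell mean (simplification).
        left = [p for p in cell if p[axis] <= m[axis]]
        right = [p for p in cell if p[axis] > m[axis]]
        if not right:
            raise ValueError("cell contains identical points and cannot be split")
        cells.extend([left, right])
    return [mean_of(c) for c in cells]
```

Because every returned center is the mean of a non-empty cell, each initial cluster starts with at least one point, which matches the abstract's observation about clusters not ending up empty.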
7560 STATISTICA Software: A State of the Art Review

Authors: S. Sarumathi, N. Shanthi, S. Vidhya, P. Ranjetha

Abstract:

Data mining is growing rapidly in both esteem and popularity. The foremost aim of data mining methods is to extract information from a huge data set into forms that can be understood for further use. Data mining is a technology with rich potential that can support industries and businesses that collect the information needed to understand their customers' behavior. Several methods are available for extracting information, such as classification, clustering, association, discovery, and visualization, each with its own diverse algorithms for fitting an appropriate model to the data. STATISTICA mostly deals with very large groups of data that impose heavy computational constraints; this challenge has led to the emergence of powerful STATISTICA Data Mining technologies. This survey gives an overview of the STATISTICA software along with its significant features.

Keywords: Data Mining, STATISTICA Data Miner, Text Miner, Enterprise Server, Classification, Association, Clustering, Regression.

PDF Downloads: 2607
7559 A Computational Cost-Effective Clustering Algorithm in Multidimensional Space Using the Manhattan Metric: Application to the Global Terrorism Database

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

The increasing amount of collected data has limited the performance of current analysis algorithms. Thus, developing new algorithms that are cost-effective in terms of complexity, scalability, and accuracy has attracted significant interest. In this paper, a modified, effective k-means-based algorithm is developed and evaluated experimentally. The new algorithm aims to reduce the computational load without significantly affecting the quality of the clusterings. The algorithm uses the City Block (Manhattan) distance and a new stopping criterion to guarantee convergence. Experiments conducted on a real data set show its high performance compared with the original k-means version.

Keywords: Pattern recognition, partitional clustering, K-means clustering, Manhattan distance, terrorism data analysis.

PDF Downloads: 1359
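The City Block abstract above does not specify its stopping criterion, so the sketch below uses the simplest one: stop when no point changes cluster, or after an iteration cap. Note that under the L1 distance the center minimizing the within-cluster cost is the coordinate-wise median rather than the mean, so this sketch updates centers with medians, which makes it a k-medians variant; whether the paper keeps the mean update is not stated in the abstract.

```python
from statistics import median


def manhattan(a, b):
    """City Block (L1) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans_manhattan(points, initial_centers, max_iter=100):
    """K-means-style clustering with L1 assignment and median center updates."""
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda i: manhattan(p, centers[i]))
                      for p in points]
        if new_labels == labels:
            break  # stopping criterion: no point changed cluster
        labels = new_labels
        for i in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the previous center if a cluster becomes empty
                centers[i] = [median(coord) for coord in zip(*members)]
    return centers, labels
```

Besides matching the L1 objective, the median update is less sensitive to extreme values than the mean, which can matter for skewed real-world records.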
7558 Using Data Mining for Learning and Clustering FCM

Authors: Somayeh Alizadeh, Mehdi Ghazanfari, Mohammad Fathian

Abstract:

Fuzzy Cognitive Maps (FCMs) have been successfully applied in numerous domains to show relations between essential components. Some FCMs have many interrelated nodes, and more nodes mean more complex system behavior and analysis. In this paper, a novel learning method is used to construct FCMs from historical data, and a new method based on data mining and the DEMATEL method is defined to reduce the number of nodes. This method clusters the nodes of an FCM based on their cause-and-effect behavior.

Keywords: Clustering, Data Mining, Fuzzy Cognitive Map (FCM), Learning.

PDF Downloads: 2015
7557 Clustering Based Formulation for Short Term Load Forecasting

Authors: Ajay Shekhar Pandey, D. Singh, S. K. Sinha

Abstract:

In this article, a clustering-based technique is developed and implemented for Short Term Load Forecasting. The formulation uses the Mean Absolute Percentage Error (MAPE) as the objective function, with the data matrix and cluster size as optimization variables. The designed model uses two temperature variables. It is compared with a six-input Radial Basis Function Neural Network (RBFNN) and a Fuzzy Inference Neural Network (FINN) on data from the same system over the same time period. The fuzzy inference system has the network structure and training procedure of a neural network and initially creates a rule base from existing historical load data. The proposed clustering-based model is observed to give better forecasting accuracy than the other two methods. Test results also indicate that the RBFNN can forecast future loads with accuracy comparable to that of the proposed method, whereas the training time required by FINN is much less.

Keywords: Load forecasting, clustering, fuzzy inference.

PDF Downloads: 1625
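MAPE, the objective function in the load-forecasting formulation above, is MAPE = (100 / N) · Σ_t |A_t − F_t| / |A_t| over N forecast points, where A_t is the actual load and F_t the forecast. A direct implementation, assuming no actual value is zero:

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent; actual values must be non-zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)
```

For example, forecasts of 110 and 190 against actual loads of 100 and 200 have errors of 10% and 5%, so their MAPE is about 7.5%. These numbers are illustrative, not results from the paper.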
7556 GCM Based Fuzzy Clustering to Identify Homogeneous Climatic Regions of North-East India

Authors: Arup K. Sarma, Jayshree Hazarika

Abstract:

The north-eastern part of India, which receives heavier rainfall than other parts of the subcontinent, is of great concern nowadays with regard to climate change. High-intensity, short-duration rainfall and longer dry spells, occurring due to the impact of climate change, also affect river morphology. In the present study, an attempt is made to delineate the north-eastern region of India into homogeneous clusters based on the Fuzzy Clustering concept and to compare the clusters obtained using conventional and nonconventional methods of clustering. Clustering is adopted because the impact of climate change can be studied in a homogeneous region without much variation, which can be helpful in studies related to water resources planning and management. Ten IMD (India Meteorological Department) stations, situated in various parts of the North-east, have been selected for making the clusters. The results of the Fuzzy C-Means (FCM) analysis show different clustering patterns for different conditions. From the analysis and comparison, it can be concluded that the nonconventional method using GCM data gives somewhat better results than the others. However, further analysis can be done using daily data instead of monthly means to reduce the effect of standardization.

Keywords: Climate change, conventional and nonconventional methods of clustering, FCM analysis, homogeneous regions.

PDF Downloads: 2211
7555 Performance Comparison of Particle Swarm Optimization with Traditional Clustering Algorithms used in Self-Organizing Map

Authors: Anurag Sharma, Christian W. Omlin

Abstract:

Self-organizing map (SOM) is a well-known data reduction technique used in data mining. It can reveal structure in data sets through data visualization that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOM, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of an adaptive heuristic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOM. The application of our method to several standard data sets demonstrates its feasibility. The PSO algorithm uses the so-called U-matrix of the SOM to determine cluster boundaries; the results of this novel automatic method compare very favorably to boundary detection through traditional algorithms, namely k-means and hierarchical approaches, which are normally used to interpret the output of SOM.

Keywords: cluster boundaries, clustering, code vectors, data mining, particle swarm optimization, self-organizing maps, U-matrix.

PDF Downloads: 1908
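The U-matrix that the PSO step in the entry above works from is straightforward to compute: for each unit on the SOM grid, take the average distance between its code vector and those of its grid neighbors, so high values mark likely cluster boundaries. The sketch below assumes a rectangular grid with 4-neighborhoods; hexagonal grids and the PSO boundary search itself are not covered.

```python
import math


def u_matrix(codebook):
    """U-matrix for a rectangular SOM grid.

    `codebook[r][c]` is the code vector (a sequence of floats) of the unit at
    row r, column c. Each output cell is the mean Euclidean distance from that
    unit's code vector to its up/down/left/right neighbors.
    """
    rows, cols = len(codebook), len(codebook[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = [
                math.dist(codebook[r][c], codebook[nr][nc])
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            result[r][c] = sum(dists) / len(dists) if dists else 0.0
    return result
```

Valleys of low values then correspond to groups of similar code vectors, and ridges of high values to the boundaries that a search procedure such as PSO would try to trace.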
7554 A Spanning Tree for Enhanced Cluster Based Routing in Wireless Sensor Network

Authors: M. Saravanan, M. Madheswaran

Abstract:

Wireless Sensor Network (WSN) clustering architecture enables features like network scalability, communication overhead reduction, and fault tolerance. After clustering, aggregated data is transferred to the data sink, reducing unnecessary, redundant data transfer. This reduces the number of transmitting nodes and so saves energy. Clustering also allows the network to scale to many nodes, reduces communication overhead, and allows efficient use of WSN resources. Clustering-based routing methods manage network energy consumption efficiently. Building spanning trees rooted at a sink node for data collection is a fundamental data aggregation method in sensor networks. Determining the optimal number of Cluster Heads (CHs) is an NP-hard problem. In this paper, we combine cluster-based routing features for cluster formation and CH selection, and use a Minimum Spanning Tree (MST) for intra-cluster communication. The proposed method optimizes the MST using Simulated Annealing (SA). In this work, normalized values of mobility, delay, and remaining energy are considered in finding the optimal MST. Simulation results demonstrate the effectiveness of the proposed method in improving the packet delivery ratio and reducing the end-to-end delay.

Keywords: Wireless sensor network, clustering, minimum spanning tree, genetic algorithm, low energy adaptive clustering hierarchy, simulated annealing.

PDF Downloads: 1786
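For the intra-cluster part of the spanning-tree approach above, a minimum spanning tree over the nodes of one cluster can be built with Prim's algorithm. The sketch below uses plain Euclidean distance between node positions as the edge weight and roots the tree at the cluster head; the paper instead optimizes the MST with simulated annealing over normalized mobility, delay, and remaining energy, which is not reproduced here. Node coordinates and the function name are illustrative.

```python
import heapq
import math


def prim_mst(positions, root=0):
    """Minimum spanning tree over a complete graph of node positions (Prim's algorithm).

    Returns a dict mapping each non-root node to its parent, so following
    parents from any node leads to `root` (for example, the cluster head).
    """
    n = len(positions)
    in_tree = [False] * n
    parent = {}
    heap = [(0.0, root, -1)]  # entries are (edge weight, node, parent)
    while heap:
        _, node, par = heapq.heappop(heap)
        if in_tree[node]:
            continue  # stale entry: node was already attached via a cheaper edge
        in_tree[node] = True
        if par >= 0:
            parent[node] = par
        for other in range(n):
            if not in_tree[other]:
                edge = math.dist(positions[node], positions[other])
                heapq.heappush(heap, (edge, other, node))
    return parent
```

Swapping the edge weight for a combined cost of mobility, delay, and energy would only change the `edge` line, which is one way such cost terms can plug into the tree construction.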
7553 An Energy Aware Data Aggregation in Wireless Sensor Network Using Connected Dominant Set

Authors: M. Santhalakshmi, P Suganthi

Abstract:

Wireless Sensor Networks (WSNs) have many advantages. They are easier and faster to deploy than wired sensor networks or other wireless networks, as they do not need fixed infrastructure. Nodes are partitioned into many small groups, called clusters, to aggregate data through network organization. WSN clustering ensures that sensor nodes achieve their performance goals. Sensor-node energy consumption is reduced by eliminating redundant energy use and balancing energy use across the network. The aim of such clustering protocols is to prolong network life. Low Energy Adaptive Clustering Hierarchy (LEACH) is a popular WSN protocol. It is a clustering protocol in which local cluster heads are randomly rotated in order to distribute the energy load among all sensor nodes in the network. This paper proposes Connected Dominant Set (CDS) based cluster formation. Data aggregation via a CDS is a promising approach for reducing routing overhead, since messages are transmitted only within the virtual backbone formed by the CDS, and data aggregation lowers the ratio of responding hosts to the hosts in the virtual backbone. The CDS approach tries to increase network lifetime, considering parameters such as sensor lifetime and remaining and consumed energy, in order to achieve nearly optimal data aggregation within the network. Experimental results show that CDS outperformed LEACH in terms of number of cluster formations, average packet loss rate, average end-to-end delay, lifetime computation, and remaining energy computation.

Keywords: Wireless sensor network, connected dominant set, clustering, data aggregation.

PDF Downloads: 1129
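The random rotation of cluster heads in LEACH, the baseline in the CDS entry above, comes from its election rule: in round r, each node in the set G of nodes that have not yet served as cluster head in the current epoch of 1/p rounds becomes one if a uniform random draw falls below T(n) = p / (1 − p·(r mod 1/p)), where p is the desired fraction of cluster heads; nodes outside G have T(n) = 0. A minimal sketch of one election round, assuming 1/p is an integer; it shows the LEACH baseline, not the paper's CDS-based formation.

```python
import random


def leach_threshold(p, r):
    """LEACH election threshold T(n) for a node that is still eligible in round r."""
    period = round(1 / p)  # rounds per epoch; assumes 1/p is an integer
    return p / (1 - p * (r % period))


def elect_cluster_heads(node_ids, eligible, p, r, rng=random):
    """Run one LEACH election round and return the new cluster heads.

    `eligible` is the mutable set G of nodes that have not been cluster heads
    in the current epoch; it is refilled at the start of every epoch.
    """
    period = round(1 / p)
    if r % period == 0:
        eligible.clear()
        eligible.update(node_ids)  # new epoch: every node may serve again
    t = leach_threshold(p, r)
    heads = [n for n in node_ids if n in eligible and rng.random() < t]
    eligible.difference_update(heads)
    return heads
```

The threshold rises to 1 in the last round of each epoch, so every node that has not yet served becomes a cluster head then; this is what spreads the energy load evenly over time.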
7552 Hierarchical Clustering Analysis with SOM Networks

Authors: Diego Ordonez, Carlos Dafonte, Minia Manteiga, Bernardino Arcayy

Abstract:

This work presents a neural network model for the clustering analysis of data based on Self Organizing Maps (SOM). The model evolves during the training stage towards a hierarchical structure according to the input requirements. The hierarchical structure acts as a specialization tool that refines the classification process. The structure behaves like a single map with different resolutions depending on the region being analyzed. The benefits and performance of the algorithm are discussed through an application to the Iris dataset, a classical example for pattern recognition.

Keywords: Neural networks, Self-organizing feature maps, Hierarchical systems, Pattern clustering methods.

PDF Downloads: 1946
7551 Upgraded Rough Clustering and Outlier Detection Method on Yeast Dataset by Entropy Rough K-Means Method

Authors: P. Ashok, G. M. Kadhar Nawaz

Abstract:

Rough set theory is used to handle uncertainty and incomplete information by applying two precise sets, the lower approximation and the upper approximation. In this paper, rough clustering algorithms are improved by adopting similarity-based, dissimilarity–similarity-based, and entropy-based initial centroid selection methods; the resulting three clustering algorithms, namely Entropy based Rough K-Means (ERKM), Similarity based Rough K-Means (SRKM), and Dissimilarity-Similarity based Rough K-Means (DSRKM), were developed and run on the yeast dataset. The rough clustering algorithms are validated with the Rand and Adjusted Rand cluster validity indexes. Experimental results show that the ERKM clustering algorithm performs effectively and delivers better results than the other clustering methods. Outlier detection, i.e., finding objects that are very different from the rest of the objects in the clusters, is an important task in data mining. The Entropy based Rough Outlier Factor (EROF) method appears to detect outliers effectively in the yeast dataset. In the rough K-Means method, tuning the epsilon (ε) value from 0.8 to 1.08 makes it possible to detect outliers in the boundary region, and the RKM algorithm delivers better results when epsilon is chosen within this range. Experimental results show that the EROF method on the clustering algorithm performed very well and is suitable for detecting outliers effectively for all datasets. Further, experimental readings show that the ERKM clustering method outperformed the other methods.

Keywords: Clustering, Entropy, Outlier, Rough K-Means, validity index.

PDF Downloads: 1412
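The core of rough K-Means in the entry above is the rule that places each object in lower or upper approximations. The sketch below uses a common ratio-threshold form: an object joins the upper approximation of every cluster whose distance is within a factor ε (ε ≥ 1) of its nearest cluster's distance, and joins the lower approximation of the nearest cluster only when no other cluster qualifies. The paper's ε range of 0.8 to 1.08 suggests its threshold may be parameterized differently. Its entropy-based initialization and EROF outlier factor are not reproduced, and the weights `w_lower` and `w_upper` are illustrative defaults.

```python
import math


def rough_assign(points, centroids, epsilon=1.2):
    """Return (lower, upper): per-cluster sets of point indices."""
    k = len(centroids)
    lower = [set() for _ in range(k)]
    upper = [set() for _ in range(k)]
    for idx, p in enumerate(points):
        d = [math.dist(p, c) for c in centroids]
        nearest = min(range(k), key=d.__getitem__)
        # Other clusters that are nearly as close as the nearest one (ratio test).
        close = [j for j in range(k) if j != nearest and d[j] <= epsilon * d[nearest]]
        upper[nearest].add(idx)
        if close:
            for j in close:
                upper[j].add(idx)  # ambiguous object: in the boundary of several clusters
        else:
            lower[nearest].add(idx)  # unambiguous object: certainly in `nearest`
    return lower, upper


def rough_centroids(points, lower, upper, w_lower=0.7, w_upper=0.3):
    """Centroid update weighting lower-approximation and boundary-region means."""
    def mean(indices):
        return [sum(points[i][d] for i in indices) / len(indices)
                for d in range(len(points[0]))]

    new = []
    for lo, up in zip(lower, upper):
        boundary = up - lo
        if lo and boundary:
            new.append([w_lower * a + w_upper * b for a, b in zip(mean(lo), mean(boundary))])
        elif lo:
            new.append(mean(lo))
        elif boundary:
            new.append(mean(boundary))
        else:
            new.append(None)  # empty cluster
    return new
```

Objects that land only in boundary regions are natural outlier candidates, which is the intuition behind tuning ε to expose them.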
7550 Binary Classification Tree with Tuned Observation-based Clustering

Authors: Maythapolnun Athimethphat, Boontarika Lerteerawong

Abstract:

There are several approaches for handling multiclass classification. Aside from one-against-one (OAO) and one-against-all (OAA), hierarchical classification is also commonly used. A binary classification tree is a hierarchical classification structure that breaks down a k-class problem into binary sub-problems, each solved by a binary classifier. At each node, a set of classes is divided into two subsets. A good class partition should group similar classes together. Many algorithms measure similarity in terms of the distance between class centroids, and a clustering algorithm groups classes together when the distances between their centroids are small. In this paper, we present a binary classification tree with tuned observation-based clustering (BCT-TOB) that finds a class partition by performing clustering on observations instead of class centroids. A merging step is introduced to merge any insignificant class split. Experiments show that the performance of BCT-TOB is comparable to that of other algorithms.
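
The difference between centroid-based and observation-based class partitioning can be sketched as follows: run 2-means on all observations, then send each class to the side that holds the majority of its samples. This is only an illustration of the general idea under stated assumptions, not BCT-TOB itself; the farthest-point initialization, the majority vote and the function names are hypothetical, and the paper's tuning and merging steps are not reproduced.

```python
import math
from collections import Counter

def two_means(X, iters=20):
    """Plain 2-means on observations with a deterministic farthest-point start (assumption)."""
    c0 = tuple(X[0])
    c1 = tuple(max(X, key=lambda p: math.dist(p, c0)))
    cents = [c0, c1]
    labels = []
    for _ in range(iters):
        labels = [0 if math.dist(p, cents[0]) <= math.dist(p, cents[1]) else 1 for p in X]
        new = []
        for side in (0, 1):
            members = [p for p, lab in zip(X, labels) if lab == side]
            new.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else cents[side])
        if new == cents:
            break
        cents = new
    return labels

def split_classes(X, y):
    """Assign each class to the cluster containing most of its observations (hypothetical rule)."""
    labels = two_means(X)
    votes = {}
    for lab, cls in zip(labels, y):
        votes.setdefault(cls, Counter())[lab] += 1
    left = {cls for cls, c in votes.items() if c[0] >= c[1]}
    right = set(votes) - left
    return left, right
```

Because the vote is over individual observations, a class whose samples are spread out influences the split through all of its points rather than through a single centroid. A degenerate split, with every class on one side, is possible and would need separate handling in a real tree.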

Keywords: multiclass classification, hierarchical classification, binary classification tree, clustering, observation-based clustering

7549 3D Mesh Coarsening via Uniform Clustering

Authors: Shuhua Lai, Kairui Chen

Abstract:

In this paper, we present a fast and efficient mesh coarsening algorithm for 3D triangular meshes. The approach can be applied to very complex 3D meshes of arbitrary topology with millions of vertices. The algorithm is based on clustering the input mesh elements: it divides the faces of an input mesh into a given number of clusters by approximating the Centroidal Voronoi Tessellation of the input mesh. Once a clustering is achieved, it provides an efficient way to construct uniform tessellations, and therefore leads to good coarsening of polygonal meshes. With the proliferation of 3D scanners, this coarsening algorithm is particularly useful for reverse engineering applications of 3D models, which in many cases are dense, non-uniform, irregular and of arbitrary topology. Examples demonstrating the effectiveness of the new algorithm are also included in the paper.
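
A Centroidal Voronoi Tessellation is commonly approximated with Lloyd iterations: assign elements to the nearest generator, then move each generator to the area-weighted centroid of its cluster. The sketch below applies that idea to triangle face centroids only, as a simple discrete stand-in; it is not the authors' clustering procedure, and the seed-face choice, iteration count and function names are assumptions.

```python
import math

def face_centroid_and_area(a, b, c):
    """Centroid and area of a 3D triangle given its three vertices."""
    centroid = tuple((a[i] + b[i] + c[i]) / 3 for i in range(3))
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cross = (ab[1] * ac[2] - ab[2] * ac[1],
             ab[2] * ac[0] - ab[0] * ac[2],
             ab[0] * ac[1] - ab[1] * ac[0])
    return centroid, 0.5 * math.sqrt(sum(x * x for x in cross))

def cluster_faces(vertices, faces, seed_faces, iters=10):
    """Lloyd-style face clustering; seed_faces are hypothetical initial generator faces."""
    data = [face_centroid_and_area(*(vertices[v] for v in f)) for f in faces]
    generators = [data[s][0] for s in seed_faces]
    labels = []
    for _ in range(iters):
        # Assignment step: each face joins the generator nearest to its centroid.
        labels = [min(range(len(generators)), key=lambda k: math.dist(cen, generators[k]))
                  for cen, _ in data]
        # Update step: move each generator to the area-weighted centroid of its faces.
        new_generators = []
        for k, old in enumerate(generators):
            members = [(cen, area) for (cen, area), lab in zip(data, labels) if lab == k]
            total = sum(area for _, area in members)
            if total == 0:
                new_generators.append(old)  # empty or degenerate cluster: keep generator
            else:
                new_generators.append(tuple(sum(cen[i] * area for cen, area in members) / total
                                            for i in range(3)))
        generators = new_generators
    return labels
```

Weighting by face area keeps large triangles from being under-represented, which is what pushes the clusters toward roughly equal surface area and hence a more uniform coarse tessellation.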

Keywords: Coarsening, mesh clustering, shape approximation, mesh simplification.

7548 Spatial Clustering Model of Vessel Trajectory to Extract Sailing Routes Based on AIS Data

Authors: Lubna Eljabu, Mohammad Etemad, Stan Matwin

Abstract:

The automatic extraction of shipping routes is advantageous for intelligent traffic management systems to identify events and support decision-making in maritime surveillance. At present, there is high demand for extracting maritime traffic networks that accurately resemble the real traffic of vessels, which is valuable for further analytical processing of vessel trajectories (e.g., naval routing and voyage planning, anomaly detection, destination prediction, and time-of-arrival estimation). With big data techniques for processing huge amounts of vessel trajectory data, it is possible to learn these shipping routes from the navigation history of other, similar ships that have travelled in a given area. In this paper, we propose a spatial clustering model of vessel trajectories (SPTCLUST) to extract spatial representations of sailing routes from historical Automatic Identification System (AIS) data. The model consists of three main parts: data preprocessing, path finding, and route extraction, the last of which comprises clustering and representative trajectory extraction. The proposed clustering method provides techniques to overcome the problems of (i) optimal input parameter selection, (ii) the high complexity of processing a huge volume of multidimensional data, and (iii) the spatial representation of complete representative trajectories in the context of trajectory clustering algorithms. The experimental evaluation, using a real-world AIS dataset from the Port of Halifax, showed the effectiveness of the proposed model. The results contribute to a further understanding of shipping route patterns, which could aid surveillance authorities in stable and sustainable vessel traffic management.
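
The abstract does not spell out SPTCLUST's clustering step. Purely as a stand-in, the sketch below shows density-based clustering (DBSCAN) of AIS position fixes with great-circle (haversine) distance, a common baseline for spatial trajectory points; it is not the proposed model, and eps_km, min_pts, the Earth radius constant and the function names are assumptions. The neighbour search is a naive O(n²) scan, which is exactly the kind of cost the paper's point (ii) is concerned with.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def dbscan(points, eps_km, min_pts):
    """Naive DBSCAN; returns one label per point (-1 = noise)."""
    labels = [None] * len(points)  # None = not yet visited

    def neighbours(i):
        return [j for j in range(len(points)) if haversine_km(points[i], points[j]) <= eps_km]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise (may later be claimed as a border point)
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbours(j)
            if len(nj) >= min_pts:
                queue.extend(nj)  # j is a core point: keep expanding
        cluster += 1
    return labels
```

With hypothetical fixes [(44.60, -63.55), (44.601, -63.551), (44.602, -63.552), (44.70, -63.30), (44.701, -63.301), (44.702, -63.302), (45.5, -62.0)], eps_km=0.5 and min_pts=2, the first three and the next three points form two clusters and the last point is labelled as noise.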

Keywords: Vessel trajectory clustering, trajectory mining, Spatial Clustering, marine intelligent navigation, maritime traffic network extraction, sailing routes extraction.

7547 A Mobile Agent-based Clustering Data Fusion Algorithm in WSN

Authors: Xiangbin Zhu, Wenjuan Zhang

Abstract:

In wireless sensor networks, mobile agent technology is used for data fusion. We design a node clustering algorithm based on the residual energy of the nodes and the results of partial integration. The routing strategy of the mobile agent within each cluster is then optimized to further reduce the amount of data transferred. Experiments show that using mobile agents in the intra-cluster integration process can reduce path loss to some extent.
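
A minimal sketch of the two ingredients named above, under loudly stated assumptions: each cluster elects the node with the highest residual energy as its head, and the mobile agent visits the cluster's nodes in greedy nearest-neighbour order starting from the head. Both rules and all names (elect_heads, agent_itinerary) are hypothetical illustrations, not the authors' algorithm, and partial-integration results are not modelled.

```python
import math

def elect_heads(clusters, energy):
    """clusters: cluster id -> list of node ids; energy: node id -> residual energy.

    Picks the highest-energy node in each cluster as its head (assumed rule).
    """
    return {cid: max(nodes, key=lambda n: energy[n]) for cid, nodes in clusters.items()}

def agent_itinerary(head, nodes, pos):
    """Greedy nearest-neighbour visiting order for a mobile agent starting at the head.

    pos: node id -> (x, y) coordinates; ties are broken by node id.
    """
    route, current = [head], head
    remaining = set(nodes) - {head}
    while remaining:
        nxt = min(remaining, key=lambda n: (math.dist(pos[current], pos[n]), n))
        route.append(nxt)
        remaining.remove(nxt)
        current = nxt
    return route
```

For a cluster {'A': [1, 2, 3]} with residual energies {1: 0.4, 2: 0.9, 3: 0.5} and positions {1: (0, 0), 2: (5, 0), 3: (4, 0)}, node 2 becomes the head and the agent's route is [2, 3, 1]. Shorter hops are the reason a greedy itinerary can reduce transmission cost, although it is not guaranteed to be optimal.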

Keywords: wireless sensor networks, data fusion, mobile agent

7546 Simultaneous Clustering and Feature Selection Method for Gene Expression Data

Authors: T. Chandrasekhar, K. Thangavel, E. N. Sathishkumar

Abstract:

Microarrays have made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. They are used to identify co-expressed genes in specific cells or tissues that are actively used to make proteins. This technique is used to analyze gene expression, an important task in bioinformatics research. Cluster analysis of gene expression data has proved to be a useful tool for identifying co-expressed genes and biologically relevant groupings of genes and samples. In this work, the K-Means algorithm has been applied to cluster gene expression data. Further, the rough set based Quick reduct algorithm has been applied to each cluster in order to select the most similar genes having high correlation. The ACV measure is then used to evaluate the refined clusters, and classification is used to evaluate the proposed method. The approach could identify compact clusters, with the feature selection method used to select the genes.
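
The standard rough set QuickReduct procedure greedily adds the attribute that most increases the dependency degree γ_B(D) = |POS_B(D)| / |U| until it matches the dependency of the full attribute set. The Python sketch below shows that greedy loop on already-discretized data. The discretization of expression values, the choice of decision attribute inside each cluster, and the function names are assumptions; the paper's exact per-cluster formulation is not given in the abstract.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Equivalence classes (lists of row indices) induced by the attribute subset."""
    blocks = defaultdict(list)
    for idx, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(idx)
    return blocks.values()

def dependency(rows, decisions, attrs):
    """gamma_B(D): fraction of objects whose B-class has a single decision value."""
    pos = sum(len(block) for block in partition(rows, attrs)
              if len({decisions[i] for i in block}) == 1)
    return pos / len(rows)

def quick_reduct(rows, decisions):
    """Greedy QuickReduct over attribute indices of discretized rows."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, decisions, all_attrs)
    reduct = []
    while dependency(rows, decisions, reduct) < target:
        # Add the attribute giving the largest dependency gain (first one on ties).
        best = max((a for a in all_attrs if a not in reduct),
                   key=lambda a: dependency(rows, decisions, reduct + [a]))
        reduct.append(best)
    return reduct
```

On the toy table rows = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1)] with decisions ['x', 'x', 'y', 'y'], attribute 0 alone already determines the decision, so the reduct is [0]. Because QuickReduct is greedy, it finds a subset with full dependency but not necessarily a minimal one.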

Keywords: Clustering, Feature selection, Gene expression data, Quick reduct.

7545 Initializing K-Means using Genetic Algorithms

Authors: Bashar Al-Shboul, Sung-Hyon Myaeng

Abstract:

K-Means (KM) is considered one of the major algorithms widely used in clustering. However, it still has some problems; one of them lies in its initialization step, which is normally done randomly. Another problem is that KM converges to local minima. Genetic algorithms are evolutionary algorithms inspired by nature and are utilized in the field of clustering. In this paper, we propose two algorithms to solve the initialization problem: Genetic Algorithm Initializes KM (GAIK) and KM Initializes Genetic Algorithm (KIGA). To show the effectiveness and efficiency of our algorithms, a comparative study was done among GAIK, KIGA, the Genetic-based Clustering Algorithm (GCA), and FCM [19].
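
The general "genetic algorithm initializes K-means" idea can be sketched as follows. A GA searches over sets of k data points used as candidate centroids, with the sum of squared errors (SSE) as fitness, and the best individual then seeds ordinary Lloyd iterations. This is a generic illustration, not the paper's GAIK or KIGA. The encoding (indices of data points), truncation selection, union-based crossover, the 0.2 mutation rate and the function names are all assumptions, and the sketch assumes more points than clusters.

```python
import math
import random

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)

def ga_initial_centroids(points, k, pop_size=10, generations=20, seed=0):
    """GA over k-subsets of point indices; fitness = SSE (lower is better)."""
    rng = random.Random(seed)
    n = len(points)

    def fitness(ind):
        return sse(points, [points[i] for i in ind])

    pop = [tuple(rng.sample(range(n), k)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]  # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = rng.sample(list(set(a) | set(b)), k)  # crossover: mix parents' genes
            if rng.random() < 0.2:
                # Mutation: replace one gene with an index not already in the child.
                child[rng.randrange(k)] = rng.choice([i for i in range(n) if i not in child])
            children.append(tuple(child))
        pop = parents + children
    best = min(pop, key=fitness)
    return [points[i] for i in best]

def kmeans(points, centroids, iters=100):
    """Standard Lloyd iterations starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            m = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            groups[m].append(p)
        new = [tuple(sum(x) / len(g) for x in zip(*g)) if g else c
               for g, c in zip(groups, centroids)]
        if new == centroids:
            break
        centroids = new
    return centroids
```

For example, kmeans(points, ga_initial_centroids(points, 2)) on two well-separated groups of points converges to the two group means. The GA only replaces the random initialization; K-means itself is unchanged, so the local-minimum issue is mitigated rather than eliminated.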

Keywords: Clustering, Genetic Algorithms, K-means.
