Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2674

Search results for: Clustering Techniques

2674 A Survey: Clustering Ensembles Techniques

Authors: Reza Ghaemi , Md. Nasir Sulaiman , Hamidah Ibrahim , Norwati Mustapha

Abstract:

The clustering ensembles combine multiple partitions generated by different clustering algorithms into a single clustering solution. Clustering ensembles have emerged as a prominent method for improving robustness, stability and accuracy of unsupervised classification solutions. So far, many contributions have been done to find consensus clustering. One of the major problems in clustering ensembles is the consensus function. In this paper, firstly, we introduce clustering ensembles, representation of multiple partitions, its challenges and present taxonomy of combination algorithms. Secondly, we describe consensus functions in clustering ensembles including Hypergraph partitioning, Voting approach, Mutual information, Co-association based functions and Finite mixture model, and next explain their advantages, disadvantages and computational complexity. Finally, we compare the characteristics of clustering ensembles algorithms such as computational complexity, robustness, simplicity and accuracy on different datasets in previous techniques.

Keywords: Clustering Ensembles, Combinational Algorithm, Consensus Function, Unsupervised Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2964
2673 Application of a New Hybrid Optimization Algorithm on Cluster Analysis

Authors: T. Niknam, M. Nayeripour, B.Bahmani Firouzi

Abstract:

Clustering techniques have received attention in many areas including engineering, medicine, biology and data mining. The purpose of clustering is to group together data points, which are close to one another. The K-means algorithm is one of the most widely used techniques for clustering. However, K-means has two shortcomings: dependency on the initial state and convergence to local optima and global solutions of large problems cannot found with reasonable amount of computation effort. In order to overcome local optima problem lots of studies done in clustering. This paper is presented an efficient hybrid evolutionary optimization algorithm based on combining Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO), called PSO-ACO, for optimally clustering N object into K clusters. The new PSO-ACO algorithm is tested on several data sets, and its performance is compared with those of ACO, PSO and K-means clustering. The simulation results show that the proposed evolutionary optimization algorithm is robust and suitable for handing data clustering.

Keywords: Ant Colony Optimization (ACO), Data clustering, Hybrid evolutionary optimization algorithm, K-means clustering, Particle Swarm Optimization (PSO).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1925
2672 Analysis of Diverse Clustering Tools in Data Mining

Authors: S. Sarumathi, N. Shanthi, M. Sharmila

Abstract:

Clustering in data mining is an unsupervised learning technique of aggregating the data objects into meaningful groups such that the intra cluster similarity of objects are maximized and inter cluster similarity of objects are minimized. Over the past decades several clustering tools were emerged in which clustering algorithms are inbuilt and are easier to use and extract the expected results. Data mining mainly deals with the huge databases that inflicts on cluster analysis and additional rigorous computational constraints. These challenges pave the way for the emergence of powerful expansive data mining clustering softwares. In this survey, a variety of clustering tools used in data mining are elucidated along with the pros and cons of each software.

Keywords: Cluster Analysis, Clustering Algorithms, Clustering Techniques, Association, Visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1893
2671 Observations about the Principal Components Analysis and Data Clustering Techniques in the Study of Medical Data

Authors: Cristina G. Dascâlu, Corina Dima Cozma, Elena Carmen Cotrutz

Abstract:

The medical data statistical analysis often requires the using of some special techniques, because of the particularities of these data. The principal components analysis and the data clustering are two statistical methods for data mining very useful in the medical field, the first one as a method to decrease the number of studied parameters, and the second one as a method to analyze the connections between diagnosis and the data about the patient-s condition. In this paper we investigate the implications obtained from a specific data analysis technique: the data clustering preceded by a selection of the most relevant parameters, made using the principal components analysis. Our assumption was that, using the principal components analysis before data clustering - in order to select and to classify only the most relevant parameters – the accuracy of clustering is improved, but the practical results showed the opposite fact: the clustering accuracy decreases, with a percentage approximately equal with the percentage of information loss reported by the principal components analysis.

Keywords: Data clustering, medical data, principal components analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1217
2670 ISC–Intelligent Subspace Clustering, A Density Based Clustering Approach for High Dimensional Dataset

Authors: Sunita Jahirabadkar, Parag Kulkarni

Abstract:

Many real-world data sets consist of a very high dimensional feature space. Most clustering techniques use the distance or similarity between objects as a measure to build clusters. But in high dimensional spaces, distances between points become relatively uniform. In such cases, density based approaches may give better results. Subspace Clustering algorithms automatically identify lower dimensional subspaces of the higher dimensional feature space in which clusters exist. In this paper, we propose a new clustering algorithm, ISC – Intelligent Subspace Clustering, which tries to overcome three major limitations of the existing state-of-art techniques. ISC determines the input parameter such as є – distance at various levels of Subspace Clustering which helps in finding meaningful clusters. The uniform parameters approach is not suitable for different kind of databases. ISC implements dynamic and adaptive determination of Meaningful clustering parameters based on hierarchical filtering approach. Third and most important feature of ISC is the ability of incremental learning and dynamic inclusion and exclusions of subspaces which lead to better cluster formation.

Keywords: Density based clustering, high dimensional data, subspace clustering, dynamic parameter setting.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1751
2669 Journey on Image Clustering Based on Color Composition

Authors: Achmad Nizar Hidayanto, Elisabeth Martha Koeanan

Abstract:

Image clustering is a process of grouping images based on their similarity. The image clustering usually uses the color component, texture, edge, shape, or mixture of two components, etc. This research aims to explore image clustering using color composition. In order to complete this image clustering, three main components should be considered, which are color space, image representation (feature extraction), and clustering method itself. We aim to explore which composition of these factors will produce the best clustering results by combining various techniques from the three components. The color spaces use RGB, HSV, and L*a*b* method. The image representations use Histogram and Gaussian Mixture Model (GMM), whereas the clustering methods use KMeans and Agglomerative Hierarchical Clustering algorithm. The results of the experiment show that GMM representation is better combined with RGB and L*a*b* color space, whereas Histogram is better combined with HSV. The experiments also show that K-Means is better than Agglomerative Hierarchical for images clustering.

Keywords: Image clustering, feature extraction, RGB, HSV, L*a*b*, Gaussian Mixture Model (GMM), histogram, Agglomerative Hierarchical Clustering (AHC), K-Means, Expectation-Maximization (EM).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1920
2668 Using Data Clustering in Oral Medicine

Authors: Fahad Shahbaz Khan, Rao Muhammad Anwer, Olof Torgersson

Abstract:

The vast amount of information hidden in huge databases has created tremendous interests in the field of data mining. This paper examines the possibility of using data clustering techniques in oral medicine to identify functional relationships between different attributes and classification of similar patient examinations. Commonly used data clustering algorithms have been reviewed and as a result several interesting results have been gathered.

Keywords: Oral Medicine, Cluto, Data Clustering, Data Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1709
2667 Fuzzy Types Clustering for Microarray Data

Authors: Seo Young Kim, Tai Myong Choi

Abstract:

The main goal of microarray experiments is to quantify the expression of every object on a slide as precisely as possible, with a further goal of clustering the objects. Recently, many studies have discussed clustering issues involving similar patterns of gene expression. This paper presents an application of fuzzy-type methods for clustering DNA microarray data that can be applied to typical comparisons. Clustering and analyses were performed on microarray and simulated data. The results show that fuzzy-possibility c-means clustering substantially improves the findings obtained by others.

Keywords: Clustering, microarray data, Fuzzy-type clustering, Validation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1256
2666 IMDC: An Image-Mapped Data Clustering Technique for Large Datasets

Authors: Faruq A. Al-Omari, Nabeel I. Al-Fayoumi

Abstract:

In this paper, we present a new algorithm for clustering data in large datasets using image processing approaches. First the dataset is mapped into a binary image plane. The synthesized image is then processed utilizing efficient image processing techniques to cluster the data in the dataset. Henceforth, the algorithm avoids exhaustive search to identify clusters. The algorithm considers only a small set of the data that contains critical boundary information sufficient to identify contained clusters. Compared to available data clustering techniques, the proposed algorithm produces similar quality results and outperforms them in execution time and storage requirements.

Keywords: Data clustering, Data mining, Image-mapping, Pattern discovery, Predictive analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1229
2665 A Genetic Algorithm for Clustering on Image Data

Authors: Qin Ding, Jim Gasvoda

Abstract:

Clustering is the process of subdividing an input data set into a desired number of subgroups so that members of the same subgroup are similar and members of different subgroups have diverse properties. Many heuristic algorithms have been applied to the clustering problem, which is known to be NP Hard. Genetic algorithms have been used in a wide variety of fields to perform clustering, however, the technique normally has a long running time in terms of input set size. This paper proposes an efficient genetic algorithm for clustering on very large data sets, especially on image data sets. The genetic algorithm uses the most time efficient techniques along with preprocessing of the input data set. We test our algorithm on both artificial and real image data sets, both of which are of large size. The experimental results show that our algorithm outperforms the k-means algorithm in terms of running time as well as the quality of the clustering.

Keywords: Clustering, data mining, genetic algorithm, image data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1780
2664 MCOKE: Multi-Cluster Overlapping K-Means Extension Algorithm

Authors: Said Baadel, Fadi Thabtah, Joan Lu

Abstract:

Clustering involves the partitioning of n objects into k clusters. Many clustering algorithms use hard-partitioning techniques where each object is assigned to one cluster. In this paper we propose an overlapping algorithm MCOKE which allows objects to belong to one or more clusters. The algorithm is different from fuzzy clustering techniques because objects that overlap are assigned a membership value of 1 (one) as opposed to a fuzzy membership degree. The algorithm is also different from other overlapping algorithms that require a similarity threshold be defined a priori which can be difficult to determine by novice users.

Keywords: Data mining, k-means, MCOKE, overlapping.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2362
2663 Similarity Measures and Weighted Fuzzy C-Mean Clustering Algorithm

Authors: Bainian Li, Kongsheng Zhang, Jian Xu

Abstract:

In this paper we study the fuzzy c-mean clustering algorithm combined with principal components method. Demonstratively analysis indicate that the new clustering method is well rather than some clustering algorithms. We also consider the validity of clustering method.

Keywords: FCM algorithm, Principal Components Analysis, Clustervalidity

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1459
2662 Grid-based Supervised Clustering - GBSC

Authors: Pornpimol Bungkomkhun, Surapong Auwatanamongkol

Abstract:

This paper presents a supervised clustering algorithm, namely Grid-Based Supervised Clustering (GBSC), which is able to identify clusters of any shapes and sizes without presuming any canonical form for data distribution. The GBSC needs no prespecified number of clusters, is insensitive to the order of the input data objects, and is capable of handling outliers. Built on the combination of grid-based clustering and density-based clustering, under the assistance of the downward closure property of density used in bottom-up subspace clustering, the GBSC can notably reduce its search space to avoid the memory confinement situation during its execution. On two-dimension synthetic datasets, the GBSC can identify clusters with different shapes and sizes correctly. The GBSC also outperforms other five supervised clustering algorithms when the experiments are performed on some UCI datasets.

Keywords: supervised clustering, grid-based clustering, subspace clustering

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1349
2661 Exponential Particle Swarm Optimization Approach for Improving Data Clustering

Authors: Neveen I. Ghali, Nahed El-Dessouki, Mervat A. N., Lamiaa Bakrawi

Abstract:

In this paper we use exponential particle swarm optimization (EPSO) to cluster data. Then we compare between (EPSO) clustering algorithm which depends on exponential variation for the inertia weight and particle swarm optimization (PSO) clustering algorithm which depends on linear inertia weight. This comparison is evaluated on five data sets. The experimental results show that EPSO clustering algorithm increases the possibility to find the optimal positions as it decrease the number of failure. Also show that (EPSO) clustering algorithm has a smaller quantization error than (PSO) clustering algorithm, i.e. (EPSO) clustering algorithm more accurate than (PSO) clustering algorithm.

Keywords: Particle swarm optimization, data clustering, exponential PSO.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1393
2660 A Comparison of Fuzzy Clustering Algorithms to Cluster Web Messages

Authors: Sara El Manar El Bouanani, Ismail Kassou

Abstract:

Our objective in this paper is to propose an approach capable of clustering web messages. The clustering is carried out by assigning, with a certain probability, texts written by the same web user to the same cluster based on Stylometric features and using fuzzy clustering algorithms. Focus in the present work is on comparing the most popular algorithms in fuzzy clustering theory namely, Fuzzy C-means, Possibilistic C-means and Fuzzy Possibilistic C-Means.

Keywords: Authorship detection, fuzzy clustering, profiling, stylometric features.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1802
2659 Hierarchical Clustering Algorithms in Data Mining

Authors: Z. Abdullah, A. R. Hamdan

Abstract:

Clustering is a process of grouping objects and data into groups of clusters to ensure that data objects from the same cluster are identical to each other. Clustering algorithms in one of the area in data mining and it can be classified into partition, hierarchical, density based and grid based. Therefore, in this paper we do survey and review four major hierarchical clustering algorithms called CURE, ROCK, CHAMELEON and BIRCH. The obtained state of the art of these algorithms will help in eliminating the current problems as well as deriving more robust and scalable algorithms for clustering.

Keywords: Clustering, method, algorithm, hierarchical, survey.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2820
2658 A New Algorithm for Cluster Initialization

Authors: Moth'd Belal. Al-Daoud

Abstract:

Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the k-means algorithm. Solutions obtained from this technique are dependent on the initialization of cluster centers. In this article we propose a new algorithm to initialize the clusters. The proposed algorithm is based on finding a set of medians extracted from a dimension with maximum variance. The algorithm has been applied to different data sets and good results are obtained.

Keywords: clustering, k-means, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1819
2657 Analysis of Diverse Cluster Ensemble Techniques

Authors: S. Sarumathi, N. Shanthi, P. Ranjetha

Abstract:

Data mining is the procedure of determining interesting patterns from the huge amount of data. With the intention of accessing the data faster the most supporting processes needed is clustering. Clustering is the process of identifying similarity between data according to the individuality present in the data and grouping associated data objects into clusters. Cluster ensemble is the technique to combine various runs of different clustering algorithms to obtain a general partition of the original dataset, aiming for consolidation of outcomes from a collection of individual clustering outcomes. The performances of clustering ensembles are mainly affecting by two principal factors such as diversity and quality. This paper presents the overview about the different cluster ensemble algorithm along with their methods used in cluster ensemble to improve the diversity and quality in the several cluster ensemble related papers and shows the comparative analysis of different cluster ensemble also summarize various cluster ensemble methods. Henceforth this clear analysis will be very useful for the world of clustering experts and also helps in deciding the most appropriate one to determine the problem in hand.

Keywords: Cluster Ensemble, Consensus Function, CSPA, Diversity, HGPA, MCLA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1279
2656 A Comprehensive Review on Different Mixed Data Clustering Ensemble Methods

Authors: S. Sarumathi, N. Shanthi, S. Vidhya, M. Sharmila

Abstract:

An extensive amount of work has been done in data clustering research under the unsupervised learning technique in Data Mining during the past two decades. Moreover, several approaches and methods have been emerged focusing on clustering diverse data types, features of cluster models and similarity rates of clusters. However, none of the single clustering algorithm exemplifies its best nature in extracting efficient clusters. Consequently, in order to rectify this issue, a new challenging technique called Cluster Ensemble method was bloomed. This new approach tends to be the alternative method for the cluster analysis problem. The main objective of the Cluster Ensemble is to aggregate the diverse clustering solutions in such a way to attain accuracy and also to improve the eminence the individual clustering algorithms. Due to the massive and rapid development of new methods in the globe of data mining, it is highly mandatory to scrutinize a vital analysis of existing techniques and the future novelty. This paper shows the comparative analysis of different cluster ensemble methods along with their methodologies and salient features. Henceforth this unambiguous analysis will be very useful for the society of clustering experts and also helps in deciding the most appropriate one to resolve the problem in hand.

Keywords: Clustering, Cluster Ensemble Methods, Coassociation matrix, Consensus Function, Median Partition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1822
2655 A New Evolutionary Algorithm for Cluster Analysis

Authors: B.Bahmani Firouzi, T. Niknam, M. Nayeripour

Abstract:

Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the kmeans algorithm. Solutions obtained from this technique depend on the initialization of cluster centers and the final solution converges to local minima. In order to overcome K-means algorithm shortcomings, this paper proposes a hybrid evolutionary algorithm based on the combination of PSO, SA and K-means algorithms, called PSO-SA-K, which can find better cluster partition. The performance is evaluated through several benchmark data sets. The simulation results show that the proposed algorithm outperforms previous approaches, such as PSO, SA and K-means for partitional clustering problem.

Keywords: Data clustering, Hybrid evolutionary optimization algorithm, K-means algorithm, Simulated Annealing (SA), Particle Swarm Optimization (PSO).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1991
2654 Ontology-based Concept Weighting for Text Documents

Authors: Hmway Hmway Tar, Thi Thi Soe Nyaunt

Abstract:

Documents clustering become an essential technology with the popularity of the Internet. That also means that fast and high-quality document clustering technique play core topics. Text clustering or shortly clustering is about discovering semantically related groups in an unstructured collection of documents. Clustering has been very popular for a long time because it provides unique ways of digesting and generalizing large amounts of information. One of the issues of clustering is to extract proper feature (concept) of a problem domain. The existing clustering technology mainly focuses on term weight calculation. To achieve more accurate document clustering, more informative features including concept weight are important. Feature Selection is important for clustering process because some of the irrelevant or redundant feature may misguide the clustering results. To counteract this issue, the proposed system presents the concept weight for text clustering system developed based on a k-means algorithm in accordance with the principles of ontology so that the important of words of a cluster can be identified by the weight values. To a certain extent, it has resolved the semantic problem in specific areas.

Keywords: Clustering, Concept Weight, Document clustering, Feature Selection, Ontology

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2166
2653 Growing Self Organising Map Based Exploratory Analysis of Text Data

Authors: Sumith Matharage, Damminda Alahakoon

Abstract:

Textual data plays an important role in the modern world. The possibilities of applying data mining techniques to uncover hidden information present in large volumes of text collections is immense. The Growing Self Organizing Map (GSOM) is a highly successful member of the Self Organising Map family and has been used as a clustering and visualisation tool across wide range of disciplines to discover hidden patterns present in the data. A comprehensive analysis of the GSOM’s capabilities as a text clustering and visualisation tool has so far not been published. These functionalities, namely map visualisation capabilities, automatic cluster identification and hierarchical clustering capabilities are presented in this paper and are further demonstrated with experiments on a benchmark text corpus.

Keywords: Text Clustering, Growing Self Organizing Map, Automatic Cluster Identification, Hierarchical Clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1705
2652 Chemical Reaction Algorithm for Expectation Maximization Clustering

Authors: Li Ni, Pen ManMan, Li KenLi

Abstract:

Clustering is an intensive research for some years because of its multifaceted applications, such as biology, information retrieval, medicine, business and so on. The expectation maximization (EM) is a kind of algorithm framework in clustering methods, one of the ten algorithms of machine learning. Traditionally, optimization of objective function has been the standard approach in EM. Hence, research has investigated the utility of evolutionary computing and related techniques in the regard. Chemical Reaction Optimization (CRO) is a recently established method. So the property embedded in CRO is used to solve optimization problems. This paper presents an algorithm framework (EM-CRO) with modified CRO operators based on EM cluster problems. The hybrid algorithm is mainly to solve the problem of initial value sensitivity of the objective function optimization clustering algorithm. Our experiments mainly take the EM classic algorithm:k-means and fuzzy k-means as an example, through the CRO algorithm to optimize its initial value, get K-means-CRO and FKM-CRO algorithm. The experimental results of them show that there is improved efficiency for solving objective function optimization clustering problems.

Keywords: Chemical reaction optimization, expectation maximization, initial, objective function clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1002
2651 Effective Keyword and Similarity Thresholds for the Discovery of Themes from the User Web Access Patterns

Authors: Haider A Ramadhan, Khalil Shihab

Abstract:

Clustering techniques have been used by many intelligent software agents to group similar access patterns of the Web users into high level themes which express users intentions and interests. However, such techniques have been mostly focusing on one salient feature of the Web document visited by the user, namely the extracted keywords. The major aim of these techniques is to come up with an optimal threshold for the number of keywords needed to produce more focused themes. In this paper we focus on both keyword and similarity thresholds to generate themes with concentrated themes, and hence build a more sound model of the user behavior. The purpose of this paper is two fold: use distance based clustering methods to recognize overall themes from the Proxy log file, and suggest an efficient cut off levels for the keyword and similarity thresholds which tend to produce more optimal clusters with better focus and efficient size.

Keywords: Data mining, knowledge discovery, clustering, dataanalysis, Web log analysis, theme based searching.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1210
2650 A Hybrid Method for Determination of Effective Poles Using Clustering Dominant Pole Algorithm

Authors: Anuj Abraham, N. Pappa, Daniel Honc, Rahul Sharma

Abstract:

In this paper, an analysis of some model order reduction techniques is presented. A new hybrid algorithm for model order reduction of linear time invariant systems is compared with the conventional techniques namely Balanced Truncation, Hankel Norm reduction and Dominant Pole Algorithm (DPA). The proposed hybrid algorithm is known as Clustering Dominant Pole Algorithm (CDPA), is able to compute the full set of dominant poles and its cluster center efficiently. The dominant poles of a transfer function are specific eigenvalues of the state space matrix of the corresponding dynamical system. The effectiveness of this novel technique is shown through the simulation results.

Keywords: Balanced truncation, Clustering, Dominant pole, Hankel norm, Model reduction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2169
2649 A K-Means Based Clustering Approach for Finding Faulty Modules in Open Source Software Systems

Authors: Parvinder S. Sandhu, Jagdeep Singh, Vikas Gupta, Mandeep Kaur, Sonia Manhas, Ramandeep Sidhu

Abstract:

Prediction of fault-prone modules provides one way to support software quality engineering. Clustering is used to determine the intrinsic grouping in a set of unlabeled data. Among various clustering techniques available in literature K-Means clustering approach is most widely being used. This paper introduces K-Means based Clustering approach for software finding the fault proneness of the Object-Oriented systems. The contribution of this paper is that it has used Metric values of JEdit open source software for generation of the rules for the categorization of software modules in the categories of Faulty and non faulty modules and thereafter empirically validation is performed. The results are measured in terms of accuracy of prediction, probability of Detection and Probability of False Alarms.

Keywords: K-Means, Software Fault, Classification, ObjectOriented Metrics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1991
2648 A Review: Comparative Analysis of Different Categorical Data Clustering Ensemble Methods

Authors: S. Sarumathi, N. Shanthi, M. Sharmila

Abstract:

Over the past epoch a rampant amount of work has been done in the data clustering research under the unsupervised learning technique in Data mining. Furthermore several algorithms and methods have been proposed focusing on clustering different data types, representation of cluster models, and accuracy rates of the clusters. However no single clustering algorithm proves to be the most efficient in providing best results. Accordingly in order to find the solution to this issue a new technique, called Cluster ensemble method was bloomed. This cluster ensemble is a good alternative approach for facing the cluster analysis problem. The main hope of the cluster ensemble is to merge different clustering solutions in such a way to achieve accuracy and to improve the quality of individual data clustering. Due to the substantial and unremitting development of new methods in the sphere of data mining and also the incessant interest in inventing new algorithms, makes obligatory to scrutinize a critical analysis of the existing techniques and the future novelty. This paper exposes the comparative study of different cluster ensemble methods along with their features, systematic working process and the average accuracy and error rates of each ensemble methods. Consequently this speculative and comprehensive analysis will be very useful for the community of clustering practitioners and also helps in deciding the most suitable one to rectify the problem in hand.

Keywords: Clustering, Cluster Ensemble methods, Co-association matrix, Consensus function, Median partition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2272
2647 Multi-Agent Systems for Intelligent Clustering

Authors: Jung-Eun Park, Kyung-Whan Oh

Abstract:

Intelligent systems are required in order to quickly and accurately analyze enormous quantities of data in the Internet environment. In intelligent systems, information extracting processes can be divided into supervised learning and unsupervised learning. This paper investigates intelligent clustering by unsupervised learning. Intelligent clustering is the clustering system which determines the clustering model for data analysis and evaluates results by itself. This system can make a clustering model more rapidly, objectively and accurately than an analyzer. The methodology for the automatic clustering intelligent system is a multi-agent system that comprises a clustering agent and a cluster performance evaluation agent. An agent exchanges information about clusters with another agent and the system determines the optimal cluster number through this information. Experiments using data sets in the UCI Machine Repository are performed in order to prove the validity of the system.

Keywords: Intelligent Clustering, Multi-Agent System, PCA, SOM, VC(Variance Criterion)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1471
2646 Sample-Weighted Fuzzy Clustering with Regularizations

Authors: Miin-Shen Yang, Yee-Shan Pan

Abstract:

Although there have been many researches in cluster analysis to consider on feature weights, little effort is made on sample weights. Recently, Yu et al. (2011) considered a probability distribution over a data set to represent its sample weights and then proposed sample-weighted clustering algorithms. In this paper, we give a sample-weighted version of generalized fuzzy clustering regularization (GFCR), called the sample-weighted GFCR (SW-GFCR). Some experiments are considered. These experimental results and comparisons demonstrate that the proposed SW-GFCR is more effective than the most clustering algorithms.

Keywords: Clustering; fuzzy c-means, fuzzy clustering, sample weights, regularization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1516
2645 An Improved K-Means Algorithm for Gene Expression Data Clustering

Authors: Billel Kenidra, Mohamed Benmohammed

Abstract:

Data mining technique used in the field of clustering is a subject of active research and assists in biological pattern recognition and extraction of new knowledge from raw data. Clustering means the act of partitioning an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar between themselves and dissimilar to objects of other groups. Several clustering methods are based on partitional clustering. This category attempts to directly decompose the dataset into a set of disjoint clusters leading to an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. The K-Means algorithm is one of the most widely used partitional clustering techniques. Since K-Means is extremely sensitive to the initial choice of centers and a poor choice of centers may lead to a local optimum that is quite inferior to the global optimum, we propose a strategy to initiate K-Means centers. The improved K-Means algorithm is compared with the original K-Means, and the results prove how the efficiency has been significantly improved.

Keywords: Microarray data mining, biological pattern recognition, partitional clustering, k-means algorithm, centroid initialization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 910