Search results for: spectral clustering.
765 Scattering Operator and Spectral Clustering for Ultrasound Images: Application on Deep Venous Thrombi
Authors: Thibaud Berthomier, Ali Mansour, Luc Bressollette, Frédéric Le Roy, Dominique Mottier, Léo Fréchier, Barthélémy Hermenault
Abstract:
Deep Venous Thrombosis (DVT) occurs when a thrombus is formed within a deep vein (most often in the legs). This disease can be deadly if a part or the whole thrombus reaches the lung and causes a Pulmonary Embolism (PE). This disorder, often asymptomatic, has multifactorial causes: immobilization, surgery, pregnancy, age, cancers, and genetic variations. Our project aims to relate the thrombus epidemiology (origins, patient predispositions, PE) to its structure using ultrasound images. Ultrasonography and elastography were collected using Toshiba Aplio 500 at Brest Hospital. This manuscript compares two classification approaches: spectral clustering and scattering operator. The former is based on the graph and matrix theories while the latter cascades wavelet convolutions with nonlinear modulus and averaging operators.Keywords: Deep venous thrombosis, ultrasonography, elastography, scattering operator, wavelet, spectral clustering.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1178764 Visualization and Indexing of Spectral Databases
Authors: Tibor Kulcsar, Gabor Sarossy, Gabor Bereznai, Robert Auer, Janos Abonyi
Abstract:
On-line (near infrared) spectroscopy is widely used to support the operation of complex process systems. Information extracted from spectral database can be used to estimate unmeasured product properties and monitor the operation of the process. These techniques are based on looking for similar spectra by nearest neighborhood algorithms and distance based searching methods. Search for nearest neighbors in the spectral space is an NP-hard problem, the computational complexity increases by the number of points in the discrete spectrum and the number of samples in the database. To reduce the calculation time some kind of indexing could be used. The main idea presented in this paper is to combine indexing and visualization techniques to reduce the computational requirement of estimation algorithms by providing a two dimensional indexing that can also be used to visualize the structure of the spectral database. This 2D visualization of spectral database does not only support application of distance and similarity based techniques but enables the utilization of advanced clustering and prediction algorithms based on the Delaunay tessellation of the mapped spectral space. This means the prediction has not to use the high dimension space but can be based on the mapped space too. The results illustrate that the proposed method is able to segment (cluster) spectral databases and detect outliers that are not suitable for instance based learning algorithms.
Keywords: indexing high dimensional databases, dimensional reduction, clustering, similarity, k-nn algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1769763 Fuzzy Types Clustering for Microarray Data
Authors: Seo Young Kim, Tai Myong Choi
Abstract:
The main goal of microarray experiments is to quantify the expression of every object on a slide as precisely as possible, with a further goal of clustering the objects. Recently, many studies have discussed clustering issues involving similar patterns of gene expression. This paper presents an application of fuzzy-type methods for clustering DNA microarray data that can be applied to typical comparisons. Clustering and analyses were performed on microarray and simulated data. The results show that fuzzy-possibility c-means clustering substantially improves the findings obtained by others.Keywords: Clustering, microarray data, Fuzzy-type clustering, Validation
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1522762 The Ratios between the Spectral Norm, the Numerical Radius and the Spectral Radius
Authors: Kui Du
Abstract:
Recently, Uhlig [Numer. Algorithms, 52(3):335-353, 2009] proposed open questions about the ratios between the spectral norm, the numerical radius and the spectral radius of a square matrix. In this note, we provide some observations to answer these questions.
Keywords: Spectral norm, Numerical radius, Spectral radius, Ratios
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1826761 Using Spectral Vectors and M-Tree for Graph Clustering and Searching in Graph Databases of Protein Structures
Authors: Do Phuc, Nguyen Thi Kim Phung
Abstract:
In this paper, we represent protein structure by using graph. A protein structure database will become a graph database. Each graph is represented by a spectral vector. We use Jacobi rotation algorithm to calculate the eigenvalues of the normalized Laplacian representation of adjacency matrix of graph. To measure the similarity between two graphs, we calculate the Euclidean distance between two graph spectral vectors. To cluster the graphs, we use M-tree with the Euclidean distance to cluster spectral vectors. Besides, M-tree can be used for graph searching in graph database. Our proposal method was tested with graph database of 100 graphs representing 100 protein structures downloaded from Protein Data Bank (PDB) and we compare the result with the SCOP hierarchical structure.Keywords: Eigenvalues, m-tree, graph database, protein structure, spectra graph theory.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1657760 Similarity Measures and Weighted Fuzzy C-Mean Clustering Algorithm
Authors: Bainian Li, Kongsheng Zhang, Jian Xu
Abstract:
In this paper we study the fuzzy c-mean clustering algorithm combined with principal components method. Demonstratively analysis indicate that the new clustering method is well rather than some clustering algorithms. We also consider the validity of clustering method.
Keywords: FCM algorithm, Principal Components Analysis, Clustervalidity
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1725759 Grid-based Supervised Clustering - GBSC
Authors: Pornpimol Bungkomkhun, Surapong Auwatanamongkol
Abstract:
This paper presents a supervised clustering algorithm, namely Grid-Based Supervised Clustering (GBSC), which is able to identify clusters of any shapes and sizes without presuming any canonical form for data distribution. The GBSC needs no prespecified number of clusters, is insensitive to the order of the input data objects, and is capable of handling outliers. Built on the combination of grid-based clustering and density-based clustering, under the assistance of the downward closure property of density used in bottom-up subspace clustering, the GBSC can notably reduce its search space to avoid the memory confinement situation during its execution. On two-dimension synthetic datasets, the GBSC can identify clusters with different shapes and sizes correctly. The GBSC also outperforms other five supervised clustering algorithms when the experiments are performed on some UCI datasets.Keywords: supervised clustering, grid-based clustering, subspace clustering
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1610758 Exponential Particle Swarm Optimization Approach for Improving Data Clustering
Authors: Neveen I. Ghali, Nahed El-Dessouki, Mervat A. N., Lamiaa Bakrawi
Abstract:
In this paper we use exponential particle swarm optimization (EPSO) to cluster data. Then we compare between (EPSO) clustering algorithm which depends on exponential variation for the inertia weight and particle swarm optimization (PSO) clustering algorithm which depends on linear inertia weight. This comparison is evaluated on five data sets. The experimental results show that EPSO clustering algorithm increases the possibility to find the optimal positions as it decrease the number of failure. Also show that (EPSO) clustering algorithm has a smaller quantization error than (PSO) clustering algorithm, i.e. (EPSO) clustering algorithm more accurate than (PSO) clustering algorithm.Keywords: Particle swarm optimization, data clustering, exponential PSO.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1691757 A Comparison of Fuzzy Clustering Algorithms to Cluster Web Messages
Authors: Sara El Manar El Bouanani, Ismail Kassou
Abstract:
Our objective in this paper is to propose an approach capable of clustering web messages. The clustering is carried out by assigning, with a certain probability, texts written by the same web user to the same cluster based on Stylometric features and using fuzzy clustering algorithms. Focus in the present work is on comparing the most popular algorithms in fuzzy clustering theory namely, Fuzzy C-means, Possibilistic C-means and Fuzzy Possibilistic C-Means.
Keywords: Authorship detection, fuzzy clustering, profiling, stylometric features.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2053756 Analysis of Diverse Clustering Tools in Data Mining
Authors: S. Sarumathi, N. Shanthi, M. Sharmila
Abstract:
Clustering in data mining is an unsupervised learning technique of aggregating the data objects into meaningful groups such that the intra cluster similarity of objects are maximized and inter cluster similarity of objects are minimized. Over the past decades several clustering tools were emerged in which clustering algorithms are inbuilt and are easier to use and extract the expected results. Data mining mainly deals with the huge databases that inflicts on cluster analysis and additional rigorous computational constraints. These challenges pave the way for the emergence of powerful expansive data mining clustering softwares. In this survey, a variety of clustering tools used in data mining are elucidated along with the pros and cons of each software.
Keywords: Cluster Analysis, Clustering Algorithms, Clustering Techniques, Association, Visualization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2202755 Hierarchical Clustering Algorithms in Data Mining
Authors: Z. Abdullah, A. R. Hamdan
Abstract:
Clustering is a process of grouping objects and data into groups of clusters to ensure that data objects from the same cluster are identical to each other. Clustering algorithms in one of the area in data mining and it can be classified into partition, hierarchical, density based and grid based. Therefore, in this paper we do survey and review four major hierarchical clustering algorithms called CURE, ROCK, CHAMELEON and BIRCH. The obtained state of the art of these algorithms will help in eliminating the current problems as well as deriving more robust and scalable algorithms for clustering.Keywords: Clustering, method, algorithm, hierarchical, survey.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3378754 A Survey: Clustering Ensembles Techniques
Authors: Reza Ghaemi , Md. Nasir Sulaiman , Hamidah Ibrahim , Norwati Mustapha
Abstract:
The clustering ensembles combine multiple partitions generated by different clustering algorithms into a single clustering solution. Clustering ensembles have emerged as a prominent method for improving robustness, stability and accuracy of unsupervised classification solutions. So far, many contributions have been done to find consensus clustering. One of the major problems in clustering ensembles is the consensus function. In this paper, firstly, we introduce clustering ensembles, representation of multiple partitions, its challenges and present taxonomy of combination algorithms. Secondly, we describe consensus functions in clustering ensembles including Hypergraph partitioning, Voting approach, Mutual information, Co-association based functions and Finite mixture model, and next explain their advantages, disadvantages and computational complexity. Finally, we compare the characteristics of clustering ensembles algorithms such as computational complexity, robustness, simplicity and accuracy on different datasets in previous techniques.Keywords: Clustering Ensembles, Combinational Algorithm, Consensus Function, Unsupervised Classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3454753 Ontology-based Concept Weighting for Text Documents
Authors: Hmway Hmway Tar, Thi Thi Soe Nyaunt
Abstract:
Documents clustering become an essential technology with the popularity of the Internet. That also means that fast and high-quality document clustering technique play core topics. Text clustering or shortly clustering is about discovering semantically related groups in an unstructured collection of documents. Clustering has been very popular for a long time because it provides unique ways of digesting and generalizing large amounts of information. One of the issues of clustering is to extract proper feature (concept) of a problem domain. The existing clustering technology mainly focuses on term weight calculation. To achieve more accurate document clustering, more informative features including concept weight are important. Feature Selection is important for clustering process because some of the irrelevant or redundant feature may misguide the clustering results. To counteract this issue, the proposed system presents the concept weight for text clustering system developed based on a k-means algorithm in accordance with the principles of ontology so that the important of words of a cluster can be identified by the weight values. To a certain extent, it has resolved the semantic problem in specific areas.Keywords: Clustering, Concept Weight, Document clustering, Feature Selection, Ontology
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2406752 A PSO-based End-Member Selection Method for Spectral Unmixing of Multispectral Satellite Images
Authors: Mahamed G.H. Omran, Andries P Engelbrecht, Ayed Salman
Abstract:
An end-member selection method for spectral unmixing that is based on Particle Swarm Optimization (PSO) is developed in this paper. The algorithm uses the K-means clustering algorithm and a method of dynamic selection of end-members subsets to find the appropriate set of end-members for a given set of multispectral images. The proposed algorithm has been successfully applied to test image sets from various platforms such as LANDSAT 5 MSS and NOAA's AVHRR. The experimental results of the proposed algorithm are encouraging. The influence of different values of the algorithm control parameters on performance is studied. Furthermore, the performance of different versions of PSO is also investigated.
Keywords: End-members selection, multispectral satellite imagery, particle swarm optimization, spectral unmixing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2102751 Journey on Image Clustering Based on Color Composition
Authors: Achmad Nizar Hidayanto, Elisabeth Martha Koeanan
Abstract:
Image clustering is a process of grouping images based on their similarity. The image clustering usually uses the color component, texture, edge, shape, or mixture of two components, etc. This research aims to explore image clustering using color composition. In order to complete this image clustering, three main components should be considered, which are color space, image representation (feature extraction), and clustering method itself. We aim to explore which composition of these factors will produce the best clustering results by combining various techniques from the three components. The color spaces use RGB, HSV, and L*a*b* method. The image representations use Histogram and Gaussian Mixture Model (GMM), whereas the clustering methods use KMeans and Agglomerative Hierarchical Clustering algorithm. The results of the experiment show that GMM representation is better combined with RGB and L*a*b* color space, whereas Histogram is better combined with HSV. The experiments also show that K-Means is better than Agglomerative Hierarchical for images clustering.Keywords: Image clustering, feature extraction, RGB, HSV, L*a*b*, Gaussian Mixture Model (GMM), histogram, Agglomerative Hierarchical Clustering (AHC), K-Means, Expectation-Maximization (EM).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2206750 A Reconfigurable Distributed Multiagent System Optimized for Scalability
Authors: Summiya Moheuddin, Afzel Noore, Muhammad Choudhry
Abstract:
This paper proposes a novel solution for optimizing the size and communication overhead of a distributed multiagent system without compromising the performance. The proposed approach addresses the challenges of scalability especially when the multiagent system is large. A modified spectral clustering technique is used to partition a large network into logically related clusters. Agents are assigned to monitor dedicated clusters rather than monitor each device or node. The proposed scalable multiagent system is implemented using JADE (Java Agent Development Environment) for a large power system. The performance of the proposed topologyindependent decentralized multiagent system and the scalable multiagent system is compared by comprehensively simulating different fault scenarios. The time taken for reconfiguration, the overall computational complexity, and the communication overhead incurred are computed. The results of these simulations show that the proposed scalable multiagent system uses fewer agents efficiently, makes faster decisions to reconfigure when a fault occurs, and incurs significantly less communication overhead.Keywords: Multiagent system, scalable design, spectral clustering, reconfiguration.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1381749 Multi-Agent Systems for Intelligent Clustering
Authors: Jung-Eun Park, Kyung-Whan Oh
Abstract:
Intelligent systems are required in order to quickly and accurately analyze enormous quantities of data in the Internet environment. In intelligent systems, information extracting processes can be divided into supervised learning and unsupervised learning. This paper investigates intelligent clustering by unsupervised learning. Intelligent clustering is the clustering system which determines the clustering model for data analysis and evaluates results by itself. This system can make a clustering model more rapidly, objectively and accurately than an analyzer. The methodology for the automatic clustering intelligent system is a multi-agent system that comprises a clustering agent and a cluster performance evaluation agent. An agent exchanges information about clusters with another agent and the system determines the optimal cluster number through this information. Experiments using data sets in the UCI Machine Repository are performed in order to prove the validity of the system.
Keywords: Intelligent Clustering, Multi-Agent System, PCA, SOM, VC(Variance Criterion)
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1727748 Sample-Weighted Fuzzy Clustering with Regularizations
Authors: Miin-Shen Yang, Yee-Shan Pan
Abstract:
Although there have been many researches in cluster analysis to consider on feature weights, little effort is made on sample weights. Recently, Yu et al. (2011) considered a probability distribution over a data set to represent its sample weights and then proposed sample-weighted clustering algorithms. In this paper, we give a sample-weighted version of generalized fuzzy clustering regularization (GFCR), called the sample-weighted GFCR (SW-GFCR). Some experiments are considered. These experimental results and comparisons demonstrate that the proposed SW-GFCR is more effective than the most clustering algorithms.
Keywords: Clustering; fuzzy c-means, fuzzy clustering, sample weights, regularization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1767747 Application of a New Hybrid Optimization Algorithm on Cluster Analysis
Authors: T. Niknam, M. Nayeripour, B.Bahmani Firouzi
Abstract:
Clustering techniques have received attention in many areas including engineering, medicine, biology and data mining. The purpose of clustering is to group together data points, which are close to one another. The K-means algorithm is one of the most widely used techniques for clustering. However, K-means has two shortcomings: dependency on the initial state and convergence to local optima and global solutions of large problems cannot found with reasonable amount of computation effort. In order to overcome local optima problem lots of studies done in clustering. This paper is presented an efficient hybrid evolutionary optimization algorithm based on combining Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO), called PSO-ACO, for optimally clustering N object into K clusters. The new PSO-ACO algorithm is tested on several data sets, and its performance is compared with those of ACO, PSO and K-means clustering. The simulation results show that the proposed evolutionary optimization algorithm is robust and suitable for handing data clustering.
Keywords: Ant Colony Optimization (ACO), Data clustering, Hybrid evolutionary optimization algorithm, K-means clustering, Particle Swarm Optimization (PSO).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2199746 A Similarity Measure for Clustering and its Applications
Authors: Guadalupe J. Torres, Ram B. Basnet, Andrew H. Sung, Srinivas Mukkamala, Bernardete M. Ribeiro
Abstract:
This paper introduces a measure of similarity between two clusterings of the same dataset produced by two different algorithms, or even the same algorithm (K-means, for instance, with different initializations usually produce different results in clustering the same dataset). We then apply the measure to calculate the similarity between pairs of clusterings, with special interest directed at comparing the similarity between various machine clusterings and human clustering of datasets. The similarity measure thus can be used to identify the best (in terms of most similar to human) clustering algorithm for a specific problem at hand. Experimental results pertaining to the text categorization problem of a Portuguese corpus (wherein a translation-into-English approach is used) are presented, as well as results on the well-known benchmark IRIS dataset. The significance and other potential applications of the proposed measure are discussed.Keywords: Clustering Algorithms, Clustering Applications, Similarity Measures, Text Clustering
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1573745 Clustering in WSN Based on Minimum Spanning Tree Using Divide and Conquer Approach
Authors: Uttam Vijay, Nitin Gupta
Abstract:
Due to heavy energy constraints in WSNs clustering is an efficient way to manage the energy in sensors. There are many methods already proposed in the area of clustering and research is still going on to make clustering more energy efficient. In our paper we are proposing a minimum spanning tree based clustering using divide and conquer approach. The MST based clustering was first proposed in 1970’s for large databases. Here we are taking divide and conquer approach and implementing it for wireless sensor networks with the constraints attached to the sensor networks. This Divide and conquer approach is implemented in a way that we don’t have to construct the whole MST before clustering but we just find the edge which will be the part of the MST to a corresponding graph and divide the graph in clusters there itself if that edge from the graph can be removed judging on certain constraints and hence saving lot of computation.
Keywords: Algorithm, Clustering, Edge-Weighted Graph, Weighted-LEACH.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2477744 Minimal Spanning Tree based Fuzzy Clustering
Authors: Ágnes Vathy-Fogarassy, Balázs Feil, János Abonyi
Abstract:
Most of fuzzy clustering algorithms have some discrepancies, e.g. they are not able to detect clusters with convex shapes, the number of the clusters should be a priori known, they suffer from numerical problems, like sensitiveness to the initialization, etc. This paper studies the synergistic combination of the hierarchical and graph theoretic minimal spanning tree based clustering algorithm with the partitional Gath-Geva fuzzy clustering algorithm. The aim of this hybridization is to increase the robustness and consistency of the clustering results and to decrease the number of the heuristically defined parameters of these algorithms to decrease the influence of the user on the clustering results. For the analysis of the resulted fuzzy clusters a new fuzzy similarity measure based tool has been presented. The calculated similarities of the clusters can be used for the hierarchical clustering of the resulted fuzzy clusters, which information is useful for cluster merging and for the visualization of the clustering results. As the examples used for the illustration of the operation of the new algorithm will show, the proposed algorithm can detect clusters from data with arbitrary shape and does not suffer from the numerical problems of the classical Gath-Geva fuzzy clustering algorithm.Keywords: Clustering, fuzzy clustering, minimal spanning tree, cluster validity, fuzzy similarity.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2410743 Using Data Clustering in Oral Medicine
Authors: Fahad Shahbaz Khan, Rao Muhammad Anwer, Olof Torgersson
Abstract:
The vast amount of information hidden in huge databases has created tremendous interests in the field of data mining. This paper examines the possibility of using data clustering techniques in oral medicine to identify functional relationships between different attributes and classification of similar patient examinations. Commonly used data clustering algorithms have been reviewed and as a result several interesting results have been gathered.Keywords: Oral Medicine, Cluto, Data Clustering, Data Mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1978742 New Approach to Spectral Analysis of High Bit Rate PCM Signals
Authors: J. P. Dubois
Abstract:
Pulse code modulation is a widespread technique in digital communication with significant impact on existing modern and proposed future communication technologies. Its widespread utilization is due to its simplicity and attractive spectral characteristics. In this paper, we present a new approach to the spectral analysis of PCM signals using Riemann-Stieltjes integrals, which is very accurate for high bit rates. This approach can serve as a model for similar spectral analysis of other competing modulation schemes.Keywords: Coding, discrete Fourier, power spectral density, pulse code modulation, Riemann-Stieltjes integrals.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1592741 A Genetic Algorithm for Clustering on Image Data
Authors: Qin Ding, Jim Gasvoda
Abstract:
Clustering is the process of subdividing an input data set into a desired number of subgroups so that members of the same subgroup are similar and members of different subgroups have diverse properties. Many heuristic algorithms have been applied to the clustering problem, which is known to be NP Hard. Genetic algorithms have been used in a wide variety of fields to perform clustering, however, the technique normally has a long running time in terms of input set size. This paper proposes an efficient genetic algorithm for clustering on very large data sets, especially on image data sets. The genetic algorithm uses the most time efficient techniques along with preprocessing of the input data set. We test our algorithm on both artificial and real image data sets, both of which are of large size. The experimental results show that our algorithm outperforms the k-means algorithm in terms of running time as well as the quality of the clustering.
Keywords: Clustering, data mining, genetic algorithm, image data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2053740 A Modified Fuzzy C-Means Algorithm for Natural Data Exploration
Authors: Binu Thomas, Raju G., Sonam Wangmo
Abstract:
In Data mining, Fuzzy clustering algorithms have demonstrated advantage over crisp clustering algorithms in dealing with the challenges posed by large collections of vague and uncertain natural data. This paper reviews concept of fuzzy logic and fuzzy clustering. The classical fuzzy c-means algorithm is presented and its limitations are highlighted. Based on the study of the fuzzy c-means algorithm and its extensions, we propose a modification to the cmeans algorithm to overcome the limitations of it in calculating the new cluster centers and in finding the membership values with natural data. The efficiency of the new modified method is demonstrated on real data collected for Bhutan-s Gross National Happiness (GNH) program.Keywords: Adaptive fuzzy clustering, clustering, fuzzy logic, fuzzy clustering, c-means.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1992739 A Text Clustering System based on k-means Type Subspace Clustering and Ontology
Authors: Liping Jing, Michael K. Ng, Xinhua Yang, Joshua Zhexue Huang
Abstract:
This paper presents a text clustering system developed based on a k-means type subspace clustering algorithm to cluster large, high dimensional and sparse text data. In this algorithm, a new step is added in the k-means clustering process to automatically calculate the weights of keywords in each cluster so that the important words of a cluster can be identified by the weight values. For understanding and interpretation of clustering results, a few keywords that can best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed to the WordNet to identify the set of noun words and consolidate the synonymy and hyponymy words. Experimental results have shown that the clustering algorithm is superior to the other subspace clustering algorithms, such as PROCLUS and HARP and kmeans type algorithm, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selection of the words to represent the topics of the clusters.
Keywords: Subspace Clustering, Text Mining, Feature Weighting, Cluster Interpretation, Ontology
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2462738 ISC–Intelligent Subspace Clustering, A Density Based Clustering Approach for High Dimensional Dataset
Authors: Sunita Jahirabadkar, Parag Kulkarni
Abstract:
Many real-world data sets consist of a very high dimensional feature space. Most clustering techniques use the distance or similarity between objects as a measure to build clusters. But in high dimensional spaces, distances between points become relatively uniform. In such cases, density based approaches may give better results. Subspace Clustering algorithms automatically identify lower dimensional subspaces of the higher dimensional feature space in which clusters exist. In this paper, we propose a new clustering algorithm, ISC – Intelligent Subspace Clustering, which tries to overcome three major limitations of the existing state-of-art techniques. ISC determines the input parameter such as є – distance at various levels of Subspace Clustering which helps in finding meaningful clusters. The uniform parameters approach is not suitable for different kind of databases. ISC implements dynamic and adaptive determination of Meaningful clustering parameters based on hierarchical filtering approach. Third and most important feature of ISC is the ability of incremental learning and dynamic inclusion and exclusions of subspaces which lead to better cluster formation.
Keywords: Density based clustering, high dimensional data, subspace clustering, dynamic parameter setting.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2018737 Energy Efficient Clustering Algorithm with Global and Local Re-clustering for Wireless Sensor Networks
Authors: Ashanie Guanathillake, Kithsiri Samarasinghe
Abstract:
Wireless Sensor Networks consist of inexpensive, low power sensor nodes deployed to monitor the environment and collect data. Gathering information in an energy efficient manner is a critical aspect to prolong the network lifetime. Clustering algorithms have an advantage of enhancing the network lifetime. Current clustering algorithms usually focus on global re-clustering and local re-clustering separately. This paper, proposed a combination of those two reclustering methods to reduce the energy consumption of the network. Furthermore, the proposed algorithm can apply to homogeneous as well as heterogeneous wireless sensor networks. In addition, the cluster head rotation happens, only when its energy drops below a dynamic threshold value computed by the algorithm. The simulation result shows that the proposed algorithm prolong the network lifetime compared to existing algorithms.
Keywords: Energy efficient, Global re-clustering, Local re-clustering, Wireless sensor networks.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2370736 Observations about the Principal Components Analysis and Data Clustering Techniques in the Study of Medical Data
Authors: Cristina G. Dascâlu, Corina Dima Cozma, Elena Carmen Cotrutz
Abstract:
The medical data statistical analysis often requires the using of some special techniques, because of the particularities of these data. The principal components analysis and the data clustering are two statistical methods for data mining very useful in the medical field, the first one as a method to decrease the number of studied parameters, and the second one as a method to analyze the connections between diagnosis and the data about the patient-s condition. In this paper we investigate the implications obtained from a specific data analysis technique: the data clustering preceded by a selection of the most relevant parameters, made using the principal components analysis. Our assumption was that, using the principal components analysis before data clustering - in order to select and to classify only the most relevant parameters – the accuracy of clustering is improved, but the practical results showed the opposite fact: the clustering accuracy decreases, with a percentage approximately equal with the percentage of information loss reported by the principal components analysis.Keywords: Data clustering, medical data, principal components analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1502