Search results for: Grid–based clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 11677

11617 Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure

Authors: S.Aranganayagi, K.Thangavel

Abstract:

K-Modes is an extension of the K-Means clustering algorithm, developed to cluster categorical data, in which the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching (mismatching) measure. The weights of attribute values contribute substantially to clustering; in this paper we therefore propose a new weighted dissimilarity measure for K-Modes, based on the ratio of the frequency of attribute values in the cluster to that in the data set. The new weighted measure is tested on data sets obtained from the UCI data repository. The results are compared with K-Modes and K-representative and show that the new measure generates clusters with high purity.
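As a rough illustration of the baseline this abstract builds on, the following sketch computes the attribute-wise mode of a cluster and the simple matching dissimilarity between each record and that mode. It shows only the unweighted measure; the paper's weighted variant is not reproduced here because the abstract does not give its formula. The function names and sample records are hypothetical.

```python
from collections import Counter


def simple_matching(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(records):
    """Return the most frequent value of each attribute across the records."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


# Hypothetical categorical records (color, size, shape).
records = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
]

mode = cluster_mode(records)
distances = [simple_matching(record, mode) for record in records]
```

In K-Modes the mode plays the role that the mean plays in K-Means: records are assigned to the cluster whose mode is least dissimilar, and modes are recomputed after each assignment pass.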

Keywords: Clustering, categorical data, K-Modes, weighted dissimilarity measure

Downloads: 3647
11616 Energy-Efficient Clustering Protocol in Wireless Sensor Networks for Healthcare Monitoring

Authors: Ebrahim Farahmand, Ali Mahani

Abstract:

Wireless sensor networks (WSNs) can facilitate continuous monitoring of patients and increase early detection of emergency conditions and diseases. High-density WSNs help to accurately monitor a remote environment by intelligently combining the data from the individual nodes. Because the energy capacity of sensors is limited, extending the lifetime and improving the reliability of WSNs are important factors in the design of these networks. Clustering strategies have been verified as effective and practical algorithms for reducing energy consumption in WSNs and can address their limitations. In this paper, an Energy-efficient Weight-based Clustering Protocol (EWCP) is presented. An artificial retina is selected as a case study of WSNs applied in body sensors. The selection of cluster heads (CHs) takes energy-efficiency parameters into account. Moreover, cluster members are selected based on their distance to the selected CHs. Compared with the other benchmark protocols, EWCP improves network lifetime significantly.

Keywords: Clustering of WSNs, healthcare monitoring, weight-based clustering, wireless sensor networks.

Downloads: 1510
11615 Intelligent Recognition of Diabetes Disease via FCM Based Attribute Weighting

Authors: Kemal Polat

Abstract:

In this paper, an attribute weighting method called fuzzy C-means clustering based attribute weighting (FCMAW) is used for classification of the Pima Indians diabetes dataset. The aims of this study are to reduce the variance within the attributes of the diabetes dataset and to improve the classification accuracy of the classifier by transforming the data from a non-linearly separable dataset into a linearly separable one. The Pima Indians diabetes dataset has two classes: normal subjects (500 instances) and diabetes subjects (268 instances). Fuzzy C-means clustering is an improved version of the K-means clustering method and is one of the most used clustering methods in data mining and machine learning applications. In the first stage of this study, fuzzy C-means clustering is used to find the centers of the attributes in the Pima Indians diabetes dataset, and the dataset is then weighted according to the ratios of the attribute means to their centers. In the second stage, after the weighting process, support vector machine (SVM) and k-nearest neighbor (k-NN) classifiers are used to classify the weighted Pima Indians diabetes dataset. Experimental results show that the proposed attribute weighting method (FCMAW) obtains very promising results in the classification of the Pima Indians diabetes dataset.

Keywords: Fuzzy C-means clustering, Fuzzy C-means clustering based attribute weighting, Pima Indians diabetes dataset, SVM.

Downloads: 1718
11614 Customer Segmentation in Foreign Trade based on Clustering Algorithms Case Study: Trade Promotion Organization of Iran

Authors: Samira Malekmohammadi Golsefid, Mehdi Ghazanfari, Somayeh Alizadeh

Abstract:

The goal of this paper is to segment countries based on the value of exports from Iran during the 14 years ending in 2005. To measure the dissimilarity among the export baskets of different countries, we define a Dissimilarity Export Basket (DEB) function and use it as the distance function in the K-means algorithm. The DEB function is defined based on the concepts of association rules and the value of export group-commodities. In this paper, a clustering quality function and the clusters' intraclass inertia are defined to, respectively, calculate the optimum number of clusters and compare the functionality of DEB with that of Euclidean distance. We have also studied the effect of importance weights in the DEB function on clustering quality. Lastly, when segmentation is completed, a designated RFM model is used to analyze the relative profitability of each cluster.

Keywords: Customers segmentation, Customer relationship management, Clustering, Data Mining

Downloads: 2258
11613 DCBOR: A Density Clustering Based on Outlier Removal

Authors: A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, M. A. Ramadan

Abstract:

Data clustering is an important data exploration technique with many applications in data mining. We present an enhanced version of the well-known single-link clustering algorithm, which we refer to as DCBOR. The proposed algorithm alleviates the chain effect by removing the outliers from the given dataset, so it provides outlier detection and data clustering simultaneously. The algorithm does not need to update the distance matrix, since it merges the k-nearest objects in one step and the cluster continues to grow as long as possible under a specified condition. The algorithm consists of two phases: in the first phase, it removes the outliers from the input dataset; in the second phase, it performs the clustering process. The algorithm discovers clusters of different shapes, sizes, and densities and requires only one input parameter, which represents a threshold for outlier points. The value of the input parameter ranges from 0 to 1, and the algorithm supports the user in determining an appropriate value for it. We have tested this algorithm on different datasets that contain outliers and clusters connected by chains of dense points, and the algorithm discovers the correct clusters. The results of our experiments demonstrate the effectiveness and the efficiency of DCBOR.

Keywords: Data Clustering, Clustering Algorithms, Handling Noise, Arbitrary Shape of Clusters.

Downloads: 1895
11612 A New Evolutionary Algorithm for Cluster Analysis

Authors: B.Bahmani Firouzi, T. Niknam, M. Nayeripour

Abstract:

Clustering is a very well-known technique in data mining. One of the most widely used clustering techniques is the K-means algorithm. Solutions obtained from this technique depend on the initialization of the cluster centers, and the final solution converges to a local minimum. To overcome these shortcomings of the K-means algorithm, this paper proposes a hybrid evolutionary algorithm based on the combination of PSO, SA and K-means algorithms, called PSO-SA-K, which can find a better cluster partition. The performance is evaluated on several benchmark data sets. The simulation results show that the proposed algorithm outperforms previous approaches, such as PSO, SA and K-means, on the partitional clustering problem.
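The initialization problem this abstract describes can be seen in a small experiment. The sketch below is a plain one-dimensional Lloyd-style K-means (not the PSO-SA-K hybrid, whose details the abstract does not give) run from two different sets of starting centers on the same made-up data. The function name, data, and starting centers are all assumptions chosen for illustration.

```python
def kmeans_1d(points, centers, iterations=100):
    """Run Lloyd's algorithm on 1-D data; return (centers, sum of squared errors)."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        updated = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if updated == centers:
            break  # converged: no center moved
        centers = updated
    sse = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, sse


# Three well-separated groups of three points each (illustrative data).
points = [0, 1, 2, 10, 11, 12, 20, 21, 22]

# Starting centers all inside the first group versus one per group.
poor_centers, poor_sse = kmeans_1d(points, [0, 1, 2])
good_centers, good_sse = kmeans_1d(points, [1, 11, 21])
```

Both runs converge, but the first stops at a partition that splits one group and merges the other two, with a much larger squared error than the second. Hybrid schemes such as the one proposed aim to escape this kind of local minimum.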

Keywords: Data clustering, Hybrid evolutionary optimization algorithm, K-means algorithm, Simulated Annealing (SA), Particle Swarm Optimization (PSO).

Downloads: 2240
11611 Chemical Reaction Algorithm for Expectation Maximization Clustering

Authors: Li Ni, Pen ManMan, Li KenLi

Abstract:

Clustering has been an area of intensive research for some years because of its multifaceted applications in fields such as biology, information retrieval, medicine, and business. Expectation maximization (EM) is an algorithmic framework used in clustering methods and one of the ten algorithms of machine learning. Traditionally, optimization of an objective function has been the standard approach in EM. Hence, research has investigated the utility of evolutionary computing and related techniques in this regard. Chemical Reaction Optimization (CRO) is a recently established method, and its properties are used here to solve optimization problems. This paper presents an algorithm framework (EM-CRO) with modified CRO operators for EM clustering problems. The hybrid algorithm mainly addresses the initial-value sensitivity of clustering algorithms based on objective function optimization. Our experiments take the classic EM algorithms k-means and fuzzy k-means as examples and use the CRO algorithm to optimize their initial values, yielding the K-means-CRO and FKM-CRO algorithms. The experimental results show improved efficiency in solving objective function optimization clustering problems.

Keywords: Chemical reaction optimization, expectation maximization, initial, objective function clustering.

Downloads: 1248
11610 A Genetic Algorithm for Clustering on Image Data

Authors: Qin Ding, Jim Gasvoda

Abstract:

Clustering is the process of subdividing an input data set into a desired number of subgroups so that members of the same subgroup are similar and members of different subgroups have diverse properties. Many heuristic algorithms have been applied to the clustering problem, which is known to be NP-hard. Genetic algorithms have been used in a wide variety of fields to perform clustering; however, the technique normally has a long running time in terms of input set size. This paper proposes an efficient genetic algorithm for clustering on very large data sets, especially image data sets. The genetic algorithm uses the most time-efficient techniques along with preprocessing of the input data set. We test our algorithm on both artificial and real image data sets, both of which are large. The experimental results show that our algorithm outperforms the k-means algorithm in terms of running time as well as the quality of the clustering.

Keywords: Clustering, data mining, genetic algorithm, image data.

Downloads: 2007
11609 Growing Self Organising Map Based Exploratory Analysis of Text Data

Authors: Sumith Matharage, Damminda Alahakoon

Abstract:

Textual data plays an important role in the modern world. The possibilities of applying data mining techniques to uncover hidden information present in large volumes of text collections are immense. The Growing Self Organizing Map (GSOM) is a highly successful member of the Self Organising Map family and has been used as a clustering and visualisation tool across a wide range of disciplines to discover hidden patterns present in the data. A comprehensive analysis of the GSOM’s capabilities as a text clustering and visualisation tool has so far not been published. These functionalities, namely map visualisation capabilities, automatic cluster identification and hierarchical clustering capabilities, are presented in this paper and are further demonstrated with experiments on a benchmark text corpus.

Keywords: Text Clustering, Growing Self Organizing Map, Automatic Cluster Identification, Hierarchical Clustering.

Downloads: 1907
11608 Resource Matching and a Matchmaking Service for an Intelligent Grid

Authors: Xin Bai, Han Yu, Yongchang Ji, Dan C. Marinescu

Abstract:

We discuss the application of matching in the area of resource discovery and resource allocation in grid computing. We present a formal definition of matchmaking, overview algorithms to evaluate different matchmaking expressions, and develop a matchmaking service for an intelligent grid environment.

Keywords: Grid, Matchmaking, Ontology

Downloads: 1539
11607 A Subtractive Clustering Based Approach for Early Prediction of Fault Proneness in Software Modules

Authors: Ramandeep S. Sidhu, Sunil Khullar, Parvinder S. Sandhu, R. P. S. Bedi, Kiranbir Kaur

Abstract:

In this paper, subtractive clustering based fuzzy inference system approach is used for early detection of faults in the function oriented software systems. This approach has been tested with real time defect datasets of NASA software projects named as PC1 and CM1. Both the code based model and joined model (combination of the requirement and code based metrics) of the datasets are used for training and testing of the proposed approach. The performance of the models is recorded in terms of Accuracy, MAE and RMSE values. The performance of the proposed approach is better in case of Joined Model. As evidenced from the results obtained it can be concluded that Clustering and fuzzy logic together provide a simple yet powerful means to model the earlier detection of faults in the function oriented software systems.

Keywords: Subtractive clustering, fuzzy inference system, fault proneness.

Downloads: 2544
11606 Dynamic Load Balancing Strategy for Grid Computing

Authors: Belabbas Yagoubi, Yahya Slimani

Abstract:

Workload and resource management are two essential functions provided at the service level of the grid software infrastructure. To improve the global throughput of these software environments, workloads have to be evenly scheduled among the available resources. To realize this goal, several load balancing strategies and algorithms have been proposed. Most strategies were developed assuming a homogeneous set of sites linked by homogeneous and fast networks. However, for computational grids we must address three new issues, namely heterogeneity, scalability and adaptability. In this paper, we propose a layered algorithm which achieves dynamic load balancing in grid computing. Based on a tree model, our algorithm has the following main features: (i) it is layered; (ii) it supports heterogeneity and scalability; and (iii) it is totally independent of any physical architecture of a grid.

Keywords: Grid computing, load balancing, workload, tree based model.

Downloads: 3077
11605 An Agent Based Dynamic Resource Scheduling Model with FCFS-Job Grouping Strategy in Grid Computing

Authors: Raksha Sharma, Vishnu Kant Soni, Manoj Kumar Mishra, Prachet Bhuyan, Utpal Chandra Dey

Abstract:

Grid computing is a group of clusters connected over high-speed networks that involves coordinating and sharing computational power, data storage and network resources operating across dynamic and geographically dispersed locations. Resource management and job scheduling are critical tasks in grid computing. Resource selection becomes challenging due to the heterogeneity and dynamic availability of resources. Job scheduling is an NP-complete problem, and different heuristics may be used to reach an optimal or near-optimal solution. This paper proposes a model for resource and job scheduling in a dynamic grid environment. The main focus is to maximize resource utilization and minimize the processing time of jobs. The grid resource selection strategy is based on a Max Heap Tree (MHT), which is well suited to large-scale applications, and the root node of the MHT is selected for job submission. A job grouping concept is used to maximize resource utilization when scheduling jobs in grid computing. The proposed resource selection model and job grouping concept are used to enhance the scalability, robustness, efficiency and load balancing ability of the grid.
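To make the root-selection idea concrete, here is a minimal sketch of choosing a resource from a max heap. It is not the paper's model: the abstract does not say which quantity orders the heap, so a single "capacity" score is assumed, and the site names and helper functions are hypothetical. Python's `heapq` module implements a min-heap, so capacities are stored negated to obtain max-heap behavior.

```python
import heapq


def build_max_heap(resources):
    """Build a max heap of (capacity, name) pairs using negated capacities."""
    heap = [(-capacity, name) for name, capacity in resources.items()]
    heapq.heapify(heap)
    return heap


def select_resource(heap):
    """Return (name, capacity) of the root, i.e. the highest-capacity resource."""
    negated_capacity, name = heap[0]
    return name, -negated_capacity


def update_root(heap, new_capacity):
    """Replace the root's capacity (e.g. after a job is submitted) and re-heapify."""
    name = heap[0][1]
    heapq.heapreplace(heap, (-new_capacity, name))


# Hypothetical resources with an assumed available-capacity score.
resources = {"site-a": 40, "site-b": 95, "site-c": 70}
heap = build_max_heap(resources)

first = select_resource(heap)   # root before any job is placed
update_root(heap, 30)           # the chosen site's capacity drops after a job
second = select_resource(heap)  # a different site is now at the root
```

Reading the root is constant time and restoring the heap after an update is logarithmic in the number of resources, which is one plausible reason a heap suits large resource pools.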

Keywords: Agent, Grid Computing, Job Grouping, Max Heap Tree (MHT), Resource Scheduling.

Downloads: 2056
11604 Towards Clustering of Web-based Document Structures

Authors: Matthias Dehmer, Frank Emmert Streib, Jürgen Kilian, Andreas Zulauf

Abstract:

Methods for organizing web data into groups in order to analyze web-based hypertext data and facilitate data availability are very important given the number of documents available online. The task of clustering web-based document structures therefore has many applications, e.g., improving information retrieval on the web, better understanding of user navigation behavior, improving the servicing of web users' requests, and increasing web information accessibility. In this paper we investigate a new approach for clustering web-based hypertexts on the basis of their graph structures. The hypertexts are represented as so-called generalized trees, which are more general than usual directed rooted trees, e.g., DOM-Trees. As an important preprocessing step, we measure the structural similarity between the generalized trees on the basis of a similarity measure d. Then, we apply agglomerative clustering to the obtained similarity matrix in order to create clusters of hypertext graph patterns representing navigation structures. In the present paper we run our approach on a data set of hypertext structures and obtain good results in Web Structure Mining. Furthermore, we outline the application of our approach in Web Usage Mining as future work.

Keywords: Clustering methods, graph-based patterns, graph similarity, hypertext structures, web structure mining

Downloads: 1472
11603 Control of Grid Connected PMSG-Based Wind Turbine System with Back-To-Back Converter Topology Using Resonant Controller

Authors: Fekkak Bouazza, Menaa Mohamed, Loukriz Abdelhamid, Krim Mohamed L.

Abstract:

This paper presents the modeling and control strategy for a grid-connected wind turbine system based on a Permanent Magnet Synchronous Generator (PMSG). The considered system is based on a back-to-back converter topology. The Grid Side Converter (GSC) achieves DC bus voltage control and unity power factor. The Machine Side Converter (MSC) ensures PMSG speed control. The PMSG is used as a variable speed generator and is connected directly to the turbine without a gearbox. Pitch angle control is not considered in this study. Further, an Optimal Tip Speed Ratio (OTSR) based MPPT control strategy is used to maximize energy efficiency regardless of wind speed variations. A filter (L) is placed between the GSC and the grid to reduce current ripple and to improve the quality of the injected power. The proposed grid-connected wind system is built in the MATLAB/Simulink environment. The simulation results show the feasibility of the proposed topology and the performance of its control strategies.

Keywords: Wind, grid, PMSG, MPPT, OTSR.

Downloads: 835
11602 A Survey of Job Scheduling and Resource Management in Grid Computing

Authors: Raksha Sharma, Vishnu Kant Soni, Manoj Kumar Mishra, Prachet Bhuyan

Abstract:

Grid computing is a form of distributed computing that involves coordinating and sharing computational power, data storage and network resources across dynamic and geographically dispersed organizations. Scheduling onto the Grid is NP-complete, so there is no best scheduling algorithm for all grid computing systems. An alternative is to select an appropriate scheduling algorithm for a given grid environment according to the characteristics of the tasks, machines and network connectivity. Job and resource scheduling is one of the key research areas in grid computing. The goal of scheduling is to achieve the highest possible system throughput and to match application needs with the available computing resources. The motivation of this survey is to encourage novice researchers in the field of grid computing, so that they can easily understand the concept of scheduling and contribute to developing more efficient scheduling algorithms. This will help interested researchers carry out further work in this area of research.

Keywords: Grid Computing, Job Scheduling, Resource Scheduling.

Downloads: 3372
11601 Dynamic Clustering using Particle Swarm Optimization with Application in Unsupervised Image Classification

Authors: Mahamed G.H. Omran, Andries P Engelbrecht, Ayed Salman

Abstract:

A new dynamic clustering approach (DCPSO), based on Particle Swarm Optimization, is proposed. This approach is applied to unsupervised image classification. The proposed approach automatically determines the "optimum" number of clusters and simultaneously clusters the data set with minimal user interference. The algorithm starts by partitioning the data set into a relatively large number of clusters to reduce the effects of initial conditions. Using binary particle swarm optimization, the "best" number of clusters is selected. The centers of the chosen clusters are then refined via the K-means clustering algorithm. The experiments conducted show that the proposed approach generally found the "optimum" number of clusters on the tested images.

Keywords: Clustering Validation, Particle Swarm Optimization, Unsupervised Clustering, Unsupervised Image Classification.

Downloads: 2416
11600 Analyzing The Effect of Variable Round Time for Clustering Approach in Wireless Sensor Networks

Authors: Vipin Pal, Girdhari Singh, R P Yadav

Abstract:

Because wireless sensor networks are energy-constrained, the energy efficiency of sensor nodes is the main design issue. Clustering of nodes is an energy-efficient approach that prolongs the lifetime of wireless sensor networks by avoiding long-distance communication. Clustering algorithms operate in rounds, and the performance of a clustering algorithm depends on the round time. A large round time consumes more of the cluster heads' energy, while a small round time causes frequent re-clustering. Existing clustering algorithms therefore apply a trade-off to the round time and calculate it from the initial parameters of the network. However, it is not appropriate to use a round time based on initial parameters throughout the network lifetime, because wireless sensor networks are dynamic in nature (nodes can be added to the network, and some nodes run out of energy). In this paper, a variable round time approach is proposed that calculates the round time from the number of active nodes remaining in the field. The proposed approach makes the clustering algorithm adaptive to network dynamics. For simulation, the approach is implemented with LEACH in NS-2, and the results show a 6% increase in network lifetime, a 7% increase in 50% node death time and a 5% improvement in the data units gathered at the base station.

Keywords: Wireless Sensor Network, Clustering, Energy Efficiency, Round Time.

Downloads: 1745
11599 Mobile Ad-Hoc Service Grid – MASGRID

Authors: Imran Ihsan, Muhammad Abdul Qadir, Nadeem Iftikhar

Abstract:

Mobile devices, which increasingly surround us in everyday life, have created a new paradigm in which they interconnect, interact and collaborate with each other. This network can be used for flexible and secure coordinated sharing. Grid computing, on the other hand, provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities. In this paper, efforts are made to map the concepts of Grid onto Ad-Hoc networks, because both exhibit similar characteristics such as Scalability, Dynamism and Heterogeneity. In this context we propose “Mobile Ad-Hoc Services Grid – MASGRID”.

Keywords: Mobile Ad-Hoc Networks, Grid Computing, Resource Discovery, Routing

Downloads: 1757
11598 Energy Efficient Clustering Algorithm with Global and Local Re-clustering for Wireless Sensor Networks

Authors: Ashanie Guanathillake, Kithsiri Samarasinghe

Abstract:

Wireless Sensor Networks consist of inexpensive, low-power sensor nodes deployed to monitor the environment and collect data. Gathering information in an energy-efficient manner is critical to prolonging the network lifetime. Clustering algorithms have the advantage of enhancing the network lifetime. Current clustering algorithms usually focus on global re-clustering and local re-clustering separately. This paper proposes a combination of these two re-clustering methods to reduce the energy consumption of the network. Furthermore, the proposed algorithm can be applied to homogeneous as well as heterogeneous wireless sensor networks. In addition, cluster head rotation happens only when the cluster head's energy drops below a dynamic threshold value computed by the algorithm. The simulation results show that the proposed algorithm prolongs the network lifetime compared to existing algorithms.

Keywords: Energy efficient, Global re-clustering, Local re-clustering, Wireless sensor networks.

Downloads: 2330
11597 Observations about the Principal Components Analysis and Data Clustering Techniques in the Study of Medical Data

Authors: Cristina G. Dascâlu, Corina Dima Cozma, Elena Carmen Cotrutz

Abstract:

The statistical analysis of medical data often requires special techniques because of the particularities of these data. Principal components analysis and data clustering are two statistical methods for data mining that are very useful in the medical field, the first as a method to decrease the number of studied parameters, and the second as a method to analyze the connections between the diagnosis and the data about the patient's condition. In this paper we investigate the implications of a specific data analysis technique: data clustering preceded by a selection of the most relevant parameters, made using principal components analysis. Our assumption was that using principal components analysis before data clustering, in order to select and classify only the most relevant parameters, would improve the accuracy of clustering. The practical results showed the opposite: the clustering accuracy decreases by a percentage approximately equal to the percentage of information loss reported by the principal components analysis.

Keywords: Data clustering, medical data, principal components analysis.

Downloads: 1462
11596 A Distributed Weighted Cluster Based Routing Protocol for Manets

Authors: Naveen Chauhan, L.K. Awasthi, Narottam chand, Vivek Katiyar, Ankit Chug

Abstract:

Mobile ad-hoc networks (MANETs) are a form of wireless networks which do not require a base station for providing network connectivity. Mobile ad-hoc networks have many characteristics which distinguish them from other wireless networks which make routing in such networks a challenging task. Cluster based routing is one of the routing schemes for MANETs in which various clusters of mobile nodes are formed with each cluster having its own clusterhead which is responsible for routing among clusters. In this paper we have proposed and implemented a distributed weighted clustering algorithm for MANETs. This approach is based on combined weight metric that takes into account several system parameters like the node degree, transmission range, energy and mobility of the nodes. We have evaluated the performance of proposed scheme through simulation in various network situations. Simulation results show that proposed scheme outperforms the original distributed weighted clustering algorithm (DWCA).

Keywords: MANETs, Clustering, Routing, Wireless Communication, Distributed Clustering

Downloads: 1852
11595 An Improved K-Means Algorithm for Gene Expression Data Clustering

Authors: Billel Kenidra, Mohamed Benmohammed

Abstract:

Data mining technique used in the field of clustering is a subject of active research and assists in biological pattern recognition and extraction of new knowledge from raw data. Clustering means the act of partitioning an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar between themselves and dissimilar to objects of other groups. Several clustering methods are based on partitional clustering. This category attempts to directly decompose the dataset into a set of disjoint clusters leading to an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. The K-Means algorithm is one of the most widely used partitional clustering techniques. Since K-Means is extremely sensitive to the initial choice of centers and a poor choice of centers may lead to a local optimum that is quite inferior to the global optimum, we propose a strategy to initiate K-Means centers. The improved K-Means algorithm is compared with the original K-Means, and the results prove how the efficiency has been significantly improved.

Keywords: Microarray data mining, biological pattern recognition, partitional clustering, k-means algorithm, centroid initialization.

Downloads: 1238
11594 An Energy-Efficient Protocol with Static Clustering for Wireless Sensor Networks

Authors: Amir Sepasi Zahmati, Bahman Abolhassani, Ali Asghar Beheshti Shirazi, Ali Shojaee Bakhtiari

Abstract:

A wireless sensor network with a large number of tiny sensor nodes can be used as an effective tool for gathering data in various situations. One of the major issues in wireless sensor networks is developing an energy-efficient routing protocol, which has a significant impact on the overall lifetime of the sensor network. In this paper, we propose a novel hierarchical routing protocol with static clustering called Energy-Efficient Protocol with Static Clustering (EEPSC). EEPSC partitions the network into static clusters, eliminates the overhead of dynamic clustering and utilizes temporary cluster heads to distribute the energy load among high-power sensor nodes, thus extending network lifetime. We have conducted simulation-based evaluations to compare the performance of EEPSC against Low-Energy Adaptive Clustering Hierarchy (LEACH). Our experimental results show that EEPSC outperforms LEACH in terms of network lifetime and power consumption minimization.

Keywords: Clustering methods, energy efficiency, routing protocol, wireless sensor networks.

Downloads: 2680
11593 Fuzzy Hierarchical Clustering Applied for Quality Estimation in Manufacturing System

Authors: Y. Q. Lv, C.K.M. Lee

Abstract:

This paper develops a quality estimation method that applies fuzzy hierarchical clustering. Quality estimation is essential to quality control and quality improvement, as a precise estimate supports sound decision-making and thereby better quality control. Normally the quality of finished products in a manufacturing system can be differentiated by quality standards. In real-life situations, the collected data may be vague and not easy to classify, and they are usually represented in terms of fuzzy numbers. Estimating the quality of a product represented by fuzzy numbers is not easy. In this research, trapezoidal fuzzy numbers are collected in the manufacturing process, and the collected data are classified into different clusters so as to obtain the estimate. Since normal hierarchical clustering methods can only be applied to real numbers, fuzzy hierarchical clustering is selected to handle this problem based on quality standards.

Keywords: Quality Estimation, Fuzzy Quality Mean, Fuzzy Hierarchical Clustering, Fuzzy Number, Manufacturing system

Downloads 1629
11592 An Efficient and Generic Hybrid Framework for High Dimensional Data Clustering

Authors: Dharmveer Singh Rajput, P. K. Singh, Mahua Bhattacharya

Abstract:

Clustering in high dimensional space is a difficult problem that recurs in many fields of science and engineering, e.g., bioinformatics, image processing, pattern recognition and data mining. In high dimensional space some of the dimensions are likely to be irrelevant, thus hiding the possible clustering. In very high dimensions it is common for all the objects in a dataset to be nearly equidistant from each other, completely masking the clusters. Hence, the performance of the clustering algorithm decreases. In this paper, we propose an algorithmic framework which combines the reduct concept of rough set theory with the k-means algorithm to remove the irrelevant dimensions in a high dimensional space and obtain appropriate clusters. Our experiment on test data shows that this framework increases the efficiency of the clustering process and the accuracy of the results.
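The claim that objects become nearly equidistant in very high dimensions can be checked with a small experiment. The sketch below is only an illustration of that effect, not part of the proposed framework: it draws uniformly random points (an assumption, as are the function name `relative_contrast` and the sample sizes) and measures how much the farthest neighbour of a query point differs from the nearest one, relative to the nearest distance.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """Return (d_max - d_min) / d_min for distances from one query point.

    Points are drawn uniformly from the unit hypercube; values close to
    zero mean all other points are almost equally far from the query.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = points[0]
    dists = [math.dist(query, p) for p in points[1:]]
    return (max(dists) - min(dists)) / min(dists)


low_dim = relative_contrast(200, 2)
high_dim = relative_contrast(200, 500)
print(f"contrast in 2 dimensions:   {low_dim:.2f}")
print(f"contrast in 500 dimensions: {high_dim:.2f}")
```

The contrast drops sharply as the dimension grows, which is why removing irrelevant dimensions before running k-means can help distance-based clustering.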

Keywords: High dimensional clustering, sub-space, k-means, rough set, discernibility matrix.

Downloads 1895
11591 Using Suffix Tree Document Representation in Hierarchical Agglomerative Clustering

Authors: Daniel I. Morariu, Radu G. Cretulescu, Lucian N. Vintan

Abstract:

In text categorization, the most widely used method for document representation is based on word frequency vectors and is called the Vector Space Model (VSM). This representation is based only on the words in the documents and therefore loses any “word context” information found in the document. In this article we compare the classical method of document representation with a method called the Suffix Tree Document Model (STDM), which represents documents in suffix tree format. For the STDM we propose a new approach to document representation and a new formula for computing the similarity between two documents. Specifically, we propose to build the suffix tree for only two documents at a time. This approach is faster, has lower memory consumption, and uses the entire document representation without resorting to methods for discarding nodes. The proposed similarity formula substantially improves clustering quality. This representation method was validated using Hierarchical Agglomerative Clustering (HAC). In this context we also examine the influence of stemming in the document preprocessing step and highlight the difference between similarity and dissimilarity measures for finding “closer” documents.
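The pairwise idea can be sketched as follows. The abstract does not state the authors' similarity formula, so the measure used here (the fraction of trie nodes traversed by both documents) is a stand-in assumption, as are the function names. For brevity the sketch uses an uncompressed word-level suffix trie rather than a compact suffix tree; it is built for exactly two documents at a time.

```python
def build_pair_trie(doc_a, doc_b):
    """Insert every word-level suffix of two documents into one trie.

    Each node records which of the two documents (0 or 1) passes through it.
    """
    root = {"children": {}, "docs": set()}
    for doc_id, text in enumerate((doc_a, doc_b)):
        words = text.lower().split()
        for start in range(len(words)):
            node = root
            for word in words[start:]:
                node = node["children"].setdefault(
                    word, {"children": {}, "docs": set()})
                node["docs"].add(doc_id)
    return root


def shared_node_similarity(doc_a, doc_b):
    """Assumed measure: shared nodes divided by all nodes in the pair trie."""
    root = build_pair_trie(doc_a, doc_b)
    total = shared = 0
    stack = list(root["children"].values())
    while stack:
        node = stack.pop()
        total += 1
        if len(node["docs"]) == 2:
            shared += 1
        stack.extend(node["children"].values())
    return shared / total if total else 0.0


print(shared_node_similarity("cat sat", "sat cat"))
```

The two example documents contain the same words, so a bag-of-words representation cannot tell them apart, whereas the trie-based measure gives 0.5 because the word order differs. That difference is the “word context” information the abstract refers to.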

Keywords: Text Clustering, Suffix tree document representation, Hierarchical Agglomerative Clustering

Downloads 1871
11590 Influence of Ambiguity Cluster on Quality Improvement in Image Compression

Authors: Safaa Al-Ali, Ahmad Shahin, Fadi Chakik

Abstract:

Image coding based on clustering provides immediate access to targeted features of interest in a high-quality decoded image. This approach is useful for intelligent devices, as well as for multimedia content-based description standards. The result of image clustering cannot be precise in some positions, especially at pixels carrying edge information, which produce ambiguity among the clusters. Even with a good enhancement operator based on PDE, the quality of the decoded image depends heavily on the clustering process. In this paper, we introduce an ambiguity cluster in image coding to represent pixels with vague properties. The presence of such a cluster preserves some details inherent to edges as well as to uncertain pixels. It is also very useful during the decoding phase, in which an anisotropic diffusion operator, such as Perona-Malik, enhances the quality of the restored image. This work also offers a comparative study to demonstrate the effectiveness of a fuzzy clustering technique in detecting the ambiguity cluster without losing much of the essential image information. Several experiments have been carried out to demonstrate the usefulness of the ambiguity concept in image compression. The coding results and the performance of the proposed algorithms are discussed in terms of the peak signal-to-noise ratio and the quantity of ambiguous pixels.
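One way to picture an ambiguity cluster is with fuzzy memberships. The sketch below is an assumption-laden illustration rather than the authors' algorithm: it uses the standard fuzzy c-means membership formula with fixed, invented cluster centres on grey-level values, and it sends a pixel to a hypothetical `AMBIGUOUS` label when its two largest memberships differ by less than an arbitrary `margin`.

```python
AMBIGUOUS = -1  # hypothetical label for the ambiguity cluster


def memberships(value, centers, m=2.0):
    """Fuzzy c-means membership of one grey level in each cluster centre."""
    dists = [abs(value - c) for c in centers]
    if 0 in dists:
        # The value coincides with a centre: full membership there.
        return [1.0 if d == 0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in dists]
    total = sum(inverse)
    return [w / total for w in inverse]


def label_pixels(values, centers, margin=0.2):
    """Assign each value to its best cluster, or to AMBIGUOUS when the two
    strongest memberships are closer than margin (an assumed rule)."""
    labels = []
    for value in values:
        u = memberships(value, centers)
        ranked = sorted(range(len(u)), key=u.__getitem__, reverse=True)
        if len(ranked) > 1 and u[ranked[0]] - u[ranked[1]] < margin:
            labels.append(AMBIGUOUS)
        else:
            labels.append(ranked[0])
    return labels


# Invented grey levels: two clear pixels and two lying between the centres.
print(label_pixels([48, 125, 205, 130], centers=[50, 200]))
```

Pixels near an edge typically sit between the two centres, so they end up in the ambiguity cluster, while pixels close to a centre keep a crisp label.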

Keywords: Ambiguity Cluster, Anisotropic Diffusion, Fuzzy Clustering, Image Compression.

Downloads 1532
11589 Clustering Approach to Unveiling Relationships between Gene Regulatory Networks

Authors: Hiba Hasan, Khalid Raza

Abstract:

Reverse engineering of a genetic regulatory network involves modeling the given gene expression data in the form of a network. Computationally, it is possible to infer the relationships between genes, the so-called gene regulatory networks (GRNs), which can help in developing genomics- and proteomics-based diagnostic approaches for diseases. In this paper, a clustering-based method is used to reconstruct a genetic regulatory network from time-series gene expression data. A supercoiled data set from Escherichia coli is used to demonstrate the proposed method.

Keywords: Gene expression, gene regulatory networks (GRNs), clustering, data preprocessing, network visualization.

Downloads 2106
11588 Metadata Update Mechanism Improvements in Data Grid

Authors: S. Farokhzad, M. Reza Salehnamadi

Abstract:

Grid environments aggregate geographically distributed resources. Grids come in three types: computational, data, and storage. This paper presents research on data grids. A data grid is used to provide and secure access to data from many heterogeneous sources. Users need not worry about where the data is located, provided that they can access it. Metadata is used to access data in a data grid. At present, the application metadata catalogue and the SRB middleware package are used in data grids for metadata management. In this paper, updating, streamlining, and searching are made possible simultaneously and rapidly by classifying the tables that hold the metadata and converting each table into several tables. The most appropriate division is determined according to the specific application. As a result of this technique, some requests can be executed concurrently and pipelined execution becomes possible.

Keywords: Grids, data grid, metadata, update.

Downloads 1659