Search results for: Grid base clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1705

Search results for: Grid base clustering

1645 Clustering Unstructured Text Documents Using Fading Function

Authors: Pallav Roxy, Durga Toshniwal

Abstract:

Clustering unstructured text documents is an important issue in data mining community and has a number of applications such as document archive filtering, document organization and topic detection and subject tracing. In the real world, some of the already clustered documents may not be of importance while new documents of more significance may evolve. Most of the work done so far in clustering unstructured text documents overlooks this aspect of clustering. This paper, addresses this issue by using the Fading Function. The unstructured text documents are clustered. And for each cluster a statistics structure called Cluster Profile (CP) is implemented. The cluster profile incorporates the Fading Function. This Fading Function keeps an account of the time-dependent importance of the cluster. The work proposes a novel algorithm Clustering n-ary Merge Algorithm (CnMA) for unstructured text documents, that uses Cluster Profile and Fading Function. Experimental results illustrating the effectiveness of the proposed technique are also included.

Keywords: Clustering, Text Mining, Unstructured TextDocuments, Fading Function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1984
1644 Smart Grid Communication Architecture Modeling for Heterogeneous Network Based Advanced Metering Infrastructure

Authors: S. Prem Kumar, H. Thameemul Ansari, V. Saminadan

Abstract:

A smart grid is an emerging technology in the power delivery system which provides an intelligent, self-recovery and homeostatic grid in delivering power to the users. Smart grid communication network provides transmission capacity for information transformation within the connected nodes in the network, in favor of functional and operational needs. In the electric grids communication network delay is based on choosing the appropriate technology and the types of devices enforced. In distinction, the combination of IEEE 802.16 based WiMAX and IEEE 802.11 based WiFi technologies provides improved coverage and gives low delay performances to meet the smart grid needs. By incorporating this method in Wide Area Monitoring System (WAMS) and Advanced Metering Infrastructure (AMI) the performance of the smart grid will be considerably improved. This work deals with the implementation of WiMAX-WLAN integrated network architecture for WAMS and AMI in the smart grid.

Keywords: WiMAX, WLAN, WAMS, Smart Grid, HetNet, AMI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1016
1643 Evaluation of Clustering Based on Preprocessing in Gene Expression Data

Authors: Seo Young Kim, Toshimitsu Hamasaki

Abstract:

Microarrays have become the effective, broadly used tools in biological and medical research to address a wide range of problems, including classification of disease subtypes and tumors. Many statistical methods are available for analyzing and systematizing these complex data into meaningful information, and one of the main goals in analyzing gene expression data is the detection of samples or genes with similar expression patterns. In this paper, we express and compare the performance of several clustering methods based on data preprocessing including strategies of normalization or noise clearness. We also evaluate each of these clustering methods with validation measures for both simulated data and real gene expression data. Consequently, clustering methods which are common used in microarray data analysis are affected by normalization and degree of noise and clearness for datasets.

Keywords: Gene expression, clustering, data preprocessing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1739
1642 A Context-Aware based Authorization System for Pervasive Grid Computing

Authors: Marilyn Lim Chien Hui, Nabil Elmarzouqi, Chan Huah Yong

Abstract:

This paper describes the authorization system architecture for Pervasive Grid environment. It discusses the characteristics of classical authorization system and requirements of the authorization system in pervasive grid environment as well. Based on our analysis of current systems and taking into account the main requirements of such pervasive environment, we propose new authorization system architecture as an extension of the existing grid authorization mechanisms. This architecture not only supports user attributes but also context attributes which act as a key concept for context-awareness thought. The architecture allows authorization of users dynamically when there are changes in the pervasive grid environment. For this, we opt for hybrid authorization method that integrates push and pull mechanisms to combine the existing grid authorization attributes with dynamic context assertions. We will investigate the proposed architecture using a real testing environment that includes heterogeneous pervasive grid infrastructures mapped over multiple virtual organizations. Various scenarios are described in the last section of the article to strengthen the proposed mechanism with different facilities for the authorization procedure.

Keywords: Pervasive Grid, Authorization System, Contextawareness, Ubiquity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2152
1641 DCBOR: A Density Clustering Based on Outlier Removal

Authors: A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, M. A. Ramadan

Abstract:

Data clustering is an important data exploration technique with many applications in data mining. We present an enhanced version of the well known single link clustering algorithm. We will refer to this algorithm as DCBOR. The proposed algorithm alleviates the chain effect by removing the outliers from the given dataset. So this algorithm provides outlier detection and data clustering simultaneously. This algorithm does not need to update the distance matrix, since the algorithm depends on merging the most k-nearest objects in one step and the cluster continues grow as long as possible under specified condition. So the algorithm consists of two phases; at the first phase, it removes the outliers from the input dataset. At the second phase, it performs the clustering process. This algorithm discovers clusters of different shapes, sizes, densities and requires only one input parameter; this parameter represents a threshold for outlier points. The value of the input parameter is ranging from 0 to 1. The algorithm supports the user in determining an appropriate value for it. We have tested this algorithm on different datasets contain outlier and connecting clusters by chain of density points, and the algorithm discovers the correct clusters. The results of our experiments demonstrate the effectiveness and the efficiency of DCBOR.

Keywords: Data Clustering, Clustering Algorithms, Handling Noise, Arbitrary Shape of Clusters.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1932
1640 Grouping-Based Job Scheduling Model In Grid Computing

Authors: Vishnu Kant Soni, Raksha Sharma, Manoj Kumar Mishra

Abstract:

Grid computing is a high performance computing environment to solve larger scale computational applications. Grid computing contains resource management, job scheduling, security problems, information management and so on. Job scheduling is a fundamental and important issue in achieving high performance in grid computing systems. However, it is a big challenge to design an efficient scheduler and its implementation. In Grid Computing, there is a need of further improvement in Job Scheduling algorithm to schedule the light-weight or small jobs into a coarse-grained or group of jobs, which will reduce the communication time, processing time and enhance resource utilization. This Grouping strategy considers the processing power, memory-size and bandwidth requirements of each job to realize the real grid system. The experimental results demonstrate that the proposed scheduling algorithm efficiently reduces the processing time of jobs in comparison to others.

Keywords: Grid computing, Job grouping and Jobscheduling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1946
1639 Automatic Clustering of Gene Ontology by Genetic Algorithm

Authors: Razib M. Othman, Safaai Deris, Rosli M. Illias, Zalmiyah Zakaria, Saberi M. Mohamad

Abstract:

Nowadays, Gene Ontology has been used widely by many researchers for biological data mining and information retrieval, integration of biological databases, finding genes, and incorporating knowledge in the Gene Ontology for gene clustering. However, the increase in size of the Gene Ontology has caused problems in maintaining and processing them. One way to obtain their accessibility is by clustering them into fragmented groups. Clustering the Gene Ontology is a difficult combinatorial problem and can be modeled as a graph partitioning problem. Additionally, deciding the number k of clusters to use is not easily perceived and is a hard algorithmic problem. Therefore, an approach for solving the automatic clustering of the Gene Ontology is proposed by incorporating cohesion-and-coupling metric into a hybrid algorithm consisting of a genetic algorithm and a split-and-merge algorithm. Experimental results and an example of modularized Gene Ontology in RDF/XML format are given to illustrate the effectiveness of the algorithm.

Keywords: Automatic clustering, cohesion-and-coupling metric, gene ontology; genetic algorithm, split-and-merge algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1954
1638 Study on a Nested Cartesian Grid Method

Authors: Yih-Ferng Peng

Abstract:

In this paper, the local grid refinement is focused by using a nested grid technique. The Cartesian grid numerical method is developed for simulating unsteady, viscous, incompressible flows with complex immersed boundaries. A finite volume method is used in conjunction with a two-step fractional-step procedure. The key aspects that need to be considered in developing such a nested grid solver are imposition of interface conditions on the inter-block and accurate discretization of the governing equation in cells that are with the inter-block as a control surface. A new interpolation procedure is presented which allows systematic development of a spatial discretization scheme that preserves the spatial accuracy of the underlying solver. The present nested grid method has been tested by two numerical examples to examine its performance in the two dimensional problems. The numerical examples include flow past a circular cylinder symmetrically installed in a Channel and flow past two circular cylinders with different diameters. From the numerical experiments, the ability of the solver to simulate flows with complicated immersed boundaries is demonstrated and the nested grid approach can efficiently speed up the numerical solutions.

Keywords: local grid refinement, Cartesian grid, nested grid, fractional-step method.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1560
1637 An Overview of Islanding Detection Methods in Photovoltaic Systems

Authors: Wei Yee Teoh, Chee Wei Tan

Abstract:

The issue of unintentional islanding in PV grid interconnection still remains as a challenge in grid-connected photovoltaic (PV) systems. This paper discusses the overview of popularly used anti-islanding detection methods, practically applied in PV grid-connected systems. Anti-islanding methods generally can be classified into four major groups, which include passive methods, active methods, hybrid methods and communication base methods. Active methods have been the preferred detection technique over the years due to very small non-detected zone (NDZ) in small scale distribution generation. Passive method is comparatively simpler than active method in terms of circuitry and operations. However, it suffers from large NDZ that significantly reduces its performance. Communication base methods inherit the advantages of active and passive methods with reduced drawbacks. Hybrid method which evolved from the combination of both active and passive methods has been proven to achieve accurate anti-islanding detection by many researchers. For each of the studied anti-islanding methods, the operation analysis is described while the advantages and disadvantages are compared and discussed. It is difficult to pinpoint a generic method for a specific application, because most of the methods discussed are governed by the nature of application and system dependent elements. This study concludes that the setup and operation cost is the vital factor for anti-islanding method selection in order to achieve minimal compromising between cost and system quality.

Keywords: Active method, hybrid method, islanding detection, passive method, photovoltaic (PV), utility method

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9758
1636 Efficient Mean Shift Clustering Using Exponential Integral Kernels

Authors: S. Sutor, R. Röhr, G. Pujolle, R. Reda

Abstract:

This paper presents a highly efficient algorithm for detecting and tracking humans and objects in video surveillance sequences. Mean shift clustering is applied on backgrounddifferenced image sequences. For efficiency, all calculations are performed on integral images. Novel corresponding exponential integral kernels are introduced to allow the application of nonuniform kernels for clustering, which dramatically increases robustness without giving up the efficiency of the integral data structures. Experimental results demonstrating the power of this approach are presented.

Keywords: Clustering, Integral Images, Kernels, Person Detection, Person Tracking, Intelligent Video Surveillance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1527
1635 A Review: Comparative Analysis of Different Categorical Data Clustering Ensemble Methods

Authors: S. Sarumathi, N. Shanthi, M. Sharmila

Abstract:

Over the past epoch a rampant amount of work has been done in the data clustering research under the unsupervised learning technique in Data mining. Furthermore several algorithms and methods have been proposed focusing on clustering different data types, representation of cluster models, and accuracy rates of the clusters. However no single clustering algorithm proves to be the most efficient in providing best results. Accordingly in order to find the solution to this issue a new technique, called Cluster ensemble method was bloomed. This cluster ensemble is a good alternative approach for facing the cluster analysis problem. The main hope of the cluster ensemble is to merge different clustering solutions in such a way to achieve accuracy and to improve the quality of individual data clustering. Due to the substantial and unremitting development of new methods in the sphere of data mining and also the incessant interest in inventing new algorithms, makes obligatory to scrutinize a critical analysis of the existing techniques and the future novelty. This paper exposes the comparative study of different cluster ensemble methods along with their features, systematic working process and the average accuracy and error rates of each ensemble methods. Consequently this speculative and comprehensive analysis will be very useful for the community of clustering practitioners and also helps in deciding the most suitable one to rectify the problem in hand.

Keywords: Clustering, Cluster Ensemble methods, Co-association matrix, Consensus function, Median partition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2602
1634 Chemical Reaction Algorithm for Expectation Maximization Clustering

Authors: Li Ni, Pen ManMan, Li KenLi

Abstract:

Clustering is an intensive research for some years because of its multifaceted applications, such as biology, information retrieval, medicine, business and so on. The expectation maximization (EM) is a kind of algorithm framework in clustering methods, one of the ten algorithms of machine learning. Traditionally, optimization of objective function has been the standard approach in EM. Hence, research has investigated the utility of evolutionary computing and related techniques in the regard. Chemical Reaction Optimization (CRO) is a recently established method. So the property embedded in CRO is used to solve optimization problems. This paper presents an algorithm framework (EM-CRO) with modified CRO operators based on EM cluster problems. The hybrid algorithm is mainly to solve the problem of initial value sensitivity of the objective function optimization clustering algorithm. Our experiments mainly take the EM classic algorithm:k-means and fuzzy k-means as an example, through the CRO algorithm to optimize its initial value, get K-means-CRO and FKM-CRO algorithm. The experimental results of them show that there is improved efficiency for solving objective function optimization clustering problems.

Keywords: Chemical reaction optimization, expectation maximization, initial, objective function clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1292
1633 A Graph-Based Approach for Placement of No-Replicated Databases in Grid

Authors: Cherif Haddad, Faouzi Ben Charrada

Abstract:

On a such wide-area environment as a Grid, data placement is an important aspect of distributed database systems. In this paper, we address the problem of initial placement of database no-replicated fragments in Grid architecture. We propose a graph based approach that considers resource restrictions. The goal is to optimize the use of computing, storage and communication resources. The proposed approach is developed in two phases: in the first phase, we perform fragment grouping using knowledge about fragments dependency and, in the second phase, we determine an efficient placement of the fragment groups on the Grid. We also show, via experimental analysis that our approach gives solutions that are close to being optimal for different databases and Grid configurations.

Keywords: Grid computing, Distributed systems, Data resourcesmanagement, Database systems, Database placement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1640
1632 Enhanced Clustering Analysis and Visualization Using Kohonen's Self-Organizing Feature Map Networks

Authors: Kasthurirangan Gopalakrishnan, Siddhartha Khaitan, Anshu Manik

Abstract:

Cluster analysis is the name given to a diverse collection of techniques that can be used to classify objects (e.g. individuals, quadrats, species etc). While Kohonen's Self-Organizing Feature Map (SOFM) or Self-Organizing Map (SOM) networks have been successfully applied as a classification tool to various problem domains, including speech recognition, image data compression, image or character recognition, robot control and medical diagnosis, its potential as a robust substitute for clustering analysis remains relatively unresearched. SOM networks combine competitive learning with dimensionality reduction by smoothing the clusters with respect to an a priori grid and provide a powerful tool for data visualization. In this paper, SOM is used for creating a toroidal mapping of two-dimensional lattice to perform cluster analysis on results of a chemical analysis of wines produced in the same region in Italy but derived from three different cultivators, referred to as the “wine recognition data" located in the University of California-Irvine database. The results are encouraging and it is believed that SOM would make an appealing and powerful decision-support system tool for clustering tasks and for data visualization.

Keywords: Artificial neural networks, cluster analysis, Kohonen maps, wine recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2121
1631 Improved Wavelet Neural Networks for Early Cancer Diagnosis Using Clustering Algorithms

Authors: Zarita Zainuddin, Ong Pauline

Abstract:

Wavelet neural networks (WNNs) have emerged as a vital alternative to the vastly studied multilayer perceptrons (MLPs) since its first implementation. In this paper, we applied various clustering algorithms, namely, K-means (KM), Fuzzy C-means (FCM), symmetry-based K-means (SBKM), symmetry-based Fuzzy C-means (SBFCM) and modified point symmetry-based K-means (MPKM) clustering algorithms in choosing the translation parameter of a WNN. These modified WNNs are further applied to the heterogeneous cancer classification using benchmark microarray data and were compared against the conventional WNN with random initialization method. Experimental results showed that a WNN classifier with the MPKM algorithm is more precise than the conventional WNN as well as the WNNs with other clustering algorithms.

Keywords: Clustering, microarray, symmetry, wavelet neural networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1615
1630 Increasing Replica Consistency Performances with Load Balancing Strategy in Data Grid Systems

Authors: Sarra Senhadji, Amar Kateb, Hafida Belbachir

Abstract:

Data replication in data grid systems is one of the important solutions that improve availability, scalability, and fault tolerance. However, this technique can also bring some involved issues such as maintaining replica consistency. Moreover, as grid environment are very dynamic some nodes can be more uploaded than the others to become eventually a bottleneck. The main idea of our work is to propose a complementary solution between replica consistency maintenance and dynamic load balancing strategy to improve access performances under a simulated grid environment.

Keywords: Consistency, replication, data grid, load balancing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2324
1629 Technical and Economic Analysis of Smart Micro-Grid Renewable Energy Systems: An Applicable Case Study

Authors: M. A. Fouad, M. A. Badr, Z. S. Abd El-Rehim, Taher Halawa, Mahmoud Bayoumi, M. M. Ibrahim

Abstract:

Renewable energy-based micro-grids are presently attracting significant consideration. The smart grid system is presently considered a reliable solution for the expected deficiency in the power required from future power systems. The purpose of this study is to determine the optimal components sizes of a micro-grid, investigating technical and economic performance with the environmental impacts. The micro grid load is divided into two small factories with electricity, both on-grid and off-grid modes are considered. The micro-grid includes photovoltaic cells, back-up diesel generator wind turbines, and battery bank. The estimated load pattern is 76 kW peak. The system is modeled and simulated by MATLAB/Simulink tool to identify the technical issues based on renewable power generation units. To evaluate system economy, two criteria are used: the net present cost and the cost of generated electricity. The most feasible system components for the selected application are obtained, based on required parameters, using HOMER simulation package. The results showed that a Wind/Photovoltaic (W/PV) on-grid system is more economical than a Wind/Photovoltaic/Diesel/Battery (W/PV/D/B) off-grid system as the cost of generated electricity (COE) is 0.266 $/kWh and 0.316 $/kWh, respectively. Considering the cost of carbon dioxide emissions, the off-grid will be competitive to the on-grid system as COE is found to be (0.256 $/kWh, 0.266 $/kWh), for on and off grid systems.

Keywords: Optimum energy systems, renewable energy sources, smart grid, micro-grid system, on- grid system, off-grid system, modeling and simulation, economical evaluation, net present value, cost of energy, environmental impacts.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2422
1628 Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure

Authors: S.Aranganayagi, K.Thangavel

Abstract:

K-Modes is an extension of K-Means clustering algorithm, developed to cluster the categorical data, where the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching or mismatching measure. Weight of attribute values contribute much in clustering; thus in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of frequency of attribute values in the cluster and in the data set. The new weighted measure is experimented with the data sets obtained from the UCI data repository. The results are compared with K-Modes and K-representative, which show that the new measure generates clusters with high purity.

Keywords: Clustering, categorical data, K-Modes, weighted dissimilarity measure

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3688
1627 Association Rule and Decision Tree based Methodsfor Fuzzy Rule Base Generation

Authors: Ferenc Peter Pach, Janos Abonyi

Abstract:

This paper focuses on the data-driven generation of fuzzy IF...THEN rules. The resulted fuzzy rule base can be applied to build a classifier, a model used for prediction, or it can be applied to form a decision support system. Among the wide range of possible approaches, the decision tree and the association rule based algorithms are overviewed, and two new approaches are presented based on the a priori fuzzy clustering based partitioning of the continuous input variables. An application study is also presented, where the developed methods are tested on the well known Wisconsin Breast Cancer classification problem.

Keywords:

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2302
1626 Significance of Splitting Method in Non-linear Grid system for the Solution of Navier-Stokes Equation

Authors: M. Zamani, O. Kahar

Abstract:

Solution to unsteady Navier-Stokes equation by Splitting method in physical orthogonal algebraic curvilinear coordinate system, also termed 'Non-linear grid system' is presented. The linear terms in Navier-Stokes equation are solved by Crank- Nicholson method while the non-linear term is solved by the second order Adams-Bashforth method. This work is meant to bring together the advantage of Splitting method as pressure-velocity solver of higher efficiency with the advantage of consuming Non-linear grid system which produce more accurate results in relatively equal number of grid points as compared to Cartesian grid. The validation of Splitting method as a solution of Navier-Stokes equation in Nonlinear grid system is done by comparison with the benchmark results for lid driven cavity flow by Ghia and some case studies including Backward Facing Step Flow Problem.

Keywords: Navier-Stokes, 'Non-linear grid system', Splitting method.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1527
1625 Growing Self Organising Map Based Exploratory Analysis of Text Data

Authors: Sumith Matharage, Damminda Alahakoon

Abstract:

Textual data plays an important role in the modern world. The possibilities of applying data mining techniques to uncover hidden information present in large volumes of text collections is immense. The Growing Self Organizing Map (GSOM) is a highly successful member of the Self Organising Map family and has been used as a clustering and visualisation tool across wide range of disciplines to discover hidden patterns present in the data. A comprehensive analysis of the GSOM’s capabilities as a text clustering and visualisation tool has so far not been published. These functionalities, namely map visualisation capabilities, automatic cluster identification and hierarchical clustering capabilities are presented in this paper and are further demonstrated with experiments on a benchmark text corpus.

Keywords: Text Clustering, Growing Self Organizing Map, Automatic Cluster Identification, Hierarchical Clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1995
1624 Towards Design of Context-Aware Sensor Grid Framework for Agriculture

Authors: Aqeel-ur-Rehman, Zubair A. Shaikh

Abstract:

This paper is to present context-aware sensor grid framework for agriculture and its design challenges. Use of sensor networks in the domain of agriculture is not new. However, due to the unavailability of any common framework, solutions that are developed in this domain are location, environment and problem dependent. Keeping the need of common framework for agriculture, Context-Aware Sensor Grid Framework is proposed. It will be helpful in developing solutions for majority of the problems related to irrigation, pesticides spray, use of fertilizers, regular monitoring of plot and yield etc. due to the capability of adjusting according to location and environment. The proposed framework is composed of three layer architecture including context-aware application layer, grid middleware layer and sensor network layer.

Keywords: Agriculture, Context-Awareness, Grid Computing, and Sensor Grid.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2574
1623 Agglomerative Hierarchical Clustering Using the Tθ Family of Similarity Measures

Authors: Salima Kouici, Abdelkader Khelladi

Abstract:

In this work, we begin with the presentation of the Tθ family of usual similarity measures concerning multidimensional binary data. Subsequently, some properties of these measures are proposed. Finally the impact of the use of different inter-elements measures on the results of the Agglomerative Hierarchical Clustering Methods is studied.

Keywords: Binary data, similarity measure, Tθ measures, Agglomerative Hierarchical Clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3444
1622 A New Evolutionary Algorithm for Cluster Analysis

Authors: B.Bahmani Firouzi, T. Niknam, M. Nayeripour

Abstract:

Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the kmeans algorithm. Solutions obtained from this technique depend on the initialization of cluster centers and the final solution converges to local minima. In order to overcome K-means algorithm shortcomings, this paper proposes a hybrid evolutionary algorithm based on the combination of PSO, SA and K-means algorithms, called PSO-SA-K, which can find better cluster partition. The performance is evaluated through several benchmark data sets. The simulation results show that the proposed algorithm outperforms previous approaches, such as PSO, SA and K-means for partitional clustering problem.

Keywords: Data clustering, Hybrid evolutionary optimization algorithm, K-means algorithm, Simulated Annealing (SA), Particle Swarm Optimization (PSO).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2276
1621 A New Algorithm for Cluster Initialization

Authors: Moth'd Belal. Al-Daoud

Abstract:

Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the k-means algorithm. Solutions obtained from this technique are dependent on the initialization of cluster centers. In this article we propose a new algorithm to initialize the clusters. The proposed algorithm is based on finding a set of medians extracted from a dimension with maximum variance. The algorithm has been applied to different data sets and good results are obtained.

Keywords: clustering, k-means, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2102
1620 Fortification for P2P Grid Computing Used for Resource Discovery

Authors: Bhawneet Singh Marwah, Rishabh Rastogi, Shinon Kochar

Abstract:

Grid computing provides an effective infrastructure for massive computation among flexible and dynamic collection of individual system for resource discovery. The major challenge for grid computing is to prevent breaches and secure the data from trespassers. To overcome such conflicts a semantic approach can be designed which will filter the access requests of peers by checking the resource description specifying the data and the metadata as factual statements. Between every node in the grid a semantic firewall as a middleware will be present The intruder will be required to present an application specifying there needs to the firewall and hence accordingly the system will grant or deny the application request.

Keywords: Grid Computing, Metadata, Semantic, Peers, Resource Discovery, Firewall.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1565
1619 Density Clustering Based On Radius of Data (DCBRD)

Authors: A.M. Fahim, A. M. Salem, F. A. Torkey, M. A. Ramadan

Abstract:

Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases rises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, a density based clustering algorithm (DCBRD) is presented, relying on a knowledge acquired from the data by dividing the data space into overlapped regions. The proposed algorithm discovers arbitrary shaped clusters, requires no input parameters and uses the same definitions of DBSCAN algorithm. We performed an experimental evaluation of the effectiveness and efficiency of it, and compared this results with that of DBSCAN. The results of our experiments demonstrate that the proposed algorithm is significantly efficient in discovering clusters of arbitrary shape and size.

Keywords: Clustering Algorithms, Arbitrary Shape of clusters, cluster Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1874
1618 Categorical Clustering By Converting Associated Information

Authors: Dongmin Cai, Stephen S-T Yau

Abstract:

Lacking an inherent “natural" dissimilarity measure between objects in categorical dataset presents special difficulties in clustering analysis. However, each categorical attributes from a given dataset provides natural probability and information in the sense of Shannon. In this paper, we proposed a novel method which heuristically converts categorical attributes to numerical values by exploiting such associated information. We conduct an experimental study with real-life categorical dataset. The experiment demonstrates the effectiveness of our approach.

Keywords: Categorical, Clustering, Converting, Information

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1359
1617 Optimization of a Triangular Fin with Variable Fin Base Thickness

Authors: Hyung Suk Kang

Abstract:

A triangular fin with variable fin base thickness is analyzed and optimized using a two-dimensional analytical method. The influence of fin base height and fin base thickness on the temperature in the fin is listed. For the fixed fin volumes, the maximum heat loss, the corresponding optimum fin effectiveness, fin base height and fin tip length as a function of the fin base thickness, convection characteristic number and dimensionless fin volume are represented. One of the results shows that the optimum heat loss increases whereas the corresponding optimum fin effectiveness decreases with the increase of fin volume.

Keywords: A triangular fin, Convection characteristic number, Heat loss, Fin base thickness.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4122
1616 Dynamic Clustering using Particle Swarm Optimization with Application in Unsupervised Image Classification

Authors: Mahamed G.H. Omran, Andries P Engelbrecht, Ayed Salman

Abstract:

A new dynamic clustering approach (DCPSO), based on Particle Swarm Optimization, is proposed. This approach is applied to unsupervised image classification. The proposed approach automatically determines the "optimum" number of clusters and simultaneously clusters the data set with minimal user interference. The algorithm starts by partitioning the data set into a relatively large number of clusters to reduce the effects of initial conditions. Using binary particle swarm optimization the "best" number of clusters is selected. The centers of the chosen clusters is then refined via the Kmeans clustering algorithm. The experiments conducted show that the proposed approach generally found the "optimum" number of clusters on the tested images.

Keywords: Clustering Validation, Particle Swarm Optimization, Unsupervised Clustering, Unsupervised Image Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2452