Search results for: Sequential block clustering.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1109

1079 Application of a New Hybrid Optimization Algorithm on Cluster Analysis

Authors: T. Niknam, M. Nayeripour, B. Bahmani Firouzi

Abstract:

Clustering techniques have received attention in many areas, including engineering, medicine, biology and data mining. The purpose of clustering is to group together data points that are close to one another. The K-means algorithm is one of the most widely used clustering techniques. However, K-means has two shortcomings: its result depends on the initial state and it converges to local optima, and global solutions of large problems cannot be found with a reasonable amount of computational effort. Many studies have sought to overcome the local optima problem in clustering. This paper presents an efficient hybrid evolutionary optimization algorithm that combines Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO), called PSO-ACO, for optimally clustering N objects into K clusters. The new PSO-ACO algorithm is tested on several data sets, and its performance is compared with those of ACO, PSO and K-means clustering. The simulation results show that the proposed evolutionary optimization algorithm is robust and suitable for handling data clustering.
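
The dependence of K-means on its initial state can be seen on a toy example. The sketch below is plain Lloyd-style K-means on one-dimensional data, not the PSO-ACO algorithm of the paper; the function name `kmeans_1d` and the sample points are assumptions made for illustration only.

```python
def kmeans_1d(points, centers, iters=100):
    """Plain Lloyd-style K-means on 1-D data; returns (centers, sse)."""
    centers = list(centers)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            groups[idx].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its old center).
        new_centers = [sum(g) / len(g) if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    # Sum of squared distances from each point to its nearest center.
    sse = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, sse


points = [0, 1, 10, 11, 20, 21]          # three obvious pairs
good = kmeans_1d(points, [0, 10, 20])    # one initial center per pair
bad = kmeans_1d(points, [0, 1, 15])      # two initial centers in the same pair
print(good)  # ([0.5, 10.5, 20.5], 1.5)
print(bad)   # ([0.0, 1.0, 15.5], 101.0)
```

Both runs converge, but the second one is stuck in a local optimum with a much larger squared error, which is the kind of behaviour global search heuristics aim to avoid.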

Keywords: Ant Colony Optimization (ACO), Data clustering, Hybrid evolutionary optimization algorithm, K-means clustering, Particle Swarm Optimization (PSO).

Downloads: 2157

1078 A Similarity Measure for Clustering and its Applications

Authors: Guadalupe J. Torres, Ram B. Basnet, Andrew H. Sung, Srinivas Mukkamala, Bernardete M. Ribeiro

Abstract:

This paper introduces a measure of similarity between two clusterings of the same dataset produced by two different algorithms, or even by the same algorithm (K-means, for instance, usually produces different results on the same dataset when started from different initializations). We then apply the measure to calculate the similarity between pairs of clusterings, with special interest in comparing various machine clusterings with human clustering of datasets. The similarity measure can thus be used to identify the best clustering algorithm (the one most similar to human clustering) for a specific problem at hand. Experimental results on the text categorization problem for a Portuguese corpus (using a translation-into-English approach) are presented, as well as results on the well-known benchmark IRIS dataset. The significance and other potential applications of the proposed measure are discussed.
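
The abstract does not define the proposed measure, so the sketch below uses the classical Rand index as a stand-in to show what comparing two clusterings of the same dataset looks like in practice. The function name `rand_index` is an assumption, and this is not the measure introduced in the paper.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two
    points together, or both put them apart. Cluster label names are
    irrelevant, only the grouping matters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # fewer than two points: nothing to disagree on
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Same grouping under different label names scores 1.0.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```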

Keywords: Clustering Algorithms, Clustering Applications, Similarity Measures, Text Clustering

Downloads: 1520

1077 Conditions for Fault Recovery of Interconnected Asynchronous Sequential Machines with State Feedback

Authors: Jung-Min Yang

Abstract:

In this paper, fault recovery for parallel interconnected asynchronous sequential machines is studied. An adversarial input can infiltrate one of the two submachines that make up the parallel composition of the considered asynchronous sequential machine, causing an unauthorized state transition. The control objective is to elucidate the condition for the existence of a corrective controller that makes the closed-loop system immune to any occurrence of adversarial inputs. In particular, an efficient existence condition is presented that does not need the complete model of the interconnected asynchronous sequential machine.

Keywords: Asynchronous sequential machines, parallel composition, corrective control, fault tolerance.

Downloads: 800

1076 Clustering in WSN Based on Minimum Spanning Tree Using Divide and Conquer Approach

Authors: Uttam Vijay, Nitin Gupta

Abstract:

Because of the tight energy constraints in WSNs, clustering is an efficient way to manage energy in sensors. Many clustering methods have already been proposed, and research continues on making clustering more energy efficient. In this paper we propose minimum spanning tree (MST) based clustering using a divide and conquer approach. MST based clustering was first proposed in the 1970s for large databases. Here we take a divide and conquer approach and implement it for wireless sensor networks, subject to the constraints attached to sensor networks. The divide and conquer approach is implemented so that we do not have to construct the whole MST before clustering: we only find an edge that will be part of the MST of the corresponding graph and, if that edge can be removed under certain constraints, divide the graph into clusters at that point, saving a lot of computation.
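
For context, classical MST based clustering can be sketched as follows: run Kruskal's algorithm and stop once k components remain, which gives the same grouping as building the full MST and cutting its k-1 heaviest edges. This is the textbook baseline, not the authors' divide and conquer variant; the function name `mst_clusters` and the Euclidean edge weights are assumptions.

```python
from itertools import combinations
from math import dist


def mst_clusters(points, k):
    """Group points into k clusters by stopping Kruskal's algorithm early."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All candidate edges of the complete graph, shortest first.
    edges = sorted(combinations(range(len(points)), 2),
                   key=lambda e: dist(points[e[0]], points[e[1]]))
    components = len(points)
    for i, j in edges:
        if components <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:            # edge joins two components: it is an MST edge
            parent[rj] = ri
            components -= 1

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


nodes = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
print(mst_clusters(nodes, 2))  # [0, 0, 0, 1, 1]
```

Sorting every pairwise edge costs O(n^2 log n), which is the kind of up-front work the abstract's approach tries to avoid.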

Keywords: Algorithm, Clustering, Edge-Weighted Graph, Weighted-LEACH.

Downloads: 2421

1075 Minimal Spanning Tree based Fuzzy Clustering

Authors: Ágnes Vathy-Fogarassy, Balázs Feil, János Abonyi

Abstract:

Most fuzzy clustering algorithms have shortcomings: for example, they cannot detect clusters with non-convex shapes, the number of clusters must be known a priori, and they suffer from numerical problems such as sensitivity to initialization. This paper studies the synergistic combination of the hierarchical, graph-theoretic minimal spanning tree based clustering algorithm with the partitional Gath-Geva fuzzy clustering algorithm. The aim of this hybridization is to increase the robustness and consistency of the clustering results and to reduce the number of heuristically defined parameters of these algorithms, and thereby the influence of the user on the clustering results. For the analysis of the resulting fuzzy clusters, a new tool based on a fuzzy similarity measure is presented. The calculated cluster similarities can be used for hierarchical clustering of the resulting fuzzy clusters, which is useful for cluster merging and for visualizing the clustering results. As the examples illustrating the new algorithm show, the proposed algorithm can detect clusters of arbitrary shape and does not suffer from the numerical problems of the classical Gath-Geva fuzzy clustering algorithm.

Keywords: Clustering, fuzzy clustering, minimal spanning tree, cluster validity, fuzzy similarity.

Downloads: 2351

1074 Using Data Clustering in Oral Medicine

Authors: Fahad Shahbaz Khan, Rao Muhammad Anwer, Olof Torgersson

Abstract:

The vast amount of information hidden in huge databases has created tremendous interest in the field of data mining. This paper examines the possibility of using data clustering techniques in oral medicine to identify functional relationships between different attributes and to classify similar patient examinations. Commonly used data clustering algorithms have been reviewed, and as a result several interesting results have been gathered.

Keywords: Oral Medicine, Cluto, Data Clustering, Data Mining.

Downloads: 1930

1073 A Genetic Algorithm for Clustering on Image Data

Authors: Qin Ding, Jim Gasvoda

Abstract:

Clustering is the process of subdividing an input data set into a desired number of subgroups so that members of the same subgroup are similar and members of different subgroups have diverse properties. Many heuristic algorithms have been applied to the clustering problem, which is known to be NP-hard. Genetic algorithms have been used in a wide variety of fields to perform clustering; however, the technique normally has a long running time in terms of input set size. This paper proposes an efficient genetic algorithm for clustering very large data sets, especially image data sets. The genetic algorithm uses the most time-efficient techniques along with preprocessing of the input data set. We test our algorithm on both artificial and real image data sets, both of which are large. The experimental results show that our algorithm outperforms the k-means algorithm in terms of running time as well as the quality of the clustering.

Keywords: Clustering, data mining, genetic algorithm, image data.

Downloads: 2000

1072 A Modified Fuzzy C-Means Algorithm for Natural Data Exploration

Authors: Binu Thomas, Raju G., Sonam Wangmo

Abstract:

In data mining, fuzzy clustering algorithms have demonstrated an advantage over crisp clustering algorithms in dealing with the challenges posed by large collections of vague and uncertain natural data. This paper reviews the concepts of fuzzy logic and fuzzy clustering. The classical fuzzy c-means algorithm is presented and its limitations are highlighted. Based on the study of the fuzzy c-means algorithm and its extensions, we propose a modification to the c-means algorithm that overcomes its limitations in calculating the new cluster centers and in finding the membership values with natural data. The efficiency of the modified method is demonstrated on real data collected for Bhutan's Gross National Happiness (GNH) program.

Keywords: Adaptive fuzzy clustering, clustering, fuzzy logic, fuzzy clustering, c-means.

Downloads: 1932

1071 A Text Clustering System based on k-means Type Subspace Clustering and Ontology

Authors: Liping Jing, Michael K. Ng, Xinhua Yang, Joshua Zhexue Huang

Abstract:

This paper presents a text clustering system based on a k-means type subspace clustering algorithm for clustering large, high dimensional and sparse text data. In this algorithm, a new step is added to the k-means clustering process to automatically calculate the weights of keywords in each cluster, so that the important words of a cluster can be identified by their weight values. To support understanding and interpretation of the clustering results, a few keywords that best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed to WordNet to identify the set of nouns and to consolidate synonyms and hyponyms. Experimental results have shown that the clustering algorithm is superior to other subspace clustering algorithms, such as PROCLUS and HARP, and to k-means type algorithms, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selecting words that represent the topics of the clusters.

Keywords: Subspace Clustering, Text Mining, Feature Weighting, Cluster Interpretation, Ontology

Downloads: 2412

1070 Vortex Formation in Lid-driven Cavity with Disturbance Block

Authors: Maysam Saidi, Hassan Basirat Tabrizi, Reza Maddahian

Abstract:

In this paper, numerical simulations are performed to investigate the effect of a disturbance block on the flow field of the classical square lid-driven cavity. Attention is focused on vortex formation and on the effect of block position on vortex structure. Corner vortices differ depending on block position, and new vortices are produced because of the block. The finite volume method is used to solve the Navier-Stokes equations, and the PISO algorithm is employed to couple velocity and pressure. Verification and grid independence of the results are reported. Streamlines are sketched to visualize the vortex structure for different block positions.

Keywords: Disturbance Block, Finite Volume Method, Lid-Driven Cavity

Downloads: 1815

1069 ISC–Intelligent Subspace Clustering, A Density Based Clustering Approach for High Dimensional Dataset

Authors: Sunita Jahirabadkar, Parag Kulkarni

Abstract:

Many real-world data sets consist of a very high dimensional feature space. Most clustering techniques use the distance or similarity between objects as a measure to build clusters. But in high dimensional spaces, distances between points become relatively uniform. In such cases, density based approaches may give better results. Subspace clustering algorithms automatically identify lower dimensional subspaces of the higher dimensional feature space in which clusters exist. In this paper, we propose a new clustering algorithm, ISC (Intelligent Subspace Clustering), which tries to overcome three major limitations of existing state-of-the-art techniques. ISC determines input parameters such as the ε-distance at various levels of subspace clustering, which helps in finding meaningful clusters. A uniform-parameter approach is not suitable for different kinds of databases, so ISC implements dynamic and adaptive determination of meaningful clustering parameters based on a hierarchical filtering approach. The third and most important feature of ISC is its ability to learn incrementally and to include and exclude subspaces dynamically, which leads to better cluster formation.

Keywords: Density based clustering, high dimensional data, subspace clustering, dynamic parameter setting.

Downloads: 1973

1068 Energy Efficient Clustering Algorithm with Global and Local Re-clustering for Wireless Sensor Networks

Authors: Ashanie Guanathillake, Kithsiri Samarasinghe

Abstract:

Wireless Sensor Networks consist of inexpensive, low power sensor nodes deployed to monitor the environment and collect data. Gathering information in an energy efficient manner is critical to prolonging the network lifetime. Clustering algorithms have the advantage of enhancing the network lifetime. Current clustering algorithms usually focus on global re-clustering and local re-clustering separately. This paper proposes a combination of those two re-clustering methods to reduce the energy consumption of the network. Furthermore, the proposed algorithm can be applied to homogeneous as well as heterogeneous wireless sensor networks. In addition, cluster head rotation happens only when the cluster head's energy drops below a dynamic threshold value computed by the algorithm. The simulation results show that the proposed algorithm prolongs the network lifetime compared to existing algorithms.

Keywords: Energy efficient, Global re-clustering, Local re-clustering, Wireless sensor networks.

Downloads: 2327

1067 Observations about the Principal Components Analysis and Data Clustering Techniques in the Study of Medical Data

Authors: Cristina G. Dascâlu, Corina Dima Cozma, Elena Carmen Cotrutz

Abstract:

The statistical analysis of medical data often requires special techniques because of the particularities of these data. Principal components analysis and data clustering are two statistical data mining methods that are very useful in the medical field: the first as a way to reduce the number of studied parameters, and the second as a way to analyze the connections between diagnosis and the data about the patient's condition. In this paper we investigate the implications of a specific data analysis technique: data clustering preceded by a selection of the most relevant parameters, made using principal components analysis. Our assumption was that using principal components analysis before data clustering, in order to select and classify only the most relevant parameters, would improve the accuracy of clustering. The practical results showed the opposite: the clustering accuracy decreases by a percentage approximately equal to the percentage of information loss reported by the principal components analysis.

Keywords: Data clustering, medical data, principal components analysis.

Downloads: 1456

1066 An Efficient and Generic Hybrid Framework for High Dimensional Data Clustering

Authors: Dharmveer Singh Rajput, P. K. Singh, Mahua Bhattacharya

Abstract:

Clustering in high dimensional space is a difficult problem that recurs in many fields of science and engineering, e.g., bioinformatics, image processing, pattern recognition and data mining. In high dimensional space some of the dimensions are likely to be irrelevant, thus hiding the possible clustering. In very high dimensions it is common for all the objects in a dataset to be nearly equidistant from each other, completely masking the clusters. Hence, the performance of the clustering algorithm decreases. In this paper, we propose an algorithmic framework that combines the reduct concept of rough set theory with the k-means algorithm to remove the irrelevant dimensions in a high dimensional space and obtain appropriate clusters. Our experiment on test data shows that this framework increases the efficiency of the clustering process and the accuracy of the results.

Keywords: High dimensional clustering, sub-space, k-means, rough set, discernibility matrix.

Downloads: 1885

1065 Iterative Clustering Algorithm for Analyzing Temporal Patterns of Gene Expression

Authors: Seo Young Kim, Jae Won Lee, Jong Sung Bae

Abstract:

Microarray experiments are information rich; however, extensive data mining is required to identify the patterns that characterize the underlying mechanisms of action. For biologists, a key aim when analyzing microarray data is to group genes based on the temporal patterns of their expression levels. In this paper, we used an iterative clustering method to find temporal patterns of gene expression. We evaluated the performance of this method by applying it to real sporulation data and simulated data. The patterns obtained using the iterative clustering were found to be superior to those obtained using existing clustering algorithms.

Keywords: Clustering, microarray experiment, temporal pattern of gene expression data.

Downloads: 1313

1064 On Finite Wordlength Properties of Block-Floating-Point Arithmetic

Authors: Abhijit Mitra

Abstract:

A special case of floating point data representation is the block floating point format, in which a block of operands is forced to share a joint exponent term. This paper deals with the finite wordlength properties of this data format. The theoretical errors associated with the error model for the block floating point quantization process are investigated with the help of error distribution functions. A fast and easy approximation formula for calculating the signal-to-noise ratio in quantization to block floating point format is derived. This representation is found to be a useful compromise between fixed point and floating point formats because of its acceptable numerical error properties over a wide dynamic range.
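
The idea of a joint exponent can be illustrated with a minimal sketch. It assumes signed integer mantissas of `mantissa_bits` bits and an exponent taken from the largest magnitude in the block; the function names and the rounding and clamping choices are assumptions, not the error model analysed in the paper.

```python
import math


def bfp_quantize(block, mantissa_bits=8):
    """Quantize a block of floats to integer mantissas with one shared exponent."""
    peak = max(abs(x) for x in block)
    if peak == 0.0:
        return [0] * len(block), 0
    # frexp gives peak = m * 2**exponent with 0.5 <= m < 1,
    # so every |x| / 2**exponent in the block is below 1.
    _, exponent = math.frexp(peak)
    scale = 2 ** (mantissa_bits - 1)   # one bit is reserved for the sign
    limit = scale - 1                  # clamp so rounding cannot overflow
    mantissas = [
        max(-limit, min(limit, round(x / 2 ** exponent * scale)))
        for x in block
    ]
    return mantissas, exponent


def bfp_dequantize(mantissas, exponent, mantissa_bits=8):
    """Rebuild approximate floats from mantissas and the shared exponent."""
    scale = 2 ** (mantissa_bits - 1)
    return [m / scale * 2 ** exponent for m in mantissas]


# Values of similar magnitude survive well...
print(bfp_quantize([0.5, -0.25, 0.125]))   # ([64, -32, 16], 0)
# ...but a small value sharing a block with a large one is rounded away.
print(bfp_dequantize(*bfp_quantize([3.0, 0.01])))  # [3.0, 0.0]
```

The second call shows the trade-off the abstract alludes to: the shared exponent follows the largest operand, so smaller operands in the same block lose precision.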

Keywords: Block floating point, Roundoff error, Block exponent distribution function, Signal factor.

Downloads: 1949

1063 Development Partitioning Intervalwise Block Method for Solving Ordinary Differential Equations

Authors: K. H. Khairul Anuar, K. I. Othman, F. Ishak, Z. B. Ibrahim, Z. Majid

Abstract:

Our aim in this paper is to solve Ordinary Differential Equations (ODEs) using the Partitioning Block Intervalwise (PBI) technique. The PBI technique is based on the Block Adams Method and the Backward Differentiation Formula (BDF). The Block Adams Method uses only simple iteration, while BDF requires Newton-like iteration involving the Jacobian matrix of the ODEs, which consumes a considerable amount of computational effort. Therefore, PBI is developed in order to reduce the cost of iteration within an acceptable maximum error.

Keywords: Adams Block Method, BDF, Ordinary Differential Equations, Partitioning Block Intervalwise

Downloads: 1624

1062 Clustering Categorical Data Using Hierarchies (CLUCDUH)

Authors: Gökhan Silahtaroğlu

Abstract:

Clustering large populations is an important problem when the data contain noise and clusters of different shapes. A good clustering algorithm or approach should be efficient enough to detect clusters sensitively. Besides space complexity, time complexity also gains importance as the data size grows. Using hierarchies, we developed a new algorithm that splits attributes according to their values and chooses the splitting dimension so as to divide the database into parts that are as equal as possible. At each node we calculate certain descriptive statistical features of the data residing there, and by pruning we generate the natural clusters with a complexity of O(n).

Keywords: Clustering, tree, split, pruning, entropy, gini.

Downloads: 1502

1061 Secure Block-Based Video Authentication with Localization and Self-Recovery

Authors: Ammar M. Hassan, Ayoub Al-Hamadi, Yassin M. Y. Hasan, Mohamed A. A. Wahab, Bernd Michaelis

Abstract:

Because of the great advance in multimedia technology, digital multimedia is vulnerable to malicious manipulations. In this paper, a public key self-recovery block-based video authentication technique is proposed which can not only precisely localize the detected alterations but also recover the missing data with high reliability. In the proposed block-based technique, multiple description coding (MDC) is used to generate two codes (two descriptions) for each block. Although one block code (one description) is enough to rebuild the altered block, the altered block is rebuilt with better quality from the two block descriptions. Using MDC therefore increases the reliability of data recovery. A block signature is computed using a cryptographic hash function, and a doubly linked chain is used to embed the block signature copies and the block descriptions into the LSBs of distant blocks and the block itself. The doubly linked chain scheme gives the proposed technique the capability to thwart vector quantization attacks. In our proposed technique, anyone can check the authenticity of a given video using the public key. The experimental results show that the proposed technique is reliable for detecting, localizing and recovering the alterations.

Keywords: Authentication, hash function, multiple description coding, public key encryption, watermarking.

Downloads: 1886

1060 Incremental Algorithm to Cluster the Categorical Data with Frequency Based Similarity Measure

Authors: S. Aranganayagi, K. Thangavel

Abstract:

Clustering categorical data is more complicated than clustering numerical data because of its special properties. Scalability and memory constraints are the challenging problems in clustering large data sets. This paper presents an incremental algorithm to cluster categorical data. The frequencies of attribute values contribute much to clustering similar categorical objects. We propose new similarity measures based on the frequencies of attribute values and their cardinalities. The proposed measures and the algorithm are evaluated on data sets from the UCI data repository. The results show that the proposed method generates better clusters than the existing one.

Keywords: Clustering, Categorical, Incremental, Frequency, Domain

Downloads: 1783

1059 A Comprehensive Review on Different Mixed Data Clustering Ensemble Methods

Authors: S. Sarumathi, N. Shanthi, S. Vidhya, M. Sharmila

Abstract:

An extensive amount of work has been done on data clustering, an unsupervised learning technique in data mining, during the past two decades. Several approaches and methods have emerged that focus on clustering diverse data types, features of cluster models and similarity rates of clusters. However, no single clustering algorithm performs best at extracting efficient clusters. To address this issue, a new technique called the Cluster Ensemble method emerged. This approach serves as an alternative method for the cluster analysis problem. The main objective of a Cluster Ensemble is to aggregate diverse clustering solutions so as to attain accuracy and to improve on the quality of the individual clustering algorithms. Given the massive and rapid development of new methods in data mining, a critical analysis of existing techniques and future directions is needed. This paper presents a comparative analysis of different cluster ensemble methods along with their methodologies and salient features. This analysis should be useful for the community of clustering experts and help in choosing the most appropriate method for the problem at hand.

Keywords: Clustering, Cluster Ensemble Methods, Coassociation matrix, Consensus Function, Median Partition.

Downloads: 2069

1058 Binary Classification Tree with Tuned Observation-based Clustering

Authors: Maythapolnun Athimethphat, Boontarika Lerteerawong

Abstract:

There are several approaches for handling multiclass classification. Aside from one-against-one (OAO) and one-against-all (OAA), hierarchical classification is also commonly used. A binary classification tree is a hierarchical classification structure that breaks down a k-class problem into binary sub-problems, each solved by a binary classifier. In each node, a set of classes is divided into two subsets. A good class partition should group similar classes together. Many algorithms measure similarity in terms of the distance between class centroids: classes are grouped together by a clustering algorithm when the distances between their centroids are small. In this paper, we present a binary classification tree with tuned observation-based clustering (BCT-TOB) that finds a class partition by performing clustering on observations instead of class centroids. A merging step is introduced to merge any insignificant class split. The experiment shows that the performance of BCT-TOB is comparable to that of other algorithms.

Keywords: multiclass classification, hierarchical classification, binary classification tree, clustering, observation-based clustering

Downloads: 1681

1057 3D Mesh Coarsening via Uniform Clustering

Authors: Shuhua Lai, Kairui Chen

Abstract:

In this paper, we present a fast and efficient mesh coarsening algorithm for 3D triangular meshes. This approach can be applied to very complex 3D meshes of arbitrary topology with millions of vertices. The algorithm is based on clustering the input mesh elements: it divides the faces of an input mesh into a given number of clusters by approximating the Centroidal Voronoi Tessellation of the input mesh. Once a clustering is achieved, it provides an efficient way to construct uniform tessellations and therefore leads to good coarsening of polygonal meshes. With the proliferation of 3D scanners, this coarsening algorithm is particularly useful for reverse engineering applications of 3D models, which in many cases are dense, non-uniform, irregular and of arbitrary topology. Examples demonstrating the effectiveness of the new algorithm are also included in the paper.

Keywords: Coarsening, mesh clustering, shape approximation, mesh simplification.

Downloads: 1363

1056 Initializing K-Means using Genetic Algorithms

Authors: Bashar Al-Shboul, Sung-Hyon Myaeng

Abstract:

K-Means (KM) is considered one of the major algorithms widely used in clustering. However, it still has some problems. One of them lies in its initialization step, which is normally done randomly. Another is that KM converges to local minima. Genetic algorithms are evolutionary algorithms inspired by nature and used in the field of clustering. In this paper, we propose two algorithms to solve the initialization problem, Genetic Algorithm Initializes KM (GAIK) and KM Initializes Genetic Algorithm (KIGA). To show the effectiveness and efficiency of our algorithms, a comparative study was done among GAIK, KIGA, Genetic-based Clustering Algorithm (GCA), and FCM [19].

Keywords: Clustering, Genetic Algorithms, K-means.

Downloads: 2044

1055 Quick Sequential Search Algorithm Used to Decode High-Frequency Matrices

Authors: Mohammed M. Siddeq, Mohammed H. Rasheed, Omar M. Salih, Marcos A. Rodrigues

Abstract:

This research proposes a data encoding and decoding method based on the Matrix Minimization algorithm. This algorithm is applied to high-frequency coefficients for compression/encoding. The algorithm starts by converting every three coefficients to a single value; this is accomplished based on three different keys. The decoding/decompression uses a search method called QSS (Quick Sequential Search) Decoding Algorithm presented in this research based on the sequential search to recover the exact coefficients. In the next step, the decoded data are saved in an auxiliary array. The basic idea behind the auxiliary array is to save all possible decoded coefficients; this is because another algorithm, such as conventional sequential search, could retrieve encoded/compressed data independently from the proposed algorithm. The experimental results showed that our proposed decoding algorithm retrieves original data faster than conventional sequential search algorithms.

Keywords: Matrix Minimization Algorithm, Decoding Sequential Search Algorithm, image compression, Discrete Cosine Transform, Discrete Wavelet Transform.

Downloads: 177

1054 Multiple-Level Sequential Pattern Discovery from Customer Transaction Databases

Authors: An Chen, Huilin Ye

Abstract:

Mining sequential patterns from large customer transaction databases has been recognized as a key research topic in database systems. However, previous work has focused mainly on mining sequential patterns at a single concept level. In this study, we introduce concept hierarchies into this problem and present several algorithms for discovering multiple-level sequential patterns based on the hierarchies. An experiment was conducted to assess the performance of the proposed algorithms, measured by the relative time spent completing the mining tasks on two different datasets. The experimental results showed that performance depends on the characteristics of the datasets and on the pre-defined threshold of minimal support for each level of the concept hierarchy. Based on the experimental results, some suggestions are also given on how to select an appropriate algorithm for a given dataset.

Keywords: Data Mining, Multiple-Level Sequential Pattern, Concept Hierarchy, Customer Transaction Database.

Downloads: 1409

1053 Implementation of RC5 Block Cipher Algorithm for Image Cryptosystems

Authors: Hossam El-din H. Ahmed, Hamdy M. Kalash, Osama S. Farag Allah

Abstract:

This paper examines the implementation of the RC5 block cipher for digital images along with a detailed security analysis. A complete specification of how the RC5 block cipher is applied to digital images is given. The security of the RC5 block cipher for digital images against entropy, brute-force, statistical, and differential attacks is explored from a strict cryptographic viewpoint. Experimental results verify that the RC5 block cipher is highly secure for real-time image encryption from a cryptographic viewpoint. Thorough experimental tests are carried out with detailed analysis, demonstrating the high security of the RC5 block cipher algorithm.
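
One common way to examine resistance to entropy attacks is to measure the Shannon entropy of the pixel values of the encrypted image. The abstract does not state which measure the authors used, so the sketch below is an assumption; shannon_entropy is a hypothetical helper that treats the image as a flat sequence of 8-bit values.

```python
import math
from collections import Counter

def shannon_entropy(pixels):
    # Entropy in bits per pixel of the empirical value distribution.
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

uniform = shannon_entropy(bytes(range(256)))   # every value appears once
constant = shannon_entropy(bytes([7] * 100))   # a single repeated value
```

For 8-bit data the maximum is 8 bits per pixel, reached when all 256 values are equally frequent; a cipher image with entropy close to that bound leaks little through its histogram.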

Keywords: Image encryption, security analysis.

Downloads 3618
1052 Applying Clustering of Hierarchical K-means-like Algorithm on Arabic Language

Authors: Sameh H. Ghwanmeh

Abstract:

In this study, a K-means-like clustering technique with a hierarchical initial set (HKM) has been implemented. The goal of this study is to show that clustering document sets enhances precision in information retrieval systems, as was shown by Bellot & El-Beze for the French language. A comparison is made between a traditional information retrieval system and the clustered one. The effect of increasing the number of clusters on precision is also studied. The indexing technique is Term Frequency * Inverse Document Frequency (TF * IDF). It has been found that Hierarchical K-Means Like clustering (HKM) with 3 clusters over 242 Arabic abstract documents from the Saudi Arabian National Computer Conference gives significant results compared with a traditional information retrieval system without clustering. Additionally, it has been found that increasing the number of clusters is not necessary to improve precision further.
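
The TF * IDF indexing step can be sketched as follows. The abstract does not give the exact weighting variant, so raw term counts, a natural-log inverse document frequency, and whitespace tokenization are assumptions here; tf_idf is a hypothetical helper name.

```python
import math
from collections import Counter

def tf_idf(documents):
    # Whitespace tokenization is a simplification; real Arabic text would
    # need proper normalization and stemming.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

weights = tf_idf(["a b a", "b c"])
```

A term that appears in every document gets weight zero, so only discriminating terms contribute to the document distances used for clustering.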

Keywords: Hierarchical K-means-like clustering (HKM), K-means, cluster centroids, initial partition, document distances.

Downloads 2531
1051 An Ant-based Clustering System for Knowledge Discovery in DNA Chip Analysis Data

Authors: Minsoo Lee, Yun-mi Kim, Yearn Jeong Kim, Yoon-kyung Lee, Hyejung Yoon

Abstract:

Biological data has several characteristics that strongly differentiate it from typical business data. It is much more complex, usually large in size, and changes continuously. Until recently, business data has been the main target for discovering trends, patterns or future expectations. However, with the recent rise in biotechnology, the powerful technology that was used for analyzing business data is now being applied to biological data. With this advanced technology at hand, the main trend in biological research is rapidly changing from structural DNA analysis to understanding the cellular functions of DNA sequences. DNA chips are now being used to perform experiments, and DNA analysis processes are being used by researchers. Clustering is one of the important processes used for grouping together similar entities. There are many clustering algorithms, such as hierarchical clustering, self-organizing maps and K-means clustering. In this paper, we propose a clustering algorithm that imitates the ecosystem, taking into account the features of biological data. We implemented the system using an Ant-Colony clustering algorithm. The system decides the number of clusters automatically. The system processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns clusters to the genes and displays the output. We tested the algorithm with test data of 100 to 1000 genes and 24 samples, and the results are promising for applying this algorithm to clustering DNA chip data.

Keywords: Ant colony system, biological data, clustering, DNA chip.

Downloads 1932
1050 Analysis of Diverse Cluster Ensemble Techniques

Authors: S. Sarumathi, N. Shanthi, P. Ranjetha

Abstract:

Data mining is the process of discovering interesting patterns in huge amounts of data. Clustering is one of the most important supporting processes for accessing data faster. Clustering is the process of identifying similarity between data according to the characteristics present in the data and grouping related data objects into clusters. A cluster ensemble is a technique that combines various runs of different clustering algorithms to obtain a general partition of the original dataset, aiming to consolidate the outcomes of a collection of individual clusterings. The performance of clustering ensembles is mainly affected by two principal factors: diversity and quality. This paper presents an overview of different cluster ensemble algorithms and the methods used to improve diversity and quality in several cluster ensemble papers, gives a comparative analysis of different cluster ensembles, and summarizes various cluster ensemble methods. This analysis should be useful to clustering experts and should help in selecting the most appropriate method for the problem at hand.
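
A common building block for combining several clustering runs is the co-association matrix, which records the fraction of runs in which two objects share a cluster; similarity-based consensus functions such as CSPA start from this kind of pairwise similarity. The sketch below is illustrative only and is not taken from the surveyed papers; co_association is a hypothetical helper, and the three label vectors are made-up example runs.

```python
def co_association(partitions):
    # partitions: list of label vectors, one per clustering run, all the
    # same length. Label values need not agree across runs.
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(partitions)
    # Fraction of runs in which objects i and j fall in the same cluster.
    return [[c / runs for c in row] for row in counts]

runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same grouping as the first run, labels permuted
    [0, 1, 1, 1],
]
matrix = co_association(runs)
```

Because only co-membership is counted, the permuted labels in the second run do not matter. A final partition can then be obtained by clustering the objects using this matrix as the similarity measure.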

Keywords: Cluster Ensemble, Consensus Function, CSPA, Diversity, HGPA, MCLA.

Downloads 1794