Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 240

Search results for: Clusters

60 Towards Growing Self-Organizing Neural Networks with Fixed Dimensionality

Authors: Guojian Cheng, Tianshi Liu, Jiaxin Han, Zheng Wang

Abstract:

Competitive learning is an adaptive process in which the neurons in a neural network gradually become sensitive to different input pattern clusters. The basic idea behind Kohonen's Self-Organizing Feature Maps (SOFM) is competitive learning. SOFM can generate mappings from high-dimensional signal spaces to lower-dimensional topological structures. The main features of this kind of mapping are topology preservation, feature mapping and approximation of the probability distribution of input patterns. To overcome some limitations of SOFM, e.g., a fixed number of neural units and a topology of fixed dimensionality, Growing Self-Organizing Neural Networks (GSONN) can be used. A GSONN can change its topological structure during learning: it grows by learning and shrinks by forgetting. To speed up training and convergence, a new variant of GSONN, twin growing cell structures (TGCS), is presented here. This paper first gives an introduction to competitive learning, SOFM and its variants. Then, we discuss some GSONN with fixed dimensionality, which include growing cell structures, its variants and the authors' model, TGCS. It ends with a comparison of test results and conclusions.

Keywords: Artificial Neural Networks, competitive learning, self-organizing feature maps, Growing cell structures
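
The competitive learning step mentioned in the abstract above can be illustrated with a minimal winner-take-all sketch. It is not the authors' TGCS model or a full SOFM (there is no neighbourhood function and no growing or shrinking of the network); the function names, the learning rate `lr` and the explicit `init_weights` argument are assumptions made for this illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def competitive_learning(samples, init_weights, epochs=20, lr=0.1):
    """Plain winner-take-all competitive learning (illustrative only).

    samples      -- list of input vectors
    init_weights -- one starting weight vector per neural unit (assumed given)
    lr           -- learning rate in (0, 1), a hypothetical default
    """
    weights = [list(w) for w in init_weights]
    for _ in range(epochs):
        for x in samples:
            # Competition: the unit whose weight vector is closest to x wins.
            winner = min(range(len(weights)),
                         key=lambda i: squared_distance(weights[i], x))
            # Adaptation: only the winner moves a fraction lr toward x.
            weights[winner] = [w + lr * (v - w)
                               for w, v in zip(weights[winner], x)]
    return weights
```

With two well-separated groups of inputs and one unit started near each group, each unit stays inside its own group, which is the sense in which units "become sensitive to different input pattern clusters".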

Downloads: 1191
59 Determining Cluster Boundaries Using Particle Swarm Optimization

Authors: Anurag Sharma, Christian W. Omlin

Abstract:

The self-organizing map (SOM) is a well-known data reduction technique used in data mining. Data visualization can reveal structure in data sets that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOMs, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of a generic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOMs. The application of our method to unlabeled call data for a mobile phone operator demonstrates its feasibility. The PSO algorithm uses the U-matrix of SOMs to determine cluster boundaries; the results of this novel automatic method correspond well to boundary detection through visual inspection of code vectors and through the k-means algorithm.

Keywords: Data Mining, Clustering, Self-Organizing Maps, Particle Swarm Optimization

Downloads: 1384
58 DCBOR: A Density Clustering Based on Outlier Removal

Authors: F. A. Torkey, A. M. Salem, M. A. Ramadan, A. M. Fahim, G. Saake

Abstract:

Data clustering is an important data exploration technique with many applications in data mining. We present an enhanced version of the well-known single link clustering algorithm, which we refer to as DCBOR. The proposed algorithm alleviates the chain effect by removing the outliers from the given dataset, so it provides outlier detection and data clustering simultaneously. The algorithm does not need to update the distance matrix, since it merges the k nearest objects in one step and the cluster continues to grow as long as possible under a specified condition. The algorithm consists of two phases: in the first phase, it removes the outliers from the input dataset; in the second phase, it performs the clustering process. The algorithm discovers clusters of different shapes, sizes and densities and requires only one input parameter, which represents a threshold for outlier points. The value of the input parameter ranges from 0 to 1, and the algorithm supports the user in determining an appropriate value for it. We have tested this algorithm on different datasets that contain outliers and clusters connected by chains of dense points, and the algorithm discovers the correct clusters. The results of our experiments demonstrate the effectiveness and the efficiency of DCBOR.

Keywords: Data Clustering, Clustering Algorithms, Arbitrary Shape of clusters, Handling Noise

Downloads: 1548
57 K-Means for Spherical Clusters with Large Variance in Sizes

Authors: F. A. Torkey, A. M. Salem, M. A. Ramadan, A. M. Fahim, G. Saake

Abstract:

Data clustering is an important data exploration technique with many applications in data mining. The k-means algorithm is well known for its efficiency in clustering large data sets. However, this algorithm is suitable for spherical-shaped clusters of similar sizes and densities. The quality of the resulting clusters decreases when the data set contains spherical-shaped clusters with large variance in sizes. In this paper, we introduce a competent procedure to overcome this problem. The proposed method is based on shifting the center of the large cluster toward the small cluster and recomputing the membership of small cluster points. The experimental results show that the proposed algorithm produces satisfactory results.

Keywords: Cluster Analysis, Data Clustering, k-means
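
For reference, the baseline that the abstract above starts from can be sketched as standard (Lloyd's) k-means. This is only the textbook algorithm; the authors' center-shifting step is not described in enough detail here to reproduce, so it is not included. The function names and the choice to pass initial centers explicitly are assumptions of this sketch.

```python
def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centers[i])))


def kmeans(points, init_centers, max_iter=100):
    """Standard Lloyd's k-means; returns (centers, labels)."""
    centers = [tuple(float(c) for c in center) for center in init_centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # leave an empty cluster's center as is
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, [nearest(p, centers) for p in points]
```

Because the assignment step uses only the distance to the nearest center, points on the edge of a large cluster can be claimed by a neighbouring small cluster, which is the failure mode the abstract addresses.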

Downloads: 2948
56 A Distributed Algorithm for Intrinsic Cluster Detection over Large Spatial Data

Authors: Sauravjyoti Sarmah, Rosy Das, Dhruba Kr. Bhattacharyya

Abstract:

Clustering algorithms help to understand the hidden information present in datasets. A dataset may contain intrinsic and nested clusters, the detection of which is of utmost importance. This paper presents a Distributed Grid-based Density Clustering algorithm capable of identifying arbitrary shaped embedded clusters as well as multi-density clusters over large spatial datasets. For handling massive datasets, we implemented our method using a 'shared-nothing' architecture where multiple computers are interconnected over a network. Experimental results are reported to establish the superiority of the technique in terms of scale-up, speedup as well as cluster quality.

Keywords: Clustering, Density-based, Grid-based, Adaptive Grid

Downloads: 1271
55 Bioinformatic Analysis of Retroelement-Associated Sequences in Human and Mouse Promoters

Authors: Nadezhda M. Usmanova, Nikolai V. Tomilin

Abstract:

Mammalian genomes contain a large number of retroelements (SINEs, LINEs and LTRs) which could affect expression of protein-coding genes through associated transcription factor binding sites (TFBS). Activity of the retroelement-associated TFBS in many genes is confirmed experimentally, but their global functional impact remains unclear. Human SINEs (Alu repeats) and mouse SINEs (B1 and B2 repeats) are known to be clustered in GC-rich, gene-rich genome segments, consistent with the view that they can contribute to regulation of gene expression. We have shown earlier that Alu are involved in formation of cis-regulatory modules (clusters of TFBS) in human promoters, and other authors reported that Alu located near promoter CpG islands have an increased frequency of CpG dinucleotides, suggesting that these Alu are undermethylated. Human Alu and mouse B1/B2 elements have an internal bipartite promoter for RNA polymerase III containing a conserved sequence motif called the B-box, which can bind the basal transcription complex TFIIIC. It has recently been shown that TFIIIC binding to the B-box leads to formation of a boundary which limits the spread of repressive chromatin modifications in S. pombe. SINE-associated B-boxes may have a similar function, but conservation of TFIIIC binding sites in SINEs located near mammalian promoters has not been studied before. Here we analysed the abundance and distribution of retroelements (SINEs, LINEs and LTRs) in annotated sequences of the Database of mammalian transcription start sites (DBTSS). Fractions of SINEs in human and mouse promoters are slightly lower than in the whole genome, but >40% of human and mouse promoters contain Alu or B1/B2 elements within the -1000 to +200 bp interval relative to the transcription start site (TSS). Most of these SINEs are associated with distal segments of promoters (-1000 to -200 bp relative to TSS), indicating that their insertion at distances >200 bp upstream of TSS is tolerated during evolution.
Distribution of SINEs in promoters correlates negatively with the distribution of CpG sequences. Using analysis of the abundance of 12-mer motifs from the B1 and Alu consensus sequences in the genome and DBTSS, it has been confirmed that some subsegments of Alu and B1 elements are poorly conserved, which depends in part on the presence of CpG dinucleotides. One of these CpG-containing subsegments in B1 elements overlaps with the SINE-associated B-box and shows better conservation in DBTSS compared to genomic sequences. We also studied conservation, in DBTSS and in the genome, of the B-box-containing segments of old (AluJ, AluS) and young (AluY) Alu repeats and found that the CpG sequence of the B-box of old Alu is better conserved in DBTSS than in the genome. This indicates that B-box-associated CpGs in promoters are better protected from methylation and mutation than B-box-associated CpGs in genomic SINEs. These results are consistent with the view that potential TFIIIC binding motifs in SINEs associated with human and mouse promoters may be functionally important. These motifs may protect promoters from repressive histone modifications which spread from adjacent sequences. This can potentially explain the well-known clustering of SINEs in GC-rich, gene-rich genome compartments and the existence of unmethylated CpG islands.

Keywords: promoter, Retroelement, CpG island, DNA methylation

Downloads: 1260
54 Symmetry Breaking and the Emergence of Branching Structures in Morphogenesis: Minimal Conditions and Mechanical Interactions between Cells

Authors: M. Margarida Costa, Jorge Simão

Abstract:

The minimal condition for symmetry breaking in the morphogenesis of a cellular population was investigated using cellular automata based on reaction-diffusion dynamics. In particular, the study looked for the possibility of the emergence of branching structures due to mechanical interactions. The model used two types of cells and an external gradient. The results showed that the external gradient influenced the movement of type-I cells, and also revealed that clusters formed by type-II cells worked as a barrier to the movement of type-I cells.

Keywords: Morphogenesis, branching structures, symmetry breaking

Downloads: 940
53 Observation of the Correlations between Pair Wise Interaction and Functional Organization of the Proteins, in the Protein Interaction Network of Saccaromyces Cerevisiae

Authors: N. Tuncbag, T. Haliloglu, O. Keskin

Abstract:

Understanding the cell's large-scale organization is an interesting task in computational biology. Protein-protein interactions can reveal important aspects of the organization and function of the cell. Here, we investigated the correspondence between protein interactions and function in yeast. We obtained the correlations among the set of proteins. These correlations were then clustered using both hierarchical and biclustering methods. Detailed analyses of the proteins in each cluster were carried out using their functional annotations. As a result, we found that some functional classes appear together in almost all biclusters. In hierarchical clustering, on the other hand, the dominance of one functional class is observed. In brief, going from interaction data to function, we observed correlated results on the relationship between interaction and function, which might give clues about the organization of the proteins.

Keywords: biclustering, Pair-wise protein interactions, DIP database, functional correlations

Downloads: 1341
52 A New Method for Detection of Artificial Objects and Materials from Long Distance Environmental Images

Authors: H. Dujmic, V. Papic, H. Turic

Abstract:

The article presents a new method for detection of artificial objects and materials from images of environmental (non-urban) terrain. Our approach uses the hue and saturation (or Cb and Cr) components of the image as the input to the segmentation module, which uses the mean shift method. The clusters obtained as the output of this stage are processed by the decision-making module in order to find the regions of the image with a significant possibility of representing a human. Although this method will detect various non-natural objects, it is primarily intended and optimized for detection of humans, i.e. for search and rescue purposes in non-urban terrain where, in normal circumstances, non-natural objects shouldn't be present. Real world images are used for the evaluation of the method.

Keywords: Image Segmentation, target detection, Landscape surveillance, mean shift algorithm

Downloads: 1071
51 A Strategy to Optimize the SPC Scheme for Mass Production of HDD Arm with Clustering Technique and Three-Way Control Chart

Authors: W. Chattinnawat

Abstract:

Consider the mass production of HDD arms, where hundreds of CNC machines are used to manufacture the arms. Owing to the overwhelming number of machines and arm models, constructing a separate control chart to monitor each HDD arm model on each machine is not feasible. This research proposes a strategy to optimize SPC management on the shop floor. The procedure starts by identifying clusters of machines with similar manufacturing performance using a clustering technique. The three-way control chart (I-MR-R) is then applied to each clustered group of machines. The proposed approach is advantageous to the manufacturer in terms of not only better SPC performance but also the quality management paradigm.

Keywords: Three way control chart. I - MR - R, between/within variation, HDD arm
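
As a rough illustration of the three-way (I-MR-R) chart named in the abstract above, the sketch below computes control limits from subgroup data using the commonly tabulated Shewhart constants (2.66 and 3.267 for a moving range of span 2, and D4 for the within-subgroup range chart). It assumes equal-size subgroups of 2 to 5 measurements; the function name, the data layout and the small `D4` table are assumptions of this sketch, not details taken from the paper.

```python
# Commonly tabulated D4 constants for subgroup sizes 2..5 (D3 is 0 for these).
D4 = {2: 3.267, 3: 2.574, 4: 2.282, 5: 2.114}


def three_way_limits(subgroups):
    """Return (LCL, center line, UCL) for the I, MR and R charts.

    subgroups -- list of equal-length lists of measurements (size 2..5)
    """
    n = len(subgroups[0])
    means = [sum(g) / n for g in subgroups]          # plotted on the I chart
    ranges = [max(g) - min(g) for g in subgroups]    # plotted on the R chart
    moving = [abs(b - a) for a, b in zip(means, means[1:])]  # MR chart

    grand_mean = sum(means) / len(means)
    mr_bar = sum(moving) / len(moving)
    r_bar = sum(ranges) / len(ranges)

    return {
        # Between-subgroup variation: individuals chart of subgroup means.
        "I": (grand_mean - 2.66 * mr_bar, grand_mean, grand_mean + 2.66 * mr_bar),
        # Moving range of consecutive subgroup means.
        "MR": (0.0, mr_bar, 3.267 * mr_bar),
        # Within-subgroup variation.
        "R": (0.0, r_bar, D4[n] * r_bar),
    }
```

In the setting the abstract describes, one such set of limits would be computed per cluster of similar machines rather than per machine and arm model.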

Downloads: 1374
50 Delay Preserving Substructures in Wireless Networks Using Edge Difference between a Graph and its Square Graph

Authors: T. N. Janakiraman, J. Janet Lourds Rani

Abstract:

In practice, wireless networks have the property that signal strength attenuates with distance from the base station, so quality of service could be improved by considering nodes that are two hops away. In this paper, we propose a procedure to identify delay preserving substructures for a given wireless ad-hoc network using a new graph operation, G² - E(G) = G* (the edge difference of the square graph of a given graph and the original graph). This operation helps to analyze some induced substructures, which preserve delay in communication among them. Applying the operation to a given graph induces a graph G* in which the 1-hop neighbors of any node are at 2-hop distance in the original network. In this paper, we also identify some delay preserving substructures in G*, which are (i) a set of nodes that are mutually at 2-hop distance in G, which forms a clique in G*, (ii) a set of nodes that forms an odd cycle C_{2k+1} in G, which forms an odd cycle in G*, and a set of nodes that forms an even cycle C_{2k} in G, which forms two disjoint companion cycles (of the same parity, odd/even) of length k in G*, (iii) every path of length 2k+1 or 2k in G, which induces two disjoint paths of length k in G*, and (iv) a set of nodes in G* that induces a maximal connected subgraph with radius 1 (which identifies a substructure with radius equal to 2 and diameter at most 4 in G). The above delay preserving substructures will behave as good clusters in the original network.

Keywords: Cycles, clique, delay preserving substructures, maximal connected sub graph
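
The graph operation in the abstract above, G* = G² - E(G), keeps exactly the pairs of nodes at distance 2 in G. A minimal sketch follows, assuming an undirected simple graph stored as a dictionary of neighbour sets; the function names `two_hop_graph` and `cycle` are chosen for this illustration.

```python
def two_hop_graph(adj):
    """Build G* = G^2 - E(G) from an undirected simple graph.

    adj -- dict mapping each node to the set of its neighbours in G
    Returns a dict of the same shape in which u and w are adjacent
    exactly when they are at distance 2 in G.
    """
    star = {u: set() for u in adj}
    for u in adj:
        for v in adj[u]:            # v is one hop from u
            for w in adj[v]:        # w is one hop from v
                # Keep w only if it is neither u itself nor a direct neighbour.
                if w != u and w not in adj[u]:
                    star[u].add(w)
    return star


def cycle(n):
    """Adjacency sets of the cycle graph on nodes 0..n-1 (helper for examples)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
```

For the 6-cycle, `two_hop_graph(cycle(6))` links node 0 to nodes 2 and 4 and node 1 to nodes 3 and 5, giving two disjoint 3-cycles; for the 5-cycle every node keeps degree 2 and the result is again a 5-cycle, consistent with item (ii).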

Downloads: 957
49 Enhanced Clustering Analysis and Visualization Using Kohonen's Self-Organizing Feature Map Networks

Authors: Kasthurirangan Gopalakrishnan, Siddhartha Khaitan, Anshu Manik

Abstract:

Cluster analysis is the name given to a diverse collection of techniques that can be used to classify objects (e.g. individuals, quadrats, species etc.). While Kohonen's Self-Organizing Feature Map (SOFM) or Self-Organizing Map (SOM) networks have been successfully applied as a classification tool to various problem domains, including speech recognition, image data compression, image or character recognition, robot control and medical diagnosis, their potential as a robust substitute for clustering analysis remains relatively unresearched. SOM networks combine competitive learning with dimensionality reduction by smoothing the clusters with respect to an a priori grid and provide a powerful tool for data visualization. In this paper, SOM is used for creating a toroidal mapping of a two-dimensional lattice to perform cluster analysis on the results of a chemical analysis of wines produced in the same region in Italy but derived from three different cultivars, referred to as the "wine recognition data", located in the University of California-Irvine database. The results are encouraging and it is believed that SOM would make an appealing and powerful decision-support system tool for clustering tasks and for data visualization.

Keywords: Artificial Neural Networks, Cluster Analysis, Kohonen maps, wine recognition

Downloads: 1743
48 Grid Computing in Physics and Life Sciences

Authors: Heinz Stockinger

Abstract:

Certain sciences such as physics, chemistry or biology, have a strong computational aspect and use computing infrastructures to advance their scientific goals. Often, high performance and/or high throughput computing infrastructures such as clusters and computational Grids are applied to satisfy computational needs. In addition, these sciences are sometimes characterised by scientific collaborations requiring resource sharing which is typically provided by Grid approaches. In this article, I discuss Grid computing approaches in High Energy Physics as well as in bioinformatics and highlight some of my experience in both scientific domains.

Keywords: Web services, Bioinformatics, Physics, Grid Computing

Downloads: 1182
47 Adaptive Gaussian Mixture Model for Skin Color Segmentation

Authors: Reza Hassanpour, Asadollah Shahbahrami, Stephan Wong

Abstract:

Skin color based tracking techniques often assume a static skin color model obtained either from an offline set of library images or the first few frames of a video stream. These models can perform poorly in the presence of changing lighting or imaging conditions. We propose an adaptive skin color model based on the Gaussian mixture model to handle the changing conditions. Initial estimates of the number and weights of skin color clusters are obtained using a modified form of the general Expectation Maximization algorithm. The model adapts to changes in imaging conditions and refines the model parameters dynamically using spatial and temporal constraints. Experimental results show that the method can be used for effective tracking of hand and face regions.

Keywords: Segmentation, Adaptation, Tracking, Face Detection, Gaussian Mixture Model

Downloads: 2094
46 MIBiClus: Mutual Information based Biclustering Algorithm

Authors: Neelima Gupta, Seema Aggarwal

Abstract:

Most biclustering/projected clustering algorithms are based either on the Euclidean distance or on the correlation coefficient, which capture only linear relationships. However, in many applications, such as gene expression data and word-document data, nonlinear relationships may exist between the objects. Mutual information between two variables provides a more general criterion to investigate dependencies amongst variables. In this paper, we improve upon our previous algorithm that uses mutual information for biclustering, in terms of computation time and also the type of clusters identified. The algorithm is able to find biclusters with mixed relationships and is faster than the previous one. To the best of our knowledge, none of the other existing algorithms for biclustering has used mutual information as a similarity measure. We present the experimental results on synthetic data as well as on the yeast expression data. Biclusters on the yeast data were found to be biologically and statistically significant using GO Tool Box and FuncAssociate.

Keywords: biclustering, mutual information
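
To make the similarity measure in the abstract above concrete, the sketch below estimates the mutual information (in bits) of two discrete variables from paired observations, using empirical frequencies. It is a generic plug-in estimate, not the authors' biclustering algorithm; the function name and the assumption of already-discretised values are choices made for this illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for paired discrete samples."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum of p(x,y) * log2( p(x,y) / (p(x) * p(y)) ) over observed pairs;
    # with counts c, cx, cy this ratio simplifies to c * n / (cx * cy).
    return sum((c / n) * log2(c * n / (count_x[x] * count_y[y]))
               for (x, y), c in count_xy.items())
```

For example, with `xs = [-1, 0, 1]` and `ys = [1, 0, 1]` (that is, y = x²) the linear correlation is zero, yet the mutual information is positive because y is fully determined by x. This is the kind of nonlinear dependency the abstract refers to.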

Downloads: 1306
45 Memory Leak Detection in Distributed System

Authors: Prabu D., Roohi Shabrin S., Devi Prasad B., Pallavi R. S., Revathi P.

Abstract:

Memory leaks waste valuable system memory and deny it to other processes, thereby degrading computational performance. If an application's memory usage exceeds the virtual memory size, it can lead to a system crash. Current memory leak detection techniques for clusters are reactive: they display memory leak information only after the process has executed (that is, they detect a memory leak only after it occurs). This paper presents a Dynamic Memory Monitoring Agent (DMMA) technique. The DMMA framework performs dynamic memory leak detection, detecting a memory leak while the application is in its execution phase. When DMMA identifies a memory leak in any process in the cluster, it informs the end users so that they can take corrective action, and it also submits the affected process to a healthy node in the system, thus providing reliable service to the user. DMMA maintains information about the memory consumption of executing processes and, based on this information and critical states, DMMA can improve the reliability and efficaciousness of cluster computing.

Keywords: Cluster Computing, Dynamic Memory Monitoring Agent (DMMA), Memory Leak, Fault Tolerant Framework, Dynamic Memory Leak Detection (DMLD)

Downloads: 1781
44 A Symbol by Symbol Clustering Based Blind Equalizer

Authors: Kristina Georgoulakis

Abstract:

A new blind symbol-by-symbol equalizer is proposed. The operation of the proposed equalizer is based on the geometric properties of the two-dimensional data constellation. An unsupervised clustering technique is used to locate the clusters formed by the received data. The symmetric properties of the cluster labels are subsequently utilized in order to label the clusters. Following this step, the received data are compared to the clusters and decisions are made on a symbol-by-symbol basis, by assigning to each datum the label of the nearest cluster. The operation of the equalizer is investigated in both linear and nonlinear channels. The performance of the proposed equalizer is compared to the performance of a CMA-based blind equalizer.

Keywords: Channel Equalization, blind equalization, cluster based equalisers
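
The final decision step described in the abstract above (assign each received sample the label of the nearest cluster) can be sketched as follows. The sketch assumes the cluster centers and their symbol labels are already available from the earlier clustering and labelling steps, which are not reproduced here; representing samples as Python complex numbers and the function name `decide_symbols` are assumptions of this illustration.

```python
def decide_symbols(received, centers, labels):
    """Symbol-by-symbol decisions by nearest cluster center.

    received -- list of complex received samples
    centers  -- list of complex cluster centers (assumed already estimated)
    labels   -- labels[i] is the symbol assigned to centers[i]
    """
    decisions = []
    for r in received:
        # abs() of a complex difference is the Euclidean distance in the plane.
        nearest = min(range(len(centers)), key=lambda i: abs(r - centers[i]))
        decisions.append(labels[nearest])
    return decisions
```

For a four-point constellation with centers at 1+1j, -1+1j, -1-1j and 1-1j, a received sample of 0.9+1.2j is assigned the label of the first center.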

Downloads: 1086
43 A Text Clustering System based on k-means Type Subspace Clustering and Ontology

Authors: Liping Jing, Michael K. Ng, Xinhua Yang, Joshua Zhexue Huang

Abstract:

This paper presents a text clustering system based on a k-means type subspace clustering algorithm for clustering large, high-dimensional and sparse text data. In this algorithm, a new step is added to the k-means clustering process to automatically calculate the weights of keywords in each cluster, so that the important words of a cluster can be identified by their weight values. To aid understanding and interpretation of the clustering results, a few keywords that best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed to WordNet to identify the set of nouns and consolidate synonyms and hyponyms. Experimental results have shown that the clustering algorithm is superior to other subspace clustering algorithms, such as PROCLUS and HARP, and to k-means type algorithms, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selecting the words that represent the topics of the clusters.

Keywords: Ontology, Text Mining, Feature Weighting, Subspace Clustering, Cluster Interpretation

Downloads: 2094
42 Optimizing of Fuzzy C-Means Clustering Algorithm Using GA

Authors: Mohanad Alata, Mohammad Molhim, Abdullah Ramini

Abstract:

The fuzzy C-means clustering algorithm (FCM) is a method that is frequently used in pattern recognition. It has the advantage of giving good modeling results in many cases, although it is not capable of specifying the number of clusters by itself. In the FCM algorithm, most researchers fix the weighting exponent (m) to a conventional value of 2, which might not be appropriate for all applications. Consequently, the main objective of this paper is to use the subtractive clustering algorithm to provide the optimal number of clusters needed by the FCM algorithm, by optimizing the parameters of the subtractive clustering algorithm with an iterative search approach, and then to find an optimal weighting exponent (m) for the FCM algorithm. In order to get an optimal number of clusters, the iterative search approach is used to find the optimal single-output Sugeno-type Fuzzy Inference System (FIS) model by optimizing the parameters of the subtractive clustering algorithm that give the minimum least square error between the actual data and the Sugeno fuzzy model. Once the number of clusters is optimized, two approaches are proposed to optimize the weighting exponent (m) in the FCM algorithm, namely the iterative search approach and genetic algorithms. The approach is tested on data generated from the original function, and optimal fuzzy models are obtained with minimum error between the real data and the obtained fuzzy models.

Keywords: Fuzzy Clustering, fuzzy c-means, Genetic Algorithm, Sugeno fuzzy systems
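
The role of the weighting exponent m discussed in the abstract above can be seen in the standard FCM membership update, where the membership of a point in cluster i is 1 / Σ_k (d_i / d_k)^(2/(m-1)) and d_i is the distance from the point to center i. The sketch below implements only this textbook update for fixed centers; it does not include subtractive clustering, the Sugeno model or the genetic search, and the function name is an assumption of this illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership update for fixed centers (m > 1).

    Returns one row per point; row[i] is the membership in cluster i,
    and each row sums to 1.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with one or more centers: share full
            # membership among those centers to avoid division by zero.
            hits = [d == 0.0 for d in dists]
            row = [1.0 / sum(hits) if h else 0.0 for h in hits]
        else:
            row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                   for d_i in dists]
        memberships.append(row)
    return memberships
```

For a point at distance 1 from one center and 3 from another, m = 2 gives memberships 0.9 and 0.1, while m = 3 gives 0.75 and 0.25: a larger exponent yields a fuzzier partition, which is why the choice of m matters.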

Downloads: 2769
41 Self-Organization of Clusters having Locally Distributed Patterns for Synchronized Inputs

Authors: Toshio Akimitsu, Yoichi Okabe, Akira Hirose

Abstract:

Many experimental results suggest that more precise spike timing is significant in neural information processing. We construct a self-organization model using the spatiotemporal patterns, where Spike-Timing Dependent Plasticity (STDP) tunes the conduction delays between neurons. We show that the fluctuation of conduction delays causes globally continuous and locally distributed firing patterns through the self-organization.

Keywords: Self-Organization, synfire-chain, distributed information representation, Spike-Timing Dependent Plasticity

Downloads: 837
40 Real Time Approach for Data Placement in Wireless Sensor Networks

Authors: SANJEEV GUPTA, Mayank Dave

Abstract:

Real-time and reliable report delivery is extremely important for making effective decisions in real-world, mission-critical applications based on Wireless Sensor Networks (WSN). Sensor data behave differently in many ways from the data in traditional databases. WSNs need a mechanism to register, process queries, and disseminate data. In this paper we propose an architectural framework for data placement and management. We propose a reliable, real-time approach for data placement and for achieving data integrity using self-organized sensor clusters. Instead of storing information in individual cluster heads, as suggested in some protocols, our architecture stores the information of all clusters within a cell in the corresponding base station. For data dissemination and action in the wireless sensor network we propose to use Action and Relay Stations (ARS). To reduce the average energy dissipation of sensor nodes, the data is sent to the nearest ARS rather than to the base station. We have designed our architecture so as to achieve greater energy savings, enhanced availability and reliability.

Keywords: Wireless Sensor Networks, cluster head, data reliability, real time communication

Downloads: 1498
39 Generating Normally Distributed Clusters by Means of a Self-organizing Growing Neural Network– An Application to Market Segmentation –

Authors: Reinhold Decker, Christian Holsing, Sascha Lerke

Abstract:

This paper presents a new growing neural network for cluster analysis and market segmentation, which optimizes the size and structure of clusters by iteratively checking them for multivariate normality. We combine the recently published SGNN approach [8] with the basic principle underlying the Gaussian-means algorithm [13] and the Mardia test for multivariate normality [18, 19]. The new approach differs from existing ones in its holistic design and its great autonomy regarding the clustering process as a whole. Its performance is demonstrated by means of synthetic 2D data and real lifestyle survey data usable for market segmentation.

Keywords: Clustering, Self-Organization, Artificial Neural Network, Market Segmentation, multivariate normality

Downloads: 890
38 Non-Overlapping Hierarchical Index Structure for Similarity Search

Authors: Mounira Taileb, Sid Lamrous, Sami Touati

Abstract:

To accelerate similarity search in high-dimensional databases, we propose a new hierarchical indexing method. It is composed of offline and online phases, and our contribution concerns both. In the offline phase, after gathering all of the data into clusters and constructing a hierarchical index, the main originality of our contribution is a method for constructing bounding forms of clusters that avoids overlapping. For the online phase, our idea considerably improves the performance of similarity search, and we have also developed an adapted search algorithm for this phase. Our method, named NOHIS (Non-Overlapping Hierarchical Index Structure), uses Principal Direction Divisive Partitioning (PDDP) as its clustering algorithm. The principle of PDDP is to divide data recursively into two sub-clusters; the division is done using the hyperplane that is orthogonal to the principal direction derived from the covariance matrix and passes through the centroid of the cluster being divided. The data of each of the two sub-clusters obtained are enclosed by a minimum bounding rectangle (MBR). The two MBRs are oriented according to the principal direction. Consequently, non-overlapping between the two forms is assured. Experiments use databases containing image descriptors. Results show that the proposed method outperforms sequential scan and SR-tree in processing k-nearest neighbors.

Keywords: Similarity Search, multimedia databases, K-nearest neighbour search, multi-dimensional indexing

Downloads 1237
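
The PDDP split described in the NOHIS abstract above can be sketched in a few lines of NumPy. This is a minimal illustration of one division step only, not the authors' index construction; the function name `pddp_split` and the toy points are assumptions.

```python
import numpy as np


def pddp_split(points):
    """Split points into two sub-clusters with one PDDP step.

    The splitting hyperplane passes through the centroid and is orthogonal
    to the principal direction (the leading eigenvector of the covariance
    matrix). Returns a boolean mask (True = positive side of the
    hyperplane) and the principal direction.
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    centered = points - centroid
    cov = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues in ascending order, so the last column of
    # the eigenvector matrix is the principal direction.
    _, eigenvectors = np.linalg.eigh(cov)
    direction = eigenvectors[:, -1]
    # The sign of the projection gives the side of the hyperplane.
    mask = centered @ direction > 0
    return mask, direction


# Hypothetical 2D descriptors forming two groups along the x axis.
data = np.array([[-5, 0], [-5, 1], [-4, 0], [5, 0], [5, 1], [4, 0]])
mask, direction = pddp_split(data)
left, right = data[~mask], data[mask]
```

Applying `pddp_split` recursively to `left` and `right` yields the hierarchy; because each pair of sub-clusters lies on opposite sides of the same hyperplane, bounding rectangles aligned with `direction` cannot overlap.
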
37 Predictive Clustering Hybrid Regression (pCHR) Approach and Its Application to Sucrose-Based Biohydrogen Production

Authors: Nikhil, Ari Visa, Chiu-Yue Lin, Jaakko A. Puhakka, Olli Yli-Harja, Chin-Chao Chen

Abstract:

A predictive clustering hybrid regression (pCHR) approach was developed and evaluated using a dataset from an H2-producing sucrose-based bioreactor operated for 15 months. The aim was to model and predict the H2-production rate using the available information about the envirome and metabolome of the bioprocess. Self-organizing maps (SOM) and Sammon maps were used to visualize the dataset and to identify the main metabolic patterns and clusters in the bioprocess data. Three metabolic clusters were detected: acetate coupled with other metabolites, butyrate only, and transition phases. The developed pCHR model combines principles of k-means clustering, kNN classification and regression techniques. The model performed well in modeling and predicting the H2-production rate, with mean square error values of 0.0014 and 0.0032, respectively.

Keywords: Biohydrogen, Bioprocess Modeling, clustering hybrid regression

Downloads 1436
36 A Computer Aided Detection (CAD) System for Microcalcifications in Mammograms - MammoScan μCaD

Authors: Kjersti Engan, Thor Ole Gulsrud, Karl Fredrik Fretheim, Barbro Furebotten Iversen, Liv Eriksen

Abstract:

Clusters of microcalcifications in mammograms are an important sign of breast cancer. This paper presents a complete Computer Aided Detection (CAD) scheme for automatic detection of clustered microcalcifications in digital mammograms. The proposed system, MammoScan μCaD, consists of three main steps. First, all potential microcalcifications are detected using a feature extraction method, VarMet, and adaptive thresholding. This step also produces a number of false detections. The goal of the second step, Classifier level 1, is to remove everything but microcalcifications. The last step, Classifier level 2, uses learned dictionaries and sparse representations as a texture classification technique to distinguish single, benign microcalcifications from clustered microcalcifications, and to remove some of the remaining false detections. The system is trained and tested on true digital data from Stavanger University Hospital, and the results are evaluated by radiologists. The overall results are promising, with a sensitivity > 90 % and a low false detection rate (approx. 1 unwanted detection per image, or 0.3 false detections per image).

Keywords: classification, Detection, CAD, Texture, mammogram, dictionary learning, adaptive thresholding, microcalcifications, FTCM, MammoScan μCaD, VarMet

Downloads 1459
35 A New Hybrid RMN Image Segmentation Algorithm

Authors: Abdelouahab Moussaoui, Nabila Ferahta, Victor Chen

Abstract:

Developing diagnostic aid systems for medicine is not easy because of inhomogeneities in MRI, the variability of the data from one sequence to another, and various other sources of distortion that accentuate this difficulty. This article describes a new automatic, contextual, adaptive and robust segmentation procedure based on MRI brain tissue classification. A first phase estimates the probability density of the data with the Parzen-Rosenblatt method. The classification procedure is completely automatic and makes no assumptions about either the number of clusters or their prototypes, since the latter are detected automatically by a mathematical morphology operator called skeleton by influence zones (SKIZ). The problem of initializing the prototypes and choosing their number is thus transformed into an optimization problem. Moreover, the procedure is adaptive, since it takes into account the contextual information present at every voxel through an adaptive and robust nonparametric model based on Markov fields (MF). The number of misclassifications is reduced by using the MPM (Maximum Posterior Marginal) minimization criterion.

Keywords: Clustering, Automatic classification, Image Segmentation, SKIZ, Markov Fields, Maximum Posterior Marginal (MPM)

Downloads 1100
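
The Parzen-Rosenblatt density estimate used in the first phase of the abstract above can be illustrated with a one-dimensional sketch. The Gaussian kernel, the bandwidth value, the function name `parzen_density` and the intensity values are all assumptions; the abstract does not state which kernel or bandwidth the authors use.

```python
import numpy as np


def parzen_density(samples, query, bandwidth):
    """Parzen-Rosenblatt estimate with a Gaussian kernel (assumed choice).

    Evaluates f(q) = (1 / (n * h)) * sum_i K((q - x_i) / h) at each query
    point q, where K is the standard normal density and h is the bandwidth.
    """
    samples = np.asarray(samples, dtype=float)
    query = np.atleast_1d(np.asarray(query, dtype=float))
    z = (query[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth


# Hypothetical grey-level intensities with two modes.
intensities = np.array([10.0, 11.0, 12.0, 40.0, 41.0, 42.0])
grid = np.linspace(0.0, 50.0, 501)
density = parzen_density(intensities, grid, bandwidth=2.0)
```

The modes of `density` indicate candidate tissue classes; in the abstract, detecting such prototypes is handled by the SKIZ operator rather than by inspecting the curve directly.
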
34 Enhancing K-Means Algorithm with Initial Cluster Centers Derived from Data Partitioning along the Data Axis with the Highest Variance

Authors: S. Deelers, S. Auwatanamongkol

Abstract:

In this paper, we propose an algorithm to compute initial cluster centers for K-means clustering. Data in a cell is partitioned using a cutting plane that divides the cell into two smaller cells. The plane is perpendicular to the data axis with the highest variance and is designed to reduce the sum of squared errors of the two cells as much as possible, while at the same time keeping the two cells as far apart as possible. Cells are partitioned one at a time until the number of cells equals the predefined number of clusters, K. The centers of the K cells become the initial cluster centers for K-means. The experimental results suggest that the proposed algorithm is effective and converges to better clustering results than the random initialization method. The results also indicate that the proposed algorithm greatly improves the likelihood that every cluster contains some data.

Keywords: clustering algorithm, K-means algorithm, Data partitioning, Initial cluster centers

Downloads 2497
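
The initialization idea in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the cut position minimizes only the summed squared error along the chosen axis (the "far apart" criterion is omitted), and the next cell to split is assumed to be the one with the largest sum of squared errors. The names `split_cell` and `initial_centers` are hypothetical.

```python
import numpy as np


def split_cell(cell):
    """Split one cell with a plane perpendicular to its highest-variance axis.

    Assumption: the cut position minimizes the summed squared error of the
    two resulting cells measured along that axis.
    """
    axis = int(np.argmax(cell.var(axis=0)))
    ordered = cell[np.argsort(cell[:, axis])]
    values = ordered[:, axis]
    best_cut, best_sse = 1, np.inf
    for cut in range(1, len(values)):
        left, right = values[:cut], values[cut:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_cut, best_sse = cut, sse
    return ordered[:best_cut], ordered[best_cut:]


def initial_centers(data, k):
    """Return k initial centers by splitting cells one at a time."""
    data = np.asarray(data, dtype=float)
    if k > len(data):
        raise ValueError("k must not exceed the number of data points")
    cells = [data]
    while len(cells) < k:
        # Assumption: split the cell with the largest sum of squared errors;
        # single-point cells get -1 so they are never chosen.
        sse = [((c - c.mean(axis=0)) ** 2).sum() if len(c) > 1 else -1.0 for c in cells]
        target = cells.pop(int(np.argmax(sse)))
        cells.extend(split_cell(target))
    return np.array([c.mean(axis=0) for c in cells])


# Hypothetical data with three groups along the first axis.
data = np.array([[0, 0], [1, 0], [10, 0], [11, 0], [20, 0], [21, 0]])
centers = initial_centers(data, k=3)
```

The returned `centers` would then be passed to a standard K-means routine as its starting centers in place of random initialization.
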
33 Multi-Agent Systems for Intelligent Clustering

Authors: Kyung-Whan Oh, Jung-Eun Park

Abstract:

Intelligent systems are required in order to quickly and accurately analyze enormous quantities of data in the Internet environment. In intelligent systems, information extraction processes can be divided into supervised learning and unsupervised learning. This paper investigates intelligent clustering by unsupervised learning. Intelligent clustering is a clustering system that determines the clustering model for data analysis and evaluates the results by itself. Such a system can build a clustering model more rapidly, objectively and accurately than an analyst. The automatic intelligent clustering system is designed as a multi-agent system comprising a clustering agent and a cluster performance evaluation agent. Each agent exchanges information about clusters with the other, and the system determines the optimal number of clusters from this information. Experiments using data sets from the UCI Machine Learning Repository are performed to validate the system.

Keywords: PCA, multi-agent system, SOM, Intelligent Clustering, VC (Variance Criterion)

Downloads 1366
32 Modeling and Simulations of Complex Low-Dimensional Systems: Testing the Efficiency of Parallelization

Authors: Ryszard Matysiak, Grzegorz Kamieniarz

Abstract:

The deterministic quantum transfer-matrix (QTM) technique and its mathematical background are presented. This important tool in computational physics can be applied to a class of real, physical low-dimensional magnetic systems described by the Heisenberg Hamiltonian, which includes macroscopic molecular-based spin chains, small magnetic clusters embedded in some supramolecules, and other interesting compounds. Using QTM, the spin degrees of freedom are accurately taken into account, yielding the thermodynamic functions at finite temperatures. To test how the susceptibility calculations run in a parallel environment, the speed-up and efficiency of parallelization are analyzed on our SGI Origin 3800 platform with p = 128 processor units. Using Message Passing Interface (MPI) system libraries, we find a code efficiency of 94% for p = 128, which makes our application highly scalable.

Keywords: parallelization, Deterministic simulations, low-dimensional magnets, modeling of complex systems

Downloads 1141
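
To put the reported efficiency in the abstract above into perspective, the sketch below uses the standard definitions of speed-up, S(p) = T(1) / T(p), and parallel efficiency, E(p) = S(p) / p. These definitions are assumed here, since the abstract does not spell them out, and the function names are hypothetical.

```python
def speedup(t_serial, t_parallel):
    """Speed-up S(p) = T(1) / T(p)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Parallel efficiency E(p) = S(p) / p."""
    return speedup(t_serial, t_parallel) / processors


# Reported in the abstract: E = 94% on p = 128 processor units.
# Under the definitions above, the implied speed-up is E * p.
implied_speedup = 0.94 * 128
print(f"implied speed-up: {implied_speedup:.2f}")  # prints "implied speed-up: 120.32"
```

Under these definitions, the 94% efficiency at p = 128 corresponds to a run about 120 times faster than the single-processor run.
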
31 Influence of Textured Clusters on the Goss Grains Growth in Silicon Steels: Consideration of Energy and Mobility

Authors: H. Afer, N. Rouag, R. Penelle

Abstract:

In Fe-3%Si sheets, grade Hi-B, with AlN and MnS as inhibitors, the Goss grains that grow abnormally are not larger than the average size of the primary matrix. In this heterogeneous microstructure, the size factor is not a required condition for secondary recrystallization. The onset of abnormal growth of the small Goss grains appears to be related to a particular behavior of their grain boundaries, to the local texture, and to the distribution of the inhibitors. The presence and evolution of oriented clusters provide the small Goss grains with a favorable neighborhood in which to grow. The modified Monte Carlo approach applied here considers the local environment of each grain. Grain growth depends on the grain's real spatial position, so the matrix heterogeneity is taken into account. The grain growth conditions are considered in the global matrix and in different matrices corresponding to A component clusters. Grain growth behaviour is examined in three cases: with energy only; with energy and mobility; and with energy, mobility and precipitates.

Keywords: abnormal grain growth, neighbourhood, grain boundary energy and mobility, oriented clusters

Downloads 1073