Search results for: Data Clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7536

Search results for: Data Clustering

7446 Binary Classification Tree with Tuned Observation-based Clustering

Authors: Maythapolnun Athimethphat, Boontarika Lerteerawong

Abstract:

There are several approaches for handling multiclass classification. Aside from one-against-one (OAO) and one-against-all (OAA), hierarchical classification technique is also commonly used. A binary classification tree is a hierarchical classification structure that breaks down a k-class problem into binary sub-problems, each solved by a binary classifier. In each node, a set of classes is divided into two subsets. A good class partition should be able to group similar classes together. Many algorithms measure similarity in term of distance between class centroids. Classes are grouped together by a clustering algorithm when distances between their centroids are small. In this paper, we present a binary classification tree with tuned observation-based clustering (BCT-TOB) that finds a class partition by performing clustering on observations instead of class centroids. A merging step is introduced to merge any insignificant class split. The experiment shows that performance of BCT-TOB is comparable to other algorithms.

Keywords: multiclass classification, hierarchical classification, binary classification tree, clustering, observation-based clustering

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1679
7445 3D Mesh Coarsening via Uniform Clustering

Authors: Shuhua Lai, Kairui Chen

Abstract:

In this paper, we present a fast and efficient mesh coarsening algorithm for 3D triangular meshes. Theis approach can be applied to very complex 3D meshes of arbitrary topology and with millions of vertices. The algorithm is based on the clustering of the input mesh elements, which divides the faces of an input mesh into a given number of clusters for clustering purpose by approximating the Centroidal Voronoi Tessellation of the input mesh. Once a clustering is achieved, it provides us an efficient way to construct uniform tessellations, and therefore leads to good coarsening of polygonal meshes. With proliferation of 3D scanners, this coarsening algorithm is particularly useful for reverse engineering applications of 3D models, which in many cases are dense, non-uniform, irregular and arbitrary topology. Examples demonstrating effectiveness of the new algorithm are also included in the paper.

Keywords: Coarsening, mesh clustering, shape approximation, mesh simplification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1362
7444 Initializing K-Means using Genetic Algorithms

Authors: Bashar Al-Shboul, Sung-Hyon Myaeng

Abstract:

K-Means (KM) is considered one of the major algorithms widely used in clustering. However, it still has some problems, and one of them is in its initialization step where it is normally done randomly. Another problem for KM is that it converges to local minima. Genetic algorithms are one of the evolutionary algorithms inspired from nature and utilized in the field of clustering. In this paper, we propose two algorithms to solve the initialization problem, Genetic Algorithm Initializes KM (GAIK) and KM Initializes Genetic Algorithm (KIGA). To show the effectiveness and efficiency of our algorithms, a comparative study was done among GAIK, KIGA, Genetic-based Clustering Algorithm (GCA), and FCM [19].

Keywords: Clustering, Genetic Algorithms, K-means.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2040
7443 Information Gain Ratio Based Clustering for Investigation of Environmental Parameters Effects on Human Mental Performance

Authors: H. Mehdi, Kh. S. Karimov, A. A. Kavokin

Abstract:

Methods of clustering which were developed in the data mining theory can be successfully applied to the investigation of different kinds of dependencies between the conditions of environment and human activities. It is known, that environmental parameters such as temperature, relative humidity, atmospheric pressure and illumination have significant effects on the human mental performance. To investigate these parameters effect, data mining technique of clustering using entropy and Information Gain Ratio (IGR) K(Y/X) = (H(X)–H(Y/X))/H(Y) is used, where H(Y)=-ΣPi ln(Pi). This technique allows adjusting the boundaries of clusters. It is shown that the information gain ratio (IGR) grows monotonically and simultaneously with degree of connectivity between two variables. This approach has some preferences if compared, for example, with correlation analysis due to relatively smaller sensitivity to shape of functional dependencies. Variant of an algorithm to implement the proposed method with some analysis of above problem of environmental effects is also presented. It was shown that proposed method converges with finite number of steps.

Keywords: Clustering, Correlation analysis, EnvironmentalParameters, Information Gain Ratio, Mental Performance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1781
7442 Segmentation of Images through Clustering to Extract Color Features: An Application forImage Retrieval

Authors: M. V. Sudhamani, C. R. Venugopal

Abstract:

This paper deals with the application for contentbased image retrieval to extract color feature from natural images stored in the image database by segmenting the image through clustering. We employ a class of nonparametric techniques in which the data points are regarded as samples from an unknown probability density. Explicit computation of the density is avoided by using the mean shift procedure, a robust clustering technique, which does not require prior knowledge of the number of clusters, and does not constrain the shape of the clusters. A non-parametric technique for the recovery of significant image features is presented and segmentation module is developed using the mean shift algorithm to segment each image. In these algorithms, the only user set parameter is the resolution of the analysis and either gray level or color images are accepted as inputs. Extensive experimental results illustrate excellent performance.

Keywords: Segmentation, Clustering, Image Retrieval, Features.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1419
7441 Applying Clustering of Hierarchical K-means-like Algorithm on Arabic Language

Authors: Sameh H. Ghwanmeh

Abstract:

In this study a clustering technique has been implemented which is K-Means like with hierarchical initial set (HKM). The goal of this study is to prove that clustering document sets do enhancement precision on information retrieval systems, since it was proved by Bellot & El-Beze on French language. A comparison is made between the traditional information retrieval system and the clustered one. Also the effect of increasing number of clusters on precision is studied. The indexing technique is Term Frequency * Inverse Document Frequency (TF * IDF). It has been found that the effect of Hierarchical K-Means Like clustering (HKM) with 3 clusters over 242 Arabic abstract documents from the Saudi Arabian National Computer Conference has significant results compared with traditional information retrieval system without clustering. Additionally it has been found that it is not necessary to increase the number of clusters to improve precision more.

Keywords: Hierarchical K-mean like clustering (HKM), Kmeans, cluster centroids, initial partition, and document distances

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2529
7440 Fuzzy Clustering Analysis in Real Estate Companies in China

Authors: Jianfeng Li, Feng Jin, Xiaoyu Yang

Abstract:

This paper applies fuzzy clustering algorithm in classifying real estate companies in China according to some general financial indexes, such as income per share, share accumulation fund, net profit margins, weighted net assets yield and shareholders' equity. By constructing and normalizing initial partition matrix, getting fuzzy similar matrix with Minkowski metric and gaining the transitive closure, the dynamic fuzzy clustering analysis for real estate companies is shown clearly that different clustered result change gradually with the threshold reducing, and then, it-s shown there is the similar relationship with the prices of those companies in stock market. In this way, it-s great valuable in contrasting the real estate companies- financial condition in order to grasp some good chances of investment, and so on.

Keywords: Fuzzy clustering algorithm, data mining, real estate company, financial analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1870
7439 An Efficient Clustering Technique for Copy-Paste Attack Detection

Authors: N. Chaitawittanun, M. Munlin

Abstract:

Due to rapid advancement of powerful image processing software, digital images are easy to manipulate and modify by ordinary people. Lots of digital images are edited for a specific purpose and more difficult to distinguish form their original ones. We propose a clustering method to detect a copy-move image forgery of JPEG, BMP, TIFF, and PNG. The process starts with reducing the color of the photos. Then, we use the clustering technique to divide information of measuring data by Hausdorff Distance. The result shows that the purposed methods is capable of inspecting the image file and correctly identify the forgery.

Keywords: Image detection, forgery image, copy-paste.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1274
7438 Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering

Authors: Yunus Doğan, Ahmet Durap

Abstract:

Coastal regions are the one of the most commonly used places by the natural balance and the growing population. In coastal engineering, the most valuable data is wave behaviors. The amount of this data becomes very big because of observations that take place for periods of hours, days and months. In this study, some statistical methods such as the wave spectrum analysis methods and the standard statistical methods have been used. The goal of this study is the discovery profiles of the different coast areas by using these statistical methods, and thus, obtaining an instance based data set from the big data to analysis by using data mining algorithms. In the experimental studies, the six sample data sets about the wave behaviors obtained by 20 minutes of observations from Mersin Bay in Turkey and converted to an instance based form, while different clustering techniques in data mining algorithms were used to discover similar coastal places. Moreover, this study discusses that this summarization approach can be used in other branches collecting big data such as medicine.

Keywords: Clustering algorithms, coastal engineering, data mining, data summarization, statistical methods.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1187
7437 Intelligent Recognition of Diabetes Disease via FCM Based Attribute Weighting

Authors: Kemal Polat

Abstract:

In this paper, an attribute weighting method called fuzzy C-means clustering based attribute weighting (FCMAW) for classification of Diabetes disease dataset has been used. The aims of this study are to reduce the variance within attributes of diabetes dataset and to improve the classification accuracy of classifier algorithm transforming from non-linear separable datasets to linearly separable datasets. Pima Indians Diabetes dataset has two classes including normal subjects (500 instances) and diabetes subjects (268 instances). Fuzzy C-means clustering is an improved version of K-means clustering method and is one of most used clustering methods in data mining and machine learning applications. In this study, as the first stage, fuzzy C-means clustering process has been used for finding the centers of attributes in Pima Indians diabetes dataset and then weighted the dataset according to the ratios of the means of attributes to centers of theirs. Secondly, after weighting process, the classifier algorithms including support vector machine (SVM) and k-NN (k- nearest neighbor) classifiers have been used for classifying weighted Pima Indians diabetes dataset. Experimental results show that the proposed attribute weighting method (FCMAW) has obtained very promising results in the classification of Pima Indians diabetes dataset.

Keywords: Fuzzy C-means clustering, Fuzzy C-means clustering based attribute weighting, Pima Indians diabetes dataset, SVM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1710
7436 MCOKE: Multi-Cluster Overlapping K-Means Extension Algorithm

Authors: Said Baadel, Fadi Thabtah, Joan Lu

Abstract:

Clustering involves the partitioning of n objects into k clusters. Many clustering algorithms use hard-partitioning techniques where each object is assigned to one cluster. In this paper we propose an overlapping algorithm MCOKE which allows objects to belong to one or more clusters. The algorithm is different from fuzzy clustering techniques because objects that overlap are assigned a membership value of 1 (one) as opposed to a fuzzy membership degree. The algorithm is also different from other overlapping algorithms that require a similarity threshold be defined a priori which can be difficult to determine by novice users.

Keywords: Data mining, k-means, MCOKE, overlapping.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2679
7435 Applying Hybrid Graph Drawing and Clustering Methods on Stock Investment Analysis

Authors: Mouataz Zreika, Maria Estela Varua

Abstract:

Stock investment decisions are often made based on current events of the global economy and the analysis of historical data. Conversely, visual representation could assist investors’ gain deeper understanding and better insight on stock market trends more efficiently. The trend analysis is based on long-term data collection. The study adopts a hybrid method that combines the Clustering algorithm and Force-directed algorithm to overcome the scalability problem when visualizing large data. This method exemplifies the potential relationships between each stock, as well as determining the degree of strength and connectivity, which will provide investors another understanding of the stock relationship for reference. Information derived from visualization will also help them make an informed decision. The results of the experiments show that the proposed method is able to produced visualized data aesthetically by providing clearer views for connectivity and edge weights.

Keywords: Clustering, force-directed, graph drawing, stock investment analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1552
7434 A Software Framework for Predicting Oil-Palm Yield from Climate Data

Authors: Mohd. Noor Md. Sap, A. Majid Awan

Abstract:

Intelligent systems based on machine learning techniques, such as classification, clustering, are gaining wide spread popularity in real world applications. This paper presents work on developing a software system for predicting crop yield, for example oil-palm yield, from climate and plantation data. At the core of our system is a method for unsupervised partitioning of data for finding spatio-temporal patterns in climate data using kernel methods which offer strength to deal with complex data. This work gets inspiration from the notion that a non-linear data transformation into some high dimensional feature space increases the possibility of linear separability of the patterns in the transformed space. Therefore, it simplifies exploration of the associated structure in the data. Kernel methods implicitly perform a non-linear mapping of the input data into a high dimensional feature space by replacing the inner products with an appropriate positive definite function. In this paper we present a robust weighted kernel k-means algorithm incorporating spatial constraints for clustering the data. The proposed algorithm can effectively handle noise, outliers and auto-correlation in the spatial data, for effective and efficient data analysis by exploring patterns and structures in the data, and thus can be used for predicting oil-palm yield by analyzing various factors affecting the yield.

Keywords: Pattern analysis, clustering, kernel methods, spatial data, crop yield

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1922
7433 Using Data Mining Techniques for Finding Cardiac Outlier Patients

Authors: Farhan Ismaeel Dakheel, Raoof Smko, K. Negrat, Abdelsalam Almarimi

Abstract:

In this paper we used data mining techniques to identify outlier patients who are using large amount of drugs over a long period of time. Any healthcare or health insurance system should deal with the quantities of drugs utilized by chronic diseases patients. In Kingdom of Bahrain, about 20% of health budget is spent on medications. For the managers of healthcare systems, there is no enough information about the ways of drug utilization by chronic diseases patients, is there any misuse or is there outliers patients. In this work, which has been done in cooperation with information department in the Bahrain Defence Force hospital; we select the data for Cardiac patients in the period starting from 1/1/2008 to December 31/12/2008 to be the data for the model in this paper. We used three techniques for finding the drug utilization for cardiac patients. First we applied a clustering technique, followed by measuring of clustering validity, and finally we applied a decision tree as classification algorithm. The clustering results is divided into three clusters according to the drug utilization, for 1603 patients, who received 15,806 prescriptions during this period can be partitioned into three groups, where 23 patients (2.59%) who received 1316 prescriptions (8.32%) are classified to be outliers. The classification algorithm shows that the use of average drug utilization and the age, and the gender of the patient can be considered to be the main predictive factors in the induced model.

Keywords: Data Mining, Clustering, Classification, Drug Utilization..

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1850
7432 Selection Initial modes for Belief K-modes Method

Authors: Sarra Ben Hariz, Zied Elouedi, Khaled Mellouli

Abstract:

The belief K-modes method (BKM) approach is a new clustering technique handling uncertainty in the attribute values of objects in both the cluster construction task and the classification one. Like the standard version of this method, the BKM results depend on the chosen initial modes. So, one selection method of initial modes is developed, in this paper, aiming at improving the performances of the BKM approach. Experiments with several sets of real data show that by considered the developed selection initial modes method, the clustering algorithm produces more accurate results.

Keywords: Clustering, Uncertainty, Belief function theory, Belief K-modes Method, Initial modes selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1770
7431 Energy-Efficient Clustering Protocol in Wireless Sensor Networks for Healthcare Monitoring

Authors: Ebrahim Farahmand, Ali Mahani

Abstract:

Wireless sensor networks (WSNs) can facilitate continuous monitoring of patients and increase early detection of emergency conditions and diseases. High density WSNs helps us to accurately monitor a remote environment by intelligently combining the data from the individual nodes. Due to energy capacity limitation of sensors, enhancing the lifetime and the reliability of WSNs are important factors in designing of these networks. The clustering strategies are verified as effective and practical algorithms for reducing energy consumption in WSNs and can tackle WSNs limitations. In this paper, an Energy-efficient weight-based Clustering Protocol (EWCP) is presented. Artificial retina is selected as a case study of WSNs applied in body sensors. Cluster heads’ (CHs) selection is equipped with energy efficient parameters. Moreover, cluster members are selected based on their distance to the selected CHs. Comparing with the other benchmark protocols, the lifetime of EWCP is improved significantly.

Keywords: Clustering of WSNs, healthcare monitoring, weight-based clustering, wireless sensor networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1503
7430 Secure Data Aggregation Using Clusters in Sensor Networks

Authors: Prakash G L, Thejaswini M, S H Manjula, K R Venugopal, L M Patnaik

Abstract:

Wireless sensor network can be applied to both abominable and military environments. A primary goal in the design of wireless sensor networks is lifetime maximization, constrained by the energy capacity of batteries. One well-known method to reduce energy consumption in such networks is data aggregation. Providing efcient data aggregation while preserving data privacy is a challenging problem in wireless sensor networks research. In this paper, we present privacy-preserving data aggregation scheme for additive aggregation functions. The Cluster-based Private Data Aggregation (CPDA)leverages clustering protocol and algebraic properties of polynomials. It has the advantage of incurring less communication overhead. The goal of our work is to bridge the gap between collaborative data collection by wireless sensor networks and data privacy. We present simulation results of our schemes and compare their performance to a typical data aggregation scheme TAG, where no data privacy protection is provided. Results show the efficacy and efficiency of our schemes.

Keywords: Aggregation, Clustering, Query Processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1690
7429 Customer Segmentation in Foreign Trade based on Clustering Algorithms Case Study: Trade Promotion Organization of Iran

Authors: Samira Malekmohammadi Golsefid, Mehdi Ghazanfari, Somayeh Alizadeh

Abstract:

The goal of this paper is to segment the countries based on the value of export from Iran during 14 years ending at 2005. To measure the dissimilarity among export baskets of different countries, we define Dissimilarity Export Basket (DEB) function and use this distance function in K-means algorithm. The DEB function is defined based on the concepts of the association rules and the value of export group-commodities. In this paper, clustering quality function and clusters intraclass inertia are defined to, respectively, calculate the optimum number of clusters and to compare the functionality of DEB versus Euclidean distance. We have also study the effects of importance weight in DEB function to improve clustering quality. Lastly when segmentation is completed, a designated RFM model is used to analyze the relative profitability of each cluster.

Keywords: Customers segmentation, Customer relationship management, Clustering, Data Mining

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2240
7428 Fuzzy Relatives of the CLARANS Algorithm With Application to Text Clustering

Authors: Mohamed A. Mahfouz, M. A. Ismail

Abstract:

This paper introduces new algorithms (Fuzzy relative of the CLARANS algorithm FCLARANS and Fuzzy c Medoids based on randomized search FCMRANS) for fuzzy clustering of relational data. Unlike existing fuzzy c-medoids algorithm (FCMdd) in which the within cluster dissimilarity of each cluster is minimized in each iteration by recomputing new medoids given current memberships, FCLARANS minimizes the same objective function minimized by FCMdd by changing current medoids in such away that that the sum of the within cluster dissimilarities is minimized. Computing new medoids may be effected by noise because outliers may join the computation of medoids while the choice of medoids in FCLARANS is dictated by the location of a predominant fraction of points inside a cluster and, therefore, it is less sensitive to the presence of outliers. In FCMRANS the step of computing new medoids in FCMdd is modified to be based on randomized search. Furthermore, a new initialization procedure is developed that add randomness to the initialization procedure used with FCMdd. Both FCLARANS and FCMRANS are compared with the robust and linearized version of fuzzy c-medoids (RFCMdd). Experimental results with different samples of the Reuter-21578, Newsgroups (20NG) and generated datasets with noise show that FCLARANS is more robust than both RFCMdd and FCMRANS. Finally, both FCMRANS and FCLARANS are more efficient and their outputs are almost the same as that of RFCMdd in terms of classification rate.

Keywords: Data Mining, Fuzzy Clustering, Relational Clustering, Medoid-Based Clustering, Cluster Analysis, Unsupervised Learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2356
7427 Chemical Reaction Algorithm for Expectation Maximization Clustering

Authors: Li Ni, Pen ManMan, Li KenLi

Abstract:

Clustering is an intensive research for some years because of its multifaceted applications, such as biology, information retrieval, medicine, business and so on. The expectation maximization (EM) is a kind of algorithm framework in clustering methods, one of the ten algorithms of machine learning. Traditionally, optimization of objective function has been the standard approach in EM. Hence, research has investigated the utility of evolutionary computing and related techniques in the regard. Chemical Reaction Optimization (CRO) is a recently established method. So the property embedded in CRO is used to solve optimization problems. This paper presents an algorithm framework (EM-CRO) with modified CRO operators based on EM cluster problems. The hybrid algorithm is mainly to solve the problem of initial value sensitivity of the objective function optimization clustering algorithm. Our experiments mainly take the EM classic algorithm:k-means and fuzzy k-means as an example, through the CRO algorithm to optimize its initial value, get K-means-CRO and FKM-CRO algorithm. The experimental results of them show that there is improved efficiency for solving objective function optimization clustering problems.

Keywords: Chemical reaction optimization, expectation maximization, initial, objective function clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1235
7426 Spatial-Temporal Clustering Characteristics of Dengue in the Northern Region of Sri Lanka, 2010-2013

Authors: Sumiko Anno, Keiji Imaoka, Takeo Tadono, Tamotsu Igarashi, Subramaniam Sivaganesh, Selvam Kannathasan, Vaithehi Kumaran, Sinnathamby Noble Surendran

Abstract:

Dengue outbreaks are affected by biological, ecological, socio-economic and demographic factors that vary over time and space. These factors have been examined separately and still require systematic clarification. The present study aimed to investigate the spatial-temporal clustering relationships between these factors and dengue outbreaks in the northern region of Sri Lanka. Remote sensing (RS) data gathered from a plurality of satellites were used to develop an index comprising rainfall, humidity and temperature data. RS data gathered by ALOS/AVNIR-2 were used to detect urbanization, and a digital land cover map was used to extract land cover information. Other data on relevant factors and dengue outbreaks were collected through institutions and extant databases. The analyzed RS data and databases were integrated into geographic information systems, enabling temporal analysis, spatial statistical analysis and space-time clustering analysis. Our present results showed that increases in the number of the combination of ecological factor and socio-economic and demographic factors with above the average or the presence contribute to significantly high rates of space-time dengue clusters.

Keywords: ALOS/AVNIR-2, Dengue, Space-time clustering analysis, Sri Lanka.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2234
7425 Granulation using Clustering and Rough Set Theory and its Tree Representation

Authors: Girish Kumar Singh, Sonajharia Minz

Abstract:

Granular computing deals with representation of information in the form of some aggregates and related methods for transformation and analysis for problem solving. A granulation scheme based on clustering and Rough Set Theory is presented with focus on structured conceptualization of information has been presented in this paper. Experiments for the proposed method on four labeled data exhibit good result with reference to classification problem. The proposed granulation technique is semi-supervised imbibing global as well as local information granulation. To represent the results of the attribute oriented granulation a tree structure is proposed in this paper.

Keywords: Granular computing, clustering, Rough sets, datamining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1672
7424 Multidimensional Data Mining by Means of Randomly Travelling Hyper-Ellipsoids

Authors: Pavel Y. Tabakov, Kevin Duffy

Abstract:

The present study presents a new approach to automatic data clustering and classification problems in large and complex databases and, at the same time, derives specific types of explicit rules describing each cluster. The method works well in both sparse and dense multidimensional data spaces. The members of the data space can be of the same nature or represent different classes. A number of N-dimensional ellipsoids are used for enclosing the data clouds. Due to the geometry of an ellipsoid and its free rotation in space the detection of clusters becomes very efficient. The method is based on genetic algorithms that are used for the optimization of location, orientation and geometric characteristics of the hyper-ellipsoids. The proposed approach can serve as a basis for the development of general knowledge systems for discovering hidden knowledge and unexpected patterns and rules in various large databases.

Keywords: Classification, clustering, data minig, genetic algorithms.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1731
7423 A New Approach for Image Segmentation using Pillar-Kmeans Algorithm

Authors: Ali Ridho Barakbah, Yasushi Kiyoki

Abstract:

This paper presents a new approach for image segmentation by applying Pillar-Kmeans algorithm. This segmentation process includes a new mechanism for clustering the elements of high-resolution images in order to improve precision and reduce computation time. The system applies K-means clustering to the image segmentation after optimized by Pillar Algorithm. The Pillar algorithm considers the pillars- placement which should be located as far as possible from each other to withstand against the pressure distribution of a roof, as identical to the number of centroids amongst the data distribution. This algorithm is able to optimize the K-means clustering for image segmentation in aspects of precision and computation time. It designates the initial centroids- positions by calculating the accumulated distance metric between each data point and all previous centroids, and then selects data points which have the maximum distance as new initial centroids. This algorithm distributes all initial centroids according to the maximum accumulated distance metric. This paper evaluates the proposed approach for image segmentation by comparing with K-means and Gaussian Mixture Model algorithm and involving RGB, HSV, HSL and CIELAB color spaces. The experimental results clarify the effectiveness of our approach to improve the segmentation quality in aspects of precision and computational time.

Keywords: Image segmentation, K-means clustering, Pillaralgorithm, color spaces.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3326
7422 A Distributed Algorithm for Intrinsic Cluster Detection over Large Spatial Data

Authors: Sauravjyoti Sarmah, Rosy Das, Dhruba Kr. Bhattacharyya

Abstract:

Clustering algorithms help to understand the hidden information present in datasets. A dataset may contain intrinsic and nested clusters, the detection of which is of utmost importance. This paper presents a Distributed Grid-based Density Clustering algorithm capable of identifying arbitrary shaped embedded clusters as well as multi-density clusters over large spatial datasets. For handling massive datasets, we implemented our method using a 'sharednothing' architecture where multiple computers are interconnected over a network. Experimental results are reported to establish the superiority of the technique in terms of scale-up, speedup as well as cluster quality.

Keywords: Clustering, Density-based, Grid-based, Adaptive Grid.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1559
7421 Categorical Clustering By Converting Associated Information

Authors: Dongmin Cai, Stephen S-T Yau

Abstract:

Lacking an inherent “natural" dissimilarity measure between objects in categorical dataset presents special difficulties in clustering analysis. However, each categorical attributes from a given dataset provides natural probability and information in the sense of Shannon. In this paper, we proposed a novel method which heuristically converts categorical attributes to numerical values by exploiting such associated information. We conduct an experimental study with real-life categorical dataset. The experiment demonstrates the effectiveness of our approach.

Keywords: Categorical, Clustering, Converting, Information

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1317
7420 A Symbol by Symbol Clustering Based Blind Equalizer

Authors: Kristina Georgoulakis

Abstract:

A new blind symbol by symbol equalizer is proposed. The operation of the proposed equalizer is based on the geometric properties of the two dimensional data constellation. An unsupervised clustering technique is used to locate the clusters formed by the received data. The symmetric properties of the clusters labels are subsequently utilized in order to label the clusters. Following this step, the received data are compared to clusters and decisions are made on a symbol by symbol basis, by assigning to each data the label of the nearest cluster. The operation of the equalizer is investigated both in linear and nonlinear channels. The performance of the proposed equalizer is compared to the performance of a CMAbased blind equalizer.

Keywords: Blind equalization, channel equalization, cluster based equalisers

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1374
7419 TOSOM: A Topic-Oriented Self-Organizing Map for Text Organization

Authors: Hsin-Chang Yang, Chung-Hong Lee, Kuo-Lung Ke

Abstract:

The self-organizing map (SOM) model is a well-known neural network model with wide spread of applications. The main characteristics of SOM are two-fold, namely dimension reduction and topology preservation. Using SOM, a high-dimensional data space will be mapped to some low-dimensional space. Meanwhile, the topological relations among data will be preserved. With such characteristics, the SOM was usually applied on data clustering and visualization tasks. However, the SOM has main disadvantage of the need to know the number and structure of neurons prior to training, which are difficult to be determined. Several schemes have been proposed to tackle such deficiency. Examples are growing/expandable SOM, hierarchical SOM, and growing hierarchical SOM. These schemes could dynamically expand the map, even generate hierarchical maps, during training. Encouraging results were reported. Basically, these schemes adapt the size and structure of the map according to the distribution of training data. That is, they are data-driven or dataoriented SOM schemes. In this work, a topic-oriented SOM scheme which is suitable for document clustering and organization will be developed. The proposed SOM will automatically adapt the number as well as the structure of the map according to identified topics. Unlike other data-oriented SOMs, our approach expands the map and generates the hierarchies both according to the topics and their characteristics of the neurons. The preliminary experiments give promising result and demonstrate the plausibility of the method.

Keywords: Self-organizing map, topic identification, learning algorithm, text clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1975
7418 An Educational Data Mining System for Advising Higher Education Students

Authors: Heba Mohammed Nagy, Walid Mohamed Aly, Osama Fathy Hegazy

Abstract:

Educational  data mining  is  a  specific  data   mining field applied to data originating from educational environments, it relies on different  approaches to discover hidden knowledge  from  the  available   data. Among these approaches are   machine   learning techniques which are used to build a system that acquires learning from previous data. Machine learning can be applied to solve different regression, classification, clustering and optimization problems.

In  our  research, we propose  a “Student  Advisory  Framework” that  utilizes  classification  and  clustering  to  build  an  intelligent system. This system can be used to provide pieces of consultations to a first year  university  student to  pursue a  certain   education   track   where  he/she  will  likely  succeed  in, aiming  to  decrease   the  high  rate   of  academic  failure   among these  students.  A real case study  in Cairo  Higher  Institute  for Engineering, Computer  Science  and  Management  is  presented using  real  dataset   collected  from  2000−2012.The dataset has two main components: pre-higher education dataset and first year courses results dataset. Results have proved the efficiency of the suggested framework.

Keywords: Classification, Clustering, Educational Data Mining (EDM), Machine Learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5164
7417 Generalization of Clustering Coefficient on Lattice Networks Applied to Criminal Networks

Authors: Christian H. Sanabria-Montaña, Rodrigo Huerta-Quintanilla

Abstract:

A lattice network is a special type of network in which all nodes have the same number of links, and its boundary conditions are periodic. The most basic lattice network is the ring, a one-dimensional network with periodic border conditions. In contrast, the Cartesian product of d rings forms a d-dimensional lattice network. An analytical expression currently exists for the clustering coefficient in this type of network, but the theoretical value is valid only up to certain connectivity value; in other words, the analytical expression is incomplete. Here we obtain analytically the clustering coefficient expression in d-dimensional lattice networks for any link density. Our analytical results show that the clustering coefficient for a lattice network with density of links that tend to 1, leads to the value of the clustering coefficient of a fully connected network. We developed a model on criminology in which the generalized clustering coefficient expression is applied. The model states that delinquents learn the know-how of crime business by sharing knowledge, directly or indirectly, with their friends of the gang. This generalization shed light on the network properties, which is important to develop new models in different fields where network structure plays an important role in the system dynamic, such as criminology, evolutionary game theory, econophysics, among others.

Keywords: Clustering coefficient, criminology, generalized, regular network d-dimensional.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1581