Search results for: structure clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8034

Search results for: structure clustering

8004 Decision Trees Constructing Based on K-Means Clustering Algorithm

Authors: Loai Abdallah, Malik Yousef

Abstract:

A domain space for the data should reflect the actual similarity between objects. Since objects belonging to the same cluster usually share some common traits even though their geometric distance might be relatively large. In general, the Euclidean distance of data points that represented by large number of features is not capturing the actual relation between those points. In this study, we propose a new method to construct a different space that is based on clustering to form a new distance metric. The new distance space is based on ensemble clustering (EC). The EC distance space is defined by tracking the membership of the points over multiple runs of clustering algorithm metric. Over this distance, we train the decision trees classifier (DT-EC). The results obtained by applying DT-EC on 10 datasets confirm our hypotheses that embedding the EC space as a distance metric would improve the performance.

Keywords: ensemble clustering, decision trees, classification, K nearest neighbors

Procedia PDF Downloads 160
8003 Power Iteration Clustering Based on Deflation Technique on Large Scale Graphs

Authors: Taysir Soliman

Abstract:

One of the current popular clustering techniques is Spectral Clustering (SC) because of its advantages over conventional approaches such as hierarchical clustering, k-means, etc. and other techniques as well. However, one of the disadvantages of SC is the time consuming process because it requires computing the eigenvectors. In the past to overcome this disadvantage, a number of attempts have been proposed such as the Power Iteration Clustering (PIC) technique, which is one of versions from SC; some of PIC advantages are: 1) its scalability and efficiency, 2) finding one pseudo-eigenvectors instead of computing eigenvectors, and 3) linear combination of the eigenvectors in linear time. However, its worst disadvantage is an inter-class collision problem because it used only one pseudo-eigenvectors which is not enough. Previous researchers developed Deflation-based Power Iteration Clustering (DPIC) to overcome problems of PIC technique on inter-class collision with the same efficiency of PIC. In this paper, we developed Parallel DPIC (PDPIC) to improve the time and memory complexity which is run on apache spark framework using sparse matrix. To test the performance of PDPIC, we compared it to SC, ESCG, ESCALG algorithms on four small graph benchmark datasets and nine large graph benchmark datasets, where PDPIC proved higher accuracy and better time consuming than other compared algorithms.

Keywords: spectral clustering, power iteration clustering, deflation-based power iteration clustering, Apache spark, large graph

Procedia PDF Downloads 150
8002 Agglomerative Hierarchical Clustering Using the Tθ Family of Similarity Measures

Authors: Salima Kouici, Abdelkader Khelladi

Abstract:

In this work, we begin with the presentation of the Tθ family of usual similarity measures concerning multidimensional binary data. Subsequently, some properties of these measures are proposed. Finally, the impact of the use of different inter-elements measures on the results of the Agglomerative Hierarchical Clustering Methods is studied.

Keywords: binary data, similarity measure, Tθ measures, agglomerative hierarchical clustering

Procedia PDF Downloads 444
8001 Finding Bicluster on Gene Expression Data of Lymphoma Based on Singular Value Decomposition and Hierarchical Clustering

Authors: Alhadi Bustaman, Soeganda Formalidin, Titin Siswantining

Abstract:

DNA microarray technology is used to analyze thousand gene expression data simultaneously and a very important task for drug development and test, function annotation, and cancer diagnosis. Various clustering methods have been used for analyzing gene expression data. However, when analyzing very large and heterogeneous collections of gene expression data, conventional clustering methods often cannot produce a satisfactory solution. Biclustering algorithm has been used as an alternative approach to identifying structures from gene expression data. In this paper, we introduce a transform technique based on singular value decomposition to identify normalized matrix of gene expression data followed by Mixed-Clustering algorithm and the Lift algorithm, inspired in the node-deletion and node-addition phases proposed by Cheng and Church based on Agglomerative Hierarchical Clustering (AHC). Experimental study on standard datasets demonstrated the effectiveness of the algorithm in gene expression data.

Keywords: agglomerative hierarchical clustering (AHC), biclustering, gene expression data, lymphoma, singular value decomposition (SVD)

Procedia PDF Downloads 250
8000 K-Means Clustering-Based Infinite Feature Selection Method

Authors: Seyyedeh Faezeh Hassani Ziabari, Sadegh Eskandari, Maziar Salahi

Abstract:

Infinite Feature Selection (IFS) algorithm is an efficient feature selection algorithm that selects a subset of features of all sizes (including infinity). In this paper, we present an improved version of it, called clustering IFS (CIFS), by clustering the dataset in advance. To do so, first, we apply the K-means algorithm to cluster the dataset, then we apply IFS. In the CIFS method, the spatial and temporal complexities are reduced compared to the IFS method. Experimental results on 6 datasets show the superiority of CIFS compared to IFS in terms of accuracy, running time, and memory consumption.

Keywords: feature selection, infinite feature selection, clustering, graph

Procedia PDF Downloads 94
7999 Clustering Color Space, Time Interest Points for Moving Objects

Authors: Insaf Bellamine, Hamid Tairi

Abstract:

Detecting moving objects in sequences is an essential step for video analysis. This paper mainly contributes to the Color Space-Time Interest Points (CSTIP) extraction and detection. We propose a new method for detection of moving objects. Two main steps compose the proposed method. First, we suggest to apply the algorithm of the detection of Color Space-Time Interest Points (CSTIP) on both components of the Color Structure-Texture Image Decomposition which is based on a Partial Differential Equation (PDE): a color geometric structure component and a color texture component. A descriptor is associated to each of these points. In a second stage, we address the problem of grouping the points (CSTIP) into clusters. Experiments and comparison to other motion detection methods on challenging sequences show the performance of the proposed method and its utility for video analysis. Experimental results are obtained from very different types of videos, namely sport videos and animation movies.

Keywords: Color Space-Time Interest Points (CSTIP), Color Structure-Texture Image Decomposition, Motion Detection, clustering

Procedia PDF Downloads 350
7998 Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

Clustering is a well known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can either contain categorical or numeric data. Each type of data has its own specific clustering algorithm. In this context, two algorithms are proposed: the k-means for clustering numeric datasets and the k-modes for categorical datasets. The main encountered problem in data mining applications is clustering categorical dataset so relevant in the datasets. One main issue to achieve the clustering process on categorical values is to transform the categorical attributes into numeric measures and directly apply the k-means algorithm instead the k-modes. In this paper, it is proposed to experiment an approach based on the previous issue by transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previously method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are experimented. The obtained results show that our proposed method outperforms the binary method in all cases.

Keywords: clustering, unsupervised learning, pattern recognition, categorical datasets, knowledge discovery, k-means

Procedia PDF Downloads 232
7997 A Relative Entropy Regularization Approach for Fuzzy C-Means Clustering Problem

Authors: Ouafa Amira, Jiangshe Zhang

Abstract:

Clustering is an unsupervised machine learning technique; its aim is to extract the data structures, in which similar data objects are grouped in the same cluster, whereas dissimilar objects are grouped in different clusters. Clustering methods are widely utilized in different fields, such as: image processing, computer vision , and pattern recognition, etc. Fuzzy c-means clustering (fcm) is one of the most well known fuzzy clustering methods. It is based on solving an optimization problem, in which a minimization of a given cost function has been studied. This minimization aims to decrease the dissimilarity inside clusters, where the dissimilarity here is measured by the distances between data objects and cluster centers. The degree of belonging of a data point in a cluster is measured by a membership function which is included in the interval [0, 1]. In fcm clustering, the membership degree is constrained with the condition that the sum of a data object’s memberships in all clusters must be equal to one. This constraint can cause several problems, specially when our data objects are included in a noisy space. Regularization approach took a part in fuzzy c-means clustering technique. This process introduces an additional information in order to solve an ill-posed optimization problem. In this study, we focus on regularization by relative entropy approach, where in our optimization problem we aim to minimize the dissimilarity inside clusters. Finding an appropriate membership degree to each data object is our objective, because an appropriate membership degree leads to an accurate clustering result. Our clustering results in synthetic data sets, gaussian based data sets, and real world data sets show that our proposed model achieves a good accuracy.

Keywords: clustering, fuzzy c-means, regularization, relative entropy

Procedia PDF Downloads 240
7996 Comparing Community Detection Algorithms in Bipartite Networks

Authors: Ehsan Khademi, Mahdi Jalili

Abstract:

Despite the special features of bipartite networks, they are common in many systems. Real-world bipartite networks may show community structure, similar to what one can find in one-mode networks. However, the interpretation of the community structure in bipartite networks is different as compared to one-mode networks. In this manuscript, we compare a number of available methods that are frequently used to discover community structure of bipartite networks. These networks are categorized into two broad classes. One class is the methods that, first, transfer the network into a one-mode network, and then apply community detection algorithms. The other class is the algorithms that have been developed specifically for bipartite networks. These algorithms are applied on a model network with prescribed community structure.

Keywords: community detection, bipartite networks, co-clustering, modularity, network projection, complex networks

Procedia PDF Downloads 585
7995 Max-Entropy Feed-Forward Clustering Neural Network

Authors: Xiaohan Bookman, Xiaoyan Zhu

Abstract:

The outputs of non-linear feed-forward neural network are positive, which could be treated as probability when they are normalized to one. If we take Entropy-Based Principle into consideration, the outputs for each sample could be represented as the distribution of this sample for different clusters. Entropy-Based Principle is the principle with which we could estimate the unknown distribution under some limited conditions. As this paper defines two processes in Feed-Forward Neural Network, our limited condition is the abstracted features of samples which are worked out in the abstraction process. And the final outputs are the probability distribution for different clusters in the clustering process. As Entropy-Based Principle is considered into the feed-forward neural network, a clustering method is born. We have conducted some experiments on six open UCI data sets, comparing with a few baselines and applied purity as the measurement. The results illustrate that our method outperforms all the other baselines that are most popular clustering methods.

Keywords: feed-forward neural network, clustering, max-entropy principle, probabilistic models

Procedia PDF Downloads 407
7994 An Integrated Label Propagation Network for Structural Condition Assessment

Authors: Qingsong Xiong, Cheng Yuan, Qingzhao Kong, Haibei Xiong

Abstract:

Deep-learning-driven approaches based on vibration responses have attracted larger attention in rapid structural condition assessment while obtaining sufficient measured training data with corresponding labels is relevantly costly and even inaccessible in practical engineering. This study proposes an integrated label propagation network for structural condition assessment, which is able to diffuse the labels from continuously-generating measurements by intact structure to those of missing labels of damage scenarios. The integrated network is embedded with damage-sensitive features extraction by deep autoencoder and pseudo-labels propagation by optimized fuzzy clustering, the architecture and mechanism which are elaborated. With a sophisticated network design and specified strategies for improving performance, the present network achieves to extends the superiority of self-supervised representation learning, unsupervised fuzzy clustering and supervised classification algorithms into an integration aiming at assessing damage conditions. Both numerical simulations and full-scale laboratory shaking table tests of a two-story building structure were conducted to validate its capability of detecting post-earthquake damage. The identifying accuracy of a present network was 0.95 in numerical validations and an average 0.86 in laboratory case studies, respectively. It should be noted that the whole training procedure of all involved models in the network stringently doesn’t rely upon any labeled data of damage scenarios but only several samples of intact structure, which indicates a significant superiority in model adaptability and feasible applicability in practice.

Keywords: autoencoder, condition assessment, fuzzy clustering, label propagation

Procedia PDF Downloads 73
7993 Time Series Regression with Meta-Clusters

Authors: Monika Chuchro

Abstract:

This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain a subgroups of time series data with normal distribution from inflow into waste water treatment plant data which Composed of several groups differing by mean value. Two simple algorithms: K-mean and EM were chosen as a clustering method. The rand index was used to measure the similarity. After simple meta-clustering, regression model was performed for each subgroups. The final model was a sum of subgroups models. The quality of obtained model was compared with the regression model made using the same explanatory variables but with no clustering of data. Results were compared by determination coefficient (R2), measure of prediction accuracy mean absolute percentage error (MAPE) and comparison on linear chart. Preliminary results allows to foresee the potential of the presented technique.

Keywords: clustering, data analysis, data mining, predictive models

Procedia PDF Downloads 434
7992 Knowledge Representation Based on Interval Type-2 CFCM Clustering

Authors: Lee Myung-Won, Kwak Keun-Chang

Abstract:

This paper is concerned with knowledge representation and extraction of fuzzy if-then rules using Interval Type-2 Context-based Fuzzy C-Means clustering (IT2-CFCM) with the aid of fuzzy granulation. This proposed clustering algorithm is based on information granulation in the form of IT2 based Fuzzy C-Means (IT2-FCM) clustering and estimates the cluster centers by preserving the homogeneity between the clustered patterns from the IT2 contexts produced in the output space. Furthermore, we can obtain the automatic knowledge representation in the design of Radial Basis Function Networks (RBFN), Linguistic Model (LM), and Adaptive Neuro-Fuzzy Networks (ANFN) from the numerical input-output data pairs. We shall focus on a design of ANFN in this paper. The experimental results on an estimation problem of energy performance reveal that the proposed method showed a good knowledge representation and performance in comparison with the previous works.

Keywords: IT2-FCM, IT2-CFCM, context-based fuzzy clustering, adaptive neuro-fuzzy network, knowledge representation

Procedia PDF Downloads 293
7991 Intelligent Recognition of Diabetes Disease via FCM Based Attribute Weighting

Authors: Kemal Polat

Abstract:

In this paper, an attribute weighting method called fuzzy C-means clustering based attribute weighting (FCMAW) for classification of Diabetes disease dataset has been used. The aims of this study are to reduce the variance within attributes of diabetes dataset and to improve the classification accuracy of classifier algorithm transforming from non-linear separable datasets to linearly separable datasets. Pima Indians Diabetes dataset has two classes including normal subjects (500 instances) and diabetes subjects (268 instances). Fuzzy C-means clustering is an improved version of K-means clustering method and is one of most used clustering methods in data mining and machine learning applications. In this study, as the first stage, fuzzy C-means clustering process has been used for finding the centers of attributes in Pima Indians diabetes dataset and then weighted the dataset according to the ratios of the means of attributes to centers of theirs. Secondly, after weighting process, the classifier algorithms including support vector machine (SVM) and k-NN (k- nearest neighbor) classifiers have been used for classifying weighted Pima Indians diabetes dataset. Experimental results show that the proposed attribute weighting method (FCMAW) has obtained very promising results in the classification of Pima Indians diabetes dataset.

Keywords: fuzzy C-means clustering, fuzzy C-means clustering based attribute weighting, Pima Indians diabetes, SVM

Procedia PDF Downloads 382
7990 Dissimilarity Measure for General Histogram Data and Its Application to Hierarchical Clustering

Authors: K. Umbleja, M. Ichino

Abstract:

Symbolic data mining has been developed to analyze data in very large datasets. It is also useful in cases when entry specific details should remain hidden. Symbolic data mining is quickly gaining popularity as datasets in need of analyzing are becoming ever larger. One type of such symbolic data is a histogram, which enables to save huge amounts of information into a single variable with high-level of granularity. Other types of symbolic data can also be described in histograms, therefore making histogram a very important and general symbolic data type - a method developed for histograms - can also be applied to other types of symbolic data. Due to its complex structure, analyzing histograms is complicated. This paper proposes a method, which allows to compare two histogram-valued variables and therefore find a dissimilarity between two histograms. Proposed method uses the Ichino-Yaguchi dissimilarity measure for mixed feature-type data analysis as a base and develops a dissimilarity measure specifically for histogram data, which allows to compare histograms with different number of bins and bin widths (so called general histogram). Proposed dissimilarity measure is then used as a measure for clustering. Furthermore, linkage method based on weighted averages is proposed with the concept of cluster compactness to measure the quality of clustering. The method is then validated with application on real datasets. As a result, the proposed dissimilarity measure is found producing adequate and comparable results with general histograms without the loss of detail or need to transform the data.

Keywords: dissimilarity measure, hierarchical clustering, histograms, symbolic data analysis

Procedia PDF Downloads 131
7989 Method of Cluster Based Cross-Domain Knowledge Acquisition for Biologically Inspired Design

Authors: Shen Jian, Hu Jie, Ma Jin, Peng Ying Hong, Fang Yi, Liu Wen Hai

Abstract:

Biologically inspired design inspires inventions and new technologies in the field of engineering by mimicking functions, principles, and structures in the biological domain. To deal with the obstacles of cross-domain knowledge acquisition in the existing biologically inspired design process, functional semantic clustering based on functional feature semantic correlation and environmental constraint clustering composition based on environmental characteristic constraining adaptability are proposed. A knowledge cell clustering algorithm and the corresponding prototype system is developed. Finally, the effectiveness of the method is verified by the visual prosthetic device design.

Keywords: knowledge clustering, knowledge acquisition, knowledge based engineering, knowledge cell, biologically inspired design

Procedia PDF Downloads 403
7988 Pattern Recognition Using Feature Based Die-Map Clustering in the Semiconductor Manufacturing Process

Authors: Seung Hwan Park, Cheng-Sool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, Jun-Geol Baek

Abstract:

Depending on the big data analysis becomes important, yield prediction using data from the semiconductor process is essential. In general, yield prediction and analysis of the causes of the failure are closely related. The purpose of this study is to analyze pattern affects the final test results using a die map based clustering. Many researches have been conducted using die data from the semiconductor test process. However, analysis has limitation as the test data is less directly related to the final test results. Therefore, this study proposes a framework for analysis through clustering using more detailed data than existing die data. This study consists of three phases. In the first phase, die map is created through fail bit data in each sub-area of die. In the second phase, clustering using map data is performed. And the third stage is to find patterns that affect final test result. Finally, the proposed three steps are applied to actual industrial data and experimental results showed the potential field application.

Keywords: die-map clustering, feature extraction, pattern recognition, semiconductor manufacturing process

Procedia PDF Downloads 374
7987 A Weighted K-Medoids Clustering Algorithm for Effective Stability in Vehicular Ad Hoc Networks

Authors: Rejab Hajlaoui, Tarek Moulahi, Hervé Guyennet

Abstract:

In a highway scenario, the vehicle speed can exceed 120 kmph. Therefore, any vehicle can enter or leave the network within a very short time. This mobility adversely affects the network connectivity and decreases the life time of all established links. To ensure an effective stability in vehicular ad hoc networks with minimum broadcasting storm, we have developed a weighted algorithm based on the k-medoids clustering algorithm (WKCA). Indeed, the number of clusters and the initial cluster heads will not be selected randomly as usual, but considering the available transmission range and the environment size. Then, to ensure optimal assignment of nodes to clusters in both k-medoids phases, the combined weight of any node will be computed according to additional metrics including direction, relative speed and proximity. Empirical results prove that in addition to the convergence speed that characterizes the k-medoids algorithm, our proposed model performs well both AODV-Clustering and OLSR-Clustering protocols under different densities and velocities in term of end-to-end delay, packet delivery ratio, and throughput.

Keywords: communication, clustering algorithm, k-medoids, sensor, vehicular ad hoc network

Procedia PDF Downloads 203
7986 Hierarchical Cluster Analysis of Raw Milk Samples Obtained from Organic and Conventional Dairy Farming in Autonomous Province of Vojvodina, Serbia

Authors: Lidija Jevrić, Denis Kučević, Sanja Podunavac-Kuzmanović, Strahinja Kovačević, Milica Karadžić

Abstract:

In the present study, the Hierarchical Cluster Analysis (HCA) was applied in order to determine the differences between the milk samples originating from a conventional dairy farm (CF) and an organic dairy farm (OF) in AP Vojvodina, Republic of Serbia. The clustering was based on the basis of the average values of saturated fatty acids (SFA) content and unsaturated fatty acids (UFA) content obtained for every season. Therefore, the HCA included the annual SFA and UFA content values. The clustering procedure was carried out on the basis of Euclidean distances and Single linkage algorithm. The obtained dendrograms indicated that the clustering of UFA in OF was much more uniform compared to clustering of UFA in CF. In OF, spring stands out from the other months of the year. The same case can be noticed for CF, where winter is separated from the other months. The results could be expected because the composition of fatty acids content is greatly influenced by the season and nutrition of dairy cows during the year.

Keywords: chemometrics, clustering, food engineering, milk quality

Procedia PDF Downloads 252
7985 Sales Patterns Clustering Analysis on Seasonal Product Sales Data

Authors: Soojin Kim, Jiwon Yang, Sungzoon Cho

Abstract:

As a seasonal product is only in demand for a short time, inventory management is critical to profits. Both markdowns and stockouts decrease the return on perishable products; therefore, researchers have been interested in the distribution of seasonal products with the aim of maximizing profits. In this study, we propose a data-driven seasonal product sales pattern analysis method for individual retail outlets based on observed sales data clustering; the proposed method helps in determining distribution strategies.

Keywords: clustering, distribution, sales pattern, seasonal product

Procedia PDF Downloads 568
7984 Feature Weighting Comparison Based on Clustering Centers in the Detection of Diabetic Retinopathy

Authors: Kemal Polat

Abstract:

In this paper, three feature weighting methods have been used to improve the classification performance of diabetic retinopathy (DR). To classify the diabetic retinopathy, features extracted from the output of several retinal image processing algorithms, such as image-level, lesion-specific and anatomical components, have been used and fed them into the classifier algorithms. The dataset used in this study has been taken from University of California, Irvine (UCI) machine learning repository. Feature weighting methods including the fuzzy c-means clustering based feature weighting, subtractive clustering based feature weighting, and Gaussian mixture clustering based feature weighting, have been used and compered with each other in the classification of DR. After feature weighting, five different classifier algorithms comprising multi-layer perceptron (MLP), k- nearest neighbor (k-NN), decision tree, support vector machine (SVM), and Naïve Bayes have been used. The hybrid method based on combination of subtractive clustering based feature weighting and decision tree classifier has been obtained the classification accuracy of 100% in the screening of DR. These results have demonstrated that the proposed hybrid scheme is very promising in the medical data set classification.

Keywords: machine learning, data weighting, classification, data mining

Procedia PDF Downloads 301
7983 Proposing an Algorithm to Cluster Ad Hoc Networks, Modulating Two Levels of Learning Automaton and Nodes Additive Weighting

Authors: Mohammad Rostami, Mohammad Reza Forghani, Elahe Neshat, Fatemeh Yaghoobi

Abstract:

An Ad Hoc network consists of wireless mobile equipment which connects to each other without any infrastructure, using connection equipment. The best way to form a hierarchical structure is clustering. Various methods of clustering can form more stable clusters according to nodes' mobility. In this research we propose an algorithm, which allocates some weight to nodes based on factors, i.e. link stability and power reduction rate. According to the allocated weight in the previous phase, the cellular learning automaton picks out in the second phase nodes which are candidates for being cluster head. In the third phase, learning automaton selects cluster head nodes, member nodes and forms the cluster. Thus, this automaton does the learning from the setting and can form optimized clusters in terms of power consumption and link stability. To simulate the proposed algorithm we have used omnet++4.2.2. Simulation results indicate that newly formed clusters have a longer lifetime than previous algorithms and decrease strongly network overload by reducing update rate.

Keywords: mobile Ad Hoc networks, clustering, learning automaton, cellular automaton, battery power

Procedia PDF Downloads 376
7982 Clustering the Wheat Seeds Using SOM Artificial Neural Networks

Authors: Salah Ghamari

Abstract:

In this study, the ability of self organizing map artificial (SOM) neural networks in clustering the wheat seeds varieties according to morphological properties of them was considered. The SOM is one type of unsupervised competitive learning. Experimentally, five morphological features of 300 seeds (including three varieties: gaskozhen, Md and sardari) were obtained using image processing technique. The results show that the artificial neural network has a good performance (90.33% accuracy) in classification of the wheat varieties despite of high similarity in them. The highest classification accuracy (100%) was achieved for sardari.

Keywords: artificial neural networks, clustering, self organizing map, wheat variety

Procedia PDF Downloads 610
7981 GCM Based Fuzzy Clustering to Identify Homogeneous Climatic Regions of North-East India

Authors: Arup K. Sarma, Jayshree Hazarika

Abstract:

The North-eastern part of India, which receives heavier rainfall than other parts of the subcontinent, is of great concern now-a-days with regard to climate change. High intensity rainfall for short duration and longer dry spell, occurring due to impact of climate change, affects river morphology too. In the present study, an attempt is made to delineate the North-Eastern region of India into some homogeneous clusters based on the Fuzzy Clustering concept and to compare the resulting clusters obtained by using conventional methods and non conventional methods of clustering. The concept of clustering is adapted in view of the fact that, impact of climate change can be studied in a homogeneous region without much variation, which can be helpful in studies related to water resources planning and management. 10 IMD (Indian Meteorological Department) stations, situated in various regions of the North-east, have been selected for making the clusters. The results of the Fuzzy C-Means (FCM) analysis show different clustering patterns for different conditions. From the analysis and comparison it can be concluded that non conventional method of using GCM data is somehow giving better results than the others. However, further analysis can be done by taking daily data instead of monthly means to reduce the effect of standardization.

Keywords: climate change, conventional and nonconventional methods of clustering, FCM analysis, homogeneous regions

Procedia PDF Downloads 354
7980 The Clustering of Multiple Sclerosis Subgroups through L2 Norm Multifractal Denoising Technique

Authors: Yeliz Karaca, Rana Karabudak

Abstract:

Multifractal Denoising techniques are used in the identification of significant attributes by removing the noise of the dataset. Magnetic resonance (MR) image technique is the most sensitive method so as to identify chronic disorders of the nervous system such as Multiple Sclerosis. MRI and Expanded Disability Status Scale (EDSS) data belonging to 120 individuals who have one of the subgroups of MS (Relapsing Remitting MS (RRMS), Secondary Progressive MS (SPMS), Primary Progressive MS (PPMS)) as well as 19 healthy individuals in the control group have been used in this study. The study is comprised of the following stages: (i) L2 Norm Multifractal Denoising technique, one of the multifractal technique, has been used with the application on the MS data (MRI and EDSS). In this way, the new dataset has been obtained. (ii) The new MS dataset obtained from the MS dataset and L2 Multifractal Denoising technique has been applied to the K-Means and Fuzzy C Means clustering algorithms which are among the unsupervised methods. Thus, the clustering performances have been compared. (iii) In the identification of significant attributes in the MS dataset through the Multifractal denoising (L2 Norm) technique using K-Means and FCM algorithms on the MS subgroups and control group of healthy individuals, excellent performance outcome has been yielded. According to the clustering results based on the MS subgroups obtained in the study, successful clustering results have been obtained in the K-Means and FCM algorithms by applying the L2 norm of multifractal denoising technique for the MS dataset. Clustering performance has been more successful with the MS Dataset (L2_Norm MS Data Set) K-Means and FCM in which significant attributes are obtained by applying L2 Norm Denoising technique.

Keywords: clinical decision support, clustering algorithms, multiple sclerosis, multifractal techniques

Procedia PDF Downloads 133
7979 A Polynomial Time Clustering Algorithm for Solving the Assignment Problem in the Vehicle Routing Problem

Authors: Lydia Wahid, Mona F. Ahmed, Nevin Darwish

Abstract:

The vehicle routing problem (VRP) consists of a group of customers that needs to be served. Each customer has a certain demand of goods. A central depot having a fleet of vehicles is responsible for supplying the customers with their demands. The problem is composed of two subproblems: The first subproblem is an assignment problem where the number of vehicles that will be used as well as the customers assigned to each vehicle are determined. The second subproblem is the routing problem in which for each vehicle having a number of customers assigned to it, the order of visits of the customers is determined. Optimal number of vehicles, as well as optimal total distance, should be achieved. In this paper, an approach for solving the first subproblem (the assignment problem) is presented. In the approach, a clustering algorithm is proposed for finding the optimal number of vehicles by grouping the customers into clusters where each cluster is visited by one vehicle. Finding the optimal number of clusters is NP-hard. This work presents a polynomial time clustering algorithm for finding the optimal number of clusters and solving the assignment problem.

Keywords: vehicle routing problems, clustering algorithms, Clarke and Wright Saving Method, agglomerative hierarchical clustering

Procedia PDF Downloads 357
7978 A Similarity Measure for Classification and Clustering in Image Based Medical and Text Based Banking Applications

Authors: K. P. Sandesh, M. H. Suman

Abstract:

Text processing plays an important role in information retrieval, data-mining, and web search. Measuring the similarity between the documents is an important operation in the text processing field. In this project, a new similarity measure is proposed. To compute the similarity between two documents with respect to a feature the proposed measure takes the following three cases into account: (1) The feature appears in both documents; (2) The feature appears in only one document and; (3) The feature appears in none of the documents. The proposed measure is extended to gauge the similarity between two sets of documents. The effectiveness of our measure is evaluated on several real-world data sets for text classification and clustering problems, especially in banking and health sectors. The results show that the performance obtained by the proposed measure is better than that achieved by the other measures.

Keywords: document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms

Procedia PDF Downloads 480
7977 Application of Fuzzy Clustering on Classification Agile Supply Chain

Authors: Hamidreza Fallah Lajimi , Elham Karami, Fatemeh Ali nasab, Mostafa Mahdavikia

Abstract:

Being responsive is an increasingly important skill for firms in today’s global economy; thus firms must be agile. Naturally, it follows that an organization’s agility depends on its supply chain being agile. However, achieving supply chain agility is a function of other abilities within the organization. This paper analyses results from a survey of 71 Iran manufacturing companies in order to identify some of the factors for agile organizations in managing their supply chains. Then we classification this company in four cluster with fuzzy c-mean technique and with four validations functional determine automatically the optimal number of clusters.

Keywords: agile supply chain, clustering, fuzzy clustering

Procedia PDF Downloads 416
7976 Sustainability and Clustering: A Bibliometric Assessment

Authors: Fernanda M. Assef, Maria Teresinha A. Steiner, David Gabriel F. Barros

Abstract:

Review researches are useful in terms of analysis of research problems. Between the types of review documents, we commonly find bibliometric studies. This type of application often helps the global visualization of a research problem and helps academics worldwide to understand the context of a research area better. In this document, a bibliometric view surrounding clustering techniques and sustainability problems is presented. The authors aimed at which issues mostly use clustering techniques, and, even which sustainability issue would be more impactful on today’s moment of research. During the bibliometric analysis, we found ten different groups of research in clustering applications for sustainability issues: Energy; Environmental; Non-urban planning; Sustainable Development; Sustainable Supply Chain; Transport; Urban Planning; Water; Waste Disposal; and, Others. And, by analyzing the citations of each group, we discovered that the Environmental group could be classified as the most impactful research cluster in the area mentioned. Now, after the content analysis of each paper classified in the environmental group, we found that the k-means technique is preferred for solving sustainability problems with clustering methods since it appeared the most amongst the documents. The authors finally conclude that a bibliometric assessment could help indicate a gap of researches on waste disposal – which was the group with the least amount of publications – and the most impactful research on environmental problems.

Keywords: bibliometric assessment, clustering, sustainability, territorial partitioning

Procedia PDF Downloads 82
7975 Multi-Cluster Overlapping K-Means Extension Algorithm (MCOKE)

Authors: Said Baadel, Fadi Thabtah, Joan Lu

Abstract:

Clustering involves the partitioning of n objects into k clusters. Many clustering algorithms use hard-partitioning techniques where each object is assigned to one cluster. In this paper, we propose an overlapping algorithm MCOKE which allows objects to belong to one or more clusters. The algorithm is different from fuzzy clustering techniques because objects that overlap are assigned a membership value of 1 (one) as opposed to a fuzzy membership degree. The algorithm is also different from other overlapping algorithms that require a similarity threshold to be defined as a priority which can be difficult to determine by novice users.

Keywords: data mining, k-means, MCOKE, overlapping

Procedia PDF Downloads 536