Search results for: weighted based clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 28755

Search results for: weighted based clustering

28755 Intelligent Recognition of Diabetes Disease via FCM Based Attribute Weighting

Authors: Kemal Polat

Abstract:

In this paper, an attribute weighting method called fuzzy C-means clustering based attribute weighting (FCMAW) for classification of Diabetes disease dataset has been used. The aims of this study are to reduce the variance within attributes of diabetes dataset and to improve the classification accuracy of classifier algorithm transforming from non-linear separable datasets to linearly separable datasets. Pima Indians Diabetes dataset has two classes including normal subjects (500 instances) and diabetes subjects (268 instances). Fuzzy C-means clustering is an improved version of K-means clustering method and is one of most used clustering methods in data mining and machine learning applications. In this study, as the first stage, fuzzy C-means clustering process has been used for finding the centers of attributes in Pima Indians diabetes dataset and then weighted the dataset according to the ratios of the means of attributes to centers of theirs. Secondly, after weighting process, the classifier algorithms including support vector machine (SVM) and k-NN (k- nearest neighbor) classifiers have been used for classifying weighted Pima Indians diabetes dataset. Experimental results show that the proposed attribute weighting method (FCMAW) has obtained very promising results in the classification of Pima Indians diabetes dataset.

Keywords: fuzzy C-means clustering, fuzzy C-means clustering based attribute weighting, Pima Indians diabetes, SVM

Procedia PDF Downloads 415
28754 A Weighted K-Medoids Clustering Algorithm for Effective Stability in Vehicular Ad Hoc Networks

Authors: Rejab Hajlaoui, Tarek Moulahi, Hervé Guyennet

Abstract:

In a highway scenario, the vehicle speed can exceed 120 kmph. Therefore, any vehicle can enter or leave the network within a very short time. This mobility adversely affects the network connectivity and decreases the life time of all established links. To ensure an effective stability in vehicular ad hoc networks with minimum broadcasting storm, we have developed a weighted algorithm based on the k-medoids clustering algorithm (WKCA). Indeed, the number of clusters and the initial cluster heads will not be selected randomly as usual, but considering the available transmission range and the environment size. Then, to ensure optimal assignment of nodes to clusters in both k-medoids phases, the combined weight of any node will be computed according to additional metrics including direction, relative speed and proximity. Empirical results prove that in addition to the convergence speed that characterizes the k-medoids algorithm, our proposed model performs well both AODV-Clustering and OLSR-Clustering protocols under different densities and velocities in term of end-to-end delay, packet delivery ratio, and throughput.

Keywords: communication, clustering algorithm, k-medoids, sensor, vehicular ad hoc network

Procedia PDF Downloads 239
28753 Hierarchical Clustering Algorithms in Data Mining

Authors: Z. Abdullah, A. R. Hamdan

Abstract:

Clustering is a process of grouping objects and data into groups of clusters to ensure that data objects from the same cluster are identical to each other. Clustering algorithms in one of the areas in data mining and it can be classified into partition, hierarchical, density based, and grid-based. Therefore, in this paper, we do a survey and review for four major hierarchical clustering algorithms called CURE, ROCK, CHAMELEON, and BIRCH. The obtained state of the art of these algorithms will help in eliminating the current problems, as well as deriving more robust and scalable algorithms for clustering.

Keywords: clustering, unsupervised learning, algorithms, hierarchical

Procedia PDF Downloads 885
28752 Clustering for Detection of the Population at Risk of Anticholinergic Medication

Authors: A. Shirazibeheshti, T. Radwan, A. Ettefaghian, G. Wilson, C. Luca, Farbod Khanizadeh

Abstract:

Anticholinergic medication has been associated with events such as falls, delirium, and cognitive impairment in older patients. To further assess this, anticholinergic burden scores have been developed to quantify risk. A risk model based on clustering was deployed in a healthcare management system to cluster patients into multiple risk groups according to anticholinergic burden scores of multiple medicines prescribed to patients to facilitate clinical decision-making. To do so, anticholinergic burden scores of drugs were extracted from the literature, which categorizes the risk on a scale of 1 to 3. Given the patients’ prescription data on the healthcare database, a weighted anticholinergic risk score was derived per patient based on the prescription of multiple anticholinergic drugs. This study was conducted on over 300,000 records of patients currently registered with a major regional UK-based healthcare provider. The weighted risk scores were used as inputs to an unsupervised learning algorithm (mean-shift clustering) that groups patients into clusters that represent different levels of anticholinergic risk. To further evaluate the performance of the model, any association between the average risk score within each group and other factors such as socioeconomic status (i.e., Index of Multiple Deprivation) and an index of health and disability were investigated. The clustering identifies a group of 15 patients at the highest risk from multiple anticholinergic medication. Our findings also show that this group of patients is located within more deprived areas of London compared to the population of other risk groups. Furthermore, the prescription of anticholinergic medicines is more skewed to female than male patients, indicating that females are more at risk from this kind of multiple medications. The risk may be monitored and controlled in well artificial intelligence-equipped healthcare management systems.

Keywords: anticholinergic medicines, clustering, deprivation, socioeconomic status

Procedia PDF Downloads 212
28751 Hybrid Hierarchical Clustering Approach for Community Detection in Social Network

Authors: Radhia Toujani, Jalel Akaichi

Abstract:

Social Networks generally present a hierarchy of communities. To determine these communities and the relationship between them, detection algorithms should be applied. Most of the existing algorithms, proposed for hierarchical communities identification, are based on either agglomerative clustering or divisive clustering. In this paper, we present a hybrid hierarchical clustering approach for community detection based on both bottom-up and bottom-down clustering. Obviously, our approach provides more relevant community structure than hierarchical method which considers only divisive or agglomerative clustering to identify communities. Moreover, we performed some comparative experiments to enhance the quality of the clustering results and to show the effectiveness of our algorithm.

Keywords: agglomerative hierarchical clustering, community structure, divisive hierarchical clustering, hybrid hierarchical clustering, opinion mining, social network, social network analysis

Procedia PDF Downloads 365
28750 Energy-Efficient Clustering Protocol in Wireless Sensor Networks for Healthcare Monitoring

Authors: Ebrahim Farahmand, Ali Mahani

Abstract:

Wireless sensor networks (WSNs) can facilitate continuous monitoring of patients and increase early detection of emergency conditions and diseases. High density WSNs helps us to accurately monitor a remote environment by intelligently combining the data from the individual nodes. Due to energy capacity limitation of sensors, enhancing the lifetime and the reliability of WSNs are important factors in designing of these networks. The clustering strategies are verified as effective and practical algorithms for reducing energy consumption in WSNs and can tackle WSNs limitations. In this paper, an Energy-efficient weight-based Clustering Protocol (EWCP) is presented. Artificial retina is selected as a case study of WSNs applied in body sensors. Cluster heads’ (CHs) selection is equipped with energy efficient parameters. Moreover, cluster members are selected based on their distance to the selected CHs. Comparing with the other benchmark protocols, the lifetime of EWCP is improved significantly.

Keywords: WSN, healthcare monitoring, weighted based clustering, lifetime

Procedia PDF Downloads 309
28749 An Enhanced Distributed Weighted Clustering Algorithm for Intra and Inter Cluster Routing in MANET

Authors: K. Gomathi

Abstract:

Mobile Ad hoc Networks (MANET) is defined as collection of routable wireless mobile nodes with no centralized administration and communicate each other using radio signals. Especially MANETs deployed in hostile environments where hackers will try to disturb the secure data transfer and drain the valuable network resources. Since MANET is battery operated network, preserving the network resource is essential one. For resource constrained computation, efficient routing and to increase the network stability, the network is divided into smaller groups called clusters. The clustering architecture consists of Cluster Head(CH), ordinary node and gateway. The CH is responsible for inter and intra cluster routing. CH election is a prominent research area and many more algorithms are developed using many different metrics. The CH with longer life sustains network lifetime, for this purpose Secondary Cluster Head(SCH) also elected and it is more economical. To nominate efficient CH, a Enhanced Distributed Weighted Clustering Algorithm (EDWCA) has been proposed. This approach considers metrics like battery power, degree difference and speed of the node for CH election. The proficiency of proposed one is evaluated and compared with existing algorithm using Network Simulator(NS-2).

Keywords: MANET, EDWCA, clustering, cluster head

Procedia PDF Downloads 398
28748 Some Results for F-Minimal Hypersurfaces in Manifolds with Density

Authors: M. Abdelmalek

Abstract:

In this work, we study the hypersurfaces of constant weighted mean curvature embedded in weighted manifolds. We give a condition about these hypersurfaces to be minimal. This condition is given by the ellipticity of the weighted Newton transformations. We especially prove that two compact hypersurfaces of constant weighted mean curvature embedded in space forms and with the intersection in at least a point of the boundary must be transverse. The method is based on the calculus of the matrix of the second fundamental form in a boundary point and then the matrix associated with the Newton transformations. By equality, we find the weighted elementary symmetric function on the boundary of the hypersurface. We give in the end some examples and applications. Especially in Euclidean space, we use the above result to prove the Alexandrov spherical caps conjecture for the weighted case.

Keywords: weighted mean curvature, weighted manifolds, ellipticity, Newton transformations

Procedia PDF Downloads 93
28747 Mitigating the Negative Effect of Intrabrand Clustering: The Role of Interbrand Clustering and Firm Size

Authors: Moeen Naseer Butt

Abstract:

Clustering –geographic concentrations of entities– has recently received more attention in marketing research and has been shown to affect multiple outcomes. This study investigates the impact of intrabrand clustering (clustering of same-brand outlets) on an outlet’s quality performance. Further, it assesses the moderating effects of interbrand clustering (clustering of other-brand outlets) and firm size. An examination of approximately 21,000 food service establishments in New York State in 2019 finds that the impact of intrabrand clustering on an outlet’s quality performance is context-dependent. Specifically, intrabrand clustering decreases, whereas interbrand clustering and firm size help increase the outlet’s performance. Additionally, this study finds that the role of firm size is more substantial than interbrand clustering in mitigating the adverse effects of intrabrand clustering on outlet quality performance.

Keywords: intraband clustering, interbrand clustering, firm size, brand competition, outlet performance, quality violations

Procedia PDF Downloads 188
28746 Semi-Supervised Hierarchical Clustering Given a Reference Tree of Labeled Documents

Authors: Ying Zhao, Xingyan Bin

Abstract:

Semi-supervised clustering algorithms have been shown effective to improve clustering process with even limited supervision. However, semi-supervised hierarchical clustering remains challenging due to the complexities of expressing constraints for agglomerative clustering algorithms. This paper proposes novel semi-supervised agglomerative clustering algorithms to build a hierarchy based on a known reference tree. We prove that by enforcing distance constraints defined by a reference tree during the process of hierarchical clustering, the resultant tree is guaranteed to be consistent with the reference tree. We also propose a framework that allows the hierarchical tree generation be aware of levels of levels of the agglomerative tree under creation, so that metric weights can be learned and adopted at each level in a recursive fashion. The experimental evaluation shows that the additional cost of our contraint-based semi-supervised hierarchical clustering algorithm (HAC) is negligible, and our combined semi-supervised HAC algorithm outperforms the state-of-the-art algorithms on real-world datasets. The experiments also show that our proposed methods can improve clustering performance even with a small number of unevenly distributed labeled data.

Keywords: semi-supervised clustering, hierarchical agglomerative clustering, reference trees, distance constraints

Procedia PDF Downloads 547
28745 Flowing Online Vehicle GPS Data Clustering Using a New Parallel K-Means Algorithm

Authors: Orhun Vural, Oguz Bayat, Rustu Akay, Osman N. Ucan

Abstract:

This study presents a new parallel approach clustering of GPS data. Evaluation has been made by comparing execution time of various clustering algorithms on GPS data. This paper aims to propose a parallel based on neighborhood K-means algorithm to make it faster. The proposed parallelization approach assumes that each GPS data represents a vehicle and to communicate between vehicles close to each other after vehicles are clustered. This parallelization approach has been examined on different sized continuously changing GPS data and compared with serial K-means algorithm and other serial clustering algorithms. The results demonstrated that proposed parallel K-means algorithm has been shown to work much faster than other clustering algorithms.

Keywords: parallel k-means algorithm, parallel clustering, clustering algorithms, clustering on flowing data

Procedia PDF Downloads 222
28744 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis

Authors: C. B. Le, V. N. Pham

Abstract:

In modern data analysis, multi-source data appears more and more in real applications. Multi-source data clustering has emerged as a important issue in the data mining and machine learning community. Different data sources provide information about different data. Therefore, multi-source data linking is essential to improve clustering performance. However, in practice multi-source data is often heterogeneous, uncertain, and large. This issue is considered a major challenge from multi-source data. Ensemble is a versatile machine learning model in which learning techniques can work in parallel, with big data. Clustering ensemble has been shown to outperform any standard clustering algorithm in terms of accuracy and robustness. However, most of the traditional clustering ensemble approaches are based on single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis. The fuzzy optimized multi-objective clustering ensemble method is called FOMOCE. Firstly, a clustering ensemble mathematical model based on the structure of multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, clustering algorithms, and base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. The experiments were performed on the standard sample data set. The experimental results demonstrate the superior performance of the FOMOCE method compared to the existing clustering ensemble methods and multi-source clustering methods.

Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering

Procedia PDF Downloads 189
28743 Clustering of Association Rules of ISIS & Al-Qaeda Based on Similarity Measures

Authors: Tamanna Goyal, Divya Bansal, Sanjeev Sofat

Abstract:

In world-threatening terrorist attacks, where early detection, distinction, and prediction are effective diagnosis techniques and for functionally accurate and precise analysis of terrorism data, there are so many data mining & statistical approaches to assure accuracy. The computational extraction of derived patterns is a non-trivial task which comprises specific domain discovery by means of sophisticated algorithm design and analysis. This paper proposes an approach for similarity extraction by obtaining the useful attributes from the available datasets of terrorist attacks and then applying feature selection technique based on the statistical impurity measures followed by clustering techniques on the basis of similarity measures. On the basis of degree of participation of attributes in the rules, the associative dependencies between the attacks are analyzed. Consequently, to compute the similarity among the discovered rules, we applied a weighted similarity measure. Finally, the rules are grouped by applying using hierarchical clustering. We have applied it to an open source dataset to determine the usability and efficiency of our technique, and a literature search is also accomplished to support the efficiency and accuracy of our results.

Keywords: association rules, clustering, similarity measure, statistical approaches

Procedia PDF Downloads 320
28742 Notes on Frames in Weighted Hardy Spaces and Generalized Weighted Composition Operators

Authors: Shams Alyusof

Abstract:

This work is to enrich the studies of the frames due to their prominent role in pure mathematics as well as in applied mathematics and many applications in computer science and engineering. Recently, there are remarkable studies of operators that preserve frames on some spaces, and this research could be considered as an extension of such studies. Indeed, this paper is to we characterize weighted composition operators that preserve frames in weighted Hardy spaces on the open unit disk. Moreover, it shows that this characterization does not apply to generalized weighted composition operators on such spaces. Nevertheless, this study could be extended to provide more specific characterizations.

Keywords: frames, generalized weighted composition operators, weighted Hardy spaces, analytic functions

Procedia PDF Downloads 122
28741 Multimodal Optimization of Density-Based Clustering Using Collective Animal Behavior Algorithm

Authors: Kristian Bautista, Ruben A. Idoy

Abstract:

A bio-inspired metaheuristic algorithm inspired by the theory of collective animal behavior (CAB) was integrated to density-based clustering modeled as multimodal optimization problem. The algorithm was tested on synthetic, Iris, Glass, Pima and Thyroid data sets in order to measure its effectiveness relative to CDE-based Clustering algorithm. Upon preliminary testing, it was found out that one of the parameter settings used was ineffective in performing clustering when applied to the algorithm prompting the researcher to do an investigation. It was revealed that fine tuning distance δ3 that determines the extent to which a given data point will be clustered helped improve the quality of cluster output. Even though the modification of distance δ3 significantly improved the solution quality and cluster output of the algorithm, results suggest that there is no difference between the population mean of the solutions obtained using the original and modified parameter setting for all data sets. This implies that using either the original or modified parameter setting will not have any effect towards obtaining the best global and local animal positions. Results also suggest that CDE-based clustering algorithm is better than CAB-density clustering algorithm for all data sets. Nevertheless, CAB-density clustering algorithm is still a good clustering algorithm because it has correctly identified the number of classes of some data sets more frequently in a thirty trial run with a much smaller standard deviation, a potential in clustering high dimensional data sets. Thus, the researcher recommends further investigation in the post-processing stage of the algorithm.

Keywords: clustering, metaheuristics, collective animal behavior algorithm, density-based clustering, multimodal optimization

Procedia PDF Downloads 231
28740 Data Clustering Algorithm Based on Multi-Objective Periodic Bacterial Foraging Optimization with Two Learning Archives

Authors: Chen Guo, Heng Tang, Ben Niu

Abstract:

Clustering splits objects into different groups based on similarity, making the objects have higher similarity in the same group and lower similarity in different groups. Thus, clustering can be treated as an optimization problem to maximize the intra-cluster similarity or inter-cluster dissimilarity. In real-world applications, the datasets often have some complex characteristics: sparse, overlap, high dimensionality, etc. When facing these datasets, simultaneously optimizing two or more objectives can obtain better clustering results than optimizing one objective. However, except for the objectives weighting methods, traditional clustering approaches have difficulty in solving multi-objective data clustering problems. Due to this, evolutionary multi-objective optimization algorithms are investigated by researchers to optimize multiple clustering objectives. In this paper, the Data Clustering algorithm based on Multi-objective Periodic Bacterial Foraging Optimization with two Learning Archives (DC-MPBFOLA) is proposed. Specifically, first, to reduce the high computing complexity of the original BFO, periodic BFO is employed as the basic algorithmic framework. Then transfer the periodic BFO into a multi-objective type. Second, two learning strategies are proposed based on the two learning archives to guide the bacterial swarm to move in a better direction. On the one hand, the global best is selected from the global learning archive according to the convergence index and diversity index. On the other hand, the personal best is selected from the personal learning archive according to the sum of weighted objectives. According to the aforementioned learning strategies, a chemotaxis operation is designed. Third, an elite learning strategy is designed to provide fresh power to the objects in two learning archives. When the objects in these two archives do not change for two consecutive times, randomly initializing one dimension of objects can prevent the proposed algorithm from falling into local optima. Fourth, to validate the performance of the proposed algorithm, DC-MPBFOLA is compared with four state-of-art evolutionary multi-objective optimization algorithms and one classical clustering algorithm on evaluation indexes of datasets. To further verify the effectiveness and feasibility of designed strategies in DC-MPBFOLA, variants of DC-MPBFOLA are also proposed. Experimental results demonstrate that DC-MPBFOLA outperforms its competitors regarding all evaluation indexes and clustering partitions. These results also indicate that the designed strategies positively influence the performance improvement of the original BFO.

Keywords: data clustering, multi-objective optimization, bacterial foraging optimization, learning archives

Procedia PDF Downloads 140
28739 E-Hailing Taxi Industry Management Mode Innovation Based on the Credit Evaluation

Authors: Yuan-lin Liu, Ye Li, Tian Xia

Abstract:

There are some shortcomings in Chinese existing taxi management modes. This paper suggests to establish the third-party comprehensive information management platform and put forward an evaluation model based on credit. Four indicators are used to evaluate the drivers’ credit, they are passengers’ evaluation score, driving behavior evaluation, drivers’ average bad record number, and personal credit score. A weighted clustering method is used to achieve credit level evaluation for taxi drivers. The management of taxi industry is based on the credit level, while the grade of the drivers is accorded to their credit rating. Credit rating determines the cost, income levels, the market access, useful period of license and the level of wage and bonus, as well as violation fine. These methods can make the credit evaluation effective. In conclusion, more credit data will help to set up a more accurate and detailed classification standard library.

Keywords: credit, mobile internet, e-hailing taxi, management mode, weighted cluster

Procedia PDF Downloads 325
28738 Decision Trees Constructing Based on K-Means Clustering Algorithm

Authors: Loai Abdallah, Malik Yousef

Abstract:

A domain space for the data should reflect the actual similarity between objects. Since objects belonging to the same cluster usually share some common traits even though their geometric distance might be relatively large. In general, the Euclidean distance of data points that represented by large number of features is not capturing the actual relation between those points. In this study, we propose a new method to construct a different space that is based on clustering to form a new distance metric. The new distance space is based on ensemble clustering (EC). The EC distance space is defined by tracking the membership of the points over multiple runs of clustering algorithm metric. Over this distance, we train the decision trees classifier (DT-EC). The results obtained by applying DT-EC on 10 datasets confirm our hypotheses that embedding the EC space as a distance metric would improve the performance.

Keywords: ensemble clustering, decision trees, classification, K nearest neighbors

Procedia PDF Downloads 191
28737 A Decision Support System to Detect the Lumbar Disc Disease on the Basis of Clinical MRI

Authors: Yavuz Unal, Kemal Polat, H. Erdinc Kocer

Abstract:

In this study, a decision support system comprising three stages has been proposed to detect the disc abnormalities of the lumbar region. In the first stage named the feature extraction, T2-weighted sagittal and axial Magnetic Resonance Images (MRI) were taken from 55 people and then 27 appearance and shape features were acquired from both sagittal and transverse images. In the second stage named the feature weighting process, k-means clustering based feature weighting (KMCBFW) proposed by Gunes et al. Finally, in the third stage named the classification process, the classifier algorithms including multi-layer perceptron (MLP- neural network), support vector machine (SVM), Naïve Bayes, and decision tree have been used to classify whether the subject has lumbar disc or not. In order to test the performance of the proposed method, the classification accuracy (%), sensitivity, specificity, precision, recall, f-measure, kappa value, and computation times have been used. The best hybrid model is the combination of k-means clustering based feature weighting and decision tree in the detecting of lumbar disc disease based on both sagittal and axial MR images.

Keywords: lumbar disc abnormality, lumbar MRI, lumbar spine, hybrid models, hybrid features, k-means clustering based feature weighting

Procedia PDF Downloads 520
28736 Performance Analysis of Hierarchical Agglomerative Clustering in a Wireless Sensor Network Using Quantitative Data

Authors: Tapan Jain, Davender Singh Saini

Abstract:

Clustering is a useful mechanism in wireless sensor networks which helps to cope with scalability and data transmission problems. The basic aim of our research work is to provide efficient clustering using Hierarchical agglomerative clustering (HAC). If the distance between the sensing nodes is calculated using their location then it’s quantitative HAC. This paper compares the various agglomerative clustering techniques applied in a wireless sensor network using the quantitative data. The simulations are done in MATLAB and the comparisons are made between the different protocols using dendrograms.

Keywords: routing, hierarchical clustering, agglomerative, quantitative, wireless sensor network

Procedia PDF Downloads 617
28735 Investigation of Clustering Algorithms Used in Wireless Sensor Networks

Authors: Naim Karasekreter, Ugur Fidan, Fatih Basciftci

Abstract:

Wireless sensor networks are networks in which more than one sensor node is organized among themselves. The working principle is based on the transfer of the sensed data over the other nodes in the network to the central station. Wireless sensor networks concentrate on routing algorithms, energy efficiency and clustering algorithms. In the clustering method, the nodes in the network are divided into clusters using different parameters and the most suitable cluster head is selected from among them. The data to be sent to the center is sent per cluster, and the cluster head is transmitted to the center. With this method, the network traffic is reduced and the energy efficiency of the nodes is increased. In this study, clustering algorithms were examined in terms of clustering performances and cluster head selection characteristics to try to identify weak and strong sides. This work is supported by the Project 17.Kariyer.123 of Afyon Kocatepe University BAP Commission.

Keywords: wireless sensor networks (WSN), clustering algorithm, cluster head, clustering

Procedia PDF Downloads 513
28734 Finding Bicluster on Gene Expression Data of Lymphoma Based on Singular Value Decomposition and Hierarchical Clustering

Authors: Alhadi Bustaman, Soeganda Formalidin, Titin Siswantining

Abstract:

DNA microarray technology is used to analyze thousand gene expression data simultaneously and a very important task for drug development and test, function annotation, and cancer diagnosis. Various clustering methods have been used for analyzing gene expression data. However, when analyzing very large and heterogeneous collections of gene expression data, conventional clustering methods often cannot produce a satisfactory solution. Biclustering algorithm has been used as an alternative approach to identifying structures from gene expression data. In this paper, we introduce a transform technique based on singular value decomposition to identify normalized matrix of gene expression data followed by Mixed-Clustering algorithm and the Lift algorithm, inspired in the node-deletion and node-addition phases proposed by Cheng and Church based on Agglomerative Hierarchical Clustering (AHC). Experimental study on standard datasets demonstrated the effectiveness of the algorithm in gene expression data.

Keywords: agglomerative hierarchical clustering (AHC), biclustering, gene expression data, lymphoma, singular value decomposition (SVD)

Procedia PDF Downloads 279
28733 An Experimental Study on Some Conventional and Hybrid Models of Fuzzy Clustering

Authors: Jeugert Kujtila, Kristi Hoxhalli, Ramazan Dalipi, Erjon Cota, Ardit Murati, Erind Bedalli

Abstract:

Clustering is a versatile instrument in the analysis of collections of data providing insights of the underlying structures of the dataset and enhancing the modeling capabilities. The fuzzy approach to the clustering problem increases the flexibility involving the concept of partial memberships (some value in the continuous interval [0, 1]) of the instances in the clusters. Several fuzzy clustering algorithms have been devised like FCM, Gustafson-Kessel, Gath-Geva, kernel-based FCM, PCM etc. Each of these algorithms has its own advantages and drawbacks, so none of these algorithms would be able to perform superiorly in all datasets. In this paper we will experimentally compare FCM, GK, GG algorithm and a hybrid two-stage fuzzy clustering model combining the FCM and Gath-Geva algorithms. Firstly we will theoretically dis-cuss the advantages and drawbacks for each of these algorithms and we will describe the hybrid clustering model exploiting the advantages and diminishing the drawbacks of each algorithm. Secondly we will experimentally compare the accuracy of the hybrid model by applying it on several benchmark and synthetic datasets.

Keywords: fuzzy clustering, fuzzy c-means algorithm (FCM), Gustafson-Kessel algorithm, hybrid clustering model

Procedia PDF Downloads 514
28732 Knowledge Representation Based on Interval Type-2 CFCM Clustering

Authors: Lee Myung-Won, Kwak Keun-Chang

Abstract:

This paper is concerned with knowledge representation and extraction of fuzzy if-then rules using Interval Type-2 Context-based Fuzzy C-Means clustering (IT2-CFCM) with the aid of fuzzy granulation. This proposed clustering algorithm is based on information granulation in the form of IT2 based Fuzzy C-Means (IT2-FCM) clustering and estimates the cluster centers by preserving the homogeneity between the clustered patterns from the IT2 contexts produced in the output space. Furthermore, we can obtain the automatic knowledge representation in the design of Radial Basis Function Networks (RBFN), Linguistic Model (LM), and Adaptive Neuro-Fuzzy Networks (ANFN) from the numerical input-output data pairs. We shall focus on a design of ANFN in this paper. The experimental results on an estimation problem of energy performance reveal that the proposed method showed a good knowledge representation and performance in comparison with the previous works.

Keywords: IT2-FCM, IT2-CFCM, context-based fuzzy clustering, adaptive neuro-fuzzy network, knowledge representation

Procedia PDF Downloads 322
28731 Improved K-Means Clustering Algorithm Using RHadoop with Combiner

Authors: Ji Eun Shin, Dong Hoon Lim

Abstract:

Data clustering is a common technique used in data analysis and is used in many applications, such as artificial intelligence, pattern recognition, economics, ecology, psychiatry and marketing. K-means clustering is a well-known clustering algorithm aiming to cluster a set of data points to a predefined number of clusters. In this paper, we implement K-means algorithm based on MapReduce framework with RHadoop to make the clustering method applicable to large scale data. RHadoop is a collection of R packages that allow users to manage and analyze data with Hadoop. The main idea is to introduce a combiner as a function of our map output to decrease the amount of data needed to be processed by reducers. The experimental results demonstrated that K-means algorithm using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also showed that our K-means algorithm using RHadoop with combiner was faster than regular algorithm without combiner as the size of data set increases.

Keywords: big data, combiner, K-means clustering, RHadoop

Procedia PDF Downloads 439
28730 Power Iteration Clustering Based on Deflation Technique on Large Scale Graphs

Authors: Taysir Soliman

Abstract:

One of the current popular clustering techniques is Spectral Clustering (SC) because of its advantages over conventional approaches such as hierarchical clustering, k-means, etc. and other techniques as well. However, one of the disadvantages of SC is the time consuming process because it requires computing the eigenvectors. In the past to overcome this disadvantage, a number of attempts have been proposed such as the Power Iteration Clustering (PIC) technique, which is one of versions from SC; some of PIC advantages are: 1) its scalability and efficiency, 2) finding one pseudo-eigenvectors instead of computing eigenvectors, and 3) linear combination of the eigenvectors in linear time. However, its worst disadvantage is an inter-class collision problem because it used only one pseudo-eigenvectors which is not enough. Previous researchers developed Deflation-based Power Iteration Clustering (DPIC) to overcome problems of PIC technique on inter-class collision with the same efficiency of PIC. In this paper, we developed Parallel DPIC (PDPIC) to improve the time and memory complexity which is run on apache spark framework using sparse matrix. To test the performance of PDPIC, we compared it to SC, ESCG, ESCALG algorithms on four small graph benchmark datasets and nine large graph benchmark datasets, where PDPIC proved higher accuracy and better time consuming than other compared algorithms.

Keywords: spectral clustering, power iteration clustering, deflation-based power iteration clustering, Apache spark, large graph

Procedia PDF Downloads 190
28729 Method of Cluster Based Cross-Domain Knowledge Acquisition for Biologically Inspired Design

Authors: Shen Jian, Hu Jie, Ma Jin, Peng Ying Hong, Fang Yi, Liu Wen Hai

Abstract:

Biologically inspired design inspires inventions and new technologies in the field of engineering by mimicking functions, principles, and structures in the biological domain. To deal with the obstacles of cross-domain knowledge acquisition in the existing biologically inspired design process, functional semantic clustering based on functional feature semantic correlation and environmental constraint clustering composition based on environmental characteristic constraining adaptability are proposed. A knowledge cell clustering algorithm and the corresponding prototype system is developed. Finally, the effectiveness of the method is verified by the visual prosthetic device design.

Keywords: knowledge clustering, knowledge acquisition, knowledge based engineering, knowledge cell, biologically inspired design

Procedia PDF Downloads 427
28728 Machine Learning Approach for Lateralization of Temporal Lobe Epilepsy

Authors: Samira-Sadat JamaliDinan, Haidar Almohri, Mohammad-Reza Nazem-Zadeh

Abstract:

Lateralization of temporal lobe epilepsy (TLE) is very important for positive surgical outcomes. We propose a machine learning framework to ultimately identify the epileptogenic hemisphere for temporal lobe epilepsy (TLE) cases using magnetoencephalography (MEG) coherence source imaging (CSI) and diffusion tensor imaging (DTI). Unlike most studies that use classification algorithms, we propose an effective clustering approach to distinguish between normal and TLE cases. We apply the famous Minkowski weighted K-Means (MWK-Means) technique as the clustering framework. To overcome the problem of poor initialization of K-Means, we use particle swarm optimization (PSO) to effectively select the initial centroids of clusters prior to applying MWK-Means. We demonstrate that compared to K-means and MWK-means independently, this approach is able to improve the result of a benchmark data set.

Keywords: temporal lobe epilepsy, machine learning, clustering, magnetoencephalography

Procedia PDF Downloads 156
28727 Using Closed Frequent Itemsets for Hierarchical Document Clustering

Authors: Cheng-Jhe Lee, Chiun-Chieh Hsu

Abstract:

Due to the rapid development of the Internet and the increased availability of digital documents, the excessive information on the Internet has led to information overflow problem. In order to solve these problems for effective information retrieval, document clustering in text mining becomes a popular research topic. Clustering is the unsupervised classification of data items into groups without the need of training data. Many conventional document clustering methods perform inefficiently for large document collections because they were originally designed for relational database. Therefore they are impractical in real-world document clustering and require special handling for high dimensionality and high volume. We propose the FIHC (Frequent Itemset-based Hierarchical Clustering) method, which is a hierarchical clustering method developed for document clustering, where the intuition of FIHC is that there exist some common words for each cluster. FIHC uses such words to cluster documents and builds hierarchical topic tree. In this paper, we combine FIHC algorithm with ontology to solve the semantic problem and mine the meaning behind the words in documents. Furthermore, we use the closed frequent itemsets instead of only use frequent itemsets, which increases efficiency and scalability. The experimental results show that our method is more accurate than those of well-known document clustering algorithms.

Keywords: FIHC, documents clustering, ontology, closed frequent itemset

Procedia PDF Downloads 399
28726 Detection Method of Federated Learning Backdoor Based on Weighted K-Medoids

Authors: Xun Li, Haojie Wang

Abstract:

Federated learning is a kind of distributed training and centralized training mode, which is of great value in the protection of user privacy. In order to solve the problem that the model is vulnerable to backdoor attacks in federated learning, a backdoor attack detection method based on a weighted k-medoids algorithm is proposed. First of all, this paper collates the update parameters of the client to construct a vector group, then uses the principal components analysis (PCA) algorithm to extract the corresponding feature information from the vector group, and finally uses the improved k-medoids clustering algorithm to identify the normal and backdoor update parameters. In this paper, the backdoor is implanted in the federation learning model through the model replacement attack method in the simulation experiment, and the update parameters from the attacker are effectively detected and removed by the defense method proposed in this paper.

Keywords: federated learning, backdoor attack, PCA, k-medoids, backdoor defense

Procedia PDF Downloads 114