Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 10

K-Means Clustering Related Publications

10 Improving Fake News Detection Using K-means and Support Vector Machine Approaches

Authors: Jingyu Hou, Kasra Majbouri Yazdi, Adel Majbouri Yazdi, Saeid Khodayi, Wanlei Zhou, Saeed Saedy

Abstract:

Fake news and false information are big challenges of all types of media, especially social media. There is a lot of false information, fake likes, views and duplicated accounts as big social networks such as Facebook and Twitter admitted. Most information appearing on social media is doubtful and in some cases misleading. They need to be detected as soon as possible to avoid a negative impact on society. The dimensions of the fake news datasets are growing rapidly, so to obtain a better result of detecting false information with less computation time and complexity, the dimensions need to be reduced. One of the best techniques of reducing data size is using feature selection method. The aim of this technique is to choose a feature subset from the original set to improve the classification performance. In this paper, a feature selection method is proposed with the integration of K-means clustering and Support Vector Machine (SVM) approaches which work in four steps. First, the similarities between all features are calculated. Then, features are divided into several clusters. Next, the final feature set is selected from all clusters, and finally, fake news is classified based on the final feature subset using the SVM method. The proposed method was evaluated by comparing its performance with other state-of-the-art methods on several specific benchmark datasets and the outcome showed a better classification of false information for our work. The detection performance was improved in two aspects. On the one hand, the detection runtime process decreased, and on the other hand, the classification accuracy increased because of the elimination of redundant features and the reduction of datasets dimensions.

Keywords: Social Media, Machine Learning, Feature selection, K-Means Clustering, support vector machine, fake news detection

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 634
9 K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors

Authors: Chen-Chien Hsu, Shao-Tzu Huang, Wei-Yen Wang

Abstract:

Matching high dimensional features between images is computationally expensive for exhaustive search approaches in computer vision. Although the dimension of the feature can be degraded by simplifying the prior knowledge of homography, matching accuracy may degrade as a tradeoff. In this paper, we present a feature matching method based on k-means algorithm that reduces the matching cost and matches the features between images instead of using a simplified geometric assumption. Experimental results show that the proposed method outperforms the previous linear exhaustive search approaches in terms of the inlier ratio of matched pairs.

Keywords: K-Means Clustering, feature matching, scale invariant feature transform, linear exhaustive search

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 618
8 A Computational Cost-Effective Clustering Algorithm in Multidimensional Space Using the Manhattan Metric: Application to the Global Terrorism Database

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

The increasing amount of collected data has limited the performance of the current analyzing algorithms. Thus, developing new cost-effective algorithms in terms of complexity, scalability, and accuracy raised significant interests. In this paper, a modified effective k-means based algorithm is developed and experimented. The new algorithm aims to reduce the computational load without significantly affecting the quality of the clusterings. The algorithm uses the City Block distance and a new stop criterion to guarantee the convergence. Conducted experiments on a real data set show its high performance when compared with the original k-means version.

Keywords: Pattern Recognition, K-Means Clustering, Manhattan distance, terrorism data analysis, partitional clustering

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 746
7 Electricity Generation from Renewables and Targets: An Application of Multivariate Statistical Techniques

Authors: Filiz Ersoz, Taner Ersoz, Tugrul Bayraktar

Abstract:

Renewable energy is referred to as "clean energy" and common popular support for the use of renewable energy (RE) is to provide electricity with zero carbon dioxide emissions. This study provides useful insight into the European Union (EU) RE, especially, into electricity generation obtained from renewables, and their targets. The objective of this study is to identify groups of European countries, using multivariate statistical analysis and selected indicators. The hierarchical clustering method is used to decide the number of clusters for EU countries. The conducted statistical hierarchical cluster analysis is based on the Ward’s clustering method and squared Euclidean distances. Hierarchical cluster analysis identified eight distinct clusters of European countries. Then, non-hierarchical clustering (k-means) method was applied. Discriminant analysis was used to determine the validity of the results with data normalized by Z score transformation. To explore the relationship between the selected indicators, correlation coefficients were computed. The results of the study reveal the current situation of RE in European Union Member States.

Keywords: correlation, K-Means Clustering, Hierarchical Clustering, CO2 emission, share of electricity generation, multivariate methods, targets, discriminant analyzed, EU member countries

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 726
6 Automatic LV Segmentation with K-means Clustering and Graph Searching on Cardiac MRI

Authors: Hae-Yeoun Lee

Abstract:

Quantification of cardiac function is performed by calculating blood volume and ejection fraction in routine clinical practice. However, these works have been performed by manual contouring, which requires computational costs and varies on the observer. In this paper, an automatic left ventricle segmentation algorithm on cardiac magnetic resonance images (MRI) is presented. Using knowledge on cardiac MRI, a K-mean clustering technique is applied to segment blood region on a coil-sensitivity corrected image. Then, a graph searching technique is used to correct segmentation errors from coil distortion and noises. Finally, blood volume and ejection fraction are calculated. Using cardiac MRI from 15 subjects, the presented algorithm is tested and compared with manual contouring by experts to show outstanding performance.

Keywords: K-Means Clustering, cardiac MRI, graph searching, Left ventricle segmentation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1743
5 Feature Based Unsupervised Intrusion Detection

Authors: Deeman Yousif Mahmood, Mohammed Abdullah Hussein

Abstract:

The goal of a network-based intrusion detection system is to classify activities of network traffics into two major categories: normal and attack (intrusive) activities. Nowadays, data mining and machine learning plays an important role in many sciences; including intrusion detection system (IDS) using both supervised and unsupervised techniques. However, one of the essential steps of data mining is feature selection that helps in improving the efficiency, performance and prediction rate of proposed approach. This paper applies unsupervised K-means clustering algorithm with information gain (IG) for feature selection and reduction to build a network intrusion detection system. For our experimental analysis, we have used the new NSL-KDD dataset, which is a modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 60.0% for the training set and the remainder for the testing set, a 2 class classifications have been implemented (Normal, Attack). Weka framework which is a java based open source software consists of a collection of machine learning algorithms for data mining tasks has been used in the testing process. The experimental results show that the proposed approach is very accurate with low false positive rate and high true positive rate and it takes less learning time in comparison with using the full features of the dataset with the same algorithm.

Keywords: K-Means Clustering, information gain (IG), Weka, Intrusion Detection System (IDS)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2350
4 A New Approach for Image Segmentation using Pillar-Kmeans Algorithm

Authors: Ali Ridho Barakbah, Yasushi Kiyoki

Abstract:

This paper presents a new approach for image segmentation by applying Pillar-Kmeans algorithm. This segmentation process includes a new mechanism for clustering the elements of high-resolution images in order to improve precision and reduce computation time. The system applies K-means clustering to the image segmentation after optimized by Pillar Algorithm. The Pillar algorithm considers the pillars- placement which should be located as far as possible from each other to withstand against the pressure distribution of a roof, as identical to the number of centroids amongst the data distribution. This algorithm is able to optimize the K-means clustering for image segmentation in aspects of precision and computation time. It designates the initial centroids- positions by calculating the accumulated distance metric between each data point and all previous centroids, and then selects data points which have the maximum distance as new initial centroids. This algorithm distributes all initial centroids according to the maximum accumulated distance metric. This paper evaluates the proposed approach for image segmentation by comparing with K-means and Gaussian Mixture Model algorithm and involving RGB, HSV, HSL and CIELAB color spaces. The experimental results clarify the effectiveness of our approach to improve the segmentation quality in aspects of precision and computational time.

Keywords: Image Segmentation, K-Means Clustering, Pillaralgorithm, color spaces

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2966
3 Application of a New Hybrid Optimization Algorithm on Cluster Analysis

Authors: B.Bahmani Firouzi, T. Niknam, M. Nayeripour

Abstract:

Clustering techniques have received attention in many areas including engineering, medicine, biology and data mining. The purpose of clustering is to group together data points, which are close to one another. The K-means algorithm is one of the most widely used techniques for clustering. However, K-means has two shortcomings: dependency on the initial state and convergence to local optima and global solutions of large problems cannot found with reasonable amount of computation effort. In order to overcome local optima problem lots of studies done in clustering. This paper is presented an efficient hybrid evolutionary optimization algorithm based on combining Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO), called PSO-ACO, for optimally clustering N object into K clusters. The new PSO-ACO algorithm is tested on several data sets, and its performance is compared with those of ACO, PSO and K-means clustering. The simulation results show that the proposed evolutionary optimization algorithm is robust and suitable for handing data clustering.

Keywords: Data Clustering, K-Means Clustering, Particle Swarm Optimization (PSO), ant colony optimization (ACO), Hybrid evolutionary optimization algorithm

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1839
2 Neural Networks Learning Improvement using the K-Means Clustering Algorithm to Detect Network Intrusions

Authors: A. Boukelif, K. M. Faraoun

Abstract:

In the present work, we propose a new technique to enhance the learning capabilities and reduce the computation intensity of a competitive learning multi-layered neural network using the K-means clustering algorithm. The proposed model use multi-layered network architecture with a back propagation learning mechanism. The K-means algorithm is first applied to the training dataset to reduce the amount of samples to be presented to the neural network, by automatically selecting an optimal set of samples. The obtained results demonstrate that the proposed technique performs exceptionally in terms of both accuracy and computation time when applied to the KDD99 dataset compared to a standard learning schema that use the full dataset.

Keywords: Neural Networks, Intrusion Detection, K-Means Clustering, learningenhancement

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3180
1 A Comparison between Heuristic and Meta-Heuristic Methods for Solving the Multiple Traveling Salesman Problem

Authors: San Nah Sze, Wei King Tiong

Abstract:

The multiple traveling salesman problem (mTSP) can be used to model many practical problems. The mTSP is more complicated than the traveling salesman problem (TSP) because it requires determining which cities to assign to each salesman, as well as the optimal ordering of the cities within each salesman's tour. Previous studies proposed that Genetic Algorithm (GA), Integer Programming (IP) and several neural network (NN) approaches could be used to solve mTSP. This paper compared the results for mTSP, solved with Genetic Algorithm (GA) and Nearest Neighbor Algorithm (NNA). The number of cities is clustered into a few groups using k-means clustering technique. The number of groups depends on the number of salesman. Then, each group is solved with NNA and GA as an independent TSP. It is found that k-means clustering and NNA are superior to GA in terms of performance (evaluated by fitness function) and computing time.

Keywords: K-Means Clustering, GeneticAlgorithm, Multiple Traveling Salesman Problem, Nearest Neighbor Algorithm

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2686