Search results for: k-means clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 591

Search results for: k-means clustering

471 Spatial-Temporal Clustering Characteristics of Dengue in the Northern Region of Sri Lanka, 2010-2013

Authors: Sumiko Anno, Keiji Imaoka, Takeo Tadono, Tamotsu Igarashi, Subramaniam Sivaganesh, Selvam Kannathasan, Vaithehi Kumaran, Sinnathamby Noble Surendran

Abstract:

Dengue outbreaks are affected by biological, ecological, socio-economic and demographic factors that vary over time and space. These factors have been examined separately and still require systematic clarification. The present study aimed to investigate the spatial-temporal clustering relationships between these factors and dengue outbreaks in the northern region of Sri Lanka. Remote sensing (RS) data gathered from a plurality of satellites were used to develop an index comprising rainfall, humidity and temperature data. RS data gathered by ALOS/AVNIR-2 were used to detect urbanization, and a digital land cover map was used to extract land cover information. Other data on relevant factors and dengue outbreaks were collected through institutions and extant databases. The analyzed RS data and databases were integrated into geographic information systems, enabling temporal analysis, spatial statistical analysis and space-time clustering analysis. Our present results showed that increases in the number of the combination of ecological factor and socio-economic and demographic factors with above the average or the presence contribute to significantly high rates of space-time dengue clusters.

Keywords: ALOS/AVNIR-2, dengue, space-time clustering analysis, Sri Lanka

Procedia PDF Downloads 439
470 Uncertainty Quantification of Corrosion Anomaly Length of Oil and Gas Steel Pipelines Based on Inline Inspection and Field Data

Authors: Tammeen Siraj, Wenxing Zhou, Terry Huang, Mohammad Al-Amin

Abstract:

The high resolution inline inspection (ILI) tool is used extensively in the pipeline industry to identify, locate, and measure metal-loss corrosion anomalies on buried oil and gas steel pipelines. Corrosion anomalies may occur singly (i.e. individual anomalies) or as clusters (i.e. a colony of corrosion anomalies). Although the ILI technology has advanced immensely, there are measurement errors associated with the sizes of corrosion anomalies reported by ILI tools due limitations of the tools and associated sizing algorithms, and detection threshold of the tools (i.e. the minimum detectable feature dimension). Quantifying the measurement error in the ILI data is crucial for corrosion management and developing maintenance strategies that satisfy the safety and economic constraints. Studies on the measurement error associated with the length of the corrosion anomalies (in the longitudinal direction of the pipeline) has been scarcely reported in the literature and will be investigated in the present study. Limitations in the ILI tool and clustering process can sometimes cause clustering error, which is defined as the error introduced during the clustering process by including or excluding a single or group of anomalies in or from a cluster. Clustering error has been found to be one of the biggest contributory factors for relatively high uncertainties associated with ILI reported anomaly length. As such, this study focuses on developing a consistent and comprehensive framework to quantify the measurement errors in the ILI-reported anomaly length by comparing the ILI data and corresponding field measurements for individual and clustered corrosion anomalies. The analysis carried out in this study is based on the ILI and field measurement data for a set of anomalies collected from two segments of a buried natural gas pipeline currently in service in Alberta, Canada. Data analyses showed that the measurement error associated with the ILI-reported length of the anomalies without clustering error, denoted as Type I anomalies is markedly less than that for anomalies with clustering error, denoted as Type II anomalies. A methodology employing data mining techniques is further proposed to classify the Type I and Type II anomalies based on the ILI-reported corrosion anomaly information.

Keywords: clustered corrosion anomaly, corrosion anomaly assessment, corrosion anomaly length, individual corrosion anomaly, metal-loss corrosion, oil and gas steel pipeline

Procedia PDF Downloads 276
469 Agglomerative Hierarchical Clustering Based on Morphmetric Parameters of the Populations of Labeo rohita

Authors: Fayyaz Rasool, Naureen Aziz Qureshi, Shakeela Parveen

Abstract:

Labeo rohita populations from five geographical locations from the hatchery and riverine system of Punjab-Pakistan were studied for the clustering on the basis of similarities and differences based on morphometric parameters within the species. Agglomerative Hierarchical Clustering (AHC) was done by using Pearson Correlation Coefficient and Unweighted Pair Group Method with Arithmetic Mean (UPGMA) as Agglomeration method by XLSTAT 2012 version 1.02. A dendrogram with the data on the morphometrics of the representative samples of each site divided the populations of Labeo rohita in to five major clusters or classes. The variance decomposition for the optimal classification values remained as 19.24% for within class variation, while 80.76% for the between class differences. The representative central objects of the each class, the distances between the class centroids and also the distance between the central objects of the classes were generated by the analysis. A measurable distinction between the classes of the populations of the Labeo rohita was indicated in this study which determined the impacts of changing environment and other possible factors influencing the variation level among the populations of the same species.

Keywords: AHC, Labeo rohita, hatchery, riverine, morphometric

Procedia PDF Downloads 413
468 Switched System Diagnosis Based on Intelligent State Filtering with Unknown Models

Authors: Nada Slimane, Foued Theljani, Faouzi Bouani

Abstract:

The paper addresses the problem of fault diagnosis for systems operating in several modes (normal or faulty) based on states assessment. We use, for this purpose, a methodology consisting of three main processes: 1) sequential data clustering, 2) linear model regression and 3) state filtering. Typically, Kalman Filter (KF) is an algorithm that provides estimation of unknown states using a sequence of I/O measurements. Inevitably, although it is an efficient technique for state estimation, it presents two main weaknesses. First, it merely predicts states without being able to isolate/classify them according to their different operating modes, whether normal or faulty modes. To deal with this dilemma, the KF is endowed with an extra clustering step based fully on sequential version of the k-means algorithm. Second, to provide state estimation, KF requires state space models, which can be unknown. A linear regularized regression is used to identify the required models. To prove its effectiveness, the proposed approach is assessed on a simulated benchmark.

Keywords: clustering, diagnosis, Kalman Filtering, k-means, regularized regression

Procedia PDF Downloads 145
467 Routing and Energy Efficiency through Data Coupled Clustering in Large Scale Wireless Sensor Networks (WSNs)

Authors: Jainendra Singh, Zaheeruddin

Abstract:

A typical wireless sensor networks (WSNs) consists of several tiny and low-power sensors which use radio frequency to perform distributed sensing tasks. The longevity of wireless sensor networks (WSNs) is a major issue that impacts the application of such networks. While routing protocols are striving to save energy by acting on sensor nodes, recent studies show that network lifetime can be enhanced by further involving sink mobility. A common approach for energy efficiency is partitioning the network into clusters with correlated data, where the representative nodes simply transmit or average measurements inside the cluster. In this paper, we propose an energy- efficient homogenous clustering (EHC) technique. In this technique, the decision of each sensor is based on their residual energy and an estimate of how many of its neighboring cluster heads (CHs) will benefit from it being a CH. We, also explore the routing algorithm in clustered WSNs. We show that the proposed schemes significantly outperform current approaches in terms of packet delay, hop count and energy consumption of WSNs.

Keywords: wireless sensor network, energy efficiency, clustering, routing

Procedia PDF Downloads 232
466 Enhanced Cluster Based Connectivity Maintenance in Vehicular Ad Hoc Network

Authors: Manverpreet Kaur, Amarpreet Singh

Abstract:

The demand of Vehicular ad hoc networks is increasing day by day, due to offering the various applications and marvelous benefits to VANET users. Clustering in VANETs is most important to overcome the connectivity problems of VANETs. In this paper, we proposed a new clustering technique Enhanced cluster based connectivity maintenance in vehicular ad hoc network. Our objective is to form long living clusters. The proposed approach is grouping the vehicles, on the basis of the longest list of neighbors to form clusters. The cluster formation and cluster head selection process done by the RSU that may results it reduces the chances of overhead on to the network. The cluster head selection procedure is the vehicle which has closest speed to average speed will elect as a cluster Head by the RSU and if two vehicles have same speed which is closest to average speed then they will be calculate by one of the new parameter i.e. distance to their respective destination. The vehicle which has largest distance to their destination will be choosing as a cluster Head by the RSU. Our simulation outcomes show that our technique performs better than the existing technique.

Keywords: VANETs, clustering, connectivity, cluster head, intelligent transportation system (ITS)

Procedia PDF Downloads 204
465 Radar on Bike: Coarse Classification based on Multi-Level Clustering for Cyclist Safety Enhancement

Authors: Asma Omri, Noureddine Benothman, Sofiane Sayahi, Fethi Tlili, Hichem Besbes

Abstract:

Cycling, a popular mode of transportation, can also be perilous due to cyclists' vulnerability to collisions with vehicles and obstacles. This paper presents an innovative cyclist safety system based on radar technology designed to offer real-time collision risk warnings to cyclists. The system incorporates a low-power radar sensor affixed to the bicycle and connected to a microcontroller. It leverages radar point cloud detections, a clustering algorithm, and a supervised classifier. These algorithms are optimized for efficiency to run on the TI’s AWR 1843 BOOST radar, utilizing a coarse classification approach distinguishing between cars, trucks, two-wheeled vehicles, and other objects. To enhance the performance of clustering techniques, we propose a 2-Level clustering approach. This approach builds on the state-of-the-art Density-based spatial clustering of applications with noise (DBSCAN). The objective is to first cluster objects based on their velocity, then refine the analysis by clustering based on position. The initial level identifies groups of objects with similar velocities and movement patterns. The subsequent level refines the analysis by considering the spatial distribution of these objects. The clusters obtained from the first level serve as input for the second level of clustering. Our proposed technique surpasses the classical DBSCAN algorithm in terms of geometrical metrics, including homogeneity, completeness, and V-score. Relevant cluster features are extracted and utilized to classify objects using an SVM classifier. Potential obstacles are identified based on their velocity and proximity to the cyclist. To optimize the system, we used the View of Delft dataset for hyperparameter selection and SVM classifier training. The system's performance was assessed using our collected dataset of radar point clouds synchronized with a camera on an Nvidia Jetson Nano board. The radar-based cyclist safety system is a practical solution that can be easily installed on any bicycle and connected to smartphones or other devices, offering real-time feedback and navigation assistance to cyclists. We conducted experiments to validate the system's feasibility, achieving an impressive 85% accuracy in the classification task. This system has the potential to significantly reduce the number of accidents involving cyclists and enhance their safety on the road.

Keywords: 2-level clustering, coarse classification, cyclist safety, warning system based on radar technology

Procedia PDF Downloads 38
464 Performance Analysis of Deterministic Stable Election Protocol Using Fuzzy Logic in Wireless Sensor Network

Authors: Sumanpreet Kaur, Harjit Pal Singh, Vikas Khullar

Abstract:

In Wireless Sensor Network (WSN), the sensor containing motes (nodes) incorporate batteries that can lament at some extent. To upgrade the energy utilization, clustering is one of the prototypical approaches for split sensor motes into a number of clusters where one mote (also called as node) proceeds as a Cluster Head (CH). CH selection is one of the optimization techniques for enlarging stability and network lifespan. Deterministic Stable Election Protocol (DSEP) is an effectual clustering protocol that makes use of three kinds of nodes with dissimilar residual energy for CH election. Fuzzy Logic technology is used to expand energy level of DSEP protocol by using fuzzy inference system. This paper presents protocol DSEP using Fuzzy Logic (DSEP-FL) CH by taking into account four linguistic variables such as energy, concentration, centrality and distance to base station. Simulation results show that our proposed method gives more effective results in term of a lifespan of network and stability as compared to the performance of other clustering protocols.

Keywords: DSEP, fuzzy logic, energy model, WSN

Procedia PDF Downloads 167
463 A Concept of Data Mining with XML Document

Authors: Akshay Agrawal, Anand K. Srivastava

Abstract:

The increasing amount of XML datasets available to casual users increases the necessity of investigating techniques to extract knowledge from these data. Data mining is widely applied in the database research area in order to extract frequent correlations of values from both structured and semi-structured datasets. The increasing availability of heterogeneous XML sources has raised a number of issues concerning how to represent and manage these semi structured data. In recent years due to the importance of managing these resources and extracting knowledge from them, lots of methods have been proposed in order to represent and cluster them in different ways.

Keywords: XML, similarity measure, clustering, cluster quality, semantic clustering

Procedia PDF Downloads 340
462 Progressive Multimedia Collection Structuring via Scene Linking

Authors: Aman Berhe, Camille Guinaudeau, Claude Barras

Abstract:

In order to facilitate information seeking in large collections of multimedia documents with long and progressive content (such as broadcast news or TV series), one can extract the semantic links that exist between semantically coherent parts of documents, i.e., scenes. The links can then create a coherent collection of scenes from which it is easier to perform content analysis, topic extraction, or information retrieval. In this paper, we focus on TV series structuring and propose two approaches for scene linking at different levels of granularity (episode and season): a fuzzy online clustering technique and a graph-based community detection algorithm. When evaluated on the two first seasons of the TV series Game of Thrones, we found that the fuzzy online clustering approach performed better compared to graph-based community detection at the episode level, while graph-based approaches show better performance at the season level.

Keywords: multimedia collection structuring, progressive content, scene linking, fuzzy clustering, community detection

Procedia PDF Downloads 66
461 Analysis of Production Forecasting in Unconventional Gas Resources Development Using Machine Learning and Data-Driven Approach

Authors: Dongkwon Han, Sangho Kim, Sunil Kwon

Abstract:

Unconventional gas resources have dramatically changed the future energy landscape. Unlike conventional gas resources, the key challenges in unconventional gas have been the requirement that applies to advanced approaches for production forecasting due to uncertainty and complexity of fluid flow. In this study, artificial neural network (ANN) model which integrates machine learning and data-driven approach was developed to predict productivity in shale gas. The database of 129 wells of Eagle Ford shale basin used for testing and training of the ANN model. The Input data related to hydraulic fracturing, well completion and productivity of shale gas were selected and the output data is a cumulative production. The performance of the ANN using all data sets, clustering and variables importance (VI) models were compared in the mean absolute percentage error (MAPE). ANN model using all data sets, clustering, and VI were obtained as 44.22%, 10.08% (cluster 1), 5.26% (cluster 2), 6.35%(cluster 3), and 32.23% (ANN VI), 23.19% (SVM VI), respectively. The results showed that the pre-trained ANN model provides more accurate results than the ANN model using all data sets.

Keywords: unconventional gas, artificial neural network, machine learning, clustering, variables importance

Procedia PDF Downloads 167
460 Implementation of Algorithm K-Means for Grouping District/City in Central Java Based on Macro Economic Indicators

Authors: Nur Aziza Luxfiati

Abstract:

Clustering is partitioning data sets into sub-sets or groups in such a way that elements certain properties have shared property settings with a high level of similarity within one group and a low level of similarity between groups. . The K-Means algorithm is one of thealgorithmsclustering as a grouping tool that is most widely used in scientific and industrial applications because the basic idea of the kalgorithm is-means very simple. In this research, applying the technique of clustering using the k-means algorithm as a method of solving the problem of national development imbalances between regions in Central Java Province based on macroeconomic indicators. The data sample used is secondary data obtained from the Central Java Provincial Statistics Agency regarding macroeconomic indicator data which is part of the publication of the 2019 National Socio-Economic Survey (Susenas) data. score and determine the number of clusters (k) using the elbow method. After the clustering process is carried out, the validation is tested using themethodsBetween-Class Variation (BCV) and Within-Class Variation (WCV). The results showed that detection outlier using z-score normalization showed no outliers. In addition, the results of the clustering test obtained a ratio value that was not high, namely 0.011%. There are two district/city clusters in Central Java Province which have economic similarities based on the variables used, namely the first cluster with a high economic level consisting of 13 districts/cities and theclustersecondwith a low economic level consisting of 22 districts/cities. And in the cluster second, namely, between low economies, the authors grouped districts/cities based on similarities to macroeconomic indicators such as 20 districts of Gross Regional Domestic Product, with a Poverty Depth Index of 19 districts, with 5 districts in Human Development, and as many as Open Unemployment Rate. 10 districts.

Keywords: clustering, K-Means algorithm, macroeconomic indicators, inequality, national development

Procedia PDF Downloads 124
459 An Empirical Study to Predict Myocardial Infarction Using K-Means and Hierarchical Clustering

Authors: Md. Minhazul Islam, Shah Ashisul Abed Nipun, Majharul Islam, Md. Abdur Rakib Rahat, Jonayet Miah, Salsavil Kayyum, Anwar Shadaab, Faiz Al Faisal

Abstract:

The target of this research is to predict Myocardial Infarction using unsupervised Machine Learning algorithms. Myocardial Infarction Prediction related to heart disease is a challenging factor faced by doctors & hospitals. In this prediction, accuracy of the heart disease plays a vital role. From this concern, the authors have analyzed on a myocardial dataset to predict myocardial infarction using some popular Machine Learning algorithms K-Means and Hierarchical Clustering. This research includes a collection of data and the classification of data using Machine Learning Algorithms. The authors collected 345 instances along with 26 attributes from different hospitals in Bangladesh. This data have been collected from patients suffering from myocardial infarction along with other symptoms. This model would be able to find and mine hidden facts from historical Myocardial Infarction cases. The aim of this study is to analyze the accuracy level to predict Myocardial Infarction by using Machine Learning techniques.

Keywords: Machine Learning, K-means, Hierarchical Clustering, Myocardial Infarction, Heart Disease

Procedia PDF Downloads 168
458 Cleaning of Scientific References in Large Patent Databases Using Rule-Based Scoring and Clustering

Authors: Emiel Caron

Abstract:

Patent databases contain patent related data, organized in a relational data model, and are used to produce various patent statistics. These databases store raw data about scientific references cited by patents. For example, Patstat holds references to tens of millions of scientific journal publications and conference proceedings. These references might be used to connect patent databases with bibliographic databases, e.g. to study to the relation between science, technology, and innovation in various domains. Problematic in such studies is the low data quality of the references, i.e. they are often ambiguous, unstructured, and incomplete. Moreover, a complete bibliographic reference is stored in only one attribute. Therefore, a computerized cleaning and disambiguation method for large patent databases is developed in this work. The method uses rule-based scoring and clustering. The rules are based on bibliographic metadata, retrieved from the raw data by regular expressions, and are transparent and adaptable. The rules in combination with string similarity measures are used to detect pairs of records that are potential duplicates. Due to the scoring, different rules can be combined, to join scientific references, i.e. the rules reinforce each other. The scores are based on expert knowledge and initial method evaluation. After the scoring, pairs of scientific references that are above a certain threshold, are clustered by means of single-linkage clustering algorithm to form connected components. The method is designed to disambiguate all the scientific references in the Patstat database. The performance evaluation of the clustering method, on a large golden set with highly cited papers, shows on average a 99% precision and a 95% recall. The method is therefore accurate but careful, i.e. it weighs precision over recall. Consequently, separate clusters of high precision are sometimes formed, when there is not enough evidence for connecting scientific references, e.g. in the case of missing year and journal information for a reference. The clusters produced by the method can be used to directly link the Patstat database with bibliographic databases as the Web of Science or Scopus.

Keywords: clustering, data cleaning, data disambiguation, data mining, patent analysis, scientometrics

Procedia PDF Downloads 166
457 Automatic Detection of Traffic Stop Locations Using GPS Data

Authors: Areej Salaymeh, Loren Schwiebert, Stephen Remias, Jonathan Waddell

Abstract:

Extracting information from new data sources has emerged as a crucial task in many traffic planning processes, such as identifying traffic patterns, route planning, traffic forecasting, and locating infrastructure improvements. Given the advanced technologies used to collect Global Positioning System (GPS) data from dedicated GPS devices, GPS equipped phones, and navigation tools, intelligent data analysis methodologies are necessary to mine this raw data. In this research, an automatic detection framework is proposed to help identify and classify the locations of stopped GPS waypoints into two main categories: signalized intersections or highway congestion. The Delaunay triangulation is used to perform this assessment in the clustering phase. While most of the existing clustering algorithms need assumptions about the data distribution, the effectiveness of the Delaunay triangulation relies on triangulating geographical data points without such assumptions. Our proposed method starts by cleaning noise from the data and normalizing it. Next, the framework will identify stoppage points by calculating the traveled distance. The last step is to use clustering to form groups of waypoints for signalized traffic and highway congestion. Next, a binary classifier was applied to find distinguish highway congestion from signalized stop points. The binary classifier uses the length of the cluster to find congestion. The proposed framework shows high accuracy for identifying the stop positions and congestion points in around 99.2% of trials. We show that it is possible, using limited GPS data, to distinguish with high accuracy.

Keywords: Delaunay triangulation, clustering, intelligent transportation systems, GPS data

Procedia PDF Downloads 243
456 Feature Selection of Personal Authentication Based on EEG Signal for K-Means Cluster Analysis Using Silhouettes Score

Authors: Jianfeng Hu

Abstract:

Personal authentication based on electroencephalography (EEG) signals is one of the important field for the biometric technology. More and more researchers have used EEG signals as data source for biometric. However, there are some disadvantages for biometrics based on EEG signals. The proposed method employs entropy measures for feature extraction from EEG signals. Four type of entropies measures, sample entropy (SE), fuzzy entropy (FE), approximate entropy (AE) and spectral entropy (PE), were deployed as feature set. In a silhouettes calculation, the distance from each data point in a cluster to all another point within the same cluster and to all other data points in the closest cluster are determined. Thus silhouettes provide a measure of how well a data point was classified when it was assigned to a cluster and the separation between them. This feature renders silhouettes potentially well suited for assessing cluster quality in personal authentication methods. In this study, “silhouettes scores” was used for assessing the cluster quality of k-means clustering algorithm is well suited for comparing the performance of each EEG dataset. The main goals of this study are: (1) to represent each target as a tuple of multiple feature sets, (2) to assign a suitable measure to each feature set, (3) to combine different feature sets, (4) to determine the optimal feature weighting. Using precision/recall evaluations, the effectiveness of feature weighting in clustering was analyzed. EEG data from 22 subjects were collected. Results showed that: (1) It is possible to use fewer electrodes (3-4) for personal authentication. (2) There was the difference between each electrode for personal authentication (p<0.01). (3) There is no significant difference for authentication performance among feature sets (except feature PE). Conclusion: The combination of k-means clustering algorithm and silhouette approach proved to be an accurate method for personal authentication based on EEG signals.

Keywords: personal authentication, K-mean clustering, electroencephalogram, EEG, silhouettes

Procedia PDF Downloads 248
455 Proposing an Algorithm to Cluster Ad Hoc Networks, Modulating Two Levels of Learning Automaton and Nodes Additive Weighting

Authors: Mohammad Rostami, Mohammad Reza Forghani, Elahe Neshat, Fatemeh Yaghoobi

Abstract:

An Ad Hoc network consists of wireless mobile equipment which connects to each other without any infrastructure, using connection equipment. The best way to form a hierarchical structure is clustering. Various methods of clustering can form more stable clusters according to nodes' mobility. In this research we propose an algorithm, which allocates some weight to nodes based on factors, i.e. link stability and power reduction rate. According to the allocated weight in the previous phase, the cellular learning automaton picks out in the second phase nodes which are candidates for being cluster head. In the third phase, learning automaton selects cluster head nodes, member nodes and forms the cluster. Thus, this automaton does the learning from the setting and can form optimized clusters in terms of power consumption and link stability. To simulate the proposed algorithm we have used omnet++4.2.2. Simulation results indicate that newly formed clusters have a longer lifetime than previous algorithms and decrease strongly network overload by reducing update rate.

Keywords: mobile Ad Hoc networks, clustering, learning automaton, cellular automaton, battery power

Procedia PDF Downloads 368
454 Consumer Load Profile Determination with Entropy-Based K-Means Algorithm

Authors: Ioannis P. Panapakidis, Marios N. Moschakis

Abstract:

With the continuous increment of smart meter installations across the globe, the need for processing of the load data is evident. Clustering-based load profiling is built upon the utilization of unsupervised machine learning tools for the purpose of formulating the typical load curves or load profiles. The most commonly used algorithm in the load profiling literature is the K-means. While the algorithm has been successfully tested in a variety of applications, its drawback is the strong dependence in the initialization phase. This paper proposes a novel modified form of the K-means that addresses the aforementioned problem. Simulation results indicate the superiority of the proposed algorithm compared to the K-means.

Keywords: clustering, load profiling, load modeling, machine learning, energy efficiency and quality

Procedia PDF Downloads 126
453 Empirical Study of Partitions Similarity Measures

Authors: Abdelkrim Alfalah, Lahcen Ouarbya, John Howroyd

Abstract:

This paper investigates and compares the performance of four existing distances and similarity measures between partitions. The partition measures considered are Rand Index (RI), Adjusted Rand Index (ARI), Variation of Information (VI), and Normalised Variation of Information (NVI). This work investigates the ability of these partition measures to capture three predefined intuitions: the variation within randomly generated partitions, the sensitivity to small perturbations, and finally the independence from the dataset scale. It has been shown that the Adjusted Rand Index performed well overall, with regards to these three intuitions.

Keywords: clustering, comparing partitions, similarity measure, partition distance, partition metric, similarity between partitions, clustering comparison.

Procedia PDF Downloads 144
452 Data Clustering in Wireless Sensor Network Implemented on Self-Organization Feature Map (SOFM) Neural Network

Authors: Krishan Kumar, Mohit Mittal, Pramod Kumar

Abstract:

Wireless sensor network is one of the most promising communication networks for monitoring remote environmental areas. In this network, all the sensor nodes are communicated with each other via radio signals. The sensor nodes have capability of sensing, data storage and processing. The sensor nodes collect the information through neighboring nodes to particular node. The data collection and processing is done by data aggregation techniques. For the data aggregation in sensor network, clustering technique is implemented in the sensor network by implementing self-organizing feature map (SOFM) neural network. Some of the sensor nodes are selected as cluster head nodes. The information aggregated to cluster head nodes from non-cluster head nodes and then this information is transferred to base station (or sink nodes). The aim of this paper is to manage the huge amount of data with the help of SOM neural network. Clustered data is selected to transfer to base station instead of whole information aggregated at cluster head nodes. This reduces the battery consumption over the huge data management. The network lifetime is enhanced at a greater extent.

Keywords: artificial neural network, data clustering, self organization feature map, wireless sensor network

Procedia PDF Downloads 478
451 Design and Implementation of Machine Learning Model for Short-Term Energy Forecasting in Smart Home Management System

Authors: R. Ramesh, K. K. Shivaraman

Abstract:

The main aim of this paper is to handle the energy requirement in an efficient manner by merging the advanced digital communication and control technologies for smart grid applications. In order to reduce user home load during peak load hours, utility applies several incentives such as real-time pricing, time of use, demand response for residential customer through smart meter. However, this method provides inconvenience in the sense that user needs to respond manually to prices that vary in real time. To overcome these inconvenience, this paper proposes a convolutional neural network (CNN) with k-means clustering machine learning model which have ability to forecast energy requirement in short term, i.e., hour of the day or day of the week. By integrating our proposed technique with home energy management based on Bluetooth low energy provides predicted value to user for scheduling appliance in advanced. This paper describes detail about CNN configuration and k-means clustering algorithm for short-term energy forecasting.

Keywords: convolutional neural network, fuzzy logic, k-means clustering approach, smart home energy management

Procedia PDF Downloads 276
450 Optical Flow Direction Determination for Railway Crossing Occupancy Monitoring

Authors: Zdenek Silar, Martin Dobrovolny

Abstract:

This article deals with the obstacle detection on a railway crossing (clearance detection). Detection is based on the optical flow estimation and classification of the flow vectors by K-means clustering algorithm. For classification of passing vehicles is used optical flow direction determination. The optical flow estimation is based on a modified Lucas-Kanade method.

Keywords: background estimation, direction of optical flow, K-means clustering, objects detection, railway crossing monitoring, velocity vectors

Procedia PDF Downloads 480
449 Wavelet Based Residual Method of Detecting GSM Signal Strength Fading

Authors: Danladi Ali, Onah Festus Iloabuchi

Abstract:

In this paper, GSM signal strength was measured in order to detect the type of the signal fading phenomenon using one-dimensional multilevel wavelet residual method and neural network clustering to determine the average GSM signal strength received in the study area. The wavelet residual method predicted that the GSM signal experienced slow fading and attenuated with MSE of 3.875dB. The neural network clustering revealed that mostly -75dB, -85dB and -95dB were received. This means that the signal strength received in the study is a weak signal.

Keywords: one-dimensional multilevel wavelets, path loss, GSM signal strength, propagation, urban environment

Procedia PDF Downloads 308
448 Support Vector Machine Based Retinal Therapeutic for Glaucoma Using Machine Learning Algorithm

Authors: P. S. Jagadeesh Kumar, Mingmin Pan, Yang Yung, Tracy Lin Huan

Abstract:

Glaucoma is a group of visual maladies represented by the scheduled optic nerve neuropathy; means to the increasing dwindling in vision ground, resulting in loss of sight. In this paper, a novel support vector machine based retinal therapeutic for glaucoma using machine learning algorithm is conservative. The algorithm has fitting pragmatism; subsequently sustained on correlation clustering mode, it visualizes perfect computations in the multi-dimensional space. Support vector clustering turns out to be comparable to the scale-space advance that investigates the cluster organization by means of a kernel density estimation of the likelihood distribution, where cluster midpoints are idiosyncratic by the neighborhood maxima of the concreteness. The predicted planning has 91% attainment rate on data set deterrent on a consolidation of 500 realistic images of resolute and glaucoma retina; therefore, the computational benefit of depending on the cluster overlapping system pedestal on machine learning algorithm has complete performance in glaucoma therapeutic.

Keywords: machine learning algorithm, correlation clustering mode, cluster overlapping system, glaucoma, kernel density estimation, retinal therapeutic

Procedia PDF Downloads 207
447 The Phylogenetic Investigation of Candidate Genes Related to Type II Diabetes in Man and Other Species

Authors: Srijoni Banerjee

Abstract:

Sequences of some of the candidate genes (e.g., CPE, CDKAL1, GCKR, HSD11B1, IGF2BP2, IRS1, LPIN1, PKLR, TNF, PPARG) implicated in some of the complex disease, e.g. Type II diabetes in man has been compared with other species to investigate phylogenetic affinity. Based on mRNA sequence of these genes of 7 to 8 species, using bioinformatics tools Mega 5, Bioedit, Clustal W, distance matrix was obtained. Phylogenetic trees were obtained by NJ and UPGMA clustering methods. The results of the phylogenetic analyses show that of the species compared: Xenopus l., Danio r., Macaca m., Homo sapiens s., Rattus n., Mus m. and Gallus g., Bos taurus, both NJ and UPGMA clustering show close affinity between clustering of Homo sapiens s. (Man) with Rattus n. (Rat), Mus m. species for the candidate genes, except in case of Lipin1 gene. The results support the functional similarity of these genes in physiological and biochemical process involving man and mouse/rat. Therefore, in understanding the complex etiology and treatment of the complex disease mouse/rate model is the best laboratory choice for experimentation.

Keywords: phylogeny, candidate gene of type-2 diabetes, CPE, CDKAL1, GCKR, HSD11B1, IGF2BP2, IRS1, LPIN1, PKLR, TNF, PPARG

Procedia PDF Downloads 279
446 Energy Efficient Clustering with Adaptive Particle Swarm Optimization

Authors: KumarShashvat, ArshpreetKaur, RajeshKumar, Raman Chadha

Abstract:

Wireless sensor networks have principal characteristic of having restricted energy and with limitation that energy of the nodes cannot be replenished. To increase the lifetime in this scenario WSN route for data transmission is opted such that utilization of energy along the selected route is negligible. For this energy efficient network, dandy infrastructure is needed because it impinges the network lifespan. Clustering is a technique in which nodes are grouped into disjoints and non–overlapping sets. In this technique data is collected at the cluster head. In this paper, Adaptive-PSO algorithm is proposed which forms energy aware clusters by minimizing the cost of locating the cluster head. The main concern is of the suitability of the swarms by adjusting the learning parameters of PSO. Particle Swarm Optimization converges quickly at the beginning stage of the search but during the course of time, it becomes stable and may be trapped in local optima. In suggested network model swarms are given the intelligence of the spiders which makes them capable enough to avoid earlier convergence and also help them to escape from the local optima. Comparison analysis with traditional PSO shows that new algorithm considerably enhances the performance where multi-dimensional functions are taken into consideration.

Keywords: Particle Swarm Optimization, adaptive – PSO, comparison between PSO and A-PSO, energy efficient clustering

Procedia PDF Downloads 212
445 An Approach for Pattern Recognition and Prediction of Information Diffusion Model on Twitter

Authors: Amartya Hatua, Trung Nguyen, Andrew Sung

Abstract:

In this paper, we study the information diffusion process on Twitter as a multivariate time series problem. Our model concerns three measures (volume, network influence, and sentiment of tweets) based on 10 features, and we collected 27 million tweets to build our information diffusion time series dataset for analysis. Then, different time series clustering techniques with Dynamic Time Warping (DTW) distance were used to identify different patterns of information diffusion. Finally, we built the information diffusion prediction models for new hashtags which comprise two phrases: The first phrase is recognizing the pattern using k-NN with DTW distance; the second phrase is building the forecasting model using the traditional Autoregressive Integrated Moving Average (ARIMA) model and the non-linear recurrent neural network of Long Short-Term Memory (LSTM). Preliminary results of performance evaluation between different forecasting models show that LSTM with clustering information notably outperforms other models. Therefore, our approach can be applied in real-world applications to analyze and predict the information diffusion characteristics of selected topics or memes (hashtags) in Twitter.

Keywords: ARIMA, DTW, information diffusion, LSTM, RNN, time series clustering, time series forecasting, Twitter

Procedia PDF Downloads 355
444 Hierarchical Checkpoint Protocol in Data Grids

Authors: Rahma Souli-Jbali, Minyar Sassi Hidri, Rahma Ben Ayed

Abstract:

Grid of computing nodes has emerged as a representative means of connecting distributed computers or resources scattered all over the world for the purpose of computing and distributed storage. Since fault tolerance becomes complex due to the availability of resources in decentralized grid environment, it can be used in connection with replication in data grids. The objective of our work is to present fault tolerance in data grids with data replication-driven model based on clustering. The performance of the protocol is evaluated with Omnet++ simulator. The computational results show the efficiency of our protocol in terms of recovery time and the number of process in rollbacks.

Keywords: data grids, fault tolerance, clustering, chandy-lamport

Procedia PDF Downloads 295
443 An Observation of the Information Technology Research and Development Based on Article Data Mining: A Survey Study on Science Direct

Authors: Muhammet Dursun Kaya, Hasan Asil

Abstract:

One of the most important factors of research and development is the deep insight into the evolutions of scientific development. The state-of-the-art tools and instruments can considerably assist the researchers, and many of the world organizations have become aware of the advantages of data mining for the acquisition of the knowledge required for the unstructured data. This paper was an attempt to review the articles on the information technology published in the past five years with the aid of data mining. A clustering approach was used to study these articles, and the research results revealed that three topics, namely health, innovation, and information systems, have captured the special attention of the researchers.

Keywords: information technology, data mining, scientific development, clustering

Procedia PDF Downloads 241
442 Performance Evaluation of Clustered Routing Protocols for Heterogeneous Wireless Sensor Networks

Authors: Awatef Chniguir, Tarek Farah, Zouhair Ben Jemaa, Safya Belguith

Abstract:

Optimal routing allows minimizing energy consumption in wireless sensor networks (WSN). Clustering has proven its effectiveness in organizing WSN by reducing channel contention and packet collision and enhancing network throughput under heavy load. Therefore, nowadays, with the emergence of the Internet of Things, heterogeneity is essential. Stable election protocol (SEP) that has increased the network stability period and lifetime is the first clustering protocol for heterogeneous WSN. SEP and its descendants, namely SEP, Threshold Sensitive SEP (TSEP), Enhanced TSEP (ETSSEP) and Current Energy Allotted TSEP (CEATSEP), were studied. These algorithms’ performance was evaluated based on different metrics, especially first node death (FND), to compare their stability. Simulations were conducted on the MATLAB tool considering two scenarios: The first one demonstrates the fraction variation of advanced nodes by setting the number of total nodes. The second considers the interpretation of the number of nodes while keeping the number of advanced nodes permanent. CEATSEP outperforms its antecedents by increasing stability and, at the same time, keeping a low throughput. It also operates very well in a large-scale network. Consequently, CEATSEP has a useful lifespan and energy efficiency compared to the other routing protocol for heterogeneous WSN.

Keywords: clustering, heterogeneous, stability, scalability, IoT, WSN

Procedia PDF Downloads 93