Search results for: K-means clustering algorithm
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3911

Search results for: K-means clustering algorithm

3821 EnumTree: An Enumerative Biclustering Algorithm for DNA Microarray Data

Authors: Haifa Ben Saber, Mourad Elloumi

Abstract:

In a number of domains, like in DNA microarray data analysis, we need to cluster simultaneously rows (genes) and columns (conditions) of a data matrix to identify groups of constant rows with a group of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis. More effective biclustering algorithms are highly desirable and needed. We introduce a new algorithm called, Enumerative tree (EnumTree) for biclustering of binary microarray data. is an algorithm adopting the approach of enumerating biclusters. This algorithm extracts all biclusters consistent good quality. The main idea of ​​EnumLat is the construction of a new tree structure to represent adequately different biclusters discovered during the process of enumeration. This algorithm adopts the strategy of all biclusters at a time. The performance of the proposed algorithm is assessed using both synthetic and real DNA micryarray data, our algorithm outperforms other biclustering algorithms for binary microarray data. Biclusters with different numbers of rows. Moreover, we test the biological significance using a gene annotation web tool to show that our proposed method is able to produce biologically relevent biclusters.

Keywords: DNA microarray, biclustering, gene expression data, tree, datamining.

Procedia PDF Downloads 344
3820 Clustering of Association Rules of ISIS & Al-Qaeda Based on Similarity Measures

Authors: Tamanna Goyal, Divya Bansal, Sanjeev Sofat

Abstract:

In world-threatening terrorist attacks, where early detection, distinction, and prediction are effective diagnosis techniques and for functionally accurate and precise analysis of terrorism data, there are so many data mining & statistical approaches to assure accuracy. The computational extraction of derived patterns is a non-trivial task which comprises specific domain discovery by means of sophisticated algorithm design and analysis. This paper proposes an approach for similarity extraction by obtaining the useful attributes from the available datasets of terrorist attacks and then applying feature selection technique based on the statistical impurity measures followed by clustering techniques on the basis of similarity measures. On the basis of degree of participation of attributes in the rules, the associative dependencies between the attacks are analyzed. Consequently, to compute the similarity among the discovered rules, we applied a weighted similarity measure. Finally, the rules are grouped by applying using hierarchical clustering. We have applied it to an open source dataset to determine the usability and efficiency of our technique, and a literature search is also accomplished to support the efficiency and accuracy of our results.

Keywords: association rules, clustering, similarity measure, statistical approaches

Procedia PDF Downloads 291
3819 A Technique for Image Segmentation Using K-Means Clustering Classification

Authors: Sadia Basar, Naila Habib, Awais Adnan

Abstract:

The paper presents the Technique for Image Segmentation Using K-Means Clustering Classification. The presented algorithms were specific, however, missed the neighboring information and required high-speed computerized machines to run the segmentation algorithms. Clustering is the process of partitioning a group of data points into a small number of clusters. The proposed method is content-aware and feature extraction method which is able to run on low-end computerized machines, simple algorithm, required low-quality streaming, efficient and used for security purpose. It has the capability to highlight the boundary and the object. At first, the user enters the data in the representation of the input. Then in the next step, the digital image is converted into groups clusters. Clusters are divided into many regions. The same categories with same features of clusters are assembled within a group and different clusters are placed in other groups. Finally, the clusters are combined with respect to similar features and then represented in the form of segments. The clustered image depicts the clear representation of the digital image in order to highlight the regions and boundaries of the image. At last, the final image is presented in the form of segments. All colors of the image are separated in clusters.

Keywords: clustering, image segmentation, K-means function, local and global minimum, region

Procedia PDF Downloads 350
3818 Self-Supervised Attributed Graph Clustering with Dual Contrastive Loss Constraints

Authors: Lijuan Zhou, Mengqi Wu, Changyong Niu

Abstract:

Attributed graph clustering can utilize the graph topology and node attributes to uncover hidden community structures and patterns in complex networks, aiding in the understanding and analysis of complex systems. Utilizing contrastive learning for attributed graph clustering can effectively exploit meaningful implicit relationships between data. However, existing attributed graph clustering methods based on contrastive learning suffer from the following drawbacks: 1) Complex data augmentation increases computational cost, and inappropriate data augmentation may lead to semantic drift. 2) The selection of positive and negative samples neglects the intrinsic cluster structure learned from graph topology and node attributes. Therefore, this paper proposes a method called self-supervised Attributed Graph Clustering with Dual Contrastive Loss constraints (AGC-DCL). Firstly, Siamese Multilayer Perceptron (MLP) encoders are employed to generate two views separately to avoid complex data augmentation. Secondly, the neighborhood contrastive loss is introduced to constrain node representation using local topological structure while effectively embedding attribute information through attribute reconstruction. Additionally, clustering-oriented contrastive loss is applied to fully utilize clustering information in global semantics for discriminative node representations, regarding the cluster centers from two views as negative samples to fully leverage effective clustering information from different views. Comparative clustering results with existing attributed graph clustering algorithms on six datasets demonstrate the superiority of the proposed method.

Keywords: attributed graph clustering, contrastive learning, clustering-oriented, self-supervised learning

Procedia PDF Downloads 13
3817 Health Trajectory Clustering Using Deep Belief Networks

Authors: Farshid Hajati, Federico Girosi, Shima Ghassempour

Abstract:

We present a Deep Belief Network (DBN) method for clustering health trajectories. Deep Belief Network (DBN) is a deep architecture that consists of a stack of Restricted Boltzmann Machines (RBM). In a deep architecture, each layer learns more complex features than the past layers. The proposed method depends on DBN in clustering without using back propagation learning algorithm. The proposed DBN has a better a performance compared to the deep neural network due the initialization of the connecting weights. We use Contrastive Divergence (CD) method for training the RBMs which increases the performance of the network. The performance of the proposed method is evaluated extensively on the Health and Retirement Study (HRS) database. The University of Michigan Health and Retirement Study (HRS) is a nationally representative longitudinal study that has surveyed more than 27,000 elderly and near-elderly Americans since its inception in 1992. Participants are interviewed every two years and they collect data on physical and mental health, insurance coverage, financial status, family support systems, labor market status, and retirement planning. The dataset is publicly available and we use the RAND HRS version L, which is easy to use and cleaned up version of the data. The size of sample data set is 268 and the length of the trajectories is equal to 10. The trajectories do not stop when the patient dies and represent 10 different interviews of live patients. Compared to the state-of-the-art benchmarks, the experimental results show the effectiveness and superiority of the proposed method in clustering health trajectories.

Keywords: health trajectory, clustering, deep learning, DBN

Procedia PDF Downloads 342
3816 Time-Series Load Data Analysis for User Power Profiling

Authors: Mahdi Daghmhehci Firoozjaei, Minchang Kim, Dima Alhadidi

Abstract:

In this paper, we present a power profiling model for smart grid consumers based on real time load data acquired smart meters. It profiles consumers’ power consumption behaviour using the dynamic time warping (DTW) clustering algorithm. Due to the invariability of signal warping of this algorithm, time-disordered load data can be profiled and consumption features be extracted. Two load types are defined and the related load patterns are extracted for classifying consumption behaviour by DTW. The classification methodology is discussed in detail. To evaluate the performance of the method, we analyze the time-series load data measured by a smart meter in a real case. The results verify the effectiveness of the proposed profiling method with 90.91% true positive rate for load type clustering in the best case.

Keywords: power profiling, user privacy, dynamic time warping, smart grid

Procedia PDF Downloads 104
3815 A Local Tensor Clustering Algorithm to Annotate Uncharacterized Genes with Many Biological Networks

Authors: Paul Shize Li, Frank Alber

Abstract:

A fundamental task of clinical genomics is to unravel the functions of genes and their associations with disorders. Although experimental biology has made efforts to discover and elucidate the molecular mechanisms of individual genes in the past decades, still about 40% of human genes have unknown functions, not to mention the diseases they may be related to. For those biologists who are interested in a particular gene with unknown functions, a powerful computational method tailored for inferring the functions and disease relevance of uncharacterized genes is strongly needed. Studies have shown that genes strongly linked to each other in multiple biological networks are more likely to have similar functions. This indicates that the densely connected subgraphs in multiple biological networks are useful in the functional and phenotypic annotation of uncharacterized genes. Therefore, in this work, we have developed an integrative network approach to identify the frequent local clusters, which are defined as those densely connected subgraphs that frequently occur in multiple biological networks and consist of the query gene that has few or no disease or function annotations. This is a local clustering algorithm that models multiple biological networks sharing the same gene set as a three-dimensional matrix, the so-called tensor, and employs the tensor-based optimization method to efficiently find the frequent local clusters. Specifically, massive public gene expression data sets that comprehensively cover dynamic, physiological, and environmental conditions are used to generate hundreds of gene co-expression networks. By integrating these gene co-expression networks, for a given uncharacterized gene that is of biologist’s interest, the proposed method can be applied to identify the frequent local clusters that consist of this uncharacterized gene. Finally, those frequent local clusters are used for function and disease annotation of this uncharacterized gene. This local tensor clustering algorithm outperformed the competing tensor-based algorithm in both module discovery and running time. We also demonstrated the use of the proposed method on real data of hundreds of gene co-expression data and showed that it can comprehensively characterize the query gene. Therefore, this study provides a new tool for annotating the uncharacterized genes and has great potential to assist clinical genomic diagnostics.

Keywords: local tensor clustering, query gene, gene co-expression network, gene annotation

Procedia PDF Downloads 104
3814 Automatic Detection of Proliferative Cells in Immunohistochemically Images of Meningioma Using Fuzzy C-Means Clustering and HSV Color Space

Authors: Vahid Anari, Mina Bakhshi

Abstract:

Visual search and identification of immunohistochemically stained tissue of meningioma was performed manually in pathologic laboratories to detect and diagnose the cancers type of meningioma. This task is very tedious and time-consuming. Moreover, because of cell's complex nature, it still remains a challenging task to segment cells from its background and analyze them automatically. In this paper, we develop and test a computerized scheme that can automatically identify cells in microscopic images of meningioma and classify them into positive (proliferative) and negative (normal) cells. Dataset including 150 images are used to test the scheme. The scheme uses Fuzzy C-means algorithm as a color clustering method based on perceptually uniform hue, saturation, value (HSV) color space. Since the cells are distinguishable by the human eye, the accuracy and stability of the algorithm are quantitatively compared through application to a wide variety of real images.

Keywords: positive cell, color segmentation, HSV color space, immunohistochemistry, meningioma, thresholding, fuzzy c-means

Procedia PDF Downloads 178
3813 A Non-parametric Clustering Approach for Multivariate Geostatistical Data

Authors: Francky Fouedjio

Abstract:

Multivariate geostatistical data have become omnipresent in the geosciences and pose substantial analysis challenges. One of them is the grouping of data locations into spatially contiguous clusters so that data locations within the same cluster are more similar while clusters are different from each other, in some sense. Spatially contiguous clusters can significantly improve the interpretation that turns the resulting clusters into meaningful geographical subregions. In this paper, we develop an agglomerative hierarchical clustering approach that takes into account the spatial dependency between observations. It relies on a dissimilarity matrix built from a non-parametric kernel estimator of the spatial dependence structure of data. It integrates existing methods to find the optimal cluster number and to evaluate the contribution of variables to the clustering. The capability of the proposed approach to provide spatially compact, connected and meaningful clusters is assessed using bivariate synthetic dataset and multivariate geochemical dataset. The proposed clustering method gives satisfactory results compared to other similar geostatistical clustering methods.

Keywords: clustering, geostatistics, multivariate data, non-parametric

Procedia PDF Downloads 454
3812 Power Iteration Clustering Based on Deflation Technique on Large Scale Graphs

Authors: Taysir Soliman

Abstract:

One of the current popular clustering techniques is Spectral Clustering (SC) because of its advantages over conventional approaches such as hierarchical clustering, k-means, etc. and other techniques as well. However, one of the disadvantages of SC is the time consuming process because it requires computing the eigenvectors. In the past to overcome this disadvantage, a number of attempts have been proposed such as the Power Iteration Clustering (PIC) technique, which is one of versions from SC; some of PIC advantages are: 1) its scalability and efficiency, 2) finding one pseudo-eigenvectors instead of computing eigenvectors, and 3) linear combination of the eigenvectors in linear time. However, its worst disadvantage is an inter-class collision problem because it used only one pseudo-eigenvectors which is not enough. Previous researchers developed Deflation-based Power Iteration Clustering (DPIC) to overcome problems of PIC technique on inter-class collision with the same efficiency of PIC. In this paper, we developed Parallel DPIC (PDPIC) to improve the time and memory complexity which is run on apache spark framework using sparse matrix. To test the performance of PDPIC, we compared it to SC, ESCG, ESCALG algorithms on four small graph benchmark datasets and nine large graph benchmark datasets, where PDPIC proved higher accuracy and better time consuming than other compared algorithms.

Keywords: spectral clustering, power iteration clustering, deflation-based power iteration clustering, Apache spark, large graph

Procedia PDF Downloads 149
3811 Agglomerative Hierarchical Clustering Using the Tθ Family of Similarity Measures

Authors: Salima Kouici, Abdelkader Khelladi

Abstract:

In this work, we begin with the presentation of the Tθ family of usual similarity measures concerning multidimensional binary data. Subsequently, some properties of these measures are proposed. Finally, the impact of the use of different inter-elements measures on the results of the Agglomerative Hierarchical Clustering Methods is studied.

Keywords: binary data, similarity measure, Tθ measures, agglomerative hierarchical clustering

Procedia PDF Downloads 444
3810 Sentiment Classification of Documents

Authors: Swarnadip Ghosh

Abstract:

Sentiment Analysis is the process of detecting the contextual polarity of text. In other words, it determines whether a piece of writing is positive, negative or neutral.Sentiment analysis of documents holds great importance in today's world, when numerous information is stored in databases and in the world wide web. An efficient algorithm to illicit such information, would be beneficial for social, economic as well as medical purposes. In this project, we have developed an algorithm to classify a document into positive or negative. Using our algorithm, we obtained a feature set from the data, and classified the documents based on this feature set. It is important to note that, in the classification, we have not used the independence assumption, which is considered by many procedures like the Naive Bayes. This makes the algorithm more general in scope. Moreover, because of the sparsity and high dimensionality of such data, we did not use empirical distribution for estimation, but developed a method by finding degree of close clustering of the data points. We have applied our algorithm on a movie review data set obtained from IMDb and obtained satisfactory results.

Keywords: sentiment, Run's Test, cross validation, higher dimensional pmf estimation

Procedia PDF Downloads 372
3809 Radar on Bike: Coarse Classification based on Multi-Level Clustering for Cyclist Safety Enhancement

Authors: Asma Omri, Noureddine Benothman, Sofiane Sayahi, Fethi Tlili, Hichem Besbes

Abstract:

Cycling, a popular mode of transportation, can also be perilous due to cyclists' vulnerability to collisions with vehicles and obstacles. This paper presents an innovative cyclist safety system based on radar technology designed to offer real-time collision risk warnings to cyclists. The system incorporates a low-power radar sensor affixed to the bicycle and connected to a microcontroller. It leverages radar point cloud detections, a clustering algorithm, and a supervised classifier. These algorithms are optimized for efficiency to run on the TI’s AWR 1843 BOOST radar, utilizing a coarse classification approach distinguishing between cars, trucks, two-wheeled vehicles, and other objects. To enhance the performance of clustering techniques, we propose a 2-Level clustering approach. This approach builds on the state-of-the-art Density-based spatial clustering of applications with noise (DBSCAN). The objective is to first cluster objects based on their velocity, then refine the analysis by clustering based on position. The initial level identifies groups of objects with similar velocities and movement patterns. The subsequent level refines the analysis by considering the spatial distribution of these objects. The clusters obtained from the first level serve as input for the second level of clustering. Our proposed technique surpasses the classical DBSCAN algorithm in terms of geometrical metrics, including homogeneity, completeness, and V-score. Relevant cluster features are extracted and utilized to classify objects using an SVM classifier. Potential obstacles are identified based on their velocity and proximity to the cyclist. To optimize the system, we used the View of Delft dataset for hyperparameter selection and SVM classifier training. The system's performance was assessed using our collected dataset of radar point clouds synchronized with a camera on an Nvidia Jetson Nano board. The radar-based cyclist safety system is a practical solution that can be easily installed on any bicycle and connected to smartphones or other devices, offering real-time feedback and navigation assistance to cyclists. We conducted experiments to validate the system's feasibility, achieving an impressive 85% accuracy in the classification task. This system has the potential to significantly reduce the number of accidents involving cyclists and enhance their safety on the road.

Keywords: 2-level clustering, coarse classification, cyclist safety, warning system based on radar technology

Procedia PDF Downloads 45
3808 Generalization of Clustering Coefficient on Lattice Networks Applied to Criminal Networks

Authors: Christian H. Sanabria-Montaña, Rodrigo Huerta-Quintanilla

Abstract:

A lattice network is a special type of network in which all nodes have the same number of links, and its boundary conditions are periodic. The most basic lattice network is the ring, a one-dimensional network with periodic border conditions. In contrast, the Cartesian product of d rings forms a d-dimensional lattice network. An analytical expression currently exists for the clustering coefficient in this type of network, but the theoretical value is valid only up to certain connectivity value; in other words, the analytical expression is incomplete. Here we obtain analytically the clustering coefficient expression in d-dimensional lattice networks for any link density. Our analytical results show that the clustering coefficient for a lattice network with density of links that tend to 1, leads to the value of the clustering coefficient of a fully connected network. We developed a model on criminology in which the generalized clustering coefficient expression is applied. The model states that delinquents learn the know-how of crime business by sharing knowledge, directly or indirectly, with their friends of the gang. This generalization shed light on the network properties, which is important to develop new models in different fields where network structure plays an important role in the system dynamic, such as criminology, evolutionary game theory, econophysics, among others.

Keywords: clustering coefficient, criminology, generalized, regular network d-dimensional

Procedia PDF Downloads 377
3807 Structure Clustering for Milestoning Applications of Complex Conformational Transitions

Authors: Amani Tahat, Serdal Kirmizialtin

Abstract:

Trajectory fragment methods such as Markov State Models (MSM), Milestoning (MS) and Transition Path sampling are the prime choice of extending the timescale of all atom Molecular Dynamics simulations. In these approaches, a set of structures that covers the accessible phase space has to be chosen a priori using cluster analysis. Structural clustering serves to partition the conformational state into natural subgroups based on their similarity, an essential statistical methodology that is used for analyzing numerous sets of empirical data produced by Molecular Dynamics (MD) simulations. Local transition kernel among these clusters later used to connect the metastable states using a Markovian kinetic model in MSM and a non-Markovian model in MS. The choice of clustering approach in constructing such kernel is crucial since the high dimensionality of the biomolecular structures might easily confuse the identification of clusters when using the traditional hierarchical clustering methodology. Of particular interest, in the case of MS where the milestones are very close to each other, accurate determination of the milestone identity of the trajectory becomes a challenging issue. Throughout this work we present two cluster analysis methods applied to the cis–trans isomerism of dinucleotide AA. The choice of nucleic acids to commonly used proteins to study the cluster analysis is two fold: i) the energy landscape is rugged; hence transitions are more complex, enabling a more realistic model to study conformational transitions, ii) Nucleic acids conformational space is high dimensional. A diverse set of internal coordinates is necessary to describe the metastable states in nucleic acids, posing a challenge in studying the conformational transitions. Herein, we need improved clustering methods that accurately identify the AA structure in its metastable states in a robust way for a wide range of confused data conditions. The single linkage approach of the hierarchical clustering available in GROMACS MD-package is the first clustering methodology applied to our data. Self Organizing Map (SOM) neural network, that also known as a Kohonen network, is the second data clustering methodology. The performance comparison of the neural network as well as hierarchical clustering method is studied by means of computing the mean first passage times for the cis-trans conformational rates. Our hope is that this study provides insight into the complexities and need in determining the appropriate clustering algorithm for kinetic analysis. Our results can improve the effectiveness of decisions based on clustering confused empirical data in studying conformational transitions in biomolecules.

Keywords: milestoning, self organizing map, single linkage, structure clustering

Procedia PDF Downloads 194
3806 Data Mining Spatial: Unsupervised Classification of Geographic Data

Authors: Chahrazed Zouaoui

Abstract:

In recent years, the volume of geospatial information is increasing due to the evolution of communication technologies and information, this information is presented often by geographic information systems (GIS) and stored on of spatial databases (BDS). The classical data mining revealed a weakness in knowledge extraction at these enormous amounts of data due to the particularity of these spatial entities, which are characterized by the interdependence between them (1st law of geography). This gave rise to spatial data mining. Spatial data mining is a process of analyzing geographic data, which allows the extraction of knowledge and spatial relationships from geospatial data, including methods of this process we distinguish the monothematic and thematic, geo- Clustering is one of the main tasks of spatial data mining, which is registered in the part of the monothematic method. It includes geo-spatial entities similar in the same class and it affects more dissimilar to the different classes. In other words, maximize intra-class similarity and minimize inter similarity classes. Taking account of the particularity of geo-spatial data. Two approaches to geo-clustering exist, the dynamic processing of data involves applying algorithms designed for the direct treatment of spatial data, and the approach based on the spatial data pre-processing, which consists of applying clustering algorithms classic pre-processed data (by integration of spatial relationships). This approach (based on pre-treatment) is quite complex in different cases, so the search for approximate solutions involves the use of approximation algorithms, including the algorithms we are interested in dedicated approaches (clustering methods for partitioning and methods for density) and approaching bees (biomimetic approach), our study is proposed to design very significant to this problem, using different algorithms for automatically detecting geo-spatial neighborhood in order to implement the method of geo- clustering by pre-treatment, and the application of the bees algorithm to this problem for the first time in the field of geo-spatial.

Keywords: mining, GIS, geo-clustering, neighborhood

Procedia PDF Downloads 356
3805 A Relative Entropy Regularization Approach for Fuzzy C-Means Clustering Problem

Authors: Ouafa Amira, Jiangshe Zhang

Abstract:

Clustering is an unsupervised machine learning technique; its aim is to extract the data structures, in which similar data objects are grouped in the same cluster, whereas dissimilar objects are grouped in different clusters. Clustering methods are widely utilized in different fields, such as: image processing, computer vision , and pattern recognition, etc. Fuzzy c-means clustering (fcm) is one of the most well known fuzzy clustering methods. It is based on solving an optimization problem, in which a minimization of a given cost function has been studied. This minimization aims to decrease the dissimilarity inside clusters, where the dissimilarity here is measured by the distances between data objects and cluster centers. The degree of belonging of a data point in a cluster is measured by a membership function which is included in the interval [0, 1]. In fcm clustering, the membership degree is constrained with the condition that the sum of a data object’s memberships in all clusters must be equal to one. This constraint can cause several problems, specially when our data objects are included in a noisy space. Regularization approach took a part in fuzzy c-means clustering technique. This process introduces an additional information in order to solve an ill-posed optimization problem. In this study, we focus on regularization by relative entropy approach, where in our optimization problem we aim to minimize the dissimilarity inside clusters. Finding an appropriate membership degree to each data object is our objective, because an appropriate membership degree leads to an accurate clustering result. Our clustering results in synthetic data sets, gaussian based data sets, and real world data sets show that our proposed model achieves a good accuracy.

Keywords: clustering, fuzzy c-means, regularization, relative entropy

Procedia PDF Downloads 240
3804 Optical Flow Direction Determination for Railway Crossing Occupancy Monitoring

Authors: Zdenek Silar, Martin Dobrovolny

Abstract:

This article deals with the obstacle detection on a railway crossing (clearance detection). Detection is based on the optical flow estimation and classification of the flow vectors by K-means clustering algorithm. For classification of passing vehicles is used optical flow direction determination. The optical flow estimation is based on a modified Lucas-Kanade method.

Keywords: background estimation, direction of optical flow, K-means clustering, objects detection, railway crossing monitoring, velocity vectors

Procedia PDF Downloads 485
3803 Study for an Optimal Cable Connection within an Inner Grid of an Offshore Wind Farm

Authors: Je-Seok Shin, Wook-Won Kim, Jin-O Kim

Abstract:

The offshore wind farm needs to be designed carefully considering economics and reliability aspects. There are many decision-making problems for designing entire offshore wind farm, this paper focuses on an inner grid layout which means the connection between wind turbines as well as between wind turbines and an offshore substation. A methodology proposed in this paper determines the connections and the cable type for each connection section using K-clustering, minimum spanning tree and cable selection algorithms. And then, a cost evaluation is performed in terms of investment, power loss and reliability. Through the cost evaluation, an optimal layout of inner grid is determined so as to have the lowest total cost. In order to demonstrate the validity of the methodology, the case study is conducted on 240MW offshore wind farm, and the results show that it is helpful to design optimally offshore wind farm.

Keywords: offshore wind farm, optimal layout, k-clustering algorithm, minimum spanning algorithm, cable type selection, power loss cost, reliability cost

Procedia PDF Downloads 358
3802 Graph Clustering Unveiled: ClusterSyn - A Machine Learning Framework for Predicting Anti-Cancer Drug Synergy Scores

Authors: Babak Bahri, Fatemeh Yassaee Meybodi, Changiz Eslahchi

Abstract:

In the pursuit of effective cancer therapies, the exploration of combinatorial drug regimens is crucial to leverage synergistic interactions between drugs, thereby improving treatment efficacy and overcoming drug resistance. However, identifying synergistic drug pairs poses challenges due to the vast combinatorial space and limitations of experimental approaches. This study introduces ClusterSyn, a machine learning (ML)-powered framework for classifying anti-cancer drug synergy scores. ClusterSyn employs a two-step approach involving drug clustering and synergy score prediction using a fully connected deep neural network. For each cell line in the training dataset, a drug graph is constructed, with nodes representing drugs and edge weights denoting synergy scores between drug pairs. Drugs are clustered using the Markov clustering (MCL) algorithm, and vectors representing the similarity of drug pairs to each cluster are input into the deep neural network for synergy score prediction (synergy or antagonism). Clustering results demonstrate effective grouping of drugs based on synergy scores, aligning similar synergy profiles. Subsequently, neural network predictions and synergy scores of the two drugs on others within their clusters are used to predict the synergy score of the considered drug pair. This approach facilitates comparative analysis with clustering and regression-based methods, revealing the superior performance of ClusterSyn over state-of-the-art methods like DeepSynergy and DeepDDS on diverse datasets such as Oniel and Almanac. The results highlight the remarkable potential of ClusterSyn as a versatile tool for predicting anti-cancer drug synergy scores.

Keywords: drug synergy, clustering, prediction, machine learning., deep learning

Procedia PDF Downloads 40
3801 A Parallel Implementation of k-Means in MATLAB

Authors: Dimitris Varsamis, Christos Talagkozis, Alkiviadis Tsimpiris, Paris Mastorocostas

Abstract:

The aim of this work is the parallel implementation of k-means in MATLAB, in order to reduce the execution time. Specifically, a new function in MATLAB for serial k-means algorithm is developed, which meets all the requirements for the conversion to a function in MATLAB with parallel computations. Additionally, two different variants for the definition of initial values are presented. In the sequel, the parallel approach is presented. Finally, the performance tests for the computation times respect to the numbers of features and classes are illustrated.

Keywords: K-means algorithm, clustering, parallel computations, Matlab

Procedia PDF Downloads 345
3800 Clustering for Detection of the Population at Risk of Anticholinergic Medication

Authors: A. Shirazibeheshti, T. Radwan, A. Ettefaghian, G. Wilson, C. Luca, Farbod Khanizadeh

Abstract:

Anticholinergic medication has been associated with events such as falls, delirium, and cognitive impairment in older patients. To further assess this, anticholinergic burden scores have been developed to quantify risk. A risk model based on clustering was deployed in a healthcare management system to cluster patients into multiple risk groups according to anticholinergic burden scores of multiple medicines prescribed to patients to facilitate clinical decision-making. To do so, anticholinergic burden scores of drugs were extracted from the literature, which categorizes the risk on a scale of 1 to 3. Given the patients’ prescription data on the healthcare database, a weighted anticholinergic risk score was derived per patient based on the prescription of multiple anticholinergic drugs. This study was conducted on over 300,000 records of patients currently registered with a major regional UK-based healthcare provider. The weighted risk scores were used as inputs to an unsupervised learning algorithm (mean-shift clustering) that groups patients into clusters that represent different levels of anticholinergic risk. To further evaluate the performance of the model, any association between the average risk score within each group and other factors such as socioeconomic status (i.e., Index of Multiple Deprivation) and an index of health and disability were investigated. The clustering identifies a group of 15 patients at the highest risk from multiple anticholinergic medication. Our findings also show that this group of patients is located within more deprived areas of London compared to the population of other risk groups. Furthermore, the prescription of anticholinergic medicines is more skewed to female than male patients, indicating that females are more at risk from this kind of multiple medications. The risk may be monitored and controlled in well artificial intelligence-equipped healthcare management systems.

Keywords: anticholinergic medicines, clustering, deprivation, socioeconomic status

Procedia PDF Downloads 170
3799 Routing and Energy Efficiency through Data Coupled Clustering in Large Scale Wireless Sensor Networks (WSNs)

Authors: Jainendra Singh, Zaheeruddin

Abstract:

A typical wireless sensor networks (WSNs) consists of several tiny and low-power sensors which use radio frequency to perform distributed sensing tasks. The longevity of wireless sensor networks (WSNs) is a major issue that impacts the application of such networks. While routing protocols are striving to save energy by acting on sensor nodes, recent studies show that network lifetime can be enhanced by further involving sink mobility. A common approach for energy efficiency is partitioning the network into clusters with correlated data, where the representative nodes simply transmit or average measurements inside the cluster. In this paper, we propose an energy- efficient homogenous clustering (EHC) technique. In this technique, the decision of each sensor is based on their residual energy and an estimate of how many of its neighboring cluster heads (CHs) will benefit from it being a CH. We, also explore the routing algorithm in clustered WSNs. We show that the proposed schemes significantly outperform current approaches in terms of packet delay, hop count and energy consumption of WSNs.

Keywords: wireless sensor network, energy efficiency, clustering, routing

Procedia PDF Downloads 236
3798 Improved Qualitative Modeling of the Magnetization Curve B(H) of the Ferromagnetic Materials for a Transformer Used in the Power Supply for Magnetron

Authors: M. Bassoui, M. Ferfra, M. Chrayagne

Abstract:

This paper presents a qualitative modeling for the nonlinear B-H curve of the saturable magnetic materials for a transformer with shunts used in the power supply for the magnetron. This power supply is composed of a single phase leakage flux transformer supplying a cell composed of a capacitor and a diode, which double the voltage and stabilize the current, and a single magnetron at the output of the cell. A procedure consisting of a fuzzy clustering method and a rule processing algorithm is then employed for processing the constructed fuzzy modeling rules to extract the qualitative properties of the curve.

Keywords: B(H) curve, fuzzy clustering, magnetron, power supply

Procedia PDF Downloads 205
3797 Max-Entropy Feed-Forward Clustering Neural Network

Authors: Xiaohan Bookman, Xiaoyan Zhu

Abstract:

The outputs of non-linear feed-forward neural network are positive, which could be treated as probability when they are normalized to one. If we take Entropy-Based Principle into consideration, the outputs for each sample could be represented as the distribution of this sample for different clusters. Entropy-Based Principle is the principle with which we could estimate the unknown distribution under some limited conditions. As this paper defines two processes in Feed-Forward Neural Network, our limited condition is the abstracted features of samples which are worked out in the abstraction process. And the final outputs are the probability distribution for different clusters in the clustering process. As Entropy-Based Principle is considered into the feed-forward neural network, a clustering method is born. We have conducted some experiments on six open UCI data sets, comparing with a few baselines and applied purity as the measurement. The results illustrate that our method outperforms all the other baselines that are most popular clustering methods.

Keywords: feed-forward neural network, clustering, max-entropy principle, probabilistic models

Procedia PDF Downloads 407
3796 Clustering of Extremes in Financial Returns: A Comparison between Developed and Emerging Markets

Authors: Sara Ali Alokley, Mansour Saleh Albarrak

Abstract:

This paper investigates the dependency or clustering of extremes in the financial returns data by estimating the extremal index value θ∈[0,1]. The smaller the value of θ the more clustering we have. Here we apply the method of Ferro and Segers (2003) to estimate the extremal index for a range of threshold values. We compare the dependency structure of extremes in the developed and emerging markets. We use the financial returns of the stock market index in the developed markets of US, UK, France, Germany and Japan and the emerging markets of Brazil, Russia, India, China and Saudi Arabia. We expect that more clustering occurs in the emerging markets. This study will help to understand the dependency structure of the financial returns data.

Keywords: clustring, extremes, returns, dependency, extermal index

Procedia PDF Downloads 363
3795 Progressive Multimedia Collection Structuring via Scene Linking

Authors: Aman Berhe, Camille Guinaudeau, Claude Barras

Abstract:

In order to facilitate information seeking in large collections of multimedia documents with long and progressive content (such as broadcast news or TV series), one can extract the semantic links that exist between semantically coherent parts of documents, i.e., scenes. The links can then create a coherent collection of scenes from which it is easier to perform content analysis, topic extraction, or information retrieval. In this paper, we focus on TV series structuring and propose two approaches for scene linking at different levels of granularity (episode and season): a fuzzy online clustering technique and a graph-based community detection algorithm. When evaluated on the two first seasons of the TV series Game of Thrones, we found that the fuzzy online clustering approach performed better compared to graph-based community detection at the episode level, while graph-based approaches show better performance at the season level.

Keywords: multimedia collection structuring, progressive content, scene linking, fuzzy clustering, community detection

Procedia PDF Downloads 70
3794 Time Series Regression with Meta-Clusters

Authors: Monika Chuchro

Abstract:

This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain a subgroups of time series data with normal distribution from inflow into waste water treatment plant data which Composed of several groups differing by mean value. Two simple algorithms: K-mean and EM were chosen as a clustering method. The rand index was used to measure the similarity. After simple meta-clustering, regression model was performed for each subgroups. The final model was a sum of subgroups models. The quality of obtained model was compared with the regression model made using the same explanatory variables but with no clustering of data. Results were compared by determination coefficient (R2), measure of prediction accuracy mean absolute percentage error (MAPE) and comparison on linear chart. Preliminary results allows to foresee the potential of the presented technique.

Keywords: clustering, data analysis, data mining, predictive models

Procedia PDF Downloads 434
3793 Co-Evolutionary Fruit Fly Optimization Algorithm and Firefly Algorithm for Solving Unconstrained Optimization Problems

Authors: R. M. Rizk-Allah

Abstract:

This paper presents co-evolutionary fruit fly optimization algorithm based on firefly algorithm (CFOA-FA) for solving unconstrained optimization problems. The proposed algorithm integrates the merits of fruit fly optimization algorithm (FOA), firefly algorithm (FA) and elite strategy to refine the performance of classical FOA. Moreover, co-evolutionary mechanism is performed by applying FA procedures to ensure the diversity of the swarm. Finally, the proposed algorithm CFOA- FA is tested on several benchmark problems from the usual literature and the numerical results have demonstrated the superiority of the proposed algorithm for finding the global optimal solution.

Keywords: firefly algorithm, fruit fly optimization algorithm, unconstrained optimization problems

Procedia PDF Downloads 504
3792 Intrusion Detection Using Dual Artificial Techniques

Authors: Rana I. Abdulghani, Amera I. Melhum

Abstract:

With the abnormal growth of the usage of computers over networks and under the consideration or agreement of most of the computer security experts who said that the goal of building a secure system is never achieved effectively, all these points led to the design of the intrusion detection systems(IDS). This research adopts a comparison between two techniques for network intrusion detection, The first one used the (Particles Swarm Optimization) that fall within the field (Swarm Intelligence). In this Act, the algorithm Enhanced for the purpose of obtaining the minimum error rate by amending the cluster centers when better fitness function is found through the training stages. Results show that this modification gives more efficient exploration of the original algorithm. The second algorithm used a (Back propagation NN) algorithm. Finally a comparison between the results of two methods used were based on (NSL_KDD) data sets for the construction and evaluation of intrusion detection systems. This research is only interested in clustering the two categories (Normal and Abnormal) for the given connection records. Practices experiments result in intrude detection rate (99.183818%) for EPSO and intrude detection rate (69.446416%) for BP neural network.

Keywords: IDS, SI, BP, NSL_KDD, PSO

Procedia PDF Downloads 357