Search results for: attributed graph clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2015

Search results for: attributed graph clustering

1985 Flowing Online Vehicle GPS Data Clustering Using a New Parallel K-Means Algorithm

Authors: Orhun Vural, Oguz Bayat, Rustu Akay, Osman N. Ucan

Abstract:

This study presents a new parallel approach clustering of GPS data. Evaluation has been made by comparing execution time of various clustering algorithms on GPS data. This paper aims to propose a parallel based on neighborhood K-means algorithm to make it faster. The proposed parallelization approach assumes that each GPS data represents a vehicle and to communicate between vehicles close to each other after vehicles are clustered. This parallelization approach has been examined on different sized continuously changing GPS data and compared with serial K-means algorithm and other serial clustering algorithms. The results demonstrated that proposed parallel K-means algorithm has been shown to work much faster than other clustering algorithms.

Keywords: parallel k-means algorithm, parallel clustering, clustering algorithms, clustering on flowing data

Procedia PDF Downloads 190
1984 Scattering Operator and Spectral Clustering for Ultrasound Images: Application on Deep Venous Thrombi

Authors: Thibaud Berthomier, Ali Mansour, Luc Bressollette, Frédéric Le Roy, Dominique Mottier, Léo Fréchier, Barthélémy Hermenault

Abstract:

Deep Venous Thrombosis (DVT) occurs when a thrombus is formed within a deep vein (most often in the legs). This disease can be deadly if a part or the whole thrombus reaches the lung and causes a Pulmonary Embolism (PE). This disorder, often asymptomatic, has multifactorial causes: immobilization, surgery, pregnancy, age, cancers, and genetic variations. Our project aims to relate the thrombus epidemiology (origins, patient predispositions, PE) to its structure using ultrasound images. Ultrasonography and elastography were collected using Toshiba Aplio 500 at Brest Hospital. This manuscript compares two classification approaches: spectral clustering and scattering operator. The former is based on the graph and matrix theories while the latter cascades wavelet convolutions with nonlinear modulus and averaging operators.

Keywords: deep venous thrombosis, ultrasonography, elastography, scattering operator, wavelet, spectral clustering

Procedia PDF Downloads 451
1983 Existence and Construction of Maximal Rectangular Duals

Authors: Krishnendra Shekhawat

Abstract:

Given a graph G = (V, E), a rectangular dual of G represents the vertices of G by a set of interior-disjoint rectangles such that two rectangles touch if and only if there is an edge between the two corresponding vertices in G. Rectangular duals do not exist for every graph, so we can define maximal rectangular duals. A maximal rectangular dual is a rectangular dual of a graph G such that there exists no graph G ′ with a rectangular dual where G is a subgraph of G ′. In this paper, we enumerate all maximal rectangular duals (or, to be precise, the corresponding planar graphs) up to six nodes and presents a necessary condition for the existence of a rectangular dual. This work allegedly has applications in integrated circuit design and architectural floor plans.

Keywords: adjacency, degree sequence, dual graph, rectangular dual

Procedia PDF Downloads 237
1982 Characterising Stable Model by Extended Labelled Dependency Graph

Authors: Asraful Islam

Abstract:

Extended dependency graph (EDG) is a state-of-the-art isomorphic graph to represent normal logic programs (NLPs) that can characterize the consistency of NLPs by graph analysis. To construct the vertices and arcs of an EDG, additional renaming atoms and rules besides those the given program provides are used, resulting in higher space complexity compared to the corresponding traditional dependency graph (TDG). In this article, we propose an extended labeled dependency graph (ELDG) to represent an NLP that shares an equal number of nodes and arcs with TDG and prove that it is isomorphic to the domain program. The number of nodes and arcs used in the underlying dependency graphs are formulated to compare the space complexity. Results show that ELDG uses less memory to store nodes, arcs, and cycles compared to EDG. To exhibit the desirability of ELDG, firstly, the stable models of the kernel form of NLP are characterized by the admissible coloring of ELDG; secondly, a relation of the stable models of a kernel program with the handles of the minimal, odd cycles appearing in the corresponding ELDG has been established; thirdly, to our best knowledge, for the first time an inverse transformation from a dependency graph to the representing NLP w.r.t. ELDG has been defined that enables transferring analytical results from the graph to the program straightforwardly.

Keywords: normal logic program, isomorphism of graph, extended labelled dependency graph, inverse graph transforma-tion, graph colouring

Procedia PDF Downloads 185
1981 Topological Analyses of Unstructured Peer to Peer Systems: A Survey

Authors: Hend Alrasheed

Abstract:

Due to their different properties that have led to avoid several limitations of classic client/server systems, there has been a great interest in the development and the improvement of different peer to peer systems. Understanding the properties of complex peer to peer networks is essential for their future improvements. It was shown that the performances of peer to peer protocols are directly related to their underlying topologies. Therefore, multiple efforts have analyzed the topologies of different peer to peer systems. This study presents an overview of major findings of close experimental analyses to different topologies of three unstructured peer to peer systems: BitTorrent, Gnutella, and FreeNet.

Keywords: peer to peer networks, network topology, graph diameter, clustering coefficient, small-world property, random graph, degree distribution

Procedia PDF Downloads 357
1980 Semi-Supervised Hierarchical Clustering Given a Reference Tree of Labeled Documents

Authors: Ying Zhao, Xingyan Bin

Abstract:

Semi-supervised clustering algorithms have been shown effective to improve clustering process with even limited supervision. However, semi-supervised hierarchical clustering remains challenging due to the complexities of expressing constraints for agglomerative clustering algorithms. This paper proposes novel semi-supervised agglomerative clustering algorithms to build a hierarchy based on a known reference tree. We prove that by enforcing distance constraints defined by a reference tree during the process of hierarchical clustering, the resultant tree is guaranteed to be consistent with the reference tree. We also propose a framework that allows the hierarchical tree generation be aware of levels of levels of the agglomerative tree under creation, so that metric weights can be learned and adopted at each level in a recursive fashion. The experimental evaluation shows that the additional cost of our contraint-based semi-supervised hierarchical clustering algorithm (HAC) is negligible, and our combined semi-supervised HAC algorithm outperforms the state-of-the-art algorithms on real-world datasets. The experiments also show that our proposed methods can improve clustering performance even with a small number of unevenly distributed labeled data.

Keywords: semi-supervised clustering, hierarchical agglomerative clustering, reference trees, distance constraints

Procedia PDF Downloads 507
1979 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis

Authors: C. B. Le, V. N. Pham

Abstract:

In modern data analysis, multi-source data appears more and more in real applications. Multi-source data clustering has emerged as a important issue in the data mining and machine learning community. Different data sources provide information about different data. Therefore, multi-source data linking is essential to improve clustering performance. However, in practice multi-source data is often heterogeneous, uncertain, and large. This issue is considered a major challenge from multi-source data. Ensemble is a versatile machine learning model in which learning techniques can work in parallel, with big data. Clustering ensemble has been shown to outperform any standard clustering algorithm in terms of accuracy and robustness. However, most of the traditional clustering ensemble approaches are based on single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis. The fuzzy optimized multi-objective clustering ensemble method is called FOMOCE. Firstly, a clustering ensemble mathematical model based on the structure of multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, clustering algorithms, and base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. The experiments were performed on the standard sample data set. The experimental results demonstrate the superior performance of the FOMOCE method compared to the existing clustering ensemble methods and multi-source clustering methods.

Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering

Procedia PDF Downloads 146
1978 ACOPIN: An ACO Algorithm with TSP Approach for Clustering Proteins in Protein Interaction Networks

Authors: Jamaludin Sallim, Rozlina Mohamed, Roslina Abdul Hamid

Abstract:

In this paper, we proposed an Ant Colony Optimization (ACO) algorithm together with Traveling Salesman Problem (TSP) approach to investigate the clustering problem in Protein Interaction Networks (PIN). We named this combination as ACOPIN. The purpose of this work is two-fold. First, to test the efficacy of ACO in clustering PIN and second, to propose the simple generalization of the ACO algorithm that might allow its application in clustering proteins in PIN. We split this paper to three main sections. First, we describe the PIN and clustering proteins in PIN. Second, we discuss the steps involved in each phase of ACO algorithm. Finally, we present some results of the investigation with the clustering patterns.

Keywords: ant colony optimization algorithm, searching algorithm, protein functional module, protein interaction network

Procedia PDF Downloads 573
1977 Introduction to Paired Domination Polynomial of a Graph

Authors: Puttaswamy, Anwar Alwardi, Nayaka S. R.

Abstract:

One of the algebraic representation of a graph is the graph polynomial. In this article, we introduce the paired-domination polynomial of a graph G. The paired-domination polynomial of a graph G of order n is the polynomial Dp(G, x) with the coefficients dp(G, i) where dp(G, i) denotes the number of paired dominating sets of G of cardinality i and γpd(G) denotes the paired-domination number of G. We obtain some properties of Dp(G, x) and its coefficients. Further, we compute this polynomial for some families of standard graphs. Further, we obtain some characterization for some specific graphs.

Keywords: domination polynomial, paired dominating set, paired domination number, paired domination polynomial

Procedia PDF Downloads 200
1976 Eccentric Connectivity Index, First and Second Zagreb Indices of Corona Graph

Authors: A. Kulandai Therese

Abstract:

The eccentric connectivity index based on degree and eccentricity of the vertices of a graph is a widely used graph invariant in mathematics.In this paper, we present the explicit eccentric connectivity index, first and second Zagreb indices for a Corona graph and sub division-related corona graphs.

Keywords: corona graph, degree, eccentricity, eccentric connectivity index, first zagreb index, second zagreb index, subdivision graphs

Procedia PDF Downloads 311
1975 2D Structured Non-Cyclic Fuzzy Graphs

Authors: T. Pathinathan, M. Peter

Abstract:

Fuzzy graphs incorporate concepts from graph theory with fuzzy principles. In this paper, we make a study on the properties of fuzzy graphs which are non-cyclic and are of two-dimensional in structure. In particular, this paper presents 2D structure or the structure of double layer for a non-cyclic fuzzy graph whose underlying crisp graph is non-cyclic. In any graph structure, introducing 2D structure may lead to an inherent cycle. We propose relevant conditions for 2D structured non-cyclic fuzzy graphs. These conditions are extended even to fuzzy graphs of the 3D structure. General theoretical properties that are studied for any fuzzy graph are verified to 2D structured or double layered fuzzy graphs. Concepts like Order, Degree, Strong and Size for a fuzzy graph are studied for 2D structured or double layered non-cyclic fuzzy graphs. Using different types of fuzzy graphs, the proposed concepts relating to 2D structured fuzzy graphs are verified.

Keywords: double layered fuzzy graph, double layered non–cyclic fuzzy graph, order, degree and size

Procedia PDF Downloads 369
1974 Spectral Clustering for Manufacturing Cell Formation

Authors: Yessica Nataliani, Miin-Shen Yang

Abstract:

Cell formation (CF) is an important step in group technology. It is used in designing cellular manufacturing systems using similarities between parts in relation to machines so that it can identify part families and machine groups. There are many CF methods in the literature, but there is less spectral clustering used in CF. In this paper, we propose a spectral clustering algorithm for machine-part CF. Some experimental examples are used to illustrate its efficiency. Overall, the spectral clustering algorithm can be used in CF with a wide variety of machine/part matrices.

Keywords: group technology, cell formation, spectral clustering, grouping efficiency

Procedia PDF Downloads 375
1973 Multi-Stream Graph Attention Network for Recommendation with Knowledge Graph

Authors: Zhifei Hu, Feng Xia

Abstract:

In recent years, Graph neural network has been widely used in knowledge graph recommendation. The existing recommendation methods based on graph neural network extract information from knowledge graph through entity and relation, which may not be efficient in the way of information extraction. In order to better propose useful entity information for the current recommendation task in the knowledge graph, we propose an end-to-end Neural network Model based on multi-stream graph attentional Mechanism (MSGAT), which can effectively integrate the knowledge graph into the recommendation system by evaluating the importance of entities from both users and items. Specifically, we use the attention mechanism from the user's perspective to distil the domain nodes information of the predicted item in the knowledge graph, to enhance the user's information on items, and generate the feature representation of the predicted item. Due to user history, click items can reflect the user's interest distribution, we propose a multi-stream attention mechanism, based on the user's preference for entities and relationships, and the similarity between items to be predicted and entities, aggregate user history click item's neighborhood entity information in the knowledge graph and generate the user's feature representation. We evaluate our model on three real recommendation datasets: Movielens-1M (ML-1M), LFM-1B 2015 (LFM-1B), and Amazon-Book (AZ-book). Experimental results show that compared with the most advanced models, our proposed model can better capture the entity information in the knowledge graph, which proves the validity and accuracy of the model.

Keywords: graph attention network, knowledge graph, recommendation, information propagation

Procedia PDF Downloads 91
1972 Bounds on the Laplacian Vertex PI Energy

Authors: Ezgi Kaya, A. Dilek Maden

Abstract:

A topological index is a number related to graph which is invariant under graph isomorphism. In theoretical chemistry, molecular structure descriptors (also called topological indices) are used for modeling physicochemical, pharmacologic, toxicologic, biological and other properties of chemical compounds. Let G be a graph with n vertices and m edges. For a given edge uv, the quantity nu(e) denotes the number of vertices closer to u than v, the quantity nv(e) is defined analogously. The vertex PI index defined as the sum of the nu(e) and nv(e). Here the sum is taken over all edges of G. The energy of a graph is defined as the sum of the eigenvalues of adjacency matrix of G and the Laplacian energy of a graph is defined as the sum of the absolute value of difference of laplacian eigenvalues and average degree of G. In theoretical chemistry, the π-electron energy of a conjugated carbon molecule, computed using the Hückel theory, coincides with the energy. Hence results on graph energy assume special significance. The Laplacian matrix of a graph G weighted by the vertex PI weighting is the Laplacian vertex PI matrix and the Laplacian vertex PI eigenvalues of a connected graph G are the eigenvalues of its Laplacian vertex PI matrix. In this study, Laplacian vertex PI energy of a graph is defined of G. We also give some bounds for the Laplacian vertex PI energy of graphs in terms of vertex PI index, the sum of the squares of entries in the Laplacian vertex PI matrix and the absolute value of the determinant of the Laplacian vertex PI matrix.

Keywords: energy, Laplacian energy, laplacian vertex PI eigenvalues, Laplacian vertex PI energy, vertex PI index

Procedia PDF Downloads 210
1971 Graph Codes - 2D Projections of Multimedia Feature Graphs for Fast and Effective Retrieval

Authors: Stefan Wagenpfeil, Felix Engel, Paul McKevitt, Matthias Hemmje

Abstract:

Multimedia Indexing and Retrieval is generally designed and implemented by employing feature graphs. These graphs typically contain a significant number of nodes and edges to reflect the level of detail in feature detection. A higher level of detail increases the effectiveness of the results but also leads to more complex graph structures. However, graph-traversal-based algorithms for similarity are quite inefficient and computation intensive, especially for large data structures. To deliver fast and effective retrieval, an efficient similarity algorithm, particularly for large graphs, is mandatory. Hence, in this paper, we define a graph-projection into a 2D space (Graph Code) as well as the corresponding algorithms for indexing and retrieval. We show that calculations in this space can be performed more efficiently than graph-traversals due to a simpler processing model and a high level of parallelization. In consequence, we prove that the effectiveness of retrieval also increases substantially, as Graph Codes facilitate more levels of detail in feature fusion. Thus, Graph Codes provide a significant increase in efficiency and effectiveness (especially for Multimedia indexing and retrieval) and can be applied to images, videos, audio, and text information.

Keywords: indexing, retrieval, multimedia, graph algorithm, graph code

Procedia PDF Downloads 133
1970 Construction of Graph Signal Modulations via Graph Fourier Transform and Its Applications

Authors: Xianwei Zheng, Yuan Yan Tang

Abstract:

Classical window Fourier transform has been widely used in signal processing, image processing, machine learning and pattern recognition. The related Gabor transform is powerful enough to capture the texture information of any given dataset. Recently, in the emerging field of graph signal processing, researchers devoting themselves to develop a graph signal processing theory to handle the so-called graph signals. Among the new developing theory, windowed graph Fourier transform has been constructed to establish a time-frequency analysis framework of graph signals. The windowed graph Fourier transform is defined by using the translation and modulation operators of graph signals, following the similar calculations in classical windowed Fourier transform. Specifically, the translation and modulation operators of graph signals are defined by using the Laplacian eigenvectors as follows. For a given graph signal, its translation is defined by a similar manner as its definition in classical signal processing. Specifically, the translation operator can be defined by using the Fourier atoms; the graph signal translation is defined similarly by using the Laplacian eigenvectors. The modulation of the graph can also be established by using the Laplacian eigenvectors. The windowed graph Fourier transform based on these two operators has been applied to obtain time-frequency representations of graph signals. Fundamentally, the modulation operator is defined similarly to the classical modulation by multiplying a graph signal with the entries in each Fourier atom. However, a single Laplacian eigenvector entry cannot play a similar role as the Fourier atom. This definition ignored the relationship between the translation and modulation operators. In this paper, a new definition of the modulation operator is proposed and thus another time-frequency framework for graph signal is constructed. Specifically, the relationship between the translation and modulation operations can be established by the Fourier transform. Specifically, for any signal, the Fourier transform of its translation is the modulation of its Fourier transform. Thus, the modulation of any signal can be defined as the inverse Fourier transform of the translation of its Fourier transform. Therefore, similarly, the graph modulation of any graph signal can be defined as the inverse graph Fourier transform of the translation of its graph Fourier. The novel definition of the graph modulation operator established a relationship of the translation and modulation operations. The new modulation operation and the original translation operation are applied to construct a new framework of graph signal time-frequency analysis. Furthermore, a windowed graph Fourier frame theory is developed. Necessary and sufficient conditions for constructing windowed graph Fourier frames, tight frames and dual frames are presented in this paper. The novel graph signal time-frequency analysis framework is applied to signals defined on well-known graphs, e.g. Minnesota road graph and random graphs. Experimental results show that the novel framework captures new features of graph signals.

Keywords: graph signals, windowed graph Fourier transform, windowed graph Fourier frames, vertex frequency analysis

Procedia PDF Downloads 314
1969 On Chromaticity of Wheels

Authors: Zainab Yasir Abed Al-Rekaby, Abdul Jalil M. Khalaf

Abstract:

Let the vertices of a graph such that every two adjacent vertices have different color is a very common problem in the graph theory. This is known as proper coloring of graphs. The possible number of different proper colorings on a graph with a given number of colors can be represented by a function called the chromatic polynomial. Two graphs G and H are said to be chromatically equivalent, if they share the same chromatic polynomial. A Graph G is chromatically unique, if G is isomorphic to H for any graph H such that G is chromatically equivalent to H. The study of chromatically equivalent and chromatically unique problems is called chromaticity. This paper shows that a wheel W12 is chromatically unique.

Keywords: chromatic polynomial, chromatically equivalent, chromatically unique, wheel

Procedia PDF Downloads 402
1968 Investigation of Clustering Algorithms Used in Wireless Sensor Networks

Authors: Naim Karasekreter, Ugur Fidan, Fatih Basciftci

Abstract:

Wireless sensor networks are networks in which more than one sensor node is organized among themselves. The working principle is based on the transfer of the sensed data over the other nodes in the network to the central station. Wireless sensor networks concentrate on routing algorithms, energy efficiency and clustering algorithms. In the clustering method, the nodes in the network are divided into clusters using different parameters and the most suitable cluster head is selected from among them. The data to be sent to the center is sent per cluster, and the cluster head is transmitted to the center. With this method, the network traffic is reduced and the energy efficiency of the nodes is increased. In this study, clustering algorithms were examined in terms of clustering performances and cluster head selection characteristics to try to identify weak and strong sides. This work is supported by the Project 17.Kariyer.123 of Afyon Kocatepe University BAP Commission.

Keywords: wireless sensor networks (WSN), clustering algorithm, cluster head, clustering

Procedia PDF Downloads 475
1967 A Comparative Study of Multi-SOM Algorithms for Determining the Optimal Number of Clusters

Authors: Imèn Khanchouch, Malika Charrad, Mohamed Limam

Abstract:

The interpretation of the quality of clusters and the determination of the optimal number of clusters is still a crucial problem in clustering. We focus in this paper on multi-SOM clustering method which overcomes the problem of extracting the number of clusters from the SOM map through the use of a clustering validity index. We then tested multi-SOM using real and artificial data sets with different evaluation criteria not used previously such as Davies Bouldin index, Dunn index and silhouette index. The developed multi-SOM algorithm is compared to k-means and Birch methods. Results show that it is more efficient than classical clustering methods.

Keywords: clustering, SOM, multi-SOM, DB index, Dunn index, silhouette index

Procedia PDF Downloads 572
1966 Explainable Graph Attention Networks

Authors: David Pham, Yongfeng Zhang

Abstract:

Graphs are an important structure for data storage and computation. Recent years have seen the success of deep learning on graphs such as Graph Neural Networks (GNN) on various data mining and machine learning tasks. However, most of the deep learning models on graphs cannot easily explain their predictions and are thus often labelled as “black boxes.” For example, Graph Attention Network (GAT) is a frequently used GNN architecture, which adopts an attention mechanism to carefully select the neighborhood nodes for message passing and aggregation. However, it is difficult to explain why certain neighbors are selected while others are not and how the selected neighbors contribute to the final classification result. In this paper, we present a graph learning model called Explainable Graph Attention Network (XGAT), which integrates graph attention modeling and explainability. We use a single model to target both the accuracy and explainability of problem spaces and show that in the context of graph attention modeling, we can design a unified neighborhood selection strategy that selects appropriate neighbor nodes for both better accuracy and enhanced explainability. To justify this, we conduct extensive experiments to better understand the behavior of our model under different conditions and show an increase in both accuracy and explainability.

Keywords: explainable AI, graph attention network, graph neural network, node classification

Procedia PDF Downloads 148
1965 A Fuzzy Kernel K-Medoids Algorithm for Clustering Uncertain Data Objects

Authors: Behnam Tavakkol

Abstract:

Uncertain data mining algorithms use different ways to consider uncertainty in data such as by representing a data object as a sample of points or a probability distribution. Fuzzy methods have long been used for clustering traditional (certain) data objects. They are used to produce non-crisp cluster labels. For uncertain data, however, besides some uncertain fuzzy k-medoids algorithms, not many other fuzzy clustering methods have been developed. In this work, we develop a fuzzy kernel k-medoids algorithm for clustering uncertain data objects. The developed fuzzy kernel k-medoids algorithm is superior to existing fuzzy k-medoids algorithms in clustering data sets with non-linearly separable clusters.

Keywords: clustering algorithm, fuzzy methods, kernel k-medoids, uncertain data

Procedia PDF Downloads 179
1964 A Study of Chromatic Uniqueness of W14

Authors: Zainab Yasir Al-Rekaby, Abdul Jalil M. Khalaf

Abstract:

Coloring the vertices of a graph such that every two adjacent vertices have different color is a very common problem in the graph theory. This is known as proper coloring of graphs. The possible number of different proper colorings on a graph with a given number of colors can be represented by a function called the chromatic polynomial. Two graphs G and H are said to be chromatically equivalent, if they share the same chromatic polynomial. A Graph G is chromatically unique, if G is isomorphic to H for any graph H such that G is chromatically equivalent to H. The study of chromatically equivalent and chromatically unique problems is called chromaticity. This paper shows that a wheel W14 is chromatically unique.

Keywords: chromatic polynomial, chromatically Equivalent, chromatically unique, wheel

Procedia PDF Downloads 388
1963 An Experimental Study on Some Conventional and Hybrid Models of Fuzzy Clustering

Authors: Jeugert Kujtila, Kristi Hoxhalli, Ramazan Dalipi, Erjon Cota, Ardit Murati, Erind Bedalli

Abstract:

Clustering is a versatile instrument in the analysis of collections of data providing insights of the underlying structures of the dataset and enhancing the modeling capabilities. The fuzzy approach to the clustering problem increases the flexibility involving the concept of partial memberships (some value in the continuous interval [0, 1]) of the instances in the clusters. Several fuzzy clustering algorithms have been devised like FCM, Gustafson-Kessel, Gath-Geva, kernel-based FCM, PCM etc. Each of these algorithms has its own advantages and drawbacks, so none of these algorithms would be able to perform superiorly in all datasets. In this paper we will experimentally compare FCM, GK, GG algorithm and a hybrid two-stage fuzzy clustering model combining the FCM and Gath-Geva algorithms. Firstly we will theoretically dis-cuss the advantages and drawbacks for each of these algorithms and we will describe the hybrid clustering model exploiting the advantages and diminishing the drawbacks of each algorithm. Secondly we will experimentally compare the accuracy of the hybrid model by applying it on several benchmark and synthetic datasets.

Keywords: fuzzy clustering, fuzzy c-means algorithm (FCM), Gustafson-Kessel algorithm, hybrid clustering model

Procedia PDF Downloads 481
1962 Using Closed Frequent Itemsets for Hierarchical Document Clustering

Authors: Cheng-Jhe Lee, Chiun-Chieh Hsu

Abstract:

Due to the rapid development of the Internet and the increased availability of digital documents, the excessive information on the Internet has led to information overflow problem. In order to solve these problems for effective information retrieval, document clustering in text mining becomes a popular research topic. Clustering is the unsupervised classification of data items into groups without the need of training data. Many conventional document clustering methods perform inefficiently for large document collections because they were originally designed for relational database. Therefore they are impractical in real-world document clustering and require special handling for high dimensionality and high volume. We propose the FIHC (Frequent Itemset-based Hierarchical Clustering) method, which is a hierarchical clustering method developed for document clustering, where the intuition of FIHC is that there exist some common words for each cluster. FIHC uses such words to cluster documents and builds hierarchical topic tree. In this paper, we combine FIHC algorithm with ontology to solve the semantic problem and mine the meaning behind the words in documents. Furthermore, we use the closed frequent itemsets instead of only use frequent itemsets, which increases efficiency and scalability. The experimental results show that our method is more accurate than those of well-known document clustering algorithms.

Keywords: FIHC, documents clustering, ontology, closed frequent itemset

Procedia PDF Downloads 370
1961 A Graph-Based Retrieval Model for Passage Search

Authors: Junjie Zhong, Kai Hong, Lei Wang

Abstract:

Passage Retrieval (PR) plays an important role in many Natural Language Processing (NLP) tasks. Traditional efficient retrieval models relying on exact term-matching, such as TF-IDF or BM25, have nowadays been exceeded by pre-trained language models which match by semantics. Though they gain effectiveness, deep language models often require large memory as well as time cost. To tackle the trade-off between efficiency and effectiveness in PR, this paper proposes Graph Passage Retriever (GraphPR), a graph-based model inspired by the development of graph learning techniques. Different from existing works, GraphPR is end-to-end and integrates both term-matching information and semantics. GraphPR constructs a passage-level graph from BM25 retrieval results and trains a GCN-like model on the graph with graph-based objectives. Passages were regarded as nodes in the constructed graph and were embedded in dense vectors. PR can then be implemented using embeddings and a fast vector-similarity search. Experiments on a variety of real-world retrieval datasets show that the proposed model outperforms related models in several evaluation metrics (e.g., mean reciprocal rank, accuracy, F1-scores) while maintaining a relatively low query latency and memory usage.

Keywords: efficiency, effectiveness, graph learning, language model, passage retrieval, term-matching model

Procedia PDF Downloads 97
1960 Improved K-Means Clustering Algorithm Using RHadoop with Combiner

Authors: Ji Eun Shin, Dong Hoon Lim

Abstract:

Data clustering is a common technique used in data analysis and is used in many applications, such as artificial intelligence, pattern recognition, economics, ecology, psychiatry and marketing. K-means clustering is a well-known clustering algorithm aiming to cluster a set of data points to a predefined number of clusters. In this paper, we implement K-means algorithm based on MapReduce framework with RHadoop to make the clustering method applicable to large scale data. RHadoop is a collection of R packages that allow users to manage and analyze data with Hadoop. The main idea is to introduce a combiner as a function of our map output to decrease the amount of data needed to be processed by reducers. The experimental results demonstrated that K-means algorithm using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also showed that our K-means algorithm using RHadoop with combiner was faster than regular algorithm without combiner as the size of data set increases.

Keywords: big data, combiner, K-means clustering, RHadoop

Procedia PDF Downloads 403
1959 The K-Distance Neighborhood Polynomial of a Graph

Authors: Soner Nandappa D., Ahmed Mohammed Naji

Abstract:

In a graph G = (V, E), the distance from a vertex v to a vertex u is the length of shortest v to u path. The eccentricity e(v) of v is the distance to a farthest vertex from v. The diameter diam(G) is the maximum eccentricity. The k-distance neighborhood of v, for 0 ≤ k ≤ e(v), is Nk(v) = {u ϵ V (G) : d(v, u) = k}. In this paper, we introduce a new distance degree based topological polynomial of a graph G is called a k- distance neighborhood polynomial, denoted Nk(G, x). It is a polynomial with the coefficient of the term k, for 0 ≤ k ≤ e(v), is the sum of the cardinalities of Nk(v) for every v ϵ V (G). Some properties of k- distance neighborhood polynomials are obtained. Exact formulas of the k- distance neighborhood polynomial for some well-known graphs, Cartesian product and join of graphs are presented.

Keywords: vertex degrees, distance in graphs, graph operation, Nk-polynomials

Procedia PDF Downloads 508
1958 A Summary-Based Text Classification Model for Graph Attention Networks

Authors: Shuo Liu

Abstract:

In Chinese text classification tasks, redundant words and phrases can interfere with the formation of extracted and analyzed text information, leading to a decrease in the accuracy of the classification model. To reduce irrelevant elements, extract and utilize text content information more efficiently and improve the accuracy of text classification models. In this paper, the text in the corpus is first extracted using the TextRank algorithm for abstraction, the words in the abstract are used as nodes to construct a text graph, and then the graph attention network (GAT) is used to complete the task of classifying the text. Testing on a Chinese dataset from the network, the classification accuracy was improved over the direct method of generating graph structures using text.

Keywords: Chinese natural language processing, text classification, abstract extraction, graph attention network

Procedia PDF Downloads 68
1957 A Graph Library Development Based on the Service-‎Oriented Architecture: Used for Representation of the ‎Biological ‎Systems in the Computer Algorithms

Authors: Mehrshad Khosraviani, Sepehr Najjarpour

Abstract:

Considering the usage of graph-based approaches in systems and synthetic biology, and the various types of ‎the graphs employed by them, a comprehensive graph library based ‎on the three-tier architecture (3TA) was previously introduced for full representation of the biological systems. Although proposing a 3TA-based graph library, three following reasons motivated us to redesign the graph ‎library based on the service-oriented architecture (SOA): (1) Maintaining the accuracy of the data related to an input graph (including its edges, its ‎vertices, its topology, etc.) without involving the end user:‎ Since, in the case of using 3TA, the library files are available to the end users, they may ‎be utilized incorrectly, and consequently, the invalid graph data will be provided to the ‎computer algorithms. However, considering the usage of the SOA, the operation of the ‎graph registration is specified as a service by encapsulation of the library files. In other words, overall control operations needed for registration of the valid data will be the ‎responsibility of the services. (2) Partitioning of the library product into some different parts: Considering 3TA, a whole library product was provided in general. While here, the product ‎can be divided into smaller ones, such as an AND/OR graph drawing service, and each ‎one can be provided individually. As a result, the end user will be able to select any ‎parts of the library product, instead of all features, to add it to a project. (3) Reduction of the complexities: While using 3TA, several other libraries must be needed to add for connecting to the ‎database, responsibility of the provision of the needed library resources in the SOA-‎based graph library is entrusted with the services by themselves. Therefore, the end user ‎who wants to use the graph library is not involved with its complexity. In the end, in order to ‎make ‎the library easier to control in the system, and to restrict the end user from accessing the files, ‎it was preferred to use the service-oriented ‎architecture ‎‎(SOA) over the three-tier architecture (3TA) and to redevelop the previously proposed graph library based on it‎.

Keywords: Bio-Design Automation, Biological System, Graph Library, Service-Oriented Architecture, Systems and Synthetic Biology

Procedia PDF Downloads 286
1956 Normalized Laplacian Eigenvalues of Graphs

Authors: Shaowei Sun

Abstract:

Let G be a graph with vertex set V(G)={v_1,v_2,...,v_n} and edge set E(G). For any vertex v belong to V(G), let d_v denote the degree of v. The normalized Laplacian matrix of the graph G is the matrix where the non-diagonal (i,j)-th entry is -1/(d_id_j) when vertex i is adjacent to vertex j and 0 when they are not adjacent, and the diagonal (i,i)-th entry is the di. In this paper, we discuss some bounds on the largest and the second smallest normalized Laplacian eigenvalue of trees and graphs. As following, we found some new bounds on the second smallest normalized Laplacian eigenvalue of tree T in terms of graph parameters. Moreover, we use Sage to give some conjectures on the second largest and the third smallest normalized eigenvalues of graph.

Keywords: graph, normalized Laplacian eigenvalues, normalized Laplacian matrix, tree

Procedia PDF Downloads 306