Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 32297
Using Spectral Vectors and M-Tree for Graph Clustering and Searching in Graph Databases of Protein Structures

Authors: Do Phuc, Nguyen Thi Kim Phung


In this paper, we represent protein structure by using graph. A protein structure database will become a graph database. Each graph is represented by a spectral vector. We use Jacobi rotation algorithm to calculate the eigenvalues of the normalized Laplacian representation of adjacency matrix of graph. To measure the similarity between two graphs, we calculate the Euclidean distance between two graph spectral vectors. To cluster the graphs, we use M-tree with the Euclidean distance to cluster spectral vectors. Besides, M-tree can be used for graph searching in graph database. Our proposal method was tested with graph database of 100 graphs representing 100 protein structures downloaded from Protein Data Bank (PDB) and we compare the result with the SCOP hierarchical structure.

Keywords: Eigenvalues, m-tree, graph database, protein structure, spectra graph theory.

Digital Object Identifier (DOI):

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1502


[1] Brew C, Schulte im Walde (2002). Spectral Clustering for German Verbs, In Proc of the Conf in Natural Language Processing, PA, USA, pp 117-124
[2] Chris Godsil (2006). Graph Spectra and Graph Isomorphism. Aveiro Workshop on Graph Spectra, University of Aveiro, Mathematics Department, April 2006.
[3] David McWherter, William C. Regli (2001). An Approach to Indexing Database of Solid Models. Technical Report DU-MCS-01-02.
[4] D. Shasha, J. T. L. Wang, and R. Giugno (2002). Algorithmics and Applications of Tree and Graph Searching. In Proc. PODS'02 Proceeding of the International Conference in Pattern recognition (ICPR), Quebec, Canada, August 2002.
[5] Do Phuc, Hoang Kiem (2006) . Using M-tree for similarity search in biological sequence databases, Journal of Bio-Technology, VAST, ISBN 1811-4899, pp151-158
[6] Huan, J., Wang, W., Washington, A., Prins, J., Shah, R., Tropsha, A. (2004). Accurate classification of proteins tructural families based on coherent subgraph analysis. In Proc. Pacific Symposium on Biocomputing 9:411-422.
[7] Huahai He, Ambuj K. Singh (2006). Closure-Tree: An Index Structure for Graph Queries, ICDE.
[8] Jun Huan, Deepak Bandyopadhyay, Wei Wang, Jack Snoeyink, Jan Prins, Alexander Tropsha (2005). Comparing Graph Representations of Protein Structure for Mining Family-Specific Residue-Based Packing Motifs. Journal of Computational Biology. July 2005, 12(6): 657-671.
[9] James Cheng, Yiping Ke, Wilfred Ng, An Lu (2007). FG-Index: Towards Verification-Free Query Processing on Graph Databases, SIGMOD, 2007.
[10] Mitchell Peabody (2002). Finding Groups of Graphs in Databases. Master thesis in Computer Science, Drexel University, August, 2002.
[11] Murzin, A.G, Brenner, S., Hubbard, T. & Chothia., C. (1995). SCOP: a structural classification of proteins database for the investigation of sequences and structures. Journal of MolecularBiology, 247, 536-40.
[12] Norman Biggs. Algebraic Graph Theory. Cambridge University Press, 2nd edition, 1993.
[13] Paolo C., Marco P., Pavel Z (1997): M-tree: An efficient Access Method for Similarity Search In Metric Spaces, VLDB 1997, pp:426-435.
[14] Patella M (1998). Similarity search in multimedia database, PhD dissertation, University of Bolgona, Italia.
[15] Peixiang Zhao, Jeffrey Xu Yu, Philip S. Yu (2007). Graph Indexing: Tree + Delta >= Tree, VLDB, 2007
[16] Steffen Lang (2007). Protein domain decomposition using spectral graph partitioning.
[17] Saraswathi Vishveshwara et al (2002). Protein structure insights from graph theory, Journal of Theoretical and Computational Chemistry, Vol 1, No 1.
[18] X. Yan, J. Han (2002). gSpan: Graph-based substructure pattern mining, ICDM, 2002
[19] Xiafeng Yan, Philip S. Yu, Jiawei Han (2004). Graph indexing: A Frequent Structure-based Approach, SIGMOD 2004
[20] Yun Chi, Yirong Yang, Richard Muntz, Mining Frequent Rooted Trees and Free Trees Using Canonical Forms, UCLA Technical Report, 2003