Protein Graph Partitioning by Mutually Maximization of cycle-distributions

Frank Emmert Streib

Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 33122

Protein Graph Partitioning by Mutually Maximization of cycle-distributions

Authors: Frank Emmert Streib

Abstract:

The classification of the protein structure is commonly not performed for the whole protein but for structural domains, i.e., compact functional units preserved during evolution. Hence, a first step to a protein structure classification is the separation of the protein into its domains. We approach the problem of protein domain identification by proposing a novel graph theoretical algorithm. We represent the protein structure as an undirected, unweighted and unlabeled graph which nodes correspond the secondary structure elements of the protein. This graph is call the protein graph. The domains are then identified as partitions of the graph corresponding to vertices sets obtained by the maximization of an objective function, which mutually maximizes the cycle distributions found in the partitions of the graph. Our algorithm does not utilize any other kind of information besides the cycle-distribution to find the partitions. If a partition is found, the algorithm is iteratively applied to each of the resulting subgraphs. As stop criterion, we calculate numerically a significance level which indicates the stability of the predicted partition against a random rewiring of the protein graph. Hence, our algorithm terminates automatically its iterative application. We present results for one and two domain proteins and compare our results with the manually assigned domains by the SCOP database and differences are discussed.

Keywords: Graph partitioning, unweighted graph, protein domains.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1077515

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1362

References:

[1] H. Kopka and P. W. Daly, A Guide to LATEX, 3rd ed. Harlow, England: Addison-Wesley, 1999.
[2] A. Andreeva, D. Howorth, S.E. Brenner, T.J. Hubbard, C. Chothia and A.G. Murzin, SCOP database in 2004: refinements integrate structure and sequence fmily data, Nucleic Acids Res., 32:D226-229, 2004.
[3] F.C. Bernstein, T.F. Koetzle, G.J.B. Williams, E.F. Meyer Jr, M.D. Brice, J.R. Rodgers, O. Kennard, T. Shimanouchi, M. Tasumi, The Protein Data Bank: A computer-based archival file for macromolecular structures. J. Mol. Biol., 112:535-542, 1977.
[4] R.F. Doolittle, The multiplicity of domains in proteins, Annu. Rev. Biochem., 64:287-314, 1995.
[5] J-t. Guo, D. Xu, D. Kim and Y.Xu, Improving the performance of DomainParser for structural domain partition using neural network, Nucl. Acids Res., 31(3)944-952, 2003.
[6] L. Holm and C. Sander, Dictionary of recurrent domains in protein structures, Proteins: Structure, Function and Genetics, 33:88-96, 1998.
[7] P.J. Kraulis, Molscript: A program to produce both detailed and schematic plots of protein sturctures, J. of Appl. Chrystallograph, 24:946-950, 1991.
[8] N.J. Mulder et al., InterPro, progress and status in 2005, Nucleic Acids Res., 33:D201-205, 2005.
[9] A.G. Murzin, S.E. Brenner, T. Hubbard and C. Chothia, SCOP: a structural classification of proteins database for the investigation of sequences and structures, J. Mol. Biol., 247:536-540, 1995.
[10] D.C. Phillips, The dree-dimensional structure of an enzyme molecule, Sci. Am., 215:78-90, 1966.
[11] A.S. Siddiqui and G.J. Barton, Contious and discontinuous domains: An algorithm for the automatic generation of reliable protein domain definietions, Protein Science, 4:872-884, 1995.
[12] G.D. Rose, Hierachic organization of domains in globular proteins, J. Mol. Biol., 134(3):447-470, 1979.
[13] M.G. Rossmann and A. Liljas, Recognition of structural domains in globular proteins, J. Mol. Biol., 85:177-181, 1974
[14] W.R. Taylor, Protein structural domain identification, Protein Eng., 12(3):203-216, 1999.
[15] S. Veretnik, P.E. Bourne, N.N. Alexandrov and I.N Shindyalov, Toward consistent assignment of structural domains in proteins, J. Mol. Biol., 339:647-678, 2004.
[16] D. Vitkup, E. Melamud, J. Moult and C. Sander, Completness in structural genomics, Nat. Struct. Biol., 8(6):559-566, 2001.
[17] D. Wetlaufer, Nucleation, rapid folding, and globular intrachain regions in proteins, Proc. Natl. Acad. Sci., 70:697-701, 1973
[18] Y. Xu, D. Xu and H.N. Gabow, Protein domain decomposition using a graph-theoretic apprach, Bioinformatics, 16(12):1091-1104, 2000.