Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 33087
Protein Graph Partitioning by Mutually Maximization of cycle-distributions
Authors: Frank Emmert Streib
Abstract:
The classification of the protein structure is commonly not performed for the whole protein but for structural domains, i.e., compact functional units preserved during evolution. Hence, a first step to a protein structure classification is the separation of the protein into its domains. We approach the problem of protein domain identification by proposing a novel graph theoretical algorithm. We represent the protein structure as an undirected, unweighted and unlabeled graph which nodes correspond the secondary structure elements of the protein. This graph is call the protein graph. The domains are then identified as partitions of the graph corresponding to vertices sets obtained by the maximization of an objective function, which mutually maximizes the cycle distributions found in the partitions of the graph. Our algorithm does not utilize any other kind of information besides the cycle-distribution to find the partitions. If a partition is found, the algorithm is iteratively applied to each of the resulting subgraphs. As stop criterion, we calculate numerically a significance level which indicates the stability of the predicted partition against a random rewiring of the protein graph. Hence, our algorithm terminates automatically its iterative application. We present results for one and two domain proteins and compare our results with the manually assigned domains by the SCOP database and differences are discussed.Keywords: Graph partitioning, unweighted graph, protein domains.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1077515
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1354References:
[1] H. Kopka and P. W. Daly, A Guide to LATEX, 3rd ed. Harlow, England: Addison-Wesley, 1999.
[2] A. Andreeva, D. Howorth, S.E. Brenner, T.J. Hubbard, C. Chothia and A.G. Murzin, SCOP database in 2004: refinements integrate structure and sequence fmily data, Nucleic Acids Res., 32:D226-229, 2004.
[3] F.C. Bernstein, T.F. Koetzle, G.J.B. Williams, E.F. Meyer Jr, M.D. Brice, J.R. Rodgers, O. Kennard, T. Shimanouchi, M. Tasumi, The Protein Data Bank: A computer-based archival file for macromolecular structures. J. Mol. Biol., 112:535-542, 1977.
[4] R.F. Doolittle, The multiplicity of domains in proteins, Annu. Rev. Biochem., 64:287-314, 1995.
[5] J-t. Guo, D. Xu, D. Kim and Y.Xu, Improving the performance of DomainParser for structural domain partition using neural network, Nucl. Acids Res., 31(3)944-952, 2003.
[6] L. Holm and C. Sander, Dictionary of recurrent domains in protein structures, Proteins: Structure, Function and Genetics, 33:88-96, 1998.
[7] P.J. Kraulis, Molscript: A program to produce both detailed and schematic plots of protein sturctures, J. of Appl. Chrystallograph, 24:946-950, 1991.
[8] N.J. Mulder et al., InterPro, progress and status in 2005, Nucleic Acids Res., 33:D201-205, 2005.
[9] A.G. Murzin, S.E. Brenner, T. Hubbard and C. Chothia, SCOP: a structural classification of proteins database for the investigation of sequences and structures, J. Mol. Biol., 247:536-540, 1995.
[10] D.C. Phillips, The dree-dimensional structure of an enzyme molecule, Sci. Am., 215:78-90, 1966.
[11] A.S. Siddiqui and G.J. Barton, Contious and discontinuous domains: An algorithm for the automatic generation of reliable protein domain definietions, Protein Science, 4:872-884, 1995.
[12] G.D. Rose, Hierachic organization of domains in globular proteins, J. Mol. Biol., 134(3):447-470, 1979.
[13] M.G. Rossmann and A. Liljas, Recognition of structural domains in globular proteins, J. Mol. Biol., 85:177-181, 1974
[14] W.R. Taylor, Protein structural domain identification, Protein Eng., 12(3):203-216, 1999.
[15] S. Veretnik, P.E. Bourne, N.N. Alexandrov and I.N Shindyalov, Toward consistent assignment of structural domains in proteins, J. Mol. Biol., 339:647-678, 2004.
[16] D. Vitkup, E. Melamud, J. Moult and C. Sander, Completness in structural genomics, Nat. Struct. Biol., 8(6):559-566, 2001.
[17] D. Wetlaufer, Nucleation, rapid folding, and globular intrachain regions in proteins, Proc. Natl. Acad. Sci., 70:697-701, 1973
[18] Y. Xu, D. Xu and H.N. Gabow, Protein domain decomposition using a graph-theoretic apprach, Bioinformatics, 16(12):1091-1104, 2000.