Introducing Sequence-Order Constraint into Prediction of Protein Binding Sites with Automatically Extracted Templates

Authors: Yi-Zhong Weng, Chien-Kang Huang, Yu-Feng Huang, Chi-Yuan Yu, Darby Tien-Hao Chang


Searching for a tertiary substructure that geometrically matches the 3D pattern of the binding site of a well-studied protein is one way to predict protein function. In our previous work, we built a web server that predicts protein-ligand binding sites from automatically extracted templates. A drawback of such templates, however, is that the web server was prone to producing many false positive matches. In this study, we present a sequence-order constraint that reduces the false positive matches produced when automatically extracted templates are used to predict protein-ligand binding sites. The binding site predictor comprises i) an automatically constructed template library and ii) a local structure alignment algorithm for querying the library. The sequence-order constraint is used to identify inconsistencies between the local regions of the query protein and those of the templates. Experimental results show that the sequence-order constraint substantially reduces false positive matches and is effective for template-based binding site prediction.
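
To make the idea of a sequence-order constraint concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a template match is given as a list of (template residue position, query residue position) pairs, and it treats a match as order-consistent when most pairs keep the same relative order along both chains. The function names, the pair representation, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
from bisect import bisect_left


def longest_order_preserving(matches):
    """Size of the largest subset of (template_pos, query_pos) pairs whose
    query positions strictly increase as template positions increase.

    Hypothetical helper: the pair representation is an assumption.
    """
    # Sort by template position. Ties are broken by descending query
    # position so that two pairs sharing a template position can never
    # both be counted in one strictly increasing run.
    ordered = sorted(matches, key=lambda m: (m[0], -m[1]))
    tails = []  # tails[k] = smallest last query position of a run of length k + 1
    for _, query_pos in ordered:
        i = bisect_left(tails, query_pos)
        if i == len(tails):
            tails.append(query_pos)
        else:
            tails[i] = query_pos
    return len(tails)


def passes_sequence_order(matches, min_fraction=0.8):
    """Accept a match only if enough pairs preserve sequence order.

    min_fraction is an illustrative threshold, not a value from the paper.
    """
    if not matches:
        return False
    return longest_order_preserving(matches) / len(matches) >= min_fraction


# Residues appear in the same order in template and query: accepted.
same_order = [(1, 10), (2, 12), (3, 15), (4, 20)]
# Residues appear in reversed order in the query: rejected.
reversed_order = [(1, 20), (2, 15), (3, 12), (4, 10)]
print(passes_sequence_order(same_order), passes_sequence_order(reversed_order))
```

Under these assumptions, a match that is geometrically plausible but scrambles the order of residues along the chain is filtered out, which is the kind of inconsistency between local regions that the abstract describes.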

Keywords: Protein structure, binding site, functional prediction



