Detecting Remote Protein Evolutionary Relationships via String Scoring Method
Authors: Nazar Zaki, Safaai Deris
Abstract:
The amount of information being produced in biology has grown manyfold, and managing it now requires extensive use of computational techniques. Among this information, protein sequence similarity is key to detecting evolutionary relationships between proteins: sequence similarity typically implies homology, which in turn may imply structural and functional similarity. In this work, we propose a learning method for detecting remote protein homology. The method uses a transformation that converts each protein sequence into a fixed-dimensional, representative feature vector. Each feature records the sensitivity of the protein sequence to one of a set of amino acid substrings generated from the protein sequences of interest. These features are then used with support vector machines to detect remote protein homology. The method is tested and evaluated on two different benchmark protein datasets, and it delivers improvements over most existing homology detection methods.
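The abstract does not specify how the substring set is generated or how a sequence's "sensitivity" to a substring is scored, so the sketch below fills both in with labeled assumptions. It is a minimal illustration of the general idea (one feature per substring, so every protein maps to the same dimension), not the authors' method. The hypothetical choices are: the substring set is every distinct length-k window (k-mer) taken from the sequences of interest, and the sensitivity score is the best fraction of identical residues over all ungapped placements of the substring along the protein. The function names `generate_substrings`, `sensitivity`, and `feature_vector` are illustrative only.

```python
from typing import Dict, List, Sequence


def generate_substrings(sequences: Sequence[str], k: int) -> List[str]:
    """Collect every distinct length-k substring (k-mer) from the sequences.

    Hypothetical choice: the abstract does not say how the substring set is built.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    seen: Dict[str, None] = {}  # dicts keep insertion order (Python 3.7+)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            seen.setdefault(seq[i:i + k], None)  # drop duplicates, keep first-seen order
    return list(seen)


def sensitivity(seq: str, sub: str) -> float:
    """Hypothetical score: best fraction of identical residues over all
    ungapped placements of `sub` along `seq` (0.0 if `seq` is shorter)."""
    k = len(sub)
    best = 0
    for i in range(len(seq) - k + 1):
        matches = sum(a == b for a, b in zip(seq[i:i + k], sub))
        best = max(best, matches)
    return best / k


def feature_vector(seq: str, substrings: Sequence[str]) -> List[float]:
    """Map a protein of any length to a vector with one entry per substring."""
    return [sensitivity(seq, sub) for sub in substrings]


if __name__ == "__main__":
    # Toy "sequences of interest" (made up for illustration, not benchmark data).
    subs = generate_substrings(["MKVLAT", "MKVIAT"], 3)
    print(subs)
    print(feature_vector("MKVLAT", subs))
```

With the toy input, the query "MKVLAT" scores 1.0 on its own four 3-mers and 2/3 on each of the three 3-mers contributed by "MKVIAT". In a full pipeline, vectors like these for positive and negative training proteins would be passed to an SVM implementation (for example scikit-learn's `sklearn.svm.SVC`); the kernel and parameters the paper used are not stated in the abstract.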
Keywords: Protein homology detection; support vector machine; string kernel.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1056014