One-Class Support Vector Machines for Protein-Protein Interactions Prediction

Authors: Hany Alashwal, Safaai Deris, Razib M. Othman

Abstract:

Predicting protein-protein interactions is a key step in understanding protein function, because proteins usually work in the context of other proteins and rarely function alone. Machine learning techniques have been applied to predict protein-protein interactions, but most of them treat the task as a binary classification problem. Although it is easy to obtain a dataset of interacting proteins to serve as positive examples, there are no experimentally confirmed non-interacting protein pairs to serve as negative examples. Therefore, in this paper we treat the task as a one-class classification problem and solve it with one-class support vector machines (SVM). Using only positive examples (interacting protein pairs) in the training phase, the one-class SVM achieves an accuracy of about 80%. These results imply that protein-protein interactions can be predicted with a one-class classifier at an accuracy comparable to that of binary classifiers trained on artificially constructed negative examples.
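
The idea of training on positive examples only can be illustrated with a minimal sketch. It assumes scikit-learn's `OneClassSVM` with an RBF kernel and uses randomly generated vectors as hypothetical stand-ins for protein-pair features; the abstract does not specify the feature encoding, kernel, or parameter values, so every number below is an illustrative assumption and not the authors' setup.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical placeholder for feature vectors of known interacting
# protein pairs (the positive class). Real features are not given in
# the abstract; these are random values for illustration only.
X_pos = rng.normal(loc=0.0, scale=1.0, size=(200, 8))

# Fit the one-class model on positive examples only.
# nu is an upper bound on the fraction of training points treated as
# outliers; 0.1 is an arbitrary illustrative choice.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
clf.fit(X_pos)

# predict() returns +1 for points inside the learned region
# (predicted "interacting") and -1 for points outside it.
pred_train = clf.predict(X_pos)

# Points far from every training example fall outside the region.
X_far = np.full((5, 8), 10.0)
pred_far = clf.predict(X_far)

print("fraction of training pairs accepted:", (pred_train == 1).mean())
print("predictions for distant points:", pred_far)
```

No negative examples appear anywhere in training; the model only learns a boundary around the positive data, which is the property the one-class formulation relies on.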

Keywords: Bioinformatics, Protein-protein interactions, One-Class Support Vector Machines

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1081842

