Predicting Protein-Protein Interactions from Protein Sequences Using Phylogenetic Profiles

Authors: Omer Nebil Yaveroglu, Tolga Can


In this study, a high-accuracy method for predicting protein-protein interactions is developed. Its main strength is that it uses only the sequence information of proteins to predict interactions. The method derives the phylogenetic profile of each protein from its sequence. By combining the phylogenetic profiles of two proteins, based on the presence or absence of homologs in different species, and fitting this combined profile to a statistical model, it is possible to predict whether the two proteins interact. For this purpose, we apply a collection of pattern recognition techniques to the dataset of combined phylogenetic profiles of protein pairs. To find the classification method that best predicts the interaction status of protein pairs, we applied Support Vector Machines, Feature Extraction using ReliefF, Naive Bayes Classification, K-Nearest Neighbor Classification, Decision Trees, and Random Forest Classification. Random Forest Classification outperformed all other methods, with a prediction accuracy of 76.93%.
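To make the pipeline concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. The homolog cutoff (`E_VALUE_CUTOFF`), the per-species pair encoding in `combine_profiles` (count of the two proteins with a homolog in that species), the helper names, and all data are hypothetical, since the abstract does not say exactly how profiles are built or combined. For brevity the classifier is K-Nearest Neighbor, one of the methods listed, rather than the Random Forest that performed best.

```python
from collections import Counter
from math import dist

# Assumed homology cutoff: a species counts as "having a homolog" of a protein
# when the best hit there has an E-value at or below this value. This threshold
# is illustrative, not a parameter reported by the authors.
E_VALUE_CUTOFF = 1e-5


def phylogenetic_profile(best_hit_evalues, species, cutoff=E_VALUE_CUTOFF):
    """Return a binary profile: 1 if the protein has a homolog in a species, else 0.

    best_hit_evalues maps a species name to the best E-value found for the
    protein in that species; species with no hit at all are simply missing.
    """
    return [1 if best_hit_evalues.get(s, float("inf")) <= cutoff else 0
            for s in species]


def combine_profiles(p, q):
    """Hypothetical pair encoding: per species, how many of the two proteins
    have a homolog there (0, 1 or 2). Symmetric, so (A, B) == (B, A)."""
    return [a + b for a, b in zip(p, q)]


def knn_predict(train, query, k=3):
    """k-nearest-neighbour majority vote over (feature_vector, label) pairs,
    using Euclidean distance between combined profiles."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy data invented for illustration: four species, three proteins.
SPECIES = ["sp1", "sp2", "sp3", "sp4"]
HITS = {
    "A": {"sp1": 1e-30, "sp2": 1e-12},
    "B": {"sp1": 1e-25, "sp2": 1e-8, "sp4": 0.5},  # sp4 hit is too weak to count
    "C": {"sp3": 1e-40, "sp4": 1e-20},
}
profiles = {name: phylogenetic_profile(h, SPECIES) for name, h in HITS.items()}

# Labelled combined profiles (1 = interacting, 0 = non-interacting), also invented.
TRAIN = [
    ([2, 2, 0, 0], 1), ([2, 2, 1, 0], 1), ([2, 1, 0, 0], 1),
    ([1, 1, 1, 1], 0), ([0, 1, 1, 2], 0), ([1, 0, 2, 1], 0),
]

pair_ab = combine_profiles(profiles["A"], profiles["B"])
pair_ac = combine_profiles(profiles["A"], profiles["C"])
print(pair_ab, knn_predict(TRAIN, pair_ab))  # [2, 2, 0, 0] 1
print(pair_ac, knn_predict(TRAIN, pair_ac))  # [1, 1, 1, 1] 0
```

In a real setting the E-values would come from a homology search of each protein against every species' proteome, `TRAIN` would hold combined profiles of pairs with known interaction status, and the k-NN step could be replaced by any of the other classifiers named above.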

Keywords: Decision Trees, SVM, Protein Interaction Prediction, Phylogenetic Profile, ReliefF, Random Forest Classification

