Selecting Negative Examples for Protein-Protein Interaction

Mohammad Shoyaib; M. Abdullah-Al-Wadud; Oksam Chae

Abstract:

Proteomics is one of the largest research areas in bioinformatics and medical science. An ambitious goal of proteomics is to elucidate the structure, interactions and functions of all proteins within cells and organisms. Predicting Protein-Protein Interaction (PPI) is one of the crucial problems in current research. Genomic data offer a great opportunity, and at the same time many challenges, for identifying these interactions. Many methods have already been proposed for this purpose. For in-silico identification, most methods require both positive and negative examples of protein interaction, and the quality of these examples is crucial for the final prediction accuracy. Positive examples are relatively easy to obtain from well-known databases, but generating negative examples is not a trivial task. Current PPI identification methods generate negative examples based on assumptions that are likely to affect their prediction accuracy. Hence, if more reliable negative examples are used, PPI prediction methods may achieve higher accuracy. Focusing on this issue, a graph-based negative example generation method is proposed, which is simple and more accurate than the existing approaches. An interaction graph of the protein sequences is created. The basic assumption is that the longer the shortest path between two protein sequences in the interaction graph, the lower the possibility of their interaction. A well-established PPI detection algorithm is trained with our negative examples, and in most cases its accuracy increases by more than 10% compared with the negative-pair selection method used in the paper that introduced that algorithm.
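
The shortest-path assumption stated in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the graph is built or which distance cutoff is used, so the sketch assumes an undirected graph whose edges are the known positive pairs, and the names `select_negative_pairs`, `min_distance` and `include_unreachable` are hypothetical.

```python
import math
from collections import deque
from itertools import combinations


def build_graph(positive_pairs):
    """Undirected interaction graph: proteins are nodes, known interactions are edges."""
    graph = {}
    for a, b in positive_pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph, source):
    """Shortest-path length (number of edges) from source to each reachable protein."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def select_negative_pairs(positive_pairs, min_distance=3, include_unreachable=False):
    """Return (protein_a, protein_b, distance) tuples, farthest pairs first.

    Assumption (not from the paper): a pair is a candidate negative when its
    shortest-path distance is at least min_distance. Pairs in different
    connected components get distance math.inf and are skipped unless
    include_unreachable is True.
    """
    graph = build_graph(positive_pairs)
    all_dist = {p: bfs_distances(graph, p) for p in graph}
    negatives = []
    for a, b in combinations(sorted(graph), 2):
        d = all_dist[a].get(b, math.inf)
        if d == math.inf and not include_unreachable:
            continue
        if d >= min_distance:
            negatives.append((a, b, d))
    # Longer shortest path means a less likely interaction, so sort descending.
    negatives.sort(key=lambda t: (-t[2], t[0], t[1]))
    return negatives


if __name__ == "__main__":
    # Toy chain A-B-C-D-E of known interactions (illustrative data only).
    chain = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
    for pair in select_negative_pairs(chain, min_distance=3):
        print(pair)
```

On the toy chain, the pairs at distance 3 or more are (A, E), (A, D) and (B, E). Running one breadth-first search per protein keeps the sketch simple; for a large interaction graph a sampling strategy or a graph library would likely be preferable.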

Keywords: Interaction graph, Negative training data, Protein-Protein interaction, Support vector machine.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1077453
