Virulent-GO: Prediction of Virulent Proteins in Bacterial Pathogens Utilizing Gene Ontology Terms
Abstract: Predicting virulent protein sequences in bacteria can assist in the identification and characterization of novel virulence-associated factors and in the discovery of drug and vaccine targets among proteins indispensable to pathogenicity. Gene Ontology (GO) annotation, which describes the functions of genes and gene products using a controlled vocabulary of terms, has been shown to be effective for a variety of tasks such as gene expression studies, GO annotation prediction, and protein subcellular localization prediction. In this study, we propose Virulent-GO, a sequence-based method that mines informative GO terms as features for predicting bacterial virulent proteins. Each protein in the datasets used by the existing method VirulentPred is annotated by using BLAST to find homologs with known accession numbers, from which its GO terms are retrieved. After investigating various popular classifiers under the same five-fold cross-validation scheme, Virulent-GO, using GO term features alone, achieves an accuracy of 82.5%, slightly better than the 81.8% of VirulentPred, which uses five kinds of sequence-based features. On the independent test set, Virulent-GO also yields better results (82.0%) than VirulentPred (80.7%). When each kind of feature is evaluated on its own with an SVM, the GO term features perform much better than any of the five kinds of features used by VirulentPred.
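The feature representation and evaluation scheme described in the abstract (GO terms retrieved from BLAST homologs used as features, assessed by five-fold cross-validation) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: GO terms are assumed to be already retrieved for each protein, every term becomes one binary feature (no informative-term selection is performed), and a 1-nearest-neighbour classifier on Hamming distance stands in for the SVM and other classifiers the authors compared. The function names, term identifiers (`t1` to `t6`) and toy data are hypothetical.

```python
from typing import Callable, List, Sequence, Set

Vector = List[int]
# A classifier takes training vectors, training labels and one query vector.
Classifier = Callable[[List[Vector], List[int], Vector], int]

# Hypothetical toy data: (protein ID, GO terms inherited from BLAST homologs, label).
# Label 1 = virulent, 0 = non-virulent. Placeholders only, not data from the paper.
TOY_DATA = [
    ("p0", {"t1", "t2"}, 1), ("p1", {"t4", "t5"}, 0),
    ("p2", {"t1", "t2", "t3"}, 1), ("p3", {"t4", "t5", "t6"}, 0),
    ("p4", {"t1", "t3"}, 1), ("p5", {"t5", "t6"}, 0),
    ("p6", {"t1", "t2"}, 1), ("p7", {"t4", "t5"}, 0),
    ("p8", {"t2", "t3"}, 1), ("p9", {"t4", "t6"}, 0),
]


def build_vocabulary(term_sets: Sequence[Set[str]]) -> List[str]:
    """Collect every GO term seen in the annotations, in a fixed (sorted) order.

    Built from all proteins here; this uses no labels, so no label information leaks.
    """
    return sorted({term for terms in term_sets for term in terms})


def encode(terms: Set[str], vocabulary: Sequence[str]) -> Vector:
    """Binary feature vector: 1 if the protein carries the GO term, else 0."""
    return [1 if term in terms else 0 for term in vocabulary]


def nearest_neighbour(train_x: List[Vector], train_y: List[int], x: Vector) -> int:
    """Stand-in classifier (1-NN on Hamming distance), NOT the SVM used in the paper."""
    best = min(range(len(train_x)),
               key=lambda i: sum(a != b for a, b in zip(train_x[i], x)))
    return train_y[best]


def k_fold_accuracy(xs: List[Vector], ys: List[int], k: int = 5,
                    classify: Classifier = nearest_neighbour) -> float:
    """k-fold cross-validation: each sample is predicted once by a model trained without it."""
    folds = [list(range(i, len(xs), k)) for i in range(k)]  # round-robin fold assignment
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        correct += sum(classify(train_x, train_y, xs[i]) == ys[i] for i in fold)
    return correct / len(xs)


vocabulary = build_vocabulary([terms for _, terms, _ in TOY_DATA])
xs = [encode(terms, vocabulary) for _, terms, _ in TOY_DATA]
ys = [label for _, _, label in TOY_DATA]
print(vocabulary)                    # ['t1', 't2', 't3', 't4', 't5', 't6']
print(k_fold_accuracy(xs, ys, k=5))  # 1.0 on this deliberately separable toy set
```

Round-robin fold assignment keeps the sketch short; a stratified split, which keeps the virulent/non-virulent ratio equal across folds, is the more common choice in practice. Replacing `nearest_neighbour` with an SVM (for example one trained with LIBSVM) would bring the sketch closer to the setup the abstract describes.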
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1330599
 B.B. Finlay, and S. Falkow, Common themes in microbial pathogenicity revisited. Microbiology and Molecular Biology Reviews 61 (1997) 136-169.
 H.J. Wu, A.H.J. Wang, and M.P. Jennings, Discovery of virulence factors of pathogenic bacteria. Current Opinion in Chemical Biology 12 (2008) 93-101.
 R.A. Weiss, Virulence and pathogenesis. Trends in Microbiology 10 (2002) 314-317.
 L.H. Chen, J. Yang, J. Yu, Z.J. Yao, L.L. Sun, Y. Shen, and Q. Jin, VFDB: a reference database for bacterial virulence factors. Nucleic Acids Research 33 (2005) D325-D328.
 J. Yang, L.H. Chen, L.L. Sun, J. Yu, and Q. Jin, VFDB 2008 release: an enhanced web-based resource for comparative pathogenomics. Nucleic Acids Research 36 (2008) D539-D542.
 A. Bairoch, and R. Apweiler, The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research 28 (2000) 45-48.
 A. Garg, and D. Gupta, VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens. BMC Bioinformatics 9 (2008).
 G. Sachdeva, K. Kumar, P. Jain, and S. Ramachandran, SPAAN: a software program for prediction of adhesins and adhesin-like proteins using neural networks. Bioinformatics 21 (2005) 483-491.
 S.F. Altschul, T.L. Madden, A.A. Schaffer, J.H. Zhang, Z. Zhang, W. Miller, and D.J. Lipman, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25 (1997) 3389-3402.
 M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, G. Sherlock, and G.O. Consortium, Gene Ontology: tool for the unification of biology. Nature Genetics 25 (2000) 25-29.
 A. Lewin, and I.C. Grieve, Grouping Gene Ontology terms to improve the assessment of gene set enrichment in microarray data. BMC Bioinformatics 7 (2006).
 S. Carroll, and V. Pavlovic, Protein classification using probabilistic chain graphs and the Gene Ontology structure. Bioinformatics 22 (2006) 1871-1878.
 Z.D. Lei, and Y. Dai, Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction. BMC Bioinformatics 7 (2006).
 Z.L. Qian, Y.D. Cai, and Y.X. Li, A novel computational method to predict transcription factor DNA binding preference. Biochemical and Biophysical Research Communications 348 (2006) 1034-1037.
 W.L. Huang, C.W. Tung, S.W. Ho, S.F. Hwang, and S.Y. Ho, ProLoc-GO: Utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization. BMC Bioinformatics 9 (2008).
 D. Barrell, E. Dimmer, R.P. Huntley, D. Binns, C. O'Donovan, and R. Apweiler, The GOA database in 2009-an integrated Gene Ontology Annotation resource. Nucleic Acids Research 37 (2009) D396-D403.
 K. Chan, and W. Lam, Gene ontology classification of biomedical literatures using context association. Information Retrieval Technology, Proceedings 3689 (2005) 552-557.
 D.W. Park, H.S. Heo, H.C. Kwon, and H.Y. Chung, Protein function classification based on gene ontology. Information Retrieval Technology, Proceedings 3689 (2005) 691-696.
 S. Altschul, T. Madden, A. Schaffer, J.H. Zhang, Z. Zhang, W. Miller, and D. Lipman, Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. FASEB Journal 12 (1998) A1326-A1326.
 I.H. Witten, and E. Frank, Data Mining: Practical machine learning tools and techniques, Morgan Kaufmann, San Francisco, 2005.
 C.-C. Chang, and C.-J. Lin, LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2001.
 M. Stone, Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, Series B 36 (1974) 111-147.