UTMGO: A Tool for Searching a Group of Semantically Related Gene Ontology Terms and Application to Annotation of Anonymous Protein Sequence

Authors: Razib M. Othman, Safaai Deris, Rosli M. Illias

Abstract:

Gene Ontology (GO) terms have been widely used to annotate protein sets; SWISS-PROT, TrEMBL, and InterPro, for example, are protein databases annotated with GO terms. However, applying GO terms directly to the annotation of anonymous protein sequences is not straightforward, especially for species that are poorly represented in biological databases. UTMGO was developed as a tool that lets the user quickly and easily search for a group of semantically related GO terms. Its applicability is demonstrated by applying it to the annotation of anonymous protein sequences: the extended UTMGO uses GO terms together with the protein sequences associated with those terms to perform the annotation. The performance of the extended UTMGO is compared with that of GOPET, GOtcha, GoFigure, and JAFA.
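The abstract describes two steps: finding a group of semantically related GO terms, then annotating an anonymous sequence by comparing it with the sequences already associated with those terms. It does not specify how UTMGO measures relatedness or scores alignments, so the Python below is only a minimal sketch of that general idea under stated assumptions, not the authors' method. Assumed here: "related" means within a fixed number of is-a edges in the ontology graph (edge direction ignored); alignment uses plain Smith-Waterman local alignment with match +2, mismatch -1, and a linear gap penalty of -2 rather than a protein substitution matrix; the term IDs (`T0` to `T3`), sequences, and the functions `related_terms` and `annotate` are hypothetical placeholders.

```python
from collections import deque

# Hypothetical toy ontology: child term -> list of parent terms (is-a edges).
EXAMPLE_PARENTS = {"T1": ["T0"], "T2": ["T0"], "T3": ["T1"]}

# Hypothetical toy data: term -> protein sequences already annotated with it.
EXAMPLE_SEQUENCES = {
    "T0": ["MKTAYIAK"],
    "T1": ["MKTAYIAKQR"],
    "T2": ["GGGGSSSS"],
    "T3": ["MKTA"],
}


def related_terms(term, parents, radius=1):
    """Return `term` plus every term within `radius` edges of it, ignoring edge direction."""
    neighbours = {}
    for child, ps in parents.items():
        for p in ps:
            neighbours.setdefault(child, set()).add(p)
            neighbours.setdefault(p, set()).add(child)
    depth = {term: 0}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if depth[current] == radius:
            continue  # do not expand past the search radius
        for n in neighbours.get(current, ()):
            if n not in depth:
                depth[n] = depth[current] + 1
                queue.append(n)
    return set(depth)


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: cell scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def annotate(query, term_sequences, candidate_terms, top_n=3):
    """Rank candidate terms by the best alignment score of `query` against their sequences."""
    scored = []
    for t in candidate_terms:
        seqs = term_sequences.get(t, [])
        if seqs:
            scored.append((t, max(smith_waterman(query, s) for s in seqs)))
    # Highest score first; ties broken by term ID for a stable result.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    group = related_terms("T1", EXAMPLE_PARENTS, radius=1)
    print(sorted(group))
    print(annotate("MKTAYIAKQR", EXAMPLE_SEQUENCES, group))
```

With `radius=1`, the group around `T1` is `['T0', 'T1', 'T3']`, so the unrelated sequence under `T2` is never aligned, and the query is ranked `[('T1', 20), ('T0', 16), ('T3', 8)]`. Restricting alignment to a related group keeps the number of comparisons small; a practical tool would more likely work on the real GO graph, use a substitution matrix such as BLOSUM62, and apply a score cutoff before transferring a term.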

Keywords: Anonymous protein sequence, Gene Ontology, Protein sequence annotation, Protein sequence alignment

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1332528
