GeNS: a Biological Data Integration Platform

Joel Arrais; João E. Pereira; João Fernandes; José Luís Oliveira

Abstract:

The scientific achievements of molecular biology depend greatly on the capability of computational applications to analyze laboratory results. A comprehensive analysis of an experiment typically requires studying the obtained dataset together with data available in several distinct public databases. Nevertheless, providing centralized access to these distributed databases raises a set of challenges: what is the best integration strategy, how to resolve nomenclature clashes, how to handle data that overlaps across databases, and how to deal with huge datasets. In this paper we present GeNS, a system that uses a simple yet innovative approach to address several biological data integration issues. Compared with existing systems, the main advantages of GeNS are its simple maintenance and its coverage and scalability in terms of the number of supported databases and data types. To support our claims we present the current use of GeNS in two concrete applications. GeNS currently contains more than 140 million biological relations, and it can be publicly downloaded or remotely accessed through SOAP web services.

Keywords: Data integration, biological databases

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1332218

