A Survey of Semantic Integration Approaches in Bioinformatics

Authors: Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir


Advances in computer science and data analysis continuously generate huge volumes of biological data, which are available on the web. Exploiting these data requires powerful integration techniques that can extract the knowledge and information pertinent to a specific question. Biomedical exploration of such big data often requires complex queries across multiple autonomous, heterogeneous, and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information integration, and ontologies. We survey some approaches and techniques for integrating biological data, focusing on those developed in the ontology community.

Keywords: Semantic data integration, biological ontology, linked data, semantic web, OWL, RDF.


