An Algebra for Protein Structure Data

Authors: Yanchao Wang, Rajshekhar Sunderraman


This paper presents an algebraic approach to optimizing queries in a domain-specific database management system for protein structure data. The approach introduces several protein-structure-specific algebraic operators for querying the complex data stored in an object-oriented database system. The Protein Algebra provides an extensible set of high-level Genomic Data Types and Protein Data Types, along with a comprehensive collection of genomic and protein functions. The paper also presents a query translator that converts high-level query specifications in the algebra into low-level query specifications in Protein-QL, a query language designed for protein structure data. The translation process uses a Protein Ontology as a dictionary.
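The translation step described in the abstract can be pictured with a small sketch. It is purely illustrative: the operator classes (`Source`, `Select`, `Project`), the `ONTOLOGY` entries, and the SQL-like output syntax are all assumptions made for this example, and none of them are the paper's actual Protein Algebra operators or Protein-QL grammar. The sketch only shows the general idea of walking a high-level algebra expression and using an ontology as a dictionary to resolve each term to a low-level name.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical ontology used as a dictionary: high-level algebra terms are
# mapped to low-level schema names. These entries are invented for illustration.
ONTOLOGY = {
    "Protein": "protein_entry",
    "name": "entry_name",
    "chain_count": "num_chains",
}


@dataclass(frozen=True)
class Source:
    """A high-level concept to query, e.g. "Protein"."""
    concept: str


@dataclass(frozen=True)
class Select:
    """Keep only the objects whose attribute satisfies a comparison."""
    child: "Expr"
    attribute: str
    op: str
    value: Union[int, str]


@dataclass(frozen=True)
class Project:
    """Keep only the listed attributes."""
    child: "Expr"
    attributes: tuple


Expr = Union[Source, Select, Project]


def lookup(term):
    """Resolve a high-level term through the ontology dictionary."""
    try:
        return ONTOLOGY[term]
    except KeyError:
        raise KeyError(f"term not in ontology: {term!r}") from None


def _collect(expr):
    """Return (columns or None, source name, list of conditions) for a tree."""
    if isinstance(expr, Source):
        return None, lookup(expr.concept), []
    if isinstance(expr, Select):
        cols, src, conds = _collect(expr.child)
        conds.append(f"{lookup(expr.attribute)} {expr.op} {expr.value!r}")
        return cols, src, conds
    if isinstance(expr, Project):
        # The outermost projection decides which columns are returned.
        _, src, conds = _collect(expr.child)
        return [lookup(a) for a in expr.attributes], src, conds
    raise TypeError(f"unsupported algebra node: {expr!r}")


def translate(expr):
    """Translate an algebra tree into a SQL-like string (not real Protein-QL)."""
    cols, src, conds = _collect(expr)
    query = f"SELECT {', '.join(cols) if cols else '*'} FROM {src}"
    if conds:
        query += " WHERE " + " AND ".join(conds)
    return query


# High-level request: names of proteins with more than two chains.
algebra_query = Project(
    Select(Source("Protein"), "chain_count", ">", 2),
    ("name",),
)
print(translate(algebra_query))
```

Keeping the ontology as a plain lookup table separates the vocabulary of the algebra from the storage schema, so in this sketch a renamed low-level field would only require a change to the dictionary and not to the translator.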

Keywords: Domain-Specific Data Management, Protein Algebra, Protein Ontology, Protein Structure Data.


