Dimensional Modeling of HIV Data Using Open Source

Authors: Charles D. Otine, Samuel B. Kucel, Lena Trojer

Abstract:

The choice of data modeling technique for an information system is determined by the purpose of the resulting data model. Dimensional modeling is the preferred technique for data destined for data warehouses and data mining: in contrast with entity-relationship modeling, it produces data models that are easier to analyze and query. The establishment of data warehouses as components of the information system landscape in many organizations has driven the development of dimensional modeling. That development has been far more extensive, and more widely reported, for commercial database management systems than for open source ones, which makes dimensional modeling less affordable in resource-constrained settings. This paper presents dimensional modeling of HIV patient information using open source modeling tools. It is motivated by the fact that the regions most affected by HIV (sub-Saharan Africa) are heavily resource constrained yet hold large quantities of HIV data. Two HIV data source systems were studied to identify appropriate dimensions and facts, which were then modeled using two open source dimensional modeling tools. Using open source tools would reduce the software cost of dimensional modeling and in turn make data warehousing and data mining more feasible for those in resource-constrained settings who have data available.
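To make the notions of facts and dimensions concrete, the following minimal sketch builds a tiny star schema in SQLite (an open source database engine) and runs one analytical query against it. It is an illustration only and not the model developed in the paper: the table names (`dim_patient`, `dim_date`, `fact_visit`), the columns, and all of the rows are hypothetical and made up for the example.

```python
import sqlite3

# Hypothetical star schema. Every table and column name below is an
# illustrative assumption, not taken from the paper's source systems.
SCHEMA = """
CREATE TABLE dim_patient (
    patient_key INTEGER PRIMARY KEY,
    sex         TEXT,
    age_group   TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    year     INTEGER,
    month    INTEGER
);
-- The fact table holds the measure (cd4_count) plus one foreign key
-- per dimension.
CREATE TABLE fact_visit (
    patient_key INTEGER REFERENCES dim_patient(patient_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    cd4_count   INTEGER
);
"""


def build_demo_warehouse():
    """Create an in-memory database and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dim_patient VALUES (?, ?, ?)",
        [(1, "F", "25-34"), (2, "M", "35-44")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20080101, 2008, 1), (20080201, 2008, 2)],
    )
    conn.executemany(
        "INSERT INTO fact_visit VALUES (?, ?, ?)",
        [(1, 20080101, 350), (1, 20080201, 410), (2, 20080101, 200)],
    )
    conn.commit()
    return conn


def mean_cd4_by_sex(conn):
    """Aggregate the fact measure along one attribute of a dimension."""
    rows = conn.execute(
        """
        SELECT p.sex, AVG(f.cd4_count)
        FROM fact_visit AS f
        JOIN dim_patient AS p ON p.patient_key = f.patient_key
        GROUP BY p.sex
        ORDER BY p.sex
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(mean_cd4_by_sex(build_demo_warehouse()))
```

The query joins the fact table to a single dimension and groups by one of its attributes, which is the typical shape of an analytical query that a dimensional model is meant to simplify.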

Keywords: Database, Data Mining, Data Warehouse, Dimensional Modeling, Open Source.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1071932

