Commenced in January 2007
Frequency: Monthly
Edition: International
Automatic Real-Patient Medical Data De-Identification for Research Purposes

Authors: Petr Vcelak, Jana Kleckova


Our medicine-oriented research is based on a medical data set of real patients. Sharing patients' private data with anyone other than clinicians or hospital staff is a security problem, so personally identifying information has to be removed from the medical data. After a de-identification process, the medical data, free of private data, are available for any research purpose. In this paper, we introduce a universal, automatic, rule-based de-identification application that performs this process on heterogeneous medical data. A patient's private identification is replaced by a unique identification number, even in annotations burned into the pixel data. The same identification number is used for all of a patient's medical data, so relationships within the data are preserved. The hospital can benefit from research feedback based on the results.
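
The central idea of the abstract, replacing a patient's identification with one unique number that is reused across all of that patient's data, can be illustrated with a minimal sketch. This is not the authors' application: the class `Pseudonymizer`, the function `deidentify`, the `RID` number format, and the field names in `ID_FIELDS` and `DROP_FIELDS` are all hypothetical, chosen only for illustration. Handling of DICOM, HL7, DASTA, and burned-in pixel annotations is out of scope here.

```python
from typing import Dict


class Pseudonymizer:
    """Maps each real patient identifier to one stable research ID (hypothetical helper)."""

    def __init__(self, start: int = 1) -> None:
        self._next = start
        self._mapping: Dict[str, str] = {}

    def research_id(self, patient_id: str) -> str:
        # The same input always yields the same research ID,
        # so links between a patient's records are preserved.
        key = patient_id.strip()
        if key not in self._mapping:
            self._mapping[key] = f"RID{self._next:06d}"
            self._next += 1
        return self._mapping[key]


# Hypothetical rules: which fields carry the identifier, which are dropped outright.
ID_FIELDS = {"patient_id"}
DROP_FIELDS = {"name", "address"}


def deidentify(record: dict, pseudo: Pseudonymizer) -> dict:
    """Return a copy of the record with private fields removed or replaced."""
    original_id = record["patient_id"]
    rid = pseudo.research_id(original_id)
    out = {}
    for field, value in record.items():
        if field in DROP_FIELDS:
            continue  # rule: remove the field entirely
        if field in ID_FIELDS:
            out[field] = rid  # rule: replace the identifier field
        elif isinstance(value, str):
            # rule: also replace the identifier where it appears in free text
            out[field] = value.replace(original_id, rid)
        else:
            out[field] = value
    return out


pseudo = Pseudonymizer()
report = deidentify(
    {"patient_id": "7501011234", "name": "Jan Novak",
     "report": "Patient 7501011234 admitted."},
    pseudo,
)
image = deidentify({"patient_id": "7501011234", "modality": "CT"}, pseudo)
```

In this sketch, `report` and `image` receive the same research ID because they come from the same patient, which is the property that keeps relationships between heterogeneous records intact. A real system would also need to store the mapping securely and keep it inside the hospital.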

Keywords: DASTA, De-identification, DICOM, Health Level Seven, Medical data, OCR, Personal data


