Q-Map: Clinical Concept Mining from Clinical Documents

Authors: Sheikh Shams Azam, Manoj Raju, Venkatesh Pagidimarri, Vamsi Kasivajjala


Over the past decade, there has been a steep rise in data-driven analysis in major areas of medicine, such as clinical decision support systems, survival analysis, patient similarity analysis, and image analytics. Most of the data in the field is well structured and available in numerical or categorical formats that can be used for experiments directly. At the opposite end of the spectrum, however, lies a wide expanse of data that is intractable for direct analysis owing to its unstructured nature: discharge summaries, clinical notes, and procedural notes, which are written as human narrative and have neither a relational model nor a standard grammatical structure. An important step in using these texts for such studies is to transform and process them so that structured information can be retrieved from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map, a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique based on a string matching algorithm that is indexed on curated knowledge sources and is both fast and configurable. The authors also briefly compare its performance with MetaMap, one of the most reputed tools for medical concept retrieval, and present the advantages the former displays over the latter.

Keywords: Information retrieval (IR), unified medical language system (UMLS), syntax-based analysis, natural language processing (NLP), medical informatics.
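
The abstract describes concept mining as string matching against an index built from curated knowledge sources. One classical technique that fits this description is Aho-Corasick multi-pattern matching, sketched below in Python. Whether Q-Map uses exactly this algorithm is an assumption; the function names, the whole-word boundary rule, and the concept identifiers (`EX001` and so on) are hypothetical placeholders rather than real UMLS entries or the authors' code.

```python
from collections import deque


def build_automaton(terms):
    """Build an Aho-Corasick automaton from a {term: concept_id} dictionary."""
    goto = [{}]   # goto[state][char] -> next state (the trie)
    fail = [0]    # failure link for each state
    out = [[]]    # (term, concept_id) pairs recognised at each state
    for term, concept in terms.items():
        state = 0
        for ch in term.lower():
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append((term, concept))
    # Breadth-first pass: compute failure links and merge outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_concepts(text, automaton):
    """Return (start_offset, term, concept_id) for whole-word dictionary hits."""
    goto, fail, out = automaton
    lowered = text.lower()
    state = 0
    matches = []
    for i, ch in enumerate(lowered):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for term, concept in out[state]:
            start = i - len(term) + 1
            # Assumed rule: accept a hit only if it is not embedded in a longer word.
            left_ok = start == 0 or not lowered[start - 1].isalnum()
            right_ok = i + 1 == len(lowered) or not lowered[i + 1].isalnum()
            if left_ok and right_ok:
                matches.append((start, term, concept))
    return matches


# Hypothetical mini-dictionary; a real index would be loaded from curated sources.
example_terms = {
    "chest pain": "EX001",
    "pain": "EX002",
    "shortness of breath": "EX003",
}
example_automaton = build_automaton(example_terms)
example_note = "Patient denies chest pain but reports shortness of breath and pain."
for hit in find_concepts(example_note, example_automaton):
    print(hit)
```

The automaton is built once from the dictionary and then scans each note in a single pass, so lookup time depends on the length of the text and the number of hits rather than on the size of the dictionary. That property is what makes dictionary-indexed matching attractive for large, loosely formatted corpora. Overlapping hits (here, "pain" inside "chest pain") are all reported; choosing among them, and handling negation such as "denies", would be separate steps.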



