Classifying Biomedical Text Abstracts based on Hierarchical 'Concept' Structure

Authors: Rozilawati Binti Dollah, Masaki Aono


Classifying biomedical literature is a challenging task, especially when a large number of biomedical articles must be organized into a hierarchical structure. In this paper, we present an approach for classifying a collection of biomedical text abstracts downloaded from the Medline database with the help of ontology alignment. To accomplish this, we construct two types of hierarchies, the OHSUMED disease hierarchy and the Medline abstract disease hierarchies, from the OHSUMED dataset and the Medline abstracts, respectively. We then enrich the OHSUMED disease hierarchy before applying the ontology alignment process to it in order to find probable concepts or categories. Subsequently, we compute the cosine similarity between the vectors of the probable concepts (in the "enriched" OHSUMED disease hierarchy) and the vectors in the Medline abstract disease hierarchies. Finally, we assign a category to each new Medline abstract based on the similarity score. The experimental results show that the performance of our proposed approach for hierarchical classification is slightly better than that of multi-class flat classification.
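The final two steps (cosine similarity scoring followed by category assignment) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes raw term-frequency vectors built by whitespace tokenization, it skips the hierarchy construction, enrichment, and ontology alignment steps entirely, and the function names, concept labels, and term lists are hypothetical placeholders chosen for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a raw term-frequency vector (assumption: whitespace tokens, lowercased)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def assign_category(abstract_text, concept_texts):
    """Return the best-scoring concept and the score of every candidate concept."""
    abstract_vec = term_vector(abstract_text)
    scores = {
        concept: cosine_similarity(abstract_vec, term_vector(terms))
        for concept, terms in concept_texts.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical "probable concepts" with illustrative term lists.
concepts = {
    "Heart Diseases": "heart cardiac myocardial infarction arrhythmia",
    "Neoplasms": "tumor cancer carcinoma neoplasm metastasis",
}

category, scores = assign_category(
    "patients with acute myocardial infarction and cardiac arrhythmia",
    concepts,
)
print(category, scores)
```

In this sketch the candidate set is simply every concept in the dictionary; in the approach described above, the candidates would instead be the probable concepts returned by the ontology alignment step, and the term weighting may differ from raw counts.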

Keywords: Biomedical literature, hierarchical text classification, ontology alignment, text mining.



