Hany Mahgoub
Mining Association Rules from Unstructured Documents
2708 - 2713
2008
2
8
International Journal of Computer and Information Engineering
https://publications.waset.org/pdf/3514
https://publications.waset.org/vol/20
World Academy of Science, Engineering and Technology
This paper presents EART (Extract Association Rules from Text), a
system for discovering association rules from collections of
unstructured documents. The EART system handles text only, not
images or figures. EART discovers association rules among the
keywords that label the collection of textual documents.
The main characteristic of EART is that the system integrates XML
technology (to transform unstructured documents into structured
documents) with an Information Retrieval scheme (TF-IDF) and a Data
Mining technique for association rule extraction. EART relies on
word features to extract association rules. It consists of four
phases: a structure phase, an index phase, a text mining phase and a
visualization phase. Our work depends on the analysis of the keywords in the
extracted association rules through the co-occurrence of the keywords
in one sentence of the original text and the existence of the
keywords in one sentence without co-occurrence. Experiments were
carried out on a collection of scientific documents selected from
MEDLINE that are related to the outbreak of the H5N1 avian influenza
virus.
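
The abstract names TF-IDF weighting and association rule extraction but gives no algorithmic detail, so the following is only a minimal sketch of how those two steps might fit together, not the EART implementation. The function names (tfidf_keywords, association_rules), the thresholds, the restriction to single-keyword rules, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_keywords(docs, top_n=3):
    """Return, for each tokenized document, the set of its top_n terms by TF-IDF.

    Hypothetical helper: EART's actual weighting and cut-off are not given
    in the abstract.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    keyword_sets = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keyword_sets.append(set(ranked[:top_n]))
    return keyword_sets


def association_rules(keyword_sets, min_support=0.5, min_confidence=0.8):
    """Mine rules A -> B (one keyword on each side) from per-document keyword sets.

    support(A -> B)    = fraction of documents containing both A and B
    confidence(A -> B) = documents with A and B / documents with A
    Thresholds are illustrative assumptions.
    """
    n = len(keyword_sets)
    item_count = Counter(k for s in keyword_sets for k in s)
    pair_count = Counter(
        pair for s in keyword_sets for pair in combinations(sorted(s), 2)
    )
    rules = []
    for (a, b), both in pair_count.items():
        support = both / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = both / item_count[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


# Toy keyword sets, invented purely for illustration.
example_sets = [
    {"h5n1", "poultry"},
    {"h5n1", "poultry", "outbreak"},
    {"h5n1", "vaccine"},
    {"poultry", "h5n1"},
]
print(association_rules(example_sets))
# [('poultry', 'h5n1', 0.75, 1.0)]
```

In this toy run only the rule poultry -> h5n1 passes both thresholds: the pair appears in 3 of 4 documents (support 0.75), and every document containing "poultry" also contains "h5n1" (confidence 1.0), whereas the reverse rule has confidence 0.75.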
Open Science Index 20, 2008