Ontology-based Concept Weighting for Text Documents

Authors: Hmway Hmway Tar, Thi Thi Soe Nyaunt

Abstract:

Document clustering has become an essential technology with the popularity of the Internet, which makes fast, high-quality document clustering techniques a core topic. Text clustering, or simply clustering, is about discovering semantically related groups in an unstructured collection of documents. Clustering has long been popular because it provides unique ways of digesting and generalizing large amounts of information. One of the issues in clustering is extracting the proper features (concepts) of a problem domain. Existing clustering technology mainly focuses on term weight calculation. To achieve more accurate document clustering, more informative features, including concept weight, are important. Feature selection is important for the clustering process because irrelevant or redundant features may misguide the clustering results. To counteract this issue, the proposed system presents a concept weight for a text clustering system, developed on the basis of the k-means algorithm in accordance with the principles of ontology, so that the importance of the words in a cluster can be identified by their weight values. To a certain extent, this resolves the semantic problem in specific areas.
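
The abstract does not state the weighting formula, so the following is only a minimal sketch of the general idea, under stated assumptions: a hypothetical toy ontology maps terms to concepts, a TF-IDF-style score stands in for the concept weight, and a plain k-means with fixed seeds groups the documents. The names `ONTOLOGY`, `concept_counts`, `concept_weights`, and `kmeans` are illustrative and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy ontology: surface term -> domain concept (not from the paper).
ONTOLOGY = {
    "car": "vehicle", "truck": "vehicle", "engine": "vehicle",
    "apple": "fruit", "banana": "fruit", "orange": "fruit",
}

def concept_counts(text):
    """Count how often each ontology concept is mentioned in a document."""
    tokens = text.lower().split()
    return Counter(ONTOLOGY[t] for t in tokens if t in ONTOLOGY)

def concept_weights(docs):
    """Return (concepts, vectors) with one TF-IDF-style weight per concept per document."""
    counts = [concept_counts(d) for d in docs]
    concepts = sorted({c for cnt in counts for c in cnt})
    n = len(docs)
    # Document frequency of each concept.
    df = {c: sum(1 for cnt in counts if c in cnt) for c in concepts}
    vectors = []
    for cnt in counts:
        total = sum(cnt.values()) or 1
        # Assumed weighting: concept frequency times a smoothed IDF.
        vectors.append([(cnt[c] / total) * (math.log((1 + n) / (1 + df[c])) + 1)
                        for c in concepts])
    return concepts, vectors

def kmeans(vectors, k, iterations=10):
    """Plain k-means with squared Euclidean distance; seeds are the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each document to its nearest centroid.
        for i, v in enumerate(vectors):
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

docs = [
    "the car has a new engine",
    "ripe banana and apple on the table",
    "a truck with a diesel engine",
    "orange and banana juice",
]
concepts, vectors = concept_weights(docs)
labels, centroids = kmeans(vectors, k=2)
# The heaviest concept in each centroid labels what the cluster is about.
top_concepts = [concepts[c.index(max(c))] for c in centroids]
```

In this toy run the two vehicle documents and the two fruit documents land in separate clusters, and the largest concept weight in each centroid names the cluster's dominant concept. Seeding with the first `k` vectors keeps the sketch deterministic; a real system would likely use a better initialization and a full domain ontology.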

Keywords: Clustering, Concept Weight, Document clustering, Feature Selection, Ontology

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1328782

