TOSOM: A Topic-Oriented Self-Organizing Map for Text Organization
Authors: Hsin-Chang Yang, Chung-Hong Lee, Kuo-Lung Ke
Abstract:
The self-organizing map (SOM) is a well-known neural network model with a wide range of applications. SOM has two main characteristics: dimension reduction and topology preservation. SOM maps a high-dimensional data space onto a low-dimensional space while preserving the topological relations among the data. Owing to these characteristics, SOM is commonly applied to data clustering and visualization tasks. However, a major disadvantage of SOM is that the number and structure of the neurons must be known before training, and these are difficult to determine. Several schemes have been proposed to tackle this deficiency, for example the growing/expandable SOM, the hierarchical SOM, and the growing hierarchical SOM. These schemes can dynamically expand the map, and even generate hierarchical maps, during training, and encouraging results have been reported. Basically, these schemes adapt the size and structure of the map according to the distribution of the training data; that is, they are data-driven, or data-oriented, SOM schemes. In this work, we develop a topic-oriented SOM scheme suitable for document clustering and organization. The proposed SOM automatically adapts the number of neurons as well as the structure of the map according to the identified topics. Unlike other data-oriented SOMs, our approach expands the map and generates the hierarchies according to both the topics and the characteristics of the neurons. Preliminary experiments give promising results and demonstrate the plausibility of the method.
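The following is a minimal, self-contained sketch of the two SOM properties named above, under stated assumptions. It implements a standard online SOM, with best-matching-unit search plus a Gaussian neighborhood update on a rectangular grid. It also adds one data-driven growth step in the spirit of growing-grid schemes: a row or column is inserted next to the neuron with the largest accumulated quantization error. This illustrates the data-oriented expansion that the abstract contrasts with its own approach. It is not the authors' topic-oriented method, whose details are not given here. The names `SOM`, `bmu`, `train`, `grow`, and `midpoint`, the linear decay schedules, and the toy data are all hypothetical choices made for illustration.

```python
import math
import random

# Illustrative sketch only: a plain online SOM plus one data-driven growth step.
# All names here (SOM, bmu, train, grow, midpoint) are hypothetical; this is NOT TOSOM.


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def midpoint(a, b):
    """Element-wise mean of two weight vectors (used to initialise new neurons)."""
    return [(x + y) / 2 for x, y in zip(a, b)]


class SOM:
    """Rectangular SOM; grid[r][c] is the weight vector of neuron (r, c)."""

    def __init__(self, rows, cols, dim, seed=0):
        self.rng = random.Random(seed)
        self.grid = [[[self.rng.random() for _ in range(dim)] for _ in range(cols)]
                     for _ in range(rows)]

    @property
    def shape(self):
        return len(self.grid), len(self.grid[0])

    def bmu(self, x):
        """Best-matching unit: grid position of the neuron closest to x.
        This is the dimension reduction: a dim-D vector becomes a 2-D position."""
        rows, cols = self.shape
        return min(((r, c) for r in range(rows) for c in range(cols)),
                   key=lambda rc: dist2(self.grid[rc[0]][rc[1]], x))

    def train(self, data, epochs=20, lr0=0.5, radius0=None):
        """Online training; the linear decay of rate and radius is an assumed schedule."""
        rows, cols = self.shape
        radius0 = radius0 or max(rows, cols) / 2
        steps, t = epochs * len(data), 0
        for _ in range(epochs):
            for x in self.rng.sample(data, len(data)):  # shuffled pass over the data
                frac = t / steps
                lr = lr0 * (1 - frac)
                radius = max(radius0 * (1 - frac), 0.5)
                br, bc = self.bmu(x)
                for r in range(rows):
                    for c in range(cols):
                        # Gaussian neighbourhood on the grid: neurons close to the
                        # BMU move further toward x, which preserves topology.
                        h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                        w = self.grid[r][c]
                        for i in range(len(w)):
                            w[i] += lr * h * (x[i] - w[i])
                t += 1

    def unit_errors(self, data):
        """Accumulated quantization error of each neuron over the data."""
        rows, cols = self.shape
        err = [[0.0] * cols for _ in range(rows)]
        for x in data:
            r, c = self.bmu(x)
            err[r][c] += math.sqrt(dist2(self.grid[r][c], x))
        return err

    def grow(self, data):
        """Insert one row or column beside the highest-error neuron.
        Requires at least two neurons so that a grid neighbour exists."""
        rows, cols = self.shape
        err = self.unit_errors(data)
        qr, qc = max(((r, c) for r in range(rows) for c in range(cols)),
                     key=lambda rc: err[rc[0]][rc[1]])
        nbrs = [(qr + dr, qc + dc) for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= qr + dr < rows and 0 <= qc + dc < cols]
        # Neighbour whose weights are farthest from the worst neuron's weights.
        fr, fc = max(nbrs, key=lambda rc: dist2(self.grid[rc[0]][rc[1]], self.grid[qr][qc]))
        if fr != qr:  # vertical neighbour: insert a new row between the two
            lo = min(qr, fr)
            self.grid.insert(lo + 1, [midpoint(self.grid[lo][c], self.grid[lo + 1][c])
                                      for c in range(cols)])
        else:  # horizontal neighbour: insert a new column between the two
            lo = min(qc, fc)
            for row in self.grid:
                row.insert(lo + 1, midpoint(row[lo], row[lo + 1]))


# Usage on synthetic toy data: two separated clusters of 5-D points.
rng = random.Random(1)
data = ([[rng.gauss(0.2, 0.05) for _ in range(5)] for _ in range(30)]
        + [[rng.gauss(0.8, 0.05) for _ in range(5)] for _ in range(30)])
som = SOM(3, 3, dim=5)
som.train(data, epochs=10)
print(som.bmu(data[0]), som.bmu(data[-1]))  # 2-D grid positions of two 5-D inputs
som.grow(data)
print(som.shape)  # one extra row or column: (4, 3) or (3, 4)
```

The `grow` step looks only at where the training vectors land on the map, so its expansion is purely data-driven. The topic-oriented scheme described in the abstract would instead base expansion on the identified topics, which this sketch does not attempt to model.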
Keywords: Self-organizing map, topic identification, learning algorithm, text clustering.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1334287