Growing Self Organising Map Based Exploratory Analysis of Text Data
Authors: Sumith Matharage, Damminda Alahakoon
Abstract:
Textual data plays an important role in the modern world. The potential for applying data mining techniques to uncover hidden information in large text collections is immense. The Growing Self-Organising Map (GSOM) is a highly successful member of the Self-Organising Map family and has been used as a clustering and visualisation tool across a wide range of disciplines to discover hidden patterns in data. However, a comprehensive analysis of the GSOM’s capabilities as a text clustering and visualisation tool has not yet been published. This paper presents these capabilities, namely map visualisation, automatic cluster identification and hierarchical clustering, and demonstrates them through experiments on a benchmark text corpus.
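The abstract does not spell out how a GSOM grows. As a rough illustration only, the following Python sketch implements a simplified growing phase, assuming the growth threshold `GT = -D * ln(SF)` (D = input dimensionality, SF = spread factor) from the original GSOM formulation by Alahakoon et al. (2000). The class and parameter names (`GSOM`, `growth_threshold`, `lr`, `fd`) are hypothetical, and this is not the authors' implementation: it uses a fixed learning rate and a one-step lattice neighbourhood, simplifies new-node weight initialisation, and omits the smoothing phase as well as the cluster identification and hierarchical clustering steps discussed in the paper. For text, each input would be a document vector (for example a TF-IDF vector), which is likewise an assumption here.

```python
import math
import random


def growth_threshold(dim, spread_factor):
    """GSOM growth threshold GT = -D * ln(SF), with 0 < SF < 1."""
    return -dim * math.log(spread_factor)


class GSOM:
    """Hypothetical, simplified growing-phase GSOM on a 2-D integer lattice."""

    def __init__(self, dim, spread_factor=0.5, lr=0.1, fd=0.1, seed=0):
        self.gt = growth_threshold(dim, spread_factor)
        self.lr = lr  # fixed learning rate (the original GSOM decays it)
        self.fd = fd  # factor of distribution used when spreading error
        rng = random.Random(seed)
        # Start from a 2x2 lattice; nodes are keyed by integer grid position.
        self.weights = {p: [rng.random() for _ in range(dim)]
                        for p in [(0, 0), (0, 1), (1, 0), (1, 1)]}
        self.error = {p: 0.0 for p in self.weights}

    @staticmethod
    def _sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    @staticmethod
    def _lattice_neighbours(pos):
        r, c = pos
        return [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]

    def bmu(self, x):
        """Best-matching unit: lattice position of the closest weight vector."""
        return min(self.weights, key=lambda p: self._sqdist(x, self.weights[p]))

    def _grow(self, pos):
        """Add a node at every free lattice position around boundary node `pos`."""
        parent = self.weights[pos]
        for n in self._lattice_neighbours(pos):
            if n in self.weights:
                continue
            opposite = (2 * pos[0] - n[0], 2 * pos[1] - n[1])
            if opposite in self.weights:
                # Extrapolate outward from the parent: w_new = 2 * w_parent - w_opposite.
                w_new = [2 * a - b for a, b in zip(parent, self.weights[opposite])]
            else:
                w_new = list(parent)  # simplification: copy the parent's weights
            self.weights[n] = w_new
            self.error[n] = 0.0

    def train_step(self, x):
        win = self.bmu(x)
        # Accumulate the winner's quantisation error (squared distance here).
        self.error[win] += self._sqdist(x, self.weights[win])
        present = [n for n in self._lattice_neighbours(win) if n in self.weights]
        # Pull the winner (full rate) and its lattice neighbours (half rate) toward x.
        for p, rate in [(win, self.lr)] + [(n, self.lr / 2) for n in present]:
            self.weights[p] = [w + rate * (xi - w) for w, xi in zip(self.weights[p], x)]
        if self.error[win] > self.gt:
            if len(present) < 4:       # boundary node: grow new nodes
                self._grow(win)
                self.error[win] = 0.0  # simplification: reset after growing
            else:                      # interior node: spread the error outward
                self.error[win] = self.gt / 2
                for n in present:
                    self.error[n] *= 1 + self.fd

    def train(self, data, epochs=1):
        for _ in range(epochs):
            for x in data:
                self.train_step(x)
        return self


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.random(), rng.random()] for _ in range(200)]
    som = GSOM(dim=2, spread_factor=0.9).train(data)
    print("nodes after growing:", len(som.weights))
```

Because GT shrinks as SF approaches 1, a larger spread factor lets the map grow further, while a smaller one yields a smaller, coarser map; in this sketch that single parameter is the only control over map spread.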
Keywords: Text Clustering, Growing Self Organizing Map, Automatic Cluster Identification, Hierarchical Clustering.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1092387