Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 30184
Grid-based Supervised Clustering - GBSC

Authors: Pornpimol Bungkomkhun, Surapong Auwatanamongkol

Abstract:

This paper presents a supervised clustering algorithm, namely Grid-Based Supervised Clustering (GBSC), which is able to identify clusters of any shapes and sizes without presuming any canonical form for data distribution. The GBSC needs no prespecified number of clusters, is insensitive to the order of the input data objects, and is capable of handling outliers. Built on the combination of grid-based clustering and density-based clustering, under the assistance of the downward closure property of density used in bottom-up subspace clustering, the GBSC can notably reduce its search space to avoid the memory confinement situation during its execution. On two-dimension synthetic datasets, the GBSC can identify clusters with different shapes and sizes correctly. The GBSC also outperforms other five supervised clustering algorithms when the experiments are performed on some UCI datasets.

Keywords: supervised clustering, grid-based clustering, subspace clustering

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1077439

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1237

References:


[1] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, "Automatic subspace slustering of high dimensional data for data mining applications," In Proc. ACM SIGMOD International Conference on Management of Data, Seattle, Washington, June 1998, pp. 94-105.
[2] J. S. Aguilar, R. Ruiz, J. C. Riquelme, and R. Giraldez, "SNN: A supervised clustering algorithm," In Proc. 14th International Conference on Industrial & Engineering Applications of Artificial Intelligence & Expert Systems (IEA/AIE 2001)
[3] S. H. Al-Harbi, and V. J. Rayward-Smith, "Adaptive k-means for supervised clustering," Applied Intelligence, Volume 24, Number 3, pp. 219-226(8), June 2006.
[4] T. Finley and T. Joachims, "Supervised clustering with support vector machines," In Proc. International conference on Machine learning, Bonn, Germany, August 07 - 11, 2005, pp. 217-224.
[5] A. Jirayusakul, "Supervised growing neural gas algorithm in clustering analysis," Ph.D. dissertation, School of Applied Statistics, National Institute of Development Administration, Thailand, 2007, unpublised.
[6] A. Jirayusakul, and S. Auwatanamongkol, "A supervised growing neural gas algorithm for cluster analysis," International Journal of Hybrid Intelligent Systems, Vol. 4, No.2, 2007.
[7] S. B. Kotsiantis and P. E. Pintelas, "Recent Advances in Clustering: A Brief Survey," Transactions on Information Science and Applications, 2004,1(1):73-81.
[8] X. Li and N. Ye, "Grid-and Dummy-Cluster-Based Learning of Normal and Intrusive Clusters of Computer Intrusion Detection," Journal of Quality and Reliability Engineering International, Vol. 18, No. 3, pp. 231-242.
[9] X. Li and N. Ye, "A Supervised Clustering Algorithm for Computer Intrusion Detection," Knowledge and Information Systems, Vol. 8, No.4, pp. 498-509.
[10] X. Li and N. Ye, "A Supervised Clustering and Classification Algorithm for Mining Data With Mixed Variables," IEEE Transactions on Systems, Man, and Cybernetics-Part A, Vol. 36, No. 2, pp. 396-406.
[11] N. H. Park and W. S. Lee, "Statistical Grid-based Clustering over Data Streams," SIGMODD Record, Vol.33, No.1, March 2004.
[12] L. Parsans, E. Haque, and H. Liu, "Subspace Clustering for High Dimensional Data: A Review," ACM SIGKDD Explorations Newsletter, 6(1):90-105, June 2004.
[13] C. M. Procopiuc, 1997, "Clustering Problems and their Applications (a Survey)," Available: http://www.cs.duke.edu/~magda.
[14] G. Sheikholeslami, S. Chatterjee, and A. Zhang, "WaveCluster: A Multi- Resolution Clustering Approach for Very Large Spatial Databases," In Proc. International Conference on Very Large Databases, New York City, August 24-27, 1998.
[15] J. Sinkkonen, S. Kaski, and J. Nikkila, "Discriminative Clustering: Optimal Contingency Tables by Learning Metrics," In Proc. European Conference on Machine Learning (ECML-02), Springer-Veriag, London, pp. 418-430.
[16] N. Slonim and N. Tishby, "Agglomerative Information Bottleneck," In Proc. Neural Information Processing Systems, pp. 617-623, 1999.
[17] P-N. Tan, M. Steinbach, and V. Kumar, "Introduction to Data Mining. Boston: Pearson Education, Inc., pp. 604-608.
[18] N. Tishby, F. C. Pereira, and W. Bialek, "The Information Bottleneck Method," In Proceedings of the Allerton Conference on Communication and Computation, 1999.
[19] Y. Qu and Z. Xu, "Supervised Clustering Analysis for Microarray Data Based on Multivariate Gauussian Mixture," Bioinformatics. 20 (Aug) : 1275-1288.
[20] University of California at Irving, Machine Learning Repository. Available: http:/www.ics.edu/~mlearn/MLRepository.html
[21] N. Ye and X. Li, "A Scalable Clustering Technique for Intrusion Signature Recognition," In Proc. The 2001 IEEE Workshop on Information Assurance and Security United States Military Academy, West Point, NY, 5-6 June, 2001.
[22] N. Ye and X. Li, 2005, "Method for Classifying Data using Clustering and Classification Algorithm Supervised," Available: http://www.patentstorm.us/patents/6907436/fulltext.html.
[23] N. Zeidat, C. F. Eick, and Z. Zhao, "Supervised Clustering: Algorithms and Applications," Available: http://www2.cs.uh.edu/~ceick/kdd/ZEZ06.pdf.