Fuzzy Relatives of the CLARANS Algorithm With Application to Text Clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 32804
Fuzzy Relatives of the CLARANS Algorithm With Application to Text Clustering

Authors: Mohamed A. Mahfouz, M. A. Ismail

Abstract:

This paper introduces new algorithms (Fuzzy relative of the CLARANS algorithm FCLARANS and Fuzzy c Medoids based on randomized search FCMRANS) for fuzzy clustering of relational data. Unlike existing fuzzy c-medoids algorithm (FCMdd) in which the within cluster dissimilarity of each cluster is minimized in each iteration by recomputing new medoids given current memberships, FCLARANS minimizes the same objective function minimized by FCMdd by changing current medoids in such away that that the sum of the within cluster dissimilarities is minimized. Computing new medoids may be effected by noise because outliers may join the computation of medoids while the choice of medoids in FCLARANS is dictated by the location of a predominant fraction of points inside a cluster and, therefore, it is less sensitive to the presence of outliers. In FCMRANS the step of computing new medoids in FCMdd is modified to be based on randomized search. Furthermore, a new initialization procedure is developed that add randomness to the initialization procedure used with FCMdd. Both FCLARANS and FCMRANS are compared with the robust and linearized version of fuzzy c-medoids (RFCMdd). Experimental results with different samples of the Reuter-21578, Newsgroups (20NG) and generated datasets with noise show that FCLARANS is more robust than both RFCMdd and FCMRANS. Finally, both FCMRANS and FCLARANS are more efficient and their outputs are almost the same as that of RFCMdd in terms of classification rate.

Keywords: Data Mining, Fuzzy Clustering, Relational Clustering, Medoid-Based Clustering, Cluster Analysis, Unsupervised Learning.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1328376

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2356

References:


[1] P. H. A. Sneath and R. R. Sokal, "Numerical Taxonomy- The Principles and Practice of Numerical Classification," W. H. Freeman, San Francisco, 1973.
[2] S. Guha, R. Rastogi, and K. Shim, "CURE: An efficient algorithm for large databases," in Proceedings of SIGMOD -98, Seattle, June 1998, pp. 73-84.
[3] L. Kaufman and P. J. Rousseeuw, "Clustering by means of medoids" in Statistical Data Analysis Based on the Norm," Y. Dodge, Ed., pp. 405- 416. North Holland Elsevier, Amsterdam, 1987.
[4] L. Kaufman and P. J. Rousseeuw, "Finding Groups in Data, an Introduction to Cluster Analysis," John Wiley &Sons, Brussels, Belgium, 1990.
[5] R. T. Ng and J. Han, "Efficient and effective clustering methods for spatial data mining," in Proceedings of the 20th VLDB Conference, Santiago, Chile, Sept. 1994, pp. 144-155.
[6] News Group Dataset. Available at: www.cs.cmu.edu/afs/cs.cmu.edu.
[7] K. C. Gouda and E. Diday, "Symbolic clustering using a new similarity measure," IEEE Transactions on Systems, Man, and Cybernetics, vol. 20, pp. 368-377, 1992.
[8] G. D. Ramkumar and A. Swami, "Clustering data without distance functions," Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol. 21, pp. 9-14, 1998.
[9] Y. El Sonbaty and M. A. Ismail, "Fuzzy clustering for symbolic data," IEEE Transactions on Fuzzy Systems, vol. 6, pp. 195-204, 1998.
[10] P. Bajcsy and N. Ahuja, "Location- and density-based hierarchical clustering using similarity analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, pp. 1011-1015, 1998.
[11] E. H. Ruspini, "Numerical methods for fuzzy clustering," Information Science, vol. 2, pp. 319-350, 1970.
[12] R.J. Hathaway, J.W. Devenport, and J.C. Bezdek, "Relational dual of the c-means clustering algorithms," Pattern Recognition, vol. 22, no. 2, pp. 205-212, 1989.
[13] P. Maji and S. K. Pal, "Rough-Fuzzy C-Medoids Algorithm and Selection of Bio-Basis for Amino Acid, Sequence Analysis," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 6, JUNE 2007.
[14] Z.R. Yang and R. Thomson, "Bio-Basis Function Neural Network for Prediction of Protease Cleavage Sites in Proteins," IEEE Trans.Neural Networks, vol. 16, no. 1, pp. 263-274, 2005.
[15] D. Dubois and H. Prade, "Rough Fuzzy Sets and Fuzzy Rough Sets," Int-l J. General Systems, vol. 17, pp. 191-209, 1990.
[16] M.S. Johnson and J.P. Overington, "A Structural Basis for Sequence Comparisons: An Evaluation of Scoring Methodologies," J. Molecular Biology, vol. 233, pp. 716-738, 1993.
[17] Q. Zhang and I. Couloigner, "A New and Efficient K-Medoid Algorithm for Spatial Clustering," Gervasi et al. (Eds.): ICCSA 2005, LNCS 3482, pp. 181 - 189, 2005.
[18] J. C. Bezdek, "Pattern Recognition with Fuzzy Objective Function Algorithms," Plenum Press, New York, 1981.
[19] R. N. Davé and S. Sen, "Robust Fuzzy Clustering of Relational Data," IEEE Transactions on Fuzzy Systems, vol. 10, no. 6, 2002.
[20] D. Lewis. Reuters-21578 Text Categorization Test Collection. Available http://www.research.att.com.
[21] Dae-Won Kim, Kwang H. Lee and Doheon Lee, "Fuzzy cluster validation index based on inter-cluster proximity," Pattern Recognition Letters 24 (2003) 2561-2574.
[22] R.Krishnapuram, R.Joshi, A.Nasraoui and O.Yi, "Low-complexity fuzzy relational clustering algorithms for Web mining," Fuzzy Systems, IEEE Transactions, vol.9, pp. 595--607, Aug 2001.
[23] M. Ester, H. Kriegel, and X. Xu, "Knowledge discovery in large spatial databases: focusing techniques for efficient class identification," International Symposium on Advances in Spatial Databases, vol. 951. Springer, Portland, ME, pp. 67-82, 1995.
[24] M. Camila, N. Barioni, L. Razente, J.M. Traina, and C. Traina, "Accelerating k-medoid-based algorithms through metric access methods," Journal of Systems and Software vol. 81, pp. 343-355, 2008.
[25] R. J. Hathaway and J. C. Bedeck, "NERF c-means: Non-Euclidean relational fuzzy clustering," Pattern Recognition, vol. 27, pp. 429-437, 1994.
[26] M. A. Ismail "Soft Clustering: Algorithms and validity of solutions," Fuzzy Computing. Amsterdam, the Netherlands: Elsevier, 1988, pp. 445- 471.