**Commenced** in January 2007

**Frequency:** Monthly

**Edition:** International

**Paper Count:** 30579

##### IMDC: An Image-Mapped Data Clustering Technique for Large Datasets

**Authors:**
Faruq A. Al-Omari,
Nabeel I. Al-Fayoumi

**Abstract:**

In this paper, we present a new algorithm for clustering data in large datasets using image processing approaches. First, the dataset is mapped onto a binary image plane. The synthesized image is then processed with efficient image processing techniques to cluster the data in the dataset. As a result, the algorithm avoids an exhaustive search to identify clusters: it considers only a small subset of the data that contains the critical boundary information sufficient to identify the contained clusters. Compared to available data clustering techniques, the proposed algorithm produces results of similar quality while outperforming them in execution time and storage requirements.
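To make the general idea of image-mapped clustering concrete, here is a minimal sketch, assuming 2-D numeric data and a uniform grid whose pixel size (`cell_size`) is chosen by hand. It is not the authors' IMDC algorithm: it rasterizes the points into a sparse binary image, labels 8-connected foreground regions with a standard breadth-first connected-component pass, and treats each region as a cluster. The `boundary_pixels` helper only shows which pixels would carry the boundary information the abstract refers to; unlike the paper's stated approach, this sketch still visits every foreground pixel. All function names here are hypothetical.

```python
from collections import deque

# Neighbourhood offsets; 8-connectivity for labelling is an assumption of this sketch.
NEIGHBOURS_8 = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
NEIGHBOURS_4 = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def map_to_binary_image(points, cell_size):
    """Rasterize 2-D points onto a grid (hypothetical helper).

    Returns a sparse binary image: each occupied (foreground) pixel maps to
    the indices of the points that fall into it; empty pixels are not stored.
    """
    if not points:
        return {}
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    image = {}
    for idx, (x, y) in enumerate(points):
        pixel = (int((x - min_x) // cell_size), int((y - min_y) // cell_size))
        image.setdefault(pixel, []).append(idx)
    return image


def label_components(image):
    """Breadth-first connected-component labelling over foreground pixels."""
    labels = {}
    next_label = 0
    for start in image:
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            px, py = queue.popleft()
            for dx, dy in NEIGHBOURS_8:
                nb = (px + dx, py + dy)
                if nb in image and nb not in labels:
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return labels


def boundary_pixels(image):
    """Foreground pixels with at least one 4-neighbour in the background."""
    return {p for p in image
            if any((p[0] + dx, p[1] + dy) not in image for dx, dy in NEIGHBOURS_4)}


def cluster(points, cell_size=1.0):
    """Return one cluster label per input point (hypothetical driver)."""
    image = map_to_binary_image(points, cell_size)
    labels = label_components(image)
    result = [None] * len(points)
    for pixel, indices in image.items():
        for idx in indices:
            result[idx] = labels[pixel]
    return result
```

For example, `cluster([(0.0, 0.0), (0.5, 0.5), (1.5, 0.5), (9.0, 9.0), (9.5, 9.2)], cell_size=1.0)` returns `[0, 0, 0, 1, 1]`: the first three points occupy adjacent pixels and form one region, and the last two share a distant pixel. The pixel size governs the result; a pixel that is too large merges nearby clusters, while one that is too small fragments them.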

**Keywords:**
Data Mining,
Pattern Discovery,
Data Clustering,
Predictive Analysis,
Image-mapping

**Digital Object Identifier (DOI):**
doi.org/10.5281/zenodo.1333210
