Multidimensional Data Mining by Means of Randomly Travelling Hyper-Ellipsoids

Authors: Pavel Y. Tabakov, Kevin Duffy

Abstract:

This study presents a new approach to automatic data clustering and classification in large and complex databases that, at the same time, derives specific types of explicit rules describing each cluster. The method works well in both sparse and dense multidimensional data spaces. The members of the data space can be of the same nature or represent different classes. A number of N-dimensional ellipsoids are used to enclose the data clouds. Owing to the geometry of an ellipsoid and its free rotation in space, cluster detection becomes very efficient. The method is based on genetic algorithms, which optimize the location, orientation and geometric characteristics of the hyper-ellipsoids. The proposed approach can serve as a basis for developing general knowledge systems that discover hidden knowledge, unexpected patterns and rules in various large databases.
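To make the geometric idea concrete, the following is a minimal sketch of how membership in a rotated N-dimensional ellipsoid could be tested, and how the number of enclosed points could be counted as one possible ingredient of a fitness measure. The abstract does not give the authors' formulation, so the function names (`inside_ellipsoid`, `count_enclosed`, `rotation_2d`), the row-wise representation of the orientation matrix, and the use of an enclosed-point count are all assumptions made for illustration only.

```python
import math

def inside_ellipsoid(point, center, axes, rotation):
    """Return True if `point` lies inside or on the ellipsoid.

    center   : N coordinates of the ellipsoid's centre
    axes     : N positive semi-axis lengths
    rotation : N x N orthonormal matrix; row i is the unit direction
               of the i-th principal axis (an assumed representation)
    """
    # Shift the point so the ellipsoid's centre is the origin.
    d = [p - c for p, c in zip(point, center)]
    total = 0.0
    for direction, a in zip(rotation, axes):
        # Coordinate of the shifted point along this principal axis.
        proj = sum(u * v for u, v in zip(direction, d))
        total += (proj / a) ** 2
    # Standard ellipsoid inequality: sum((proj_i / a_i)^2) <= 1.
    return total <= 1.0

def count_enclosed(points, center, axes, rotation):
    """Count the points enclosed by the ellipsoid (hypothetical fitness term)."""
    return sum(inside_ellipsoid(p, center, axes, rotation) for p in points)

def rotation_2d(theta):
    """Principal directions of a 2-D ellipse rotated by `theta` radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

if __name__ == "__main__":
    R = rotation_2d(math.pi / 4)          # major axis along the line y = x
    data = [[1.0, 1.0], [1.0, -1.0], [0.0, 0.0]]
    print(count_enclosed(data, [0.0, 0.0], [2.0, 1.0], R))
```

In a genetic algorithm of the kind the abstract describes, a candidate solution would presumably encode the centre, semi-axes and orientation of each ellipsoid, and a test like the one above would be evaluated for every data point when scoring that candidate.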

Keywords: Classification, clustering, data mining, genetic algorithms.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1328146

