An Attribute-Centre Based Decision Tree Classification Algorithm

Authors: Gökhan Silahtaroğlu

Abstract:

Decision tree algorithms hold an important place among the classification models of data mining. Algorithms in the literature use the concept of entropy or the Gini index to form the tree. The shape of the classes and their closeness to one another are some of the factors that affect the performance of the algorithm. In this paper we introduce a new decision tree algorithm which employs a data (attribute) folding method and the variation of the class variables over the branches to be created. A comparative performance analysis has been carried out between the proposed algorithm and C4.5.
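The entropy and Gini index mentioned in the abstract are the standard impurity measures used to choose decision tree splits. A minimal sketch of both is given below; the function names `entropy` and `gini` are illustrative and do not come from the paper.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a collection of class labels (in bits)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index (impurity) of a collection of class labels."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

# A perfectly balanced two-class node is maximally impure:
labels = ["yes", "yes", "yes", "no", "no", "no"]
print(entropy(labels))  # 1.0
print(gini(labels))     # 0.5
```

A split-selection algorithm such as C4.5 evaluates candidate splits by how much they reduce such an impurity measure, averaged over the resulting branches.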

Keywords: Classification, decision tree, split, pruning, entropy, gini.

Digital Object Identifier (DOI): https://doi.org/10.5281/zenodo.1084171

