A Decision Boundary based Discretization Technique using Resampling

Authors: Taimur Qureshi, Djamel A Zighed

Abstract:

Many supervised induction algorithms require discrete data, even though real data often come in both discrete and continuous formats. Good discretization of continuous attributes is an important problem that affects the speed, accuracy, and understandability of induction models. Discretization, like other statistical processes, is usually applied to a subset of the population because the entire population is practically inaccessible. For this reason, we argue that a discretization performed on a sample is only an estimate of the discretization of the entire population. Most existing discretization methods partition the attribute range into two or more intervals using a single cut point or a set of cut points. In this paper, we introduce a technique that uses resampling (such as the bootstrap) to generate a set of candidate discretization points, improving discretization quality by providing a better estimate with respect to the entire population. The goal of this paper is thus to observe whether resampling can lead to better discretization points, which opens up a new paradigm for the construction of soft decision trees.

Keywords: Bootstrap, discretization, resampling, soft decision trees.
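
The idea of generating candidate discretization points by resampling can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the abstract does not specify: it assumes a single binary cut per attribute chosen by minimum weighted class entropy, and the names `entropy`, `best_cut`, `bootstrap_cut_points`, and the parameters `n_resamples` and `seed` are hypothetical and introduced only for illustration.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def best_cut(values, labels):
    """Return the binary cut point with the lowest weighted class entropy.

    Assumption: one cut per attribute, placed midway between two adjacent
    distinct values. Returns None if all values are identical.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_point, best_score = None, float("inf")
    for i in range(1, n):
        left_v, right_v = pairs[i - 1][0], pairs[i][0]
        if left_v == right_v:
            continue  # no boundary can be placed between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_point, best_score = (left_v + right_v) / 2, score
    return best_point


def bootstrap_cut_points(values, labels, n_resamples=200, seed=0):
    """Collect one candidate cut point per bootstrap resample.

    Each resample draws len(values) cases with replacement, so the
    returned list approximates the sampling distribution of the cut.
    """
    rng = random.Random(seed)
    n = len(values)
    cuts = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        cut = best_cut([values[i] for i in idx], [labels[i] for i in idx])
        if cut is not None:
            cuts.append(cut)
    return cuts


# Toy data (invented for illustration): two well-separated classes.
values = [1.0, 1.5, 2.0, 2.5, 3.0, 6.0, 6.5, 7.0, 7.5, 8.0]
labels = ["a"] * 5 + ["b"] * 5
candidate_cuts = bootstrap_cut_points(values, labels)
```

The spread of `candidate_cuts` gives a rough picture of how uncertain a single sample-based cut point is. One possible use, again an assumption rather than something stated in the abstract, is to treat that spread as a soft boundary region instead of a single crisp threshold.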

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1060699

