Using Fractional Factorial Designs for Variable Importance in Random Forest Models

Authors: Ewa M. Sztendur, Neil T. Diamond

Abstract:

Random Forests are a powerful classification technique, consisting of a collection of decision trees. One useful feature of Random Forests is the ability to determine the importance of each variable in predicting the outcome. This is done by permuting each variable and computing the change in prediction accuracy before and after the permutation. This variable importance calculation is similar to a one-factor-at-a-time experiment and is therefore inefficient. In this paper, we use a regular fractional factorial design to determine which variables to permute. Based on the results of the trials in the experiment, we calculate the individual importance of the variables, with improved precision over the standard method. The method is illustrated with a study of student attrition at Monash University.
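The idea of letting a fractional factorial design decide which variables to permute can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation. A fixed toy decision rule stands in for a fitted Random Forest. The design is a regular 2^(4-1) fraction with generator D = ABC, chosen only for illustration. Level +1 is taken to mean "permute this variable in this run". All function names are hypothetical.

```python
import itertools
import random


def fractional_factorial_2_4_1():
    """Regular 2^(4-1) design: full factorial in A, B, C with generator D = ABC.

    The defining relation is I = ABCD, so the design has resolution IV.
    """
    return [(a, b, c, a * b * c)
            for a, b, c in itertools.product((-1, 1), repeat=3)]


def accuracy(predict, X, y):
    """Fraction of rows for which the prediction equals the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permute_columns(X, columns, rng):
    """Return a copy of X with each listed column shuffled independently."""
    X_perm = [list(row) for row in X]
    for j in columns:
        shuffled = [row[j] for row in X_perm]
        rng.shuffle(shuffled)
        for row, value in zip(X_perm, shuffled):
            row[j] = value
    return X_perm


def factorial_importance(predict, X, y, design, rng):
    """Estimate each variable's importance as a main effect of the design.

    Assumption: level +1 in a run means the variable is permuted in that run.
    Importance = mean accuracy over runs where the variable is left intact
    minus mean accuracy over runs where it is permuted.
    """
    accuracies = []
    for run in design:
        permuted_cols = [j for j, level in enumerate(run) if level == 1]
        accuracies.append(
            accuracy(predict, permute_columns(X, permuted_cols, rng), y))

    importance = []
    for j in range(len(design[0])):
        intact = [a for run, a in zip(design, accuracies) if run[j] == -1]
        permuted = [a for run, a in zip(design, accuracies) if run[j] == 1]
        importance.append(
            sum(intact) / len(intact) - sum(permuted) / len(permuted))
    return importance


def predict(row):
    """Toy stand-in for a fitted forest: depends only on variables 0 and 1."""
    return int(2 * row[0] + row[1] > 0)


rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(2000)]
y = [predict(row) for row in X]  # labels the toy rule classifies perfectly

design = fractional_factorial_2_4_1()
importance = factorial_importance(predict, X, y, design, rng)
for name, value in zip("ABCD", importance):
    print(f"variable {name}: importance estimate {value:.3f}")
```

In this sketch every one of the eight runs contributes to every variable's estimate, whereas a one-factor-at-a-time scheme would use each permuted run for a single variable only. That reuse of runs is the source of the efficiency gain the abstract describes.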

Keywords: Random Forests, Variable Importance, Fractional Factorial Designs, Student Attrition.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1058377

