Ensembling Adaptively Constructed Polynomial Regression Models

Author: Gints Jekabsons

Abstract:

The subset-selection approach to polynomial regression model building assumes that the chosen fixed set of predefined basis functions contains a subset sufficient to describe the target relation well. In most cases, however, the necessary set of basis functions is not known in advance and must be guessed, a potentially non-trivial and lengthy trial-and-error process. In our research we consider a potentially more efficient approach, Adaptive Basis Function Construction (ABFC), which lets the model-building method itself construct the basis functions needed to create a model of arbitrary complexity with adequate predictive performance. However, two issues plague both subset-selection and ABFC methods to some extent, especially when working with relatively small data samples: selection bias and selection instability. We address these issues by post-evaluating models with cross-validation and by model ensembling. To evaluate the proposed method, we empirically compare it with ABFC methods without ensembling, with a widely used subset-selection method, and with several other well-known regression modelling methods on publicly available data sets.
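
The following is a minimal, hypothetical Python sketch of the pipeline the abstract describes, not the authors' actual ABFC algorithm: monomial basis functions are grown by a greedy forward search (standing in for the heuristic search), candidate models are scored by k-fold cross-validation, and the best few models found during the search are averaged into an ensemble. All names (design_matrix, cv_mse, grow_basis, fit_ensemble) are illustrative.

import numpy as np

def design_matrix(X, exponents):
    # Each basis function is a monomial; an exponent tuple such as
    # (2, 1) denotes x1^2 * x2. The all-zero tuple is the intercept.
    return np.column_stack([np.prod(X ** np.array(e), axis=1) for e in exponents])

def cv_mse(X, y, exponents, k=5, seed=0):
    # k-fold cross-validated mean squared error of an OLS fit.
    idx = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        w, *_ = np.linalg.lstsq(design_matrix(X[train], exponents), y[train], rcond=None)
        errors.append(np.mean((design_matrix(X[fold], exponents) @ w - y[fold]) ** 2))
    return float(np.mean(errors))

def grow_basis(X, y, max_degree=4):
    # Greedy forward construction: starting from the intercept, keep adding
    # the candidate monomial (an existing basis function times one input
    # variable) that most reduces the CV error; stop when no candidate
    # helps. Returns the (score, basis) history of the search.
    d = X.shape[1]
    basis = [(0,) * d]
    best = cv_mse(X, y, basis)
    history = [(best, list(basis))]
    improved = True
    while improved:
        improved = False
        candidates = {tuple(int(v) for v in np.add(b, u)) for b in basis
                      for u in np.eye(d, dtype=int) if sum(b) < max_degree}
        for c in candidates - set(basis):
            score = cv_mse(X, y, basis + [c])
            if score < best:
                best, best_c, improved = score, c, True
        if improved:
            basis.append(best_c)
            history.append((best, list(basis)))
    return history

def fit_ensemble(X, y, history, m=3):
    # Refit the m best bases found during the search on all data and keep
    # them as an ensemble (a simple stand-in for the paper's
    # post-evaluation and ensembling step).
    models = []
    for _, exponents in sorted(history, key=lambda t: t[0])[:m]:
        w, *_ = np.linalg.lstsq(design_matrix(X, exponents), y, rcond=None)
        models.append((exponents, w))
    return models

def ensemble_predict(models, X):
    # Average the predictions of the ensemble members.
    return np.mean([design_matrix(X, e) @ w for e, w in models], axis=0)

# Toy usage: recover y = 1 + 2*x1 - 3*x1*x2 from noisy samples.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 0] * X[:, 1] + 0.05 * rng.standard_normal(200)
models = fit_ensemble(X, y, grow_basis(X, y))
print(ensemble_predict(models, X[:5]), y[:5])

Averaging several near-best models rather than committing to the single winner is what dampens the selection instability the abstract refers to: small perturbations of the data can change which basis wins, but they change the average of the top few models much less.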

Keywords: Basis function construction, heuristic search, model ensembles, polynomial regression.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1328716

