Time Series Regression with Meta-Clusters
Authors: Monika Chuchro
Abstract:
This paper presents a preliminary attempt to classify time series using meta-clusters in order to improve the quality of regression models. Here, clustering was used to extract normally distributed subgroups from time series data on the inflow into a wastewater treatment plant; these data consist of several groups that differ in mean value. Two simple algorithms, K-means and expectation-maximization (EM), were chosen as clustering methods. The Rand index was used to measure the similarity between the resulting clusterings. After simple meta-clustering, a regression model was fitted to each subgroup, and the final model was the sum of the subgroup models. Its quality was compared with that of a regression model built from the same explanatory variables but without clustering the data. The models were compared using the coefficient of determination (R²), the mean absolute percentage error (MAPE) as a measure of prediction accuracy, and a visual comparison on a line chart. The preliminary results indicate the potential of the presented technique.
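K-means is one of the two clustering methods named above. Because the inflow data are described as groups that differ in mean value, clustering the inflow values alone in one dimension is a natural minimal case. That framing, the function name `kmeans_1d`, and the deterministic initialisation are assumptions of this sketch, not details taken from the paper.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's K-means on a list of floats; returns (labels, centroids)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members (kept if empty).
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_labels == labels and new_centroids == centroids:
            break  # converged: neither assignments nor centroids changed
        labels, centroids = new_labels, new_centroids
    return labels, centroids
```

Spreading the starting centroids over the sorted data makes the result reproducible; with random starts, one would normally run several initialisations and keep the best.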
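EM, the second clustering method, is assumed here to fit a Gaussian mixture, which matches the stated aim of obtaining normally distributed subgroups. The sketch below is a minimal one-dimensional version under that assumption. The function name `em_gaussian_1d`, the variance floor `min_var`, and the stopping tolerance are hypothetical choices.

```python
import math


def em_gaussian_1d(values, k, n_iter=200, tol=1e-9, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM; return (weights, means, variances, labels)."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Spread the initial means over the sorted data and start from the pooled variance.
    means = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    grand = sum(values) / n
    pooled = max(sum((v - grand) ** 2 for v in values) / n, min_var)
    variances = [pooled] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    resp = []
    for _ in range(n_iter):
        # E-step: responsibilities P(component c | value), computed with log-sum-exp.
        resp, ll = [], 0.0
        for v in values:
            logs = [math.log(weights[c]) - 0.5 * math.log(2 * math.pi * variances[c])
                    - (v - means[c]) ** 2 / (2 * variances[c]) for c in range(k)]
            top = max(logs)
            log_total = top + math.log(sum(math.exp(x - top) for x in logs))
            ll += log_total
            resp.append([math.exp(x - log_total) for x in logs])
        # M-step: re-estimate weight, mean and variance of each component.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            if nc < 1e-12:
                continue  # effectively empty component: keep its previous parameters
            weights[c] = nc / n
            means[c] = sum(r[c] * v for r, v in zip(resp, values)) / nc
            variances[c] = max(
                sum(r[c] * (v - means[c]) ** 2 for r, v in zip(resp, values)) / nc, min_var)
        if ll - prev_ll < tol:
            break  # log-likelihood has stopped improving
        prev_ll = ll
    # Hard labels: the component with the largest responsibility for each value.
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return weights, means, variances, labels
```

Working with log-densities keeps the responsibilities finite when a value lies far from a component, and the variance floor stops a component from collapsing onto a single point.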
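The Rand index compares two clusterings of the same n observations. It is the share of the n(n-1)/2 pairs on which the two clusterings agree, meaning both put the pair in the same cluster or both keep it apart. A direct sketch follows; the function name `rand_index` is hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Rand index of two clusterings of the same items: 1.0 means identical partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    if len(labels_a) < 2:
        raise ValueError("at least two items are needed to form a pair")
    pairs = list(combinations(range(len(labels_a)), 2))
    # A pair is an agreement if both clusterings put it together, or both keep it apart.
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs
    )
    return agreements / len(pairs)
```

Only co-membership matters, so renumbering the clusters (for example swapping 0 and 1) leaves the index unchanged. This suits comparing outputs of two algorithms whose cluster numbers are arbitrary.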
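The abstract fits one regression per subgroup and takes the final model as the sum of the subgroup models. One plausible reading, which is an assumption here, is an indicator-weighted sum, ŷ_t = Σ_k 1[t ∈ C_k] f_k(x_t), so each observation is predicted only by the model of its own cluster. Because the abstract does not list the explanatory variables, the sketch uses a single hypothetical predictor with ordinary least squares for each f_k. It then computes R² and MAPE for the clustered and the unclustered fit. All function names and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # a subgroup with constant x gets a flat line
    return my - b * mx, b


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def mape(ys, preds):
    """Mean absolute percentage error in percent; observations must be non-zero."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(ys, preds)) / len(ys)


def clustered_predictions(xs, ys, labels):
    """Fit one line per cluster and predict each observation with its own cluster's line."""
    models = {}
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        models[c] = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
    # Indicator-weighted sum: only the observation's own subgroup model contributes.
    return [models[lab][0] + models[lab][1] * x for x, lab in zip(xs, labels)]


# Hypothetical toy data: two regimes with the same slope but different levels.
xs = [1, 2, 3, 4, 1, 2, 3, 4]
ys = [12, 14, 16, 18, 52, 54, 56, 58]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

clustered = clustered_predictions(xs, ys, labels)
a, b = fit_line(xs, ys)
pooled = [a + b * x for x in xs]  # single regression on all data, no clustering
print(f"clustered: R2={r_squared(ys, clustered):.3f}, MAPE={mape(ys, clustered):.1f}%")
print(f"pooled:    R2={r_squared(ys, pooled):.3f}, MAPE={mape(ys, pooled):.1f}%")
```

On this toy data, the clustered fit reproduces both levels exactly (R² = 1, MAPE = 0%). The single pooled line gives R² ≈ 0.012 and MAPE ≈ 86.4%, because it passes between the two regimes.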
Keywords: Clustering, Data analysis, Data mining, Predictive models.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1336090