Time Series Forecasting Using Various Deep Learning Models
Authors: Jimeng Shi, Mahek Jain, Giri Narasimhan
Abstract:
Time Series Forecasting (TSF) predicts the values of target variables at future time points based on observations from previous time points. To keep the problem tractable, learning methods use data from a fixed-length window in the past as an explicit input. In this paper, we study how the performance of predictive models changes as a function of the look-back window size and of how far into the future we predict. We also evaluate recent attention-based Transformer models, which have had notable success in the image processing and natural language processing domains. In all, we compare four deep learning methods (Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Transformer) along with a baseline method. The dataset we used is the hourly Beijing Air Quality Dataset from the University of California, Irvine (UCI) Machine Learning Repository, a multivariate time series of many factors measured on an hourly basis over a period of five years (2010-2014). For each model, we report how performance varies with the look-back window size and the number of time points predicted into the future. Our experiments suggest that Transformer models perform best, with the lowest Mean Absolute Errors (MAE = 14.599, 23.273) and Root Mean Square Errors (RMSE = 23.573, 38.131) for most of our single-step and multi-step predictions. The best look-back window size for predicting 1 hour into the future appears to be one day, while 2 or 4 days perform best for predicting 3 hours into the future.
Keywords: Air quality prediction, deep learning algorithms, time series forecasting, look-back window.
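The fixed look-back window framing described in the abstract can be sketched as follows. This is a minimal illustration, not code from the paper: `make_windows` is a hypothetical helper that slices an hourly series into supervised (input window, target) pairs for a chosen look-back size and forecasting horizon.

```python
import numpy as np

def make_windows(series, look_back, horizon):
    """Slice a 1-D series into (input window, target) pairs.

    Each sample uses `look_back` past points as the model input and the
    value `horizon` steps after the window as the prediction target.
    """
    X, y = [], []
    for start in range(len(series) - look_back - horizon + 1):
        X.append(series[start:start + look_back])
        y.append(series[start + look_back + horizon - 1])
    return np.array(X), np.array(y)

# Example: a 24-point look-back (one day of hourly data), predicting
# 1 hour into the future, as in the paper's single-step setting.
hourly = np.arange(100, dtype=float)
X, y = make_windows(hourly, look_back=24, horizon=1)
print(X.shape, y.shape)  # (76, 24) (76,)
```

Varying `look_back` (e.g., 1, 2, or 4 days) and `horizon` (1 or 3 hours) reproduces the kind of window-size experiments the abstract describes.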