Deep Reinforcement Learning Approach for Trading Automation in the Stock Market

Authors: Taylan Kabbani, Ekrem Duman


Deep Reinforcement Learning (DRL) algorithms can scale to previously intractable problems. DRL makes it possible to automate profit generation in the stock market by combining the asset price "prediction" step and the portfolio "allocation" step into one unified process, producing fully autonomous systems that interact with their environment and learn to make optimal decisions through trial and error. This work presents a DRL model that generates profitable trades in the stock market, overcoming the limitations of supervised learning approaches. We formulate the trading problem as a Partially Observed Markov Decision Process (POMDP), taking into account the constraints imposed by the stock market, such as liquidity and transaction costs. We then solve the formulated POMDP using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm and achieve a Sharpe ratio of 2.68 on the test dataset. From the perspective of stock market forecasting and intelligent decision-making, this paper demonstrates the superiority of DRL over other types of machine learning in financial markets and shows its credibility and advantages for strategic decision-making.

Keywords: Autonomous agent, deep reinforcement learning, MDP, sentiment analysis, stock market, technical indicators, twin delayed deep deterministic policy gradient.
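
The abstract reports performance as a Sharpe ratio. As a rough illustration of what that metric measures, the sketch below computes an annualized Sharpe ratio from a series of per-period returns. The function name, the zero risk-free rate, and the 252-period annualization factor are assumptions made for this example; the abstract does not state how the authors computed their figure.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a sequence of per-period returns.

    Assumptions (not taken from the paper): risk_free_rate is expressed
    per period, and 252 trading periods make up one year.
    """
    # Excess return of each period over the risk-free rate.
    excess = [r - risk_free_rate for r in returns]
    # Sample standard deviation (n - 1 denominator) of the excess returns.
    volatility = stdev(excess)
    if volatility == 0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    # Mean excess return per unit of volatility, scaled to a yearly horizon.
    return mean(excess) / volatility * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Hypothetical daily returns, used only to exercise the function.
    daily_returns = [0.01, -0.01, 0.02, 0.0]
    print(round(sharpe_ratio(daily_returns), 2))
```

A higher value means more return earned per unit of return volatility, which is why the metric is commonly used to compare trading strategies with different risk levels.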


