On Dialogue Systems Based on Deep Learning

Authors: Yifan Fan, Xudong Luo, Pingping Lin

Abstract:

Dialogue systems are increasingly becoming the way humans access many computer systems, allowing people to interact with computers in natural language. A dialogue system consists of three parts: understanding what humans say in natural language, managing the dialogue, and generating responses in natural language. In this paper, we survey deep-learning-based methods for dialogue management, response generation, and dialogue evaluation. Specifically, these methods are based on neural networks, long short-term memory networks, deep reinforcement learning, pre-training, and generative adversarial networks. We compare these methods and point out directions for further research.

Keywords: Dialogue management, response generation, reinforcement learning, deep learning, evaluation.
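
To make one of the surveyed method families concrete, the sketch below shows a minimal LSTM-based encoder-decoder for response generation in PyTorch. It is an illustrative toy under assumed settings (vocabulary size, dimensions, and random token ids are placeholders), not the architecture of any particular paper covered by the survey.

# Minimal sketch (assumptions only): an LSTM encoder-decoder that maps a user
# utterance to a response, one of the method families the survey discusses.
import torch
import torch.nn as nn

class Seq2SeqDialogue(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encode the user utterance into a context (final hidden/cell state).
        _, state = self.encoder(self.embed(src_ids))
        # Decode the response conditioned on that context (teacher forcing).
        dec_out, _ = self.decoder(self.embed(tgt_ids), state)
        return self.out(dec_out)  # logits over the vocabulary at each step

if __name__ == "__main__":
    model = Seq2SeqDialogue()
    src = torch.randint(0, 1000, (2, 10))   # a batch of 2 toy user utterances
    tgt = torch.randint(0, 1000, (2, 12))   # the corresponding toy responses
    logits = model(src, tgt[:, :-1])        # decoder sees all but the last token
    loss = nn.CrossEntropyLoss()(           # each step predicts the next token
        logits.reshape(-1, 1000), tgt[:, 1:].reshape(-1))
    print(logits.shape, loss.item())        # torch.Size([2, 11, 1000])

In practice, the surveyed systems extend this basic scheme with attention, reinforcement-learning objectives, pre-trained encoders, or adversarial training, as discussed in the paper.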

