Evaluating Generative Neural Attention Weights-Based Chatbot on Customer Support Twitter Dataset

Authors: Sinarwati Mohamad Suhaili, Naomie Salim, Mohamad Nazim Jambli


Sequence-to-sequence (seq2seq) models augmented with attention mechanisms are increasingly important in automated customer service. Because they capture complex relationships between input and output sequences, these models are well suited to generating chatbot responses. Central to the attention mechanism are the neural attention weights, which determine where the model focuses at each step of sequence generation. Despite the widespread use of attention, comparative analyses of different attention-scoring functions within seq2seq models remain scarce, particularly for chatbots trained on the Customer Support Twitter (CST) dataset. This study addresses this gap by evaluating four attention-scoring functions (dot, multiplicative/general, additive, and an extended multiplicative function with a tanh activation) in neural generative seq2seq models. Using the CST dataset, the models were trained and evaluated over 10 epochs with the AdamW optimizer. The evaluation criteria were validation loss and BLEU score, the latter computed under both greedy search and beam search with a beam size of k = 3. The model with the tanh-augmented multiplicative function outperforms its counterparts, achieving the lowest validation loss (1.136484) and the highest BLEU scores (0.438926 under greedy search and 0.443000 under beam search with k = 3). These findings underscore how strongly the choice of attention-scoring function affects the performance of seq2seq chatbots, and they highlight the model with tanh activation as a promising approach to improving chatbot quality in customer support contexts.
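
To make the comparison concrete, the following is a minimal NumPy sketch of the four kinds of scoring function named in the abstract. It is an illustration under stated assumptions, not the authors' implementation: the dot, general, and additive forms follow their standard textbook definitions, while the abstract does not give the exact form of the tanh-extended multiplicative score, so `score_general_tanh` is only one plausible reading. All function names, shapes, and random weights are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def score_dot(h_t, H_s):
    # Dot score: h_t . h_s for every encoder state h_s (rows of H_s).
    return H_s @ h_t

def score_general(h_t, H_s, W):
    # Multiplicative/general score: h_t^T W h_s.
    return (H_s @ W.T) @ h_t

def score_additive(h_t, H_s, W1, W2, v):
    # Additive score: v^T tanh(W1 h_t + W2 h_s).
    return np.tanh(h_t @ W1.T + H_s @ W2.T) @ v

def score_general_tanh(h_t, H_s, W):
    # ASSUMPTION: tanh is applied to the projected encoder state before
    # the dot product, i.e. h_t^T tanh(W h_s). The abstract does not
    # specify where the tanh enters; this is a hypothetical form.
    return np.tanh(H_s @ W.T) @ h_t

# Toy sizes: T encoder steps, hidden size d, additive attention size a.
rng = np.random.default_rng(0)
T, d, a = 5, 4, 3
H_s = rng.normal(size=(T, d))   # encoder hidden states
h_t = rng.normal(size=d)        # current decoder hidden state
W = rng.normal(size=(d, d))
W1 = rng.normal(size=(a, d))
W2 = rng.normal(size=(a, d))
v = rng.normal(size=a)

scores = {
    "dot": score_dot(h_t, H_s),
    "general": score_general(h_t, H_s, W),
    "additive": score_additive(h_t, H_s, W1, W2, v),
    "general_tanh": score_general_tanh(h_t, H_s, W),
}

for name, s in scores.items():
    weights = softmax(s)        # attention weights over encoder steps
    context = weights @ H_s     # weighted sum of encoder states
    print(name, np.round(weights, 3))
```

In each case the raw scores are normalized with a softmax to give the attention weights, and the context vector is the corresponding weighted sum of encoder states. Only the scoring function changes between the variants, which is why it can be swapped without touching the rest of the decoder.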

Keywords: Attention weight, chatbot, encoder-decoder, neural generative attention, score function, sequence-to-sequence.


