Variational Explanation Generator: Generating Explanation for Natural Language Inference Using Variational Auto-Encoder

Zhen Cheng; Xinyu Dai; Shujian Huang; Jiajun Chen

Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 33122

Variational Explanation Generator: Generating Explanation for Natural Language Inference Using Variational Auto-Encoder

Authors: Zhen Cheng, Xinyu Dai, Shujian Huang, Jiajun Chen

Abstract:

Recently, explanatory natural language inference has attracted much attention for the interpretability of logic relationship prediction, which is also known as explanation generation for Natural Language Inference (NLI). Existing explanation generators based on discriminative Encoder-Decoder architecture have achieved noticeable results. However, we find that these discriminative generators usually generate explanations with correct evidence but incorrect logic semantic. It is due to that logic information is implicitly encoded in the premise-hypothesis pairs and difficult to model. Actually, logic information identically exists between premise-hypothesis pair and explanation. And it is easy to extract logic information that is explicitly contained in the target explanation. Hence we assume that there exists a latent space of logic information while generating explanations. Specifically, we propose a generative model called Variational Explanation Generator (VariationalEG) with a latent variable to model this space. Training with the guide of explicit logic information in target explanations, latent variable in VariationalEG could capture the implicit logic information in premise-hypothesis pairs effectively. Additionally, to tackle the problem of posterior collapse while training VariaztionalEG, we propose a simple yet effective approach called Logic Supervision on the latent variable to force it to encode logic information. Experiments on explanation generation benchmark—explanation-Stanford Natural Language Inference (e-SNLI) demonstrate that the proposed VariationalEG achieves significant improvement compared to previous studies and yields a state-of-the-art result. Furthermore, we perform the analysis of generated explanations to demonstrate the effect of the latent variable.

Keywords: Natural Language Inference, explanation generation, variational auto-encoder, generative model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 698

References:

[1] A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman, “Glue: A multi-task benchmark and analysis platform for natural language understanding,” in BlackboxNLP@EMNLP, 2018.
[2] O.-M. Camburu, T. Rocktäschel, T. Lukasiewicz, and P. Blunsom, “e-snli: Natural language inference with natural language explanations,” in NeurIPS, 2018.
[3] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in NIPS, 2014.
[4] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” CoRR, vol. abs/1409.0473, 2014.
[5] T. Luong, H. Pham, and C. D. Manning, “Effective approaches to attention-based neural machine translation,” in EMNLP, 2015.
[6] K. Sohn, H. Lee, and X. Yan, “Learning structured output representation using deep conditional generative models,” in NIPS, 2015.
[7] S. R. Bowman, L. Vilnis, O. Vinyals, A. M. Dai, R. Józefowicz, and S. Bengio, “Generating sentences from a continuous space,” in CoNLL, 2015.
[8] S. Zhao, J. Song, and S. Ermon, “Infovae: Information maximizing variational autoencoders,” ArXiv, vol. abs/1706.02262, 2017.
[9] T. Zhao, R. Zhao, and M. Eskénazi, “Learning discourse-level diversity for neural dialog models using conditional variational autoencoders,” in ACL, 2017.
[10] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” CoRR, vol. abs/1312.6114, 2013.
[11] B. MacCartney and C. D. Manning, Natural language inference. Citeseer, 2009.
[12] D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropagation and approximate inference in deep generative models,” in ICML, 2014.
[13] H. Shah and D. Barber, “Generative neural machine translation,” in NeurIPS, 2018.
[14] I. Serban, A. Sordoni, R. Lowe, L. Charlin, J. Pineau, A. C. Courville, and Y. Bengio, “A hierarchical latent variable encoder-decoder model for generating dialogues,” in AAAI, 2016.
[15] T. Zhao, K. Lee, and M. Eskénazi, “Unsupervised discrete sentence representation learning for interpretable neural dialog generation,” in ACL, 2018.
[16] Y. Bao, H. Zhou, S. Huang, L. Li, L. Mou, O. Vechtomova, X. Dai, and J. Chen, “Generating sentences from disentangled syntactic and semantic spaces,” in ACL, 2019.
[17] B. Zhang, D. Xiong, J. Su, H. Duan, and M. Zhang, “Variational neural machine translation,” in EMNLP, 2016.
[18] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” in NAACL-HLT, 2018.
[19] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in NIPS, 2017.
[20] S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning, “A large annotated corpus for learning natural language inference,” in EMNLP, 2015.
[21] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “Bleu: a method for automatic evaluation of machine translation,” in ACL, 2001.
[22] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. D.-I. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “Pytorch: An imperative style, high-performance deep learning library,” in NeurIPS 2019, 2019.
[23] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2015.
[24] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, pp. 1929–1958, 2014.
[25] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in AISTATS, 2010.