A Black-box Approach for Response Quality Evaluation of Conversational Agent Systems
Authors: Ong Sing Goh, C. Ardil, Wilson Wong, Chun Che Fung
Abstract:
The evaluation of conversational agents, or chatterbot question answering systems, is a major research area that needs much attention. Before the rise of domain-oriented conversational agents based on natural language understanding and reasoning, evaluation was never a problem, because information retrieval-based metrics were readily available for use. However, when chatterbots became more domain specific, evaluation became a real issue. This is especially true when understanding and reasoning are required to handle a wider variety of questions while still achieving high-quality responses. This paper discusses why existing measures are inappropriate for evaluating response quality, and puts forward the call for new standard measures together with related considerations. As a short-term solution for evaluating the response quality of conversational agents, and to demonstrate the challenges of evaluating systems of different natures, this research proposes a black-box approach that uses observation, a classification scheme and a scoring mechanism to assess and rank three example systems: AnswerBus, START and AINI.
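The abstract names three ingredients (observation, a classification scheme and a scoring mechanism) but does not give the actual response classes or point values. The sketch below only illustrates the general shape of such a black-box pipeline: each observed response is assigned a class, each class carries points, and systems are ranked by their average score. The class names, the weights in `CLASS_POINTS`, and the functions `score_system` and `rank_systems` are all hypothetical and are not the authors' scheme.

```python
# Hypothetical response classes and point values; the paper's actual
# classification scheme and weights are not stated in the abstract.
CLASS_POINTS = {
    "correct": 2,
    "partially_correct": 1,
    "irrelevant": 0,
    "no_answer": 0,
}


def score_system(labels: list[str]) -> float:
    """Return the average points per question for one system's classified responses."""
    if not labels:
        raise ValueError("no responses to score")
    unknown = set(labels) - set(CLASS_POINTS)
    if unknown:
        raise ValueError(f"unknown response classes: {sorted(unknown)}")
    return sum(CLASS_POINTS[label] for label in labels) / len(labels)


def rank_systems(observations: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Score every system and sort from best to worst (ties keep input order)."""
    scores = [(name, score_system(labels)) for name, labels in observations.items()]
    # Python's sort is stable, including with reverse=True.
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    toy = {
        "SystemA": ["correct", "irrelevant"],
        "SystemB": ["correct", "correct"],
        "SystemC": ["partially_correct", "no_answer"],
    }
    for name, score in rank_systems(toy):
        print(f"{name}: {score:.2f}")
```

Averaging rather than summing keeps scores comparable when systems are judged on different numbers of questions. Because the evaluation is black-box, the only inputs are the observed responses and their assigned classes, not any internal state of the systems.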
Keywords: Evaluation, conversational agents, Response Quality, chatterbots
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1076854