Response Quality Evaluation in Heterogeneous Question Answering System: A Black-box Approach
Authors: Goh Ong Sing, C. Ardil, Wilson Wong, Shahrin Sahib
The evaluation of question answering systems is a major research area that needs much attention. Before the rise of domain-oriented question answering systems based on natural language understanding and reasoning, evaluation was never a problem because information retrieval-based metrics were readily available. However, as question answering systems became more domain-specific, evaluation became a real issue. This is especially true when understanding and reasoning are required to cater for a wider variety of questions while achieving higher-quality responses. This paper discusses the inappropriateness of the existing measures for response quality evaluation and, in a later part, puts forward the call for new standard measures and the related considerations. As a short-term solution for evaluating the response quality of heterogeneous systems, and to demonstrate the challenges in evaluating systems of different natures, this research presents a black-box approach that uses observation, a classification scheme and a scoring mechanism to assess and rank three example systems (i.e. AnswerBus, START and NaLURI).
Keywords: Evaluation, question answering, response quality.