Context Detection in Spreadsheets Based on Automatically Inferred Table Schema

Authors: Alexander Wachtel, Michael T. Franzen, Walter F. Tichy


Programming requires years of training. With natural language and end user development methods, programming could become available to everyone: end users could program their own devices and extend the functionality of an existing system without any knowledge of programming languages. In this paper, we describe the Interactive Spreadsheet Processing Module (ISPM), a natural language interface to spreadsheets that allows users to address ranges within a spreadsheet based on an inferred table schema. Using the ISPM, end users can search for values in the schema of the table and address the data in spreadsheets implicitly. Furthermore, they can select and sort the spreadsheet data using natural language. ISPM uses a machine learning technique to automatically infer areas within a spreadsheet, including different kinds of headers and data ranges. Because ranges can be identified from natural language queries, end users can query the data using natural language. During the evaluation, 12 undergraduate students were asked to perform operations (sum, sort, group, and select) using the system and also using Excel without the ISPM interface, and the time taken for task completion was compared across the two systems. Only for the selection task did users take less time in Excel than in ISPM, since they selected the cells directly with the mouse. Using natural language for end user software engineering is intended to overcome the present bottleneck of professional developers.

Keywords: Natural language processing, end user development, natural language interfaces, human computer interaction, data recognition, dialog systems, spreadsheet.


