Extraction of Significant Phrases from Text

Authors: Yuan J. Lui

Abstract:

Prospective readers can quickly determine whether a document is relevant to their information need if the significant phrases (or keyphrases) in the document are provided. Although keyphrases are useful, few documents have keyphrases assigned to them, and manually assigning keyphrases to existing documents is costly. Therefore, there is a need for automatic keyphrase extraction. This paper introduces a new domain-independent keyphrase extraction algorithm. The algorithm treats keyphrase extraction as a classification task, and uses a combination of statistical and computational linguistics techniques, a new set of attributes, and a new machine learning method to distinguish keyphrases from non-keyphrases. The experiments indicate that this algorithm performs better than other keyphrase extraction tools and that it significantly outperforms Microsoft Word 2000's AutoSummarize feature. The domain independence of this algorithm has also been confirmed in our experiments.
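
The framing of keyphrase extraction as a classification task can be illustrated with a minimal sketch. The abstract does not specify the paper's attribute set or its learning method, so the sketch below substitutes two commonly used attributes (TF-IDF and relative position of first occurrence) and a simple nearest-centroid classifier. All function names, the stopword list, and the choice of classifier are assumptions made for illustration and are not the author's algorithm.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is",
             "for", "on", "as", "are", "that", "this"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidates(tokens, max_len=3):
    """Yield (phrase, start_index) for n-grams that neither start nor end with a stopword."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram), i


def features(doc, corpus, max_len=3):
    """Map each candidate phrase in doc to (tf-idf, relative first position)."""
    tokens = tokenize(doc)
    counts, first = Counter(), {}
    for phrase, i in candidates(tokens, max_len):
        counts[phrase] += 1
        first.setdefault(phrase, i)
    # Pad with spaces so phrases only match on whole-word boundaries.
    corpus_texts = [" " + " ".join(tokenize(d)) + " " for d in corpus]
    feats = {}
    for phrase, tf in counts.items():
        df = sum(1 for t in corpus_texts if f" {phrase} " in t)
        idf = math.log((1 + len(corpus)) / (1 + df))
        feats[phrase] = (tf / len(tokens) * idf, first[phrase] / len(tokens))
    return feats


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[k] for v in vectors) / n for k in range(len(vectors[0])))


def train(labelled):
    """Fit a nearest-centroid model from (feature_vector, is_keyphrase) pairs."""
    pos = [v for v, y in labelled if y]
    neg = [v for v, y in labelled if not y]
    return centroid(pos), centroid(neg)


def is_keyphrase(vec, model):
    """Classify a candidate by whichever class centroid is closer."""
    pos_c, neg_c = model
    return math.dist(vec, pos_c) < math.dist(vec, neg_c)
```

In use, `features` would be computed for documents with known keyphrases, each candidate labelled as keyphrase or non-keyphrase, and the resulting pairs passed to `train`; candidates from an unseen document are then filtered with `is_keyphrase`. Because the two attributes are on different scales, a practical version would likely normalise them before measuring distances.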

Keywords: classification, keyphrase extraction, machine learning, summarization

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1072619

