Examining the Value of Attribute Scores for Author-Supplied Keyphrases in Automatic Keyphrase Extraction

Vicky Min-How Lim; Siew Fan Wong; Tong Ming Lim

Abstract:

Automatic keyphrase extraction is useful for efficiently locating specific documents in online databases. Although several techniques have been introduced over the years, accuracy has improved only minimally. This research examines the attribute scores of author-supplied keyphrases to better understand how those scores affect the accuracy of automatic keyphrase extraction. Five attributes are examined: Term Frequency, First Occurrence, Last Occurrence, Phrase Position in Sentences, and Term Cohesion Degree. The results show that First Occurrence is the most reliable attribute. Term Frequency, Last Occurrence, and Term Cohesion Degree vary widely but remain usable with the suggested tweaks. Only Phrase Position in Sentences shows a totally unpredictable pattern. The results imply that the commonly used ranking approach, which directly takes the top-ranked phrases from the candidate keyphrase list as the keyphrases, may not be reliable.
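
To make three of the attributes concrete, the following is a minimal sketch of how Term Frequency, First Occurrence, and Last Occurrence might be scored for a single phrase. The abstract does not give the paper's formulas, so the definitions here are assumptions: a raw occurrence count, and token positions normalized by document length. The names `tokenize` and `attribute_scores` are hypothetical and are not taken from the paper.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (hyphenated words stay whole)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def attribute_scores(document, phrase):
    """Return illustrative attribute scores for one phrase, or None if it never occurs.

    Assumed definitions (not the paper's):
      term_frequency   = number of exact token-level matches of the phrase
      first_occurrence = token index of the first match / total tokens
      last_occurrence  = token index of the last match / total tokens
    """
    doc = tokenize(document)
    target = tokenize(phrase)
    n = len(target)
    if n == 0 or len(doc) < n:
        return None

    # Slide a window of the phrase length over the document tokens.
    positions = [i for i in range(len(doc) - n + 1) if doc[i:i + n] == target]
    if not positions:
        return None

    total = len(doc)
    return {
        "term_frequency": len(positions),
        "first_occurrence": positions[0] / total,
        "last_occurrence": positions[-1] / total,
    }


if __name__ == "__main__":
    text = (
        "Keyphrase extraction helps search. "
        "Automatic keyphrase extraction ranks phrases. "
        "Authors supply keyphrases."
    )
    print(attribute_scores(text, "keyphrase extraction"))
    print(attribute_scores(text, "neural network"))
```

In this sketch a phrase that never appears verbatim in the text returns `None` rather than a score, so an author-supplied keyphrase absent from the document body cannot be scored on these attributes at all. Matching is exact at the token level; stemming and stop-word handling are deliberately left out.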

Keywords: Accuracy, Attribute Score, Author-supplied keyphrases, Automatic keyphrase extraction.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1061040
