Unsupervised Keyphrase Extraction: Ranking Step and Single-Word Phrase Problem
Abstract
Keyphrases provide a compact representation of a document's content and can be used to enhance Web search results and improve natural language processing tasks. This paper extends the state of the art in unsupervised keyphrase extraction from scientific paper abstracts. It aims to demonstrate explicitly that a dataset-dependent single-word phrase problem exists. We also investigate how different unsupervised algorithms handle the task of ranking both single-word and multi-word phrases, and we observe the effect the single-word phrase problem has on phrase ranking. The analysis helps explain why some algorithms perform better or worse than others and shows how the insights gained can improve the quality of existing algorithms.
Keywords
Keyphrase extraction, one-word phrase problem, keyphrase length, natural language processing