A Knowledge-Base Oriented Approach for Automatic Keyword Extraction
Abstract
Automatic keyword extraction is an important
subfield of information extraction. It is a
difficult task for which numerous techniques and
resources have been proposed. In this paper, we
propose a generic approach to extracting keywords from
documents using encyclopedic knowledge. Our two-step
approach first applies a classification step to identify
candidate keywords, then a learning-to-rank
method that orders the candidates according to a
user-defined keyword profile. The novelty of our approach lies
in (i) the use of the keyword profile and (ii) generic features
derived from Wikipedia categories that are not necessarily
related to the document content. We evaluate our
system on keyword datasets and corpora from standard
evaluation campaigns and show that it improves
the overall keyword extraction process.
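The two-step structure described above (filter candidates, then order them against a profile) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the feature names, the mean-threshold filter standing in for the classifier, and the profile-weighted linear score standing in for the learned ranking model are all hypothetical and are not the models used in this paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    phrase: str
    features: dict  # hypothetical feature name -> value in [0, 1]


def is_candidate_keyword(candidate, threshold=0.5):
    """Stand-in for the classification step (assumption): keep a phrase
    when the mean of its feature values reaches a fixed threshold."""
    values = list(candidate.features.values())
    return bool(values) and sum(values) / len(values) >= threshold


def rank_candidates(candidates, profile):
    """Stand-in for the learning-to-rank step (assumption): a linear score
    whose weights come from the user-defined keyword profile."""
    def score(candidate):
        return sum(profile.get(name, 0.0) * value
                   for name, value in candidate.features.items())
    return sorted(candidates, key=score, reverse=True)


def extract_keywords(candidates, profile, threshold=0.5):
    """Step 1 filters the candidates, step 2 orders the survivors."""
    kept = [c for c in candidates if is_candidate_keyword(c, threshold)]
    return [c.phrase for c in rank_candidates(kept, profile)]


# Illustrative data only: feature names and values are made up.
candidates = [
    Candidate("keyword extraction", {"category_match": 0.9, "title_match": 0.8}),
    Candidate("paper", {"category_match": 0.1, "title_match": 0.2}),
    Candidate("Wikipedia", {"category_match": 0.6, "title_match": 1.0}),
]

# Two hypothetical profiles that weight the same features differently.
category_profile = {"category_match": 1.0, "title_match": 0.2}
title_profile = {"category_match": 0.0, "title_match": 1.0}

by_category = extract_keywords(candidates, category_profile)
by_title = extract_keywords(candidates, title_profile)
print(by_category)
print(by_title)
```

In this sketch the same set of candidates survives the first step under both profiles, and only their order changes. That is the role the abstract assigns to the keyword profile.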
Keywords
Automatic keyword extraction, encyclopedic knowledge.