Lexical Patterns Based on Maximal Frequent Sequences for Automatic Keyphrase Extraction

Yanet Hernández Casimiro, Yulia Ledeneva, René Arnulfo García Hernández, Marco Antonio Ramos Corchado


This paper presents a method for automatic keyphrase extraction using lexical patterns. First, the patterns are obtained from a set of data and converted into regular-expression search patterns. These match the sequences of characters that define a phrase without depending on its syntactic or semantic characteristics, and thus yield a list of candidate keyphrases. To select the best candidates, only those with a high weight are kept, under four weighting schemes: Boolean (B), Precision (P), Recall (R), and F-Measure (F), each corresponding to the result obtained when the pattern was evaluated. A list of the best 5, 10, and 15 keyphrases is then generated for each document. The method was evaluated by length (L) and by combination (C), where the combination takes the best candidates for each length (1 to 4). The method was tested on a corpus of scientific articles, the SemEval-2010 Task 5 data set.
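The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the representation of a lexical pattern as a (left context, right context) pair, the function names, and the toy sentences are all assumptions made for illustration, and the mining of patterns from maximal frequent sequences is not reproduced here. The sketch only shows how a pattern might be compiled into a regular expression, scored by precision, recall, and F-measure against gold keyphrases, and then used to rank candidates of length 1 to 4.

```python
import re

def pattern_to_regex(left, right, max_len=4):
    """Compile a hypothetical (left, right) context pair into a regex whose
    capture group matches a phrase of 1..max_len words between the contexts."""
    slot = r"(\w+(?:[ -]\w+){0,%d})" % (max_len - 1)
    body = re.escape(left) + r"\s+" + slot + r"\s+" + re.escape(right)
    return re.compile(body, re.IGNORECASE)

def extract(regex, text):
    """Return the set of lowercased candidate phrases captured by the pattern."""
    return {m.group(1).lower() for m in regex.finditer(text)}

def prf(candidates, gold):
    """Precision, recall, and F-measure of a candidate set against gold keyphrases."""
    hits = len(candidates & gold)
    p = hits / len(candidates) if candidates else 0.0
    r = hits / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def rank(weighted_patterns, text, top_n):
    """Rank candidates by the highest weight among the patterns that found them."""
    scores = {}
    for regex, weight in weighted_patterns:
        for cand in extract(regex, text):
            scores[cand] = max(scores.get(cand, 0.0), weight)
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_n]

# Toy training text and gold keyphrases (invented for this example).
train = ("This paper presents a ranking model for search. "
         "This paper presents a simple idea for testing.")
gold = {"ranking model"}

presents = pattern_to_regex("presents a", "for")
p, r, f = prf(extract(presents, train), gold)

# A second hypothetical pattern with an arbitrary lower weight.
uses = pattern_to_regex("use a", "in")
doc = ("This paper presents a graph kernel for molecules. "
       "We use a random walk in practice.")
top = rank([(presents, f), (uses, 0.2)], doc, top_n=5)
```

Here the F-measure of a pattern on the training text serves as its weight; substituting `p`, `r`, or a Boolean value (for instance 1.0 if the pattern finds any gold keyphrase) would correspond to the other weighting schemes, under the same assumptions. Changing `top_n` to 5, 10, or 15 gives the three list sizes mentioned in the abstract.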


Lexical patterns, key phrases, automatic key phrase extraction, maximal frequent sequences
