Lexical Patterns Based on Maximal Frequent Sequences for Automatic Keyphrase Extraction

Authors

  • Yanet Hernández Casimiro, Universidad Autónoma del Estado de México
  • Yulia Ledeneva, Universidad Autónoma del Estado de México
  • René Arnulfo García Hernández, Universidad Autónoma del Estado de México
  • Marco Antonio Ramos Corchado, Universidad Autónoma del Estado de México

DOI:

https://doi.org/10.13053/cys-25-1-3868

Keywords:

Lexical patterns, key phrases, automatic key phrase extraction, maximal frequent sequences

Abstract

This paper presents a method for automatic keyphrase extraction based on lexical patterns. First, the patterns are obtained from a data set and converted into regular expressions, which makes it possible to match the character sequences that define a phrase without relying on its syntactic or semantic characteristics, and thus to obtain a list of candidate phrases. Then, to select the best candidates, only those with a high weight are kept. Four weights are considered: Boolean (B), Precision (P), Recall (R), and F-Measure (F), each corresponding to the result obtained when the pattern was evaluated. From these weights, a list of the best 5, 10, and 15 keyphrases is generated for each document. The method was evaluated by length (L) and by combination (C), where the combination takes the best candidates of each length (1 to 4). The method was tested on a corpus of scientific articles, the SemEval-2010 Task 5 data set.
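
As an illustration only, the general idea of the abstract (patterns turned into regular expressions, patterns weighted by precision, recall, and F-measure, and the top-k candidates kept per document) might be sketched as follows. The pattern format (a left and a right context word around a phrase of 1 to 4 words), the function names `pattern_to_regex`, `precision_recall_f`, and `extract_top_k`, and the rule that a candidate inherits the highest weight among the patterns that matched it are all assumptions made for this sketch; they are not taken from the paper.

```python
import re
from collections import defaultdict


def pattern_to_regex(left, right, max_words=4):
    """Hypothetical pattern format: capture 1..max_words words found
    between a left context word and a right context word."""
    word = r"[A-Za-z][A-Za-z\-]*"
    phrase = rf"({word}(?:\s{word}){{0,{max_words - 1}}})"
    return re.compile(rf"\b{re.escape(left)}\s+{phrase}\s+{re.escape(right)}\b")


def precision_recall_f(extracted, gold):
    """Score the phrases extracted by one pattern against gold keyphrases."""
    extracted, gold = set(extracted), set(gold)
    if not extracted or not gold:
        return 0.0, 0.0, 0.0
    true_positives = len(extracted & gold)
    p = true_positives / len(extracted)
    r = true_positives / len(gold)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def extract_top_k(text, weighted_patterns, k=5):
    """Collect candidates from all patterns and keep the k best.
    Assumption: a candidate takes the highest weight of any pattern
    that matched it; ties are broken alphabetically."""
    scores = defaultdict(float)
    for regex, weight in weighted_patterns:
        for match in regex.finditer(text):
            candidate = match.group(1).lower()
            scores[candidate] = max(scores[candidate], weight)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:k]]
```

In this sketch, each pattern would first be scored on training documents with `precision_recall_f`, and one of the resulting values (for example F) would be passed as the weight to `extract_top_k` with k set to 5, 10, or 15.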

Published

2021-02-15

Section

Articles