Learning Relevant Models using Symbolic Regression for Automatic Text Summarization

Eder Vazquez Vazquez, Yulia Ledeneva, René Arnulfo García Hernández

Abstract


Natural Language Processing (NLP) methods allow us to understand and manipulate natural language text or speech to do useful things. There are several specific techniques in this area, and although new approaches to these problems keep appearing, their evaluation remains similar: NLP methods are regularly evaluated against a gold standard, which contains the correct results that a method should produce. It is therefore desirable that the results of an NLP method come as close as possible to those of the gold standard. One of the most prominent NLP tasks is Automatic Text Summarization (ATS), which consists of reducing the size of a text while preserving its information content. In this paper, a method for describing the ideal behavior (gold standard) of an ATS system is proposed. Using symbolic regression, the proposed method obtains models that describe this ideal behavior, as represented by the topline. In this work, eight models for ATS are obtained; these models yield better results than other models used in the state of the art for the ATS task.
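
To illustrate the general idea only (the authors' exact procedure is given in the full paper), the sketch below fits a symbolic-regression model that scores sentences from simple surface features, using the open-source gplearn library. The feature set, the synthetic data, and the ROUGE-like target score are assumptions made solely for this example.

# Illustrative sketch (assumption, not the paper's exact method).
# Requires: pip install numpy gplearn
import numpy as np
from gplearn.genetic import SymbolicRegressor

# Assumed per-sentence features: [relative position, normalized length, avg. term frequency]
rng = np.random.default_rng(0)
X = rng.random((200, 3))                  # synthetic feature matrix (placeholder data)

# Assumed target: similarity of each sentence to the gold-standard summary
# (e.g., a ROUGE-like score), simulated here with an arbitrary formula.
y = 0.6 * (1.0 - X[:, 0]) + 0.3 * X[:, 2] + 0.1 * rng.random(200)

# Evolve a closed-form scoring expression; hyperparameters are arbitrary examples.
model = SymbolicRegressor(
    population_size=1000,
    generations=20,
    function_set=("add", "sub", "mul", "div"),
    parsimony_coefficient=0.01,
    random_state=0,
)
model.fit(X, y)

print(model._program)       # the learned symbolic expression
scores = model.predict(X)   # sentence scores; the highest-scoring sentences form the summary

In a summarization setting, such a learned expression plays the role of a model of the topline: sentences whose predicted scores are highest would be selected to build the extractive summary.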

Keywords


Natural language processing, gold standard, topline, symbolic regression, data modeling, automatic text summarization task
