Calculating the Upper Bounds for Multi-Document Summarization using Genetic Algorithms

Jonathan Rojas Simón, Yulia Ledeneva, René Arnulfo García Hernández


Over the last years, several Multi-Document Summarization (MDS) methods have been presented in Document Understanding Conference (DUC) workshops. Since DUC01, several methods have been presented in approximately 268 publications of the state-of-the-art, that have allowed the continuous improvement of MDS, however in most works the upper bounds were unknowns. Recently, some works have been focused to calculate the best sentence combinations of a set of documents and in previous works we have been calculated the significance for single-document summarization task in DUC01 and DUC02 datasets. However, for MDS task has not performed an analysis of significance to rank the best multi-document summarization methods. In this paper, we propose a method based on Genetic Algorithms  for calculating the best sentence combinations of DUC01 and DUC02 datasets in MDS through a meta-document representation. Moreover, we have calculated three heuristics mentioned in several works of state-of-the-art to rank the most recent MDS methods, through the calculus of upper bounds and lower bounds.


Topline, Multi-Document Summarization, Genetic Algorithms, Upper bounds, Significance

Full Text: PDF