A Comprehensive Review on Automatic Text Summarization
Abstract
This article presents a broad overview of Automatic Text Summarization (ATS) as a downstream Natural Language Processing (NLP) task. We explore the bibliometrics, available data, methods, summary evaluation techniques, and summarization models. We start from the earliest text summarization methods, proposed in the middle of the 20th century, and follow the development of methods, approaches, and available data up to recent times. We observe that, with the adoption of Artificial Neural Network (ANN) models, Extractive Summarization methods are being replaced by Abstractive ones. Finally, we compare the performance of state-of-the-art summarization models on datasets from various domains and conclude that Abstractive Summarization models outperform Extractive ones on the ROUGE score because, most of the time, the “golden” (reference) summaries are themselves abstractive. However, this does not necessarily mean that Extractive summaries are bad; it only suggests that the lexicon of an Extractive summary fails to match the lexicon of the reference summary sufficiently. We therefore suppose that other means of assessing Extractive summary quality are needed and that, at the same time, the quality of the reference summaries should be evaluated as well.
Keywords
Text summarization; natural language processing; information extraction