Detecting Salient Events in Large Corpora by a Combination of NLP and Data Mining Techniques

Authors

  • Delphine Battistelli STIH, Université Paris Sorbonne, France
  • Thierry Charnois GREYC, Université de Caen, France, MoDyCo, UMR 7114, Université Paris Ouest Nanterre La Défense, France
  • Jean Luc Minel GREYC, Université de Caen, France
  • Charles Teissèdre STIH, Université Paris Sorbonne, France

DOI:

https://doi.org/10.13053/cys-17-2-1527

Keywords:

Dates, temporal adverbials, event extraction, sequential pattern.

Abstract

In this paper, we present a framework and a system that extract “salient” events relevant to a query from a large collection of documents and place these events along a timeline. Each event is represented by a sentence extracted from the collection. Our method combines linguistic modeling (of the meanings of temporal adverbials), symbolic natural language processing techniques (cascades of morpho-lexical transducers), and data mining techniques (namely, sequential pattern mining under constraints). The system was applied to a corpus of French newswires provided by the Agence France Presse (AFP), and our experiments show the value of the method for this task. Evaluation was performed in partnership with French newswire agency journalists.

Published

2013-06-29