Anaphoric Connectives and Long-Distance Discourse Relations in Czech
Abstract
This paper is a linguistic as well as technical survey for the development of a shallow discourse parser for Czech. It focuses on long-distance discourse relations signalled by (mostly) anaphoric discourse connectives. Proceeding from the division of connectives into “structural” and “anaphoric” according to their (in)ability to accept distant (non-adjacent) text segments as their left-sided arguments, and taking into account the results of related analyses of English data in the framework of the Penn Discourse Treebank, we analyze a large amount of Czech language data. We benefit from the multilayer manual annotation of various language aspects, from morphology to discourse, coreference and bridging relations, in the Prague Dependency Treebank 3.0. We describe the linguistic parameters of long-distance discourse relations in Czech in connection with their anchoring connectives, and suggest possible ways of detecting them. Our empirical research also outlines some theoretical consequences for the underlying assumptions in discourse analysis and parsing, e.g. the risk of relying too much on different (language-specific?) part-of-speech categorizations of connectives, or the different perspectives in shallow and global discourse analyses (the minimality principle vs. higher text structure).
Keywords
Anaphoric connectives, long-distance discourse relations