Paraphrase and Textual Entailment Generation in Czech

Zuzana Neverilova

Abstract


Paraphrase and textual entailment generation can support natural language processing (NLP) tasks that simulate text understanding, e.g., text summarization, plagiarism detection, or question answering. A paraphrase, i.e., a sentence with the same meaning, conveys a certain piece of information with new words and new syntactic structures. Textual entailment, i.e., an inference that humans will judge most likely true, can employ real-world knowledge in order to make some implicit information explicit. Paraphrases can also be seen as mutual entailments. We present a new system that generates paraphrases and textual entailments from a given text in the Czech language. First, the process is rule-based, i.e., the system analyzes the input text, produces its inner representation, transforms it according to transformation rules, and generates new sentences. Second, the generated sentences are ranked according to a statistical model and only the best ones are output. The decision whether a paraphrase or textual entailment is correct or not is left to humans. For this purpose we designed an annotation game based on a conversation between a detective (the human player) and his assistant (the system). The result of such annotation is a collection of annotated text–hypothesis pairs. Currently, the system and the game are intended to collect data in the Czech language. However, the idea can be applied to other languages. So far, we have collected 3,321 H–T pairs. Of these pairs, 1,563 (47.06%) were judged correct, 1,238 (37.28%) were judged incorrect entailments, and 520 (15.66%) were judged nonsense or unknown.
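
The reported shares follow directly from the annotation counts. The short sketch below recomputes them; the label names and the helper function `shares` are illustrative assumptions and are not taken from the system itself.

```python
from collections import Counter

# Judgment counts as reported in the abstract.
# The label names are illustrative, not the system's own.
counts = Counter({
    "correct": 1563,
    "incorrect": 1238,
    "nonsense_or_unknown": 520,
})


def shares(counts):
    """Return each label's share of all annotated pairs, in percent."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


total = sum(counts.values())
percentages = shares(counts)

for label, pct in percentages.items():
    print(f"{label}: {counts[label]} of {total} pairs ({pct}%)")
```

The three counts sum to the 3,321 collected pairs, and the rounded shares match the percentages given above.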

Keywords


Games with a purpose, paraphrase, textual entailment, natural language generation
