Paraphrase and Textual Entailment Generation in Czech

Authors

  • Zuzana Neverilova, Masaryk University, Brno

DOI:

https://doi.org/10.13053/cys-18-3-2040

Keywords:

Games with a purpose, paraphrase, textual entailment, natural language generation

Abstract

Paraphrase and textual entailment generation can support natural language processing (NLP) tasks that simulate text understanding, e.g., text summarization, plagiarism detection, or question answering. A paraphrase, i.e., a sentence with the same meaning, conveys a certain piece of information with new words and new syntactic structures. Textual entailment, i.e., an inference that humans will judge most likely true, can employ real-world knowledge in order to make some implicit information explicit. Paraphrases can also be seen as mutual entailments. We present a new system that generates paraphrases and textual entailments from a given text in the Czech language. First, in a rule-based step, the system analyzes the input text, produces its inner representation, transforms it according to transformation rules, and generates new sentences. Second, the generated sentences are ranked according to a statistical model, and only the best ones are output. The decision whether a paraphrase or textual entailment is correct is left to humans. For this purpose, we designed an annotation game based on a conversation between a detective (the human player) and his assistant (the system). The result of such annotation is a collection of annotated text–hypothesis pairs. Currently, the system and the game are intended to collect data in the Czech language; however, the idea can be applied to other languages. So far, we have collected 3,321 such pairs. Of these, 1,563 (47.06 %) were judged correct, 1,238 (37.28 %) were judged incorrect entailments, and 520 (15.66 %) were judged nonsense or unknown.
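As a quick consistency check, the reported percentages follow from dividing each judgment count by the total number of collected pairs. The following Python snippet reproduces them; the category labels are our own shorthand, not names used by the annotation game.

```python
# Judgment counts reported in the abstract for the collected text–hypothesis pairs.
# The dictionary keys are illustrative shorthand labels.
counts = {
    "correct": 1563,
    "incorrect": 1238,
    "nonsense_or_unknown": 520,
}

# Total number of annotated pairs (3,321 according to the abstract).
total = sum(counts.values())

# Share of each judgment category as a percentage, rounded to two decimals.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

print(total)
print(shares)
```

The three shares add up to 100 %, confirming that every collected pair falls into exactly one judgment category.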

Author Biography

Zuzana Neverilova, Masaryk University, Brno

She obtained a degree in Computer Science in 2005 from the Faculty of Informatics, Masaryk University, Brno (Czech Republic). She is currently pursuing her Ph.D. degree in Computer Science with specialization in Natural Language Processing. Her broad research interests include semantic analysis, ontologies, and games with a purpose. From 2005, she focused on visualization of ontologies and library data; from 2010, she participated in the EuDML project, which aimed to aggregate metadata from European digital mathematics libraries. Presently, she teaches computational linguistics at the Faculty of Arts.


Published

2014-09-29