Sentence Similarity Computation based on WordNet and VerbNet
Abstract
Sentence similarity computation plays an increasingly important role in applications such as question answering, machine translation, information retrieval and automatic abstracting. This paper first reviews several methods for computing the similarity between sentences that take semantic and syntactic knowledge into account. Second, it presents a new sentence similarity measure that aggregates three components in a linear function: the lexical similarity LexSim, based on common words; the semantic similarity SemSim, based on synonymous words; and the syntactico-semantic similarity SynSemSim, based on common semantic arguments, notably thematic role and semantic class. For the word-based semantic similarity, a measure is computed to estimate the degree of semantic relatedness between words by exploiting the WordNet "is a" taxonomy. The semantic arguments are determined using the VerbNet database. On Li's benchmark, the proposed method yielded results competitive with previously proposed measures and showed a high correlation with human ratings. Furthermore, in experiments on the Microsoft Paraphrase Corpus, it achieved the best F-measure values among the compared measures for high similarity thresholds.
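The linear aggregation described above can be illustrated with a minimal sketch. It assumes the three component scores have already been computed and lie in [0, 1]; the function name, the equal default weights and the constraint that the weights sum to 1 are illustrative assumptions, not values taken from this paper.

```python
def combine_similarity(lex_sim, sem_sim, syn_sem_sim,
                       weights=(1 / 3, 1 / 3, 1 / 3)):
    """Linearly combine LexSim, SemSim and SynSemSim into one score.

    The weights are hypothetical placeholders; the paper's actual
    coefficients are not stated in the abstract.
    """
    w_lex, w_sem, w_syn = weights
    # Assumption: weights are normalized so the result stays in [0, 1].
    if abs((w_lex + w_sem + w_syn) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w_lex * lex_sim + w_sem * sem_sim + w_syn * syn_sem_sim


if __name__ == "__main__":
    # Example with made-up component scores and made-up weights.
    print(combine_similarity(0.5, 0.8, 0.2, weights=(0.2, 0.5, 0.3)))
```

Keeping the weights as a parameter makes it possible to tune the relative contribution of each component on a development set.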
Keywords
Sentence similarity, syntactico-semantic similarity, thematic role, semantic class, WordNet, VerbNet