Cross-language Plagiarism Detection Using BabelNet’s Statistical Dictionary

Marc Franco-Salvador, Parth Gupta, Paolo Rosso

Abstract


In recent years there have been important advances in the field of automatic plagiarism detection. One variant of the task is cross-language plagiarism detection, which aims to detect plagiarism between documents written in different languages. Most existing approaches to this task use statistical dictionaries to handle the translation of words in the documents. A statistical dictionary provides, for a given word, the list of its possible translations together with their respective probabilities. The objective of this paper is to analyze the performance of the statistical dictionary of the multilingual semantic network BabelNet for cross-language plagiarism detection. In the evaluation, we compare its results with those obtained using a statistical dictionary trained with the well-known IBM M1 alignment model, both using the state-of-the-art CL-ASA model as a base. The experimental results indicate that BabelNet is a good alternative as a statistical dictionary.
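The abstract describes a statistical dictionary as a mapping from a word to its candidate translations with probabilities, and CL-ASA as the model built on top of it. The Python sketch below illustrates, under stated assumptions, how such a dictionary can feed a CL-ASA-style similarity score. It assumes the formulation commonly reported for CL-ASA: a length factor multiplied by a translation score that sums the dictionary probabilities of word pairs found in the two documents. The function names, the toy dictionary entries, the use of token counts as document length, and the length-model parameters `mu` and `sigma` are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List

# Hypothetical statistical dictionary: source word -> {translation: probability}.
# Entries and probabilities are made up for illustration; they do not come
# from BabelNet or from an IBM M1 alignment.
StatDict = Dict[str, Dict[str, float]]

TOY_DICT: StatDict = {
    "casa": {"house": 0.7, "home": 0.3},
    "blanca": {"white": 0.9, "blank": 0.1},
    "perro": {"dog": 1.0},
}


def translation_score(src: List[str], tgt: List[str], dictionary: StatDict) -> float:
    """Sum p(y | x) over every distinct source word x and distinct target
    word y that the dictionary lists as a translation pair."""
    tgt_words = set(tgt)
    score = 0.0
    for x in set(src):
        for y, p in dictionary.get(x, {}).items():
            if y in tgt_words:
                score += p
    return score


def length_factor(src: List[str], tgt: List[str],
                  mu: float = 1.0, sigma: float = 0.3) -> float:
    """Gaussian penalty on the target/source length ratio (in tokens here,
    as a simplification). mu and sigma are hypothetical placeholders that
    would normally be estimated for each language pair."""
    if not src:
        return 0.0
    ratio = len(tgt) / len(src)
    return math.exp(-0.5 * ((ratio - mu) / sigma) ** 2)


def cl_asa_similarity(src: List[str], tgt: List[str], dictionary: StatDict,
                      mu: float = 1.0, sigma: float = 0.3) -> float:
    """CL-ASA-style score: length factor times translation score."""
    return length_factor(src, tgt, mu, sigma) * translation_score(src, tgt, dictionary)


if __name__ == "__main__":
    spanish = ["casa", "blanca"]
    english = ["white", "house"]
    print(cl_asa_similarity(spanish, english, TOY_DICT))
```

In this sketch, comparing two dictionaries (for example one derived from BabelNet and one trained with IBM M1) only requires passing a different `dictionary` argument while the rest of the scoring stays fixed, which mirrors the kind of comparison the abstract describes.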


Keywords


Cross-language plagiarism detection, textual similarity, statistical dictionary, BabelNet.

Full Text: PDF (Spanish)