Improved Statistical Machine Translation by Cross-Linguistic Projection of Named Entities Recognition and Translation
Abstract
One of the existing difficulties in natural language processing applications is the lack of appropriate tools for the recognition, translation, and/or transliteration of named entities (NEs), specifically for less-resourced languages. In this paper, we propose a new method to automatically label multilingual parallel data for the Arabic-French language pair with named entity tags and to build lexicons of those named entities with their transliteration and/or translation in the target language. For this purpose, we bring in a third, well-resourced language, English, to serve as a pivot in order to build an Arabic-French NE translation lexicon. Evaluations on the Arabic-French language pair using English as pivot in the transitive model showed the effectiveness of the proposed method for mining Arabic-French named entities and their translations. Moreover, a statistical machine translation system integrating this component outperformed the baseline system.
Keywords
Named entity, pivot language, machine translation.