Improved Statistical Machine Translation by Cross-Linguistic Projection of Named Entities Recognition and Translation

Authors

  • Rahma Sellami Sfax University
  • Fatima Deffaf UQAM
  • Fatiha Sadat UQAM
  • Lamia Hadrich Belguith Sfax University

DOI:

https://doi.org/10.13053/cys-19-4-2329

Keywords:

Named entity, pivot language, machine translation.

Abstract

One of the existing difficulties in natural language processing applications is the lack of appropriate tools for the recognition, translation, and/or transliteration of named entities (NEs), specifically for less-resourced languages. In this paper, we propose a new method to automatically label multilingual parallel data for Arabic-French pair of languages with named entity tags and build lexicons of those named entities with their transliteration and/or translation in the target language. For this purpose, we bring in a third well-resourced language, English, that might serve as pivot, in order to build an Arabic-French NE Translation lexicon. Evaluations on the Arabic-French pair of languages using English as pivot in the transitive model showed the effectiveness of the proposed method for mining Arabic-French named entities and their translations. Moreover, the integration of this component in statistical machine translation outperformed the baseline system.
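The pivot idea in the abstract can be illustrated with a minimal sketch. It assumes two scored NE lexicons (Arabic-English and English-French) are already available and combines them by summing, over every shared English entry, the product of the two scores. The abstract does not state the scoring formula of the transitive model, so this combination rule, the function name `compose_via_pivot`, and the toy entries and scores are all assumptions made for illustration only.

```python
from collections import defaultdict


def compose_via_pivot(src_to_pivot, pivot_to_tgt):
    """Combine two scored lexicons through a shared pivot language.

    Assumed rule (not taken from the paper):
    score(tgt | src) = sum over pivot of score(pivot | src) * score(tgt | pivot)
    """
    src_to_tgt = {}
    for src, pivots in src_to_pivot.items():
        scores = defaultdict(float)
        for pivot, p_src_pivot in pivots.items():
            # Pivot entries missing from the second lexicon contribute nothing.
            for tgt, p_pivot_tgt in pivot_to_tgt.get(pivot, {}).items():
                scores[tgt] += p_src_pivot * p_pivot_tgt
        if scores:
            src_to_tgt[src] = dict(scores)
    return src_to_tgt


# Toy lexicons with made-up scores (hypothetical data, not from the paper).
ar_en = {
    "لندن": {"London": 1.0},
    "القاهرة": {"Cairo": 0.75, "Al-Qahira": 0.25},
}
en_fr = {
    "London": {"Londres": 1.0},
    "Cairo": {"Le Caire": 1.0},
    "Al-Qahira": {"Le Caire": 0.5, "Al-Qahira": 0.5},
}

ar_fr = compose_via_pivot(ar_en, en_fr)
for arabic, candidates in ar_fr.items():
    best = max(candidates, key=candidates.get)
    print(arabic, "->", best, candidates[best])
```

In this sketch an Arabic entity with several English renderings accumulates score for each French candidate across all of them, so a French form reachable through more than one pivot entry is ranked higher.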

Author Biographies

Rahma Sellami, Sfax University

is a doctoral student in Computer Sciences at the University of Sfax, Tunisia. She is a researcher at ANLP Research Group of MIRACL Laboratory. Her Ph.D. thesis aims to exploit comparable corpora for statistical machine translation. Her main interest focuses on Arabic language processing.

Fatima Deffaf, UQAM

is a master's student in Computer Sciences at the University of Quebec in Montreal, Canada. Her master's thesis aims to exploit parallel corpora for named entity recognition and translation.

Fatiha Sadat, UQAM

is an Associate Professor at the University of Quebec in Montreal, Canada. She received her doctoral degree in 2003 from the Computer Science Department, Nara Institute of Science and Technology, Nara, Japan. Her research includes work on cross-language information retrieval, social media analysis, multilingual ontologies, machine translation, natural language processing, morphological analysis and computational analysis of Arabic dialects. In the past, Fatiha Sadat was a researcher at the National Research Council of Canada and the National Institute of Informatics, as a post-doctoral fellow under the JSPS program (Japan Society for the Promotion of Science).

Lamia Hadrich Belguith, Sfax University

is a Professor of Computer Science at Sfax University, Tunisia, and Head of the Arabic NLP Research Group at MIRACL Laboratory. Her research interest is mainly focused on Arabic language processing and its applications. She is also interested in a number of other topics such as summarization, question answering, and Tunisian dialect processing. She has published extensively in her field.

Published

2015-12-18