Script Independent Morphological Segmentation for Arabic Maghrebi Dialects: An Application to Machine Translation
DOI:
https://doi.org/10.13053/cys-23-3-3267
Keywords:
Arabic dialects, morphological segmentation, machine translation
Abstract
This research deals with resource creation for under-resourced languages. We adapt resources that exist for well-resourced languages to process less-resourced ones. We focus on the Arabic dialects of the Maghreb, namely Algerian, Moroccan and Tunisian. We first adapt a well-known statistical word segmenter to segment Algerian dialect texts written in both Arabic and Latin scripts, and demonstrate that unsupervised morphological segmentation can be applied to Arabic dialects regardless of the script used. Next, we use this segmentation to improve statistical machine translation scores between the three Maghrebi dialects and French, using a parallel multidialectal corpus that includes six Arabic dialects in addition to Modern Standard Arabic (MSA) and French. For word segmentation, the rate of correctly segmented words reached 70% for words written in Latin script and 79% for words written in Arabic script. For machine translation, unsupervised morphological segmentation reduced out-of-vocabulary word rates by at least 35%.
Published
2019-09-25
Section
Articles of the Thematic Issue
License
Hereby I transfer exclusively to the Journal "Computación y Sistemas", published by the Computing Research Center (CIC-IPN), the Copyright of the aforementioned paper. I also accept that these rights will not be transferred to any other publication, in any other format, language or other existing means of developing.
I certify that the paper has not been previously disclosed or simultaneously submitted to any other publication, and that it does not contain material whose publication would violate the Copyright or other proprietary rights of any person, company or institution. I certify that I have the permission from the institution or company where I work or study to publish this work.
The representative author accepts the responsibility for the publication of this paper on behalf of each and every one of the authors.
This transfer is subject to the following conditions:
- The authors retain all ownership rights (such as patent rights) of this work, except for the publishing rights transferred to the CIC, through this document.
- Authors retain the right to publish the work in whole or in part in any book of which they are the authors or publishers. They can also make use of this work in conferences, courses, personal web pages, and so on.
- Authors may include the work as part of their thesis, for non-profit distribution only.