Segmentation Strategies to Face Morphology Challenges in Brazilian-Portuguese/English Statistical Machine Translation and Its Integration in Cross-Language Information Retrieval

Authors

  • Marta Ruiz Costa-jussà, Institute for Infocomm Research

DOI:

https://doi.org/10.13053/cys-19-2-1550

Keywords:

Morphology, factored-based machine translation, cross-language information retrieval

Abstract

The use of morphology is particularly interesting in statistical machine translation as a way to reduce data sparseness and compensate for a lack of training data. In this work, we propose several approaches to introduce morphological knowledge into a standard phrase-based machine translation system. We segment words using two different tools (COGROO and MORFESSOR), which reduces both the vocabulary size and data sparseness. We then add the morphological information of a POS language model to these segmentations, and combine all of these approaches using a Minimum Bayes Risk strategy. Experiments show significant improvements of the enhanced system over the baseline system on the Brazilian-Portuguese/English language pair. Finally, we report a case study on the impact of enhancing the statistical machine translation system with morphology in a cross-language application, ONAIR, which allows users to search for information in video fragments through natural-language queries.

Author Biography

Marta Ruiz Costa-jussà, Institute for Infocomm Research

Marta R. Costa-jussà holds a Telecommunication Engineering degree from the Universitat Politècnica de Catalunya (UPC, Barcelona). She received her Ph.D. from the UPC in 2008. Her research experience is mainly in Machine Translation (MT); she also has experience in Automatic Speech Recognition (ASR) and Information Retrieval (IR). She has worked at LIMSI-CNRS (Paris), Universitat Politècnica de Catalunya (Barcelona), Universitat Pompeu Fabra (Barcelona), Barcelona Media Innovation Center (Barcelona), Universidade de São Paulo (São Paulo), Institute for Infocomm Research (Singapore) and Instituto Politécnico Nacional (Mexico). She has received prestigious and competitive fellowships such as Formación del Personal Universitario (FPU) and Juan de la Cierva (from the Spanish Government), BE-DGR (Grants for Abroad Research, from Catalonia), FAPESP Visiting Professor (from the São Paulo research foundation) and an IOF Marie Curie (from the European Commission). She has participated in 12 European and national (Spanish, French, and Brazilian) projects. She has organized 5 conferences/workshops in the areas of MT and IR, taught several tutorials and seminars, given more than 20 invited talks, and published over 90 papers in international scientific journals and conferences, receiving several awards. She has also worked with companies (TaUYou, UniversalDoctor and BMMT) as a consultant.

Published

2015-06-01