Segmentation Strategies to Face Morphology Challenges in Brazilian-Portuguese/English Statistical Machine Translation and Its Integration in Cross-Language Information Retrieval

Authors

  • Marta Ruiz Costa-jussà, Institute of Mathematics and Statistics

DOI:

https://doi.org/10.13053/cys-19-2-1550

Keywords:

Morphology, factored-based machine translation, cross-language information retrieval

Abstract

The use of morphology is particularly interesting in statistical machine translation because it can reduce data sparseness and compensate for a lack of training data. In this work, we propose several approaches for introducing morphological knowledge into a standard phrase-based machine translation system. We segment words using two different tools (COGROO and MORFESSOR), which reduces both the vocabulary size and data sparseness. We then add morphological information to these segmentations by means of a POS language model. We combine all these approaches using a Minimum Bayes Risk strategy. Experiments on the Brazilian-Portuguese/English language pair show significant improvements of the enhanced system over the baseline system. Finally, we report a case study of the impact of enhancing the statistical machine translation system with morphology in a cross-language application such as ONAIR, which allows users to search for information in video fragments through natural-language queries.
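The abstract does not specify how the Minimum Bayes Risk combination is configured. As a generic illustration only, MBR selection picks, from the hypotheses produced by several systems, the one with the highest expected similarity to all the others, weighted by each hypothesis's posterior. The sketch below assumes smoothed sentence-level BLEU as the gain function and a softmax over system scores as posteriors. The names `sentence_bleu` and `mbr_select` and the toy sentences are hypothetical and do not reflect the paper's actual setup.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp, ref, max_n=4):
    """Sentence-level BLEU with add-one smoothing on n-gram precisions (assumed gain)."""
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngram_counts(hyp, n), ngram_counts(ref, n)
        matches = sum(min(c, r[g]) for g, c in h.items())  # clipped matches
        total = max(len(hyp) - n + 1, 0)
        log_prec += math.log((matches + 1) / (total + 1))
    # Brevity penalty: 1 if hyp is at least as long as ref, else exp(1 - r/c).
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return bp * math.exp(log_prec / max_n)


def mbr_select(hypotheses, scores):
    """Return the hypothesis with maximum expected BLEU against all hypotheses."""
    # Softmax over (log-domain) system scores gives hypothesis posteriors.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    posteriors = [w / z for w in weights]

    best, best_gain = None, -1.0
    for cand in hypotheses:
        # Expected gain of choosing cand, treating each hypothesis as a pseudo-reference.
        gain = sum(p * sentence_bleu(cand.split(), ref.split())
                   for ref, p in zip(hypotheses, posteriors))
        if gain > best_gain:
            best, best_gain = cand, gain
    return best


# Hypothetical outputs of three systems (e.g. baseline and two segmented variants).
hyps = ["the house is big", "the house is big", "a house big"]
scores = [-1.0, -1.2, -0.5]
print(mbr_select(hyps, scores))
```

In this toy input, the third system has the highest individual score, yet MBR prefers the translation on which the other two systems agree. This consensus effect is what makes MBR attractive for combining complementary systems.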

Author Biography

Marta Ruiz Costa-jussà, Institute of Mathematics and Statistics

Marta R. Costa-jussà is a Telecommunication Engineer from the Universitat Politècnica de Catalunya (UPC, Barcelona). She received her Ph.D. from the UPC in 2008. Her research experience is mainly in Machine Translation (MT); she also has experience in Automatic Speech Recognition (ASR) and Information Retrieval (IR). She has worked at LIMSI-CNRS (Paris), Universitat Politècnica de Catalunya (Barcelona), Universitat Pompeu Fabra (Barcelona), Barcelona Media Innovation Center (Barcelona), Universidade de São Paulo (São Paulo), Institute for Infocomm Research (Singapore) and Instituto Politécnico Nacional (Mexico). She has received prestigious and competitive fellowships such as Formación del Personal Universitario (FPU) and Juan de la Cierva (from the Spanish Government), BE-DGR (Grants for Abroad Research, from Catalonia), FAPESP Visiting Professor (from the São Paulo Research Foundation) and an IOF Marie Curie (from the European Commission). She has participated in 12 European and national (Spanish, French, and Brazilian) projects. She has organized 5 conferences/workshops in the areas of MT and IR, taught several tutorials and seminars, given more than 20 invited talks, and published over 90 papers in international scientific journals and conferences, receiving several awards. She has collaborated with companies (TaUYou, UniversalDoctor and BMMT) as a consultant.

Published

2015-06-01