Identification of Verbal Phraseological Units in Mexican News Stories

Authors

  • Belém Priego Sánchez Université Paris 13
  • David Pinto Benemérita Universidad Autónoma de Puebla

DOI:

https://doi.org/10.13053/cys-19-4-2328

Keywords:

Verbal phraseological units, supervised machine learning, lexicon

Abstract

Verbal phraseological units are phrases made up of two or more words in which at least one of the words is a verb that plays the role of the predicate. One of the characteristics of this type of expression is that its global meaning can rarely be deduced from the meaning of its components. The automatic recognition of this type of linguistic structure is an important task, since such units are a standard way of expressing a concept or idea. In this paper we present the results obtained when different supervised machine learning methods are employed to determine whether or not a verbal phraseological unit is present in a given newspaper story. The experiments were carried out using a supervised corpus of news stories written in Mexican Spanish. Besides the results of these experiments, we provide access to a new lexicon with phrases as entries (instead of single words), in which each entry is associated with a real value (normalized between zero and one) indicating its probability of being a verbal phraseological unit.

Author Biographies

Belém Priego Sánchez, Université Paris 13

obtained her Master's degree in Computer Science from the Benemérita Universidad Autónoma de Puebla, México, in 2012. She is currently a Ph.D. student at the LDI laboratory of the University of Paris XIII, France, and collaborates closely with the LKE research group of the BUAP in Mexico. Her areas of interest include computer science, phraseology, lexical acquisition of multiword expressions (MWEs) for natural language processing applications, corpus linguistics, and computational linguistics in general.

David Pinto, Benemérita Universidad Autónoma de Puebla

obtained his Ph.D. in Computer Science in the area of Artificial Intelligence and Pattern Recognition from the Polytechnic University of Valencia, Spain, in 2008. He is currently a full-time professor at the Faculty of Computer Science of the Benemérita Universidad Autónoma de Puebla, where he leads the Language & Knowledge Engineering Research Group. His areas of interest include clustering, information retrieval, cross-lingual NLP tasks, and computational linguistics in general.

Published

2015-12-18