Text Analysis Using Different Graph-Based Representations

Authors

  • Esteban Castillo Juarez, Universidad de las Américas Puebla (UDLAP)
  • Ofelia Cervantes Villagómez, Universidad de las Américas Puebla (UDLAP)
  • Darnes Vilariño Ayala, Benemérita Universidad Autónoma de Puebla (BUAP)

DOI:

https://doi.org/10.13053/cys-21-4-2551

Keywords:

Text modeling, graph-based representation, co-occurrence graphs, text classification, feature-vector approach, graph similarity approach

Abstract

This paper presents an overview of different graph-based representations proposed to solve text classification tasks. The main goal of this manuscript is to highlight the importance of enriched and non-enriched co-occurrence graphs as an alternative to traditional feature representation models such as the vector representation, which often cannot capture all the richness of text documents that come from the web (social media, blogs, personal web pages, news, etc.). For each text classification task, the type of graph created and the benefits of using it are presented and discussed. Specifically, the paper explains the type of features/patterns extracted, the classification/similarity methods implemented, and the results obtained on the datasets. The theoretical and practical implications of using co-occurrence graphs are also discussed, pointing out the contributions and challenges of modeling text documents as graphs.

Author Biographies

Esteban Castillo Juarez, Universidad de las Américas Puebla (UDLAP)

Esteban Castillo is a Computer Science Ph.D. student at the Universidad de las Américas Puebla (UDLAP), Mexico. He obtained his Master's degree from the Benemérita Universidad Autónoma de Puebla (BUAP) in 2012. His research interests include Natural Language Processing, Data Mining, Machine Learning, Graph Theory, Data Science, and computational linguistics in general.

Ofelia Cervantes Villagómez, Universidad de las Américas Puebla (UDLAP)

Ofelia Cervantes is a full-time professor of computer science in the Department of Computing, Electronics, and Mechatronics at the Universidad de las Américas Puebla (UDLAP), Mexico. She obtained her PhD from the Institut National Polytechnique de Grenoble, France, in 1988. Her research interests include database systems, social network analysis, Natural Language Processing, information retrieval, and Big Data analytics, among others.

Darnes Vilariño Ayala, Benemérita Universidad Autónoma de Puebla (BUAP)

Darnes Vilariño is a full-time professor at the Faculty of Computer Science of the Benemérita Universidad Autónoma de Puebla (BUAP). She obtained her PhD in mathematics, in the area of optimization, from the University of Havana, Cuba, in 1997. Her research interests include information retrieval, Natural Language Processing tasks, and computational linguistics in general.


Published

2017-12-23