Text Analysis Using Different Graph-Based Representations

Esteban Castillo Juarez; Ofelia Cervantes Villagómez; Darnes Vilariño Ayala

doi:10.13053/cys-21-4-2551

Authors

Esteban Castillo Juarez, Universidad de las Américas Puebla (UDLAP)
Ofelia Cervantes Villagómez, Universidad de las Américas Puebla (UDLAP)
Darnes Vilariño Ayala, Benemérita Universidad Autónoma de Puebla (BUAP)

DOI:

https://doi.org/10.13053/cys-21-4-2551

Keywords:

Text modeling, graph-based representation, co-occurrence graphs, text classification, feature-vector approach, graph similarity approach

Abstract

This paper presents an overview of different graph-based representations proposed to solve text classification tasks. The core of this manuscript is to highlight the importance of enriched/non-enriched co-occurrence graphs as an alternative to traditional feature representation models such as the vector representation, which most of the time cannot capture all the richness of text documents that come from the web (social media, blogs, personal web pages, news, etc.). For each text classification task, the type of graph created and the benefits of using it are presented and discussed. Specifically, the type of features/patterns extracted, the implemented classification/similarity methods, and the results obtained on datasets are explained. The theoretical and practical implications of using co-occurrence graphs are also discussed, pointing out the contributions and challenges of modeling text documents as graphs.
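
As a rough illustration of the idea behind a non-enriched co-occurrence graph, the following Python sketch links words that appear close to each other in a text and then compares two such graphs by edge overlap. It is not the authors' method: the whitespace tokenizer, the `window` parameter, and the function names `cooccurrence_graph` and `edge_jaccard` are assumptions made only for this example.

```python
from collections import Counter


def cooccurrence_graph(text, window=2):
    """Build an undirected, weighted word co-occurrence graph.

    Nodes are lowercase whitespace-separated tokens. An edge links two
    distinct tokens that occur within `window` positions of each other,
    and its weight counts how often that happens. (Hypothetical helper.)
    """
    tokens = text.lower().split()
    edges = Counter()
    for i, left in enumerate(tokens):
        # Pair the current token with the next `window` tokens.
        for right in tokens[i + 1 : i + window + 1]:
            if left != right:
                # Sort the pair so (a, b) and (b, a) are the same edge.
                edges[tuple(sorted((left, right)))] += 1
    return edges


def edge_jaccard(g1, g2):
    """Jaccard overlap of two graphs' edge sets (weights are ignored)."""
    e1, e2 = set(g1), set(g2)
    union = e1 | e2
    return len(e1 & e2) / len(union) if union else 0.0


if __name__ == "__main__":
    g_a = cooccurrence_graph("the cat sat on the mat", window=1)
    g_b = cooccurrence_graph("the cat ran", window=1)
    print(sorted(g_a.items()))
    print(edge_jaccard(g_a, g_b))
```

An enriched graph in the sense of the abstract would attach further information to nodes or edges; this sketch keeps only words and co-occurrence counts, and the edge-overlap score is just one simple stand-in for a graph similarity measure.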

Author Biographies

Esteban Castillo Juarez, Universidad de las Américas Puebla (UDLAP)

Esteban Castillo is a Computer Science Ph.D. student at the Universidad de las Américas Puebla (UDLAP), Mexico. He obtained his Master's degree from the Benemérita Universidad Autónoma de Puebla (BUAP) in 2012. His research interests include Natural Language Processing, Data Mining, Machine Learning, Graph Theory, Data Science, and computational linguistics in general.

Ofelia Cervantes Villagómez, Universidad de las Américas Puebla (UDLAP)

Ofelia Cervantes is a full-time professor of computer science in the Department of Computing, Electronics, and Mechatronics at Universidad de las Américas Puebla (UDLAP), Mexico. She obtained her PhD from the Institut National Polytechnique á Grenoble, France, in 1988. Her research interests include database systems, social network analysis, Natural Language Processing, information retrieval, and Big Data analytics, among others.

Darnes Vilariño Ayala, Benemérita Universidad Autónoma de Puebla (BUAP)

Darnes Vilariño is a full-time professor at the Faculty of Computer Science of the Benemérita Universidad Autónoma de Puebla (BUAP). She obtained her PhD in mathematics, in the area of optimization, from the University of Havana, Cuba, in 1997. Her research interests include information retrieval, Natural Language Processing tasks, and computational linguistics in general.

Published

2017-12-23

Issue

Vol. 21 No. 4 (2017): Advances in Human Language Technologies (Guest Editor: A. Gelbukh)

Section

Articles of the Thematic Issue

License

Hereby I transfer exclusively to the Journal "Computación y Sistemas", published by the Computing Research Center (CIC-IPN), the Copyright of the aforementioned paper. I also accept that these rights will not be transferred to any other publication, in any other format, language or other existing means of developing.

I certify that the paper has not been previously disclosed or simultaneously submitted to any other publication, and that it does not contain material whose publication would violate the Copyright or other proprietary rights of any person, company or institution. I certify that I have the permission from the institution or company where I work or study to publish this work.

The representative author accepts the responsibility for the publication of this paper on behalf of each and every one of the authors.

This transfer is subject to the following conditions:

The authors retain all ownership rights (such as patent rights) of this work, except for the publishing rights transferred to the CIC through this document.
Authors retain the right to publish the work, in whole or in part, in any book of which they are the authors or publishers. They can also make use of this work in conferences, courses, personal web pages, and so on.
Authors may include the work as part of their thesis, for non-profit distribution only.
