A Graph-based Approach to Text Genre Analysis

Ahmed Ragab Nabhan; Khaled Shaalan

doi:10.13053/cys-20-3-2471

Authors

Ahmed Ragab Nabhan, Fayoum University - Sears Holdings (USA)
Khaled Shaalan, The British University in Dubai - University of Edinburgh (UK)

DOI:

https://doi.org/10.13053/cys-20-3-2471

Keywords:

word graphs, genre analysis, topological features

Abstract

Genre characterization can be achieved by a variety of methods that employ lexical, syntactic, and presentation features of text to highlight key domain differences and stylistic preferences. However, these traditional methods cannot uncover some important macro-structural features that are embedded in text. Representing text as a word graph enables effective frameworks for analyzing and identifying key topological features that characterize text genres. In this study, we investigated graph features such as clustering coefficients, centralization, diameter, and average path lengths for eight text genres. The findings indicated key patterns that vary from one genre to another according to the stylistic differences in text. Furthermore, evidence of subgenres was found through some graph features such as the number of connected components and node heterogeneity.
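To make the kind of features named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes an undirected word graph in which two words are linked when they appear next to each other in the text; the abstract does not specify how the graph is built, so this construction and all function names (build_word_graph, graph_features, and so on) are illustrative assumptions. Centralization and node heterogeneity are left out for brevity.

```python
from collections import deque
from itertools import combinations


def build_word_graph(text):
    """Assumed construction: nodes are lowercased tokens, edges join adjacent tokens."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    graph = {t: set() for t in tokens}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # ignore self-loops from repeated words
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return a list of node sets, one per connected component."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in graph[node]:
                if nb not in component:
                    component.add(nb)
                    queue.append(nb)
        seen |= component
        components.append(component)
    return components


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph) if graph else 0.0


def path_stats(graph, component):
    """Diameter and average shortest path length inside one component (BFS per node)."""
    diameter, total, pairs = 0, 0, 0
    for source in component:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in graph[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        diameter = max(diameter, max(dist.values()))
        total += sum(dist.values())
        pairs += len(dist) - 1
    return diameter, (total / pairs if pairs else 0.0)


def graph_features(text):
    """Collect a few topological features; path metrics use the largest component."""
    graph = build_word_graph(text)
    components = connected_components(graph)
    largest = max(components, key=len) if components else set()
    diameter, avg_path = path_stats(graph, largest)
    return {
        "nodes": len(graph),
        "components": len(components),
        "avg_clustering": average_clustering(graph),
        "diameter": diameter,
        "avg_path_length": avg_path,
    }
```

Calling graph_features on one document per genre and comparing the resulting dictionaries would give a rough, small-scale analogue of the comparison described above. Path metrics are computed on the largest connected component only, since diameter and average path length are undefined across disconnected parts.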

Author Biographies

Ahmed Ragab Nabhan, Fayoum University - Sears Holdings (USA)

Has a PhD in Computer Science from the University of Vermont, USA. He is a senior software engineer with Sears Holdings Corporation, specializing in information retrieval. He is also a lecturer in Computer Science, Faculty of Computers and Information, Fayoum University, Egypt. Dr. Nabhan’s research is focused on graph data mining, computational biology, complex networks, and statistical natural language processing.

Khaled Shaalan, The British University in Dubai - University of Edinburgh (UK)

Is a full professor of Computer Science at the British University in Dubai (BUiD), UAE, an Honorary Fellow at the School of Informatics, University of Edinburgh (UoE), UK, and a tenured full professor of Computer Science and Information (on secondment) at the Faculty of Computers and Information (FCI), Cairo University (CU), Egypt. Recently, Prof. Shaalan has been contributing to a wide range of research topics in Arabic Natural Language Processing, including machine translation, parsing, spelling and grammar checking, named entity recognition, and diacritization. He has published over 100 refereed publications, and his Google Scholar h-index is 20. Prof. Shaalan has actively and extensively supported the local and international academic community. He is the founder and Co-Chair of the International Conference on Arabic Computational Linguistics (ACLing).

Published

2016-09-30

Issue

Vol. 20 No. 3 (2016): Thematic Issue: Natural Language Processing and Computational Linguistics (Guest Editor: Alexander Gelbukh)

Section

Articles

License

Hereby I transfer exclusively to the Journal "Computación y Sistemas", published by the Computing Research Center (CIC-IPN), the Copyright of the aforementioned paper. I also accept that these rights will not be transferred to any other publication, in any other format, language or other existing means of developing.

I certify that the paper has not been previously disclosed or simultaneously submitted to any other publication, and that it does not contain material whose publication would violate the Copyright or other proprietary rights of any person, company or institution. I certify that I have the permission from the institution or company where I work or study to publish this work.

The representative author accepts the responsibility for the publication of this paper on behalf of each and every one of the authors.

This transfer is subject to the following conditions:

The authors retain all ownership rights (such as patent rights) of this work, except for the publishing rights transferred to the CIC through this document.
Authors retain the right to publish the work in whole or in part in any book of which they are the authors or publishers. They can also make use of this work in conferences, courses, personal web pages, and so on.
Authors may include the work as part of their thesis, for non-profit distribution only.
