A Graph-based Approach to Text Genre Analysis

Authors

  • Ahmed Ragab Nabhan Fayoum University - Sears Holdings (USA)
  • Khaled Shaalan The British University in Dubai - University of Edinburgh (UK)

DOI:

https://doi.org/10.13053/cys-20-3-2471

Keywords:

word graphs, genre analysis, topological features

Abstract

Genre characterization can be achieved by a variety of methods that employ lexical, syntactic, and presentation features of text to highlight key domain differences and stylistic preferences. However, these traditional methods cannot uncover some important macro-structural features that are embedded in text. Representing text as a word graph enables effective frameworks for analyzing and identifying key topological features that characterize genres of text. In this study, we investigated graph features such as clustering coefficients, centralization, diameter, and average path lengths for eight text genres. The findings indicated key patterns that vary from one genre to another according to the stylistic differences in text. Furthermore, evidence of subgenres was found through some graph features such as the number of connected components and node heterogeneity.
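
The graph features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the word graph is built, so the sketch assumes an undirected graph in which words adjacent within a sentence are linked, and it uses Freeman's degree centralization as one possible reading of "centralization". All function names are hypothetical and only the Python standard library is used.

```python
from collections import deque
from itertools import combinations


def build_word_graph(sentences):
    """Assumed construction: link each word to the next word in the same sentence."""
    graph = {}
    for sentence in sentences:
        words = sentence.lower().split()
        for word in words:
            graph.setdefault(word, set())
        for a, b in zip(words, words[1:]):
            if a != b:  # ignore self-loops from repeated words
                graph[a].add(b)
                graph[b].add(a)
    return graph


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def connected_components(graph):
    """List of node sets, one per connected component."""
    seen, components = set(), []
    for node in graph:
        if node not in seen:
            component = set(bfs_distances(graph, node))
            seen |= component
            components.append(component)
    return components


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for neighbors in graph.values():
        k = len(neighbors)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph) if graph else 0.0


def degree_centralization(graph):
    """Freeman degree centralization: 0 for a regular graph, 1 for a star."""
    n = len(graph)
    if n < 3:
        return 0.0
    degrees = [len(neighbors) for neighbors in graph.values()]
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


def path_stats(graph, component):
    """Diameter and average shortest path length inside one component."""
    longest, total, pairs = 0, 0, 0
    for node in component:
        for other, d in bfs_distances(graph, node).items():
            if other != node:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, (total / pairs if pairs else 0.0)


if __name__ == "__main__":
    g = build_word_graph(["the cat sat on the mat", "dogs bark"])
    components = connected_components(g)
    largest = max(components, key=len)
    print("components:", len(components))
    print("avg clustering:", average_clustering(g))
    print("centralization:", degree_centralization(g))
    print("diameter, avg path length:", path_stats(g, largest))
```

Diameter and average path length are computed on a single component because shortest paths are undefined between disconnected nodes. Comparing genres would mean building one such graph per genre corpus and comparing the resulting feature values.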

Author Biographies

Ahmed Ragab Nabhan, Fayoum University - Sears Holdings (USA)

Has a PhD in Computer Science from the University of Vermont, USA. He is a senior software engineer with Sears Holdings Corporation, specializing in information retrieval. He is also a lecturer in Computer Science at the Faculty of Computers and Information, Fayoum University, Egypt. Dr. Nabhan’s research is focused on graph data mining, computational biology, complex networks, and statistical natural language processing.

Khaled Shaalan, The British University in Dubai - University of Edinburgh (UK)

Is a full professor of Computer Science at the British University in Dubai (BUiD), UAE, an Honorary Fellow at the School of Informatics, University of Edinburgh (UoE), UK, and a tenured full professor of Computer Science and Information (on secondment) at the Faculty of Computers and Information (FCI), Cairo University (CU), Egypt. Recently, Prof. Shaalan has been contributing to a wide range of research topics in Arabic Natural Language Processing, including machine translation, parsing, spelling and grammatical checking, named entity recognition, and diacritization. He has published over 100 refereed publications, and his Google Scholar h-index is 20. Prof. Shaalan has actively and extensively supported the local and international academic community. He is the founder and Co-Chair of the International Conference on Arabic Computational Linguistics (ACLing).

Published

2016-09-30