Automatic Readability Classification of Crowd-Sourced Data based on Linguistic and Information-Theoretic Features

Authors

  • Zahurul Islam, AG Texttechnology, Institut für Informatik
  • Alexander Mehler, Goethe-Universität, Frankfurt, Germany

DOI:

https://doi.org/10.13053/cys-17-2-1516

Keywords:

Text readability, Wikipedia, entropy, information transmission, evaluation of features.

Abstract

This paper presents a classifier of text readability based on information-theoretic features. The classifier was developed on the basis of a linguistic approach to readability that explores lexical, syntactic, and semantic features. For this evaluation, we extracted a corpus of 645 articles from Wikipedia together with their quality judgments. We show that information-theoretic features perform as well as their linguistic counterparts, even when several linguistic levels are explored at once.
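As a rough illustration of what an information-theoretic text feature can look like, the sketch below computes the Shannon entropy (in bits) of a text's unigram word distribution. This is an assumption chosen for illustration only: the abstract does not specify the paper's actual feature set, and the function name `word_entropy` and the naive lowercase whitespace tokenization are hypothetical.

```python
import math
from collections import Counter


def word_entropy(text: str) -> float:
    """Hypothetical feature: Shannon entropy (bits) of the unigram word distribution.

    Tokenization is a naive lowercase whitespace split, assumed for illustration.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    n = len(tokens)
    # H = sum over word types w of p(w) * log2(1 / p(w)), with p(w) = count / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    print(word_entropy("the cat saw the dog"))
```

Under this assumption, a text that repeats a single word scores 0 bits, while a text whose words are all distinct scores log2 of its length, so higher values indicate a less predictable word distribution. Such a value would be one column in a feature vector passed to a classifier, not a readability score by itself.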

Published

2013-06-29