Tunisian Dialect Sentiment Analysis: A Natural Language Processing-based Approach
Abstract
Social media platforms have been witnessing
a significant increase in posts written in the Tunisian
dialect since the uprising in Tunisia at the end of
2010. Most of the posted tweets or comments reflect
the impressions of the Tunisian public towards major
social, economic and political events. These opinions
have been tracked, analyzed and evaluated through
sentiment analysis systems. In the current study,
we investigate the impact of several preprocessing
techniques on sentiment analysis using two sentiment
classification models: supervised and lexicon-based.
These models were trained on three Tunisian datasets
of different sizes and multiple domains. Our results
emphasize the positive impact of the preprocessing phase
on the evaluation measures of both sentiment classifiers
as the baseline was significantly outperformed when
stemming, emoji recognition and negation detection
tasks were applied. Moreover, integrating named
entities with these tasks enhanced the lexicon-based
classification performance in all datasets and that of the
supervised model in the medium- and small-sized datasets.
Keywords
Tunisian sentiment analysis; text preprocessing; named entities.