Automatic Opinion Extraction from Short Hebrew Texts using Machine Learning Techniques

Authors

  • Dror Mughaz Department of Computer Science, Jerusalem College of Technology, Jerusalem
  • Tzeviya Fuchs Department of Computer Science, Bar-Ilan University, Ramat-Gan
  • Dan Bouhnik Department of Computer Science, Jerusalem College of Technology, Jerusalem

DOI:

https://doi.org/10.13053/cys-22-4-3071

Keywords:

Automatic classification, machine learning, sentiment analysis

Abstract

Sentiment analysis deals with classifying written texts according to their polarity. Previous research on this topic has been conducted mostly for Latin languages, and no research has been done for Hebrew. This matters because text classification turns out to be extremely language-dependent. Furthermore, work on sentiment analysis of English texts has mostly been performed on relatively long documents. In this work, we focus specifically on classifying Modern Hebrew sentences according to their polarity. We compare various machine learning algorithms and classification techniques. We added optimizations and methods that have not been used before, and adapted commonly used techniques to suit a Hebrew corpus. We elaborate on the differences between classifying short and long texts, and on the unique aspects of working specifically with Hebrew. Finally, our model achieved nearly 93% accuracy, which is higher than the accuracies previously reported in this field.
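To make the task concrete, the following is a minimal sketch of sentence-level polarity classification for Hebrew. It is not the authors' method: the abstract does not name specific algorithms. It uses a multinomial Naive Bayes classifier, a common baseline for short-text sentiment, together with a hypothetical Hebrew adjustment that strips one leading prefix letter (ו, ה, ב, ל, מ, ש, כ) from each word. The names `tokenize`, `NaiveBayes`, `accuracy`, `PREFIXES`, and the toy `TRAIN` sentences are all illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption (hypothetical heuristic, not from the paper): common one-letter
# Hebrew proclitics: vav "and", he "the", bet "in", lamed "to",
# mem "from", shin "that", kaf "as".
PREFIXES = set("והבלמשכ")
# Runs of Hebrew letters (U+05D0..U+05EA, final forms included) or Latin letters.
TOKEN_RE = re.compile(r"[\u05d0-\u05ea]+|[A-Za-z]+")


def tokenize(sentence, min_len=3):
    """Split a sentence into words, stripping at most one leading prefix
    letter when at least `min_len` letters would remain."""
    tokens = []
    for tok in TOKEN_RE.findall(sentence):
        if tok[0] in PREFIXES and len(tok) - 1 >= min_len:
            tok = tok[1:]
        tokens.append(tok)
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # Total number of word tokens seen per class.
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, sentence):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):  # sorted for deterministic ties
            score = math.log(self.class_counts[label] / n_docs)  # log prior
            for tok in tokenize(sentence):
                if tok not in self.vocab:
                    continue  # words never seen in training are ignored
                count = self.word_counts[label][tok]
                score += math.log((count + self.alpha) / (self.totals[label] + self.alpha * v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, sentences, labels):
    """Fraction of sentences whose predicted label matches the gold label."""
    correct = sum(model.predict(s) == y for s, y in zip(sentences, labels))
    return correct / len(labels)


# Toy, invented examples (not the paper's corpus).
TRAIN = [
    ("הסרט היה מצוין", "pos"),     # "the film was excellent"
    ("שירות נהדר ואדיב", "pos"),   # "great and polite service"
    ("הסרט היה גרוע", "neg"),      # "the film was bad"
    ("שירות איטי ומאכזב", "neg"),  # "slow and disappointing service"
]

if __name__ == "__main__":
    xs, ys = zip(*TRAIN)
    model = NaiveBayes().fit(xs, ys)
    print(model.predict("אוכל מצוין"))  # "excellent food"
    print(accuracy(model, ["אוכל מצוין", "שירות גרוע"], ["pos", "neg"]))
```

Short sentences give the classifier only a handful of tokens, so each word's evidence weighs heavily and unseen words contribute nothing. The naive prefix rule is applied identically at training and prediction time, but it also truncates words whose first letter is not actually a prefix (here, שירות becomes ירות); a morphological analyzer would avoid this.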

Published

2018-12-30