Automatic Opinion Extraction from Short Hebrew Texts Using Machine Learning Techniques
Abstract
Sentiment analysis deals with classifying written texts according to their polarity. Previous research on this topic has focused mostly on languages written in the Latin alphabet, and no research has been done on Hebrew. This gap matters because text classification is highly language-dependent. Moreover, most sentiment-analysis work on English has targeted relatively long documents. In this work, we focus specifically on classifying Modern Hebrew sentences according to their polarity. We compare several machine learning algorithms and classification techniques. We introduce optimizations and methods that have not previously been used for this task, and adapt commonly used techniques to suit a Hebrew corpus. We discuss how classifying short texts differs from classifying long ones, and what makes working with Hebrew in particular distinctive. Finally, our model achieves nearly 93% accuracy, which is higher than the accuracies previously reported in this field.
Keywords
Automatic classification, machine learning, sentiment analysis