Sentiment Analysis with Khasi Low-Resource Language through Generation of Sentiment Words using Machine Learning

Banteilang Mukhim, Arnab Kumar Maji, Sufal Das

Abstract


Sentiment analysis is a Natural Language Processing (NLP) technique that identifies the opinion expressed in text and classifies it by polarity (e.g., positive, negative, or neutral). Khasi NLP is only beginning to take shape and lags well behind the work done for some other Indian languages. Sentiment analysis for a low-resource language is challenging because little annotated data is available. The proposed method employs machine translation for the Khasi-English language pair to extract emotion-carrying words from Khasi text using an English emotion word dictionary. Despite the lack of dedicated sentiment analysis resources for Khasi, this approach enables the identification of sentiment-bearing phrases. After the Khasi sentiment words are generated, a transformer-based model is used for sentiment analysis as a validation tool.
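The word-generation step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `TOY_GLOSSARY`, `ENGLISH_EMOTION_LEXICON`, `translate_word`, and `generate_sentiment_words` are hypothetical, the glossary is a toy stand-in for a real Khasi-English machine translation system, and its entries are illustrative assumptions rather than data from the paper.

```python
from typing import Callable, Dict, List

# ASSUMPTION: toy stand-in for a Khasi-to-English machine translation system.
# The entries are illustrative only and are not taken from the paper.
TOY_GLOSSARY: Dict[str, str] = {
    "bha": "good",
    "sniew": "bad",
    "kmen": "happy",
    "briew": "person",
}

# ASSUMPTION: tiny excerpt standing in for an English emotion word dictionary,
# mapping each English word to a polarity label.
ENGLISH_EMOTION_LEXICON: Dict[str, str] = {
    "good": "positive",
    "happy": "positive",
    "bad": "negative",
}


def translate_word(khasi_word: str) -> str:
    """Return the English gloss of a Khasi word, or '' if it is unknown."""
    return TOY_GLOSSARY.get(khasi_word.lower(), "")


def generate_sentiment_words(
    tokens: List[str],
    translate: Callable[[str], str] = translate_word,
    lexicon: Dict[str, str] = ENGLISH_EMOTION_LEXICON,
) -> Dict[str, str]:
    """Map each Khasi token whose translation is in the lexicon to a polarity."""
    sentiment_words: Dict[str, str] = {}
    for token in tokens:
        english = translate(token)
        polarity = lexicon.get(english)
        if polarity is not None:
            # Keep the Khasi word itself, labeled with the English word's polarity.
            sentiment_words[token.lower()] = polarity
    return sentiment_words


if __name__ == "__main__":
    print(generate_sentiment_words(["bha", "briew", "sniew"]))
```

In this sketch, words whose translation is missing from the emotion dictionary are simply dropped. In practice, a real translation system would replace `translate_word`, and the resulting word list would then be checked with a sentiment classifier, as the abstract describes.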

Keywords


Sentiment analysis, Khasi sentiment words, emotion mining, Khasi language, Khasi sentiment classification
