Harnessing Uncleaned Data for Stress Detection in Tamil and Telugu Code-Mixed Texts

doi:10.13053/cys-29-3-5499

Luis Ramos, Moein Shahiki-Tash, Zahra Ahani, Alex Eponon, Olga Kolesnikova, Hiram Calvo

Abstract

Stress is a common part of daily life, but in some situations it can significantly harm mental well-being, which makes robust detection models necessary. This proposal introduces a systematic approach to stress detection in code-mixed texts in Dravidian languages. The challenge comprised two datasets, one for Tamil and one for Telugu. The proposal underscores the importance of testing uncleaned text, that is, text from which emojis, special characters, and similar elements have not been removed, in classification methodologies. Logistic Regression, Random Forest, and Support Vector Machine algorithms were evaluated with three textual representations: TF-IDF, word N-grams, and character N-grams. The approach performed strongly in both languages, achieving a Macro F1-score of 0.75 for Tamil and 0.74 for Telugu and surpassing results obtained with more complex techniques involving LLMs. The results underscore the value of uncleaned text for mental state detection and the challenges of classifying code-mixed texts in Dravidian languages, indicating that there is further potential to explore, especially for Tamil and Telugu texts.
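As an illustration of the kind of pipeline the abstract describes, the following is a minimal sketch, assuming scikit-learn, that combines TF-IDF-weighted word and character N-grams over uncleaned text and feeds them to Logistic Regression. The toy sentences, the label names, the N-gram ranges, and all hyperparameters are hypothetical placeholders, not the authors' data or configuration; Random Forest or a Support Vector Machine could be swapped in for the classifier in the same way.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import FeatureUnion, Pipeline

# Hypothetical toy examples standing in for code-mixed posts.
# Emojis and punctuation are deliberately left in (no cleaning step).
texts = [
    "romba tension ah iruku 😞 !!",
    "exam pressure too much 😞😞 ...",
    "chala happy ga undi 😊",
    "inniki semma jolly day 😊 !!",
]
labels = ["stressed", "stressed", "non-stressed", "non-stressed"]

# Word and character N-grams, both TF-IDF weighted.
# token_pattern=r"\S+" keeps emojis and symbols as word tokens;
# lowercase=False avoids any normalisation of the raw text.
features = FeatureUnion([
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2),
                             token_pattern=r"\S+", lowercase=False)),
    ("char", TfidfVectorizer(analyzer="char", ngram_range=(1, 3),
                             lowercase=False)),
])

model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])

model.fit(texts, labels)
preds = model.predict(texts)

# Macro F1 averages the per-class F1 scores with equal weight.
# Computed here on the training data only to show the call; a real
# evaluation would use a held-out split.
macro_f1 = f1_score(labels, preds, average="macro")
print(f"Macro F1 on toy data: {macro_f1:.2f}")
```

Because no cleaning is applied, emojis and special characters enter the vocabulary as ordinary features, which is the property the abstract argues is worth testing.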

Keywords

Stress, NLP, Machine Learning, LLM, SMOTE, Code-Mixed, Tamil, Telugu
