Prediction of Lexical-Semantic Relations in Belarusian: From Word2Vec to LLM
Abstract
This paper examines the ability of language models to capture semantic relations between words in a low-resource language. We describe experiments on the automatic prediction of lexical-semantic relations in Belarusian using models of the Word2Vec, BERT, and LLM families, which differ in neural architecture, feature types, and NLP applications. Training and fine-tuning of the models were carried out on datasets compiled for our study: Belarusian corpora with UD POS tagging and a database of synonyms and antonyms extracted from Belarusian dictionaries. Model performance was evaluated by a pseudo-disambiguation test (Word2Vec CBOW and skip-gram models) as well as by expert assessment (roberta-small-Belarusian, Gemini 2.5 Pro). The results proved valid and can be applied to creating and enriching lexical databases, analysing word co-occurrence, and improving machine translation, paraphrasing, summarization, and other systems for automatic processing of the Belarusian language.
Keywords
Lexical-semantic relations, Belarusian language, Word2Vec, BERT, large language models
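
The abstract mentions a pseudo-disambiguation test for the Word2Vec models. The following is a minimal sketch of how such a test is commonly run, assuming a gensim Word2Vec model trained on a Belarusian corpus and a list of true (word, synonym) pairs; the file name, pair format, and example pair are hypothetical, not the authors' actual setup.

import random
from gensim.models import Word2Vec

# Hypothetical path to a Word2Vec model trained on a Belarusian corpus.
model = Word2Vec.load("word2vec_belarusian.model")

def pseudo_disambiguation_accuracy(pairs, vocab, seed=0):
    """For each true pair (w1, w2), sample a random confounder w' and
    count how often sim(w1, w2) > sim(w1, w')."""
    rng = random.Random(seed)
    correct = total = 0
    for w1, w2 in pairs:
        if w1 not in model.wv or w2 not in model.wv:
            continue  # skip out-of-vocabulary items
        confounder = rng.choice(vocab)
        if confounder in (w1, w2) or confounder not in model.wv:
            continue
        total += 1
        # The model passes a trial if the true pair is more similar
        # than the randomly paired confounder.
        if model.wv.similarity(w1, w2) > model.wv.similarity(w1, confounder):
            correct += 1
    return correct / total if total else 0.0

vocab = list(model.wv.index_to_key)
pairs = [("вялікі", "буйны")]  # illustrative synonym pair only
print(pseudo_disambiguation_accuracy(pairs, vocab))

Accuracy on such trials approximates how reliably the embedding space ranks genuinely related words above random ones, which is why it suits evaluation without labelled sense data.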