The Impact of Training Methods on the Development of Pre-trained Language Models

Authors

  • Diego Uribe, TecNM Instituto Tecnológico de La Laguna
  • Enrique Cuan
  • Elisa Urquizo

DOI:

https://doi.org/10.13053/cys-28-1-4718

Keywords:

language models, pre-training tasks, BERT, fine-tuning

Abstract

The focus of this work is to analyze the implications of pre-training tasks in the development of language models for learning linguistic representations. In particular, we study three pre-trained BERT models and their corresponding unsupervised training tasks (e.g., masked language modeling (MLM) and distillation). To examine similarities and differences, we fine-tune these language representation models on the classification of four different categories of short answer responses. The fine-tuning process is implemented with two different neural architectures: one with just a single additional output layer and one with a multilayer perceptron. In this way, we enrich the comparison of the pre-trained BERT models from three perspectives: the pre-training tasks involved in the development of the language models, the fine-tuning process with different neural architectures, and the computational cost demanded by the classification of short answer responses.
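The two fine-tuning configurations described above (a single additional output layer versus a multilayer perceptron on top of the encoder) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the checkpoint name `bert-base-uncased`, the MLP hidden size, the dropout rate, and the four-class label space are placeholder assumptions.

```python
# Illustrative sketch of the two fine-tuning heads compared in the paper.
# Assumes the Hugging Face `transformers` and `torch` packages; checkpoint,
# layer sizes, and number of labels are placeholders, not the authors' settings.
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertClassifier(nn.Module):
    def __init__(self, checkpoint="bert-base-uncased", num_labels=4, use_mlp_head=False):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        hidden = self.encoder.config.hidden_size
        if use_mlp_head:
            # Multilayer-perceptron head: hidden layer and non-linearity before the output layer.
            self.head = nn.Sequential(
                nn.Linear(hidden, 256),
                nn.ReLU(),
                nn.Dropout(0.1),
                nn.Linear(256, num_labels),
            )
        else:
            # Single additional output layer on top of the [CLS] representation.
            self.head = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = outputs.last_hidden_state[:, 0]  # [CLS] token representation
        return self.head(cls)

# Usage example with a single short answer response.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["a short answer response"], return_tensors="pt",
                  padding=True, truncation=True)
model = BertClassifier(use_mlp_head=True)
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.shape)  # torch.Size([1, 4])
```

In both configurations the encoder weights are updated during fine-tuning; the only difference is the capacity of the classification head, which is one of the factors the paper weighs against classification accuracy and computational cost.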

Published

2024-03-20

Section

Articles