Spoken Language Identification for Short Utterance with Transfer Learning

Authors

  • Ana Montalvo-Bereau, Advanced Technologies Application Center (CENATAV)
  • Jose Ramón Calvo-de-Lara, Advanced Technologies Application Center (CENATAV)
  • Gabriel Hernández-Sierra, Advanced Technologies Application Center (CENATAV)
  • Flavio Reyes-Díaz, Advanced Technologies Application Center (CENATAV)

DOI:

https://doi.org/10.13053/cys-28-3-5180

Keywords:

Spoken language recognition, deep learning, transfer learning, multi-task learning

Abstract

Spoken language recognition is a research field that has received considerable attention due to its impact on several tasks related to multilingual speech processing. While it has been demonstrated that the use of contextual and auxiliary task information can improve results in this field, this avenue has not been fully explored. In the present work, we propose to address spoken language recognition in short utterances by using two speech-related tasks as auxiliaries in a multi-task architecture. The primary task was language recognition, with sex and speaker identity serving as auxiliary tasks. Three models from disparate approaches were implemented and trained in both single-task and multi-task learning paradigms. All the models considered were 2D-CNN based, and one of them was a proposed configuration designed for utterances shorter than one second. The experiments were conducted on a subset of the VoxForge corpus, with a markedly limited number of signals. The results demonstrate that, across the three models, the spoken language recognition task benefits from multi-task learning with sex and speaker identity as auxiliary tasks.
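
The multi-task setup described in the abstract can be illustrated with a minimal sketch of a combined training objective. The abstract does not say how the task losses are combined, so the weighted sum of per-task cross-entropies, the weight values, the class counts, and all function names below are assumptions made for illustration only. The shared 2D-CNN that would produce the logits is omitted.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-likelihood of the target class index."""
    return -math.log(softmax(logits)[target])

def multitask_loss(head_logits, targets, weights):
    """Weighted sum of per-task cross-entropies (assumed combination rule).

    Only the tasks named in `weights` contribute, so passing a single
    entry reduces this to the single-task objective.
    """
    return sum(
        weights[task] * cross_entropy(head_logits[task], targets[task])
        for task in weights
    )

# Hypothetical weights: language is the primary task, while sex and
# speaker identity are auxiliary tasks. The values are illustrative.
WEIGHTS = {"language": 1.0, "sex": 0.3, "speaker": 0.3}

# Toy logits standing in for the outputs of three classification heads
# placed on top of a shared feature extractor (not shown here).
logits = {
    "language": [2.0, 0.5, -1.0],
    "sex": [0.1, 0.4],
    "speaker": [0.0, 0.0, 0.0, 0.0],
}
targets = {"language": 0, "sex": 1, "speaker": 2}

loss = multitask_loss(logits, targets, WEIGHTS)
single_task_loss = multitask_loss(logits, targets, {"language": 1.0})
```

In this sketch the auxiliary terms only add to the training signal. At inference time only the language head would be used, which is the usual arrangement when auxiliary tasks serve to shape shared features.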

Published

2024-09-17

Issue

Section

Articles of the Thematic Section