Spoken Language Identification for Short Utterance with Transfer Learning

Ana Montalvo-Bereau, Jose Ramón Calvo-de-Lara, Gabriel Hernández-Sierra, Flavio Reyes-Díaz

Abstract


Spoken language recognition is a research field that has received considerable attention due to its impact on several tasks related to multilingual speech processing. Although it has been shown that contextual and auxiliary-task information can improve results in this field, this avenue has not been fully explored. In the present work, we address spoken language recognition on short utterances by treating two speech-related tasks as auxiliaries in a multi-task architecture: language recognition is the primary task, and sex and speaker identity serve as the auxiliary tasks. Three models from different approaches were implemented and trained under both single-task and multi-task learning paradigms. All three models are based on 2D CNNs, and one of them is a proposed configuration designed for utterances shorter than one second. The experiments were conducted on a subset of the VoxForge corpus containing a markedly limited number of recordings. The results show that, across all three models, spoken language recognition benefits from multi-task learning with sex and speaker identity as auxiliary tasks.
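To make the multi-task idea concrete, the sketch below shows one common way of combining a primary objective with auxiliary ones: a weighted sum of per-task cross-entropy losses computed from three classifier heads (language, sex, speaker). This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the weighted-sum formulation, and the default weights `w_sex` and `w_spk` are all hypothetical, and the abstract does not say how the task losses are actually combined.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for a single example, using log-sum-exp for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multitask_loss(
    lang_logits: Sequence[float], lang_target: int,
    sex_logits: Sequence[float], sex_target: int,
    spk_logits: Sequence[float], spk_target: int,
    w_sex: float = 0.3,  # hypothetical auxiliary weight
    w_spk: float = 0.3,  # hypothetical auxiliary weight
) -> float:
    """Primary language loss plus weighted auxiliary sex and speaker losses.

    Assumption: the three heads share an encoder upstream (not shown), so the
    auxiliary gradients shape the shared representation. Setting both weights
    to zero reduces this to single-task language recognition.
    """
    l_lang = cross_entropy(lang_logits, lang_target)
    l_sex = cross_entropy(sex_logits, sex_target)
    l_spk = cross_entropy(spk_logits, spk_target)
    return l_lang + w_sex * l_sex + w_spk * l_spk
```

With all-zero logits every head predicts a uniform distribution, so the loss is log(4) + 0.3·log(2) + 0.3·log(3) for four languages, two sex classes and three speakers. Keeping the auxiliary weights below 1 is one way to keep language recognition as the dominant objective; the values actually used in the paper are not stated in the abstract.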

Keywords


Spoken language recognition, deep learning, transfer learning, multi-task learning
