Gender Recognition of Teen and Adult Voices in Non-Tonal and Tonal Languages in Uncontrolled Environments
Abstract
Voice gender recognition refers to the automated detection of a speaker's gender from the acoustic signal of the voice. Such systems can be trained on recordings from uncontrolled environments, which contain different types of noise and varied speaker characteristics. However, current systems are biased toward the training language, which is usually English. The present work focuses on the gender recognition of adult and teen voices in a group of tonal languages and in Spanish under uncontrolled environments. Nine features were used: seven derived from pitch, plus the mean of the fourth formant and the vocal tract length. Two scenarios were built: a training-test scenario on one dataset, and a validation scenario on the other dataset. The metrics used were accuracy, recall, F1-score, and area under the ROC curve. The algorithms used were Multilayer Perceptron and Random Forest. Despite the bias in the datasets, the biological features and the algorithms were robust to the change of language.
Keywords
Voice gender recognition; fundamental frequency; vocal tract length; tonal language; Spanish language
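As an illustration of how a vocal tract length feature can be related to the fourth formant, the sketch below uses the textbook uniform-tube approximation (a tube closed at the glottis and open at the lips), in which the n-th resonance is F_n = (2n - 1)c / (4L). This is an assumption made for illustration only; the abstract does not state which estimator was actually used. The function name `vtl_from_formant` and the speed-of-sound constant are likewise hypothetical choices.

```python
# Assumed speed of sound in warm, moist air, in cm/s (a common textbook value).
SPEED_OF_SOUND_CM_S = 35000.0


def vtl_from_formant(formant_hz, n=4, c=SPEED_OF_SOUND_CM_S):
    """Estimate vocal tract length (cm) from the n-th formant frequency (Hz).

    Uses the uniform-tube model F_n = (2n - 1) * c / (4 * L), solved for L.
    This is an illustrative approximation, not necessarily the paper's method.
    """
    if formant_hz <= 0:
        raise ValueError("formant frequency must be positive")
    if n < 1:
        raise ValueError("formant index must be 1 or greater")
    return (2 * n - 1) * c / (4.0 * formant_hz)


if __name__ == "__main__":
    # A mean fourth formant of 3500 Hz under this model gives 17.5 cm.
    print(vtl_from_formant(3500.0))
```

Under this model a higher mean fourth formant implies a shorter vocal tract, which is why such a feature can help separate speakers with anatomically different vocal tracts regardless of the language spoken.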