WavLM-Based Automatic Pronunciation Assessment for Yuhmu, a Low-Resource Language
Abstract
This paper presents an approach to classifying correct and incorrect pronunciation in Yuhmu, an endangered Indigenous minority language, using acoustic embeddings combined with support vector machine (SVM) and multilayer perceptron (MLP) models. Unlike most low-resource language work, which focuses on automatic speech recognition (ASR) or machine translation, this study employs deep acoustic representations to assess phonetic quality, achieving high accuracy and consistent results across different embedding sizes. The results highlight the potential of combining labeled audio data with self-supervised speech models such as WavLM to provide phonetic feedback and support language revitalization. This research lays a foundation for deeper computational phonetic analysis of Yuhmu and opens avenues for future work on direct audio-to-audio translation, automatic phonetic segmentation, and fine-grained phoneme-level evaluation, contributing to the documentation and preservation of underrepresented languages.
Keywords
Low-resource languages, Yuhmu language, supervised learning, speech analysis
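
To make the pipeline concrete, the sketch below shows one plausible realization of the approach summarized in the abstract: utterance-level WavLM embeddings fed to an SVM classifier. The checkpoint name, mean pooling, train/test split, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, WavLMModel
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Illustrative checkpoint choice; the paper may use a different WavLM variant.
extractor = Wav2Vec2FeatureExtractor.from_pretrained("microsoft/wavlm-base")
model = WavLMModel.from_pretrained("microsoft/wavlm-base").eval()

def embed(waveform, sr=16000):
    # Mean-pool WavLM's last hidden layer into one fixed-size vector
    # (one of several plausible pooling choices).
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, frames, dim)
    return hidden.mean(dim=1).squeeze(0).numpy()

def train_and_evaluate(recordings):
    # `recordings` is a hypothetical list of (waveform, label) pairs,
    # with label 1 for correct and 0 for incorrect pronunciation.
    X = np.stack([embed(w) for w, _ in recordings])
    y = np.array([label for _, label in recordings])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )
    clf = SVC(kernel="rbf").fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))

An MLP variant of the same pipeline would swap SVC for sklearn.neural_network.MLPClassifier over the identical embedding features.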