Fine-Tuning Strategies for Dutch Dysarthric Speech Recognition: Evaluating the Impact of Healthy, Disease-Specific, and Speaker-Specific Data

Sep 1, 2024 · Spyretta Leivaditi, Tatsunari Matsushima, Matt Coler, Shekhar Nayak, Vass Verkhodanova
Abstract
Despite significant advancements in automatic speech recognition (ASR) technology, the performance of such systems on dysarthric speech is still inadequate for widespread use. One key reason is the lack of sufficiently rich and diverse dysarthric speech datasets to train machine learning models that could handle all types and varieties of such speech. Motivated by the data scarcity problem, as well as by successful applications of self-supervised learning (SSL) in ASR for low-resource languages, this paper investigates and evaluates the effectiveness of three different data-centric SSL training strategies in improving Dutch dysarthric speech recognition. The first strategy involves fine-tuning with both dysarthric and healthy speech data, the second with disease-specific data, and the third with speaker-specific data. The first and third strategies prove effective, while the second, though ineffective, provides valuable insights for further research.
Publication
In Proceedings of Interspeech 2024

This paper addresses the challenge of developing effective automatic speech recognition (ASR) systems for dysarthric speech, which remains difficult despite advances in speech technology. The research is motivated by the limited availability of diverse dysarthric speech datasets, which restricts the development of models that can generalize across different types and severities of dysarthria.

We investigate three distinct data-centric self-supervised learning approaches:

  1. Fine-tuning with a combination of dysarthric and healthy speech data
  2. Fine-tuning with disease-specific data (focusing on a particular condition that causes dysarthria)
  3. Fine-tuning with speaker-specific data (tailoring models to individual speakers)
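
The three strategies differ only in which recordings reach the fine-tuning step, so they can be pictured as three filters over one pool of labelled utterances. The following is a minimal sketch of that idea, not the authors' pipeline: the `Utterance` fields, the strategy names, the `select_finetuning_set` helper, and the placeholder conditions `condition_a` and `condition_b` are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Utterance:
    """One labelled recording. Every field name here is an assumption."""
    path: str
    transcript: str
    speaker_id: str
    condition: Optional[str]  # None marks a healthy (non-dysarthric) speaker


def select_finetuning_set(
    pool: List[Utterance],
    strategy: str,
    condition: Optional[str] = None,
    speaker_id: Optional[str] = None,
) -> List[Utterance]:
    """Return the subset of `pool` that a given strategy would fine-tune on."""
    if strategy == "healthy_plus_dysarthric":
        # Strategy 1: keep everything, healthy and dysarthric alike.
        return list(pool)
    if strategy == "disease_specific":
        # Strategy 2: keep only speakers with one underlying condition.
        if condition is None:
            raise ValueError("disease_specific needs a condition")
        return [u for u in pool if u.condition == condition]
    if strategy == "speaker_specific":
        # Strategy 3: keep only the recordings of one target speaker.
        if speaker_id is None:
            raise ValueError("speaker_specific needs a speaker_id")
        return [u for u in pool if u.speaker_id == speaker_id]
    raise ValueError(f"unknown strategy: {strategy!r}")


# Toy pool with placeholder paths, transcripts, speakers, and conditions.
pool = [
    Utterance("h1_001.wav", "goedemorgen", "h1", None),
    Utterance("d1_001.wav", "goedemorgen", "d1", "condition_a"),
    Utterance("d2_001.wav", "dank je wel", "d2", "condition_a"),
    Utterance("d3_001.wav", "tot ziens", "d3", "condition_b"),
]

mixed = select_finetuning_set(pool, "healthy_plus_dysarthric")
by_disease = select_finetuning_set(pool, "disease_specific", condition="condition_a")
by_speaker = select_finetuning_set(pool, "speaker_specific", speaker_id="d1")
```

In this framing the pretrained SSL model and the fine-tuning procedure stay fixed across experiments, and only the selected subset changes, which is what makes the comparison data-centric.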

Our results show that the first and third strategies improve recognition accuracy, while the disease-specific strategy does not. These findings suggest that both the general robustness gained from healthy speech and adaptation to individual speakers are valuable for dysarthric ASR, whereas disease-specific patterns may be too variable to provide consistent benefits.

The research provides important insights for developing more inclusive speech technology that can serve individuals with speech disorders, potentially improving their access to voice-controlled devices and applications.