Fine-Tuning Strategies for Dutch Dysarthric Speech Recognition: Evaluating the Impact of Healthy, Disease-Specific, and Speaker-Specific Data

This paper addresses the challenge of developing effective automatic speech recognition (ASR) systems for dysarthric speech, which remains difficult despite advances in speech technology. The research is motivated by the limited availability of diverse dysarthric speech datasets, which restricts the development of models that can generalize across different types and severities of dysarthria.
We investigate three data-centric strategies for fine-tuning self-supervised speech models:
- Fine-tuning with a combination of dysarthric and healthy speech data
- Fine-tuning with disease-specific data (focusing on a particular condition that causes dysarthria)
- Fine-tuning with speaker-specific data (tailoring models to individual speakers)
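At the data level, the three strategies amount to different selections from a labeled corpus. The sketch below illustrates this; the record fields (`speaker_id`, `disease`) and the `Sample` type are illustrative assumptions, not from the paper:

```python
from dataclasses import dataclass

# Hypothetical corpus record; field names are illustrative assumptions.
@dataclass
class Sample:
    speaker_id: str
    disease: str      # e.g. "parkinson" or "healthy"
    audio_path: str
    transcript: str

def mixed_set(dysarthric, healthy):
    """Strategy 1: pool dysarthric and healthy speech."""
    return dysarthric + healthy

def disease_specific_set(samples, disease):
    """Strategy 2: keep only samples from one underlying condition."""
    return [s for s in samples if s.disease == disease]

def speaker_specific_set(samples, speaker_id):
    """Strategy 3: keep only samples from the target speaker."""
    return [s for s in samples if s.speaker_id == speaker_id]
```

Each selected subset would then be used as the fine-tuning set for a pretrained model; the selection logic itself is independent of the ASR architecture.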
Our results demonstrate that the first and third strategies yielded significant improvements in recognition accuracy, while the disease-specific approach did not. These findings suggest that both the general robustness gained from healthy speech and speaker adaptation are valuable for dysarthric ASR, whereas disease-specific patterns may be too variable to provide consistent benefits.
The research provides important insights for developing more inclusive speech technology that can serve individuals with speech disorders, potentially improving their access to voice-controlled devices and applications.