🗣️ The Blizzard Challenge 2025: A voice for Bildts

Aug 28, 2025 · Matt Coler

Yesterday marked a special day for speech synthesis research as teams from around the world gathered at Campus Fryslân for the Blizzard Challenge 2025. This year’s challenge focused on creating synthetic voices for Bildts, a regional language spoken by roughly 6,000 people here in Friesland.

This wasn’t just another technical challenge. It represented something I’ve been advocating for throughout my career: using cutting-edge AI to nourish linguistic diversity.

Why Bildts matters

Bildts emerged from centuries of language contact between Frisian and Dutch varieties from South and North Holland. It’s officially recognized as a regional language by the province of Fryslân, but like many minority languages, it faces pressure from dominant languages.

The Speech Synthesis Workshop (SSW) chose Bildts deliberately, aligning with this year’s theme: “Scaling down: sustainable synthesis for language diversity.” Instead of focusing on major world languages that already have extensive AI support, the challenge asked: Can we build high-quality synthetic voices for languages with limited digital resources?

The technical challenge

Seven international teams tackled two main challenges:

1️⃣ Main Task (MH1): Build a voice using about 7 hours of speech data from Jan de Groot, a speaker from Omrop Fryslân who provided audio columns averaging around 3 minutes each.

2️⃣ Zero-shot Task (ZH1): Create voices from minimal reference audio—just 4-11 seconds per speaker. This tested whether systems could capture both language-specific characteristics and individual speaker traits with extremely limited data.

The teams represented diverse technical approaches:

  • Four used encoder-decoder architectures (GIPSA-Lab, ZMM-TTS, Idiap, Speechmatics)
  • Three employed flow-based or diffusion-based models (Vo2LIUM, GiantNetwork, NRCC)
  • Some relied on mel-spectrograms while others used self-supervised learning representations
  • Approaches to pronunciation varied from explicit phonetic modeling to learning directly from text

Innovative evaluation

We designed an evaluation framework that respected the language community itself. Rather than relying solely on standard metrics, we used three listener groups:

  1. Bildts speakers (the primary evaluators): Recruited through newspapers, mailing lists, and community outreach
  2. Dutch speakers: To explore how linguistic distance affects perception
  3. International speakers: For baseline quality assessments

The evaluation measured multiple dimensions: how “Bildts-like” the synthesis sounded, naturalness, appropriateness in context, and various quality metrics. Crucially, Bildts speakers could identify issues with pronunciation, intonation, and stress that would be impossible to capture without community involvement.
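To make the idea of per-group, per-dimension scoring concrete, here is a minimal sketch of how listener ratings might be averaged. It is not the challenge's actual analysis pipeline: the system names, the 1-5 rating scale, the dimension labels, and every score below are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings: (system, listener_group, dimension, score on an assumed 1-5 scale).
# System names and scores are placeholders, NOT results from the challenge.
ratings = [
    ("system_A", "bildts", "naturalness", 4),
    ("system_A", "bildts", "naturalness", 5),
    ("system_A", "bildts", "bildts_likeness", 2),
    ("system_A", "dutch", "naturalness", 5),
    ("system_A", "international", "naturalness", 4),
    ("system_B", "bildts", "naturalness", 3),
    ("system_B", "bildts", "bildts_likeness", 4),
    ("system_B", "dutch", "naturalness", 3),
    ("system_B", "international", "naturalness", 4),
]


def mean_scores(rows):
    """Average the scores for each (system, listener group, dimension) combination."""
    buckets = defaultdict(list)
    for system, group, dimension, score in rows:
        buckets[(system, group, dimension)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


scores = mean_scores(ratings)

# Print one line per combination so groups can be compared side by side.
for (system, group, dimension), value in sorted(scores.items()):
    print(f"{system:9} {group:14} {dimension:16} {value:.2f}")
```

Keeping the listener group as part of the key, rather than pooling all listeners, is what lets a gap show up between a system that sounds natural to everyone and one that Bildts speakers also judge to be Bildts-like.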

Results

Different teams excelled in different areas, and performance varied between the main task and zero-shot challenge. Some systems produced highly natural-sounding speech but struggled with language-specific characteristics. Others captured Bildts phonetic features well but with less natural prosody.

The analysis showed that building voices for under-resourced languages requires more than just applying existing techniques to new data. It demands understanding the linguistic structure, community needs, and cultural context of the target language.

Looking forward

The Blizzard Challenge, founded 20 years ago by Keiichi Tokuda and Alan Black, has focused primarily on well-resourced languages. This year’s shift toward endangered and minority languages signals a maturation of the field and a recognition of AI’s responsibility to serve linguistic diversity.

The synthetic voices created for Bildts won’t just remain academic exercises. Community representatives, including Gerard de Jong (coordinator of multilingualism for the municipality of Waadhoeke and president of the European Bureau of Minority Languages), participated in evaluations and expressed enthusiasm about potential applications for language education and preservation.

As AI systems become increasingly multilingual, the Blizzard Challenge 2025 demonstrates that cutting-edge technology can serve not just the languages with the most data, but those most in need of digital support. The question isn’t whether we can build AI for minority languages; it’s whether we will.

The Speech Synthesis Workshop and Blizzard Challenge 2025 were organized by researchers from the University of Groningen, University of Helsinki, and KTH Stockholm, with support from the Fryske Akademy and local language communities.