Speech Tech
Team
Research into speech technology:
Frisian synthesis with Do et al.
An open-source, freely available synthetic voice for Frisian, a lesser-resourced language.
- Do, P., Coler, M., Dijkstra, J., & Klabbers, E. (2023). Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection. arXiv preprint arXiv:2306.12040. Accepted at the Speech Synthesis Workshop 2023.
- Do, P., Coler, M., Dijkstra, J., & Klabbers, E. (2023). Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages. arXiv preprint arXiv:2305.19396. Accepted at INTERSPEECH 2023.
- Do, P., Coler, M., Dijkstra, J., & Klabbers, E. (2023). The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech. arXiv preprint arXiv:2306.00535. Accepted at INTERSPEECH 2023.
- Do, P., Coler, M., Dijkstra, J., & Klabbers, E. (2021, August). A Systematic Review and Analysis of Multilingual Data Strategies in Text-to-Speech for Low-Resource Languages. Accepted at INTERSPEECH 2021.
- Do, P., Coler, M., Dijkstra, J., & Klabbers, E. (2022, September). Frisian Text-to-Speech Using Cross-Lingual Transfer Learning: An Experiment with 5 Source Languages. In Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages (pp. 16-22).
Sarcasm recognition and synthesis with Gao, Li, and Nayak
With Xiyuan Gao, Li Zhu, and Dr. Shekhar Nayak, we develop techniques to recognize and synthesize sarcasm from acoustic cues in speech, augmented with multimodal data relating to facial expression and gesture.
- Li, Z., Nayak, S., & Coler, M. (2023). Funding awarded for the IEEE Mentoring Experiences for Underrepresented Young Researchers (ME-UYR) Program. Project title: Yeah, right! Integrating varieties of speech data for advancing naturalness and expressiveness in sarcastic speech synthesis.
- Li, Z., Gao, X., Nayak, S., & Coler, M. (2023). SarcasticSpeech: Speech Synthesis for Sarcasm in Low-Resource Scenarios. Proc. 12th ISCA Speech Synthesis Workshop (SSW2023).
- Gao, X., Nayak, S., & Coler, M. (2022). Deep CNN-based Inductive Transfer Learning for Sarcasm Detection in Speech. Proc. Interspeech 2022, 2323-2327.
Pathology recognition with Verkhodanova et al.
- Verkhodanova, V., Coler, M., Jonkers, R., de Bot, K., & Lowie, W. (2023). Acoustic change over time in speech of one bilingual individual with Parkinson's Disease. ICPhS.
- Verkhodanova, V., Coler, M., Jonkers, R., Timmermans, S., Maurits, N., de Jong, B., & Lowie, W. (2022). A cross-linguistic perspective to classification of healthiness of speech in Parkinson's disease. Journal of Neurolinguistics, 63, 101068.
- Verkhodanova, V., Coler, M., Jonkers, R., & Lowie, W. (2022). How expertise and language familiarity influence perception of speech of people with Parkinson’s disease. Clinical Linguistics & Phonetics, 36(2-3), 165-182.
- Verkhodanova, V., Trčková, D., Coler, M., & Lowie, W. (2020). More than words: Cross-linguistic exploration of Parkinson’s disease identification from speech. In Speech and Computer: 22nd International Conference, SPECOM 2020, St. Petersburg, Russia, October 7–9, 2020, Proceedings 22 (pp. 613-623). Springer International Publishing.
- Verkhodanova, V., Coler, M., Jonkers, R., de Bot, K., & Lowie, W. (2019, August). Prosodic Changes in the Speech of a Single Speaker with Parkinson's Disease. In 19th International Congress of Phonetic Sciences (pp. 3046-3050). International Phonetic Association.
- Verkhodanova, V., Timmermans, S., Coler, M., Jonkers, R., de Jong, B., & Lowie, W. (2019). How dysarthric prosody impacts naïve listeners’ recognition. In Speech and Computer: 21st International Conference, SPECOM 2019, Istanbul, Turkey, August 20–25, 2019, Proceedings 21 (pp. 510-519). Springer International Publishing.
- Verkhodanova, V., & Coler, M. (2018). Prosodic and segmental correlates of spontaneous Dutch speech in patients with Parkinson’s disease: A pilot study.
- Strinzel, M., Verkhodanova, V., Jalvingh, F., Jonkers, R., & Coler, M. (2017). Acoustic and Perceptual Correlates of Vowel Articulation in Parkinson’s Disease With and Without Mild Cognitive Impairment: A Pilot Study. In Speech and Computer: 19th International Conference, SPECOM 2017, Hatfield, UK, September 12-16, 2017, Proceedings 19 (pp. 56-64). Springer International Publishing.
Together with Dr. Vass Verkhodanova, we examine the speech of people with Parkinson’s Disease (PD) to determine how PD manifests in speech and how medical experts detect the disease from voice alone. We analyze acoustic cues specific to the speech of multilingual PD patients across several languages. By combining speech processing tools with expert knowledge, we seek to develop techniques to detect PD from speech automatically.
Outreach activities
- Director of the MSc Voice Technology. We will train a new generation of graduates in the burgeoning development of speech tech, pioneering innovations that transform how humans interact with machines to the benefit of humankind.
- Coordinator of the Speech Tech Summer School 2023.
- Organizer of the International LITHME Conference. We served as the local host for the 3rd International LITHME conference.
- Organizer and speaker, Dutch Speech Tech Day (02/2023).
- Mozilla Contribute-a-thon: Frisian vs Dutch speech contributions (2021).