Researchers at the University of Waikato, led by professor Te Taka Keegan with former master’s student Kingsley Eng, have built a high-fidelity text-to-speech system for the Waikato-Maniapoto dialect of te reo Māori that preserves community ownership of the voice, recordings and model. The system was designed in direct response to calls for sovereign digital systems after Māori language data was widely used by overseas AI firms without local input; community control matters because it determines who can reuse, redistribute or profit from the voice and derived outputs.
The consenting human voice for the project is translator and language mentor Ngaringi Katipa. Recording began with passages that yielded 4.5 hours of raw material and later expanded, after adding targeted sentence and word lists, to a cleaned dataset totaling 7 hours and 45 minutes. Keegan and Eng said every technical decision prioritized local control over the recordings and model; Eng has since taken a role as a machine learning engineer at precision toolmaker Extec.

Te reo Māori poses specific technical challenges for text-to-speech because meaning often depends on vowel length and on pronunciations that differ from English. For example, keke (cake), kēkē (armpit) and kekē (to creak) are distinguished primarily by vowel length, and the digraph “wh” is typically pronounced like “f.” Those phonemic contrasts, combined with relatively sparse digital resources for many Māori dialects, make English-centric off-the-shelf systems prone to errors when synthesizing pronunciation and intonation.
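To make the vowel-length problem concrete, the short Python sketch below shows how a text front end that strips diacritics collapses the three words above into one spelling, while a length-aware tokenizer keeps them apart. It is an illustration only, not the Waikato team's pipeline: the function names `strip_macrons` and `tokenize`, the `e:` notation for long vowels, and the choice to treat "wh" and "ng" as single units are all assumptions made for this example.

```python
import unicodedata

MACRON = "\u0304"          # combining macron, marks a long vowel after NFD
DIGRAPHS = ("wh", "ng")    # treated as single sound units in this sketch


def strip_macrons(word):
    """Mimic an ASCII-folding front end: drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(word):
    """Split a word into units, keeping vowel length and digraphs.

    Long vowels are written with a trailing ':' (hypothetical notation).
    """
    text = unicodedata.normalize("NFD", word.lower())
    units = []
    i = 0
    while i < len(text):
        if text[i:i + 2] in DIGRAPHS:
            units.append(text[i:i + 2])
            i += 2
        elif i + 1 < len(text) and text[i + 1] == MACRON:
            units.append(text[i] + ":")
            i += 2
        else:
            units.append(text[i])
            i += 1
    return units


words = ["keke", "kēkē", "kekē"]

# Folding away the macrons leaves a single spelling for three meanings.
print({strip_macrons(w) for w in words})   # {'keke'}

# A length-aware tokenizer keeps the three words distinct.
for w in words:
    print(w, tokenize(w))
# keke ['k', 'e', 'k', 'e']
# kēkē ['k', 'e:', 'k', 'e:']
# kekē ['k', 'e', 'k', 'e:']

print(tokenize("whare"))                   # ['wh', 'a', 'r', 'e']
```

Any system whose text normalization behaves like `strip_macrons` has lost the distinction before synthesis even starts, which is one way an English-centric pipeline can mispronounce otherwise correctly spelled input.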
The project was developed against a backdrop in which large AI services can reproduce fluent te reo Māori after ingesting text and audio scraped from Māori communities and academic sources. Keegan notes that those datasets were frequently collected without Māori input, processed outside Aotearoa, and delivered through company-owned interfaces that do not transfer governance or control of outputs back to the originating language communities.
Beyond creating a single synthetic voice, Keegan and Eng present the work as a replicable blueprint for other minority-language communities: recruit consenting local speakers, include targeted recordings that cover rare lexical items and dialectal forms, retain ownership and governance over raw recordings, trained models and generated outputs, and structure data governance so the originating community retains rights to the final system.