Google and a consortium of Africa’s leading research institutions announced the launch of WAXAL, a large-scale, openly accessible audio dataset designed to accelerate research and support more inclusive AI technologies.
The WAXAL dataset was developed over three years with funding from Google. It includes 1,250 hours of transcribed natural speech and more than 20 hours of high-quality studio recordings intended for building high-fidelity synthetic voices.
Although voice-enabled technology has become commonplace in many parts of the world, a severe lack of high-quality voice data has hindered its development for most of Africa’s more than 2,000 languages. This has left hundreds of millions of people without access to such technology in their native languages.
“The ultimate impact of WAXAL is to empower people in Africa,” said Aisha Walcott-Bryant, Head of Google Research Africa. “This dataset provides a critical foundation for students, researchers, and entrepreneurs to build technology on their own terms, in their own language, and ultimately reach more than 100 million people. We look forward to seeing African innovators use this data to create everything from new educational tools to voice-enabled services that create tangible economic opportunities across Africa.”
A core principle of the project was to ensure it was built by and for the community. African academic institutions and community organizations, including Makerere University (Uganda), University of Ghana, and Digital Umuganda (Rwanda), led the data collection with guidance from Google experts. These partner institutions retain full ownership of their data, an arrangement that establishes a new framework for fair, partnership-driven AI development.
“For AI to have real impact in Africa, it must speak our language and understand our context. The WAXAL dataset will provide our researchers with the high-quality data they need to build voice technologies that reflect our unique communities. In Uganda, we are already strengthening local research capacity and supporting new student- and faculty-led projects,” said Joyce Nakatumba Nabende, a senior lecturer in the School of Computing and Information Technology, Makerere University.
The dataset covers the following languages: Acholi, Akan, Dagare, Dagbani, Duluo, Ewe, Fante, Fulani (Fula), Hausa, Igbo, Ikposo (Kposo), Kikuyu, Lingala, Luganda, Malagasy, Masaba, Nyankore, Rukiga, Shona, Soga (Lusoga), Swahili, and Yoruba.
“For us at the University of Ghana, the impact of WAXAL goes beyond the data itself. Thanks to WAXAL, we have been able to build our own linguistic resources and train a new generation of AI researchers. More than 7,000 volunteers joined because they wanted their voices and languages to be relevant to the digital future. Today, their collective efforts are sparking an ecosystem of innovation in areas such as health, education, and agriculture, proving that where data exists, possibilities exist everywhere,” said Isaac Wiafe, Associate Professor, University of Ghana.


