Fine-Tuning Speech Recognition for Wolof and Hausa in Maternal and Reproductive Health
YUX Cultural AI Lab
Automatic Speech Recognition (ASR) has transformed global communication and the way people interact with technology, at least if you speak English, French, or another major global language. For millions of Africans who communicate primarily in indigenous languages like Wolof or Hausa, however, these tools remain inadequate, especially in critical areas such as healthcare.
When an ASR system fails to capture how someone describes their symptoms in their own language, the issue goes far beyond inconvenience—it becomes a barrier to healthcare access.
The Challenge: Two Key Gaps in ASR for African Languages
Research from the YUX Cultural AI Lab highlights two major reasons why ASR systems struggle in African healthcare:
- Domain-Specific Accuracy Gap – Generic ASR models often lack understanding of specialized medical vocabulary unless trained directly for it.
- Socio-Cultural Context Gap – Even when words are recognized, models frequently stumble over local accents, code-switching between languages, and culturally specific expressions.
These are not minor technical flaws. They are systemic challenges that prevent entire communities from benefiting from AI-powered healthcare.
Why Focus on Maternal and Reproductive Health?
We chose to concentrate on maternal and reproductive health in Wolof and Hausa, two widely spoken African languages. Rather than attempting to build one universal model, we went deep on a single, high-impact domain.
We began with 250 essential healthcare keywords and expanded to 750 real-world phrases. Each Wolof translation was carefully validated for cultural and linguistic accuracy. The aim was not simply word-for-word translation but ensuring terms were meaningful and usable for the communities themselves.
The Technology: LoRA Fine-Tuning for Practical Adaptation
Training a speech model from scratch is prohibitively expensive for most African language projects. Instead, we applied Low-Rank Adaptation (LoRA)—a method that inserts small trainable layers into an existing model.
This approach allowed us to adapt OpenAI’s Whisper model to our domain without excessive computational costs or overfitting to limited data.
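The core LoRA idea can be illustrated in a few lines: the pretrained weight matrix stays frozen, and only two small low-rank matrices are trained. Below is a minimal NumPy sketch of that mechanism; the dimensions and rank are arbitrary illustrations, not the values used for Whisper, and this is not the project's training code.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank = 64, 64, 4  # rank << d: the "low-rank" in LoRA

# Frozen pretrained weight (stands in for one Whisper projection layer).
W = rng.normal(size=(d_out, d_in))

# Trainable low-rank adapters. B starts at zero, so the adapted layer
# initially behaves exactly like the pretrained one.
A = rng.normal(scale=0.01, size=(rank, d_in))
B = np.zeros((d_out, rank))

def forward(x):
    """Adapted layer: W x + B A x. Only A and B would receive gradients."""
    return W @ x + B @ (A @ x)

x = rng.normal(size=d_in)
# With B = 0 the adapter is a no-op: output matches the frozen layer.
assert np.allclose(forward(x), W @ x)

# Parameter savings: full fine-tune vs. LoRA adapters for this one layer.
full = W.size            # 4096 parameters
lora = A.size + B.size   # 512 parameters
print(f"trainable params: {lora} vs {full} ({lora / full:.1%})")
```

Because only the small adapter matrices are updated, the method needs a fraction of the memory and compute of full fine-tuning, which is what makes it viable for low-resource language work.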
Before fine-tuning, we tested multiple open-source models, advancing only those with a Word Error Rate (WER) below 50%.
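For reference, WER is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. A minimal implementation of the metric behind that screening threshold (the example sentences are hypothetical English stand-ins, not project data):

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, computed by dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

# One dropped word out of four -> WER of 25%, under the 50% screening bar.
assert wer("she reports severe pain", "she reports pain") == 0.25
```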
The Results: Measurable Improvements
Fine-tuning with LoRA (updating the decoder while freezing the encoder) yielded notable improvements:
- Training WER: 39.3% → 25.7%
- Validation WER: 41.6% → 33.5%
- Test WER: 31.5% → 29.6%
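In relative terms (simple arithmetic on the figures above), the gains look like this:

```python
results = {  # (baseline WER %, fine-tuned WER %) from the runs reported above
    "training":   (39.3, 25.7),
    "validation": (41.6, 33.5),
    "test":       (31.5, 29.6),
}

for split, (before, after) in results.items():
    relative = (before - after) / before
    print(f"{split}: {before}% -> {after}% ({relative:.1%} relative reduction)")
```

That is roughly a 35% relative reduction on training data, 19% on validation, and 6% on the held-out test set.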
Beyond the metrics, the model successfully captured complex Wolof phonetics and grammar that generic systems consistently mishandled.
From Lab to Field: Nakala AI in Practice
These advances feed directly into Nakala AI, a platform supporting socio-behavioral researchers working in African languages. Instead of spending hours manually transcribing interviews, researchers can now access accurate, nuanced transcriptions and translations.
This reflects a critical principle: ASR for African languages must extend beyond lab-based benchmarks. It must deliver value in the real world, where accurate transcription directly supports better healthcare delivery.
What’s Next: Scaling Responsibly
The next phase of this work includes:
- Keyword Error Rate (KER): Measuring accuracy on key healthcare terms.
- Cross-Dialect Support: Ensuring performance across diverse accents and speech patterns.
- Data Augmentation: Leveraging methods like text-to-speech synthesis to expand training data.
- Beyond Healthcare: Applying this approach to finance, education, and other domains.
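The text does not pin down how KER will be computed; one plausible formulation counts the fraction of keyword occurrences in the reference transcript that the hypothesis fails to reproduce. A sketch under that assumption (the keywords here are hypothetical English stand-ins, not the validated Wolof or Hausa terms):

```python
from collections import Counter

def keyword_error_rate(reference, hypothesis, keywords):
    """Fraction of keyword occurrences in the reference that are missing
    from the hypothesis. Sketch definition only; the project's exact
    KER formulation may differ."""
    ref = Counter(w for w in reference.lower().split() if w in keywords)
    hyp = Counter(w for w in hypothesis.lower().split() if w in keywords)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    missed = sum(max(0, ref[w] - hyp[w]) for w in ref)
    return missed / total

kws = {"fever", "pregnancy"}
# One of the two keyword occurrences is dropped -> KER of 0.5.
assert keyword_error_rate("she has a fever during pregnancy",
                          "she has during pregnancy", kws) == 0.5
```

Unlike plain WER, a metric like this weights exactly the terms that matter clinically, which is why it is a natural next step for a healthcare-focused model.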
Why It Matters
This research demonstrates that community-informed, domain-specific AI can achieve more impact than generalized solutions. By focusing on real-world use cases, validating cultural relevance, and developing platforms like Nakala AI, it is possible to make speech recognition genuinely useful for African communities today—not just in the future.
The takeaway is clear: if speech recognition is to serve everyone, it must be built with the people who will use it—one language and one domain at a time.