4 min read · The Lingo team

Why we're racing to save Cameroon's oral languages

Cameroon holds around 250 languages. Most have never been written down — they live only in the mouths of their last fluent speakers. This is where we begin.

mission · language preservation

Cameroon is one of the most linguistically dense countries on Earth: roughly 250 languages for a population smaller than that of some single metropolitan areas elsewhere. It is a treasure and a warning at once.

A treasure, because each of those languages is a complete way of seeing: its own metaphors, its own taxonomy of plants and kinship, its own humour. A warning, because most of them are oral-first — they have never had a standard spelling, a dictionary, or a single page of digital text. When the last fluent grandparents pass, the language does not get archived. It simply stops.

The problem with "just write it down"

The instinct is to say: record it, transcribe it, done. But that instinct quietly assumes a writing system, a keyboard layout, literate speakers, and someone to do the transcribing. For a language spoken by a few thousand elders in a few villages, none of that exists. Asking people to write a language they have only ever spoken is asking them to do the one thing the language was never built for.

So the data simply isn't there. No parallel corpora, no transcripts, no labelled audio. The modern AI playbook — "scrape a few billion words" — has nothing to scrape.

Our wager

We think the path runs the other way around. Instead of demanding text from oral communities, we should:

  1. Start with whatever parallel text does exist, however narrow, and build first translation models from it.
  2. Use those models to lower the barrier — so a speaker sees a prompt in a language they read, and simply speaks the answer in their mother tongue.
  3. Turn those spoken contributions into open voice data that trains the next, better generation of models.

It is a loop designed for people who speak a language but may never have written it. The rest of this log is the story of building that loop — the corpus we had to compile by hand, the pivot architecture that squeezes translations out of almost no data, the funding that keeps it alive, and the engineering that keeps 56 languages online for the price of a coffee.

We are racing because the clock is real. Every year of delay is voices we cannot get back.