May 10, 2024 · 5 min read · The Lingo team

French as a pivot: translating with almost no direct data

We can't train a Ghomálá'↔Ewondo model — there's no data for it. So we route everything through French. Here's how the pivot architecture works, and what it costs.

models · architecture · MarianMT

Say a speaker wants to go from Ghomálá' to Ewondo. There is no parallel Ghomálá'–Ewondo corpus, and there never will be one large enough to train on directly. Multiply that by every ordered pair of 56 languages (56 × 55 = 3,080 translation directions) and you get over three thousand impossible models.

The pivot trick

Instead of training every pair, we train every language against a shared hub language — French, the working language across much of Cameroon and the one our Bible alignments share. Then any translation becomes a chain:

Ghomálá' → French → Ewondo

We only ever need X → French and French → X for each language: about 112 models instead of thousands. Add a language and you add two models, rather than one for every existing language in each direction.
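The arithmetic behind those counts can be checked in a few lines. This is a sketch that assumes French is counted separately from the 56 languages, which is what the figure of 112 implies; the function names are ours for illustration, not part of any codebase.

```python
def direct_models(n_languages: int) -> int:
    """One model per ordered (source, target) pair of distinct languages."""
    return n_languages * (n_languages - 1)


def pivot_models(n_languages: int) -> int:
    """Two models per language: X -> French and French -> X."""
    return 2 * n_languages


print(direct_models(56))  # 3080
print(pivot_models(56))   # 112

# Extra models needed when a 57th language joins, under each scheme.
print(direct_models(57) - direct_models(56))  # 112
print(pivot_models(57) - pivot_models(56))    # 2
```

The direct count grows quadratically with the number of languages, while the pivot count grows linearly, so the gap widens with every language added.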

How the models are built

Each direction is a compact MarianMT sequence-to-sequence model (the architecture used by Helsinki-NLP's OPUS-MT models): 6 encoder and 6 decoder layers, ~75M parameters, fine-tuned on our verse-aligned data. Small enough to run on a CPU; numerous enough to cover the whole network through the French hub.

At inference, the worker parses the requested chain, loads each model in turn, and pipes the output of one stage into the next.
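The inference loop described above can be sketched roughly as follows. Everything here is hypothetical: the `bbj>fr>ewo` chain syntax, the language codes, and the `load_model` callback are assumptions made for illustration, not the worker's actual interface. A real loader would presumably wrap a MarianMT model and its tokenizer; a toy stand-in is used so the sketch runs without any model files.

```python
from typing import Callable, List, Tuple

# A translator is anything that maps source text to target text.
Translator = Callable[[str], str]


def parse_chain(chain: str) -> List[Tuple[str, str]]:
    """Turn 'bbj>fr>ewo' into [('bbj', 'fr'), ('fr', 'ewo')]."""
    codes = [c.strip() for c in chain.split(">") if c.strip()]
    if len(codes) < 2:
        raise ValueError(f"chain needs at least two languages: {chain!r}")
    return list(zip(codes, codes[1:]))


def translate_chain(
    text: str,
    chain: str,
    load_model: Callable[[str, str], Translator],
) -> str:
    """Run each hop in order, feeding one stage's output into the next."""
    for src, tgt in parse_chain(chain):
        model = load_model(src, tgt)  # one model in memory at a time
        text = model(text)
    return text


def toy_loader(src: str, tgt: str) -> Translator:
    """Stand-in for a real model loader: it only tags the text with the hop."""
    return lambda text: f"{text} [{src}->{tgt}]"


result = translate_chain("hello", "bbj>fr>ewo", toy_loader)
print(result)  # hello [bbj->fr] [fr->ewo]
```

Passing the loader in as a parameter keeps the chaining logic independent of how models are stored or cached, which also makes it easy to test without loading any weights.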

What the pivot costs

Pivoting is not free:

Error compounds. Every hop can introduce mistakes, and the second hop translates the first model's output, imperfections included.
Nuance leaks at the hub. Anything French can't easily express is a detail that may not survive the round trip.
Latency adds up. Every translation takes two model loads and two generations.
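A rough way to see how the first and third costs stack up is to treat the two hops as independent. That is a simplifying assumption, and the per-hop numbers below are made up for illustration; they are not measurements from our models.

```python
from math import prod
from typing import Sequence


def chain_quality(hop_qualities: Sequence[float]) -> float:
    """Share of sentences that survive every hop, assuming independent errors."""
    return prod(hop_qualities)


def chain_latency(hop_latencies_s: Sequence[float]) -> float:
    """Hops run one after another, so their times simply add."""
    return sum(hop_latencies_s)


# Illustrative values only: each hop gets 90% of sentences right
# and takes 1.5 seconds including the model load.
quality = chain_quality([0.9, 0.9])
latency = chain_latency([1.5, 1.5])

print(f"{quality:.2f}")    # 0.81
print(f"{latency:.1f} s")  # 3.0 s
```

In practice the hops are not independent: the second model sees output that may look unlike its training data, so the simple product may well be optimistic.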

We accept these costs because the alternative isn't a better model — it's no model at all. A pivoted translation that gets someone 80% of the way there is infinitely more useful than a direct model that cannot exist. And every one of those costs gets smaller as the voice project replaces written, formal data with real, spoken contributions.