Research log
Notes from the work
The choices, the dead ends, the data we had to compile by hand, and the engineering that keeps dozens of languages online for almost nothing.
- 7 min
Why we're moving to compressed (int8) models
We rebuilt our entire serving stack on int8 models: ~3.8× smaller and ~6× faster on CPU, for a quality cost that, on narrow-domain, low-resource models, sits below the noise. Here's the data, the examples, and the reasoning.
engineering, quantization, data
- 6 min
Fitting 56 languages on a free server: int8 and the economics of always-on
How we keep dozens of translation models online for almost nothing: CPU nodes that yield to GPUs, idle-eviction, and int8 quantization that's nearly free precisely because our models are trained on a narrow, formal corpus.
engineering, quantization, infrastructure
- 4 min
Funding open language preservation
Open, free, and sustainable is a hard trio. A few words on who funds this work, why we keep it cheap on purpose, and what 'archived but alive' means.
funding, sustainability
- 5 min
From text to voice: the next chapter
Text was the on-ramp. But these are spoken languages, and the people who hold them speak more than they write. Lingo becomes a place to contribute your voice.
product, voice, community
- 3 min
Open-sourcing our first models on Hugging Face
The first French-pivot models for Cameroonian languages are public, open, and downloadable. Here's what shipped and why we gave it away.
release, open source
- 5 min
French as a pivot: translating with almost no direct data
We can't train a Ghomálá'↔Ewondo model — there's no data for it. So we route everything through French. Here's how the pivot architecture works, and what it costs.
models, architecture, MarianMT
- 7 min
Building a corpus from scarcity
Training data for these languages doesn't sit in a database — we had to go and find it: scanned books, pamphlets, blogs, purchased booklets, and scripture as the aligned backbone, all unified under one alphabet.
data, corpus, methodology
- 4 min
Why we're racing to save Cameroon's oral languages
Cameroon holds around 250 languages. Most have never been written down — they live only in the mouths of their last fluent speakers. This is where we begin.
mission, language preservation