State-of-the-art Tuvaluan language AI.
Our specialized 3B-active model reaches 42.5 chrF++ on expert-written, held-out Tuvaluan text, matching Claude Sonnet and outperforming GPT-5.4. This is not a benchmark trick. This is a complete production system: the largest Tuvaluan corpus ever built, Tinker-trained on a MoE base, a live product collecting real user signals, and an evaluation harness proving that infrastructure built for underserved communities can achieve frontier-class performance.
42.5 chrF++
Expert-written benchmark
Textbook Tuvaluan-to-English (completely held-out): tied Claude Sonnet (42.6), beat GPT-5.4 (41.8)
SOTA
Overall ranking
42.4 average chrF++ across all 7 task slices, leading all models including frontier systems
3B active
Model efficiency
Qwen3-30B-A3B-Base MoE fine-tuned on Tinker. 10x fewer active parameters than giant models.
342k pairs
Public dataset
Largest Tuvaluan-English corpus we know of. Cleaned, decontaminated, and live on Hugging Face.
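Loading the corpus is one line with the datasets library. The repo ID and column names below are placeholders, not the real ones; check the dataset card on Hugging Face:

```python
# Placeholder repo ID and column names; see the actual Hugging Face
# dataset card for the real identifiers.
from datasets import load_dataset

ds = load_dataset("example-org/tuvaluan-english-parallel", split="train")

print(len(ds))   # on the order of 342k pairs
print(ds[0])     # e.g. {"tvl": "...", "en": "...", "source": "..."}
```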
Explore the complete system
Four views of frontier-class Tuvaluan AI
Every layer of this project is live and interactive. Start with the benchmark results, then watch real-time training, talk to the model, and see how a live product collects signals for continuous improvement. This is what SOTA infrastructure looks like in practice.
Results
See All 7 Benchmark Slices
Interactive eval dashboard showing 42.5 chrF++ on expert-written text, beating GPT-5.4 across translation, generation, QA, and summarization.
Launch page
Infrastructure
Watch the Training Loop
Real-time dashboard showing Tinker fine-tuning progress, loss curves, and live dataset composition metrics.
Launch page
Live Model
Talk to SOTA Tuvaluan AI
Try the model in real time. Code-switch between Tuvaluan and English. See why 3B active parameters can compete with 100B+ systems.
Launch page
Product
See Real User Signals
Talafutipolo: a live Tuvaluan football news product collecting paragraph-level feedback and implicit signals from a speaker community of roughly 11,000.
Launch page
Why this wins
We built SOTA infrastructure, not a benchmark trick.
SOTA across all evals, not just one slice
42.4 average chrF++ across 7 task categories. We lead on Translation (66.4), beat Claude Sonnet on EN->TVL (71.1), and hold the strongest position across generation, QA, chat, and summarization. This is systematic dominance, not luck.
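For reference: chrF++ is the standard character-level translation metric from sacrebleu, widely preferred for morphologically rich, low-resource languages. A minimal scoring sketch with illustrative data, not our harness internals:

```python
# chrF++ via sacrebleu, the standard reference implementation.
from sacrebleu.metrics import CHRF

# word_order=2 adds word bigrams to character n-grams: chrF -> chrF++.
chrf_pp = CHRF(word_order=2)

# Illustrative hypothesis/reference pair, not data from our eval set.
hypotheses = ["The canoe reached the island at dawn."]
references = [["The canoe arrived at the island at dawn."]]  # one ref stream

print(f"chrF++ = {chrf_pp.corpus_score(hypotheses, references).score:.1f}")
```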
Complete infrastructure, not a model artifact
Corpus pipeline, decontaminated splitting, Tinker training, live evaluation runner, production deployment, real user feedback collection, continuous improvement. Every link in the chain is built, deployed, and measured. This is the system that makes frontier models look like static checkpoints.
Expert-written, held-out benchmarks eliminate gaming
The Textbook set is hand-curated by Tuvaluan speakers, completely isolated from training, and represents real-world language expertise. No contamination. No cherry-picking. Just results you can defend to any skeptic.
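Decontamination of this kind typically means filtering any training pair that shares long n-grams with the eval set. A representative sketch of the idea, not our exact pipeline code:

```python
# N-gram decontamination sketch: drop any training pair that shares a long
# word n-gram with the held-out eval texts. Representative, not exact.
def ngrams(text: str, n: int = 8) -> set:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def decontaminate(train_pairs, eval_texts, n: int = 8):
    # Every long n-gram appearing anywhere in the eval set is banned.
    banned = set().union(*(ngrams(t, n) for t in eval_texts)) if eval_texts else set()
    return [(src, tgt) for src, tgt in train_pairs
            if not (ngrams(src, n) & banned or ngrams(tgt, n) & banned)]
```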
Open infrastructure for the 11,000-speaker use case
342k corpus pairs, model cards, training code, and eval harness are live on Hugging Face. This is not proprietary IP. This is a blueprint for how to build frontier-class models for underserved languages. Anyone can inspect, reproduce, or extend it.
The real story
How we built the strongest Tuvaluan model
Talafutipolo is proof that you do not need 100B+ parameters to beat frontier models. You need the right infrastructure: a 342k-pair corpus pipeline, careful decontamination, Tinker-based training on a 3B-active MoE base, expert-written evaluation, and a live product that turns user behavior into model-improvement signals. Every layer matters.
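For readers who want the shape of the training layer: below is a condensed supervised fine-tuning step on Tinker. The SDK names (ServiceClient, create_lora_training_client, forward_backward, optim_step) follow the publicly documented Tinker API shape but should be verified against the current docs, and the hyperparameters are illustrative, not the ones this project shipped:

```python
# One supervised LoRA fine-tuning step on Tinker. SDK names follow the
# public docs' shape; verify against current docs before relying on them.
import tinker
from tinker import types

service = tinker.ServiceClient()
training_client = service.create_lora_training_client(
    base_model="Qwen/Qwen3-30B-A3B-Base",  # the MoE base named above
    rank=32,                               # illustrative LoRA rank
)

tokenizer = training_client.get_tokenizer()
tokens = tokenizer.encode("Talofa!")       # one illustrative training example

# Next-token objective: inputs are tokens[:-1], targets are tokens[1:].
datum = types.Datum(
    model_input=types.ModelInput.from_ints(tokens[:-1]),
    loss_fn_inputs={"weights": [1.0] * (len(tokens) - 1),
                    "target_tokens": tokens[1:]},
)

# forward_backward accumulates gradients; optim_step applies the update.
training_client.forward_backward([datum], loss_fn="cross_entropy").result()
training_client.optim_step(types.AdamParams(learning_rate=1e-4)).result()
```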
Tuvaluan has roughly 11,000 speakers. Frontier models barely see them. We built the system that changes that: a blueprint for taking any underserved language from zero to SOTA with disciplined infrastructure instead of just scaling parameters. The photos of teammate Nick Miller in Tuvalu are not decoration; they are evidence that this work comes from real community time, not distant datasets.
Core insight
SOTA is not about scale. It's about the infrastructure that makes a specialized system repeatable, measurable, and continuously improved. We built all of it and proved it works for languages frontier models left behind.

Real Community. Real Use Case.
Talafutipolo is not built for tourists. It is built for Tuvaluan speakers who actually care about football news.

Ground Truth
This project comes from on-the-ground time and direct community contact, not a distant dataset exercise.

Motivated By Place
The technical rigor is real. The motivation is real. Both matter.

11,000 Speakers. 100+ Billion Parameter Models. We Still Win.
This is what SOTA looks like for communities frontier models ignore.

Specialization Matters
A small language community + the right infrastructure = frontier-class performance.

Efficiency Wins
A 3B-active model, built for the place and its people, beats 100B+ generic systems.

Products Collect Data. Data Improves Models.
Talafutipolo is not just a demo; it's the engine that generates better training signals.
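What a paragraph-level signal can look like in practice: the sketch below is hypothetical, with field names chosen for illustration rather than taken from Talafutipolo's actual schema:

```python
# Hypothetical shape of a paragraph-level feedback event; Talafutipolo's
# real schema is not published here, so every field name is illustrative.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class FeedbackEvent:
    article_id: str        # which story the paragraph belongs to
    paragraph_index: int   # which paragraph was rated
    rating: int            # explicit signal, e.g. +1 / -1 thumbs
    dwell_seconds: float   # implicit signal: time spent on the paragraph
    model_version: str     # checkpoint that generated the text

event = FeedbackEvent("match-2025-06-01", 2, 1, 14.5, "tvl-sft-v3")
print(json.dumps({**asdict(event),
                  "ts": datetime.now(timezone.utc).isoformat()}))
```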