
Gradium TTS, upgraded: more accurate Text-To-Speech


Gradium TTS now runs on a new model. It improves two things that matter for production voice agents: how reliably it pronounces the hard cases, and how natural the output sounds. You can hear the pronunciation gains here, head-to-head against other providers. This post covers what changed, how we built it, and how it compares against the rest of the field.

Because most of our customers' production voice agents run over the phone, every audio sample here plays at 8 kHz (telephone quality).

EN: Our principal asked that any parent wishing to volunteer for the spring field trip send a confirmation message to volunteers@school.org or reply directly to ms.taylor@hotmail.com by Wednesday afternoon.

Gradium
Cartesia Sonic 3.5

FR: À l'admission, le patient mesurait 1,80 m pour 78 kg, sa tension affichait 12,5 sur 8, et l'infirmière a noté un périmètre crânien de 56 cm avant de transmettre le dossier complet au médecin de garde.

Gradium
ElevenLabs Flash v2.5

What's new

Pronunciation accuracy

Accuracy for standard and hard cases is the second focus of this release. Gradium TTS is now substantially more accurate on important cases:

  • Character spelling. Reading out letters or special characters in common expressions such as email addresses, reference numbers, or confirmation codes, without slurring or skipping characters.
  • Text expansion. Distinguishing strings of text like acronyms that should be spelled out (API, SQL, URL) from those pronounced as words (NASA, UNESCO), and handling mixed forms.
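To make the text-expansion problem concrete, here is a minimal sketch of the decision a front end has to make for each token. It is not how Gradium TTS works internally; the lookup tables, the function names `expand_token` and `expand_text`, and the short-all-caps fallback rule are all assumptions made for illustration.

```python
# Hypothetical lookup tables: a real system needs far broader coverage
# and context-aware disambiguation.
SPELLED_OUT = {"API", "SQL", "URL"}    # read letter by letter
SPOKEN_AS_WORD = {"NASA", "UNESCO"}    # read as a single word


def expand_token(token: str) -> str:
    """Return a speakable form of one whitespace-delimited token."""
    upper = token.upper()
    if upper in SPOKEN_AS_WORD:
        # Keep it as one word so it is not spelled out.
        return token.capitalize()
    # Assumed fallback: unknown short all-caps tokens are spelled out.
    looks_like_initialism = token.isalpha() and token.isupper() and len(token) <= 4
    if upper in SPELLED_OUT or looks_like_initialism:
        return " ".join(upper)  # "API" -> "A P I"
    return token


def expand_text(text: str) -> str:
    """Expand every token; punctuation attached to a token is not handled."""
    return " ".join(expand_token(t) for t in text.split())


if __name__ == "__main__":
    print(expand_text("The NASA API returns a URL"))
```

A static table like this breaks down quickly on mixed forms and on tokens whose reading depends on context, which is why these cases are hard.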

These are the moments an agent has to nail on the first run, with no room for ambiguity.

Hear the difference

Phone numbers and license plates

The press release listed three media contacts: Sarah at 212-555-0177, James at +44 161 496 0123, and the international desk at +33 1 70 36 21 00 for European inquiries.

Gradium TTS
Previous model

How we built it

We train Gradium TTS on real-world feedback from our users, gathered through direct conversations and the cases they share with us. Rather than optimizing against clean public benchmarks, we target what actually breaks in production: the messy, hard cases voice agents hit every day.

For each one we cover the full range of variations: every way an acronym or date can be written, across every language we support. We also cover interruptions, domain-specific terms, codes read aloud, and cross-lingual input. That is why the gains show up where they matter, not just on benchmark sentences.
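As an illustration of what "every way a date can be written" means in practice, the sketch below enumerates a few surface forms of a single date. The function name `written_variants` and the chosen formats are assumptions, not the actual data pipeline, and a real test set would cover many more forms per language.

```python
from datetime import date

# English month names are hard-coded so the output does not depend on the locale.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def written_variants(d: date) -> list[str]:
    """Return several common written forms of the same calendar date."""
    month = MONTHS[d.month - 1]
    return [
        d.isoformat(),                           # ISO 8601: year-month-day
        f"{d.month:02d}/{d.day:02d}/{d.year}",   # month-first numeric form
        f"{d.day:02d}/{d.month:02d}/{d.year}",   # day-first numeric form
        f"{d.day}.{d.month}.{d.year}",           # dotted day-first form
        f"{month} {d.day}, {d.year}",            # long form, month first
        f"{d.day} {month} {d.year}",             # long form, day first
        f"{month[:3]} {d.day}",                  # abbreviated, no year
    ]


if __name__ == "__main__":
    for variant in written_variants(date(2026, 3, 5)):
        print(variant)
```

All seven strings should be read aloud as the same date, although the two numeric slash forms are ambiguous without knowing the locale.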

How we measure it

We evaluate our model on two axes: objective metrics (e.g. word error rate, speaker similarity) and human preference. The latter is measured two ways: pairwise comparisons in the style of public voice arenas (Elo scores), and pass/fail listening tests (% success rate).
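For readers unfamiliar with the two human-preference measures, here is a minimal sketch of how each can be computed. The post does not specify the exact rating procedure, so the standard Elo update with a 400-point scale and a K-factor of 32 is an assumption, as are the function names.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_won: bool,
               k: float = 32.0) -> tuple[float, float]:
    """Update both ratings after one pairwise comparison (no ties)."""
    actual = 1.0 if a_won else 0.0
    delta = k * (actual - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def success_rate(judgements: list[bool]) -> float:
    """Share of samples judged correct, as a percentage."""
    if not judgements:
        raise ValueError("need at least one judgement")
    return 100.0 * sum(judgements) / len(judgements)


if __name__ == "__main__":
    # Two equally rated voices; a listener prefers the first one.
    print(update_elo(1000.0, 1000.0, a_won=True))
    # Three of four samples judged correct.
    print(success_rate([True, True, False, True]))
```

Elo captures relative preference between voices, while the success rate is an absolute measure of correctness, so the two can disagree.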

In a global pairwise test, the new model wins head-to-head against the previous model in all five languages.

Pronunciation accuracy

We use simple pass/fail listening tests (% success rate). The new model matches or beats the previous model in nearly every category, with the largest jump on common expressions in Spanish (+11.1 pts). Three cells dip slightly, all within statistical noise (i.e. within ±2 pts). In the table, each cell shows the new model's change in success rate over the previous model, in percentage points.

Subcategory (examples) | EN | FR | ES | PT | DE
Spelling & acronym (spelling, expansion logic, alphanumericals) | +4.4 | +8.5 | +4.8 | +5.4 | +0.8
Numbers & symbols (floating & large numbers, currency, ranking, calendar) | +3.5 | +8.7 | +0.3 | −1.0 | +4.2
Common expressions (emails, phone numbers, URLs, codes, banking details) | +5.5 | −1.1 | +11.1 | +5.2 | +3.7
Audio artifacts (shrilling, glitches or stability) | +3.6 | −2.0 | +3.7 | +5.5 | +5.4

Compared to other providers

For every new release, we also benchmark Gradium TTS against the models that compete for voice agent workloads: real-time, low-latency TTS in the same class.

Every competitor is tested through its own API, so the comparison matches what you would ship. With Gradium TTS, the studio experience is identical to the API, so these samples are exactly what you get in production.

Head-to-head preference tells you which voice listeners like. But a voice agent first has to be right: it must read an email address, a phone number, or a reference code correctly the first time. We therefore measured success rate (the share of test samples a native-speaker evaluator judged correct) across a variety of accuracy criteria for the following models:

  • Cartesia Sonic 3.5
  • Inworld TTS 1.5 Max
  • ElevenLabs Flash v2.5
  • ElevenLabs Multilingual v2

A gap is called a lead only when 95% confidence intervals don't overlap; otherwise the models are on par.
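The lead-versus-on-par rule can be sketched as follows. The post does not say which interval method is used, so the Wilson score interval is an assumption here, as are the function names and the sample counts in the usage example.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a success proportion (z = 1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def compare(a_successes: int, a_total: int, b_successes: int, b_total: int) -> str:
    """Call a lead only when the two intervals do not overlap."""
    a_lo, a_hi = wilson_interval(a_successes, a_total)
    b_lo, b_hi = wilson_interval(b_successes, b_total)
    if a_lo > b_hi:
        return "A leads"
    if b_lo > a_hi:
        return "B leads"
    return "on par"


if __name__ == "__main__":
    # Hypothetical counts: 100 samples per model.
    print(compare(95, 100, 70, 100))
    print(compare(52, 100, 50, 100))
```

Requiring non-overlapping intervals is a conservative rule: small gaps on small test sets are reported as parity rather than as a win.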

Where Gradium TTS pulls ahead: the hard pronunciation cases.

These are the specific failure modes that surface in live calls, where one mistake means the customer mishears a code or a callback number. They include ranking and ordinals, email addresses, phone numbers, time and dates, and acronyms. This last group covers the whole alphanumeric spell-out family (real acronyms plus IDs, license plates, order/policy/reference numbers, and URLs and codes).

Let's zoom in on English and French.

In English, Gradium TTS now reads email addresses correctly 97% of the time, top of the field, and handles time expressions more reliably than both ElevenLabs models (86% vs 51–61%). Every other criterion lands in the same range as competitors, above 70%.

In French, Gradium TTS leads the field on every hard case. It scores 95% on ranking and ordinals (e.g. "le n° 1", "Napoléon Ier") against 70% for the next-best competitor and under 45% for the rest, and 93% on phone numbers vs 51–58% for ElevenLabs and Inworld. On email addresses, it's right 62% of the time, more than double every competitor (all at 29% or below). It stays far ahead on acronyms, codes and ID numbers (where rivals drop as low as 30% on reference numbers), and comes out ahead on time and dates too.

Overall success rate on French hard-pronunciation cases (higher is better), scored by native speakers: Gradium TTS 63.2%, Cartesia Sonic 3.5 59.1%, ElevenLabs Multilingual v2 43.4%, Inworld TTS 1.5 Max 41.0%, ElevenLabs Flash v2.5 35.5%.

Beyond the numbers, try it yourself.

Hear it in practice

Time

The pharmacy reminds you to take the first capsule at 7am with breakfast, the second at 12:00 p.m. with a full glass of water, and the final dose at 11:45pm before bed for the next ten days.

Gradium
ElevenLabs Flash v2.5

URLs and codes

School families can review the updated calendar at westside-elementary.org/calendar/2026-2027 and sign up for parent-teacher conferences via the link in the email sent earlier this week from the principal.

Gradium
Cartesia Sonic 3.5

How to use it

Our new model is now available by default in the Gradium TTS API. If your setup points to the default, you are already on it, with no action needed. All your voices, including custom voices, carry over and work as-is, so there is no migration step and nothing to re-clone.

Try Gradium TTS in the Gradium Studio, read the API docs to integrate it, or contact us to talk through your use case.

What comes next

We are continuing to improve on all complex cases in every language. For instance, "1,80 m" should be pronounced "un mètre quatre-vingts" in French, not "un virgule quatre-vingt mètre". Every single example is part of our feedback loop.

If you have complex cases you're looking for a provider to solve, we would love to see them. Share them with us and we'll put them to work.

Frequently Asked Questions