How to Handle TTS Edge Cases with Text Normalization in Gradium

4 min read

Text-to-speech models are excellent at natural phrasing and prosody. Where they can struggle is with structured tokens like dates, phone numbers, or email addresses. Without guidance, the model may guess how to pronounce them, and guess wrong.

Gradium's Text Normalization feature addresses this. It rewrites specific parts of your input text before they are converted to speech, so those tokens are spoken in a natural and predictable way.

What Is Text Normalization in Gradium TTS?

Text Normalization processes structured tokens in your text before the TTS model receives them. Instead of having to interpret 12/31/2020 on its own, the model receives a normalized version that is easier to pronounce correctly.

Text Normalization is especially useful for:

  • Dates
  • Times
  • Numbers
  • Emails
  • URLs
  • Phone numbers
  • Alphanumeric codes

In the API, you configure this using json_config.rewrite_rules in your setup message.

How Do You Enable Text Normalization With a Language Alias?

The simplest way to enable Text Normalization is with a language alias. Language aliases are normalization presets that bundle together the most common rules for a given language.

Add rewrite_rules to the json_config in your setup message:

{
  "type": "setup",
  "voice_id": "<VOICE_ID>",
  "model_name": "default",
  "output_format": "wav",
  "json_config": {
    "rewrite_rules": "en"
  }
}

Here, "en" is the language alias for English. Use a language alias when your input is mostly in one language.
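
If you build the setup message in code rather than writing it by hand, a minimal Python sketch might look like the following. It only assembles and serializes the JSON shown above with the standard library. The helper name build_setup_message is hypothetical, and sending the message (for example over a WebSocket connection) is deliberately left out because the transport is not covered here.

```python
import json
from typing import Any, Dict


def build_setup_message(voice_id: str, rewrite_rules: str,
                        model_name: str = "default",
                        output_format: str = "wav") -> Dict[str, Any]:
    """Assemble a setup message with Text Normalization enabled.

    Hypothetical helper: the field names mirror the example setup
    message in this article. It performs no network I/O.
    """
    return {
        "type": "setup",
        "voice_id": voice_id,
        "model_name": model_name,
        "output_format": output_format,
        "json_config": {
            # Either a language alias such as "en" or a comma-separated rule list.
            "rewrite_rules": rewrite_rules,
        },
    }


setup = build_setup_message("<VOICE_ID>", rewrite_rules="en")
payload = json.dumps(setup)  # the serialized string you would send as the setup message
print(payload)
```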

How Do You Enable Specific Normalizers for Tighter Control?

If you need more control, you can enable individual normalizers instead of using a language alias. Pass a comma-separated list of rule names to rewrite_rules:

{
  "json_config": {
    "rewrite_rules": "TimeEn,Date,NumberEn,EmailEn"
  }
}
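
When the rule list comes from configuration rather than a literal string, a small helper can turn an ordered list of rule names into the comma-separated value. This is a sketch under assumptions: join_rules is a hypothetical helper, it keeps the order you pass in, and it trims whitespace because the examples in this article use bare names separated by commas. It does not check names against Gradium's set of supported rules.

```python
from typing import Iterable, List


def join_rules(rule_names: Iterable[str]) -> str:
    """Join rule names into a rewrite_rules string, preserving their order.

    Hypothetical helper. Later duplicates are dropped: if only the first
    matching rule applies to a word, an earlier copy of the same rule
    would always win, so a repeat can never change the result.
    """
    seen = set()
    ordered: List[str] = []
    for name in rule_names:
        name = name.strip()
        if not name:
            raise ValueError("rule names must be non-empty")
        if "," in name:
            raise ValueError(f"rule name may not contain a comma: {name!r}")
        if name not in seen:
            seen.add(name)
            ordered.append(name)
    return ",".join(ordered)


rules = join_rules(["TimeEn", "Date", "NumberEn", "EmailEn"])
json_config = {"rewrite_rules": rules}  # "TimeEn,Date,NumberEn,EmailEn"
```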

Two important details about how rules are applied:

  • Rules are applied word by word.
  • Only the first matching rule applies to each word.

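This means the order of rules matters: a broader rule listed earlier claims every word it matches, so a more specific rule listed after it never gets the chance to apply. When combining multiple normalizers, place the most specific rules first.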
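
To make the effect of ordering concrete, here is a toy Python model of word-by-word, first-match rewriting. It is not Gradium's implementation: toy_number and toy_code are hypothetical stand-ins written to reproduce the Number and Alphanumeric code examples in this article, and splitting on whitespace is a simplifying assumption about what counts as a word. Because toy_code also accepts digit-only words, it is the broader rule; listing it first lets it claim "2500000" before the more specific number rule is tried.

```python
import re
from typing import Callable, List, Optional

# A toy rule returns the rewritten word, or None when it does not match.
Rule = Callable[[str], Optional[str]]


def toy_number(word: str) -> Optional[str]:
    """Digits only, e.g. "2500000" -> "2 million 500 thousand"."""
    if not re.fullmatch(r"\d+", word):
        return None
    millions, rest = divmod(int(word), 1_000_000)
    thousands, units = divmod(rest, 1_000)
    parts = []
    if millions:
        parts.append(f"{millions} million")
    if thousands:
        parts.append(f"{thousands} thousand")
    if units or not parts:
        parts.append(str(units))
    return " ".join(parts)


def toy_code(word: str) -> Optional[str]:
    """Letters/digits containing a digit, e.g. "AB12CD34" -> "A-B 1-2 C-D 3-4"."""
    if not (re.fullmatch(r"[A-Za-z0-9]+", word) and re.search(r"\d", word)):
        return None
    runs = re.findall(r"[A-Za-z]+|\d+", word)  # split into letter runs and digit runs
    return " ".join("-".join(run) for run in runs)


def normalize(text: str, rules: List[Rule]) -> str:
    """Apply rules word by word; only the first matching rule rewrites a word."""
    out = []
    for word in text.split():
        for rule in rules:
            rewritten = rule(word)
            if rewritten is not None:
                out.append(rewritten)
                break  # first match wins; later rules are not tried
        else:
            out.append(word)  # no rule matched: keep the word unchanged
    return " ".join(out)


text = "Order AB12CD34 costs 2500000"
specific_first = normalize(text, [toy_number, toy_code])
broad_first = normalize(text, [toy_code, toy_number])
# specific_first -> "Order A-B 1-2 C-D 3-4 costs 2 million 500 thousand"
# broad_first    -> "Order A-B 1-2 C-D 3-4 costs 2-5-0-0-0-0-0"
```

The two lists contain the same rules; only their order differs, and only the specific-first order reads the number as a quantity.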

Use specific rules when you need tighter control over normalization, or when your input contains text in multiple languages.

What Does Text Normalization Change in Practice?

Here are examples of what each normalizer does to your input text:

Type               Input               Spoken as
Date               12/31/2020          12-31 2020
Time               3:45PM!             3.45PM!
Number             2500000             2 million 500 thousand
Email              foo.bar@gmail.com   foo dot bar at gmail dot com
URL                Any URL             Spelled out character by character with proper language handling
Phone number       Any phone number    Formatted according to the country
Alphanumeric code  AB12CD34            A-B 1-2 C-D 3-4
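
The Email and Time rows are simple enough to mimic locally, which can help when reasoning about the form of the normalized text. The Python sketch below is a toy that reproduces only those two rows: toy_email and toy_time are hypothetical, their regular expressions are rough assumptions, and Gradium's actual EmailEn and TimeEn rules run on the service side and may handle many inputs differently.

```python
import re
from typing import Optional


def toy_email(word: str) -> Optional[str]:
    """Reads "@" as "at" and "." as "dot", e.g. foo.bar@gmail.com."""
    if not re.fullmatch(r"[\w.+-]+@[\w-]+(\.[\w-]+)+", word):
        return None
    return word.replace("@", " at ").replace(".", " dot ")


def toy_time(word: str) -> Optional[str]:
    """Rewrites H:MM as H.MM and keeps any suffix, e.g. "3:45PM!" -> "3.45PM!"."""
    match = re.fullmatch(r"(\d{1,2}):(\d{2})(\S*)", word)
    if match is None:
        return None
    hours, minutes, suffix = match.groups()
    return f"{hours}.{minutes}{suffix}"


print(toy_email("foo.bar@gmail.com"))  # foo dot bar at gmail dot com
print(toy_time("3:45PM!"))             # 3.45PM!
```

Note how the Time row keeps the trailing "!": because rules work on whole words, punctuation attached to a token travels with it.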

When Should You Use a Language Alias vs. Specific Rules?

Use a language alias (e.g., "en") when your input is mostly in one language and you want a sensible default set of normalizations.

Use specific rules (e.g., "TimeEn,Date,NumberEn,EmailEn") when you need tighter control over which normalizations apply, or when your input contains text in multiple languages.

Summary

Text Normalization takes structured, error-prone tokens and turns them into a speakable form. Once normalization is handled, the model can focus on what it does best: natural phrasing and prosody, meaning the rhythm, pitch, and stress patterns of speech.

To use it:

  1. Add json_config.rewrite_rules to your setup message
  2. Pass a language alias like "en" for a preset bundle of rules
  3. Or pass a comma-separated list of specific normalizers for custom control
  4. Remember that rules are applied word by word, and only the first matching rule applies to each word, so list the most specific rules first

Frequently Asked Questions

What is Text Normalization in Gradium TTS?
It is a feature that adjusts specific parts of your input text before they are converted to speech, so structured tokens like dates, numbers, and emails are always spoken naturally.

Where do I configure Text Normalization?
In the json_config.rewrite_rules field of your setup message.

What is a language alias?
A language alias is a normalization preset that bundles together the most common rules for a given language. For example, "en" applies a standard set of English normalizations.

What are examples of specific normalizer names?
TimeEn, Date, NumberEn, and EmailEn. You can combine them in a comma-separated list.

When should I use specific rules instead of a language alias?
Use specific rules when you need tighter control over which normalizations are applied, or when your input text contains multiple languages.