How to Use Pronunciation Dictionaries in Gradium TTS: Studio and API Guide
Just like people, TTS models sometimes need help figuring out the right way to pronounce certain words. Proper nouns, brand names, special technical terms, and slang are common sources of mispronunciation in text-to-speech output.
Gradium's Pronunciation Dictionaries let you inject that extra context directly into your TTS setup, so the words you define are spoken the same way every time.
This guide covers two core use cases: controlling how specific words are spoken, and replacing unwanted words with acceptable alternatives.
What Are Pronunciation Dictionaries in Gradium?
A Pronunciation Dictionary in Gradium is a set of rules that tell the TTS model how to handle specific pieces of text before they are spoken. Each rule maps an original text to a replacement pronunciation or word.
Once created, a dictionary is identified by a pronunciation_id that you pass in your TTS setup.
How Do You Control How Words Are Pronounced?
Some words are commonly mispronounced by TTS models: abbreviations, slang, names, and brand-specific terminology.
How to Set Up Pronunciation Rules in Gradium Studio
- Open Gradium Studio
- Go to the Pronunciation tab
- Create a new pronunciation dictionary (for example, name it slangs)
- Add a rule: set the original text to xoxo and the replacement pronunciation to ex-o-ex-o
Example
Without a pronunciation rule, the abbreviation xoxo may be read in unexpected ways. With the rule in place, the model will always speak it as ex-o-ex-o, exactly as intended.
This approach works for any word or abbreviation that needs a specific spoken form: names, brand terms, or domain-specific vocabulary.
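To get a feel for what a rule such as xoxo to ex-o-ex-o does to a sentence, the mapping can be mimicked locally before you create it in Studio. This is only a minimal sketch, not Gradium's implementation: the apply_rules helper is hypothetical, and whole-word, case-insensitive matching is an assumption, since this guide does not specify how Gradium matches text.

```python
import re

# Hypothetical local stand-in for a pronunciation dictionary:
# original text -> replacement pronunciation.
RULES = {"xoxo": "ex-o-ex-o"}


def apply_rules(text: str, rules: dict[str, str]) -> str:
    """Replace each whole-word occurrence of a rule's original text.

    Assumption: matching is case-insensitive and respects word boundaries.
    Gradium's real matching behavior may differ.
    """
    if not rules:
        return text
    # Try longer originals first so overlapping rules prefer the longer match.
    originals = sorted(rules, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(o) for o in originals) + r")(?!\w)",
        re.IGNORECASE,
    )
    lowered = {k.lower(): v for k, v in rules.items()}
    return pattern.sub(lambda m: lowered[m.group(1).lower()], text)


print(apply_rules("Love you, xoxo!", RULES))  # Love you, ex-o-ex-o!
```

A preview like this is useful for sanity-checking a rule list, but the audio produced by the TTS model remains the only reliable check of how a replacement actually sounds.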
How Do You Use Pronunciation Dictionaries for Content Moderation?
Pronunciation Dictionaries can also serve as a content moderation layer for your voice output. Instead of correcting pronunciation, you can use rules to replace words that do not meet your platform's content guidelines with acceptable alternatives.
Example
If you are building a platform where user-generated content is read aloud, you can add a rule that replaces an offensive word with an appropriate substitute. The TTS model then speaks the replacement word, so the output stays within your platform's guidelines for every word your rules cover.
In the tutorial, a rule that replaces a profanity with "Gentleman" keeps the spoken output within platform standards, regardless of what the input text contains.
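A dictionary can only replace words it has rules for, so it can help to check which rules a given piece of user text would trigger, for example when logging or reviewing moderation coverage. The sketch below is illustrative only: triggered_rules is a hypothetical helper, "loser" is a placeholder for an unwanted word, and whole-word, case-insensitive matching is an assumption rather than documented Gradium behavior.

```python
import re

# Hypothetical moderation rules: unwanted word -> acceptable alternative.
# "loser" is a placeholder; "Gentleman" mirrors the tutorial's replacement.
MODERATION_RULES = {"loser": "Gentleman"}


def triggered_rules(text: str, rules: dict[str, str]) -> list[str]:
    """Return the rule originals that occur in text as whole words."""
    hits = []
    for original in rules:
        # Assumption: case-insensitive, whole-word matching.
        pattern = r"(?<!\w)" + re.escape(original) + r"(?!\w)"
        if re.search(pattern, text, re.IGNORECASE):
            hits.append(original)
    return hits


print(triggered_rules("I'm going to get you loser!", MODERATION_RULES))  # ['loser']
```

Under these assumptions, an empty result means no rule in the local list applies to the text. It does not mean the text is free of unwanted words, only that none of the listed ones were found.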
How Do You Add a Pronunciation Dictionary to the Gradium TTS API?
Once you have created your dictionary in Gradium Studio and obtained its pronunciation_id, add it to your TTS setup object:
import gradium
import asyncio


async def main():
    client = gradium.client.GradiumClient()
    # Pass the dictionary's ID in the setup object, next to the voice settings.
    result = await client.tts(
        setup={
            "voice_id": "<YOUR_VOICE_ID>",
            "output_format": "wav",
            "pronunciation_id": "<YOUR_DICTIONARY_ID>",
        },
        text="I'm going to get you loser!",
    )
    # Save the generated audio to disk.
    with open("output.wav", "wb") as f:
        f.write(result.raw_data)


asyncio.run(main())
With this setup, the output follows the rules defined in your dictionary. The text is still passed as-is; the model handles the substitution before generating speech.
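If some requests use a dictionary and others do not, the setup object can be assembled by a small helper. This is a sketch under stated assumptions: build_setup is a hypothetical helper, GRADIUM_PRONUNCIATION_ID is a made-up environment variable name, and leaving pronunciation_id out of the setup when no dictionary is wanted is assumed to be acceptable to the API.

```python
import os


def build_setup(voice_id, pronunciation_id=None, output_format="wav"):
    """Build a TTS setup dict, adding pronunciation_id only when one is given."""
    setup = {"voice_id": voice_id, "output_format": output_format}
    if pronunciation_id:
        # Key name taken from the example above.
        setup["pronunciation_id"] = pronunciation_id
    return setup


# GRADIUM_PRONUNCIATION_ID is a hypothetical environment variable name.
setup = build_setup(
    "<YOUR_VOICE_ID>",
    pronunciation_id=os.environ.get("GRADIUM_PRONUNCIATION_ID"),
)
print(setup)
```

The resulting dictionary can be passed as the setup argument of client.tts in the example above. Keeping the ID in configuration rather than in code makes it easier to switch dictionaries between environments.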
Summary
| Use case | What you configure | Result |
|---|---|---|
| Pronunciation control | Map original text to a spoken form (e.g., xoxo to ex-o-ex-o) | The model speaks the word exactly as intended |
| Content moderation | Map an unwanted word to an acceptable alternative | The output stays within your platform guidelines |
Frequently Asked Questions
- What is a Pronunciation Dictionary in Gradium?
- It is a set of rules that map original text to replacement pronunciations or words. The TTS model applies these rules before generating speech, ensuring consistent and controlled audio output.
- What types of words benefit most from Pronunciation Dictionaries?
- Proper nouns, brand names, special technical terms, abbreviations, and slang are the most common cases. Any word the model might mispronounce or that needs a specific spoken form can be handled with a pronunciation rule.
- Where do I create a Pronunciation Dictionary?
- In Gradium Studio, go to the Pronunciation tab and create a new dictionary. Add rules by specifying the original text and the desired replacement.
- How do I use a Pronunciation Dictionary in the API?
- Obtain the pronunciation_id of your dictionary from Gradium Studio, then pass it in the setup object of your TTS request alongside voice_id and output_format.
- Can I use a Pronunciation Dictionary for content moderation?
- Yes. You can define rules that replace words that do not meet your platform guidelines with acceptable alternatives. The TTS model will speak the replacement word, keeping your audio output within policy.