Gradium API (0.1.0)


This documentation covers the Gradium API.

This API exposes our Text-To-Speech and Speech-To-Text models, which offer low-latency, high-quality, natural-sounding output and best-in-class accuracy.

For issues, questions, or feature requests, please contact us at support@gradium.ai

Documentation

Features

  • Multilingual: Text-To-Speech and Speech-To-Text currently support five languages: English (en), French (fr), German (de), Spanish (es), and Portuguese (pt), with more languages to come.
  • Low-latency: Our servers are based in Europe and in the US, and the expected time-to-first-token is below 300ms when streaming.
  • Voice selection: We provide a voice library with multiple voices to choose from in different languages. You can also clone a voice instantly from a 10-second voice sample.

Installation

pip install gradium

Quick Start

import asyncio
import gradium

async def main():
    client = gradium.client.GradiumClient(api_key="your-api-key")

    result = await gradium.speech.tts(
        client,
        setup={"voice_id": "YTpq7expH9539ERJ", "output_format": "wav"},
        text="Welcome to Gradium! Transform your text into natural-sounding speech in seconds."
    )

    with open("welcome.wav", "wb") as f:
        f.write(result.raw_data)

if __name__ == "__main__":
    asyncio.run(main())

Creating a Client

Using API Key Directly

import gradium

client = gradium.client.GradiumClient(api_key="gd_your_api_key_here")

Using Environment Variable

Set the GRADIUM_API_KEY environment variable:

export GRADIUM_API_KEY=gd_your_api_key_here

Then create the client without passing the API key:

client = gradium.client.GradiumClient()
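
If you want to fail early with a clear message when no key is configured, you can resolve the key yourself before creating the client. The sketch below is illustrative only: resolve_api_key is a hypothetical helper, not part of the gradium package, and it simply prefers an explicit key over the GRADIUM_API_KEY environment variable.

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Return the API key to use, preferring an explicitly passed key.

    Hypothetical helper (not part of the gradium package).
    `env` defaults to os.environ and can be replaced in tests.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get("GRADIUM_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: pass one explicitly or set GRADIUM_API_KEY"
        )
    return key
```

The returned value can then be passed as api_key when creating the client, as in "Using API Key Directly" above.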

Text-to-Speech (TTS)

Basic Usage

import gradium

client = gradium.client.GradiumClient()
result = await gradium.speech.tts(
    client,
    setup={
        "model_name": "default", 
        "voice_id": "YTpq7expH9539ERJ",
        "output_format": "wav"
    },
    text="Hello, world!"
)

with open("output.wav", "wb") as f:
    f.write(result.raw_data)

print(f"Sample rate: {result.sample_rate}")
print(f"Request ID: {result.request_id}")

Setup Parameters

  • model_name: The TTS model to use (default: "default")
  • voice_id: The ID of the voice to use. Voice IDs are listed in the Voices Library section of this documentation and in the studio.
  • output_format: Audio format of the output data (supported: "pcm", "wav", "opus", ...)

When using "pcm" output format, the audio will adhere to the following specifications:

  • Sample Rate: 48000 Hz (48kHz)
  • Format: PCM (Pulse Code Modulation)
  • Bit Depth: 16-bit signed integer
  • Channels: Single channel (mono)
  • Chunk Size: 3840 samples per chunk (80ms at 48kHz)

Alternative output formats include "ulaw_8000", "alaw_8000", "pcm_16000", and "pcm_24000".
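
The "pcm" specifications above determine how many bytes to expect per chunk and how to convert a byte count into a duration. The following sketch only restates that arithmetic; the constant and function names are our own and are not part of the gradium package, and the numbers apply to the "pcm" format only, not to the alternative formats.

```python
# Values taken from the "pcm" specification listed above.
SAMPLE_RATE_HZ = 48_000
BYTES_PER_SAMPLE = 2        # 16-bit signed integer
CHANNELS = 1                # mono
SAMPLES_PER_CHUNK = 3_840   # 80 ms at 48 kHz


def chunk_size_bytes():
    """Size in bytes of one full PCM chunk."""
    return SAMPLES_PER_CHUNK * BYTES_PER_SAMPLE * CHANNELS


def duration_seconds(num_bytes):
    """Playback duration of num_bytes of raw PCM audio."""
    return num_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


print(chunk_size_bytes())                    # bytes in one 80 ms chunk
print(duration_seconds(chunk_size_bytes()))  # duration of one chunk, in seconds
```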

Streaming TTS

The TTS can be used in a streaming fashion. The first chunks of audio will be available as soon as they are generated. When using the "pcm" output format, the audio chunks will be in raw PCM format sampled at 48kHz using 16-bit signed integer (little-endian) mono.

stream = await gradium.speech.tts_stream(
    client,
    setup={
        "model_name": "default",
        "voice_id": "LFZvm12tW_z0xfGo",
        "output_format": "pcm"
    },
    text="This is a longer text that will be streamed."
)

async for audio_chunk in stream.iter_bytes():
    print(f"Received {len(audio_chunk)} bytes")
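
Raw PCM chunks have no header, so most players cannot open them directly. As a minimal sketch, the chunks can be wrapped in a WAV container with Python's standard wave module. Here fake_pcm_stream is a stand-in that yields silence in place of stream.iter_bytes(), and collect_to_wav is a hypothetical helper; neither is part of the gradium package.

```python
import asyncio
import io
import wave


async def fake_pcm_stream(num_chunks=3, chunk_bytes=7680):
    """Stand-in for stream.iter_bytes(): yields chunks of 16-bit silence."""
    for _ in range(num_chunks):
        yield bytes(chunk_bytes)


async def collect_to_wav(chunks, sample_rate=48_000):
    """Wrap raw 16-bit mono PCM chunks in a WAV container and return the bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)       # mono
        wav_file.setsampwidth(2)       # 16-bit samples
        wav_file.setframerate(sample_rate)
        async for chunk in chunks:
            wav_file.writeframes(chunk)
    return buffer.getvalue()


wav_bytes = asyncio.run(collect_to_wav(fake_pcm_stream()))
print(f"WAV size: {len(wav_bytes)} bytes")
```

With a real stream, you would pass stream.iter_bytes() in place of fake_pcm_stream(), assuming the "pcm" output format described above.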

Using Custom Voices

result = await gradium.speech.tts(
    client,
    setup={
        "model_name": "default",
        "voice_id": "YTpq7expH9539ERJ",
        "output_format": "wav"
    },
    text="Hello with my custom voice!"
)

Output Formats

# WAV format
result = await gradium.speech.tts(client, setup={"voice_id": "YTpq7expH9539ERJ", "output_format": "wav"}, text="Hello")

# PCM format: the data is sampled at 48kHz, 16-bit signed integer, mono
result = await gradium.speech.tts(client, setup={"voice_id": "YTpq7expH9539ERJ", "output_format": "pcm"}, text="Hello")

# Get numpy array from PCM
pcm_array = result.pcm()
pcm16_array = result.pcm16()
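
If numpy is not available, raw 16-bit little-endian PCM bytes can also be decoded with the standard library. This is a sketch under the assumption that the bytes follow the "pcm" specification (16-bit signed integer, little-endian, mono); decode_pcm16 and peak_level are hypothetical helpers and do not reproduce the exact behavior of result.pcm() or result.pcm16().

```python
import array
import sys


def decode_pcm16(raw):
    """Decode little-endian 16-bit signed PCM bytes into an array of ints."""
    samples = array.array("h")   # signed 16-bit integers
    samples.frombytes(raw)       # raises ValueError on an odd byte count
    if sys.byteorder == "big":
        samples.byteswap()       # the data is little-endian
    return samples


def peak_level(samples):
    """Peak absolute amplitude, normalized to the range [0.0, 1.0]."""
    return max((abs(s) for s in samples), default=0) / 32768.0
```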

Speed Control

# You can guide the speaking speed with the padding_bonus parameter, passed in json_config.
# The default value is 0.0.
# Negative values make the speaker talk faster (values between -4.0 and -0.1).
# Positive values make the speaker talk slower (values between 0.1 and 4.0).

sample_text = "Hello, this is a test from the Gradium Text to Speech system. We are testing the speed."

slower_audio = await gradium.speech.tts(
    client,
    setup={'voice_id': 'YTpq7expH9539ERJ', 'output_format': 'wav', 'json_config':{'padding_bonus':2.0}},
    text=sample_text,
    )

faster_audio = await gradium.speech.tts(
    client,
    setup={'voice_id': 'YTpq7expH9539ERJ', 'output_format': 'wav', 'json_config':{'padding_bonus':-2.0}},
    text=sample_text,
    )
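
To catch out-of-range values before sending a request, the setup dictionary can be built through a small validating helper. This is only a sketch: speed_setup is a hypothetical function, not part of the gradium package, and it assumes any value from -4.0 to 4.0 (including the default 0.0) is acceptable, based on the ranges given in the comments above.

```python
def speed_setup(voice_id, padding_bonus=0.0, output_format="wav"):
    """Build a TTS setup dict with a range-checked padding_bonus.

    Hypothetical helper. Negative values speed the speaker up,
    positive values slow the speaker down.
    """
    if not -4.0 <= padding_bonus <= 4.0:
        raise ValueError(
            f"padding_bonus must be between -4.0 and 4.0, got {padding_bonus}"
        )
    return {
        "voice_id": voice_id,
        "output_format": output_format,
        "json_config": {"padding_bonus": padding_bonus},
    }


slower_setup = speed_setup("YTpq7expH9539ERJ", padding_bonus=2.0)
faster_setup = speed_setup("YTpq7expH9539ERJ", padding_bonus=-2.0)
```

The resulting dictionaries have the same shape as the setup arguments used in the two calls above.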

Breaks

# A pause is generated with the tag <break time="1.5s" />
# The break time should be between 0.1s and 2.0s
# The break tag must be preceded and followed by a space

sample_text = """Hello, this is a test from the Gradium Text to Speech system. <break time="1.5s" /> We are testing the pause."""

test_audio = await gradium.speech.tts(
    client,
    setup={'voice_id': 'YTpq7expH9539ERJ', 'output_format': 'wav'},
    text=sample_text,
    )
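
The three rules above (tag syntax, allowed range, surrounding spaces) are easy to get wrong when text is assembled programmatically. A minimal sketch of a helper that enforces them follows; break_tag and join_with_break are hypothetical names, not part of the gradium package.

```python
def break_tag(seconds):
    """Return a break tag for a pause of `seconds` (0.1 to 2.0)."""
    if not 0.1 <= seconds <= 2.0:
        raise ValueError(f"break time must be between 0.1 and 2.0s, got {seconds}")
    return f'<break time="{float(seconds)}s" />'


def join_with_break(before, after, seconds):
    """Join two text fragments with a break tag surrounded by single spaces."""
    return f"{before.rstrip()} {break_tag(seconds)} {after.lstrip()}"


sample_text = join_with_break(
    "Hello, this is a test from the Gradium Text to Speech system.",
    "We are testing the pause.",
    1.5,
)
print(sample_text)
```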

Text with Timestamps

result = await gradium.speech.tts(
    client,
    setup={"voice_id": "YTpq7expH9539ERJ", "output_format": "wav"},
    text="Hello, world!"
)

for item in result.text_with_timestamps:
    print(f"{item.text}: {item.start_s:.2f}s - {item.stop_s:.2f}s")
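
One common use of these timestamps is generating captions. The sketch below converts timestamped items into SubRip (SRT) text. TimedText is a stand-in dataclass that only mirrors the text, start_s, and stop_s attributes used above; it is not the actual type returned by the gradium package, and to_srt is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class TimedText:
    """Stand-in for an item of result.text_with_timestamps."""
    text: str
    start_s: float
    stop_s: float


def srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(items):
    """Render timestamped items as SRT caption blocks, one per item."""
    blocks = []
    for index, item in enumerate(items, start=1):
        blocks.append(
            f"{index}\n{srt_time(item.start_s)} --> {srt_time(item.stop_s)}\n{item.text}\n"
        )
    return "\n".join(blocks)


print(to_srt([TimedText("Hello,", 0.0, 0.4), TimedText("world!", 0.4, 0.9)]))
```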

Async Generator Input

async def text_generator():
    yield "Hello, "
    yield "this is "
    yield "a streaming "
    yield "example."

stream = await gradium.speech.tts_stream(
    client,
    setup={"voice_id": "YTpq7expH9539ERJ", "output_format": "pcm"},
    text=text_generator()
)

async for chunk in stream.iter_bytes():
    pass
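
In practice, the text generator is often fed by another process, such as a longer document being read sentence by sentence. As an illustration, the sketch below splits a text into sentence-sized pieces and yields them from an async generator. sentence_chunks is a hypothetical helper, and the splitting rule (break after ".", "!" or "?") is a simplification chosen for this example.

```python
import asyncio
import re


async def sentence_chunks(text):
    """Yield `text` piece by piece, splitting after '.', '!' or '?'."""
    # Each match runs up to and including the end-of-sentence punctuation
    # and trailing whitespace, so joining the pieces restores the input.
    for piece in re.findall(r".+?(?:[.!?]+\s*|\Z)", text, flags=re.S):
        yield piece
        await asyncio.sleep(0)  # give other tasks a chance to run


async def collect(generator):
    """Gather every item of an async generator into a list."""
    return [item async for item in generator]


pieces = asyncio.run(collect(sentence_chunks("Hello there. How are you? Fine!")))
print(pieces)
```

A generator like sentence_chunks(...) could then be passed as the text argument in the same way as text_generator() above.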

Voices Library

Gradium provides a selection of high-quality voices across multiple languages. Here are our voices.

Flagship Voices

Name Voice ID             Language Country Age Group Gender Description
Emma YTpq7expH9539ERJ en us 🇺🇸 Adult Feminine A pleasant and smooth female voice ready to assist your customers and also eager to have nice conversations.
Kent LFZvm12tW_z0xfGo en us 🇺🇸 Adult Masculine A relaxed and authentic American adult voice that connects like a genuine friend.
Sydney jtEKaLYNn6iif5PR en us 🇺🇸 Adult Feminine A joyful and airy American adult voice that makes corporate training feel helpful and light.
John KWJiFWu2O9nMPYcR en us 🇺🇸 Adult Masculine A warm low-pitched American adult voice with the resonant quality of a classic radio broadcaster.
Eva ubuXFxVQwVYnZQhy en gb 🇬🇧 Adult Feminine A joyful and dynamic British adult voice ideal for lively conversations.
Jack m86j6D7UZpGzHsNu en gb 🇬🇧 Adult Masculine A pleasant British voice suited for helpful service, casual conversations, or intense narrations.
Elise b35yykvVppLXyw_l fr fr 🇫🇷 Adult Feminine A warm and smooth French adult voice ideal for friendly conversation and welcoming support.
Leo axlOaUiFyOZhy4nv fr fr 🇫🇷 Adult Masculine A warm and smooth French adult voice ideal for friendly conversation and welcoming support.
Mia -uP9MuGtBqAvEyxI de de 🇩🇪 Adult Feminine A joyful and energetic German voice perfect for professional context as well as enthusiastic discussions.
Maximilian 0y1VZjPabOBU3rWy de de 🇩🇪 Adult Masculine A warm and smooth German adult voice ideal for friendly conversation and professional narration.
Valentina B36pbz5_UoWn4BDl es mx 🇲🇽 Adult Feminine A warm and engaging Mexican female voice perfect for natural storytelling and connecting like a genuine friend.
Sergio xu7iJ_fn2ElcWp2s es es 🇪🇸 Adult Masculine A warm and smooth Spanish adult voice ideal for friendly conversation and professional narration.
Alice pYcGZz9VOo4n2ynh pt br 🇧🇷 Adult Feminine A warm and smooth Brazilian female voice ideal for professional service and pleasant narration or even an enthusiastic conversation!
Davi M-FvVo9c-jGR4PgP pt br 🇧🇷 Adult Masculine An engaging and smooth Brazilian adult voice ideal for helpful service and relaxing conversations.

All Voices

Name voice_id Language Country Perceived age Perceived gender Description
Eva ubuXFxVQwVYnZQhy en gb Adult Feminine A joyful and dynamic British adult voice ideal for lively conversations.
Jack m86j6D7UZpGzHsNu en gb Adult Masculine A pleasant British voice suited for helpful service, casual conversations, or intense narrations.
Emma YTpq7expH9539ERJ en us Adult Feminine A pleasant and smooth female voice ready to assist your customers and also eager to have nice conversations.
Kent LFZvm12tW_z0xfGo en us Adult Masculine A relaxed and authentic American adult voice that connects like a genuine friend.
Mia -uP9MuGtBqAvEyxI de de Adult Feminine A joyful and energetic German voice perfect for professional context as well as enthusiastic discussions.
Maximilian 0y1VZjPabOBU3rWy de de Adult Masculine A warm and smooth German adult voice ideal for friendly conversation and professional narration.
Valentina B36pbz5_UoWn4BDl es mx Adult Feminine A warm and engaging Mexican female voice perfect for natural storytelling and connecting like a genuine friend.
Sergio xu7iJ_fn2ElcWp2s es es Adult Masculine A warm and smooth Spanish adult voice ideal for friendly conversation and professional narration.
Elise b35yykvVppLXyw_l fr fr Adult Feminine A warm and smooth French adult voice ideal for friendly conversation and welcoming support.
Leo axlOaUiFyOZhy4nv fr fr Adult Masculine A warm and smooth French adult voice ideal for friendly conversation and welcoming support.
Alice pYcGZz9VOo4n2ynh pt br Adult Feminine A warm and smooth Brazilian female voice ideal for professional service and pleasant narration or even an enthusiastic conversation!
Davi M-FvVo9c-jGR4PgP pt br Adult Masculine An engaging and smooth Brazilian adult voice ideal for helpful service and relaxing conversations.
Max NoJdNY6JTz-VJLwz en ca Young Adult Masculine A clear, calm, and measured male voice.
Kelly Lxc7YlPC8ckLJA8H en gb Adult Feminine Clear, soft, and measured female narration.
Arjun -_aUUFZaJ0CT1gks en in Adult Masculine A warm voice with a clear low-pitch and a smooth texture.
Hunter W5htOuyiFI4Fwhxs en au Adult Masculine A joyful and smooth Australian adult voice that keeps listeners tuned in with radio charm.
Tiffany Eu9iL_CYe8N-Gkx_ en us Young Adult Feminine A warm and smooth American young adult voice that greets customers with a smile you can hear.
Christina 2H4HY2CBNyJHBCrP en us Adult Feminine A joyful low-pitched American adult voice that handles business and service with efficiency.
Maria KNYHZTB8ZqdAZv5Q en us Adult Feminine A joyful high-pitched American adult voice that teaches and tutors with genuine energy.
Mark dh0EzP6jCroK6prq en us Adult Masculine A warm low-pitched American adult voice that resonates with professional radio quality.
Logan XJc-Y9tkSd1UA7s4 en us Young Adult Masculine A joyful and smooth American young adult voice that fits the energetic vibe of a gym coach.
Juan 78zAgQK6xmExb8wS en us Adult Masculine A joyful and smooth American adult voice that welcomes and hosts with vibrant energy.
Kaitlyn 56DcpvEI0Gawpidh en us Adult Feminine A warm and smooth American adult voice that offers the kindness of a helpful neighbor.
Michelle lt88kyLfD8Mqemla en in Young Adult Feminine A warm and smooth Indian English young adult voice for clear and friendly service.
Mary wPx6HPbUQkaUHGhq en us Adult Feminine A joyful high-pitched American adult voice that connects perfectly with younger audiences.
Cameron c8BzreHTk1GG2R4z en us Adult Masculine A steady low-pitched American adult voice ideal for tech reviews and casual explanations.
Jeremy 9QHzSiOYUD-RzEzM en us Adult Masculine A composed American adult voice that sounds intelligent and tech-savvy.
Jesse hOhCtzjR-cRG4T5T en us Young Adult Masculine A joyful high-pitched American young adult voice with a unique airy texture for character roles.
Sean cu0XE3Cxmg_GmSJ3 en us Adult Masculine A joyful American adult voice that brings the spirited energy of a rodeo announcer.
Charles P0GYBrxlhTy5CC87 en gb Adult Masculine A warm and smooth British adult voice that hosts with a classic reliable radio presence.
Olivia kr-Om35JRqmA3Hzq en us Young Adult Feminine A warm low-pitched American young adult voice that guides meditation with soothing calm.
Shelby O0uTTRx5zcetDFX4 en us Young Adult Feminine A joyful high-pitched American young adult voice that brings enthusiasm to web content.
Patrick Z5GIOZR45ieZ8M-W en us Adult Masculine A joyful and smooth American adult voice perfect for clear and engaging public service announcements.
Richard HndphaVV7KTCfKQT en us Adult Masculine A joyful high-pitched American adult voice that captures the excitement of sports commentary.
Jason FOFDH8py3aghc5kb en us Adult Masculine A joyful American adult voice that delivers radio content with a distinct engaging tone.
Kimberly Abqwk2RWxlBEyv0j en gb Adult Feminine A joyful high-pitched British adult voice that welcomes listeners with cheerful efficiency.
Timothy v5lib8tjaosy5sxQ en us Adult Masculine A warm low-pitched American adult voice with a nostalgic friendly resonance.
Nathan 4NU5PqxX2BdMEtWe en us Adult Masculine A warm and smooth American adult voice that sounds just like your friendly neighbor.
Adam EbIA5CIcQoa6NNd2 en us Adult Masculine A joyful and smooth American adult voice that greets the morning with radio-ready energy.
Abigail KRo-uwfno-KcEgBM en us Adult Feminine A warm and airy American adult voice that adds a touch of magic and empathy to any story.
Melissa 8Tm8RKFEbnkRtkdA en us Adult Feminine A joyful and smooth American adult voice that facilitates with upbeat enthusiasm.
Allison yU6yxQ3e8LKRwU84 en us Adult Feminine A joyful high-pitched American adult voice that brings high energy to training and teaching.
Kelsey MQC0U1yWvZXrppaF en us Adult Feminine A balanced American adult voice that fits realistic everyday service interactions.
Haley aq7ltaIQ6ZJUY0jR en gb Adult Feminine A confident and warm British adult voice versatile enough for e-learning, support, and storytelling.
Anna PS7enm5lVZiIvEKV en us Adult Feminine A warm and smooth American adult voice that provides comfort and supportive guidance.
Katherine bvNlBZ3DWDoVy_Yc en us Young Adult Feminine A warm and smooth American young adult voice that balances business professionalism with kindness.
Steven zyLIanWKViHkc6Wp en gb Adult Masculine A steady and smooth British adult voice that offers helpful and consistent management advice.
Brian ptMwY_gvmFxXMmDf en us Adult Masculine A steady American adult voice with a low-pitched tone suitable for distinct character roles.
Jose LqFNS0u6EII7VHBx en us Adult Masculine A warm low-pitched American adult voice that offers the reassuring guidance of a mentor.
Madison cuXxqSrGVntdhFpZ en gb Young Adult Feminine A warm low-pitched British young adult voice that feels like a friendly neighbor.
Dylan d9Fl9x8luXXX7u6E en us Adult Masculine A warm and smooth American adult voice that keeps the flow going as a DJ or host.
Rebecca GJSxJhSTPAGIPDwy en us Adult Feminine A warm and airy American adult voice that manages and assists with a gentle touch.
Samuel pxKsJ_4kEMid5XpZ en au Young Adult Masculine A warm and smooth Australian young adult voice that sounds like a friendly bartender or colleague.
Eric knw-ddWDPNORRA4Z en us Adult Masculine A joyful and smooth American adult voice that makes sales and service feel cheerful and easy.
Alyssa 22YWyuFACaMHsPh5 en us Young Adult Feminine A warm and smooth American young adult voice that adds a relatable human touch to readings.
Alexandra 4nAcNUlNhEA_Kyjo en us Adult Feminine A joyful and smooth American adult voice ideal for reading and hosting duties.
Jasmine QPHuXnvRPQ57oXYy en us Adult Feminine A joyful high-pitched American adult voice that commands the room with managerial confidence.
Benjamin IBVzgY91NZ1IJ0oP en gb Adult Masculine A joyful and smooth British adult voice that leads events and shows with master-of-ceremony flair.
Aaron Ve1zknlflaRwcAQw en gb Adult Masculine A composed and smooth British adult voice perfect for technical and IT-related explanations.
Jordan ws0Wb0PZXl21_Bbz en us Young Adult Masculine A joyful American young adult voice that motivates with the energy of a fitness instructor.
Christian x69x43aS-5mVLCX2 en gb Adult Masculine A warm and smooth British adult voice that sounds like a kind and knowledgeable scholar.
Thomas m7fJRmVaJjG2TL1c en gb Adult Masculine A warm and smooth British adult voice that brings an actor's versatility to conversation.
Morgan MGiwMOFxVe4a2aSU en gb Adult Feminine A warm and airy British adult voice that guides listeners into a state of meditation.
Cody SqHUVuEiTPSlIB5r en us Adult Masculine A warm and resonant American adult voice that delivers radio quality with a professional touch.
Alex 91EdXxJDbWICDBgz en us Adult Neutral A joyful high-pitched American adult voice that grabs attention in advertisements.
Brianna fggSYM_FGJ30QTTl en us Young Adult Feminine A warm and smooth American young adult voice ideal for music radio and educational content.
Kevin J2qsArcdozbto5Hn en au Adult Masculine A joyful Australian adult voice that engages audiences as a TV host or tutor.
Victoria 8dBmiTurwb7KcxLY en us Adult Feminine A warm and smooth American adult voice that conveys the reliability of a helpful colleague.
Nicole T7UL6gmeDqqYiVe1 en us Adult Feminine A joyful American adult voice with a sarcastic edge perfect for entertaining podcasts.
Jennifer auZu0iT-fniQ4cJd en us Adult Feminine A warm and smooth American adult voice that is always ready to help like a good friend.
Courtney UX3Hi2ZmK7tT0c3G en gb Adult Feminine A joyful high-pitched British adult voice perfect for sales and professional announcements.
Stephanie ikbJkd83GvuyoSLb en us Adult Feminine A joyful and smooth American adult voice that sounds like a modern relatable mom.
Kyle CjQcj4yeIs6h0uAb en us Adult Masculine A joyful American adult voice that wakes up the audience with morning radio energy.
Lauren SG3KnxbSOkkrY097 en us Adult Feminine An assertive and smooth American adult voice that fits the modern urban businesswoman persona.
Alexis 74asmf7CXzjfopIX en us Adult Feminine A joyful American adult voice that delivers customer service scripts with a bright distinct tone.
Megan exG4bLr-lZ_bI0jF en us Adult Feminine A joyful high-pitched American adult voice that mixes customer service clarity with influencer energy.
Jonathan 4u2uvwrHdTA2gRnZ en us Adult Masculine A joyful high-pitched American adult voice with the charm of an old-timey character actor.
Robert gTAO-3xLZ8_WSfbm en us Adult Masculine A warm and resonant American adult voice that brings a professional acting polish to any script.
Alexander 8sWSyTC7byLsbHkr en us Adult Masculine A warm low-pitched American adult voice that motivates with the resonance of a fitness coach.
Rachel dEcrv3B8XGHoox2_ en gb Adult Feminine A warm low-pitched British adult voice that balances professional business tones with a calming presence.
Kayla 9VXl5t2IMagUQAzg en gb Adult Feminine A joyful British adult voice with a precise tone ideal for automated yet friendly service.
Elizabeth u8rA2xOF_0LRnNSb en us Adult Feminine A consistent and smooth American adult voice that provides clear and reliable customer service.
Amanda ZZb4X9ueHSdRlv9q en gb Young Adult Feminine A joyful and hip British young adult voice that brings energy to podcasts and modern content.
Brittany 3bIdO9CHnAh_pRAf en in Adult Feminine A joyful and smooth Indian English adult voice that is perfect for friendly HR and service roles.
William VeVmpxxbyJiWrGNG en au Adult Masculine A joyful high-pitched Australian adult voice that sounds like an energetic high school coach.
Hannah lP7D1y02OQFtffU3 en us Young Adult Feminine A warm and airy American young adult voice that creates a calm atmosphere for yoga and meditation.
Anthony 2V3TjbyQGPlkY6ON en au Adult Masculine A joyful and smooth Australian adult voice that brings a cartoonish MC-style energy.
Justin 6Mp6PGnaCdb-US21 en us Adult Masculine A distinct American adult voice with a characterful tone ideal for niche roles.
James MZWrEHL2Fe_uc2Rv en us Adult Masculine A warm and resonant American adult voice that excels at storytelling and persuasive advertising.
David OceLYI_PPbqsdgdV en gb Young Adult Masculine A warm and smooth British young adult voice that captures the relaxed tone of a college student.
Ryan AqRuVz8-e8u3BR00 en us Adult Masculine A warm low-pitched American adult voice with a resonant rural charm for sales and storytelling.
Taylor EfuzJVuTmw_mA7PC en us Adult Feminine A warm and efficient American adult voice that fits perfectly for automated customer support.
Sarah aW5dxfdkzIFCIdXc en us Young Adult Feminine A clear American young adult voice that is precise and perfect for student-focused reading.
Joseph MhsYZQ4bIfcDpokF en gb Adult Masculine A warm and relatable British adult voice with a genuine blue-collar friendliness.
Samantha mn5sS7D8kYKETZXA en us Adult Feminine A warm and professional American adult voice that is both helpful and authoritatively managerial.
Austin -0MuXG9RcCsuSVtb en us Mature Masculine A warm rough-textured American mature voice that embodies the kindness of a gentle grandfather.
Daniel apU2CMobTyu92tZj en au Adult Masculine A joyful and smooth Australian adult voice that brings a cheerful down-to-earth vibe to any chat.
Emily i1kmq28cO60ia35K en us Young Adult Feminine A warm and smooth American young adult voice perfect for modern podcasting and influencing.
Brandon 2j8TWGsIiUl4G3kj en us Young Adult Masculine A high-pitched joyful American young adult voice that sounds like your friendliest colleague.
Tyler Ow5IKhni2ED3Xxhl en gb Adult Masculine A warm and smooth British adult voice that blends tech-savviness with a friendly radio persona.
Nicholas n2Gv34jje2ZiiNzK en us Adult Masculine A joyful American adult voice with a relatable slightly clumsy charm perfect for sitcom-style scripts.
Ashley QZMzHBlnJRjll_71 en us Adult Feminine A warm low-pitched American adult voice that feels like a cool supportive friend or aunt.
Joshua bDlMqRew31ZJwrD- en us Adult Masculine A joyful and resonant American adult voice that brings the classic energy of a radio host.
Jessica wYY8mXKrKtwKsaXZ en us Adult Feminine A consistent and smooth American adult voice that handles customer service with patience and clarity.
Jacob ixaCTlZ5Xqf2XzQH en us Mature Masculine A steady American mature voice with a unique old-timey texture for distinct conversational roles.
Christopher fs2Qj_X2Z2WvWJSU en gb Adult Masculine A smooth British adult voice that conveys the trustworthy tone of a reliable expert.
Matthew X-wgJsZwQKhfebgK en us Adult Masculine A high-pitched joyful American adult voice that pops with energy perfect for reading ads.
Michael Mj0Pzs94jCw8oVOC en us Adult Masculine A low-pitched casual American adult voice with a sporty vibe for conversational content.
Olivier vMYQUSzm6GRkJX6d fr fr Adult Masculine A friendly male voice with a warm and welcoming tone.
Manon p1fSBpcmVWngBqVd fr fr Young Adult Feminine A gentle and warm voice with a calm and measured pace.
Jade 3mM3xaoFjNMQa22C fr fr Young Adult Feminine A young female speaker with a clear, high-pitched, and smooth voice.
Amélie J4XbCGPYNMigXcfZ fr fr Young Adult Feminine A friendly voice with a clear tone and pleasant pitch.
Adrien 0LMAi0x_YVG_GLeM fr fr Young Adult Masculine A clear, smooth, and moderately paced voice with a warm tone.
Sarah -dOnYAX4N4GqSOee fr fr Young Adult Feminine A warm and smooth French young adult voice perfect for friendly interactions and welcoming service.
Jennifer N8xxxD_d-ZinGVI4 fr fr Young Adult Feminine A warm and smooth French young adult voice ideal for friendly support and welcoming conversation.
Élodie zba0owtqy4Gnewn9 fr fr Adult Feminine A confident French adult voice that excels in corporate training, compliance, and narration.
Justine TJv-kucMsUo24VQe fr fr Young Adult Feminine A confident and upbeat French young adult voice perfect for youth brands and energetic explanations.
Océane YE0-JPiElafJrZaC fr fr Young Adult Feminine A polished French young adult voice designed for professional broadcasting and reporting.
Léa QY_BJKHMElKDO12- fr fr Adult Feminine A formal French adult voice that delivers financial reports and news with absolute precision.
Sarah QkmUhBH4hIV2_BkY fr fr Adult Feminine A confident and compassionate French adult voice ideal for biographies, support, and non-fiction.
Mathieu D-IpHY1UI0iX9xQD fr fr Adult Masculine An assertive and energetic French adult voice perfect for high-stakes promos and executive presentations.
Clément twLGV8mrH_ycNpUn fr fr Adult Masculine A confident and sincere French adult voice that lends credibility to expert topics and emotional appeals.
Julie k1wgs3k8-wRxTJO6 fr fr Adult Feminine A joyful and enthusiastic French adult voice that makes news and education feel fresh and engaging.
Dylan Hdf5cdfaGrLDTD63 fr fr Adult Masculine A sincere and emotional French adult voice that offers genuine support and relatable warmth.
Marion 1VAVLmmbQFDw7TMn fr fr Adult Feminine A warm and trustworthy French adult voice that shines in storytelling, education, and fantasy roles.
Pauline 2AtP1urAQkZaeI2U fr fr Adult Feminine A professional and articulate French adult voice suited for serious journalism and formal announcements.
Vincent B09t5S64xLaKwXeW fr fr Adult Masculine A warm and wise French adult voice perfect for historical narration and supportive guidance.
Pierre AroCL6f1qizjiZ_a fr fr Young Adult Masculine An energetic French young adult voice that brings a lively journalistic flair to news and updates.
Guillaume qTA0lxFpynJdoxx7 fr fr Young Adult Masculine A joyful and adventurous French young adult voice ideal for dynamic storytelling and sports reporting.
Romain zpmn3GOfiU_i5QGo fr fr Adult Masculine A warm and steady French adult voice that delivers quick instructions and interviews with clarity.
Kévin IB53xJtufx1sbfbt fr fr Adult Masculine A sincere and emotional French adult voice that brings depth and wisdom to narratives and heartfelt ads.
Florian kw_VWSocR7vyA9Ty fr fr Adult Masculine A joyful and relatable French adult voice that sounds like a friendly journalist or the guy next door.
Antoine hx1RAC4Lqd9xyTAr fr fr Adult Masculine A gritty and confident French adult voice perfect for intense narration and expert instruction.
Quentin pdcyd1mLmo0fcg3O fr fr Adult Masculine A confident and sincere French adult voice that connects effortlessly in tech explainers and documentaries.
Mélanie xynYWquoAsrvM7UY fr ca Adult Feminine A warm and clear Canadian French adult voice designed for friendly assistance and educational guidance.
Adam aNiSRZ0BhQxO1FPx fr fr Adult Masculine A warm and formal French adult voice that brings a calm professional touch to corporate communications.
Anaïs ImBVnxSeLsdCfNIV fr fr Young Adult Feminine A distinctive French young adult voice with a sharp tone perfect for lifestyle and character roles.
Marine GmGF_3ETsY2Zq7_w fr fr Adult Feminine A warm and nurturing French adult voice ideal for storytelling, education, and empathetic support.
Maxime s0PhgjzOTRD5wo5L fr ca Adult Masculine A joyful and instructional Canadian French voice that makes learning and support feel effortless.
Alexandre HBfu9XA3QfzAG1MN fr ca Adult Masculine A high-energy and assertive Canadian French voice perfect for fast-paced promos and clear instructions.
Camille w9V1722uEmTkWqnR fr fr Adult Feminine A joyful and professional French adult voice that delivers corporate and journalistic scripts with energy.
Marie BbLb4TxdlrldgpHI fr fr Adult Feminine A warm and professional French adult voice ideal for calm instruction and empathetic communication.
Thomas 8nsAoui8Y5RK9PYw fr fr Adult Masculine A confident and sincere French adult voice that drives action in commercials and educational explainers.
Chloé rIYDMY3dLccdauWA fr fr Adult Feminine A bright and versatile French adult voice perfect for friendly assistance, education, and lifestyle content.
Nicolas mxcKXLymdLQCdlEq fr fr Adult Masculine An assertive and warm French adult voice that brings strength and character to narration and promos.
Laura Jlh1B0PKQJyup0sQ fr fr Adult Feminine A helpful and clear French adult voice that excels in both educational content and empathetic service.
Amandine NvHEAMGiPT4u8iT- fr fr Adult Feminine A versatile and joyful French adult voice capable of shifting from warm education to playful character work.
Valentin WWHSNJCSTm77dyGd fr fr Adult Masculine A warm and lively French adult voice that brings a spark of genuine enthusiasm to any script.
Manu L6OaiBybqikfCBk0 fr fr Young Adult Masculine A pleasant voice with a low pitch and smooth texture.
Sofia s4CzgVHP5cEkB9LD es es Adult Feminine Soft, low-pitched, and smooth, with a slow and measured pace.
Pablo aCWBiYUiQ4VwW8_b es es Adult Masculine A warm low-pitched Spanish adult voice that brings a calm smooth authority to any script.
Carlos yPxeHKlCzaHeKd_V es es Adult Masculine A warm and versatile Spanish adult voice that adapts seamlessly from ads to professional settings.
Adrián r5WB0b126tlHSrku es mx Young Adult Masculine A warm and smooth Mexican young adult voice that naturally bridges journalism and conversation.
Alberto h39kz1iyoymcjcqh es es Young Adult Masculine A warm Spanish young adult voice with a hosting flair perfect for media and customer engagement.
Elena PqjKPYFyGNsg1YU- es es Young Adult Feminine A warm and engaging Spanish young adult voice that makes journalism and education feel accessible.
Javier wGhY_zZCoQ5gB0ce es ar Adult Masculine A warm and smooth Argentine adult voice that delivers professional and social content with charm.
Sergio -8ZoUJpVU98rxpv9 es mx Young Adult Masculine An energetic Mexican young adult voice that brings a bright modern feel to customer service and ads.
David zdE2H9vw2vcMl_Pt es mx Adult Masculine A joyful and smooth Mexican adult voice that fits perfectly in both casual chats and formal spots.
Ana ynR4CAbXMiOv-vGC es es Young Adult Feminine A warm and versatile Spanish young adult voice ideal for everything from ads to professional service.
Sara lPCVUcicz2XRaLE3 es es Adult Feminine A warm and knowledgeable Spanish adult voice that balances journalistic clarity with conversational ease.
Marta VAb2M8nKHlUUZBk4 es mx Young Adult Feminine A warm and relatable Mexican young adult voice perfect for connecting with Gen Z audiences.
Daniel R3L8t75ZEoZCPUA9 es es Adult Masculine A confident low-pitched Spanish adult voice that commands respect in professional and service contexts.
Alejandro eorxD0DWv--n7l3p es es Young Adult Masculine A joyful and smooth Spanish young adult voice that adds a fresh energy to advertisements.
Cristina Bwl2KLUPxf82_ZaJ es mx Adult Feminine A joyful and resonant Mexican adult voice ideal for vibrant social media and character work.
Carmen zhH3lPUo-JxmlOJT es co Young Adult Feminine An energetic Colombian young adult voice that captures the lively spirit of a millennial streamer.
María k2B3TJiffePxjeBn es co Young Adult Feminine A warm and smooth Colombian young adult voice that brings a friendly touch to education and ads.
Miguel Gijj_GPBfJVcP-FZ es es Adult Masculine A steady Spanish adult voice with a robotic edge perfect for automated customer service.
Laura xB86uC_i8sO2U41- pt br Adult Feminine A smooth and pleasant voice perfect for a nice chat.
Frederico L7890s1B44FqSiGC pt br Adult Masculine A clear low-pitched voice spoken with a smooth texture.
Eduardo hAdJ9w9xBQkFgrRl pt br Adult Masculine A clear low-pitched voice with a smooth texture.
Rodrigo EzmLkNorEpZG_oNv pt pt Young Adult Masculine A low-pitched Portuguese young adult voice that delivers information with calm confidence.
Bruna Du_Dcv4fgXBDdubR pt pt Adult Feminine A high-pitched energetic Portuguese adult voice perfect for engaging corporate training and narration.
Daniel _cP-0vSYfMmzR4al pt br Adult Masculine A joyful and dynamic Brazilian adult voice that brings excitement to radio hosting and promos.
Leonardo YUKEEk7Y4Igsj1Ts pt pt Adult Masculine An energetic and varied Portuguese adult voice ideal for lively radio spots and character work.
Thiago QZtWUy8jmIroWiOu pt br Adult Masculine A warm and versatile Brazilian adult voice that balances professional hosting with genuine kindness.
Pedro Yee42wDKxEFHi0BS pt br Young Adult Masculine A smooth low-pitched Brazilian young adult voice with a cool steady tone for scripts.
Matheus wT1bHy1Vq_0Bn73I pt pt Adult Masculine A resonant and warm Portuguese adult voice that brings authority and kindness to educational content.
Jéssica Fmt16x6anKfMMeSx pt br Adult Feminine A smooth Brazilian adult voice designed for clear and professional customer service.
Fernando 8QUaJGjSFdgHkuI8 pt br Young Adult Masculine A warm and friendly Brazilian young adult voice that sounds like the approachable guy next door.
Juliana B6aHVROMF8FuKR07 pt pt Young Adult Feminine A high-pitched energetic Portuguese young adult voice perfect for animated characters and lively dialogue.
Ana 24cfpJbYGXZLE39T pt br Adult Feminine A joyful and neighborly Brazilian adult voice that feels instantly familiar and welcoming.
Gustavo T4yRIRCLji61Fz-N pt br Adult Masculine A high-pitched friendly Brazilian adult voice ideal for approachable and caring roles.
Bruno isyT17KHEj84P9w9 pt br Adult Masculine A warm and helpful Brazilian adult voice that conveys genuine reliability and kindness.
Maria 73lMH7Zcc411nxJz pt pt Adult Feminine A cheerful and helpful Portuguese adult voice that brightens any conversational script.
Letícia h6qFHXR3-bqPg_PE pt br Adult Feminine A warm and empathetic Brazilian adult voice perfect for podcasting and supportive messaging.
Rafael KpDAXeGeen7P9Uri pt pt Adult Masculine A warm and friendly Portuguese adult voice ideal for relatable radio hosting and conversation.
Gabriel 4ubKCfFxLeBg-cbl pt br Adult Masculine An energetic and joyful Brazilian adult voice that commands attention with charismatic flair.
Lucas AaTW_13X1yYe_OnX pt br Adult Masculine A warm low-pitched Brazilian adult voice that adds a kind educational tone to any project.
João YHOBjtajNBEHUI_K pt br Adult Masculine A smooth and clear Brazilian adult voice perfect for conversational delivery.
Moritz IIZIkBSZAmb9nFZb de de Adult Masculine Clear low-pitched male voice with a smooth texture and a slow measured pace.
Lisa kAoOc9Yb5EQDzA-N de de Adult Feminine A soft and clear voice with a varied pitch.
Hans vbg20SqFS_gBntTQ de at Adult Masculine A calm low-pitched male delivery with a pleasant tone.
Franziska VXA4-0_ZN4o8q3vK de de Adult Feminine A warm and smooth German adult voice that offers deep support with kindness.
David zyla-_bhVQtNTBdT de de Adult Masculine A smooth German adult voice that educates with a calm low tone.
Lea lSVEPWl_N_7MtcHe de de Adult Feminine A warm and smooth German adult voice that teaches with a friendly approachable style.
Stefanie hXjVvZ6oDDGQAQFj de de Young Adult Feminine A confident and airy German young adult voice that reads with sincerity and clarity.
Tom xq0vDziADfAmg6Uh de de Adult Masculine An airy German adult voice that speaks publicly with a formal high pitch.
Niklas -qKylkN2UPxd7Mmg de de Adult Masculine A joyful and smooth German adult voice that handles customer care with formal positivity.
Michelle fJDF4lEH590XplFv de de Adult Feminine A joyful and smooth German adult voice that coaches with high energy and encouragement.
Jasmin h2o5CDDhV5wE3Bwi de de Adult Feminine A balanced and airy German adult voice that makes book reading feel light and accessible.
Dominik ZOiGbnYdgKSBM_rH de de Adult Masculine A balanced and smooth German adult voice designed for formal customer care.
Sabrina dK5Glio51HTxdMu0 de de Adult Feminine A balanced and smooth German adult voice perfect for professional book reading.
Dennis YHkMHL6WppbXd42a de de Adult Masculine An airy German adult voice that delivers technical information with formal grace.
Julian LAmPTQZkwYJKRCKt de de Adult Masculine A balanced and resonant German adult voice that coaches with a calm steady presence.
Jannik RPw-aWdY8NBiIWeg de de Adult Masculine A joyful and resonant German adult voice that motivates and coaches with authority.
Melanie bauuigqCZbJFfk5q de de Adult Feminine A warm and smooth German adult voice that brings a professional deep perspective.
Christian WxHB2b5HxA0Kuq5u de de Adult Masculine A joyful and smooth German adult voice that reports with energy and professionalism.
Nadine 9O8ZawShJ7UwURjK de de Adult Feminine A warm and smooth German adult voice that educates with journalistic precision.
Nicole t1Y_yKjku5R46F9t de de Adult Feminine A warm and airy German adult voice that delivers journalistic content with a kind touch.
Sebastian KEMqb7dQlTCAEUx6 de de Mature Masculine A steady resonant German mature voice that brings the comforting wisdom of a grandfather.
Lena df4Al5gt14Am4Qaf de de Adult Feminine A grounded and smooth German adult voice that reports the news with a steady tone.
Fabian 42-EbMFThYfhVB83 de de Adult Masculine A warm German adult voice that teaches with a high engaging energy.
Patrick 3-pqEMoGtIq7wXtH de de Adult Masculine A steady and smooth German adult voice that explains educational topics with journalistic clarity.
Christina 9LhjfdN9LOrygqDi de de Adult Feminine A warm and smooth German adult voice that offers insightful guidance with a friendly tone.
Jessica --9DFXOPx8kJFsbe de de Adult Feminine An airy German adult voice that delivers formal journalism with a light touch.
Jennifer XFttJvHwReWtWQNQ de de Adult Feminine A steady and smooth German adult voice that reports professionally and formally.
Vanessa 8eZwfGLoSF2N0RB3 de de Adult Feminine A warm and smooth German adult voice that engages listeners as a lively podcast host.
Maria sz-H9BxaRaqxQ2S0 de de Adult Feminine A relaxed and airy German adult voice that hosts podcasts with a cool vibe.
Jonas 6tFmjkrmrdhO2bXV de de Adult Masculine A warm German adult voice that contemplates and converses with philosophical insight.
Anna D8iRHK1qJhqfE00v de de Adult Feminine A balanced airy German adult voice that brings deep empathy to conversation.
Marcel Cw79FL0p0J6UM9El de de Adult Masculine A balanced German adult voice that handles customer care with a clear high-pitched tone.
Kevin AySdCEnP2nqRo1WM de de Adult Masculine A steady and smooth German adult voice that maintains a formal journalistic standard.
Tobias uycTGmIXbw_Y83p9 de de Adult Masculine A low-pitched German adult voice that delivers technical details with care and precision.
Daniel H0GE4TqfCQGmpQhL de de Adult Masculine A warm and resonant German adult voice that sounds like a friendly student peer.
Tim -WFy9WtlQNE-dEV2 de de Adult Masculine A steady and resonant German adult voice ideal for professional customer care interactions.
Philipp ZsVFAOnjnEPxJVDI de de Adult Masculine A balanced German adult voice that reads books with a smooth immersive flow.
Maximilian H3Rh9kJcd4gZidvN de de Adult Masculine A warm German adult voice that educates with a calm low-pitched authority.
Sarah ApPgTz3nMHOsWxhK de de Adult Feminine A warm low-pitched German adult voice that offers the soothing understanding of a close confidant.
Florian XnSnbQW98he4aULg de de Adult Masculine A warm German adult voice that delivers news with a high resonant clarity.
Katharina AEJ61XaIaRill4cJ de de Adult Feminine A steady low-pitched German adult voice designed for steady and engaging book reading.
Mona T2NDxsof9FHYxgJj de de Adult Feminine A warm and smooth German adult voice that brings a tutor's patience to any script.
Laura wBgI9XmASQwvQ13w de de Adult Feminine A warm German adult voice that teaches and guides with a kind high-pitched tone.
Felix uF8PfAXrv6qU9UEM de de Adult Masculine A smooth German adult voice perfect for straightforward journalistic reporting.
Julia FRTqjB2TL-Ix9GXW de de Adult Feminine A warm and conversational German adult voice that sounds like a relatable student.
Alexander xki1DK6Ks6tuDmcb de de Adult Masculine A warm German adult voice that reports with journalistic integrity and a resonant tone.
Lukas 5UkFVe2B8OqLo-5R de de Adult Masculine A low-pitched German adult voice that conveys the authority of a seasoned expert.
Jan 1D38wv1wp-H7QcyM de de Adult Masculine A balanced German adult voice with a high pitch ideal for clear customer care.

Custom Voices

Create and manage your own custom voice clones. Custom voices are passed to TTS using the voice_id parameter (not voice).

List All Custom Voices

import json
import gradium

all_custom_voices = await gradium.voices.get(client)
print(json.dumps(all_custom_voices, indent=2))

Get Specific Voice

import json

voice = await gradium.voices.get(client, voice_uid="abc123def456")
print(json.dumps(voice, indent=2))

Create Custom Voice

import json

voice = await gradium.voices.create(
    client,
    audio_file="my_voice_sample.wav",
    name="My Custom Voice",
    description="A voice created from my recording",
    start_s=0.0,
)
print(json.dumps(voice, indent=2))

Update Voice

await gradium.voices.update(
    client,
    voice_uid="abc123def456",
    name="Updated Voice Name",
    description="Updated description",
    start_s=1.5
)

Delete Voice

await gradium.voices.delete(client, voice_uid="abc123def456")

Credit Management

Credits are consumed based on the audio generated: 1 credit equals 1 character of TTS. One minute is approximately 750 characters, so 1h of TTS generation is approximately 45 000 characters.
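As a rough worked example of the arithmetic above, the sketch below estimates the credits and approximate audio duration for a given text. The helper names and the CHARS_PER_MINUTE constant are illustrative and not part of the gradium package, and it assumes that every character, including spaces and punctuation, counts as one credit.

```python
# Illustrative helpers, not part of the gradium package.
CHARS_PER_MINUTE = 750  # approximate figure quoted in the paragraph above


def estimate_credits(text: str) -> int:
    """Estimate credits, assuming each character (spaces included) costs 1 credit."""
    return len(text)


def estimate_minutes(text: str) -> float:
    """Approximate audio duration in minutes at ~750 characters per minute."""
    return len(text) / CHARS_PER_MINUTE


sample = "a" * 1500
print(estimate_credits(sample))  # 1500
print(estimate_minutes(sample))  # 2.0
```

For an exact figure, rely on the credit information returned by the API rather than on this estimate.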

Get Credit Information

import json

credits_info = await gradium.usages.get(client)
print(json.dumps(credits_info, indent=2))

Speech-to-Text (STT)

The Speech-to-Text model converts audio input into text transcriptions, supporting real-time streaming and a semantic VAD.

Basic Streaming Usage

import asyncio
import gradium

async def main():
    client = gradium.client.GradiumClient(api_key="your-api-key")

    # Audio generator that yields audio chunks
    async def audio_generator(audio_data, chunk_size=1920):
        for i in range(0, len(audio_data), chunk_size):
            yield audio_data[i : i + chunk_size]

    # Create STT stream (audio_data must already hold your input audio; loading it is not shown here)
    stream = await client.stt_stream(
        {"model_name": "default", "input_format": "pcm"},
        audio_generator(audio_data),
    )

    # Process transcription results
    async for message in stream.iter_text():
        print(message)

if __name__ == "__main__":
    asyncio.run(main())

Setup Parameters

  • model_name: The STT model to use (default: "default")
  • input_format: Audio format of the input data (supported: "pcm", "wav", "opus")

When using "pcm" input format, the audio must adhere to the following specifications:

  • Sample Rate: 24000 Hz (24kHz)
  • Format: PCM (Pulse Code Modulation)
  • Bit Depth: 16-bit signed integer
  • Channels: Single channel (mono)
  • Chunk Size: Recommended 1920 samples per chunk (80ms at 24kHz)
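The numbers above can be turned into byte sizes as follows. This is a minimal sketch that assumes the audio is held as raw bytes (2 bytes per 16-bit sample); the constant and function names are illustrative, and whether the SDK expects bytes or a sample array is not stated here.

```python
SAMPLE_RATE_HZ = 24_000
BYTES_PER_SAMPLE = 2        # 16-bit signed integer
SAMPLES_PER_CHUNK = 1920    # 80 ms at 24 kHz
BYTES_PER_CHUNK = SAMPLES_PER_CHUNK * BYTES_PER_SAMPLE  # 3840 bytes


def iter_pcm_chunks(pcm: bytes, bytes_per_chunk: int = BYTES_PER_CHUNK):
    """Yield consecutive chunks of raw mono PCM; the last chunk may be shorter."""
    for offset in range(0, len(pcm), bytes_per_chunk):
        yield pcm[offset : offset + bytes_per_chunk]


def chunk_duration_ms(chunk: bytes) -> float:
    """Duration of a chunk of 16-bit mono PCM at 24 kHz, in milliseconds."""
    return len(chunk) * 1000 / (BYTES_PER_SAMPLE * SAMPLE_RATE_HZ)


one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
chunks = list(iter_pcm_chunks(one_second_of_silence))
print(len(chunks), chunk_duration_ms(chunks[0]))  # 13 80.0
```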

Message Types

The STT stream returns different types of messages:

  • Text Messages (text): Contain transcription results together with timestamps.
  • VAD Messages (step): Provide Voice Activity Detection information to determine when the speaker has finished speaking.

# Text messages containing transcription results
async for msg in stream._stream:
    if msg.get("type") == "text":
        print(f"Transcription: {msg}")

    # VAD (Voice Activity Detection) messages; VAD steps occur every 80ms
    elif msg.get("type") == "step":
        vad_info = msg.get("vad", [])
        # Use the third prediction's inactivity_prob to detect turn completion
        if len(vad_info) > 2:
            inactivity_probability = vad_info[2].get("inactivity_prob")
            print(f"Inactivity probability: {inactivity_probability}")

TTS

Text-to-Speech endpoints for converting text to audio

TTS WebSocket Stream

WebSocket Endpoint for Text-to-Speech Streaming

Connect to this endpoint via WebSocket for real-time text-to-speech conversion with low latency audio streaming.

Connection URL:

For Europe

wss://eu.api.gradium.ai/api/speech/tts

For the USA

wss://us.api.gradium.ai/api/speech/tts

Authentication: Include your API key in the WebSocket connection header:

  • Header: x-api-key: your_api_key

Quick Reference

Direction Message Type Example
🔵⬆️ Client→Server Setup (first) {"type": "setup", "voice_id": "YTpq7expH9539ERJ", "model_name": "default", "output_format": "wav"}
🟢⬇️ Server→Client Ready {"type": "ready", "request_id": "uuid"}
🔵⬆️ Client→Server Text {"type": "text", "text": "Hello, world!"}
🟢⬇️ Server→Client Audio (stream) {"type": "audio", "audio": "base64..."}
🔵⬆️ Client→Server EndOfStream {"type": "end_of_stream"}
🟢⬇️ Server→Client EndOfStream {"type": "end_of_stream"}
🔴⬇️ Server→Client Error {"type": "error", "message": "Error description", "code": 1008}

Message Types

1. Setup Message (First Message)

Direction: Client → Server Format: JSON Object

{
  "type": "setup",
  "model_name": "default",
  "voice_id": "YTpq7expH9539ERJ",
  "output_format": "wav"
}

Fields:

  • type (string, required): Must be "setup"
  • model_name (string, required): The TTS model to use (default: "default")
  • voice_id (string, required): Voice ID from the library (e.g., "YTpq7expH9539ERJ" for Emma's voice) or custom voice ID
  • output_format (string, required): Audio format, one of "wav", "pcm", or "opus".

Important: This must be the very first message sent after connection. The server will close the connection if any other message is sent first.


2. Ready Message

Direction: Server → Client Format: JSON Object

{
  "type": "ready",
  "request_id": "550e8400-e29b-41d4-a716-446655440000"
}

Fields:

  • type (string): Will be "ready"
  • request_id (string): Unique identifier for the session

This message is sent by the server after receiving the setup message, indicating that the connection is ready to receive text messages.


3. Text Message (Subsequent Messages)

Direction: Client → Server Format: JSON Object

{
  "type": "text",
  "text": "Hello, world!"
}

Fields:

  • type (string, required): Must be "text"
  • text (string, required): The text to be converted to speech

Send text messages to be converted to speech. You can send multiple text messages in sequence. The server will stream audio back as it's generated.


4. Audio Response

Direction: Server → Client Format: JSON Object

{
  "type": "audio",
  "audio": "base64_encoded_audio_data..."
}

Fields:

  • type (string): Will be "audio"
  • audio (string): Base64-encoded audio data in the requested format

Important: Multiple audio messages will be streamed for each text message. Continue receiving until you detect the end of speech or receive a new message type.
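One way to handle the streamed audio messages is sketched below: it decodes the Base64 payload of each audio message and concatenates the bytes until an end_of_stream message arrives. The function name is illustrative, and simply concatenating the decoded chunks is an assumption, since this page does not describe how chunks of each output format should be joined.

```python
import base64
import json


def collect_audio(raw_messages):
    """Concatenate decoded audio from an iterable of JSON message strings.

    Stops at the first "end_of_stream" message and raises on "error".
    """
    audio = bytearray()
    for raw in raw_messages:
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "audio":
            audio.extend(base64.b64decode(msg["audio"]))
        elif kind == "end_of_stream":
            break
        elif kind == "error":
            raise RuntimeError(f"{msg.get('code')}: {msg.get('message')}")
    return bytes(audio)


# Simulated server messages, for illustration only
messages = [
    json.dumps({"type": "audio", "audio": base64.b64encode(b"ab").decode("ascii")}),
    json.dumps({"type": "audio", "audio": base64.b64encode(b"cd").decode("ascii")}),
    json.dumps({"type": "end_of_stream"}),
]
print(collect_audio(messages))  # b'abcd'
```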


5. End Of Stream

Direction: Client → Server and Server → Client Format: JSON Object

{
  "type": "end_of_stream"
}

The client sends this message once it has submitted all the text it wants processed. The server then sends back the remaining audio until all the text has been processed, followed by an EndOfStream message, and then closes the WebSocket connection.
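The client side of the message sequence described in this section can be sketched as a generator that produces the JSON payloads in protocol order: setup first, then one text message per utterance, then end_of_stream. The function name is illustrative; a real client would also wait for the ready message before sending text, which this sketch leaves out.

```python
import json


def tts_client_messages(texts, voice_id, model_name="default", output_format="wav"):
    """Yield the JSON strings a client sends, in protocol order."""
    # The setup message must be the very first message on the connection
    yield json.dumps({
        "type": "setup",
        "model_name": model_name,
        "voice_id": voice_id,
        "output_format": output_format,
    })
    # One text message per piece of text to synthesize
    for text in texts:
        yield json.dumps({"type": "text", "text": text})
    # Signal that no more text will follow
    yield json.dumps({"type": "end_of_stream"})


for payload in tts_client_messages(["Hello,", "world!"], voice_id="YTpq7expH9539ERJ"):
    print(payload)
```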


Error Handling

When errors occur, the server sends an error message as JSON before closing the connection:

Error Message Format:

{
  "type": "error",
  "message": "Error description explaining what went wrong",
  "code": 1008
}

Common Error Codes:

  • 1008: Policy Violation (e.g., invalid API key, missing setup message)
  • 1011: Internal Server Error (unexpected server-side error)
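A client might turn these error messages into exceptions along the following lines. GradiumStreamError and raise_if_error are hypothetical names, not part of the gradium package, and treating only code 1011 as retryable is an assumption made for illustration.

```python
import json


class GradiumStreamError(Exception):
    """Hypothetical exception type; not part of the gradium package."""

    def __init__(self, code, message):
        super().__init__(f"[{code}] {message}")
        self.code = code
        self.message = message

    @property
    def retryable(self):
        # Assumption: a 1011 (internal server error) may succeed on retry,
        # while a 1008 (policy violation) needs a fix on the client side.
        return self.code == 1011


def raise_if_error(raw):
    """Parse a JSON message string; raise if it is an error, else return the dict."""
    msg = json.loads(raw)
    if msg.get("type") == "error":
        raise GradiumStreamError(msg.get("code"), msg.get("message", ""))
    return msg


try:
    raise_if_error('{"type": "error", "message": "Invalid API key", "code": 1008}')
except GradiumStreamError as exc:
    print(exc.code, exc.retryable)  # 1008 False
```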

Best Practices

  1. Always send setup first: The server expects a setup message immediately after connection
  2. Handle audio streaming: Audio responses are streamed in chunks - buffer and process appropriately
  3. Implement reconnection logic: Network issues happen - build in automatic reconnection with exponential backoff
  4. Monitor connection health: Implement ping/pong or periodic checks to detect stale connections
  5. Graceful error handling: Parse error messages and handle different error codes appropriately
  6. Reuse connections: For multiple utterances, keep the connection alive and send multiple text messages
  7. Close cleanly: Always close WebSocket connections properly when done
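For the reconnection advice above, a minimal sketch of an exponential backoff schedule with optional jitter is shown below. The retry count, base delay, and cap are illustrative values, not recommendations from Gradium.

```python
import random


def backoff_delays(max_retries=5, base_s=0.5, cap_s=30.0, jitter=True, rng=random.random):
    """Yield wait times in seconds: base_s * 2**attempt, capped at cap_s.

    With jitter enabled, each delay is scaled by a random factor in [0, 1).
    """
    for attempt in range(max_retries):
        delay = min(cap_s, base_s * (2 ** attempt))
        yield delay * rng() if jitter else delay


print(list(backoff_delays(max_retries=4, jitter=False)))  # [0.5, 1.0, 2.0, 4.0]
```

A reconnect loop would sleep for each yielded delay before the next connection attempt and stop retrying once the generator is exhausted.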

Header Parameters

  • x-api-key (string, required): Your Gradium API key

STT

Speech-to-Text endpoints for converting audio to text

STT WebSocket Stream

WebSocket Endpoint for Speech-to-Text Streaming

Connect to this endpoint via WebSocket for real-time speech-to-text conversion with streaming audio input.

Connection URL:

For Europe

wss://eu.api.gradium.ai/api/speech/asr

For the USA

wss://us.api.gradium.ai/api/speech/asr

Authentication: Include your API key in the WebSocket connection header:

  • Header: x-api-key: your_api_key

Quick Reference

Direction Message Type Example
🔵⬆️ Client→Server Setup (first) {"type": "setup", "model_name": "default", "input_format": "pcm"}
🟢⬇️ Server→Client Ready {"type": "ready", "request_id": "uuid", "model_name": "default", "sample_rate": 24000}
🔵⬆️ Client→Server Audio {"type": "audio", "audio": "base64..."}
🟢⬇️ Server→Client Text (result) {"type": "text", "text": "Hello world", "start_s": 0.5}
🟢⬇️ Server→Client VAD (activity) {"type": "step", "vad": [...], "step_idx": 5, "step_duration_s": 0.08}
🟢⬇️ Server→Client End Text {"type": "end_text", "stop_s": 2.5}
🔵⬆️ Client→Server EndOfStream {"type": "end_of_stream"}
🟢⬇️ Server→Client EndOfStream {"type": "end_of_stream"}
🔴⬇️ Server→Client Error {"type": "error", "message": "Error description", "code": 1008}

Message Types

1. Setup Message (First Message)

Direction: Client → Server Format: JSON Object

{
  "type": "setup",
  "model_name": "default",
  "input_format": "pcm"
}

Fields:

  • type (string, required): Must be "setup"
  • model_name (string, required): The Speech-To-Text model to use (default: "default")
  • input_format (string, required): Audio format - "pcm", "wav", or "opus"

Important: This must be the very first message sent after connection. The server will close the connection if any other message is sent first.


2. Ready Message

Direction: Server → Client Format: JSON Object

{
  "type": "ready",
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "model_name": "default",
  "sample_rate": 24000,
  "frame_size": 1920,
  "delay_in_tokens": 0,
  "text_stream_names": []
}

Fields:

  • type (string): Will be "ready"
  • request_id (string): Unique identifier for the session
  • model_name (string): The Speech-To-Text model being used
  • sample_rate (integer): Expected sample rate in Hz (typically 24000)
  • frame_size (integer): Number of samples the model processes per step (typically 1920, which is equivalent to 80ms at 24kHz)
  • delay_in_tokens (integer): Delay in tokens for the model
  • text_stream_names (array): List of text stream names

This message is sent by the server after receiving the setup message, indicating that the connection is ready to receive audio.


3. Audio Message

Direction: Client → Server Format: JSON Object (with Base64-encoded audio data)

{
  "type": "audio",
  "audio": "base64_encoded_audio_data..."
}

Fields:

  • type (string, required): Must be "audio"
  • audio (string, required): Base64-encoded audio data

Audio Format Requirements (for PCM input):

  • Sample Rate: 24000 Hz (24kHz)
  • Format: PCM (Pulse Code Modulation)
  • Bit Depth: 16-bit signed integer (little-endian)
  • Channels: Single channel (mono)
  • Chunk Size: Recommended 1920 samples per chunk (80ms at 24kHz)

Send audio messages to be transcribed. You can send multiple audio messages in sequence. The server will stream text and VAD responses as it processes the audio.
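The client-side messages for a PCM session can be sketched as follows: a setup message, then one Base64-encoded audio message per 1920-sample chunk, then end_of_stream. The function name is illustrative, and the sketch assumes the input is raw 16-bit mono PCM bytes at 24kHz.

```python
import base64
import json

BYTES_PER_CHUNK = 1920 * 2  # 1920 samples of 16-bit mono PCM (80 ms at 24 kHz)


def stt_client_messages(pcm: bytes, model_name="default"):
    """Yield the JSON strings a client sends for one PCM transcription session."""
    # The setup message must be the very first message on the connection
    yield json.dumps({"type": "setup", "model_name": model_name, "input_format": "pcm"})
    # One audio message per chunk, with the raw bytes Base64-encoded
    for offset in range(0, len(pcm), BYTES_PER_CHUNK):
        chunk = pcm[offset : offset + BYTES_PER_CHUNK]
        yield json.dumps({"type": "audio", "audio": base64.b64encode(chunk).decode("ascii")})
    # Signal that no more audio will follow
    yield json.dumps({"type": "end_of_stream"})


payloads = list(stt_client_messages(bytes(BYTES_PER_CHUNK * 2)))
print(len(payloads))  # 4: setup, two audio messages, end_of_stream
```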


4. Text Response

Direction: Server → Client Format: JSON Object

{
  "type": "text",
  "text": "Hello world",
  "start_s": 0.5,
  "stream_id": null
}

Fields:

  • type (string): Will be "text"
  • text (string): The transcribed text
  • start_s (float): Start time of the transcription in seconds
  • stream_id (integer or null): Stream identifier for tracking multiple concurrent streams

Text messages contain the transcribed speech. Multiple text messages will be streamed as the audio is processed.


5. VAD Response (Voice Activity Detection)

Direction: Server → Client Format: JSON Object

{
  "type": "step",
  "vad": [
    {
      "horizon_s": 0.5,
      "inactivity_prob": 0.05
    },
    {
      "horizon_s": 1.0,
      "inactivity_prob": 0.08
    },
    {
      "horizon_s": 2.0,
      "inactivity_prob": 0.12
    }
  ],
  "step_idx": 5,
  "step_duration_s": 0.08,
  "total_duration_s": 0.4
}

Fields:

  • type (string): Will be "step"
  • vad (array): List of VAD predictions with future horizons
    • horizon_s (float): Lookahead duration in seconds
    • inactivity_prob (float): Probability that voice activity will have ended by this horizon.
  • step_idx (integer): The step index (increments every 80ms)
  • step_duration_s (float): Duration of this step in seconds (typically 0.08)
  • total_duration_s (float): Total duration of audio processed so far

VAD Interpretation:

  • VAD messages are emitted every 80ms (one per audio frame)
  • Use the inactivity_prob value from the longest horizon to determine if the speaker has likely finished
  • Higher inactivity_prob values indicate higher confidence that speaking has ended
  • Recommended: use vad[2]["inactivity_prob"] (the third prediction, which has the longest horizon) as the turn-taking indicator
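A turn-completion check based on these points might look like the sketch below. The function names are illustrative, and the 0.5 threshold is an assumption for demonstration only; this page does not specify a threshold value, so tune it for your application.

```python
def longest_horizon_inactivity(step_msg):
    """Return inactivity_prob of the prediction with the largest horizon_s, or None."""
    predictions = step_msg.get("vad") or []
    if not predictions:
        return None
    return max(predictions, key=lambda p: p["horizon_s"])["inactivity_prob"]


def is_turn_complete(step_msg, threshold=0.5):
    """True when the longest-horizon inactivity probability reaches the threshold."""
    prob = longest_horizon_inactivity(step_msg)
    return prob is not None and prob >= threshold


step = {
    "type": "step",
    "vad": [
        {"horizon_s": 0.5, "inactivity_prob": 0.05},
        {"horizon_s": 1.0, "inactivity_prob": 0.08},
        {"horizon_s": 2.0, "inactivity_prob": 0.12},
    ],
}
print(longest_horizon_inactivity(step), is_turn_complete(step))  # 0.12 False
```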

6. End Text Response

Direction: Server → Client Format: JSON Object

{
  "type": "end_text",
  "stop_s": 2.5,
  "stream_id": null
}

Fields:

  • type (string): Will be "end_text"
  • stop_s (float): Stop time of last text message in seconds
  • stream_id (integer or null): Stream identifier

Sent when the previous text segment has finished and its end timestamp is available.
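To attach end timestamps to transcribed text, a client could pair each text message with the end_text message that follows it. The sketch below assumes a single stream (stream_id is ignored) and that each end_text refers to the most recent text message; both are assumptions, and the function name is illustrative.

```python
def segments_from_messages(messages):
    """Pair each "text" message with the next "end_text" message.

    Returns a list of (text, start_s, stop_s) tuples; stop_s is None when no
    matching end_text was received.
    """
    segments = []
    pending = None
    for msg in messages:
        kind = msg.get("type")
        if kind == "text":
            if pending is not None:
                # A new text arrived before the previous one was closed
                segments.append((pending["text"], pending["start_s"], None))
            pending = msg
        elif kind == "end_text" and pending is not None:
            segments.append((pending["text"], pending["start_s"], msg["stop_s"]))
            pending = None
    if pending is not None:
        segments.append((pending["text"], pending["start_s"], None))
    return segments


received = [
    {"type": "text", "text": "Hello world", "start_s": 0.5, "stream_id": None},
    {"type": "end_text", "stop_s": 2.5, "stream_id": None},
]
print(segments_from_messages(received))  # [('Hello world', 0.5, 2.5)]
```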


7. End Of Stream

Direction: Client → Server and Server → Client Format: JSON Object

{
  "type": "end_of_stream"
}

This message is sent by the client when it has finished sending audio. The server will then process any remaining audio and send back all outstanding text results, VAD information, and then an end_of_stream message before closing the connection.


Error Handling

When errors occur, the server sends an error message as JSON before closing the connection:

Error Message Format:

{
  "type": "error",
  "message": "Error description explaining what went wrong",
  "code": 1008
}

Common Error Codes:

  • 1008: Policy Violation (e.g., invalid API key, missing setup message, invalid audio format)
  • 1011: Internal Server Error (unexpected server-side error)

Best Practices for STT

  1. Always send setup first: The server expects a setup message immediately after connection
  2. Use correct audio format: When using PCM, ensure audio is 24kHz PCM 16-bit mono
  3. Send appropriately sized chunks: 1920 samples (80ms) per message is recommended
  4. Graceful shutdown: Send end_of_stream when done to properly close the session

Header Parameters

  • x-api-key (string, required): Your Gradium API key

Voices

Manage custom voice clones

Create Voice

Create a new voice for an organization with audio file upload.

Request Body schema: multipart/form-data (required)

  • audio_file (string <binary>, required): Audio File
  • name (string, required): Name
  • input_format (string): Input Format
  • description (string or null): Description
  • language (string or null): Language
  • start_s (number): Start S (default: 0)
  • timeout_s (number): Timeout S (default: 10)

Response sample (application/json):

{
  "uid": "string",
  "error": "string",
  "was_updated": false
}

Get Voices

List voices for the authenticated organization.

Query Parameters

  • skip (integer): Skip (default: 0)
  • limit (integer): Limit (default: 100)
  • include_catalog (boolean): Include Catalog (default: false)
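The skip and limit parameters can be used to page through a long voice list. The sketch below is transport-agnostic: fetch_page is a hypothetical caller-supplied function that performs the actual request (the HTTP path is not shown in this section) and returns one page as a list. It assumes the server returns fewer than limit items only on the last page.

```python
def iter_all_voices(fetch_page, page_size=100):
    """Yield every voice by walking skip/limit pages.

    fetch_page(skip, limit) is a caller-supplied function returning a list.
    """
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        yield from page
        # A short (or empty) page means there is nothing left to fetch
        if len(page) < page_size:
            break
        skip += page_size


# Stand-in for a real request, for illustration only
fake_voices = [{"uid": f"voice-{i}"} for i in range(250)]
voices = list(iter_all_voices(lambda skip, limit: fake_voices[skip : skip + limit]))
print(len(voices))  # 250
```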

Response sample (application/json):

[
  { }
]

Get Voice

Get a voice by its UID. Optional org_uid and key_uid for access control.

path Parameters
voice_uid
required
string (Voice Uid)

Response sample (application/json):

{
  "uid": "string",
  "name": "string",
  "description": "string",
  "language": "string",
  "start_s": 0,
  "stop_s": 0,
  "filename": "string"
}

Update Voice

Update a voice by its UID.

path Parameters
voice_uid
required
string (Voice Uid)
Request Body schema: application/json (required)

  • name (string or null): Name
  • description (string or null): Description
  • language (string or null): Language
  • start_s (number or null): Start S
  • tags (array of objects or null): Tags
  • rank (number or null): Rank

Request sample (application/json):

{
  "name": "string",
  "description": "string",
  "language": "string",
  "start_s": 0,
  "tags": [ ],
  "rank": 0
}

Response sample (application/json):

{
  "uid": "string",
  "name": "string",
  "description": "string",
  "language": "string",
  "start_s": 0,
  "stop_s": 0,
  "filename": "string"
}

Delete Voice

Delete a voice by its UID.

path Parameters
voice_uid
required
string (Voice Uid)

Response sample (application/json):

{
  "detail": [ ]
}

Credits

Monitor API credit balance