Gradium TTS in Pipecat: Setup and Integration Guide
Gradium ships as a natively supported text-to-speech provider inside Pipecat, the open-source Python framework for building real-time voice and multimodal agents. The integration is maintained as part of Pipecat's official service catalogue, with a dedicated GradiumTTSService class, a published API reference, and a working example in the Pipecat repository.
This article covers what that integration actually provides: how to install it, what each configuration setting does, what is fixed by design, and where to find the official references if you are wiring it into a production pipeline.
What Pipecat is and where Gradium fits
Pipecat's provider model: one framework, many services
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational agents. It connects speech-to-text, an LLM, and text-to-speech into a single real-time pipeline, and handles the surrounding plumbing: audio transport, turn-taking, and interruption detection. Rather than building its own speech models, Pipecat orchestrates external services through a common interface, and ships installable extras for dozens of providers across STT, TTS, and LLM categories, from Deepgram and ElevenLabs to Cartesia, Hume, and Gradium, each maintained as a separate optional dependency.
This means choosing a TTS provider in Pipecat is a configuration decision, not an architectural one. The same pipeline structure, audio transport, and turn-taking logic stays in place regardless of which TTS service is plugged in.
Where Gradium sits in that ecosystem
Gradium is one of these natively supported services. It installs as a Pipecat extra (pipecat-ai[gradium]) and is exposed through GradiumTTSService, a class that follows the same configuration pattern as every other TTS service in the framework. The integration is documented directly on docs.pipecat.ai with a dedicated API reference page, and a runnable example ships in the official Pipecat GitHub repository.
For a step-by-step build using Gradium with Pipecat, see How to Build an Audiobook Agent with Gradium and Pipecat, which walks through a complete voice application using this same service.
Setting up GradiumTTSService in Pipecat
Installation and basic configuration
Installing the Gradium extra pulls in the dependencies needed to run GradiumTTSService:
```shell
uv add "pipecat-ai[gradium]"
```
Before using the service, you need a Gradium account, an API key generated from the Gradium dashboard, and a voice ID, either selected from Gradium's voice catalogue or created as a custom clone. The examples below read the API key from the GRADIUM_API_KEY environment variable and pass it to the service as api_key.
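Because os.getenv returns None when a variable is unset, a missing key would otherwise likely surface only later, as an authentication or connection error once the service tries to reach Gradium. A small startup guard makes the failure explicit. This is a minimal sketch; require_env is a hypothetical helper, not part of Pipecat.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, failing fast if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        # Hypothetical helper: raise early instead of passing None into the service.
        raise RuntimeError(
            f"{name} is not set; generate an API key in the Gradium dashboard "
            f"and export it before starting the pipeline"
        )
    return value


# Usage: resolve the key once at startup, before building the pipeline.
# api_key = require_env("GRADIUM_API_KEY")
```

The resulting string can then be passed as api_key in place of the bare os.getenv call.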
A minimal setup looks like this:
```python
import os

from pipecat.services.gradium import GradiumTTSService

tts = GradiumTTSService(
    api_key=os.getenv("GRADIUM_API_KEY"),
    settings=GradiumTTSService.Settings(
        voice="_6Aslh2DxfmnRLmP",
    ),
)
```
The service connects to Gradium's WebSocket API endpoint, with traffic automatically routed to the nearest available region. The endpoint can be overridden to pin a specific region or a custom deployment if your infrastructure requires it.
Configurable settings: voice, model, and language
GradiumTTSService exposes its runtime-configurable options through a Settings object, which can be updated mid-conversation using TTSUpdateSettingsFrame without restarting the pipeline. The current settings are model (which model identifier to use for synthesis, defaulting to "default"), voice (the voice identifier), and language (the synthesis language).
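```python
import os

from pipecat.services.gradium import GradiumTTSService

tts = GradiumTTSService(
    api_key=os.getenv("GRADIUM_API_KEY"),
    settings=GradiumTTSService.Settings(
        model="default",  # synthesis model identifier; "default" is the default value
        voice="your-voice-id",  # replace with a voice ID from your Gradium account
    ),
)
```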
Pipecat's documentation notes a recent change worth flagging if you are working from older example code: the InputParams and params= pattern used in earlier versions of the service is deprecated as of Pipecat v0.0.105, replaced by the Settings and settings= pattern shown above.
What is fixed: the 48kHz output constraint
One detail worth knowing before building around this service: Gradium's TTS output through Pipecat is fixed at a 48kHz sample rate. This is set automatically and is not configurable. 48kHz matches what WebRTC transports typically carry (Opus in WebRTC is negotiated at a 48kHz clock rate), but telephony connections usually run at 8kHz. If your transport or downstream processing expects a different rate, plan for a resampling step between the TTS output and that stage.
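To make the resampling step concrete: going from 48kHz to an 8kHz telephony leg is an integer factor of 6, so one second of audio shrinks from 48,000 samples to 8,000. The sketch below assumes mono 16-bit little-endian PCM and uses a crude box-filter average before decimation. It is an illustration only, not what Pipecat does internally; a band-limited resampler (for example, the soxr library) gives much better audio quality.

```python
import struct


def downsample_pcm16(pcm: bytes, src_rate: int = 48_000, dst_rate: int = 8_000) -> bytes:
    """Downsample mono 16-bit little-endian PCM by an integer factor.

    Each output sample is the mean of `factor` consecutive input samples:
    a crude low-pass (box filter) followed by decimation.
    """
    if src_rate % dst_rate != 0:
        raise ValueError("this sketch only handles integer rate ratios")
    factor = src_rate // dst_rate
    n = len(pcm) // 2  # number of complete 16-bit samples
    samples = struct.unpack(f"<{n}h", pcm[: n * 2])
    out = []
    # Skip any trailing partial group so every output sample averages `factor` inputs.
    for i in range(0, n - n % factor, factor):
        group = samples[i : i + factor]
        out.append(int(sum(group) / factor))  # mean of int16 values stays in int16 range
    return struct.pack(f"<{len(out)}h", *out)
```

For a 16kHz target the same function applies with dst_rate=16_000 (factor 3).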
Features that matter for voice agent pipelines
Word-level timestamps
Gradium's TTS service in Pipecat provides word-level timestamps alongside the generated audio. This is the kind of detail that matters specifically for production features rather than basic synthesis: synchronized captions, karaoke-style text highlighting, or precise alignment between spoken audio and an on-screen transcript all depend on knowing exactly when each word starts and ends in the output stream.
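As an example of how such timestamps drive karaoke-style highlighting, the sketch below finds the word being spoken at playback time t. Pipecat delivers timing information through its own frame types; the (word, start, end) record here is a hypothetical shape chosen for illustration, and WordTiming and active_word are not Pipecat or Gradium APIs.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class WordTiming(NamedTuple):
    # Hypothetical record: one synthesized word and its span in seconds.
    word: str
    start: float
    end: float


def active_word(timings: Sequence[WordTiming], t: float) -> Optional[int]:
    """Return the index of the word being spoken at time t, or None between words.

    Assumes `timings` is sorted by start time and words do not overlap.
    """
    starts = [w.start for w in timings]  # rebuilt per call; cache it for long transcripts
    i = bisect_right(starts, t) - 1  # last word that started at or before t
    if i >= 0 and t < timings[i].end:
        return i
    return None
```

A UI loop would call active_word with the current playback position and highlight the returned index, clearing the highlight during pauses when it returns None.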
Runtime voice switching
Changing the voice setting at runtime through TTSUpdateSettingsFrame automatically disconnects and reconnects the underlying WebSocket connection with the new voice applied. The service handles this itself, with no manual connection management, so an agent can switch character voices, languages, or branded voice identities mid-session without restarting the entire pipeline.
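The update-then-reconnect behavior can be modeled in plain Python. This is a conceptual sketch, not Pipecat's implementation: TTSSettings is a hypothetical stand-in for GradiumTTSService.Settings, and because the behavior above only names voice changes as triggering a reconnect, the sketch assumes other fields update in place.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class TTSSettings:
    # Hypothetical stand-in for GradiumTTSService.Settings, not the real class.
    model: str = "default"
    voice: Optional[str] = None
    language: Optional[str] = None


def apply_update(current: TTSSettings, **changes) -> Tuple[TTSSettings, bool]:
    """Merge a partial update and report whether the WebSocket must be reopened."""
    known = {f.name for f in fields(TTSSettings)}
    unknown = set(changes) - known
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    updated = replace(current, **changes)  # fields not in `changes` keep their values
    needs_reconnect = updated.voice != current.voice  # assumption: only voice forces a reconnect
    return updated, needs_reconnect
```

Keeping settings immutable and diffing old against new makes the reconnect decision explicit, which is the bookkeeping the service performs on your behalf.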
The service also exposes the standard Pipecat service connection events, on_connected, on_disconnected, and on_connection_error, which can be used to log connection state or trigger custom handling around the WebSocket lifecycle.
```python
@tts.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Gradium")
```
Get started
GradiumTTSService is in Pipecat's official catalogue today. Install it with uv add "pipecat-ai[gradium]", read the API reference on docs.pipecat.ai, and generate an API key at gradium.ai. For a full walkthrough, see How to Build an Audiobook Agent with Gradium and Pipecat.
Glossary
GradiumTTSService. The Pipecat service class that connects to Gradium's WebSocket text-to-speech API. Provides streaming synthesis, instant voice cloning support, word-level timestamps, and runtime-configurable voice and model settings within a Pipecat pipeline.
Pipecat extra. An optional, separately installable dependency group in the pipecat-ai Python package that adds support for a specific external service. Gradium is installed via the gradium extra (pipecat-ai[gradium]), alongside extras for dozens of other providers.
Settings object. Pipecat's pattern for exposing runtime-configurable parameters on a service, passed via a settings= constructor argument and updatable mid-conversation through frames like TTSUpdateSettingsFrame. Replaced the older InputParams/params= pattern as of Pipecat v0.0.105.
TTSUpdateSettingsFrame. A Pipecat frame type used to update a TTS service's configurable settings, such as voice or model, while a pipeline is already running, without requiring a full service restart.
Word-level timestamp. Timing metadata indicating the start and end of each word in a synthesized audio output. Used to synchronize on-screen text, captions, or transcript highlighting with spoken audio. Provided natively by Gradium's TTS service in Pipecat.