How to Build an Audiobook Agent with Gradium and Pipecat: Step-by-Step Guide
What if you could generate a custom audiobook on any topic and hear it narrated in real time? That is exactly what this guide builds: a voice agent called Fable, a story narrator that takes any prompt you give it and speaks a vivid, original story back as audio, almost instantly.
This is built with Gradium for text-to-speech and Pipecat as the voice agent framework. The full pipeline runs in about 100 lines of Python. If you need a fully conversational agent (with speech recognition and turn-taking) rather than a one-way narrator, see the companion guide on building a voice AI agent with LiveKit.
How Do You Install the Dependencies?
A single install command gives you everything needed. Gradium handles narration and Pipecat's built-in WebRTC transport delivers the audio to the browser.
pip install "pipecat-ai[gradium,google,runner,webrtc]"
How Do You Set Your Credentials?
Add your API keys to your local environment:
GRADIUM_API_KEY=your-gradium-key
GOOGLE_API_KEY=your-google-key
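The agent reads these keys at startup with os.getenv. A small helper (a sketch, not part of the tutorial's code) fails fast with a clear error when a key is missing, rather than failing later deep inside the pipeline:

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, raising if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Both keys are required before the agent starts:
# gradium_key = require_env("GRADIUM_API_KEY")
# google_key = require_env("GOOGLE_API_KEY")
```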
How Does the Pipeline Work?
The Fable pipeline follows a simple, linear flow:
Text Input -> User Aggregator -> Gemini LLM -> Gradium TTS -> WebRTC Audio Out
Text arrives over the WebRTC data channel, gets added to the LLM context, Gemini generates narrative prose, and Gradium streams the narration back as audio. That is the entire flow.
How Do You Configure the Transport?
Since users type their prompts instead of speaking them, audio input is disabled entirely. Only audio output is needed for the narrated story.
from pipecat.transports.base_transport import BaseTransport, TransportParams
transport_params = {
    "webrtc": lambda: TransportParams(
        audio_in_enabled=False,
        audio_out_enabled=True,
    ),
}
How Do You Configure Gradium TTS?
Gradium's TTS service connects over a persistent WebSocket and streams audio as it generates. The listener hears the story begin almost immediately, without waiting for the entire narration to finish.
import os

from pipecat.services.gradium.tts import GradiumTTSService

tts = GradiumTTSService(
    api_key=os.getenv("GRADIUM_API_KEY"),
    voice_id="zIGaffB0kKEBG_8u",
    params=GradiumTTSService.InputParams(temp=0.6),
)
The voice_id selects which voice to use. You can browse available voices in Gradium Studio and swap this for any voice that fits your narrator's personality.
The temp parameter controls generation temperature:
- Lower values produce more consistent speech.
- Higher values add more variation and expressiveness.
A value of 0.6 is used here as a balanced starting point for narration. For finer control over temp, cfg_coef, padding_bonus, and text normalization, see the json_config reference guide.
Under the hood, when the service starts it opens a WebSocket and sends a setup message with the voice selection and output format. Gradium responds with a ready message, and the connection is open for streaming.
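The handshake described above can be sketched in plain Python. This is illustrative only: the actual wire format is defined by Gradium's API, and the field names below ("type", "voice_id", "output_format") are assumptions, not the real schema.

```python
import json

# Hypothetical setup message the service might send when the WebSocket opens.
setup_message = json.dumps({
    "type": "setup",
    "voice_id": "zIGaffB0kKEBG_8u",
    "output_format": "pcm_16000",
})

def is_ready(raw: str) -> bool:
    """Check whether a server message is the 'ready' acknowledgment."""
    return json.loads(raw).get("type") == "ready"
```

Once the ready acknowledgment arrives, text frames can be streamed in and audio chunks streamed back over the same connection.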
How Do You Configure the LLM?
This example uses Gemini 2.0 Flash to generate stories, but any LLM supported by Pipecat works.
from pipecat.services.google.llm import GoogleLLMService
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-2.0-flash",
)
How Do You Define Fable's Character with a System Prompt?
The system prompt defines who Fable is: a master storyteller that produces vivid, immersive prose in the style of an audiobook narrator.
SYSTEM_PROMPT = (
    "You are Fable, a master storyteller and narrator. "
    "The user sends you a short topic or prompt and you turn it into a vivid, "
    "immersive story told in the style of an audiobook narrator.\n\n"
    "How you behave:\n"
    "- When the conversation starts with no user prompt yet, introduce yourself "
    "briefly: 'Hello, I am Fable. Give me a topic and I will spin you a tale.'\n"
    "..."
)
Fable adapts its style to match the mood of the prompt: a fairy tale gets whimsy, a thriller gets tension, a slice-of-life piece gets warmth. The output is kept as pure prose so the TTS reads it naturally, with no bullet points or markdown formatting that would sound awkward when spoken aloud.
How Do You Build the Pipeline?
Since there is no microphone, user prompts arrive as text over the WebRTC data channel. A handler takes each message, adds it to the conversation context, and triggers the LLM to generate a story.
from pipecat.frames.frames import LLMMessagesAppendFrame

@transport.event_handler("on_app_message")
async def on_app_message(transport, message, sender):
    if isinstance(message, str) and message.strip():
        frame = LLMMessagesAppendFrame(
            messages=[{"role": "user", "content": message.strip()}],
            run_llm=True,
        )
        await task.queue_frames([frame])
The pipeline itself is straightforward. Text comes in, goes through the LLM, the response flows through Gradium TTS, and audio goes out to the browser.
pipeline = Pipeline(
    [
        transport.input(),
        user_aggregator,
        llm,
        tts,
        transport.output(),
        assistant_aggregator,
    ]
)
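The tutorial does not show where user_aggregator, assistant_aggregator, and task come from. One typical Pipecat wiring, using the standard context-aggregator pattern, looks like the sketch below; the exact class names may vary with your Pipecat version, and the aggregators must exist before the pipeline that references them is built.

```python
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

# Seed the conversation context with Fable's system prompt.
context = OpenAILLMContext([{"role": "system", "content": SYSTEM_PROMPT}])
context_aggregator = llm.create_context_aggregator(context)

# The user side collects incoming prompts; the assistant side records replies.
user_aggregator = context_aggregator.user()
assistant_aggregator = context_aggregator.assistant()

# Wrap the pipeline in a runnable task (created after the pipeline is defined).
task = PipelineTask(pipeline)
```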
When a client connects, Fable's opening line is triggered. When the client disconnects, the pipeline task is cancelled and cleaned up.
from pipecat.frames.frames import LLMRunFrame

@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    await task.queue_frames([LLMRunFrame()])

@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
    await task.cancel()
How Do You Run the Agent?
python src/agent_pipecat.py
The agent starts a local WebRTC server. Open the browser interface and you will hear Fable introduce itself. Type a story prompt and Fable narrates a custom story back in real time.
What Does Gradium TTS Bring to This Build?
Gradium TTS provides four key capabilities in this project:
- Streaming audio over a persistent WebSocket: narration starts almost instantly, without waiting for the full story to be generated.
- Voice selection: pick any voice from Gradium Studio to match the narrator's personality.
- Temperature control: dial in expressiveness for the narration style.
- Word-level timestamps: open the door to synchronized captions and highlights.
When you are ready for production, you can deploy to Pipecat Cloud with a single command.
Summary: Building an Audiobook Agent with Gradium and Pipecat
| Step | What you do |
|---|---|
| 1. Install | pip install "pipecat-ai[gradium,google,runner,webrtc]" |
| 2. Credentials | Add GRADIUM_API_KEY and GOOGLE_API_KEY to your env |
| 3. Transport | Disable audio input, enable audio output (WebRTC) |
| 4. TTS | Configure GradiumTTSService with voice ID and temp=0.6 |
| 5. LLM | Configure Gemini 2.0 Flash (or any LLM) |
| 6. System prompt | Define Fable as a storyteller that outputs pure prose |
| 7. Pipeline | Connect input, aggregator, LLM, TTS, and output |
| 8. Run | python src/agent_pipecat.py |
Frequently Asked Questions
- What is Pipecat?
- Pipecat is a voice agent framework that handles the transport and pipeline orchestration for real-time voice applications. In this project it manages WebRTC transport, the conversation pipeline, and event handling for client connections.
- Why is audio input disabled in this project?
- Fable is a story narrator that works from typed prompts. Users type a topic and the agent narrates a story. There is no need for speech recognition, so audio input is disabled to simplify the setup. Gradium provides STT services if you want to add voice input later.
- How does Gradium stream audio in real time?
- Gradium TTS connects over a persistent WebSocket. When the service starts it sends a setup message with the voice selection and output format. Gradium responds with a ready message, and audio is streamed back as it is generated. The listener hears the narration begin almost immediately.
- What does the temp parameter do in GradiumTTSService?
- It controls the generation temperature for speech. Lower values produce more consistent and stable delivery. Higher values add more variation and expressiveness. A value of 0.6 is used in this project as a balanced starting point for audiobook narration.
- Can I use a different LLM instead of Gemini?
- Yes. The tutorial uses Gemini 2.0 Flash as an example, but Pipecat supports other LLMs. You can swap it for any compatible model.
- How long does it take to build this?
- The complete agent is built in about 100 lines of Python.
- Can I deploy this to production?
- Yes. When you are ready for production, you can deploy to Pipecat Cloud with a single command.