How to Build a Voice AI Agent with Gradium and LiveKit (Python Guide)

6 min read

Gradium's Speech-to-Text and Text-to-Speech models integrate directly with the LiveKit voice agent framework. The result: a fully conversational voice AI agent, built in about 100 lines of Python, with semantic Voice Activity Detection, natural interruption handling, and production deployment in a single command.

This guide walks through the full setup, from installing the plugin to deploying to LiveKit Cloud. If you are building a non-conversational narrator instead, see our companion guide on building an audiobook agent with Pipecat.

What Do You Need Before You Start?

Before starting, you will need:

  • A LiveKit Cloud account (URL, API key, and API secret, available from the LiveKit dashboard or CLI)
  • A Gradium API key (generated from Gradium Studio)
  • Python with uv for dependency management

How Do You Install the Gradium Plugin for LiveKit?

If you are starting from the LiveKit agent starter template, clone it and install dependencies with uv.

If you are adding Gradium to an existing LiveKit project, you only need one install:

pip install "livekit-agents[gradium]~=1.3"

This single package gives you both Gradium STT and TTS as ready-to-use plugins.

How Do You Configure Your Environment?

Create a .env file at the root of your project with the following variables:

LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your-key
LIVEKIT_API_SECRET=your-secret
GRADIUM_API_KEY=your-gradium-key
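LiveKit's tooling loads this .env file automatically in dev mode. To fail fast when a credential is missing, you can add a small startup check along these lines (an illustrative sketch, not part of the starter template):

```python
import os

REQUIRED_VARS = [
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "GRADIUM_API_KEY",
]

def check_env(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

missing = check_env()
# if missing: raise SystemExit(f"Missing env vars: {', '.join(missing)}")
```

Calling check_env({}) returns all four names; a fully populated environment returns an empty list.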

How Do You Set Up the Agent Session?

All agent logic lives in a single file: agent.py. The core of the agent is the AgentSession, which handles the entire real-time loop. Instead of manually wiring up audio transcription and LLM responses, you declare which models you want to use and the session takes care of the rest:

session = AgentSession(
    stt=gradium.STT(vad_threshold=0.6, vad_bucket=1),
    llm=inference.LLM(model="openai/gpt-4.1-mini"),
    tts=gradium.TTS(),
    allow_interruptions=True,
    min_interruption_words=0,
    preemptive_generation=True,
)
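For context, this session is constructed inside an entrypoint function that LiveKit invokes for each new room, and the worker is launched from __main__. A sketch of that surrounding boilerplate, following the LiveKit agents 1.x pattern (the Assistant class is defined later in this guide; treat the exact signatures as illustrative):

```python
from livekit import agents

async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(...)  # configured as shown above
    await session.start(room=ctx.room, agent=Assistant())

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```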

What Are the Key Session Parameters?

What Is Semantic Voice Activity Detection (VAD)?

Gradium's built-in semantic VAD does not simply wait for silence. It understands when the speaker has actually finished a thought. This removes the awkward pauses that voice bots typically introduce between a user's sentence and the agent's response.

The vad_threshold and vad_bucket parameters in gradium.STT() control how the VAD behaves.
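Intuitively, a higher threshold makes end-of-turn detection more conservative. The presets below are an illustrative sketch only (the values and their effective ranges are assumptions, not documented Gradium defaults):

```python
# Illustrative VAD presets (values are assumptions, not documented defaults):
# - "snappy":  lower threshold, the agent jumps in sooner but may cut users off
# - "patient": higher threshold, longer waits but fewer false end-of-turn calls
VAD_PRESETS = {
    "snappy": {"vad_threshold": 0.4, "vad_bucket": 1},
    "default": {"vad_threshold": 0.6, "vad_bucket": 1},
    "patient": {"vad_threshold": 0.8, "vad_bucket": 2},
}

# These dicts can be unpacked straight into the STT constructor:
# stt = gradium.STT(**VAD_PRESETS["patient"])
```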

What Does allow_interruptions Do?

Set to True to let users talk over the agent at any point. Combined with min_interruption_words=0, the agent will respond to interruptions immediately, without requiring a minimum word count.

What Does preemptive_generation Do?

Set to True to have the LLM start generating a response before the user has finished their sentence. This reduces perceived response latency in conversation.

How Do You Customize the TTS?

Gradium's TTS works with its default settings. If you need to customize it, you can pass a voice_id or a json_config to control things like speed or how the model handles structured tokens like dates or emails. For high-volume conversational deployments, see also how to multiplex TTS requests over one WebSocket connection.
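As a sketch of what such a configuration might look like (the key names below are hypothetical; check Gradium's documentation for the keys json_config actually accepts):

```python
# Hypothetical json_config payload; key names are illustrative assumptions.
tts_config = {
    "speed": 1.1,              # slightly faster than the default speech rate
    "read_dates_as": "words",  # e.g. "March third" instead of "3/3"
    "spell_out_emails": True,  # read "a@b.com" as "a at b dot com"
}

# tts = gradium.TTS(voice_id="my-voice-id", json_config=tts_config)
```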

How Do You Define the Agent Class and Its Tools?

The agent is defined as a Python class. It receives two things: instructions that shape its personality, and tools (Python functions it can call during a conversation):

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are Protein Partner, a personalized ..."
        )

    @function_tool
    async def lookup_profile(self, name: str) -> str:
        """Look up a user's saved profile."""
        ...

    @function_tool
    async def save_profile(self, name: str, profile: str) -> str:
        """Save or update a user's profile."""
        ...
In the example above, the agent has two tools: lookup_profile and save_profile. When a user provides their name, the agent can automatically retrieve their saved profile and greet them with personalized data.
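The tool bodies are elided above. As a minimal sketch of what they could delegate to, here is a hypothetical in-memory backing store (a real agent would use a database; these helper names are not part of the LiveKit or Gradium APIs):

```python
# Hypothetical in-memory profile store for illustration only.
_profiles: dict[str, dict] = {}

def save_profile(name: str, data: dict) -> str:
    """Store a profile under a case-insensitive name key."""
    _profiles[name.lower()] = data
    return f"Saved profile for {name}."

def lookup_profile(name: str) -> str:
    """Return a short summary of a saved profile, or a not-found message."""
    profile = _profiles.get(name.lower())
    if profile is None:
        return f"No profile found for {name}."
    return f"{name}: " + ", ".join(f"{k}={v}" for k, v in profile.items())
```

Returning plain strings matters here: the tool's return value is fed back to the LLM as text, so a short human-readable summary works better than a raw object.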

How Do You Download Models and Test Your Agent?

Before the first run, download the required models:

uv run python src/agent.py download-files

To test your agent directly in the terminal:

uv run python src/agent.py console

To run it as a full server connected to LiveKit Cloud:

uv run python src/agent.py dev

How Do You Deploy to Production?

When you are ready to go live, the starter template includes a Dockerfile. Deploy to LiveKit Cloud with a single command:

lk cloud deploy

Make sure your GRADIUM_API_KEY is added as a secret in the LiveKit Cloud settings. Without it, your agent will not be able to process speech or generate audio in production.

Summary: Building a Voice Agent with Gradium and LiveKit

Here is the complete setup at a glance:

1. Install: pip install "livekit-agents[gradium]~=1.3"
2. Configure: set LiveKit and Gradium credentials in .env
3. AgentSession: declare STT, LLM, and TTS models in agent.py
4. Parameters: configure VAD, interruptions, and preemptive generation
5. Agent class: define instructions and function tools
6. Test: console for the terminal, dev for LiveKit Cloud
7. Deploy: lk cloud deploy with GRADIUM_API_KEY set as a secret

Frequently Asked Questions

What does `pip install "livekit-agents[gradium]~=1.3"` install?
It installs the LiveKit agents framework with the Gradium plugin, giving you both Gradium STT and TTS as ready-to-use components.

What is the AgentSession?
The AgentSession is the component that handles the entire real-time loop for the voice agent. You declare which STT, LLM, and TTS models to use, and it manages audio transcription, LLM responses, and speech generation automatically.

What is semantic VAD and why does it matter?
Gradium's built-in semantic VAD understands when a speaker has finished a turn, rather than simply detecting silence. This removes the awkward pauses that are common in voice bots.

What does preemptive_generation do?
It makes the LLM start generating a response before the user has finished their sentence, reducing perceived latency in conversation.

How do I add tools to my agent?
Define async Python functions decorated with @function_tool inside your Agent class. The agent will call them automatically during conversation when relevant.

How do I customize the TTS voice or behavior?
Pass a voice_id or a json_config to gradium.TTS() to control the voice or adjust how the model handles structured tokens like dates or emails.

How do I deploy the agent to production?
Use `lk cloud deploy` from the project root. Make sure GRADIUM_API_KEY is added as a secret in your LiveKit Cloud settings before deploying.

Does the agent work if GRADIUM_API_KEY is missing in production?
No. Without the API key set as a secret in LiveKit settings, the agent will not be able to process speech or generate audio.