Gradbot: Vibe code voice agents in 60 lines of code

April 15, 2026 · 4 min read

When we launched Gradium in December 2025, we started with the core models that power voice agents: best-in-class Text-to-Speech (TTS) and Speech-to-Text (STT).

But having great STT and TTS models is only half the battle. To actually build a voice agent, you need to orchestrate these components: wire the transport layer, handle interruptions, manage conversation flow, coordinate tool calls, and sync everything in real time.

To spin up client demos quickly, we built a minimal internal framework that handles this orchestration and makes custom voice experiences easy to design. Today we're sharing it with the community.

Gradbot is an open-source framework for prototyping voice agents in minutes. Whether you're building a 3D NPC game or a travel booking assistant, Gradbot lets you go from idea to working voice experience in around 60 lines of code.

Gradbot Core

Voice agents need three streams running in sync: listening to the user (speech-to-text), deciding what to say (LLM inference), and speaking back (text-to-speech).

At the center of Gradbot is a multiplexing engine written in Rust that coordinates these three pipelines in real time. It manages the continuous back-and-forth between STT, LLM, and TTS while handling conversational state.

Every agent built on Gradbot gets natural turn-taking and interruption handling out of the box.
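The coordination pattern can be sketched in plain Python with asyncio. This is an illustrative toy, not Gradbot's actual API: the `stt_stream`, `speak`, and `run_agent` names are made up, and the STT, LLM, and TTS stages are stubbed out. The key idea it shows is interruption handling — fresh user speech cancels any in-flight speech.

```python
import asyncio

async def stt_stream(utterances, queue):
    # Stand-in speech-to-text: emits one finished utterance at a time.
    for text in utterances:
        await queue.put(text)
        await asyncio.sleep(0.001)   # user pauses briefly between utterances
    await queue.put(None)            # end of session

async def speak(reply, spoken):
    # Stand-in text-to-speech: speaks word by word so it can be interrupted.
    for word in reply.split():
        spoken.append(word)
        await asyncio.sleep(0.05)    # each word takes time to say

async def run_agent(utterances):
    queue = asyncio.Queue()
    spoken = []
    tts_task = None
    stt_task = asyncio.create_task(stt_stream(utterances, queue))
    while (utterance := await queue.get()) is not None:
        # Interruption handling: new user speech cancels in-flight speech.
        if tts_task and not tts_task.done():
            tts_task.cancel()
        reply = "you said: " + utterance   # stand-in for LLM inference
        tts_task = asyncio.create_task(speak(reply, spoken))
    if tts_task:
        try:
            await tts_task
        except asyncio.CancelledError:
            pass
    await stt_task
    return spoken

# The user interrupts the agent mid-reply with a second utterance.
spoken = asyncio.run(run_agent(["hello", "actually never mind"]))
```

Gradbot's Rust core does the real version of this loop, with audio transport and conversational state layered on top.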

Silence and Flushing

The goal: Make your agent reply quickly to your questions and feel present during pauses.

We want Gradbot to react to user queries as fast as possible. Gradium STT models include voice activity detection, which determines when the user has finished talking; at that point Gradbot sends the transcript to the LLM to get an answer back. The STT model operates with a small lookahead delay, which would add latency, so Gradbot gets around it by flushing the trailing audio: it pushes silence into the STT buffer so the tail of the utterance is transcribed immediately. This ensures the LLM gets your complete utterance before replying. Combined with a prompt that encourages filler words, conversations feel more natural.
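To see why silence flushing works, model an STT engine with a fixed lookahead window: it only commits the transcript for a frame once enough newer frames have arrived, so the tail of an utterance stays stuck in the buffer after the user stops talking. The `LookaheadSTT` class, frame values, and delay below are all illustrative, not the Gradbot or Gradium API.

```python
class LookaheadSTT:
    # Toy STT with a fixed lookahead: a frame is only transcribed once
    # `delay` newer frames exist after it in the buffer.
    def __init__(self, delay):
        self.delay = delay
        self.buffer = []     # frames not yet old enough to transcribe
        self.committed = []  # transcript pieces already emitted

    def push(self, frame):
        self.buffer.append(frame)
        while len(self.buffer) > self.delay:
            oldest = self.buffer.pop(0)
            if oldest != "<silence>":
                self.committed.append(oldest)

def flush_with_silence(stt):
    # When voice activity detection says the user is done, push `delay`
    # silent frames so the trailing words clear the lookahead window
    # instead of waiting for the user's next audio.
    for _ in range(stt.delay):
        stt.push("<silence>")

stt = LookaheadSTT(delay=3)
for word in ["book", "me", "a", "flight"]:
    stt.push(word)
# Without flushing, the last 3 words are stuck in the lookahead buffer:
assert stt.committed == ["book"]
flush_with_silence(stt)
assert stt.committed == ["book", "me", "a", "flight"]
```

The silence frames cost nothing to generate, so the complete utterance reaches the LLM as soon as the user stops speaking rather than one lookahead window later.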

Most voice agents go silent when you do, creating dead air. Gradbot sends a silence signal to the LLM after a configurable window (default: 5 seconds). The LLM responds naturally, asking if you're still there or prompting with a follow-up. If silence continues, responses evolve instead of repeating. After five unanswered attempts, Gradbot closes the session.
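The escalation policy above can be sketched as a small state machine. The `SilencePolicy` class and the canned nudge strings are invented for illustration (in Gradbot the LLM generates the responses and the Rust core does the bookkeeping), but the shape matches the behavior described: evolving prompts, a reset on any user speech, and session close after five unanswered attempts.

```python
class SilencePolicy:
    MAX_ATTEMPTS = 5

    def __init__(self, window_s=5.0):
        self.window_s = window_s   # configurable silence window (default 5s)
        self.attempts = 0

    def on_silence(self):
        # Called each time `window_s` seconds pass with no user speech.
        self.attempts += 1
        if self.attempts >= self.MAX_ATTEMPTS:
            return "<close session>"
        # Responses evolve instead of repeating; the LLM would generate
        # these, canned strings just keep the sketch short.
        nudges = [
            "Are you still there?",
            "Take your time, I'm here when you're ready.",
            "Want me to recap where we left off?",
            "I'll stay on the line a little longer.",
        ]
        return nudges[self.attempts - 1]

    def on_user_speech(self):
        self.attempts = 0  # any user speech resets the counter

policy = SilencePolicy()
replies = [policy.on_silence() for _ in range(5)]
# Four distinct nudges, then the session closes on the fifth.
```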

Both behaviors work out of the box. You can tweak the defaults.

Asynchronous Tool Calling

The goal: Keep conversations flowing when external services are slow or fail.

When your agent checks flight availability or searches for restaurants, you don't want it to freeze. Gradbot's Rust core tracks every tool call through three states: new (requested), pending (LLM knows it's waiting), and answered (result received). If a tool never responds, the call is marked lost. The LLM sees this and recovers gracefully instead of hanging.

This tracking ensures the LLM always has valid conversation history, regardless of when results arrive. Your booking and search agents keep users engaged even when services glitch.
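The lifecycle can be sketched as an explicit state table. The `ToolState` enum, `ToolCallTracker` class, and the 10-second timeout are assumptions made for illustration; the real bookkeeping lives in Gradbot's Rust core.

```python
import time
from enum import Enum

class ToolState(Enum):
    NEW = "new"          # requested by the LLM
    PENDING = "pending"  # LLM has been told the result is still coming
    ANSWERED = "answered"
    LOST = "lost"        # no result arrived in time

class ToolCallTracker:
    def __init__(self, timeout_s=10.0):
        self.timeout_s = timeout_s
        self.calls = {}  # call_id -> [state, requested_at, result]

    def request(self, call_id):
        self.calls[call_id] = [ToolState.NEW, time.monotonic(), None]

    def mark_pending(self, call_id):
        self.calls[call_id][0] = ToolState.PENDING

    def resolve(self, call_id, result):
        self.calls[call_id][0] = ToolState.ANSWERED
        self.calls[call_id][2] = result

    def reap_lost(self, now=None):
        # Calls that never answered within the timeout become a terminal
        # `lost` state, so the conversation history stays valid and the
        # LLM can recover instead of hanging.
        now = now if now is not None else time.monotonic()
        for call in self.calls.values():
            state, requested_at, _ = call
            if state in (ToolState.NEW, ToolState.PENDING) \
                    and now - requested_at > self.timeout_s:
                call[0] = ToolState.LOST

tracker = ToolCallTracker(timeout_s=10.0)
tracker.request("flights-1")
tracker.mark_pending("flights-1")   # slow service: LLM told it's waiting
tracker.request("hotels-1")
tracker.resolve("hotels-1", {"rooms": 3})
tracker.reap_lost(now=time.monotonic() + 60)  # simulate the timeout elapsing
```

Because every call ends in a terminal state (`answered` or `lost`), the history handed to the LLM is always complete, whichever order results arrive in.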

All of this lives in the Rust core. Your app just handles business logic.

Who Should Use Gradbot?

Gradbot is built for prototyping and experimentation. Use it to hack on ideas and build voice experiences without spending hours on infrastructure.

Good for:

  • Support agents and real-time assistants
  • Coaching and educational apps
  • Tool-using voice workflows
  • Games and experimental interfaces

Build weird, fun stuff. Voice agents don't have to be boring.

For production, use Gradium's models through orchestrators like LiveKit and Pipecat for enterprise-grade reliability and scaling.

Get Started

pip install gradbot

You can point your coding agent at the Gradbot voice agent skill, or build on top of the existing Gradbot demos.

Gradbot is live and open source today. We can't wait to see what you build.