Guides & Tutorials: Voice AI Developer Resources

Spotlight post

Gradium Extends Funding to $100 Million and Expands to Silicon Valley

Gradium extends its seed funding to $100 million, welcoming new investors including NVIDIA, and opens a San Francisco Bay Area office to scale its real-time voice AI.

July 8, 2026•3 min read

Guides & Tutorials

Gradium Phonon: On-Device TTS Benchmarks in 2026

Gradium Phonon on-device TTS reaches 1.00% WER with voice cloning and 0.83% WER with a fixed voice on Seed-TTS, at 100M parameters. Full benchmarks and architecture.

June 30, 2026•11 min read•Benchmark

Gradium TTS in Pipecat: Setup and Integration Guide

Gradium TTS integration for Pipecat: install GradiumTTSService, configure voice and model settings, and stream low-latency speech in your voice agent.

June 30, 2026•8 min read•Tutorial

Semantic VAD for Voice Agents: Turn Detection 2026

Semantic VAD explained: why voice agents interrupt users mid-sentence, how turn detection actually works, and how to configure delay_in_frames in Gradium STT.

June 30, 2026•12 min read•Tutorial

Why Generic TTS Fails on Regional Accents

Regional accent voice AI: generic TTS flattens Bavarian, Argentinian, and Quebecois speech into one default. Gradium preserves accent from a 10-second sample.

June 30, 2026•12 min read•Tutorial

How to Turn an LLM into a Voice Agent: Best Stack 2026

Turn any LLM into a voice agent in 2026: the best stack pairs your LLM with Gradium for 155 ms TTFA and semantic VAD, via LiveKit or Pipecat.

June 17, 2026•10 min read•Tutorial

Best Voice AI API for Phone-Based Voice Agents in 2026

Best voice AI API for phone-based voice agents in 2026: Gradium delivers 155 ms TTFA, 3.3% WER, and flexible audio formats for telephony pipelines.

June 16, 2026•10 min read•Comparison

Best TTS API in 2026: Quality, Latency, and Cost Compared

Best TTS API for voice agents in 2026: Gradium leads Coval with 155 ms TTFA, 3.3% WER, and 2 ms IQR. Quality, latency, and cost compared across 8 providers.

June 3, 2026•8 min read•Comparison

Cascaded Voice Agents vs Speech-to-Speech: Architecture Tradeoffs in 2026

Cascade (STT + LLM + TTS) vs speech-to-speech for voice agents: latency, modularity, paralinguistic information, and LLM flexibility compared. Which architecture fits which use case in 2026.

May 27, 2026•12 min read•Tutorial

STT API Benchmark 2026: Latency and Accuracy for Voice Agents

Independent Coval benchmark comparing 5 STT APIs on TTFT and WER for voice agents: Gradium, Deepgram Nova 3, Nova 2, AssemblyAI Universal Streaming, and ElevenLabs Scribe v2. Latency vs accuracy tradeoffs explained.

May 27, 2026•13 min read•Tutorial

Turn-Taking in Voice Agents: Why Rule-Based VAD Is Broken and What Comes Next

Why voice activity detection rules make every cascaded voice agent feel unnatural. The turn-taking problem explained, how full duplex solves it, and where the architecture is heading in 2026.

May 27, 2026•11 min read•Tutorial

Phonon Reaches 1.00% WER on Seed-TTS in May 2026 — Smallest On-Device TTS Model in the Comparison

Phonon, Gradium's 100M-parameter on-device Text-To-Speech model, reaches 1.00% WER on the Seed-TTS English benchmark in May 2026 with voice cloning, and 0.83% WER with a fixed voice. Outperforms NeuTTS Air (552M), KaniTTS2 (450M), NeuTTS Nano (229M), Kokoro (82M), Magpie (357M), and Supertonic 2 (66M).

May 26, 2026•6 min read•Benchmark

Azure TTS Alternative: Gradium for Real-Time Voice AI

Microsoft Azure TTS alternative for voice agents: Gradium delivers streaming TTS and STT with semantic VAD, sub-300 ms TTFA, instant voice cloning, and transparent pricing.

May 20, 2026•15 min read•Comparison

Best AI Voice Generators in 2026: APIs Ranked by Voice Quality, Latency, and Price

Compare the best AI voice generator APIs in 2026: voice quality (Artificial Analysis ELO), latency, pricing, and production benchmarks. Includes Gradium, ElevenLabs, Inworld, Google, OpenAI, and more.

May 20, 2026•20 min read•Comparison

Best ElevenLabs Alternatives in 2026: Top TTS APIs Ranked by Voice Quality and Price

Looking for ElevenLabs alternatives in 2026? Compare the top TTS APIs ranked by independent voice quality ELO score (Artificial Analysis) and pricing per million characters, including Gradium, Inworld, Google, OpenAI, and more.

May 20, 2026•13 min read•Comparison

Best Speech APIs in 2026: TTS, STT Compared

Compare the best speech APIs in 2026 for text-to-speech and speech-to-text. Verified pricing, latency benchmarks, and production data.

May 20, 2026•11 min read•Comparison

Gradium Phonon: On-Device TTS for Mobile Apps, NPCs, and Offline Products

Gradium Phonon is an on-device TTS model that runs on CPU across Android, iOS, and browser with no network dependency. 1.48% WER, 56.37% speaker similarity on Seed-TTS. Built for mobile apps, game NPCs, and offline products.

May 20, 2026•11 min read•Tutorial

On-Device Text-to-Speech in 2026: When Edge TTS Is the Right Architecture

When on-device TTS is the right architecture in 2026: offline apps, high-volume consumer products, and privacy-constrained deployments. Gradium Phonon benchmarks and use case guide.

May 20, 2026•10 min read•Tutorial

On-Device TTS Benchmark 2026: Phonon vs Kani-TTS2 vs NeuTTS on Seed-TTS

Independent on-device TTS benchmark 2026: Gradium Phonon vs Kani-TTS2 vs NeuTTS Air vs NeuTTS Nano on Seed-TTS English. WER and speaker similarity results, methodology, and what they mean for edge deployment.

May 20, 2026•12 min read•Tutorial

Best Low-Latency TTS APIs in 2026: TTFA, P99 and Pipeline Impact

Compare the best low-latency TTS APIs in 2026, benchmarked by TTFA (P50/P99), WebSocket architecture, codebook configs and full voice agent pipeline impact.

May 13, 2026•14 min read•Comparison

Best Multilingual TTS APIs in 2026: Coverage, Quality, Code-Switching

Compare the best multilingual TTS APIs in 2026: Gradium, ElevenLabs, Cartesia, Deepgram on language coverage, code-switching, voice cloning and latency.

May 13, 2026•11 min read•Comparison

Best Text-To-Speech APIs in 2026: Developer Guide

Compare the best TTS APIs in 2026 by latency, voice cloning, languages, and pricing. Covers Gradium, ElevenLabs, Cartesia, Deepgram Aura-2, and OpenAI TTS.

May 13, 2026•10 min read•Comparison

Best Voice Cloning APIs in 2026: Instant Cloning, Fine-Tuning, Benchmarks

Compare the best voice cloning APIs in 2026: Gradium, ElevenLabs, Cartesia. Instant vs. professional cloning, speaker similarity benchmarks, pricing.

May 13, 2026•9 min read•Comparison

Gradium vs ElevenLabs for Voice Agents: TTFA, WER and IQR Compared (2026 Coval Data)

Gradium vs ElevenLabs for voice agents in 2026. Independent Coval benchmark data on TTFA, WER and latency IQR across Gradium TTS, ElevenLabs Turbo v2.5, Flash v2.5 and Multilingual v2. Gradium leads at 155ms P50 TTFA (vs 264ms Turbo v2.5), 2ms IQR (vs 28ms), 3.3% WER (vs 5.2%). Plus 1.11% MiniMax multilingual WER and 3-4x lower pricing.

May 5, 2026•16 min read•Comparison

TTS Latency Benchmark 2026: TTFA Compared Across Gradium, ElevenLabs, Cartesia and Deepgram

TTS latency benchmark 2026: Gradium TTS leads at 155ms P50 TTFA with a 2ms IQR on the independent Coval benchmark. Full TTFA comparison across Gradium, ElevenLabs (Turbo v2.5, Flash v2.5, Multilingual v2), Cartesia Sonic-3, Deepgram Aura-2, Rime (Mist-v3, Arcana) and OpenAI TTS-1-HD. Methodology, P25/P50/P75/P95, IQR consistency, and WER.

May 5, 2026•16 min read•Benchmark

TTS WER Benchmark 2026: Word Error Rate Compared Across Gradium, ElevenLabs, Cartesia and Deepgram

TTS WER benchmark 2026: Gradium TTS leads at 3.3% average WER on the Coval benchmark and 1.11% on the MiniMax Multilingual TTS Test Set across 5 languages (EN, FR, ES, PT, DE). Word Error Rate compared across Gradium, ElevenLabs (Flash v2.5, Turbo v2.5, Multilingual v2), Cartesia Sonic-3, Deepgram Aura-2, Rime (Mist-v3, Arcana), Qwen3 TTS, Mistral Voxtral and OpenAI TTS-1-HD.

May 5, 2026•15 min read•Benchmark

Cartesia Alternative: Why Developers Choose Gradium for Real-Time Voice AI

Gradium vs Cartesia comparison for real-time voice AI. Voice-agent-tuned TTS with robust pronunciation, semantic VAD in STT, accent-preserving voice cloning, and cloud-to-on-device deployment from one API.

April 20, 2026•12 min read•Comparison

Deepgram Alternative: Why Developers Choose Gradium for Real-Time Voice AI

Gradium vs Deepgram comparison for real-time voice AI. Voice cloning (not available on Deepgram), semantic VAD, voice-agent-tuned TTS with published TTFA benchmark, and cloud-to-on-device deployment from one API.

April 20, 2026•14 min read•Comparison

ElevenLabs Alternative: Why Developers Choose Gradium for Real-Time Voice AI

Gradium vs ElevenLabs comparison for real-time voice AI. Voice-agent-tuned TTS with a published TTFA benchmark, semantic VAD, accent-preserving voice cloning with the highest Elo scores, and cloud-to-on-device deployment.

April 20, 2026•15 min read•Comparison

How to Build a Voice AI Agent with Gradium and LiveKit (Python Guide)

Learn how to build a full voice AI agent using Gradium STT and TTS with the LiveKit agent framework. Step-by-step Python guide covering AgentSession setup, VAD, interruptions, preemptive generation, tools, and deployment.

April 15, 2026•6 min read•Tutorial

How to Build an Audiobook Agent with Gradium and Pipecat: Step-by-Step Guide

Learn how to build a real-time story narrator with Gradium TTS and Pipecat. This step-by-step guide covers installation, pipeline setup, voice configuration, and deployment in about 100 lines of Python.

April 13, 2026•5 min read•Tutorial

How to Multiplex TTS Requests Over One WebSocket Connection in Gradium

Learn how to reuse a single WebSocket connection for multiple concurrent TTS requests in Gradium using multiplexing. Covers close_ws_on_eos, client_request_id, and how to route interleaved audio chunks correctly.

April 10, 2026•4 min read•Tutorial

What Is the Best Text-to-Speech API in 2026 to Build Voice Agents? Complete Developer Comparison

Best text-to-speech API 2026: Gradium achieves 258ms P50 TTFA (214ms with multiplexing) with expressive multilingual voices and robust pronunciation. Complete real-time TTS comparison for developers building voice agents.

April 9, 2026•25 min read•Comparison

How to Use json_config in Gradium: TTS and STT Parameters Explained

Learn how to use the json_config field in Gradium to control rewrite_rules, padding_bonus, temp, and cfg_coef for TTS, and language and delay_in_frames for STT. Full parameter reference with code examples.

April 9, 2026•5 min read•Tutorial

Instant vs Pro Voice Cloning in Gradium: When to Use Each

Not sure whether to use Instant or Pro Voice Cloning in Gradium? Learn the key differences, what each is designed for, how to prepare your audio for Pro cloning, and how to choose based on your use case.

April 9, 2026•4 min read•Tutorial

How to Use Pronunciation Dictionaries in Gradium TTS: Studio and API Guide

Learn how to use Pronunciation Dictionaries in Gradium to control how words are spoken and filter unwanted content. Step-by-step guide for Gradium Studio and the Python SDK.

April 9, 2026•4 min read•Tutorial

How to Handle TTS Edge Cases with Text Normalization in Gradium

Learn how to use Gradium's Text Normalization feature to handle edge cases in TTS. Configure rewrite_rules with language aliases or specific normalizers for dates, numbers, emails, URLs, phone numbers, and alphanumeric codes.

April 9, 2026•4 min read•Tutorial

Top 3 Text-to-Speech Solutions in 2026: Ranked and Compared
