Guides & Tutorials

Cascaded Voice Agents vs Speech-to-Speech: Architecture Tradeoffs in 2026

Cascade (STT + LLM + TTS) vs speech-to-speech for voice agents: latency, modularity, paralinguistic information, and LLM flexibility compared. Which architecture fits which use case in 2026.

May 27, 2026 · 12 min read · Tutorial

STT API Benchmark 2026: Latency and Accuracy for Voice Agents

Independent Coval benchmark comparing 5 STT APIs on TTFT and WER for voice agents: Gradium, Deepgram Nova 3, Nova 2, AssemblyAI Universal Streaming, and ElevenLabs Scribe v2. Latency vs accuracy tradeoffs explained.

May 27, 2026 · 13 min read · Tutorial

Turn-Taking in Voice Agents: Why Rule-Based VAD Is Broken and What Comes Next

Why voice activity detection rules make every cascaded voice agent feel unnatural. The turn-taking problem explained, how full duplex solves it, and where the architecture is heading in 2026.

May 27, 2026 · 11 min read · Tutorial

Phonon Reaches 1.00% WER on Seed-TTS in May 2026 — Smallest On-Device TTS Model in the Comparison

Phonon, Gradium's 100M-parameter on-device Text-To-Speech model, reaches 1.00% WER on the Seed-TTS English benchmark in May 2026 with voice cloning, and 0.83% WER with a fixed voice. Outperforms NeuTTS Air (552M), KaniTTS2 (450M), NeuTTS Nano (229M), Kokoro (82M), Magpie (357M), and Supertonic 2 (66M).

May 26, 2026 · 6 min read · Benchmark

Azure TTS Alternative: Gradium for Real-Time Voice AI

Microsoft Azure TTS alternative for voice agents: Gradium delivers streaming TTS and STT with semantic VAD, sub-300 ms TTFA, instant voice cloning, and transparent pricing.

May 20, 2026 · 15 min read · Comparison

Best AI Voice Generators in 2026: APIs Ranked by Voice Quality, Latency, and Price

Compare the best AI voice generator APIs in 2026: voice quality (Artificial Analysis Elo), latency, pricing, and production benchmarks. Includes Gradium, ElevenLabs, Inworld, Google, OpenAI, and more.

May 20, 2026 · 20 min read · Comparison

Best ElevenLabs Alternatives in 2026: Top TTS APIs Ranked by Voice Quality and Price

Looking for ElevenLabs alternatives in 2026? Compare the top TTS APIs ranked by independent voice quality Elo score (Artificial Analysis) and pricing per million characters, including Gradium, Inworld, Google, OpenAI, and more.

May 20, 2026 · 13 min read · Comparison

Best Speech APIs in 2026: TTS, STT Compared

Compare the best speech APIs in 2026 for text-to-speech and speech-to-text. Verified pricing, latency benchmarks, and production data.

May 20, 2026 · 11 min read · Comparison

Gradium Phonon: On-Device TTS for Mobile Apps, NPCs, and Offline Products

Gradium Phonon is an on-device TTS model that runs on CPU across Android, iOS, and browser with no network dependency. 1.48% WER, 56.37% speaker similarity on Seed-TTS. Built for mobile apps, game NPCs, and offline products.

May 20, 2026 · 11 min read · Tutorial

On-Device Text-to-Speech in 2026: When Edge TTS Is the Right Architecture

When on-device TTS is the right architecture in 2026: offline apps, high-volume consumer products, and privacy-constrained deployments. Gradium Phonon benchmarks and use case guide.

May 20, 2026 · 10 min read · Tutorial

On-Device TTS Benchmark 2026: Phonon vs Kani-TTS2 vs NeuTTS on Seed-TTS

Independent on-device TTS benchmark 2026: Gradium Phonon vs Kani-TTS2 vs NeuTTS Air vs NeuTTS Nano on Seed-TTS English. WER and speaker similarity results, methodology, and what they mean for edge deployment.

May 20, 2026 · 12 min read · Tutorial

Best Low-Latency TTS APIs in 2026: TTFA, P99 and Pipeline Impact

Compare the best low-latency TTS APIs in 2026, benchmarked by TTFA (P50/P99), WebSocket architecture, codebook configs and full voice agent pipeline impact.

May 13, 2026 · 14 min read · Comparison

Best Multilingual TTS APIs in 2026: Coverage, Quality, Code-Switching

Compare the best multilingual TTS APIs in 2026: Gradium, ElevenLabs, Cartesia, Deepgram on language coverage, code-switching, voice cloning and latency.

May 13, 2026 · 11 min read · Comparison

Best Text-To-Speech APIs in 2026: Developer Guide

Compare the best TTS APIs in 2026 by latency, voice cloning, languages, and pricing. Covers Gradium, ElevenLabs, Cartesia, Deepgram Aura-2, and OpenAI TTS.

May 13, 2026 · 10 min read · Comparison

Best Voice Cloning APIs in 2026: Instant Cloning, Fine-Tuning, Benchmarks

Compare the best voice cloning APIs in 2026: Gradium, ElevenLabs, Cartesia. Instant vs. professional cloning, speaker similarity benchmarks, pricing.

May 13, 2026 · 9 min read · Comparison

Gradium vs ElevenLabs for Voice Agents: TTFA, WER and IQR Compared (2026 Coval Data)

Gradium vs ElevenLabs for voice agents in 2026. Independent Coval benchmark data on TTFA, WER and latency IQR across Gradium TTS, ElevenLabs Turbo v2.5, Flash v2.5 and Multilingual v2. Gradium leads at 155 ms P50 TTFA (vs 264 ms for Turbo v2.5), 2 ms IQR (vs 28 ms), and 3.3% WER (vs 5.2%). Plus 1.11% WER on the MiniMax multilingual test set and 3-4x lower pricing.

May 5, 2026 · 16 min read · Comparison

TTS Latency Benchmark 2026: TTFA Compared Across Gradium, ElevenLabs, Cartesia and Deepgram

TTS latency benchmark 2026: Gradium TTS leads at 155 ms P50 TTFA with a 2 ms IQR on the independent Coval benchmark. Full TTFA comparison across Gradium, ElevenLabs (Turbo v2.5, Flash v2.5, Multilingual v2), Cartesia Sonic-3, Deepgram Aura-2, Rime (Mist-v3, Arcana) and OpenAI TTS-1-HD. Methodology, P25/P50/P75/P95, IQR consistency, and WER.

May 5, 2026 · 16 min read · Benchmark

TTS WER Benchmark 2026: Word Error Rate Compared Across Gradium, ElevenLabs, Cartesia and Deepgram

TTS WER benchmark 2026: Gradium TTS leads at 3.3% average WER on the Coval benchmark and 1.11% on the MiniMax Multilingual TTS Test Set across 5 languages (EN, FR, ES, PT, DE). Word Error Rate compared across Gradium, ElevenLabs (Flash v2.5, Turbo v2.5, Multilingual v2), Cartesia Sonic-3, Deepgram Aura-2, Rime (Mist-v3, Arcana), Qwen3 TTS, Mistral Voxtral and OpenAI TTS-1-HD.

May 5, 2026 · 15 min read · Benchmark

Cartesia Alternative: Why Developers Choose Gradium for Real-Time Voice AI

Gradium vs Cartesia comparison for real-time voice AI. Voice-agent-tuned TTS with robust pronunciation, semantic VAD in STT, accent-preserving voice cloning, and cloud-to-on-device deployment from one API.

April 20, 2026 · 12 min read · Comparison

Deepgram Alternative: Why Developers Choose Gradium for Real-Time Voice AI

Gradium vs Deepgram comparison for real-time voice AI. Voice cloning (not available on Deepgram), semantic VAD, voice-agent-tuned TTS with published TTFA benchmark, and cloud-to-on-device deployment from one API.

April 20, 2026 · 14 min read · Comparison

ElevenLabs Alternative: Why Developers Choose Gradium for Real-Time Voice AI

Gradium vs ElevenLabs comparison for real-time voice AI. Voice-agent-tuned TTS with published TTFA benchmark, semantic VAD, accent-preserving voice cloning with highest Elo scores, and cloud-to-on-device deployment.

April 20, 2026 · 15 min read · Comparison

How to Build a Voice AI Agent with Gradium and LiveKit (Python Guide)

Learn how to build a full voice AI agent using Gradium STT and TTS with the LiveKit agent framework. Step-by-step Python guide covering AgentSession setup, VAD, interruptions, preemptive generation, tools, and deployment.

April 15, 2026 · 6 min read · Tutorial

How to Build an Audiobook Agent with Gradium and Pipecat: Step-by-Step Guide

Learn how to build a real-time story narrator with Gradium TTS and Pipecat. This step-by-step guide covers installation, pipeline setup, voice configuration, and deployment in about 100 lines of Python.

April 13, 2026 · 5 min read · Tutorial

How to Multiplex TTS Requests Over One WebSocket Connection in Gradium

Learn how to reuse a single WebSocket connection for multiple concurrent TTS requests in Gradium using multiplexing. Covers close_ws_on_eos, client_request_id, and how to route interleaved audio chunks correctly.

April 10, 2026 · 4 min read · Tutorial

What Is the Best Text-to-Speech API in 2026 to Build Voice Agents? Complete Developer Comparison

Best text-to-speech API 2026: Gradium achieves 258 ms P50 TTFA (214 ms with multiplexing) with expressive multilingual voices and robust pronunciation. Complete real-time TTS comparison for developers building voice agents.

April 9, 2026 · 25 min read · Comparison

How to Use json_config in Gradium: TTS and STT Parameters Explained

Learn how to use the json_config field in Gradium to control rewrite_rules, padding_bonus, temp, and cfg_coef for TTS, and language and delay_in_frames for STT. Full parameter reference with code examples.

April 9, 2026 · 5 min read · Tutorial

Instant vs Pro Voice Cloning in Gradium: When to Use Each

Not sure whether to use Instant or Pro Voice Cloning in Gradium? Learn the key differences, what each is designed for, how to prepare your audio for Pro cloning, and how to choose based on your use case.

April 9, 2026 · 4 min read · Tutorial

How to Use Pronunciation Dictionaries in Gradium TTS: Studio and API Guide

Learn how to use Pronunciation Dictionaries in Gradium to control how words are spoken and filter unwanted content. Step-by-step guide for Gradium Studio and the Python SDK.

April 9, 2026 · 4 min read · Tutorial

How to Handle TTS Edge Cases with Text Normalization in Gradium

Learn how to use Gradium's Text Normalization feature to handle edge cases in TTS. Configure rewrite_rules with language aliases or specific normalizers for dates, numbers, emails, URLs, phone numbers, and alphanumeric codes.

April 9, 2026 · 4 min read · Tutorial