Gradium develops audio language models designed to deliver natural, expressive, ultra-low latency voice interactions at scale and capable of performing any voice task.

© 2026 Gradium. All rights reserved.

Expressive Real-Time
Text-To-Speech

Power your AI agents

Explore our API

CAPABILITIES

Full suite of voice AI models to power your agents

Text-to-Speech

Seamless real-time streaming with natural, expressive speech that masters complex pronunciations. Perfect text-audio synchronization through high-precision word-level timestamps.

Speech-to-Text

Best-in-class accuracy with controllable latency and robust performance in noisy environments. Semantic voice activity detection enables smart turn-taking, ensuring human-like responsiveness.

Voice Cloning

Instant voice cloning from just 10 seconds of audio, or leverage Pro Voice Clones for a fine-tuned model indistinguishable from the original. The highest speaker similarity on the market for truly authentic output.

Built by the pioneers of voice AI

Gradium translates years of peer-reviewed work into production-grade APIs that handle the hard problems: latency, naturalness, and scale.

INFRASTRUCTURE

Infrastructure that enables scale for real-time applications

Built for systems where latency is a requirement, not an optimization. From first integration to production scale, behavior remains predictable.

API

WebSocket APIs designed for streaming. Built for bidirectional, real-time communication.

SDKs & Integrations

Clients in Python and Rust. Integrations with all major agent frameworks, including LiveKit and Pipecat.

Security & Compliance

Private cloud options are available for on-premises deployments. Enterprise plans include zero data retention.

Reliability

Stable latency in production up to high concurrency limits. SLA for enterprise plans.

Native fluency across languages

One voice, five languages, with consistent pronunciation and prosody. Seamless mid-sentence code-switching with no added latency.

Multilingual voice

PRICING

Predictable and scalable pricing

1 character of TTS = 1 credit; 1 second of STT = 3 credits
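
As a rough illustration of the credit rule above, the following Python sketch converts text length and audio duration into credits. The function names are hypothetical, and two details are assumptions not stated on this page: that every character (including spaces and punctuation) counts toward TTS usage, and that partial seconds of STT audio are rounded up.

```python
import math

# Rates taken from the pricing rule above.
TTS_CREDITS_PER_CHAR = 1
STT_CREDITS_PER_SECOND = 3


def tts_credits(text: str) -> int:
    """Credits for synthesizing `text` (assumption: every character counts)."""
    return len(text) * TTS_CREDITS_PER_CHAR


def stt_credits(audio_seconds: float) -> int:
    """Credits for transcribing audio (assumption: partial seconds round up)."""
    return math.ceil(audio_seconds) * STT_CREDITS_PER_SECOND


if __name__ == "__main__":
    print(tts_credits("Hello, world!"))  # 13 characters
    print(stt_credits(60))               # one minute of audio
```

At the stated rates, a 13-character prompt costs 13 credits and one minute of transcription costs 180 credits.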

Monthly
Yearly (save 1 month)

Free

$0
/month

XS

$13
/month
POPULAR

S

$43
/month

M

$340
/month

L

$1,615
/month

Tailored

Custom
                                   Free    XS      S       M        L         Tailored
Credits                            45k     225k    900k    9M       45M       Unlimited
Hours of TTS                       ~1hr    ~5hrs   ~20hrs  ~200hrs  ~1000hrs  Unlimited
Hours of STT                       3hrs    13hrs   50hrs   500hrs   2500hrs   Unlimited
Studio access                      Yes     Yes     Yes     Yes      Yes       Yes
API access                         Yes     Yes     Yes     Yes      Yes       Yes
Max concurrency                    2       5       5       10       15        Custom
Commercial use                     No      Yes     Yes     Yes      Yes       Yes
Instant voice clone                5       1000    1000    1000     1000      Unlimited
Pro voice clone                    0       0       0       5        20        Unlimited
Price per additional 100k credits  No      $6.9    $5.0    $4.0     $3.8      Custom
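
To make the plan comparison concrete, here is a minimal Python sketch that estimates a monthly bill from the listed base prices, included credits, and price per additional 100k credits. The `PLANS` dictionary and the `estimated_monthly_cost` function are hypothetical names. The sketch assumes overage is billed in whole 100k-credit blocks, which this page does not specify; actual billing may be pro rata or differ in other ways.

```python
import math

# plan -> (monthly price in USD, included credits, USD per additional 100k credits)
# Values copied from the pricing table; Free has no overage option and
# Tailored is custom, so both are left out.
PLANS = {
    "XS": (13.0, 225_000, 6.9),
    "S": (43.0, 900_000, 5.0),
    "M": (340.0, 9_000_000, 4.0),
    "L": (1615.0, 45_000_000, 3.8),
}


def estimated_monthly_cost(plan: str, credits_used: int) -> float:
    """Base price plus overage (assumption: billed per whole 100k-credit block)."""
    base, included, per_100k = PLANS[plan]
    extra = max(0, credits_used - included)
    blocks = math.ceil(extra / 100_000)
    return base + blocks * per_100k


if __name__ == "__main__":
    # Compare plans for a workload of 1,000,000 credits in a month.
    for name in PLANS:
        print(name, round(estimated_monthly_cost(name, 1_000_000), 2))
```

Under these assumptions, a workload of 1,000,000 credits costs less on S (one extra block) than on XS (eight extra blocks), which is the kind of break-even check the table invites.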

Start building with our models