Gradium

Expressive Real-Time
Text-To-Speech

Power your AI agents

Explore our API

CAPABILITIES

Full suite of voice AI models to power your agents

Text-to-Speech

Seamless real-time streaming with natural, expressive speech that masters complex pronunciations. Perfect text-audio synchronization through high-precision word-level timestamps.

Speech-to-Text

Best-in-class accuracy with controllable latency and robust performance in noisy environments. Semantic voice activity detection enables smart turn-taking for human-like responsiveness.

Voice Cloning

Instant voice cloning from just 10 seconds of audio, or use Pro Voice Clones for a fine-tuned model indistinguishable from the original. Highest speaker similarity on the market for truly authentic output.

Built by the pioneers of voice AI

Gradium translates years of peer-reviewed work into production-grade APIs that handle the hard problems: latency, naturalness, and scale.

INFRASTRUCTURE

Infrastructure that enables scale for real-time applications

Built for systems where latency is a requirement, not an optimization. From first integration to production scale, behavior remains predictable.

API

WebSocket APIs designed for streaming. Built for bidirectional, real-time communication.

SDKs & Integrations

Clients in Python and Rust. Integrations with all major agent frameworks, including LiveKit and Pipecat.

Security & Compliance

Private cloud options are available for on-prem deployments. Enterprise plans include zero data retention.

Reliability

Stable latency in production up to high concurrency limits. SLA for enterprise plans.

Native fluency across languages

One voice, five languages, with consistent pronunciation and prosody across all of them. Seamless mid-sentence code-switching without added latency.

PRICING

Predictable and scalable pricing

1 character of TTS = 1 credit and 1s of STT = 3 credits
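The credit rule above can be turned into a quick usage estimate. The following is a minimal sketch, not part of any Gradium SDK: the function names are hypothetical, `len()` is assumed to match how characters are counted, and rounding fractional STT seconds up is an assumption as well.

```python
import math

# Rates taken from the pricing rule stated on this page.
TTS_CREDITS_PER_CHAR = 1
STT_CREDITS_PER_SECOND = 3


def tts_credits(text: str) -> int:
    """Credits for synthesizing `text` (assumes every character counts)."""
    return len(text) * TTS_CREDITS_PER_CHAR


def stt_credits(audio_seconds: float) -> int:
    """Credits for transcribing audio (assumes partial credits round up)."""
    return math.ceil(audio_seconds * STT_CREDITS_PER_SECOND)


def total_credits(text: str, audio_seconds: float) -> int:
    """Combined credits for one TTS request plus one STT request."""
    return tts_credits(text) + stt_credits(audio_seconds)


if __name__ == "__main__":
    # One agent turn: a 60-second user utterance and a short spoken reply.
    print(total_credits("Hello, world!", 60))
```

The hour figures in the plan table are the page's own estimates and are not derived by this sketch.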

Billed monthly or yearly; yearly billing saves 1 month.

Plan	Free	XS	S (Popular)	M	L	Tailored
Price per month	$0	$13	$43	$340	$1,615	Custom
Credits	45k	225k	900k	9M	45M	Unlimited
Hours of TTS	~1hr	~5hrs	~20hrs	~200hrs	~1000hrs	Unlimited
Hours of STT	3hrs	13hrs	50hrs	500hrs	2500hrs	Unlimited
Studio access	Yes	Yes	Yes	Yes	Yes	Yes
API access	Yes	Yes	Yes	Yes	Yes	Yes
Max concurrency	2	5	5	10	15	Custom
Commercial use	No	Yes	Yes	Yes	Yes	Yes
Instant voice clone	5	1000	1000	1000	1000	Unlimited
Pro voice clone	0	0	0	5	20	Unlimited
Price per additional 100k credits	No	$6.9	$5.0	$4.0	$3.8	Custom
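To compare plans for a given monthly volume, the listed prices and overage rates can be combined as below. This is an illustrative sketch only: the `Plan` type and function names are hypothetical, the plan figures are copied from this page, and overage is assumed to be billed pro rata rather than in whole 100k blocks, which the page does not specify. The Tailored plan is left out because its pricing is custom.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_usd: float
    included_credits: int
    overage_usd_per_100k: Optional[float]  # None: no pay-as-you-go


# Figures as listed on this page (monthly billing).
PLANS = [
    Plan("Free", 0.0, 45_000, None),
    Plan("XS", 13.0, 225_000, 6.9),
    Plan("S", 43.0, 900_000, 5.0),
    Plan("M", 340.0, 9_000_000, 4.0),
    Plan("L", 1615.0, 45_000_000, 3.8),
]


def monthly_cost(plan: Plan, credits_used: int) -> Optional[float]:
    """Estimated monthly cost, or None if the plan cannot cover the usage."""
    extra = max(0, credits_used - plan.included_credits)
    if extra == 0:
        return plan.monthly_usd
    if plan.overage_usd_per_100k is None:
        return None
    # Assumption: extra credits are billed pro rata per 100k.
    return plan.monthly_usd + extra / 100_000 * plan.overage_usd_per_100k


def cheapest_plan(credits_used: int) -> Tuple[str, float]:
    """Name and estimated cost of the cheapest plan that covers the usage."""
    feasible = [
        (cost, plan.name)
        for plan in PLANS
        if (cost := monthly_cost(plan, credits_used)) is not None
    ]
    cost, name = min(feasible)
    return name, cost


if __name__ == "__main__":
    print(cheapest_plan(1_000_000))
```

Under these assumptions, a workload slightly above a plan's included credits can still be cheaper on that plan with overage than on the next tier up.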

Free

$0/month

Credits	45k
Hours of TTS	~1hr
Hours of STT	3hrs
Studio Access	Yes
API Access	Yes
Max concurrency	2
Commercial use	No
Instant voice clone	5
Pay as you go	No

XS

$13/month

Credits	225k
Hours of TTS	~5hrs
Hours of STT	13hrs
Studio Access	Yes
API Access	Yes
Max concurrency	5
Commercial use	Yes
Instant voice clone	1000
Pay as you go	$6.9

POPULAR

S

$43/month

Credits	900k
Hours of TTS	~20hrs
Hours of STT	50hrs
Studio Access	Yes
API Access	Yes
Max concurrency	5
Commercial use	Yes
Instant voice clone	1000
Pay as you go	$5.0

M

$340/month

Credits	9M
Hours of TTS	~200hrs
Hours of STT	500hrs
Studio Access	Yes
API Access	Yes
Max concurrency	10
Commercial use	Yes
Instant voice clone	1000
Pay as you go	$4.0

L

$1,615/month

Credits	45M
Hours of TTS	~1000hrs
Hours of STT	2500hrs
Studio Access	Yes
API Access	Yes
Max concurrency	15
Commercial use	Yes
Instant voice clone	1000
Pay as you go	$3.8

Tailored

Custom

Credits	Unlimited
Hours of TTS	Unlimited
Hours of STT	Unlimited
Studio Access	Yes
API Access	Yes
Max concurrency	Custom
Commercial use	Yes
Instant voice clone	Unlimited
Pay as you go	Custom

Start building with our models
