Text-to-Speech
Built for systems where latency is a requirement, not an optimization. From first integration to production scale, behavior remains predictable.
WebSocket APIs designed for streaming. Built for bidirectional, real-time communication.
Clients in Python and Rust. Integrations with all major agent frameworks, including LiveKit and Pipecat.
Private cloud options available for on-premises deployments. Enterprise plans include zero data retention.
Stable latency in production up to high concurrency limits. SLA for enterprise plans.
One voice, five languages, with consistent pronunciation and prosody across all of them. Seamless mid-sentence code-switching without added latency.

1 character of TTS = 1 credit; 1 second of STT = 3 credits.
| Credits | Hours of TTS | Hours of STT | Studio Access | API Access | Max concurrency | Commercial use | Instant voice clone | Pay as you go |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 45k | ~1 hr | 3 hrs | Yes | Yes | 2 | No | 5 | No |
| 225k | ~5 hrs | 13 hrs | Yes | Yes | 5 | Yes | 1000 | $6.9 |
| 900k | ~20 hrs | 50 hrs | Yes | Yes | 5 | Yes | 1000 | $5.0 |
| 9M | ~200 hrs | 500 hrs | Yes | Yes | 10 | Yes | 1000 | $4.0 |
| 45M | ~1000 hrs | 2500 hrs | Yes | Yes | 15 | Yes | 1000 | $3.8 |
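
To budget a workload against a plan, the credit rule stated above (1 credit per TTS character, 3 credits per second of STT) can be applied directly. The following is a minimal sketch, not part of any official client: the function names `credits_used` and `fits_allowance` are hypothetical, and the rates are simply the ones quoted on this page.

```python
# Rates as quoted on this page; adjust if your plan states otherwise.
TTS_CREDITS_PER_CHAR = 1
STT_CREDITS_PER_SECOND = 3


def credits_used(tts_chars: int, stt_seconds: int) -> int:
    """Return the credits consumed by a mix of TTS characters and STT seconds."""
    if tts_chars < 0 or stt_seconds < 0:
        raise ValueError("usage values must be non-negative")
    return tts_chars * TTS_CREDITS_PER_CHAR + stt_seconds * STT_CREDITS_PER_SECOND


def fits_allowance(tts_chars: int, stt_seconds: int, allowance: int) -> bool:
    """Check whether the workload stays within a plan's credit allowance."""
    return credits_used(tts_chars, stt_seconds) <= allowance


if __name__ == "__main__":
    # 10,000 synthesized characters plus one minute of transcription.
    print(credits_used(10_000, 60))
    # Compare against the 45k-credit tier.
    print(fits_allowance(10_000, 60, 45_000))
```

Here 10,000 characters of TTS plus 60 seconds of STT cost 10,000 + 180 = 10,180 credits, which fits within the 45k-credit tier.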