Cartesia Pricing at a Glance (May 2026)
Cartesia charges 1 credit per character for TTS, with plans ranging from free (20,000 credits) to Scale ($299/month for 8 million credits). On paid plans the effective cost works out to roughly $37–$50 per million characters, depending on your tier. That is cheaper than ElevenLabs but more expensive than Gemini Flash or Grok. What you're paying for is Sonic 3's 40ms time-to-first-audio, the fastest commercial TTS available.
| Plan | Price/Month | Credits | ≈ Cost/1M Chars | Concurrency |
|---|---|---|---|---|
| Free | $0 | 20,000 | $0 (no commercial use) | 1 agent |
| Pro | $5 | 100,000 | ~$50 | 3 agents |
| Startup | $49 | 1,250,000 | ~$39 | 5 agents |
| Scale | $299 | 8,000,000 | ~$37 | 10 agents |
| Enterprise | Custom | Custom | Negotiable | Custom |
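As a sanity check, the ≈ Cost/1M Chars column is each plan's monthly price divided by its credits, scaled to a million characters, assuming you use every credit. Here is a minimal Python sketch using the list prices from the table. The `PLANS` dict and `cost_per_million_chars` helper are our own illustration, not part of any Cartesia SDK.

```python
# Monthly list price (USD) and credits per plan, as quoted in the table above.
# Free (no commercial rights) and Enterprise (custom pricing) are omitted.
PLANS = {
    "Pro": (5, 100_000),
    "Startup": (49, 1_250_000),
    "Scale": (299, 8_000_000),
}

def cost_per_million_chars(price_usd: float, credits: int,
                           credits_per_char: float = 1.0) -> float:
    """Effective USD per 1M characters if every credit in the plan is used."""
    chars = credits / credits_per_char  # characters the allowance covers
    return price_usd / chars * 1_000_000

for name, (price, credits) in PLANS.items():
    print(f"{name}: ${cost_per_million_chars(price, credits):.2f} per 1M chars")
```

Passing `credits_per_char=1.5` gives the Pro Voice Cloning rate. Because credits don't roll over, using only half a plan effectively doubles your real per-character cost.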
The Credit Math
1 credit = 1 character of TTS output. A 1,000-word blog post is roughly 5,000 characters, or 5,000 credits, so the Pro plan's 100,000 credits covers about 20 blog posts per month. Voices made with Pro Voice Cloning (available from the Startup plan up) bill at 1.5 credits per character, which shrinks your effective character budget by a third.
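The credit math above, as a small Python sketch. The 5-characters-per-word ratio is the article's rough rule of thumb for English, not a Cartesia constant, and the helper names are hypothetical.

```python
# Rule-of-thumb assumption: ~5 characters per English word, spaces included.
CHARS_PER_WORD = 5
STANDARD_RATE = 1.0   # credits per character, standard voices
PRO_CLONE_RATE = 1.5  # credits per character, Pro Voice Cloning voices

def credits_needed(words: int, credits_per_char: float = STANDARD_RATE) -> float:
    """Estimate the credits consumed to narrate a text of `words` words."""
    return words * CHARS_PER_WORD * credits_per_char

def posts_per_month(plan_credits: int, words_per_post: int = 1_000,
                    credits_per_char: float = STANDARD_RATE) -> int:
    """How many whole posts of a given length fit in a monthly allowance."""
    return int(plan_credits // credits_needed(words_per_post, credits_per_char))

print(posts_per_month(100_000))    # Pro, standard voice: 20
print(posts_per_month(1_250_000))  # Startup, standard voice: 250
print(posts_per_month(1_250_000, credits_per_char=PRO_CLONE_RATE))  # 166
```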
What Each Plan Actually Includes
Free Tier — Good Enough to Test
20,000 credits gets you about 4 blog posts' worth of audio. You get access to all 42 languages, full emotion controls, and Sonic 3's signature 40ms latency, with no credit card required. The limits: 1 concurrent agent, no commercial use rights, and no voice cloning. But for evaluating whether Cartesia's speed is worth the premium, it's plenty.
Pro ($5/month) — Instant Voice Cloning Unlocked
The Pro plan's main draw is Instant Voice Cloning from a 3-second audio sample. Upload a short clip and Cartesia replicates the voice characteristics across all 42 languages. At $5/month with 100,000 credits, this is the cheapest voice cloning on the market. ElevenLabs also charges $5/month but gives you only 30,000 characters, so you get 3.3x more on Cartesia.
Concurrency jumps to 3 agents, enough for a single voice agent application handling up to three simultaneous callers.
Startup ($49/month) — Pro Voice Cloning + Scale
At 1.25 million credits, the Startup plan is where Cartesia's per-character rate starts to approach pay-per-use APIs, though it stays well above them. The rate drops to about $0.039 per 1K characters ($39 per 1M). You also unlock Pro Voice Cloning, a higher-fidelity cloning process that produces more accurate voice replicas. Concurrency goes to 5 agents.
This is the sweet spot for voice agent startups running up to 5 concurrent conversations. The 40ms latency is Cartesia's genuine competitive moat here: your voice agent responds before the user notices a gap.
Scale ($299/month) — Enterprise Volume
8 million credits at $299/month works out to ~$37/1M characters — still more expensive than OpenAI tts-1 at $15/1M or Polly Neural at $16/1M. You're paying a latency premium. 10 concurrent agents, priority support, and elevated rate limits.
Hidden Costs and Gotchas
Cartesia's pricing looks straightforward on the surface, but a few things catch developers off guard:
- Pro Voice Cloning costs 1.5x: the standard rate is 1 credit per character, but Pro Voice Cloning uses 1.5 credits per character. A 1M-character job that costs $37 on a standard voice costs $55.50 with Pro cloning. There's also a one-time training fee (amount not publicly listed).
- No rollover: unused credits expire at the end of your billing cycle. If you buy the Startup plan and use only 500K of your 1.25M credits, the remaining 750K are gone. Plan for your actual usage, not your optimistic projection.
- API-only: there's no web studio, no drag-and-drop editor, no consumer app. You need to write code to use Cartesia. For non-developers, Murf AI or Canva are better options.
- Phone connection fees: if you're using Cartesia Line (their voice agent product), phone connections cost $0.014 per minute on top of TTS credits. At 10,000 calls/day averaging 3 minutes, that's 30,000 minutes a day, which comes to $420 per day or roughly $12,600 per 30-day month in connection fees alone.
- Concurrency limits are real: the Free plan limits you to 1 concurrent agent. If two users trigger TTS simultaneously, one request queues. This matters for production voice agents; you'll likely need the Startup plan ($49/mo, 5 agents) at minimum.
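To budget for these gotchas together, here is a rough Python estimator. It assumes a 30-day month, uses only the figures quoted in this article, and leaves out the unlisted one-time Pro Voice Cloning training fee. `Plan` and `monthly_bill` are hypothetical helpers, not a Cartesia API. For the phone-fee example, note that 10,000 calls/day at 3 minutes each is 30,000 connection minutes a day.

```python
from dataclasses import dataclass

# Figures quoted in this article (May 2026), not fetched from any Cartesia API.
PHONE_FEE_PER_MIN = 0.014  # Cartesia Line phone connection fee, USD per minute
PRO_CLONE_RATE = 1.5       # credits per character with Pro Voice Cloning

@dataclass
class Plan:
    name: str
    price_usd: float
    credits: int
    concurrency: int

def monthly_bill(plan: Plan, chars: int, pro_clone: bool = False,
                 call_minutes: float = 0.0, peak_calls: int = 1) -> dict:
    """Rough monthly picture: subscription, phone fees, expiring credits, queueing."""
    credits_used = chars * (PRO_CLONE_RATE if pro_clone else 1.0)
    if credits_used > plan.credits:
        raise ValueError(f"{plan.name} allowance exceeded: {credits_used:,.0f} credits")
    return {
        "subscription_usd": plan.price_usd,
        "phone_fees_usd": round(call_minutes * PHONE_FEE_PER_MIN, 2),
        # No rollover: whatever is left at the end of the cycle is lost.
        "expired_credits": plan.credits - credits_used,
        # Requests beyond the plan's concurrency limit wait in a queue.
        "requests_queue": peak_calls > plan.concurrency,
    }

startup = Plan("Startup", 49, 1_250_000, 5)
print(monthly_bill(startup, 500_000))  # 750K credits expire unused
# 10,000 calls/day x 3 minutes x 30 days of connection time, 8 callers at peak:
print(monthly_bill(startup, 500_000, call_minutes=10_000 * 3 * 30, peak_calls=8))
```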
Real-World Cost Examples
Cartesia serves two distinct audiences: voice agent builders and content creators. Here's what it actually costs for each:
| Use Case | Monthly Volume | Cartesia Plan | Monthly Cost | OpenAI tts-1 Cost |
|---|---|---|---|---|
| Blog narration (5 posts) | ~25,000 chars | Pro ($5) | $5 | $0.38 |
| Voice agent (500 calls/day) | ~750,000 chars | Startup ($49) | $49 | $11.25 |
| E-learning platform | ~2,000,000 chars | Scale ($299) | $299 | $30.00 |
| Customer support bot (2K calls/day) | ~3,000,000 chars | Scale ($299) | $299 | $45.00 |
| Full audiobook (80K words) | ~400,000 chars | Startup ($49) | $49 | $6.00 |
The pattern is clear: in these examples, Cartesia's monthly bill runs roughly 4-13x higher than OpenAI tts-1's. Part of that is plan granularity, since fixed tiers make you pay for credits you don't use. On a pure per-character basis, $37-$50 per million against $15 is about 2.5-3.3x. But price misses the point. You don't choose Cartesia for price; you choose it for the 40ms latency that makes voice agents feel instantaneous. If latency doesn't matter for your use case (pre-rendered content, podcasts, audiobooks), use OpenAI or Amazon Polly instead.
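The comparison table's plan choices and OpenAI column can be reproduced with a short Python sketch. For each workload it picks the cheapest paid plan whose credits cover the monthly volume, then prices the same volume at OpenAI tts-1's $15 per million characters. It assumes standard voices (1 credit per character) and ignores Enterprise. The helper names are our own.

```python
# (name, monthly price USD, monthly credits), cheapest first; figures from this article.
PLANS = [("Pro", 5, 100_000), ("Startup", 49, 1_250_000), ("Scale", 299, 8_000_000)]
OPENAI_TTS1_PER_M = 15.0  # USD per 1M characters, as quoted in this article

def cheapest_plan(monthly_chars: int, credits_per_char: float = 1.0) -> tuple[str, int]:
    """Smallest paid plan whose monthly credits cover the volume (no rollover, no overage)."""
    needed = monthly_chars * credits_per_char
    for name, price, credits in PLANS:
        if needed <= credits:
            return name, price
    raise ValueError("Volume exceeds Scale; Enterprise pricing is custom")

def compare(monthly_chars: int) -> tuple[str, int, float, float]:
    """(plan, Cartesia USD, OpenAI tts-1 USD, Cartesia/OpenAI ratio)."""
    name, price = cheapest_plan(monthly_chars)
    openai = monthly_chars / 1_000_000 * OPENAI_TTS1_PER_M
    return name, price, round(openai, 2), round(price / openai, 1)

for label, chars in [("Blog narration", 25_000), ("Voice agent", 750_000),
                     ("E-learning", 2_000_000), ("Support bot", 3_000_000),
                     ("Audiobook", 400_000)]:
    print(label, compare(chars))
```

Across the five rows the ratio ranges from about 4.4x (voice agent) to about 13x (blog narration). Small workloads look worst because they still pay a full plan price.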
When Cartesia Actually Wins on Cost
Cartesia isn't the cheapest TTS, but in two scenarios it can save you money:
- Voice agents where latency = revenue. If your AI phone agent takes 300-500ms to start speaking (ElevenLabs Turbo sits around 300ms), callers hang up or talk over it. Cartesia's 40ms response means fewer abandoned calls. A 5% improvement in call completion on a 2,000-call/day operation easily justifies the $254/month difference between Cartesia Scale ($299) and a $45/month OpenAI bill.
- Voice cloning at entry level. Cartesia Pro at $5/month gives you 100K characters with Instant Voice Cloning. ElevenLabs Starter at $5/month gives 30K characters with cloning. If voice cloning is your primary need and you don't need ElevenLabs' studio UI, Cartesia is 3.3x better value.
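To make the latency argument concrete, you can ask what each extra completed call has to be worth to cover the price gap. Here is a Python sketch. It assumes the 5% improvement means five percentage points of completion rate and a 30-day month (both our assumptions), and `breakeven_value_per_call` is a hypothetical helper.

```python
def breakeven_value_per_call(cartesia_monthly: float, alternative_monthly: float,
                             calls_per_day: int, completion_lift: float,
                             days: int = 30) -> float:
    """USD each additional completed call must be worth to cover the price gap.

    completion_lift is an absolute increase in completion rate (0.05 = 5 points).
    It is an input you estimate; nothing here measures it.
    """
    extra_calls = calls_per_day * completion_lift * days
    return (cartesia_monthly - alternative_monthly) / extra_calls

# Cartesia Scale ($299) vs a $45/month OpenAI tts-1 bill, 2,000 calls/day, 5-point lift.
print(round(breakeven_value_per_call(299, 45, 2_000, 0.05), 3))
```

With these inputs, the $254/month difference is spread over about 3,000 extra completed calls, or roughly 8.5 cents per call. If a completed call is worth more than that to your business, the premium pays for itself.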
For a full breakdown of when Cartesia beats ElevenLabs (and vice versa) across speed, quality, features, and pricing, see our Cartesia vs ElevenLabs comparison.
Cartesia vs 10 Competitors: Price per Million Characters
| Service | Cost/1M Chars | Latency (TTFA) | Arena Rank | Voice Cloning |
|---|---|---|---|---|
| Grok TTS (beta) | $4.20 | ~400ms | Not ranked | No |
| Gemini Flash TTS | ~$12 | ~250ms | #2 (ELO 1,211) | No |
| OpenAI tts-1 | $15 | ~300ms | Not ranked | No |
| Amazon Polly Neural | $16 | ~350ms | Not ranked | No |
| Inworld TTS Max | $10–$50 | ~250ms | #1 (ELO 1,236) | Yes |
| Cartesia Sonic 3 | ~$37 (Scale) | 40ms (Turbo) | #10 (ELO 1,054) | Yes (3s sample) |
| Fish Audio S2 Pro | ~$15 | ~150ms | #1 in separate blind tests (ELO 1,128) | Yes (10-30s, cross-lingual) |
| Dia TTS (Nari Labs) | ~$40 (fal.ai) | ~500ms | Not ranked | No (multi-speaker) |
| ElevenLabs Flash | $60 | ~300ms | #4 (ELO 1,179) | Yes |
| Murf AI Falcon | $10–$30 | ~130ms | Not ranked | Yes (Business) |
| Chatterbox Turbo | Free (self-host) | Varies (GPU) | Not ranked | Yes (free) |
The latency column tells the story. At 40ms, Cartesia is roughly 3-12x faster than every other service listed; the closest are Murf Falcon (130ms) and Fish Audio S2 Pro (150ms). If your application is latency-sensitive (voice agents, game NPCs, real-time accessibility), Cartesia's premium is justified. For everything else, you're overpaying.
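The speed multiples follow directly from the latency column. Here is a quick Python check using the table's approximate TTFA values. Because these are the "~" figures, treat the ratios as rough. Chatterbox is skipped because its latency depends on your hardware.

```python
# Approximate time-to-first-audio (ms) from the comparison table above.
TTFA_MS = {
    "Grok TTS": 400, "Gemini Flash TTS": 250, "OpenAI tts-1": 300,
    "Amazon Polly Neural": 350, "Inworld TTS Max": 250,
    "Fish Audio S2 Pro": 150, "Dia TTS": 500, "ElevenLabs Flash": 300,
    "Murf AI Falcon": 130,
}
CARTESIA_MS = 40  # Sonic 3 (Turbo), as listed in the table

# How many times longer each competitor takes to produce first audio.
speedups = {name: ms / CARTESIA_MS for name, ms in TTFA_MS.items()}
for name, factor in sorted(speedups.items(), key=lambda kv: kv[1]):
    print(f"{name}: {factor:.2f}x Cartesia's TTFA")
```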
Cartesia Startup Grant: Free Credits for Early-Stage Companies
Cartesia offers a startup grant program with free API credits for qualifying companies. If you're building a voice agent startup with less than $5M in funding, you can apply for the grant through their website. The exact credit amount varies, but grants typically provide enough to prototype and launch a pilot. They also have a Google for Startups partnership that bundles additional cloud credits.
Worth applying if you're pre-seed or seed stage. The worst they say is no, and the credits can cover your first few months of development.
Who Should (and Shouldn't) Pay Cartesia's Premium
Worth the premium for
- Voice agents where 40ms latency = competitive advantage
- Real-time gaming or interactive experiences
- Budget voice cloning ($5/mo for 100K chars)
- Startups that qualify for the grant program
- Multilingual apps (42 languages including 9 Indian)
Not worth it for
- Pre-rendered content (latency doesn't matter — use OpenAI or Polly)
- Top-tier voice quality (Arena #10 vs Inworld #1)
- Non-developers (API-only, no studio UI)
- Audiobooks (speed premium wasted on pre-rendered audio)
- Tight budgets (roughly 2-12x more per character than OpenAI tts-1, Polly, or Grok)
Related Pricing Guides
Use our TTS cost calculator to compare Cartesia against 11 alternatives for your specific volume. For more details on Cartesia's voice quality, architecture, and limitations, read our full Cartesia AI review.
Building a voice agent and not sure which TTS to pick? Check our best text-to-speech comparison for a full feature breakdown across all providers.