The Quick Numbers
Deepgram Aura-2 TTS costs $0.030 per 1,000 characters on pay-as-you-go, dropping to $0.027/1K on the Growth plan. That's $30/1M characters: half the price of ElevenLabs Flash ($60/1M) and somewhat below Cartesia's effective rate (~$37/1M). The Voice Agent API bundles STT + TTS + orchestration at $4.50/hour. New accounts get $200 in free credits, enough for ~6.67 million TTS characters or ~220 hours of audio.
Deepgram is primarily a speech-to-text company — their Nova-3 STT model is one of the best in the market. TTS (Aura-2) is the newer product, launched to serve the growing voice agent market where teams need STT and TTS from a single provider. If you're already using Deepgram for transcription, adding TTS is a natural extension. If you only need TTS, the quality gap with ElevenLabs and Fish Audio matters.
Deepgram Plans: Pay-As-You-Go vs Growth vs Enterprise
| Feature | Pay-As-You-Go | Growth | Enterprise |
|---|---|---|---|
| TTS Rate | $0.030/1K chars | $0.027/1K chars | Custom |
| STT Rate (Nova-3) | $0.0043/min | $0.0036/min | Custom |
| Commitment | None | $4,000+ prepay | Annual contract |
| Discount | — | ~10% off PAYG | Negotiated |
| Free Credits | $200 | $200 | Trial period |
| Self-Hosting | No | No | Available |
| Support | Community | Priority | Dedicated + SLA |
The Growth plan requires a $4,000+ prepaid commitment but saves roughly 10% on all API calls. For most startups and mid-market teams, the pay-as-you-go plan is the right starting point — the $200 free credit is generous enough to evaluate thoroughly before committing.
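To judge whether the prepayment makes sense for your volume, it helps to estimate how long $4,000 would last. The sketch below does that under one loud assumption that does not come from Deepgram: the entire balance is spent on TTS at the Growth rate. The function and constant names are illustrative, not part of any Deepgram SDK.

```python
# Rates from the plan table above (dollars per 1,000 characters).
PAYG_RATE_PER_1K = 0.030
GROWTH_RATE_PER_1K = 0.027
GROWTH_MIN_PREPAY = 4_000.0  # the table lists "$4,000+"; the minimum is used here


def growth_prepay_runway(monthly_chars: int, prepay: float = GROWTH_MIN_PREPAY) -> dict:
    """Estimate how long a Growth prepayment lasts if spent on TTS alone.

    Assumption (ours, not Deepgram's): the whole balance goes to TTS.
    """
    monthly_growth = monthly_chars / 1000 * GROWTH_RATE_PER_1K
    monthly_payg = monthly_chars / 1000 * PAYG_RATE_PER_1K
    return {
        "months_of_runway": prepay / monthly_growth,
        "monthly_savings_vs_payg": monthly_payg - monthly_growth,
    }


# 10M characters/month is the call-center volume used later in this guide.
result = growth_prepay_runway(10_000_000)
print(f"Runway: {result['months_of_runway']:.1f} months")
print(f"Savings vs PAYG: ${result['monthly_savings_vs_payg']:.2f}/month")
```

At 10M characters/month the minimum prepayment would take roughly 15 months to use up while saving about $30/month, which is why pay-as-you-go tends to be the safer start at that scale. If you also run STT through the same balance, the runway shortens accordingly.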
Aura-2 TTS: What You Get
Aura-2 launched in late 2025 as an enterprise-grade TTS model built on Deepgram's speech infrastructure. The pitch: low latency, consistent quality at scale, and tight integration with Deepgram's industry-leading STT. It's designed for voice agents and real-time applications, not content creation.
- 40+ English voices with localized accents (US, UK, Australian, Indian) across multiple styles and demographics
- 10+ Spanish voices with regional accents
- 7 languages total: English, Spanish, Dutch, French, German, Italian, Japanese
- Sub-200ms baseline latency, with optimized performance reaching 90ms TTFB — fast enough for natural conversational flow
- Domain-specific pronunciation for healthcare, finance, legal, and technical terminology
- Uniform pricing — all 40+ voices at a single rate, no tiered voice pricing
The honest limitation: Aura-2's voice quality doesn't compete with ElevenLabs v3 (#4 on Speech Arena) or Fish Audio S2 Pro (#1 in blind tests). It's adequate for automated phone systems, chatbot responses, and voice agents — the kind of short, functional utterances where latency and reliability matter more than emotional nuance. For audiobooks, YouTube voiceovers, or any content where listeners actively judge voice quality, you'll want a different provider.
Voice Agent API: The $4.50/Hour Bundle
Deepgram's most distinctive offering is the Voice Agent API — a bundled STT + TTS + orchestration pipeline at $4.50/hour. This eliminates the complexity of stitching together separate STT and TTS providers and removes LLM pass-through fees from the equation.
The bundle matters because voice agent costs normally stack up fast. A typical three-provider setup (separate STT, LLM, TTS) can run $8–$15/hour at production scale. Deepgram collapses the STT and TTS portion into a single $4.50/hr rate, and since both run on the same streaming infrastructure, you get fewer handoffs, lower latency, and consistent pronunciation between what the agent hears and what it says.
Voice Agent Cost Example
A customer service bot handling 500 calls/day at an average of 4 minutes/call runs 2,000 minutes, or ~33 hours, per day. At $4.50/hr, that's about $150/day or $4,500/month for the STT + TTS layer. For comparison, building the same stack with ElevenLabs TTS alone (Flash API) would cost roughly $180/day at $60/1M characters (a figure that implies about 3 million synthesized characters per day), and that's just the TTS portion, without STT. See our Inworld pricing guide for another voice agent cost analysis.
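The arithmetic behind that estimate can be sketched in a few lines so you can plug in your own call volume. Two things here are assumptions rather than Deepgram facts: that usage is billed in proportion to connected minutes, and that a month is 30 days (which is what the $4,500 figure implies). The function name is hypothetical.

```python
VOICE_AGENT_RATE_PER_HOUR = 4.50  # bundled STT + TTS + orchestration rate quoted above
DAYS_PER_MONTH = 30               # assumption: the $4,500/month figure implies a 30-day month


def voice_agent_cost(calls_per_day: int, minutes_per_call: float) -> dict:
    """Convert call volume into connected hours and Voice Agent API cost.

    Assumes billing scales linearly with connected minutes.
    """
    minutes_per_day = calls_per_day * minutes_per_call
    daily_cost = minutes_per_day * VOICE_AGENT_RATE_PER_HOUR / 60
    return {
        "hours_per_day": minutes_per_day / 60,
        "daily_cost": daily_cost,
        "monthly_cost": daily_cost * DAYS_PER_MONTH,
    }


estimate = voice_agent_cost(calls_per_day=500, minutes_per_call=4)
print(f"{estimate['hours_per_day']:.1f} hours/day")
print(f"${estimate['daily_cost']:.2f}/day")
print(f"${estimate['monthly_cost']:,.2f}/month")
```

Changing `minutes_per_call` is usually the most revealing experiment: average handle time moves the bill far more than small differences in per-hour rates.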
$200 Free Credit: What It Actually Buys You
Deepgram's $200 free credit is one of the most generous in the TTS space. No credit card required, no expiry date. At the PAYG TTS rate of $0.030/1K characters:
- ~6.67 million TTS characters — roughly 1.1 million words or ~220 hours of generated audio
- ~46,500 minutes of STT at Nova-3 rates ($0.0043/min) — if you use credits for transcription instead
- ~44 hours of Voice Agent API at $4.50/hr — enough for extensive prototype testing
For comparison, ElevenLabs' free tier gives 10,000 credits/month (~10 minutes of audio). Amazon Polly's free tier gives 5 million characters but expires after 12 months. Cartesia offers 20,000 free characters. Deepgram's $200 credit dwarfs all of these for TTS evaluation.
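If you want to check how far the credit goes for a different balance or a mix of products, the three figures above reduce to simple division. This is a rough sketch using the PAYG rates quoted in this guide; it assumes the credit is spent entirely on one product, and `credit_buys` is an illustrative name rather than anything from Deepgram's tooling.

```python
FREE_CREDIT = 200.0               # dollars
TTS_RATE_PER_1K_CHARS = 0.030     # Aura-2, pay-as-you-go
STT_RATE_PER_MINUTE = 0.0043      # Nova-3, pay-as-you-go
VOICE_AGENT_RATE_PER_HOUR = 4.50  # Voice Agent API bundle


def credit_buys(credit: float = FREE_CREDIT) -> dict:
    """What a credit balance buys if spent entirely on a single product."""
    return {
        "tts_chars": credit / TTS_RATE_PER_1K_CHARS * 1000,
        "stt_minutes": credit / STT_RATE_PER_MINUTE,
        "voice_agent_hours": credit / VOICE_AGENT_RATE_PER_HOUR,
    }


budget = credit_buys()
print(f"TTS characters:    {budget['tts_chars']:,.0f}")
print(f"STT minutes:       {budget['stt_minutes']:,.0f}")
print(f"Voice Agent hours: {budget['voice_agent_hours']:.1f}")
```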
Deepgram vs 10 TTS Competitors
| Service | Cost/1M Chars | Voices | Languages | Best For |
|---|---|---|---|---|
| Deepgram Aura-2 | $30 | 40+ | 7 | Voice agents (unified STT+TTS) |
| ElevenLabs Flash | $60 | 4,000+ | 70+ | Content creation, voice cloning |
| Cartesia Sonic 3 | ~$37 | ~130 | 42 | Speed-critical agents (40ms) |
| OpenAI tts-1 | $15 | 9 | 57+ | Budget production |
| Fish Audio S2 | $15 | Community | 80+ | Quality/price (#1 blind tests) |
| Inworld TTS-2 | $25–$35 | Custom | 100+ | Conversational agents (#1 Arena) |
| Amazon Polly Neural | $16 | 60+ | 30+ | AWS-native applications |
| Gemini Flash | ~$12 | 30 | 70+ | Budget + quality (#2 Arena) |
| Grok TTS | $4.20 | 5 | 20+ | Ultra-budget (beta pricing) |
| Chatterbox | $0 | 20 | 1 (EN) | Free open-source |
| NaturalReader | $16.50–$49/mo | AI voices | 20+ | Reading app (personal use) |
Deepgram sits in the middle of the pack on pure TTS pricing. It's cheaper than ElevenLabs and Cartesia, more expensive than OpenAI, Fish Audio, and Grok. Its differentiator isn't price — it's the unified STT + TTS stack and the Voice Agent API bundle. For the complete pricing picture across all these services, use our TTS pricing comparison tool.
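To see where Deepgram lands at your own volume, you can rank the per-character providers from the table. The sketch below copies the table's rates as-is, so the "~" figures stay approximate. It leaves out Inworld (quoted as a range), Chatterbox (self-hosted, so the $0 ignores compute) and NaturalReader (a monthly subscription), since none of those reduce to a single per-million rate.

```python
# Dollars per 1M characters, copied from the comparison table above.
# Cartesia and Gemini are approximate ("~") in the source.
RATE_PER_1M = {
    "Deepgram Aura-2": 30.0,
    "ElevenLabs Flash": 60.0,
    "Cartesia Sonic 3": 37.0,
    "OpenAI tts-1": 15.0,
    "Fish Audio S2": 15.0,
    "Amazon Polly Neural": 16.0,
    "Gemini Flash": 12.0,
    "Grok TTS": 4.20,
}


def rank_by_cost(monthly_chars: int) -> list:
    """Return (provider, monthly cost in dollars) pairs, cheapest first."""
    costs = {
        name: monthly_chars / 1_000_000 * rate
        for name, rate in RATE_PER_1M.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


ranking = rank_by_cost(10_000_000)
for name, cost in ranking:
    print(f"{name:<20} ${cost:>8,.2f}")
```

Because every rate here is linear in characters, the ordering does not change with volume; only the dollar gap does. The ranking also says nothing about voice quality, latency, or language coverage, which is where the real trade-offs sit.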
Real-World TTS Cost Examples
| Use Case | Volume | PAYG Cost | Growth Cost |
|---|---|---|---|
| Small chatbot | 100K chars/mo | $3/mo | $2.70/mo |
| Voice agent | 1M chars/mo | $30/mo | $27/mo |
| Call center (500 calls/day) | ~10M chars/mo | $300/mo | $270/mo |
| Enterprise scale | 100M chars/mo | $3,000/mo | $2,700/mo |
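The table rows follow directly from the two per-1K rates, so you can reproduce them for any volume. A minimal sketch, assuming billing is strictly linear in characters with no minimums or rounding (the helper name is illustrative):

```python
PAYG_RATE_PER_1K = 0.030
GROWTH_RATE_PER_1K = 0.027


def monthly_tts_cost(chars_per_month: int) -> tuple:
    """Return (PAYG cost, Growth cost) in dollars for a monthly character volume."""
    thousands = chars_per_month / 1000
    return thousands * PAYG_RATE_PER_1K, thousands * GROWTH_RATE_PER_1K


# Volumes from the table above; swap in your own.
scenarios = [
    ("Small chatbot", 100_000),
    ("Voice agent", 1_000_000),
    ("Call center", 10_000_000),
    ("Enterprise scale", 100_000_000),
]
for label, volume in scenarios:
    payg, growth = monthly_tts_cost(volume)
    print(f"{label:<18} ${payg:>9,.2f}  ${growth:>9,.2f}")
```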
The Voice Agent API at $4.50/hr is often the better deal for voice agent use cases. A 500-call/day operation at 4 min/call averages ~33 hrs/day or roughly $4,500/month — but you get STT and orchestration included. The per-character TTS pricing is better for non-voice-agent use cases like IVR prompts, notification audio, or content generation.
Hidden Costs and Gotchas
- Aura-2 pricing doubled in late 2025: The original Aura model was $0.015/1K characters. Aura-2 is $0.030/1K — a 100% price increase. Deepgram positioned this as a quality upgrade, but if you're comparing to old blog posts citing Deepgram TTS at $15/1M, those numbers are outdated.
- Growth tier requires $4,000+ prepay: Unlike ElevenLabs or OpenAI where you can upgrade to a monthly plan at any time, Deepgram's Growth tier needs a significant upfront commitment. The 10% discount is real, but the cash outlay is a barrier for smaller teams.
- 7 languages only: Deepgram Aura-2 supports English, Spanish, Dutch, French, German, Italian, and Japanese. ElevenLabs supports 70+, Fish Audio 80+. If your users speak Portuguese, Korean, or Arabic, Deepgram TTS isn't an option yet.
- No voice cloning: Deepgram Aura-2 does not offer voice cloning. You're limited to the 40+ preset voices. For brand voice needs, you'll need a different provider.
- Enterprise features are enterprise-only: Self-hosting, custom models, data residency, and DPA require the Enterprise plan with an annual contract. These features are table stakes for some industries but gated behind sales conversations.
When Deepgram Wins (and When It Doesn't)
Choose Deepgram If
- You already use Deepgram STT (unified stack)
- You're building voice agents ($4.50/hr bundle)
- Domain-specific pronunciation matters (medical, legal)
- Low latency + high reliability at scale
- You need enterprise self-hosting
Skip Deepgram If
- Voice quality is your top priority (ElevenLabs or Fish Audio)
- You need voice cloning (not available)
- You need 8+ languages (OpenAI or Fish Audio)
- You're creating content (audiobooks, videos)
- Budget is paramount (OpenAI at $15/1M, Grok at $4.20/1M)
By TextToLab Research Team · Last verified May 2026 against Deepgram's official pricing page (deepgram.com/pricing). Competitor rates verified against each provider's pricing page. Voice Agent API pricing confirmed via Deepgram documentation.