OpenAI TTS Pricing at a Glance (May 2026)
OpenAI charges a flat per-character rate with no monthly subscription required. You have three models to choose from: tts-1 at $15/1M characters for speed, tts-1-hd at $30/1M for higher fidelity, and the newer gpt-4o-mini-tts at roughly $15/1M characters with steerable voice instructions. No credits, no tiers, no rollover headaches — just pay for what you generate.
| Model | Price per 1M Chars | ≈ Cost per Minute | Voices | Best For |
|---|---|---|---|---|
| tts-1 | $15 | ~$0.015 | 9 | Low-latency apps, chatbots |
| tts-1-hd | $30 | ~$0.030 | 9 | Pre-rendered content, podcasts |
| gpt-4o-mini-tts | ~$15* | ~$0.015 | 13 | Steerable voice, emotional control |
*gpt-4o-mini-tts uses token-based pricing ($0.60/1M input tokens + $12/1M audio output tokens). The ~$15/1M character rate is an estimate based on typical text-to-token ratios. Actual cost varies with how your text tokenizes and how long the generated audio runs.
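The per-character rates in the table translate into per-project and per-minute costs with one multiplication. A minimal sketch follows; the function names are illustrative, and the 1,000 characters-per-minute speaking rate is an assumption (it is the ratio implied by $15/1M characters working out to about $0.015 per minute).

```python
# List prices in USD per 1M characters, copied from the table above.
PRICE_PER_MILLION_CHARS = {
    "tts-1": 15.00,
    "tts-1-hd": 30.00,
    "gpt-4o-mini-tts": 15.00,  # estimate only; this model is billed by tokens
}

# Assumption: about 1,000 characters of text per minute of speech.
CHARS_PER_MINUTE = 1_000


def cost_for_chars(model: str, chars: int) -> float:
    """Return the estimated USD cost of synthesizing `chars` characters."""
    return chars / 1_000_000 * PRICE_PER_MILLION_CHARS[model]


def cost_per_minute(model: str) -> float:
    """Return the estimated USD cost of one minute of generated audio."""
    return cost_for_chars(model, CHARS_PER_MINUTE)


for model in PRICE_PER_MILLION_CHARS:
    print(f"{model}: ${cost_per_minute(model):.3f}/min, "
          f"${cost_for_chars(model, 5_000):.3f} per 5,000 characters")
```

Swap in your own speaking-rate estimate if your content is read faster or slower; the per-character cost itself does not depend on it.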
tts-1 vs tts-1-hd: Is Paying 2x Worth It?
I ran the same 500-word script through both models, and the honest answer is: for most use cases, tts-1 is good enough. The HD model produces cleaner consonants and slightly richer vocal texture, but you need decent headphones to hear the difference. In a car, on laptop speakers, or in a busy office? Indistinguishable.
Where tts-1-hd earns its 2x premium:
- Audiobook narration — long-form listening where subtle artifacts compound
- Brand voice recordings — IVR greetings, product demos where polish matters
- Music/poetry — content where tonal precision affects interpretation
For chatbots, notifications, internal tools, and prototyping? Stick with tts-1 and save 50%. The latency is actually lower on tts-1, making it better for real-time applications anyway.
gpt-4o-mini-tts: The Model That Changes Everything
Released in March 2025, gpt-4o-mini-tts is OpenAI's biggest TTS upgrade since launch. The headline feature: you can tell it how to speak, not just what to say. Pass an instructions parameter like "Speak in a warm, reassuring tone with occasional pauses for emphasis" and the model adjusts delivery accordingly.
This is a genuine differentiator. With tts-1, you pick a voice and that's it — the emotional delivery is fixed. With gpt-4o-mini-tts, the same "Nova" voice can sound excited, somber, professional, or playful depending on your instructions. No other pay-per-character TTS API offers this level of control at this price point.
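For illustration, here is roughly what a steered request looks like using only the Python standard library. This is a sketch, not official sample code: it assumes the `/v1/audio/speech` endpoint with `model`, `voice`, `input`, `instructions`, and `response_format` fields, an API key in the `OPENAI_API_KEY` environment variable, and helper names (`build_speech_request`, `synthesize`) that are made up for this example.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, instructions, voice="nova", api_key=None):
    """Build (but do not send) a POST request for steerable speech."""
    payload = {
        "model": "gpt-4o-mini-tts",
        "voice": voice,
        "input": text,                 # what to say
        "instructions": instructions,  # how to say it
        "response_format": "mp3",
    }
    key = api_key or os.environ.get("OPENAI_API_KEY", "")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, instructions, out_path, voice="nova"):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, instructions, voice)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

Calling `synthesize("Your order has shipped.", "Speak in a warm, reassuring tone.", "shipped.mp3")` would generate billable audio, so try it on short text first. Changing only the `instructions` string is what changes the delivery; the voice and input stay the same.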
Token Pricing Math
gpt-4o-mini-tts bills by tokens instead of characters. Text input costs $0.60/1M tokens, and audio output costs $12/1M audio tokens. In practice, a 1,000-word blog post (about 5,000 characters) generates roughly 5 minutes of audio for about $0.075, essentially the same as tts-1 at $0.075 for the same text.
The instructions parameter counts toward your input tokens. A 50-word style prompt is roughly 65 to 70 tokens, which adds about $0.00004 per request at $0.60/1M tokens. That is negligible.
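The token math can be sketched as a small estimator. The two token prices come from this article; the conversion factors are assumptions, not published constants: about 4 characters per text token for English, about 1,000 characters per minute of speech, and about 1,250 audio tokens per minute (the figure implied by $12/1M audio tokens costing roughly $0.015 per minute).

```python
INPUT_USD_PER_MILLION_TOKENS = 0.60
AUDIO_USD_PER_MILLION_TOKENS = 12.00

# Assumptions for illustration only; real ratios vary by text and voice.
CHARS_PER_TEXT_TOKEN = 4
CHARS_PER_MINUTE = 1_000
AUDIO_TOKENS_PER_MINUTE = 1_250


def estimate_token_cost(text_chars: int, instruction_chars: int = 0) -> dict:
    """Estimate gpt-4o-mini-tts cost in USD, split into input and audio."""
    # Both the spoken text and the style instructions are billed as input.
    input_tokens = (text_chars + instruction_chars) / CHARS_PER_TEXT_TOKEN
    # Only the spoken text produces audio output tokens.
    audio_tokens = text_chars / CHARS_PER_MINUTE * AUDIO_TOKENS_PER_MINUTE
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION_TOKENS
    audio_cost = audio_tokens / 1_000_000 * AUDIO_USD_PER_MILLION_TOKENS
    return {"input": input_cost, "audio": audio_cost, "total": input_cost + audio_cost}


estimate = estimate_token_cost(5_000)
print(f"input ${estimate['input']:.5f} + audio ${estimate['audio']:.5f} "
      f"= ${estimate['total']:.5f}")
```

Under these assumptions the audio output dominates the bill, which is why the length of the instructions prompt barely registers.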
13 voices are available: Alloy, Ash, Ballad, Coral, Echo, Fable, Nova, Onyx, Sage, Shimmer, Verse, Marin, and Cedar. You can preview all of them on our OpenAI TTS voice samples page. The additional voices (Ballad, Verse, Marin, Cedar) are exclusive to gpt-4o-mini-tts and tend toward more conversational, natural delivery.
Real-World Cost Examples
Abstract per-character rates don't mean much until you map them to actual projects. Here's what OpenAI TTS costs for common use cases, compared against ElevenLabs and Amazon Polly:
| Use Case | Characters | tts-1 Cost | tts-1-hd Cost | ElevenLabs (Flash) | Polly Neural |
|---|---|---|---|---|---|
| Blog post (1,000 words) | ~5,000 | $0.08 | $0.15 | $0.30 | $0.08 |
| E-learning module (30 min) | ~45,000 | $0.68 | $1.35 | $2.70 | $0.72 |
| Podcast (1 hour) | ~90,000 | $1.35 | $2.70 | $5.40 | $1.44 |
| Full audiobook (80K words) | ~400,000 | $6.00 | $12.00 | $24.00 | $6.40 |
| SaaS app (1M chars/month) | 1,000,000 | $15.00 | $30.00 | $60.00 | $16.00 |
The takeaway: OpenAI tts-1 and Amazon Polly Neural are neck-and-neck on cost, while ElevenLabs Flash costs about 4x as much as tts-1. But voice quality and features matter too, so check our full pricing comparison for the complete picture. You can also estimate your own costs with our TTS cost calculator.
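Any row of the table above can be reproduced for your own character count. A rough sketch, using the per-1M-character list prices quoted in this article (the function name is illustrative):

```python
# USD per 1M characters, as quoted in this article.
USD_PER_MILLION_CHARS = {
    "OpenAI tts-1": 15.00,
    "OpenAI tts-1-hd": 30.00,
    "ElevenLabs Flash": 60.00,
    "Amazon Polly Neural": 16.00,
}


def compare_costs(chars: int) -> list:
    """Return (provider, cost in USD) pairs sorted from cheapest to priciest."""
    costs = {
        name: chars / 1_000_000 * rate
        for name, rate in USD_PER_MILLION_CHARS.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


# Example: the 80K-word audiobook row (about 400,000 characters).
for provider, cost in compare_costs(400_000):
    print(f"{provider:<20} ${cost:,.2f}")
```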
Is There a Free Tier?
Sort of. OpenAI doesn't have a dedicated free plan for TTS. But new API accounts get $5 in free credits that work across all OpenAI services, including TTS. At tts-1 rates, $5 buys you about 333,000 characters — roughly 5.5 hours of audio. That's generous enough to build a prototype and test all 9 voices before spending a dime.
After the credits expire, it's pure pay-as-you-go. No monthly minimums, no commitments. You add a credit card and pay for exactly what you use. For developers testing TTS, this is actually a better deal than ElevenLabs' free tier (10,000 characters/month, about 10 minutes) — OpenAI gives you 33x more characters upfront, just not recurring.
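The free-credit arithmetic above is easy to check or rerun for a different budget. A minimal sketch, again assuming roughly 1,000 characters per minute of speech (function names are illustrative):

```python
def chars_for_budget(budget_usd: float, usd_per_million_chars: float) -> int:
    """Return how many whole characters a budget buys at a per-1M rate."""
    return int(budget_usd / usd_per_million_chars * 1_000_000)


def hours_of_audio(chars: int, chars_per_minute: int = 1_000) -> float:
    """Convert a character count to hours of speech (assumed speaking rate)."""
    return chars / chars_per_minute / 60


free_chars = chars_for_budget(5.00, 15.00)           # $5 credit at tts-1 rates
print(free_chars)                                    # characters covered
print(round(hours_of_audio(free_chars), 1))          # hours of audio
print(free_chars // 10_000)                          # multiple of a 10K-char tier
```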
For ongoing free TTS, consider Chatterbox Turbo (fully free, open-source) or Gemini Flash TTS (free quota in Google AI Studio).
API Limits and Hidden Gotchas
OpenAI's TTS pricing is refreshingly simple compared to ElevenLabs' credit system, but there are a few things that catch people off guard:
- 4,096 character limit per request: that's about 4 minutes of audio at roughly 1,000 characters per minute. For longer content, you need to split text and stitch audio files. This is the #1 complaint on the OpenAI developer forums.
- gpt-4o-mini-tts has a 2,000 input token limit: approximately 1,500 English words per request. The instructions prompt counts toward this limit.
- Rate limits start at 50 RPM: new paid accounts get 50 requests per minute for tts-1/tts-1-hd. This is enough for most applications, but voice agents handling concurrent calls may hit it.
- No voice cloning: unlike ElevenLabs, Murf AI, or Cartesia, OpenAI doesn't offer voice cloning. You're limited to the built-in voices.
- AI disclosure, not SynthID: SynthID is Google DeepMind's watermarking technology, and OpenAI's TTS documentation does not describe using it. What OpenAI's usage policies do require is a clear disclosure to end users that the voice they are hearing is AI-generated.
- No SSML support: you can't use Speech Synthesis Markup Language for fine-grained pronunciation control. The gpt-4o-mini-tts instructions parameter partially replaces this, but it's less precise than Amazon Polly's SSML.
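The 4,096 character limit is the gotcha most worth automating around. One possible approach, sketched below, packs whole sentences into chunks that stay under the limit and hard-wraps only when a single sentence is itself too long. The function name and the sentence-splitting regex are illustrative choices, and stitching the resulting audio files is a separate step not shown here.

```python
import re

MAX_CHARS = 4096  # per-request character limit described above


def split_for_tts(text: str, max_chars: int = MAX_CHARS) -> list:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single oversized sentence is cut at the limit as a last resort.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate      # sentence still fits in this chunk
        else:
            chunks.append(current)   # close the chunk, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request. Splitting at sentence boundaries keeps the pauses between stitched files sounding natural, which is why the sketch avoids cutting mid-sentence wherever it can.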
Azure OpenAI TTS: Same Models, Different Pricing
Enterprise teams often use Azure OpenAI instead of the direct API. The models are identical — same voices, same quality, same character limits. But pricing runs about 2x higher on Azure (roughly $30/1M characters for tts-1 equivalent). The trade-off: Azure gives you enterprise SLAs, VNet integration, HIPAA/SOC2 compliance, and data residency controls. If your company already has an Azure Enterprise Agreement, the Azure path may actually be cheaper after discount.
For small teams and indie developers? Direct API wins on cost. For regulated industries (healthcare, finance, government)? Azure is often the only option that clears legal review.
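To put a number on "cheaper after discount": using this article's rough figures ($15/1M direct, about $30/1M on Azure), the break-even point can be worked out as below. The rates are the article's estimates, not quoted Azure prices, and the function names are illustrative.

```python
def break_even_discount(direct_rate: float, azure_rate: float) -> float:
    """Return the fractional Azure discount at which both options cost the same."""
    return 1 - direct_rate / azure_rate


def azure_is_cheaper(discount: float, direct_rate: float = 15.00,
                     azure_rate: float = 30.00) -> bool:
    """True if the discounted Azure rate undercuts the direct API rate."""
    return azure_rate * (1 - discount) < direct_rate


print(f"Break-even discount: {break_even_discount(15.00, 30.00):.0%}")
```

In other words, with a 2x list-price gap, an Enterprise Agreement discount would need to exceed half the Azure rate before Azure wins on price alone.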
5 Ways to Cut Your OpenAI TTS Bill
- Use tts-1 unless you need HD: saves 50% with minimal quality difference for most applications.
- Cache aggressively: if the same text gets converted repeatedly (IVR greetings, product names, standard responses), cache the audio output. OpenAI charges per generation, not per playback.
- Strip unnecessary text: URLs, code blocks, markdown syntax, and boilerplate all consume characters without adding value. Clean your input before sending it.
- Batch requests efficiently: stay close to the 4,096 character limit per request rather than sending many small requests. The per-character charge is the same either way, but fewer requests mean less HTTP overhead and more headroom under the rate limit.
- Consider Polly if you would otherwise pay for HD: at list price, Amazon Polly Neural ($16/1M) is slightly more expensive than tts-1 ($15/1M), not cheaper, but it costs about half as much as tts-1-hd ($30/1M) and offers SSML. OpenAI wins on simplicity; Polly wins on pronunciation control.
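Two of the tips above, stripping input and caching output, can be sketched together. This is an illustration rather than a production design: the regexes cover only common markdown patterns, the cache lives in memory, and `synthesize` is a hypothetical callable you supply that takes `(model, voice, text)` and returns audio bytes.

```python
import hashlib
import re


def clean_for_tts(markdown_text: str) -> str:
    """Remove content that costs characters but adds nothing when spoken."""
    text = re.sub(r"`{3}.*?`{3}", " ", markdown_text, flags=re.DOTALL)  # fenced code
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)   # [label](url) -> label
    text = re.sub(r"https?://\S+", " ", text)              # bare URLs
    text = re.sub(r"^[#>]+\s*", "", text, flags=re.MULTILINE)  # heading/quote marks
    text = re.sub(r"[*`]", "", text)                       # emphasis and inline code
    return re.sub(r"\s+", " ", text).strip()


class TTSCache:
    """In-memory cache so identical text is only generated (and billed) once."""

    def __init__(self, synthesize):
        self._synthesize = synthesize  # hypothetical: (model, voice, text) -> bytes
        self._store = {}
        self.billed_chars = 0

    def get(self, model: str, voice: str, text: str) -> bytes:
        raw_key = f"{model}\x00{voice}\x00{text}".encode("utf-8")
        key = hashlib.sha256(raw_key).hexdigest()
        if key not in self._store:
            self._store[key] = self._synthesize(model, voice, text)
            self.billed_chars += len(text)  # only cache misses cost money
        return self._store[key]
```

Keying on model, voice, and text together matters: the same sentence in a different voice is a different audio file. For anything beyond a prototype, the dictionary would typically be replaced with disk or object storage.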
OpenAI TTS vs 9 Competitors: Price per Million Characters
Here's how OpenAI stacks up against every major TTS provider. Prices are per 1M characters on each service's primary model (not budget or premium tiers):
| Service | Cost/1M Chars | Free Tier | Voice Cloning | Arena Rank |
|---|---|---|---|---|
| Grok TTS (beta) | $4.20 | None | No | Not ranked |
| Gemini Flash TTS | ~$12 | Quota-limited | No | #2 (ELO 1,211) |
| OpenAI tts-1 | $15 | $5 one-time credits | No | Not ranked |
| Amazon Polly Neural | $16 | 5M chars/12 mo | No | Not ranked |
| Inworld TTS Max | $10–$50 | 40 min trial | Yes | #1 (ELO 1,236) |
| Cartesia Sonic 3 | ~$33 | 20K chars | Yes (3s sample) | #10 (ELO 1,054) |
| ElevenLabs Flash | $60 | 10K chars/mo | Yes | #4 (ELO 1,179) |
| Murf AI Falcon | $10–$30 | 10 min total | Yes (Business plan) | Not ranked |
| Chatterbox Turbo | Free (self-host) | Unlimited | Yes (free) | Not ranked |
| Speechify Studio | $19–$49/mo flat | Limited | No | Not ranked |
OpenAI sits in the middle of the pack on price. Cheaper than ElevenLabs and Cartesia. More expensive than Grok and Gemini Flash. Roughly tied with Polly. The real draw is API simplicity — OpenAI's TTS endpoint is the easiest to integrate if you're already using GPT models.
Who Should (and Shouldn't) Use OpenAI TTS
Best for
- Developers already using the OpenAI API
- Apps that need steerable voice (gpt-4o-mini-tts)
- Moderate volume (under 5M chars/month)
- Prototyping and MVPs (simple API, no SDK needed)
- Multi-format output (MP3, WAV, FLAC, Opus, PCM)
Not ideal for
- Voice cloning (try ElevenLabs or Cartesia)
- Non-developers (no studio UI; try Murf)
- Ultra-high volume (>10M chars/mo; Polly Neural is price-competitive and adds SSML)
- Top-tier voice quality (ElevenLabs or Inworld rank higher)
- Real-time voice agents (<100ms latency; try Cartesia at about 40ms)
Related Pricing Guides
For more on OpenAI's TTS voice quality and capabilities, read our full OpenAI TTS review with voice samples or compare it directly with competitors using our TTS API comparison guide.
Interested in TTS for audiobooks? OpenAI tts-1 produces a full 80,000-word audiobook for $6 — but voice quality trade-offs matter for long-form listening.