Amazon Polly Pricing: The Quick Answer
Amazon Polly charges per million characters, with prices ranging from $4 to $100 per 1M characters depending on which engine you use. There are four engines, each targeting different quality levels and use cases. The free tier gives you 5 million Standard characters per month for 12 months, which is roughly 115 hours of audio per month at no cost.
| Engine | Price per 1M Chars | Voice Quality | Best For |
|---|---|---|---|
| Standard | $4.00 | Functional, somewhat robotic | IVR, notifications, high-volume automation |
| Neural | $16.00 | Natural-sounding, smooth | Customer-facing content, podcasts, videos |
| Generative | $30.00 | Expressive, conversational | Voice agents, chatbots, LLM-driven apps |
| Long-Form | $100.00 | Highest quality, sustained coherence | Audiobooks, long narration, accessibility |
Bottom Line
Most teams should start with the Neural engine at $16/1M characters. It's the sweet spot between cost and quality: natural enough for customer-facing content, roughly 6x cheaper than Long-Form, and significantly better-sounding than Standard. Standard at $4/1M is fine for automated notifications and IVR systems where quality matters less than cost. Skip Long-Form ($100/1M) unless you're producing full audiobooks.
Amazon Polly's Four Engines Explained
Amazon Polly isn't one service; it's four engines at four different price points. The engine you choose determines voice quality, available voices, and supported features. This is the part that catches people off guard. Quoting “Amazon Polly is $4 per million characters” is technically true but misleading. That $4 rate gets you Standard voices, which sound noticeably robotic compared to modern TTS alternatives.
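In API terms, the engine is a per-request parameter. Here is a minimal sketch using boto3, assuming AWS credentials are already configured. The helper names `build_request` and `synthesize` and the voice `Joanna` are illustrative choices, not something Polly requires:

```python
from typing import Any

# Engine identifiers accepted by the Engine parameter of SynthesizeSpeech.
ENGINES = ("standard", "neural", "generative", "long-form")


def build_request(text: str, engine: str, voice_id: str = "Joanna") -> dict[str, Any]:
    """Build keyword arguments for a Polly SynthesizeSpeech call."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    return {
        "Text": text,
        "VoiceId": voice_id,  # illustrative; not every voice supports every engine
        "Engine": engine,
        "OutputFormat": "mp3",
    }


def synthesize(text: str, engine: str, voice_id: str = "Joanna") -> bytes:
    """Call Polly and return the audio bytes (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_request works without it installed

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_request(text, engine, voice_id))
    return response["AudioStream"].read()
```

Switching engines is then a one-word change, for example `synthesize("Hello", "standard")` versus `synthesize("Hello", "neural")`, which makes it easy to compare quality on your own text before committing to a price tier.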
Standard Engine — $4/1M Characters
The cheapest option and the oldest technology. Standard voices use concatenative synthesis — stitching together pre-recorded speech segments. The result is functional but clearly synthetic. You'll hear occasional awkward pauses, inconsistent intonation, and a “reading a script” quality that's hard to mistake for human speech.
That said, $4/1M characters is absurdly cheap for production TTS. For automated phone menus, system notifications, accessibility screen readers, and any use case where voice quality isn't the priority, Standard delivers reliable audio at a price that's hard to beat. Standard voices are available in 33+ languages with SSML support for pronunciation, emphasis, and pacing control.
Neural Engine — $16/1M Characters
The Neural engine uses deep learning models to produce more natural-sounding speech. It's 4x the price of Standard but the quality jump is substantial — smoother intonation, more natural pacing, and better handling of complex sentences. Neural voices sound professional enough for customer-facing content, video narration, and e-learning materials.
Neural is the engine I recommend for most production use cases. At $16/1M characters, it costs about the same as OpenAI's TTS-1 ($15/1M, and lower quality than Polly Neural) and is dramatically cheaper than ElevenLabs ($60–$300/1M). The main limitation is fewer voice options compared to Standard: not every Standard voice has a Neural equivalent.
Generative Engine — $30/1M Characters
Added in 2024 and significantly expanded in March 2026, the Generative engine uses a billion-parameter transformer model. AWS describes it as producing speech that's “assertive, emotionally engaged, and highly colloquial” — and in practice, it does sound more conversational than Neural. The key feature is bidirectional streaming: you can stream text to Polly and receive synthesized audio back simultaneously, making it suitable for real-time chatbots and LLM-powered voice agents.
As of March 2026, Generative has 10+ voices and is available in 8 AWS regions including US East, US West, Europe (Frankfurt and London), Canada, and Asia Pacific (Seoul, Singapore, Tokyo). At $30/1M characters, it matches OpenAI's TTS-1-HD pricing and competes with Google Cloud TTS Studio voices.
Long-Form Engine — $100/1M Characters
The premium tier, designed specifically for content that runs longer than a few paragraphs. Long-Form voices maintain consistent quality, pacing, and tone across extended passages — audiobooks, full articles, long-form accessibility content. Most TTS engines degrade or become monotone over longer texts. Long-Form is engineered to avoid that.
At $100/1M characters, Long-Form is the most expensive Polly option by far — 25x the price of Standard. A 120,000-character novel costs $12 with Long-Form vs. $1.92 with Neural vs. $0.48 with Standard. Unless you're producing actual audiobooks or very long narrated content where sustained quality matters, Neural is the better value. For audiobook-specific guidance, see our best TTS for audiobooks guide.
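The novel arithmetic above is just characters divided by one million, times the engine rate. A small sketch that reproduces it, using the rates quoted in this article (the function name `synthesis_cost` is made up for illustration):

```python
from decimal import Decimal

# Per-million-character rates quoted in this article (USD).
RATES_PER_MILLION = {
    "standard": Decimal("4"),
    "neural": Decimal("16"),
    "generative": Decimal("30"),
    "long-form": Decimal("100"),
}


def synthesis_cost(characters: int, engine: str) -> Decimal:
    """Return the synthesis cost in USD, rounded to cents."""
    cost = RATES_PER_MILLION[engine] * characters / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"))


for engine in ("standard", "neural", "long-form"):
    print(f"{engine}: ${synthesis_cost(120_000, engine)}")
# prints:
# standard: $0.48
# neural: $1.92
# long-form: $12.00
```

`Decimal` is used instead of floats so the cents come out exact; for rough estimates plain floats would work just as well.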
Amazon Polly Free Tier: More Generous Than You Think
Amazon Polly's free tier is one of the most generous in the TTS market. For the first 12 months after creating an AWS account, you get:
| Engine | Free Characters/Month | Approx. Audio Hours | Annual Value |
|---|---|---|---|
| Standard | 5,000,000 | ~115 hours | $240 saved |
| Neural | 1,000,000 | ~23 hours | $192 saved |
| Long-Form | 500,000 | ~11.5 hours | $600 saved |
| Generative | 100,000 | ~2.3 hours | $36 saved |
For context: 5 million Standard characters per month is enough to synthesize roughly 115 hours of audio — that's nearly 5 full days of continuous speech. Even the Neural tier at 1 million characters gives you about 23 hours, which is more than enough for development, prototyping, and low-volume production. Starting July 2025, new AWS customers also receive up to $200 in general AWS Free Tier credits, which can be applied to Polly.
The catch: these allowances expire after 12 months, and there's no grace period. On month 13, you start paying full rates immediately. Set a calendar reminder before the anniversary of your AWS account creation. For more details on free TTS options, see our free text-to-speech comparison.
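To see what the month-13 cliff looks like for your own volume, here is a rough estimator. It assumes the allowance applies per engine, does not roll over, and covers account months 1 through 12 only; `monthly_bill` and its parameters are hypothetical names for this sketch:

```python
# Monthly free allowances (characters) and rates (USD per 1M) from this article.
FREE_CHARS = {"standard": 5_000_000, "neural": 1_000_000,
              "long-form": 500_000, "generative": 100_000}
RATE = {"standard": 4.0, "neural": 16.0, "long-form": 100.0, "generative": 30.0}


def monthly_bill(characters: int, engine: str, account_age_months: int) -> float:
    """Estimate one month's Polly charge for a single engine.

    account_age_months counts from 1 for the first month of the account;
    the free allowance is assumed to apply in months 1 through 12 only.
    """
    allowance = FREE_CHARS[engine] if account_age_months <= 12 else 0
    billable = max(0, characters - allowance)
    return round(billable * RATE[engine] / 1_000_000, 2)


print(monthly_bill(1_500_000, "neural", 12))  # 8.0  (500K billable characters)
print(monthly_bill(1_500_000, "neural", 13))  # 24.0 (all 1.5M billable)
```

In this example the same Neural workload triples in cost the month the allowance ends, which is why the calendar reminder is worth setting.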
How This Compares to Other Free Tiers
- Amazon Polly: 5M standard chars/mo for 12 months (~115 hours/mo)
- ElevenLabs: 10,000 chars/mo indefinitely (~10 minutes/mo)
- Murf AI: 10 minutes total (watermarked, not commercial)
- Speechify: Limited listening only (no downloads)
- Chatterbox: Fully free, open source, unlimited (self-hosted)
- Google Cloud TTS: 4M standard chars/mo for 12 months
Among commercial TTS services, only Google Cloud TTS (4M characters/month) comes close to Polly's free tier. The other commercial allowances listed here are orders of magnitude smaller.
Real-World Cost Examples
Abstract pricing numbers don't mean much until you map them to actual use cases. Here's what common scenarios actually cost:
Blog Post Read Aloud (~5,000 characters)
| Engine | Cost |
|---|---|
| Standard | $0.02 |
| Neural | $0.08 |
| Generative | $0.15 |
| Long-Form | $0.50 |
YouTube Video Script (~50,000 characters, ~70 min narration at the same characters-per-hour rate used elsewhere in this article)
| Engine | Cost |
|---|---|
| Standard | $0.20 |
| Neural | $0.80 |
| Generative | $1.50 |
| Long-Form | $5.00 |
Full Novel / Audiobook (~120,000 characters)
| Engine | Cost | Audio Length |
|---|---|---|
| Standard | $0.48 | ~2.7 hours |
| Neural | $1.92 | ~2.7 hours |
| Long-Form | $12.00 | ~2.7 hours |
Enterprise Scale: 10M Characters/Month (~230 Hours)
| Engine | Monthly Cost | Annual Cost |
|---|---|---|
| Standard | $40 | $480 |
| Neural | $160 | $1,920 |
| Generative | $300 | $3,600 |
| Long-Form | $1,000 | $12,000 |
For most teams, the eye-opening number is the Neural row: 230 hours of natural-sounding audio per month for $160. For comparison, a single month of ElevenLabs Pro costs $99 and covers 500K characters, roughly 11 hours. Use our TTS cost calculator to model your specific volume.
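If you think in hours of audio rather than characters, the conversion used throughout this article is about 23 hours per 1M characters (5M characters ≈ 115 hours). A sketch under that assumption; actual duration depends on voice, language, and speaking rate:

```python
# Rule of thumb from this article: 1M characters is about 23 hours of audio.
HOURS_PER_MILLION_CHARS = 23.0

# Per-million-character rates quoted in this article (USD).
RATE_PER_MILLION = {"standard": 4.0, "neural": 16.0,
                    "generative": 30.0, "long-form": 100.0}


def audio_hours(characters: int) -> float:
    """Approximate hours of audio produced by a given character count."""
    return characters / 1_000_000 * HOURS_PER_MILLION_CHARS


def cost_per_audio_hour(engine: str) -> float:
    """Approximate USD cost of one hour of synthesized audio."""
    return RATE_PER_MILLION[engine] / HOURS_PER_MILLION_CHARS


for engine in RATE_PER_MILLION:
    print(f"{engine}: ${cost_per_audio_hour(engine):.2f} per audio hour")
```

At these rates Neural works out to roughly $0.70 per hour of audio, and Long-Form to a little over $4.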
Hidden Costs: What AWS Doesn't Highlight
Polly's per-character prices are straightforward. The costs that surprise people are the AWS infrastructure charges around Polly. These won't appear on Polly's pricing page, but they'll show up on your AWS bill:
| Cost | Price | When It Applies |
|---|---|---|
| S3 Storage | ~$0.023/GB | Storing generated audio files |
| S3 Data Transfer | ~$0.09/GB after 100 GB free | Serving audio files to users/clients |
| Lambda | $0.20/1M requests | If using serverless functions to trigger Polly |
| CloudWatch | ~$0.30/metric/mo (detailed) | Monitoring usage; basic metrics are free |
In practice, these ancillary costs are minimal for most use cases. A serverless architecture (Lambda + S3 + Step Functions) costs roughly $5/month in infrastructure on top of Polly's character charges. The big cost savings tip: cache and replay generated audio. Polly doesn't charge for replaying previously generated speech. Pre-generate common phrases and store them in S3 — this eliminates re-synthesis charges entirely for repeated content like IVR prompts, notifications, and standard greetings.
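The caching idea can be sketched as a lookup keyed on everything that affects the audio. In this illustration a local directory stands in for S3, and `synthesize` is any callable that returns audio bytes (in production it would wrap a Polly call, and the reads and writes would go to an S3 bucket). All names here are hypothetical:

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_key(text: str, voice_id: str, engine: str) -> str:
    """Derive a stable file name from everything that affects the audio."""
    payload = "\x1f".join((engine, voice_id, text)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest() + ".mp3"


def cached_speech(text: str, voice_id: str, engine: str,
                  synthesize: Callable[[str, str, str], bytes],
                  cache_dir: Path) -> bytes:
    """Return audio for text, calling synthesize only on a cache miss."""
    path = cache_dir / cache_key(text, voice_id, engine)
    if path.exists():
        return path.read_bytes()  # cache hit: no synthesis charge
    audio = synthesize(text, voice_id, engine)  # cache miss: billed once
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio)
    return audio
```

Including the voice and engine in the key matters: the same greeting rendered with a different voice or engine is different audio and should not share a cache entry.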
SSML Is Free
SSML markup tags are not counted as billed characters. Only the actual spoken text counts toward your bill. This means you can use extensive SSML formatting — pauses, pronunciation overrides, emphasis, pitch and rate changes — without increasing costs. SSML is essentially a free way to improve audio quality. The SynthesizeSpeech API accepts up to 6,000 total characters per request, of which no more than 3,000 can be billable text.
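Two helpers make these limits easier to work with. The sketch below approximates billed characters by stripping tags with a regular expression, and splits long plain text into chunks that fit the 3,000-character limit. Both are approximations: the AWS bill is the authoritative count, and splitting an SSML document would need tag-aware logic that this example does not attempt.

```python
import re

MAX_BILLABLE_PER_REQUEST = 3000  # per-request billable limit quoted above

_TAG = re.compile(r"<[^>]+>")


def billable_characters(ssml: str) -> int:
    """Approximate billed characters by dropping SSML tags."""
    return len(_TAG.sub("", ssml))


def split_text(text: str, limit: int = MAX_BILLABLE_PER_REQUEST) -> list[str]:
    """Split plain text into chunks of at most `limit` characters,
    breaking at sentence boundaries where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(billable_characters('<speak>Hello <break time="1s"/>world</speak>'))  # 11
```

Each chunk from `split_text` can then be sent as its own request and the resulting audio concatenated.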
Amazon Polly Pricing vs. Every Major Competitor
The TTS market has gotten crowded in 2026. Here's how Polly stacks up on price per million characters across every major service you're likely evaluating:
| Service | Best Price/1M Chars | Premium Price/1M Chars | Free Tier |
|---|---|---|---|
| Amazon Polly | $4 (Standard) | $100 (Long-Form) | 5M/mo × 12 mo |
| Grok TTS (xAI) | $4.20 | $4.20 | None (beta) |
| Google Cloud TTS | $4 (Standard) | $30 (Studio/Chirp 3) | 4M/mo × 12 mo |
| Murf Falcon API | $10 (conversational) | $30 (studio quality) | None |
| OpenAI TTS | $15 (TTS-1) | $30 (TTS-1-HD) | None |
| Inworld TTS | $25 (Mini, std pricing) | $50 (Max, std pricing) | Founder pricing until May 7 |
| ElevenLabs | $60 (Flash API) | $300 (Creator overage) | 10K chars/mo |
| Chatterbox | $0 (self-hosted) | Compute costs only | Fully free (MIT license) |
Polly's Standard and Neural engines are among the cheapest commercial TTS APIs available. Google Cloud TTS matches Polly's Standard and Neural pricing exactly ($4/$16), making the choice between them mostly about ecosystem preference (AWS vs. GCP). Grok TTS ($4.20/1M) also comes close to Standard on price, but it is still in beta with a smaller voice library. For the full cross-service comparison, visit our TTS pricing comparison page.
When Amazon Polly Is (and Isn't) the Right Choice
Polly Wins When:
- You're already on AWS. Polly integrates natively with S3, Lambda, Connect, Lex, and other AWS services. No additional authentication, no external API calls, no data leaving the AWS ecosystem.
- High volume is the priority. At $4–$16/1M characters, Polly is roughly 4–15x cheaper than ElevenLabs' $60/1M Flash API rate. For producing hundreds of hours per month, the only cheaper route among the services compared here is self-hosting Chatterbox.
- IVR and phone systems. Amazon Connect + Polly is the standard stack for enterprise call centers. SSML support, Speech Marks for lip-sync, and low latency make it ideal for telephony.
- You need SSML control. Polly has the most robust SSML implementation of any TTS service. Full support for prosody, phoneme, emphasis, break, and say-as tags — and SSML markup doesn't count toward your character billing.
- The free tier covers your needs. 5M standard characters/month for 12 months is enough for development, prototyping, and low-volume production without spending a cent.
Skip Polly When:
- Voice quality is your top priority. ElevenLabs, Inworld TTS-1.5 Max, and even OpenAI TTS produce more natural-sounding speech than Polly's best engines. If your content is customer-facing and quality matters more than cost, test those alternatives first.
- You need voice cloning. Polly doesn't offer it. ElevenLabs and Chatterbox both provide voice cloning (Chatterbox for free).
- You want a studio editor. Murf AI is the only major TTS platform with a full visual timeline editor. Polly is API-only (plus a basic AWS Console demo).
- You're not comfortable with AWS. Polly requires an AWS account, IAM configuration, and familiarity with AWS SDK or CLI. The learning curve is real for non-developers.
- You need the lowest possible latency for conversational AI. The Generative engine added bidirectional streaming in March 2026, but newer services like Cartesia Sonic (<40ms TTFA) and Murf Falcon (130ms TTFA) have faster real-time performance.
For alternatives, check our Amazon Polly alternatives page or the TTS API comparison.
5 Ways to Cut Your Amazon Polly Bill
- Cache everything. Store generated audio in S3 and replay it. Polly doesn't charge for replaying cached audio. For IVR systems with repeated greetings and prompts, this alone can cut synthesis costs by 80%+.
- Use Standard for non-critical audio. System notifications, automated alerts, and internal tools don't need Neural quality. Save the $16/1M Neural rate for customer-facing content and use the $4/1M Standard rate for everything else.
- Go serverless. Lambda + S3 + Step Functions costs roughly $5/month in infrastructure. An EC2-based approach runs ~$70/month. That's $780/year in savings on infrastructure alone.
- Leverage SSML (it's free). SSML tags aren't billed. Use prosody, emphasis, and break tags to improve quality without regenerating content at a higher-quality (and more expensive) engine tier.
- Set billing alerts. AWS Billing Alerts notify you when spending exceeds a threshold. Set one before you start, not after you get a surprise bill.
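For the billing alert, one option is a CloudWatch alarm on the account's estimated charges. A minimal sketch with boto3, assuming billing alerts are enabled in the account's billing preferences and that you already have an SNS topic to notify (the ARN in the usage line is a placeholder). Note that this alarm covers total estimated charges for the account, not Polly alone:

```python
from typing import Any


def billing_alarm_params(threshold_usd: float, sns_topic_arn: str) -> dict[str, Any]:
    """Build arguments for CloudWatch put_metric_alarm on estimated charges."""
    return {
        "AlarmName": f"estimated-charges-over-{threshold_usd:g}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # six hours; billing data updates only a few times a day
        "EvaluationPeriods": 1,
        "Threshold": threshold_usd,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def create_billing_alarm(threshold_usd: float, sns_topic_arn: str) -> None:
    """Create the alarm (needs boto3, credentials, and billing alerts enabled)."""
    import boto3  # imported lazily so the helper above works without it installed

    # Billing metrics are published in us-east-1 only.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    cloudwatch.put_metric_alarm(**billing_alarm_params(threshold_usd, sns_topic_arn))
```

Usage would look like `create_billing_alarm(50, "arn:aws:sns:us-east-1:123456789012:billing-alerts")`, with your own threshold and topic ARN substituted.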
What Changed in March 2026
Amazon expanded Polly's Generative engine significantly in March 2026:
- 10 new Generative voices added to the library
- 2 new AWS regions: Europe (London) and Canada (Central), bringing total Generative availability to 8 regions
- Bidirectional streaming API: Stream text to Polly and receive audio back simultaneously. Designed for chatbots, game characters, and LLM-driven conversational AI where you can't wait for the full text before starting synthesis.
- Total voice library now exceeds 100 voices across 40+ languages
The bidirectional streaming is the most significant update — it positions Polly's Generative engine as a real option for real-time voice agents, competing with Murf Falcon, Cartesia Sonic, and Deepgram Aura-2 in that market. Explore all of Polly's features on our Amazon Polly overview page.
Which Engine Should You Choose?
- Standard ($4/1M): IVR, notifications, automated alerts, accessibility where cost beats quality.
- Neural ($16/1M): Customer-facing content, video narration, e-learning — the sweet spot for most production use cases.
- Generative ($30/1M): Real-time voice agents, chatbots, conversational AI where streaming and expressiveness matter.
- Long-Form ($100/1M): Audiobooks and extended narration only. The price premium is only worth it if you need sustained quality over 30+ minute passages.
Start with the free tier to test all four engines, estimate your monthly volume with the TTS cost calculator, and compare Polly against all TTS services before committing. For alternatives, browse our Amazon Polly alternatives page.