How much does Amazon Polly cost?

Amazon Polly pricing varies by engine: Standard costs $4 per 1 million characters, Neural costs $16/1M, Generative costs $30/1M, and Long-Form costs $100/1M. There is also a free tier that includes 5 million standard characters per month for the first 12 months.

Does Amazon Polly have a free tier?

Yes. New AWS accounts get 5 million Standard characters, 1 million Neural characters, 500,000 Long-Form characters, and 100,000 Generative characters per month free for 12 months. The Standard allowance alone is enough for roughly 115 hours of audio per month.

Is Amazon Polly cheaper than ElevenLabs?

Yes, significantly. Amazon Polly Neural costs $16 per million characters versus ElevenLabs Flash API at $60/1M and Multilingual at $120/1M. At enterprise scale (10M+ characters/month), Polly is 4-15x cheaper. However, ElevenLabs offers superior voice quality and voice cloning.

What is the difference between Amazon Polly Standard and Neural?

Standard uses older concatenative synthesis ($4/1M chars) and sounds more robotic. Neural uses deep learning models ($16/1M chars) for significantly more natural speech. Neural is recommended for customer-facing content, while Standard works for IVR systems and automated notifications where cost matters more than quality.

What are the hidden costs of Amazon Polly?

Beyond per-character charges, AWS may bill for S3 storage (~$0.023/GB for storing audio files), S3 data transfer (~$0.09/GB after 100 GB free), Lambda invocations ($0.20/1M requests), and CloudWatch monitoring. A serverless architecture typically adds about $5/month in infrastructure costs.

Amazon Polly Pricing 2026: All 4 Engines, Free Tier, and Real Costs

Does SSML markup count toward Amazon Polly billing?

No. SSML tags are not counted as billed characters. Only the actual spoken text is billed. This means you can use extensive SSML formatting — pauses, pronunciation overrides, emphasis, pitch changes — to improve audio quality without increasing costs.

Amazon Polly Pricing: The Quick Answer

Amazon Polly charges per million characters, with prices ranging from $4 to $100 per 1M characters depending on which engine you use. There are four engines, each targeting different quality levels and use cases. The free tier gives you 5 million standard characters per month for 12 months — that's roughly 115 hours of audio at no cost.

Engine	Price per 1M Chars	Voice Quality	Best For
Standard	$4.00	Functional, somewhat robotic	IVR, notifications, high-volume automation
Neural	$16.00	Natural-sounding, smooth	Customer-facing content, podcasts, videos
Generative	$30.00	Expressive, conversational	Voice agents, chatbots, LLM-driven apps
Long-Form	$100.00	Highest quality, sustained coherence	Audiobooks, long narration, accessibility
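
To make the table concrete, here is a minimal sketch that turns a character count into a list-price cost. The prices are the ones quoted in the table; the names `PRICE_PER_MILLION` and `synthesis_cost` are made up for illustration and are not part of any AWS SDK. It ignores the free tier and any infrastructure charges.

```python
# List prices in USD per 1 million characters, as quoted in the table above.
PRICE_PER_MILLION = {
    "standard": 4.00,
    "neural": 16.00,
    "generative": 30.00,
    "long-form": 100.00,
}


def synthesis_cost(characters: int, engine: str) -> float:
    """Return the list-price cost in USD for synthesizing `characters` characters."""
    if characters < 0:
        raise ValueError("characters must be non-negative")
    return characters / 1_000_000 * PRICE_PER_MILLION[engine]


if __name__ == "__main__":
    # Print the cost of a 120,000-character job on each engine.
    for engine in PRICE_PER_MILLION:
        print(f"{engine:<10} ${synthesis_cost(120_000, engine):.2f}")
```

Keeping the prices in one dictionary means a rate change only needs to be edited in one place.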

Bottom Line

Most teams should start with the Neural engine at $16/1M characters. It's the sweet spot between cost and quality: natural enough for customer-facing content, roughly 6x cheaper than Long-Form, and significantly better-sounding than Standard. Standard at $4/1M is fine for automated notifications and IVR systems where quality matters less than cost. Skip Long-Form ($100/1M) unless you're producing full audiobooks.

Amazon Polly's Four Engines Explained

Amazon Polly isn't one service — it's four engines at four different price points. The engine you choose determines voice quality, available voices, and supported features. This is the part that catches people off guard. A quick “Amazon Polly is $4 per million characters” is technically true but misleading. That $4 rate gets you Standard voices, which sound noticeably robotic compared to modern TTS alternatives.
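
In practice, the engine is just a parameter on the synthesis request. The sketch below shows one way that might look with boto3's `synthesize_speech` call. The helper names `build_request` and `synthesize` are hypothetical, and the default voice `Joanna` is an assumption: not every voice is available on every engine, so check the voice list before relying on it.

```python
from typing import Any, Dict

# Engine identifiers accepted by the SynthesizeSpeech `Engine` parameter.
ENGINES = ("standard", "neural", "generative", "long-form")


def build_request(text: str, engine: str, voice_id: str = "Joanna") -> Dict[str, Any]:
    """Build keyword arguments for boto3's polly.synthesize_speech()."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    return {
        "Text": text,
        "OutputFormat": "mp3",
        "VoiceId": voice_id,  # assumption: this voice supports the chosen engine
        "Engine": engine,
    }


def synthesize(text: str, engine: str, voice_id: str = "Joanna") -> bytes:
    """Call Polly and return MP3 bytes. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so build_request() works without boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_request(text, engine, voice_id))
    return response["AudioStream"].read()
```

Because the engine is a single argument, switching tiers to compare quality against cost is a one-line change per request.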

Standard Engine — $4/1M Characters

The cheapest option and the oldest technology. Standard voices use concatenative synthesis — stitching together pre-recorded speech segments. The result is functional but clearly synthetic. You'll hear occasional awkward pauses, inconsistent intonation, and a “reading a script” quality that's hard to mistake for human speech.

That said, $4/1M characters is absurdly cheap for production TTS. For automated phone menus, system notifications, accessibility screen readers, and any use case where voice quality isn't the priority, Standard delivers reliable audio at a price that's hard to beat. Standard voices are available in 33+ languages with SSML support for pronunciation, emphasis, and pacing control.

Neural Engine — $16/1M Characters

The Neural engine uses deep learning models to produce more natural-sounding speech. It's 4x the price of Standard but the quality jump is substantial — smoother intonation, more natural pacing, and better handling of complex sentences. Neural voices sound professional enough for customer-facing content, video narration, and e-learning materials.

Neural is the engine I recommend for most production use cases. At $16/1M characters, it's priced within a dollar of OpenAI TTS ($15/1M for TTS-1, which is lower quality than Polly Neural) and dramatically cheaper than ElevenLabs ($60–$300/1M). The main limitation is fewer voice options compared to Standard: not every Standard voice has a Neural equivalent.

Generative Engine — $30/1M Characters

Added in 2024 and significantly expanded in March 2026, the Generative engine uses a billion-parameter transformer model. AWS describes it as producing speech that's “assertive, emotionally engaged, and highly colloquial” — and in practice, it does sound more conversational than Neural. The key feature is bidirectional streaming: you can stream text to Polly and receive synthesized audio back simultaneously, making it suitable for real-time chatbots and LLM-powered voice agents.

As of March 2026, Generative has 10+ voices and is available in 8 AWS regions including US East, US West, Europe (Frankfurt and London), Canada, and Asia Pacific (Seoul, Singapore, Tokyo). At $30/1M characters, it matches OpenAI's TTS-1-HD pricing and competes with Google Cloud TTS Studio voices.

Long-Form Engine — $100/1M Characters

The premium tier, designed specifically for content that runs longer than a few paragraphs. Long-Form voices maintain consistent quality, pacing, and tone across extended passages — audiobooks, full articles, long-form accessibility content. Most TTS engines degrade or become monotone over longer texts. Long-Form is engineered to avoid that.

At $100/1M characters, Long-Form is the most expensive Polly option by far — 25x the price of Standard. A 120,000-character novel costs $12 with Long-Form vs. $1.92 with Neural vs. $0.48 with Standard. Unless you're producing actual audiobooks or very long narrated content where sustained quality matters, Neural is the better value. For audiobook-specific guidance, see our best TTS for audiobooks guide.

Amazon Polly Free Tier: More Generous Than You Think

Amazon Polly's free tier is one of the most generous in the TTS market. For the first 12 months after creating an AWS account, you get:

Engine	Free Characters/Month	Approx. Audio Hours	Annual Value
Standard	5,000,000	~115 hours	$240 saved
Neural	1,000,000	~23 hours	$192 saved
Long-Form	500,000	~11.5 hours	$600 saved
Generative	100,000	~2.3 hours	$36 saved

For context: 5 million Standard characters per month is enough to synthesize roughly 115 hours of audio — that's nearly 5 full days of continuous speech. Even the Neural tier at 1 million characters gives you about 23 hours, which is more than enough for development, prototyping, and low-volume production. Starting July 2025, new AWS customers also receive up to $200 in general AWS Free Tier credits, which can be applied to Polly.

The catch: these allowances expire after 12 months, and there's no grace period. On month 13, you start paying full rates immediately. Set a calendar reminder before the anniversary of your AWS account creation. For more details on free TTS options, see our free text-to-speech comparison.
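
To see what the month-13 cliff does to a bill, here is a rough sketch that subtracts the free allowance while the account is under 12 months old and charges full rates afterward. The function name `monthly_bill` and the `account_age_months` parameter are hypothetical, and the sketch assumes the allowances apply per engine exactly as listed in the table above. It does not model the general AWS credits.

```python
# USD per 1 million characters, from the pricing table.
PRICE_PER_MILLION = {
    "standard": 4.00,
    "neural": 16.00,
    "generative": 30.00,
    "long-form": 100.00,
}

# Free characters per month during the first 12 months, from the free tier table.
FREE_CHARS_PER_MONTH = {
    "standard": 5_000_000,
    "neural": 1_000_000,
    "long-form": 500_000,
    "generative": 100_000,
}


def monthly_bill(characters: int, engine: str, account_age_months: int) -> float:
    """Estimate one month's Polly charge in USD.

    account_age_months is 0 in the first month after account creation,
    so the free allowance applies for values 0 through 11.
    """
    free = FREE_CHARS_PER_MONTH[engine] if account_age_months < 12 else 0
    billable = max(0, characters - free)
    return billable / 1_000_000 * PRICE_PER_MILLION[engine]
```

For example, 3 million Neural characters would come to $32 in month 6 under these assumptions (2 million billable) and $48 once the allowance has expired.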

How This Compares to Other Free Tiers

Amazon Polly: 5M standard chars/mo for 12 months (~115 hours/mo)
ElevenLabs: 10,000 chars/mo indefinitely (~10 minutes/mo)
Murf AI: 10 minutes total (watermarked, not commercial)
Speechify: Limited listening only (no downloads)
Chatterbox: Fully free, open source, unlimited (self-hosted)
Google Cloud TTS: 4M standard chars/mo for 12 months

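Among the commercial services on this list, Polly's free tier is the most generous. Google Cloud TTS (4M standard chars/mo) is the only one that comes close; the others offer a small fraction of that volume.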

Real-World Cost Examples

Abstract pricing numbers don't mean much until you map them to actual use cases. Here's what common scenarios actually cost:

Blog Post Read Aloud (~5,000 characters)

Engine	Cost
Standard	$0.02
Neural	$0.08
Generative	$0.15
Long-Form	$0.50

YouTube Video Script (~50,000 characters, ~70 min narration)

Engine	Cost
Standard	$0.20
Neural	$0.80
Generative	$1.50
Long-Form	$5.00

Full Novel / Audiobook (~120,000 characters)

Engine	Cost	Audio Length
Standard	$0.48	~2.7 hours
Neural	$1.92	~2.7 hours
Long-Form	$12.00	~2.7 hours

Enterprise Scale: 10M Characters/Month (~230 Hours)

Engine	Monthly Cost	Annual Cost
Standard	$40	$480
Neural	$160	$1,920
Generative	$300	$3,600
Long-Form	$1,000	$12,000

For most teams, the eye-opening number is the Neural column: 230 hours of natural-sounding audio per month for $160, or $16 per million characters. ElevenLabs Pro, by comparison, charges $99/mo for 500K characters (roughly 11 hours), which works out to about $198 per million. Use our TTS cost calculator to model your specific volume.
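
The hour figures in this article all follow from one ratio: 5 million characters is treated as roughly 115 hours of audio. The sketch below makes that assumption explicit so you can swap in your own speaking rate. `CHARS_PER_HOUR`, `estimated_hours`, and `cost_per_audio_hour` are illustrative names, and real durations will vary with voice, language, and SSML pacing.

```python
# Assumption taken from this article's own estimate:
# 5,000,000 characters is roughly 115 hours of audio.
CHARS_PER_HOUR = 5_000_000 / 115  # about 43,478 characters per hour


def estimated_hours(characters: int) -> float:
    """Estimate audio duration in hours for a given character count."""
    return characters / CHARS_PER_HOUR


def cost_per_audio_hour(price_per_million: float) -> float:
    """Estimate USD per hour of generated audio at a per-million-character price."""
    return CHARS_PER_HOUR / 1_000_000 * price_per_million


if __name__ == "__main__":
    print(f"10M characters: about {estimated_hours(10_000_000):.0f} hours")
    print(f"Neural: about ${cost_per_audio_hour(16.00):.2f} per audio hour")
```

Thinking in cost per audio hour rather than cost per character often makes it easier to compare Polly against subscription plans that are sold by minutes.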

Hidden Costs: What AWS Doesn't Highlight

Polly's per-character prices are straightforward. The costs that surprise people are the AWS infrastructure charges around Polly. These won't appear on Polly's pricing page, but they'll show up on your AWS bill:

Cost	Price	When It Applies
S3 Storage	~$0.023/GB	Storing generated audio files
S3 Data Transfer	~$0.09/GB after 100 GB free	Serving audio files to users/clients
Lambda	$0.20/1M requests	If using serverless functions to trigger Polly
CloudWatch	~$0.30/metric/mo (detailed)	Monitoring usage; basic metrics are free

In practice, these ancillary costs are minimal for most use cases. A serverless architecture (Lambda + S3 + Step Functions) costs roughly $5/month in infrastructure on top of Polly's character charges. The big cost savings tip: cache and replay generated audio. Polly doesn't charge for replaying previously generated speech. Pre-generate common phrases and store them in S3 — this eliminates re-synthesis charges entirely for repeated content like IVR prompts, notifications, and standard greetings.
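
The caching tip can be sketched as follows. This is a simplified illustration, not a production design: the `AudioCache` class is hypothetical, an in-memory dictionary stands in for an S3 bucket, and the `synthesize` callable is whatever wrapper you use around Polly. The key idea is to derive a stable key from engine, voice, and text so identical requests never reach Polly twice.

```python
import hashlib
from typing import Callable, Dict


class AudioCache:
    """Cache synthesized audio keyed by a hash of engine, voice, and text."""

    def __init__(self, synthesize: Callable[[str, str, str], bytes]) -> None:
        self._synthesize = synthesize        # e.g. a wrapper around SynthesizeSpeech
        self._store: Dict[str, bytes] = {}   # stand-in for an S3 bucket
        self.billed_characters = 0           # characters actually sent for synthesis

    @staticmethod
    def key(text: str, engine: str, voice_id: str) -> str:
        """Build a deterministic object key for this text/engine/voice combination."""
        raw = f"{engine}\x00{voice_id}\x00{text}".encode("utf-8")
        return f"polly-cache/{engine}/{voice_id}/{hashlib.sha256(raw).hexdigest()}.mp3"

    def get_audio(self, text: str, engine: str, voice_id: str) -> bytes:
        """Return cached audio, synthesizing only on the first request."""
        k = self.key(text, engine, voice_id)
        if k not in self._store:
            self._store[k] = self._synthesize(text, engine, voice_id)
            self.billed_characters += len(text)
        return self._store[k]
```

Including the engine and voice in the key matters: the same sentence rendered by a different voice is different audio and has to be synthesized separately.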

SSML Is Free

SSML markup tags are not counted as billed characters. Only the actual spoken text counts toward your bill. This means you can use extensive SSML formatting — pauses, pronunciation overrides, emphasis, pitch and rate changes — without increasing costs. SSML is essentially a free way to improve audio quality. The SynthesizeSpeech API accepts up to 6,000 total characters per request, of which no more than 3,000 can be billable text.
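
If you want a rough pre-flight check against those limits, something like the sketch below may help. It approximates billed characters by stripping anything that looks like an SSML tag with a regular expression. That is an assumption about how the count works, not AWS's actual implementation, and it does not handle edge cases such as XML entities. The function names are made up for illustration.

```python
import re

MAX_BILLED_CHARS = 3000  # billable text per SynthesizeSpeech request
MAX_TOTAL_CHARS = 6000   # total input length, SSML tags included

# Matches anything shaped like an SSML/XML tag, e.g. <speak> or <break time="1s"/>.
_TAG = re.compile(r"<[^>]+>")


def billed_characters(ssml: str) -> int:
    """Approximate the billed character count by removing SSML tags."""
    return len(_TAG.sub("", ssml))


def fits_in_one_request(ssml: str) -> bool:
    """Check both the total-length and billed-length limits for one request."""
    return (
        len(ssml) <= MAX_TOTAL_CHARS
        and billed_characters(ssml) <= MAX_BILLED_CHARS
    )


if __name__ == "__main__":
    sample = '<speak>Hello <break time="500ms"/>world</speak>'
    print(len(sample), billed_characters(sample))
```

Text that fails the check needs to be split into multiple requests, ideally at sentence boundaries so the audio joins cleanly.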

Amazon Polly Pricing vs. Every Major Competitor

The TTS market has gotten crowded in 2026. Here's how Polly stacks up on price per million characters across every major service you're likely evaluating:

Service	Best Price/1M Chars	Premium Price/1M Chars	Free Tier
Amazon Polly	$4 (Standard)	$100 (Long-Form)	5M/mo × 12 mo
Grok TTS (xAI)	$4.20	$4.20	None (beta)
Google Cloud TTS	$4 (Standard)	$30 (Studio/Chirp 3)	4M/mo × 12 mo
Murf Falcon API	$10 (conversational)	$30 (studio quality)	None
OpenAI TTS	$15 (TTS-1)	$30 (TTS-1-HD)	None
Inworld TTS	$25 (Mini, std pricing)	$50 (Max, std pricing)	Founder pricing until May 7
ElevenLabs	$60 (Flash API)	$300 (Creator overage)	10K chars/mo
Chatterbox	$0 (self-hosted)	Compute costs only	Fully free (MIT license)

Polly's Standard and Neural engines are among the cheapest commercial TTS APIs available. On the Standard tier, only Google Cloud TTS ($4/1M) and Grok TTS ($4.20/1M) are in the same price range, and Grok is still in beta with a smaller voice library. Google Cloud TTS matches Polly's Standard and Neural pricing exactly ($4/$16), making the choice between them mostly about ecosystem preference (AWS vs. GCP). For the full cross-service comparison, visit our TTS pricing comparison page.

When Amazon Polly Is (and Isn't) the Right Choice

Polly Wins When:

You're already on AWS. Polly integrates natively with S3, Lambda, Connect, Lex, and other AWS services. No additional authentication, no external API calls, no data leaving the AWS ecosystem.
High volume is the priority. At $4–$16/1M characters, Polly is 4–15x cheaper than ElevenLabs at scale. For producing hundreds of hours per month, nothing commercial beats Polly on price except self-hosting Chatterbox.
IVR and phone systems. Amazon Connect + Polly is the standard stack for enterprise call centers. SSML support, Speech Marks for lip-sync, and low latency make it ideal for telephony.
You need SSML control. Polly has the most robust SSML implementation of any TTS service. Full support for prosody, phoneme, emphasis, break, and say-as tags — and SSML markup doesn't count toward your character billing.
The free tier covers your needs. 5M standard characters/month for 12 months is enough for development, prototyping, and low-volume production without spending a cent.

Skip Polly When:

Voice quality is your top priority. ElevenLabs, Inworld TTS-1.5 Max, and even OpenAI TTS produce more natural-sounding speech than Polly's best engines. If your content is customer-facing and quality matters more than cost, test those alternatives first.
You need voice cloning. Polly doesn't offer it. ElevenLabs and Chatterbox both provide voice cloning (Chatterbox for free).
You want a studio editor. Murf AI is the only major TTS platform with a full visual timeline editor. Polly is API-only (plus a basic AWS Console demo).
You're not comfortable with AWS. Polly requires an AWS account, IAM configuration, and familiarity with AWS SDK or CLI. The learning curve is real for non-developers.
You need the lowest-latency real-time conversational AI. While the Generative engine added bidirectional streaming in March 2026, newer services like Cartesia Sonic (<40ms TTFA) and Murf Falcon (130ms TTFA) have faster real-time performance.

For alternatives, check our Amazon Polly alternatives page or the TTS API comparison.

5 Ways to Cut Your Amazon Polly Bill

Cache everything. Store generated audio in S3 and replay it. Polly doesn't charge for replaying cached audio. For IVR systems with repeated greetings and prompts, this alone can cut synthesis costs by 80%+.
Use Standard for non-critical audio. System notifications, automated alerts, and internal tools don't need Neural quality. Save the $16/1M Neural rate for customer-facing content and use the $4/1M Standard rate for everything else.
Go serverless. Lambda + S3 + Step Functions costs roughly $5/month in infrastructure. An EC2-based approach runs ~$70/month. That's $780/year in savings on infrastructure alone.
Leverage SSML (it's free). SSML tags aren't billed. Use prosody, emphasis, and break tags to improve quality without regenerating content at a higher-quality (and more expensive) engine tier.
Set billing alerts. AWS Billing Alerts notify you when spending exceeds a threshold. Set one before you start, not after you get a surprise bill.
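
To put a number on the second tip, here is a small sketch that estimates a blended bill when only part of your traffic goes to Neural and the rest to Standard. The function name `blended_monthly_cost` and the `neural_share` parameter are hypothetical; the prices are the list rates used throughout this article.

```python
# USD per 1 million characters, from the pricing table.
PRICE_PER_MILLION = {"standard": 4.00, "neural": 16.00}


def blended_monthly_cost(total_characters: int, neural_share: float) -> float:
    """Estimate monthly cost when `neural_share` of characters use Neural
    and the remainder use Standard."""
    if not 0.0 <= neural_share <= 1.0:
        raise ValueError("neural_share must be between 0 and 1")
    neural_chars = total_characters * neural_share
    standard_chars = total_characters - neural_chars
    return (
        neural_chars * PRICE_PER_MILLION["neural"]
        + standard_chars * PRICE_PER_MILLION["standard"]
    ) / 1_000_000


if __name__ == "__main__":
    all_neural = blended_monthly_cost(10_000_000, 1.0)
    mixed = blended_monthly_cost(10_000_000, 0.3)
    print(f"All Neural: ${all_neural:.2f}, 30% Neural: ${mixed:.2f}")
```

At 10 million characters per month, routing 70% of traffic to Standard would bring the estimate down from $160 to about $76 under these assumptions.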

What Changed in March 2026

Amazon expanded Polly's Generative engine significantly in March 2026:

10 new Generative voices added to the library
2 new AWS regions: Europe (London) and Canada (Central), bringing total Generative availability to 8 regions
Bidirectional streaming API: Stream text to Polly and receive audio back simultaneously. Designed for chatbots, game characters, and LLM-driven conversational AI where you can't wait for the full text before starting synthesis.
Total voice library now exceeds 100 voices across 40+ languages

The bidirectional streaming is the most significant update — it positions Polly's Generative engine as a real option for real-time voice agents, competing with Murf Falcon, Cartesia Sonic, and Deepgram Aura-2 in that market. Explore all of Polly's features on our Amazon Polly overview page.

Which Engine Should You Choose?

Standard ($4/1M): IVR, notifications, automated alerts, accessibility where cost beats quality.
Neural ($16/1M): Customer-facing content, video narration, e-learning — the sweet spot for most production use cases.
Generative ($30/1M): Real-time voice agents, chatbots, conversational AI where streaming and expressiveness matter.
Long-Form ($100/1M): Audiobooks and extended narration only. The price premium is only worth it if you need sustained quality over 30+ minute passages.

Start with the free tier to test all four engines, estimate your monthly volume with the TTS cost calculator, and compare Polly against all TTS services before committing. For alternatives, browse our Amazon Polly alternatives page.