Guide · 12 min read · April 22, 2026

Amazon Polly Pricing 2026: All 4 Engines, Free Tier, and Real Costs

Amazon Polly costs $4-$100 per 1M characters depending on engine. Breakdown of Standard, Neural, Generative, and Long-Form pricing with real-world cost examples and hidden AWS fees.

Amazon Polly Pricing: The Quick Answer

Amazon Polly charges per million characters, with prices ranging from $4 to $100 per 1M characters depending on which engine you use. There are four engines, each targeting different quality levels and use cases. The free tier gives you 5 million standard characters per month for 12 months — that's roughly 115 hours of audio at no cost.

| Engine | Price per 1M Chars | Voice Quality | Best For |
|---|---|---|---|
| Standard | $4.00 | Functional, somewhat robotic | IVR, notifications, high-volume automation |
| Neural | $16.00 | Natural-sounding, smooth | Customer-facing content, podcasts, videos |
| Generative | $30.00 | Expressive, conversational | Voice agents, chatbots, LLM-driven apps |
| Long-Form | $100.00 | Highest quality, sustained coherence | Audiobooks, long narration, accessibility |

Bottom Line

Most teams should start with the Neural engine at $16/1M characters. It's the sweet spot between cost and quality: natural enough for customer-facing content, about one-sixth the price of Long-Form, and with voices significantly better than Standard's. Standard at $4/1M is fine for automated notifications and IVR systems where quality matters less than cost. Skip Long-Form ($100/1M) unless you're producing full audiobooks.

Amazon Polly's Four Engines Explained

Amazon Polly isn't one service — it's four engines at four different price points. The engine you choose determines voice quality, available voices, and supported features. This is the part that catches people off guard. A quick “Amazon Polly is $4 per million characters” is technically true but misleading. That $4 rate gets you Standard voices, which sound noticeably robotic compared to modern TTS alternatives.

Standard Engine — $4/1M Characters

The cheapest option and the oldest technology. Standard voices use concatenative synthesis — stitching together pre-recorded speech segments. The result is functional but clearly synthetic. You'll hear occasional awkward pauses, inconsistent intonation, and a “reading a script” quality that's hard to mistake for human speech.

That said, $4/1M characters is absurdly cheap for production TTS. For automated phone menus, system notifications, accessibility screen readers, and any use case where voice quality isn't the priority, Standard delivers reliable audio at a price that's hard to beat. Standard voices are available in 33+ languages with SSML support for pronunciation, emphasis, and pacing control.

Neural Engine — $16/1M Characters

The Neural engine uses deep learning models to produce more natural-sounding speech. It's 4x the price of Standard but the quality jump is substantial — smoother intonation, more natural pacing, and better handling of complex sentences. Neural voices sound professional enough for customer-facing content, video narration, and e-learning materials.

Neural is the engine I recommend for most production use cases. At $16/1M characters, it costs about the same as OpenAI's TTS-1 ($15/1M), which sounds lower quality than Polly Neural, and dramatically less than ElevenLabs ($60–$300/1M). The main limitation is fewer voice options than Standard: not every Standard voice has a Neural equivalent.
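Since not every voice exists for every engine, it helps to confirm coverage before committing to a voice. The sketch below assumes boto3: Polly's `describe_voices` call accepts an `Engine` filter and returns each voice's `SupportedEngines`, and the loop follows `NextToken` in case results span several pages. `voices_for_engine` is a hypothetical helper; the client is passed in so the function can be tested without AWS credentials.

```python
from typing import Any


def voices_for_engine(polly: Any, engine: str, language_code: str = "en-US") -> list[str]:
    """Return sorted IDs of voices in `language_code` that support `engine`.

    `polly` is a boto3 Polly client (or any object with the same
    describe_voices method). `engine` is one of "standard", "neural",
    "long-form", or "generative".
    """
    voice_ids: set[str] = set()
    kwargs: dict[str, str] = {"Engine": engine, "LanguageCode": language_code}
    while True:
        response = polly.describe_voices(**kwargs)
        for voice in response["Voices"]:
            # Double-check the engine list rather than relying on the filter alone.
            if engine in voice.get("SupportedEngines", []):
                voice_ids.add(voice["Id"])
        token = response.get("NextToken")
        if not token:
            return sorted(voice_ids)
        kwargs["NextToken"] = token  # fetch the next page


# Example (needs boto3 and AWS credentials):
#   import boto3
#   polly = boto3.client("polly", region_name="us-east-1")
#   neural = set(voices_for_engine(polly, "neural"))
#   standard = set(voices_for_engine(polly, "standard"))
#   print("Standard-only voices:", sorted(standard - neural))
```

Comparing the Standard and Neural sets this way shows which voices you would lose by moving up an engine.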

Generative Engine — $30/1M Characters

Added in 2024 and significantly expanded in March 2026, the Generative engine uses a billion-parameter transformer model. AWS describes it as producing speech that's “assertive, emotionally engaged, and highly colloquial” — and in practice, it does sound more conversational than Neural. The key feature is bidirectional streaming: you can stream text to Polly and receive synthesized audio back simultaneously, making it suitable for real-time chatbots and LLM-powered voice agents.

As of March 2026, Generative has 10+ voices and is available in 8 AWS regions including US East, US West, Europe (Frankfurt and London), Canada, and Asia Pacific (Seoul, Singapore, Tokyo). At $30/1M characters, it matches OpenAI's TTS-1-HD pricing and competes with Google Cloud TTS Studio voices.

Long-Form Engine — $100/1M Characters

The premium tier, designed specifically for content that runs longer than a few paragraphs. Long-Form voices maintain consistent quality, pacing, and tone across extended passages — audiobooks, full articles, long-form accessibility content. Most TTS engines degrade or become monotone over longer texts. Long-Form is engineered to avoid that.

At $100/1M characters, Long-Form is the most expensive Polly option by far: 25x the price of Standard. A 120,000-character book (novella length, roughly 2.7 hours of audio) costs $12 with Long-Form vs. $1.92 with Neural vs. $0.48 with Standard. Unless you're producing actual audiobooks or very long narrated content where sustained quality matters, Neural is the better value. For audiobook-specific guidance, see our best TTS for audiobooks guide.

Amazon Polly Free Tier: More Generous Than You Think

Amazon Polly's free tier is one of the most generous in the TTS market. For the first 12 months after creating an AWS account, you get:

| Engine | Free Characters/Month | Approx. Audio Hours | Annual Value |
|---|---|---|---|
| Standard | 5,000,000 | ~115 hours | $240 saved |
| Neural | 1,000,000 | ~23 hours | $192 saved |
| Long-Form | 500,000 | ~11.5 hours | $600 saved |
| Generative | 100,000 | ~2.3 hours | $36 saved |

For context: 5 million Standard characters per month is enough to synthesize roughly 115 hours of audio — that's nearly 5 full days of continuous speech. Even the Neural tier at 1 million characters gives you about 23 hours, which is more than enough for development, prototyping, and low-volume production. Starting July 2025, new AWS customers also receive up to $200 in general AWS Free Tier credits, which can be applied to Polly.

The catch: these allowances expire after 12 months, and there's no grace period. On month 13, you start paying full rates immediately. Set a calendar reminder before the anniversary of your AWS account creation. For more details on free TTS options, see our free text-to-speech comparison.
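The month-13 cliff is easy to model. This sketch assumes the allowances in the free-tier table apply in months 1 through 12 of account age, reset each month, and do not roll over; `FREE_CHARS`, `RATES`, and `monthly_bill` are names made up for this example, and the rates are the list prices quoted in this guide.

```python
# Free-tier allowances (characters per month) and list prices (USD per 1M
# characters) as quoted in this guide.
FREE_CHARS = {"standard": 5_000_000, "neural": 1_000_000,
              "long-form": 500_000, "generative": 100_000}
RATES = {"standard": 4.00, "neural": 16.00,
         "long-form": 100.00, "generative": 30.00}
FREE_TIER_MONTHS = 12


def monthly_bill(engine: str, chars_used: int, account_month: int) -> float:
    """Polly character charges (USD) for one month of usage.

    account_month is 1 for the first month after the AWS account was created.
    Assumes the allowance resets each month, unused characters do not roll
    over, and the allowance disappears entirely from month 13 onward.
    """
    allowance = FREE_CHARS[engine] if account_month <= FREE_TIER_MONTHS else 0
    billable = max(0, chars_used - allowance)
    return round(billable / 1_000_000 * RATES[engine], 2)


# Same 1M Neural characters, before and after the free tier ends.
for month in (12, 13):
    print(f"month {month}: ${monthly_bill('neural', 1_000_000, month):.2f}")
```

With identical usage, the Neural bill goes from $0.00 in month 12 to $16.00 in month 13.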

How This Compares to Other Free Tiers

  • Amazon Polly: 5M standard chars/mo for 12 months (~115 hours/mo)
  • ElevenLabs: 10,000 chars/mo indefinitely (~10 minutes/mo)
  • Murf AI: 10 minutes total (watermarked, not commercial)
  • Speechify: Limited listening only (no downloads)
  • Chatterbox: Fully free, open source, unlimited (self-hosted)
  • Google Cloud TTS: 4M standard chars/mo for 12 months

Among the commercial services listed, only Google Cloud TTS (4M standard chars/mo) comes close to Polly's free tier; the others offer minutes of audio or listening-only access.

Real-World Cost Examples

Per-million rates are hard to judge until you map them to real workloads. Here's what common scenarios cost:

Blog Post Read Aloud (~5,000 characters)

| Engine | Cost |
|---|---|
| Standard | $0.02 |
| Neural | $0.08 |
| Generative | $0.15 |
| Long-Form | $0.50 |

YouTube Video Script (~50,000 characters, ~70 min narration)

| Engine | Cost |
|---|---|
| Standard | $0.20 |
| Neural | $0.80 |
| Generative | $1.50 |
| Long-Form | $5.00 |

Short Book / Novella (~120,000 characters)

| Engine | Cost | Audio Length |
|---|---|---|
| Standard | $0.48 | ~2.7 hours |
| Neural | $1.92 | ~2.7 hours |
| Long-Form | $12.00 | ~2.7 hours |

Enterprise Scale: 10M Characters/Month (~230 Hours)

| Engine | Monthly Cost | Annual Cost |
|---|---|---|
| Standard | $40 | $480 |
| Neural | $160 | $1,920 |
| Generative | $300 | $3,600 |
| Long-Form | $1,000 | $12,000 |

For most teams, the eye-opening number is the Neural column: 230 hours of natural-sounding audio per month for $160. ElevenLabs Pro costs $99/mo for 500K characters, roughly 11 hours of audio, so Polly Neural delivers about 20x the audio for about 1.6x the price. Use our TTS cost calculator to model your specific volume.
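Every table in this section follows the same arithmetic: characters ÷ 1,000,000 × the engine's rate. The sketch below reproduces the rows with a hypothetical `estimate` helper and also converts characters to audio time using roughly 43,500 characters per hour, the rate implied by this guide's “5M characters ≈ 115 hours” figure. That speaking rate is an assumption; real duration varies with voice, speed, and SSML pauses.

```python
from dataclasses import dataclass

# List prices quoted in this guide, USD per 1M characters.
RATES = {"standard": 4.00, "neural": 16.00, "generative": 30.00, "long-form": 100.00}

# Assumed narration speed, derived from "5M chars ≈ 115 hours" (about 43,478 chars/hour).
CHARS_PER_HOUR = 5_000_000 / 115


@dataclass
class Estimate:
    engine: str
    cost_usd: float
    audio_hours: float


def estimate(chars: int, engine: str) -> Estimate:
    """Cost and approximate audio length for synthesizing `chars` characters."""
    cost = chars / 1_000_000 * RATES[engine]
    return Estimate(engine, round(cost, 2), round(chars / CHARS_PER_HOUR, 1))


# Reproduce the scenarios above.
scenarios = {"blog post": 5_000, "video script": 50_000,
             "short book": 120_000, "enterprise month": 10_000_000}
for name, chars in scenarios.items():
    hours = estimate(chars, "neural").audio_hours
    costs = ", ".join(f"{eng} ${estimate(chars, eng).cost_usd:,.2f}" for eng in RATES)
    print(f"{name} (~{hours} h of audio): {costs}")
```

Swapping in your own monthly character count gives the same numbers as the tables, which makes it easy to see where an engine upgrade stops being affordable.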

Hidden Costs: What AWS Doesn't Highlight

Polly's per-character prices are straightforward. The costs that surprise people are the AWS infrastructure charges around Polly. These won't appear on Polly's pricing page, but they'll show up on your AWS bill:

| Cost | Price | When It Applies |
|---|---|---|
| S3 Storage | ~$0.023/GB per month | Storing generated audio files |
| S3 Data Transfer | ~$0.09/GB after 100 GB free | Serving audio files to users/clients |
| Lambda | $0.20/1M requests (plus compute duration) | If using serverless functions to trigger Polly |
| CloudWatch | ~$0.30/metric/mo (detailed) | Monitoring usage; basic metrics are free |

In practice, these ancillary costs are minimal for most use cases. A serverless architecture (Lambda + S3 + Step Functions) costs roughly $5/month in infrastructure on top of Polly's character charges. The biggest cost-saving tip: cache and replay generated audio. Polly bills characters when it synthesizes them, not when the audio is played, so replaying a stored file costs nothing beyond storage and transfer. Pre-generate common phrases and store them in S3; this eliminates re-synthesis charges entirely for repeated content like IVR prompts, notifications, and standard greetings.
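A minimal sketch of the cache-and-replay pattern, assuming boto3, MP3 output, and an S3 bucket you already own. The object key is a SHA-256 hash of the text plus every parameter that changes the audio (engine, voice, format), so a stored file is only reused for an identical request. `cache_key`, `cached_speech`, and the `tts-cache/` prefix are hypothetical names; the Polly and S3 clients are passed in so the logic can be tested without AWS.

```python
import hashlib
from typing import Any


def cache_key(text: str, voice_id: str, engine: str, output_format: str = "mp3") -> str:
    """Deterministic S3 key: identical text, voice, engine, and format share one object."""
    fingerprint = "\x1f".join([engine, voice_id, output_format, text])
    digest = hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()
    return f"tts-cache/{digest}.{output_format}"


def cached_speech(polly: Any, s3: Any, bucket: str, text: str,
                  voice_id: str, engine: str = "neural") -> bytes:
    """Return MP3 audio from S3 if it exists; otherwise synthesize once and store it."""
    key = cache_key(text, voice_id, engine, "mp3")
    try:
        # Cache hit: an S3 GET, no Polly characters billed.
        return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    except s3.exceptions.NoSuchKey:
        # Cache miss. Without s3:ListBucket permission, S3 reports a missing
        # key as AccessDenied instead, which this sketch does not catch.
        pass

    audio = polly.synthesize_speech(
        Engine=engine, OutputFormat="mp3", Text=text, VoiceId=voice_id
    )["AudioStream"].read()
    s3.put_object(Bucket=bucket, Key=key, Body=audio, ContentType="audio/mpeg")
    return audio


# Example (needs boto3, AWS credentials, and a bucket you own; names are placeholders):
#   import boto3
#   audio = cached_speech(boto3.client("polly"), boto3.client("s3"),
#                         "my-audio-bucket", "Press 1 for billing.", "Joanna")
```

Hashing rather than using the raw text as the key keeps keys short and safe for arbitrary input, and including the engine in the hash means upgrading from Standard to Neural naturally regenerates the audio instead of serving stale files.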

SSML Is Free

SSML markup tags are not counted as billed characters. Only the actual spoken text counts toward your bill. This means you can use extensive SSML formatting — pauses, pronunciation overrides, emphasis, pitch and rate changes — without increasing costs. SSML is essentially a free way to improve audio quality. The SynthesizeSpeech API accepts up to 6,000 total characters per request, of which no more than 3,000 can be billable text.
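Because only spoken text is billed and each request is capped at 3,000 billable characters, long scripts have to be split before synthesis. This sketch approximates the billed count by stripping SSML tags with a regular expression and splits plain text at sentence boundaries so no chunk exceeds the cap. The regex approach and the helper names `billable_chars` and `chunk_text` are assumptions for illustration; entities such as `&amp;` are not decoded, so treat the count as an estimate rather than what AWS will bill.

```python
import re

# SynthesizeSpeech limit on billed characters per request, as stated in this guide.
MAX_BILLABLE_PER_REQUEST = 3_000

_SSML_TAG = re.compile(r"<[^>]+>")
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def billable_chars(ssml: str) -> int:
    """Rough billed-character count: length of the text left after removing tags."""
    return len(_SSML_TAG.sub("", ssml))


def chunk_text(text: str, limit: int = MAX_BILLABLE_PER_REQUEST) -> list[str]:
    """Split plain text into chunks of at most `limit` characters.

    Prefers sentence boundaries; sentences are re-joined with single spaces.
    A sentence longer than `limit` is hard-split so no chunk exceeds the cap.
    """
    chunks: list[str] = []
    current = ""
    for sentence in _SENTENCE_BREAK.split(text.strip()):
        while len(sentence) > limit:  # oversized sentence: flush, then hard-split
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

For SSML input the same idea applies, but splits must not land inside a tag, and each chunk needs its own `<speak>` wrapper.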

Amazon Polly Pricing vs. Every Major Competitor

The TTS market has gotten crowded in 2026. Here's how Polly stacks up on price per million characters across every major service you're likely evaluating:

| Service | Best Price/1M Chars | Premium Price/1M Chars | Free Tier |
|---|---|---|---|
| Amazon Polly | $4 (Standard) | $100 (Long-Form) | 5M/mo × 12 mo |
| Grok TTS (xAI) | $4.20 | $4.20 | None (beta) |
| Google Cloud TTS | $4 (Standard) | $30 (Studio/Chirp 3) | 4M/mo × 12 mo |
| Murf Falcon API | $10 (conversational) | $30 (studio quality) | None |
| OpenAI TTS | $15 (TTS-1) | $30 (TTS-1-HD) | None |
| Inworld TTS | $25 (Mini, std pricing) | $50 (Max, std pricing) | Founder pricing until May 7 |
| ElevenLabs | $60 (Flash API) | $300 (Creator overage) | 10K chars/mo |
| Chatterbox | $0 (self-hosted) | Compute costs only | Fully free (MIT license) |

Polly's Standard and Neural engines are among the cheapest commercial TTS APIs available. Google Cloud TTS matches Polly's Standard and Neural pricing exactly ($4/$16), making the choice between them mostly about ecosystem preference (AWS vs. GCP). Among the other paid APIs, only Grok TTS ($4.20/1M) comes close to Standard on price, and Grok is still in beta with a smaller voice library. For the full cross-service comparison, visit our TTS pricing comparison page.

When Amazon Polly Is (and Isn't) the Right Choice

Polly Wins When:

Skip Polly When:

For alternatives, check our Amazon Polly alternatives page or the TTS API comparison.

5 Ways to Cut Your Amazon Polly Bill

What Changed in March 2026

Amazon expanded Polly's Generative engine significantly in March 2026:

The bidirectional streaming is the most significant update — it positions Polly's Generative engine as a real option for real-time voice agents, competing with Murf Falcon, Cartesia Sonic, and Deepgram Aura-2 in that market. Explore all of Polly's features on our Amazon Polly overview page.

Which Engine Should You Choose?

  • Standard ($4/1M): IVR, notifications, automated alerts, accessibility where cost beats quality.
  • Neural ($16/1M): Customer-facing content, video narration, e-learning — the sweet spot for most production use cases.
  • Generative ($30/1M): Real-time voice agents, chatbots, conversational AI where streaming and expressiveness matter.
  • Long-Form ($100/1M): Audiobooks and extended narration only. The price premium is only worth it if you need sustained quality over 30+ minute passages.

Start with the free tier to test all four engines, estimate your monthly volume with the TTS cost calculator, and compare Polly against all TTS services before committing. For alternatives, browse our Amazon Polly alternatives page.