How Much Does Google Cloud TTS Cost?
Google Cloud Text-to-Speech costs $4 per million characters for Standard and WaveNet voices, $16/1M for Neural2, $30/1M for Chirp 3: HD, and $160/1M for Studio voices. The free tier covers 4M Standard characters per month, roughly 83 hours of audio. WaveNet voices dropped to $4/1M in early 2026, making them the best value in Google's lineup: you get noticeably better quality than Standard at the exact same price.
Google's pricing model is pay-per-character with no monthly subscriptions or committed-use discounts. Every character counts — spaces, newlines, and SSML tags all hit your bill. New Google Cloud accounts get a $300 credit that covers all services including TTS, which buys you 75 million WaveNet characters before you pay a cent. That's enough to thoroughly evaluate every voice type before committing.
Compared to competitors, Google Cloud sits at the budget end of the spectrum. WaveNet at $4/1M undercuts OpenAI TTS ($15/1M), ElevenLabs ($24–$330/mo), and Deepgram Aura-2 ($30/1M). The trade-off is voice quality — Google's voices sound functional but rarely natural enough for content creation. They're built for IVR systems, Dialogflow bots, and accessibility features, not audiobook narration.
Pricing by Voice Type
Google Cloud TTS has seven distinct voice types, each at a different price point. This is unusual — most competitors like OpenAI or ElevenLabs charge a flat rate for all voices. Google's tiered approach means you can optimize costs by matching voice quality to your use case.
| Voice Type | Price/1M Chars | Free Tier | Quality Level | Best For |
|---|---|---|---|---|
| Standard | $4 | 4M chars/mo | Basic | High-volume, low-quality-bar apps |
| WaveNet | $4 | 1M chars/mo | Good | Best value — same price, better quality |
| Neural2 | $16 | 1M chars/mo | Very Good | Customer-facing apps, mid-budget |
| Polyglot | $16 | 1M chars/mo | Very Good | Multilingual apps (same voice, many languages) |
| Chirp 3: HD | $30 | 1M chars/mo | Excellent | High-quality content, voice agents |
| Studio | $160 | 100K chars/mo | Premium | Brand voices, broadcast, ads |
| Instant Custom Voice | $60 | — | Varies | Custom brand voice cloning |
Gemini 2.5 Flash TTS and Gemini 2.5 Pro TTS are also available through Google Cloud but priced separately under Vertex AI. These require contacting sales for custom quotes and aren't part of the standard Cloud TTS API. If you're interested in Gemini's TTS capabilities specifically, check our best TTS API guide for a direct comparison.
Understanding the Voice Types
Google's naming is confusing. Seven voice types with overlapping quality tiers and different free allowances — it's a lot. Here's what actually matters at each price point.
Standard & WaveNet — $4/1M Characters
Standard voices are the oldest in Google's lineup — basic concatenative synthesis that sounds robotic by 2026 standards. They exist mainly for backward compatibility. WaveNet voices, originally based on DeepMind's WaveNet research, used to cost $16/1M but dropped to $4 in early 2026. This price drop is the single biggest change in Google's TTS pricing in years.
The implication is clear: there's almost no reason to use Standard voices anymore. WaveNet sounds noticeably better — smoother prosody, more natural intonation — and costs exactly the same. The only edge Standard has is a larger free tier (4M vs 1M characters). If you're past the free tier, always pick WaveNet.
At $4/1M, WaveNet undercuts nearly every commercial TTS API. For context, Amazon Polly Standard also charges $4/1M, but Polly Neural is $16/1M. OpenAI is $15/1M. Google's WaveNet at $4/1M is the cheapest way to get decent-quality cloud TTS from a major provider.
Neural2 & Polyglot — $16/1M Characters
Neural2 builds on WaveNet with better prosody modeling and more natural pausing. The actual difference: Neural2 handles sentence boundaries and emphasis more convincingly. Read a complex sentence with a parenthetical clause aloud — WaveNet plows through it at a flat pace, while Neural2 drops its pitch and slows slightly for the aside, the way a human would. Whether that's worth 4x the price depends on context. For a Dialogflow bot answering customer questions, Neural2's improved cadence reduces user confusion on longer responses. For a notification system reading "Your order has shipped," WaveNet is indistinguishable.
Polyglot voices are priced identically to Neural2 but serve a different purpose: a single voice that speaks multiple languages with consistent tone and personality. This is valuable for multilingual apps where you want the same "character" across languages, rather than switching to a completely different voice per locale. Not many competitors offer this — it's a genuine differentiator.
Chirp 3: HD — $30/1M Characters
Launched in March 2026, Chirp 3: HD is Google's most natural TTS model. It uses a generative approach similar to what you hear from ElevenLabs or Fish Audio — more expressive, better at conveying emotion, and capable of more natural conversational rhythm. Ten initial voices (Tiffany, Brian, Aria, and others) with distinct personalities.
At $30/1M, Chirp 3: HD is priced identically to Deepgram Aura-2 and double OpenAI TTS ($15/1M). The quality is competitive with mid-tier commercial TTS but doesn't quite match ElevenLabs v3 or Fish Audio S2 in blind listening tests. If you're already on Google Cloud and need better-than-Neural2 quality, Chirp 3: HD is the obvious upgrade path. If you're starting fresh and quality is the top priority, I'd look at the competition first.
Studio — $160/1M Characters
Studio voices are Google's premium tier: hand-tuned, professionally recorded voices designed for broadcast and advertising. At $160/1M characters, they cost 40x as much as WaveNet. The free tier is tiny (100K chars, roughly 2 hours of audio).
Honestly, it's hard to justify Studio pricing in 2026. ElevenLabs offers better-sounding voices at lower effective rates, and the new Chirp 3: HD voices get close enough for most use cases at $30/1M. Studio made sense when it was the only high-quality option in Google's lineup, but that ship has sailed.
Instant Custom Voice — $60/1M Characters
New in 2026, Instant Custom Voice lets you clone a voice from a short audio sample. You upload a recording and Google generates a TTS voice that approximates the speaker's characteristics. At $60/1M, it's expensive but cheaper than ElevenLabs' Professional Voice Clone. The quality depends heavily on the input audio — clean studio recordings produce decent results, noisy phone recordings don't. No free tier is available for this feature.
Free Tier: How Much Audio Can You Actually Generate?
Google's free tier resets monthly and doesn't expire — it's a permanent allowance, not a trial. At roughly 800 characters per minute of speech, here's what each tier translates to in real audio output:
| Voice Type | Free Chars/Month | Hours of Audio | ~Word Count |
|---|---|---|---|
| Standard | 4,000,000 | ~83 hours | ~666,000 words |
| WaveNet | 1,000,000 | ~21 hours | ~166,000 words |
| Neural2 | 1,000,000 | ~21 hours | ~166,000 words |
| Chirp 3: HD | 1,000,000 | ~21 hours | ~166,000 words |
| Studio | 100,000 | ~2 hours | ~16,600 words |
The free tiers stack: you can use 4M Standard characters AND 1M WaveNet characters AND 1M Neural2 characters in the same month. Those three alone come to about 125 hours of free audio, and adding the Chirp 3: HD and Studio allowances brings the total to roughly 148 hours. For small projects, you might never pay anything.
On top of the monthly free tier, new Google Cloud accounts get a $300 credit valid for 90 days. At WaveNet rates ($4/1M), that $300 buys 75 million characters — roughly 1,562 hours of audio. It's the most generous new-account offer in the TTS space.
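If you want to redo the hours math with your own speaking rate, here is a minimal Python sketch. It assumes the same rough 800 characters per minute used above; the function names are illustrative, and real audio length will vary with voice, language, and speaking rate.

```python
CHARS_PER_MINUTE = 800  # rough speaking-rate assumption from this article

# Monthly free allowances as listed in the table above.
FREE_CHARS = {
    "Standard": 4_000_000,
    "WaveNet": 1_000_000,
    "Neural2": 1_000_000,
    "Chirp 3: HD": 1_000_000,
    "Studio": 100_000,
}


def chars_to_hours(chars: int, chars_per_minute: int = CHARS_PER_MINUTE) -> float:
    """Convert a character count into approximate hours of speech."""
    return chars / chars_per_minute / 60


def credit_to_chars(credit_usd: float, price_per_million: float) -> int:
    """How many characters a credit buys at a given per-million rate."""
    return int(credit_usd / price_per_million * 1_000_000)


for voice, chars in FREE_CHARS.items():
    print(f"{voice}: ~{chars_to_hours(chars):.1f} hours/month free")

credit_chars = credit_to_chars(300, 4)  # $300 credit at the WaveNet rate
print(f"$300 credit: {credit_chars:,} chars, ~{chars_to_hours(credit_chars):.1f} hours")
```

Change `CHARS_PER_MINUTE` if your content is read faster or slower; every hours figure scales with it.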
How Does Google's Free Tier Compare?
- Google Cloud TTS: 4M Standard + 1M WaveNet + 1M Neural2 + 1M Chirp 3 + 100K Studio chars/month (permanent) + $300 new account credit
- Amazon Polly: 5M Standard + 1M Neural chars — but only for 12 months, then it's gone
- Azure Cognitive Services: 500K chars/month (permanent) across all neural voices
- OpenAI TTS: No free tier — pay from the first character
- ElevenLabs: ~10,000 credits/month (roughly 10 minutes of audio)
Google wins the free tier comparison decisively. The combination of permanent monthly allowances plus the $300 new account credit means you can run a production TTS system for months without spending anything. Use our TTS cost calculator to model your specific usage and see exactly when you'd hit the paid tier.
Real-World Cost Examples
Abstract per-million pricing doesn't mean much without context. Here's what Google Cloud TTS actually costs for three common scenarios, calculated after subtracting the free tier:
| Use Case | Volume | Standard ($4) | Neural2 ($16) | Chirp 3 HD ($30) |
|---|---|---|---|---|
| Blog Narrator | 100K chars/mo | $0 (free tier) | $0 (free tier) | $0 (free tier) |
| Customer Service Bot | 5M chars/mo | $4/mo | $64/mo | $120/mo |
| Enterprise IVR | 50M chars/mo | $184/mo | $784/mo | $1,470/mo |
The cost math for the customer service bot: 5M chars minus 1M free (Neural2) = 4M billable at $16/1M = $64/month. For WaveNet at the same volume: 5M minus 1M free = 4M billable at $4/1M = just $16/month. That's why the WaveNet price drop matters so much — you can run a medium-scale bot for the price of a few coffees.
At enterprise scale (50M chars/month), the voice type choice becomes a serious cost decision: Standard at $184/month (WaveNet, with its smaller 1M free allowance, comes to $196) vs Neural2 at $784 vs Chirp 3: HD at $1,470. That's roughly an 8x difference between the cheapest and the best quality. Most enterprises I've seen use WaveNet or Neural2 for IVR and reserve Chirp 3: HD for customer-facing conversational AI where quality directly impacts user satisfaction.
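The arithmetic behind these scenarios is easy to reproduce. The sketch below hard-codes the per-million rates and free allowances quoted in this article (check them against Google's pricing page before relying on them) and ignores the billing gotchas covered later, such as SSML overhead and egress.

```python
# Rates (USD per 1M characters) and monthly free allowances as quoted in this article.
PRICE_PER_MILLION = {"Standard": 4, "WaveNet": 4, "Neural2": 16, "Chirp 3: HD": 30}
FREE_CHARS = {
    "Standard": 4_000_000,
    "WaveNet": 1_000_000,
    "Neural2": 1_000_000,
    "Chirp 3: HD": 1_000_000,
}


def monthly_cost(voice: str, chars: int) -> float:
    """Cost in USD for one month of usage after subtracting the free tier."""
    billable = max(0, chars - FREE_CHARS[voice])
    return billable * PRICE_PER_MILLION[voice] / 1_000_000


scenarios = {
    "Blog Narrator": 100_000,
    "Customer Service Bot": 5_000_000,
    "Enterprise IVR": 50_000_000,
}

for name, chars in scenarios.items():
    costs = ", ".join(f"{v}: ${monthly_cost(v, chars):,.0f}" for v in PRICE_PER_MILLION)
    print(f"{name} ({chars:,} chars/mo) -> {costs}")
```

Note that Standard and WaveNet share a rate but not a free allowance, so their bills differ slightly at the same volume (at 50M characters, $184 vs $196).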
Google Cloud TTS vs Amazon Polly vs Azure vs OpenAI
Here's how Google Cloud stacks up against every major TTS provider. I've verified each price as of June 2026. For the full interactive comparison, see our TTS pricing comparison page.
| Service | Price/1M Chars | Free Tier | Voices | Languages | Best For |
|---|---|---|---|---|---|
| Google Cloud (WaveNet) | $4 | 1M chars/mo | 700+ | 40+ | Budget TTS, Google Cloud apps |
| Google Cloud (Chirp 3: HD) | $30 | 1M chars/mo | 10 | 40+ | Quality within Google ecosystem |
| Amazon Polly Standard | $4 | 5M chars (12 mo only) | 60+ | 30+ | AWS-native, high-volume |
| Amazon Polly Neural | $16 | 1M chars (12 mo only) | 60+ | 30+ | AWS-native, quality upgrade |
| Azure Neural | $16 | 500K chars/mo | 500+ | 100+ | Azure apps, language breadth |
| Azure Neural HD | $22 | 500K chars/mo | 500+ | 100+ | Premium Azure quality |
| OpenAI TTS | $15 | None | 9 | 57+ | Simple API, good quality/price |
| ElevenLabs | $24–$330/mo | 10K credits/mo | 4,000+ | 70+ | Voice cloning, top quality |
| Deepgram Aura-2 | $30 | $200 credit | 40+ | 7 | Voice agents (STT+TTS bundle) |
| Fish Audio | $15 | 10K chars/day | Community | 80+ | Quality/price (#1 blind tests) |
Google Cloud WaveNet at $4/1M is the cheapest cloud TTS option alongside Amazon Polly Standard. But cheap doesn't always mean best value. OpenAI at $15/1M delivers considerably better voice quality with a much simpler API. Fish Audio at $15/1M tops blind listening tests. ElevenLabs remains the quality leader with unmatched voice cloning. The right choice depends on whether you're optimizing for cost, quality, or ecosystem lock-in.
For a zero-cost alternative, check out open-source TTS models like Kokoro — they've gotten surprisingly good and cost nothing to self-host beyond compute.
Hidden Costs and Gotchas
Google Cloud TTS pricing looks straightforward, but there are billing behaviors that inflate your actual bill beyond naive character counts. Example: a team sending 2M characters of SSML-wrapped text per month expected an $8 WaveNet bill. The SSML tags added ~40% more billable characters, spaces and newlines added another ~10%, and the 100-character minimum per request padded short phrases, so the actual bill was $14. Here are the gotchas that cause that gap.
- SSML markup counts as characters: If you use SSML to control pronunciation, pauses, or emphasis, every character of every tag (<speak>, <break>, <prosody>) counts toward your billing. A 500-character sentence wrapped in SSML can become 800+ billed characters. This isn't universal: some competitors strip markup before counting.
- Spaces and newlines are billed: Every whitespace character counts. A block of text with generous paragraph spacing costs more than the same words with minimal formatting. Trim your input text before sending it to the API.
- No response caching: Synthesizing the same text twice costs twice. Amazon Polly lets you cache and reuse generated audio without additional charges. With Google Cloud TTS, you're paying per request, so build your own caching layer if you're re-synthesizing common phrases.
- Data egress fees: Google Cloud charges for network egress when you download generated audio. Standard egress pricing applies — $0.12/GB for the first 10 TB. For a high-volume TTS application serving audio files directly, this adds up. Store generated audio in Cloud Storage and serve through a CDN to minimize egress costs.
- API request overhead: Each API call has HTTP/gRPC overhead. If you're synthesizing many short phrases (e.g., individual UI elements), the per-request latency and any associated Cloud Run or App Engine hosting costs can exceed the TTS cost itself. Batch short phrases into single requests when possible.
- Region pricing is uniform — for now: Google currently charges the same TTS rate globally. But your Cloud project's region affects egress costs and latency. Running in us-central1 is fine for North American users; serving audio to Europe from a US region adds both latency and egress fees.
- Minimum billing per request: Each API request is billed for a minimum of 100 characters, even if you send fewer. Sending 10 characters? You're paying for 100. This matters if you're synthesizing lots of very short strings.
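Several of these gotchas (billed whitespace, no caching, the per-request minimum) can be handled in one thin wrapper. The sketch below is a rough illustration, not Google's client API: `CachedSynthesizer` and `fake_synthesize` are hypothetical names, the in-memory dict stands in for whatever store you would actually use, and the 100-character minimum is taken from the list above, so verify it against your own invoices.

```python
import hashlib
from typing import Callable, Dict

MIN_BILLED_CHARS = 100  # per-request minimum as described in this article (assumption)


def normalize(text: str) -> str:
    """Collapse runs of spaces and newlines so padding is not billed.
    Intended for plain text; newlines that matter to pacing are lost."""
    return " ".join(text.split())


class CachedSynthesizer:
    """Wraps any synthesize(text, voice) -> bytes callable with a cache
    and a running estimate of billed characters."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self._synthesize = synthesize
        self._cache: Dict[str, bytes] = {}
        self.billed_chars = 0

    def speak(self, text: str, voice: str) -> bytes:
        clean = normalize(text)
        key = hashlib.sha256(f"{voice}\n{clean}".encode("utf-8")).hexdigest()
        if key not in self._cache:
            # Only cache misses reach the API, so only they are billed.
            self._cache[key] = self._synthesize(clean, voice)
            self.billed_chars += max(len(clean), MIN_BILLED_CHARS)
        return self._cache[key]


def fake_synthesize(text: str, voice: str) -> bytes:
    """Stand-in for a real TTS call; returns placeholder bytes."""
    return f"{voice}:{text}".encode("utf-8")


tts = CachedSynthesizer(fake_synthesize)
tts.speak("Your  order\nhas shipped", "voice-a")
tts.speak("Your order has shipped", "voice-a")  # cache hit after normalization
print(tts.billed_chars)
```

In production you would swap the dict for Cloud Storage or another persistent store and pass in a function that calls the real API. The cache key includes the voice name so the same text in two voices is stored separately.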
What Changed in March 2026
Google shipped a major TTS update in March 2026 — the biggest refresh since Neural2 launched. Here's what's new and how it affects pricing:
- 10 new Chirp 3: HD Generative voices: Named voices (Tiffany, Brian, Aria, and others) with distinct personalities and speaking styles. These are Google's answer to ElevenLabs' pre-built voices — more character, more expression, less robotic. Priced at $30/1M with 1M free chars/month.
- Bidirectional Streaming API: A new real-time streaming endpoint designed for voice agents. You can send text chunks as they're generated by an LLM and receive audio back simultaneously, cutting time-to-first-byte dramatically. Same per-character pricing, but the streaming architecture reduces perceived latency for conversational use cases.
- Instant Custom Voice: Upload a short audio sample and Google generates a cloned voice in minutes. At $60/1M characters, it's a mid-range option for custom brand voices. The quality is decent for professional recordings but noticeably worse than ElevenLabs' voice cloning for complex vocal characteristics.
- WaveNet price drop to $4/1M: Previously $16/1M, now matching Standard pricing. This is the change that matters most for existing users — a 75% cost reduction for WaveNet voices with no quality change. If you were using Standard voices to save money, switch to WaveNet immediately.
The WaveNet price drop signals Google's strategy: commoditize the lower tiers to drive adoption, then upsell to Chirp 3: HD and Studio for premium use cases. It's the same playbook they used with Google Maps and BigQuery — make the entry point free or cheap enough that developers build on the platform, then capture revenue at scale.
When to Choose Google Cloud TTS
Choose Google Cloud TTS If
- You're already on Google Cloud / using Dialogflow
- Budget is the top priority (WaveNet at $4/1M)
- You need multilingual support (40+ languages)
- Free tier matters (7M+ chars/month combined)
- You want Polyglot voices for cross-language consistency
- Enterprise compliance: Google Cloud BAA, SOC 2, HIPAA
Skip Google Cloud TTS If
- Voice quality is paramount (→ ElevenLabs)
- You need voice cloning beyond basics (→ ElevenLabs, Fish Audio)
- Lowest latency matters (→ Cartesia at 40ms)
- You want a simple API (Google's has 7 voice types to navigate)
- You don't want Google Cloud vendor lock-in
- You need a simpler pricing model (→ OpenAI charges one flat rate)
Here is the decision in one sentence: if you are already running on Google Cloud and your TTS budget matters more than voice quality, use WaveNet at $4/1M and stop reading. If you are not locked into Google Cloud, spend the extra $11/1M on OpenAI TTS or Fish Audio — you get dramatically better voices, a simpler API, and no seven-tier pricing maze to navigate. Google Cloud TTS is the Honda Civic of TTS: reliable, cheap, gets you there, never the car anyone brags about.
Not sure which service fits your budget? Run your numbers through our TTS cost calculator or compare all 11 TTS services side by side.
Frequently Asked Questions
Is Google Cloud Text-to-Speech free?
Partially. Google Cloud TTS has a permanent free tier that gives you 4M Standard characters, 1M WaveNet characters, 1M Neural2 characters, 1M Chirp 3: HD characters, and 100K Studio characters per month at no cost. New Google Cloud accounts also get a $300 credit. For small projects under the free tier limits, you can use Google Cloud TTS indefinitely without paying.
Why did WaveNet pricing drop to $4/1M?
Google reduced WaveNet pricing from $16/1M to $4/1M in early 2026, matching Standard voice pricing. The likely reason: WaveNet was originally premium because the synthesis was computationally expensive. Hardware improvements and model optimization have reduced that cost. Google also wants to push users toward the newer Chirp 3: HD voices ($30/1M) as the premium option, making WaveNet the new baseline.
How does Google Cloud TTS compare to Amazon Polly?
Both Google Cloud Standard and Amazon Polly Standard charge $4/1M characters. Polly Neural is $16/1M, matching Google's Neural2 tier. The key differences: Google's free tier is permanent, while Polly's expires after 12 months. Polly allows audio caching (synthesize once, reuse forever); Google doesn't. Google has more languages (40+ vs 30+) and voice types. Polly has a simpler pricing model with only two tiers. Choose based on which cloud you're already using.
What's the difference between Neural2 and Chirp 3: HD?
Neural2 ($16/1M) uses a traditional neural TTS architecture with improved prosody modeling. Chirp 3: HD ($30/1M) uses a generative approach that produces more expressive, natural-sounding speech with better emotional range. In practice, Chirp 3: HD sounds noticeably more human — closer to ElevenLabs quality than to the older Google voice types. The price difference reflects this quality gap.
Does Google Cloud TTS charge for SSML tags?
Yes. Every character in your SSML input counts toward billing, including the markup tags themselves. A <break time="500ms"/> tag is 21 billable characters even though it produces silence. If you use heavy SSML formatting, your bill can be 30-50% higher than the raw text character count suggests. Strip unnecessary SSML where possible.
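To see how much of a given request is markup, you can measure it before sending. This is a rough sketch: `ssml_overhead` is a hypothetical helper, the regex is a naive tag matcher rather than an XML parser, and Python's `len` counts code points, which may not match Google's billing count exactly.

```python
import re

TAG_PATTERN = re.compile(r"<[^>]+>")  # naive: assumes no '>' inside attribute values


def ssml_overhead(ssml: str) -> dict:
    """Split an SSML string's length into spoken text and markup."""
    total = len(ssml)
    text = len(TAG_PATTERN.sub("", ssml))
    markup = total - text
    return {
        "total": total,
        "text": text,
        "markup": markup,
        "overhead_pct": round(100 * markup / text, 1) if text else 0.0,
    }


print(len('<break time="500ms"/>'))  # 21

sample = '<speak>Your order has shipped.<break time="500ms"/>Thank you.</speak>'
print(ssml_overhead(sample))
```

Short prompts are hit hardest: in the sample above, the markup is longer than the spoken text.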
Can I use Google Cloud TTS for commercial projects?
Yes. Google Cloud TTS output can be used in commercial applications, products, and services without additional licensing fees. You own the generated audio. However, you can't use it to create voice replicas of specific public figures, and Instant Custom Voice has additional terms requiring consent from the voice owner. Check Google's terms of service for your specific use case.
Is Google Cloud TTS good for audiobooks?
Not really. Even Chirp 3: HD — Google's best voice type — doesn't match the naturalness needed for long-form listening. For audiobooks, ElevenLabs or Fish Audio produce significantly better results. Google Cloud TTS is better suited for short-form content: IVR prompts, chatbot responses, accessibility features, and notification audio. For a dedicated audiobook comparison, see our best TTS API guide.
How do I estimate my Google Cloud TTS costs?
Count the total characters in your text input (including spaces and any SSML markup), subtract the free tier for your chosen voice type, then multiply the remainder by the per-million rate. For example: 2M characters of WaveNet minus 1M free = 1M billable at $4/1M = $4/month. Our TTS cost calculator does this math automatically for Google Cloud and all major competitors.
By TextToLab Research Team · Last verified June 2026 against Google Cloud's official Text-to-Speech pricing page (cloud.google.com/text-to-speech/pricing). Competitor rates verified against each provider's pricing page. March 2026 updates confirmed via Google Cloud blog and API changelog.