Inworld TTS Pricing at a Glance (May 2026)
Inworld TTS costs $15-$35 per million characters depending on model and plan tier. TTS-2 (the new closed-loop voice model launched May 5, 2026) starts at $35/1M on-demand and drops to $25/1M at Growth tier. TTS 1.5 Mini starts at $25/1M and drops to $15/1M. Enterprise rates go as low as $10/1M for TTS-2 and $5/1M for Mini — making Inworld's #1-ranked voice AI surprisingly affordable at scale.
| Model | On-Demand | Developer | Growth | Enterprise |
|---|---|---|---|---|
| TTS-2 & 1.5 Max | $35/1M | $30/1M (about 14% off) | $25/1M (about 29% off) | As low as $10/1M |
| TTS 1.5 Mini | $25/1M | $20/1M (20% off) | $15/1M (40% off) | As low as $5/1M |
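To make the table easier to apply to your own volume, here is a minimal Python sketch that looks up a rate and multiplies it by a character count. The model and tier keys (`"tts-2"`, `"developer"`, and so on) are labels invented for this example, not Inworld API identifiers, and Enterprise is left out because the table only gives "as low as" floors.

```python
# Rates in USD per 1M characters, copied from the table above.
# The dictionary keys are hypothetical labels, not official API model IDs.
RATES_PER_MILLION = {
    "tts-2": {"on_demand": 35.0, "developer": 30.0, "growth": 25.0},
    "tts-1.5-max": {"on_demand": 35.0, "developer": 30.0, "growth": 25.0},
    "tts-1.5-mini": {"on_demand": 25.0, "developer": 20.0, "growth": 15.0},
}


def tts_cost(model: str, tier: str, characters: int) -> float:
    """Return the USD cost of synthesizing `characters` at the listed rate."""
    rate = RATES_PER_MILLION[model][tier]
    return round(characters / 1_000_000 * rate, 2)


print(tts_cost("tts-2", "developer", 2_500_000))  # 75.0
```

Swapping the tier or model shows how quickly the per-character gap adds up at higher volumes.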
Why TTS-2 and 1.5 Max Cost the Same
Inworld prices TTS-2 identically to TTS 1.5 Max even though TTS-2 is a fundamentally different architecture. TTS-2 is a closed-loop model that listens to the conversation and adapts its tone in real time — a capability no other TTS service offers at any price. The pricing parity means you get a generational leap in technology for the same cost.
Subscription Plans and Volume Discounts
Inworld uses a credit-based subscription model where monthly payments convert to API credits at discounted rates. Higher tiers unlock steeper per-character discounts:
| Plan | Monthly Cost | Monthly Credits | TTS-2 Rate | Custom Voices |
|---|---|---|---|---|
| On-Demand | $0 (pay-per-use) | 40 free min trial | $35/1M | 5 |
| Creator | $25/mo | $25 in credits | $35/1M | 10 |
| Developer | $300/mo | $300 in credits | $30/1M (about 14% off) | 100 |
| Growth | $1,500/mo | $1,500 in credits | $25/1M (about 29% off) | 3,000 |
| Enterprise | Custom | Custom | As low as $10/1M | Custom |
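The math on the Developer plan: $300/month buys $300 in credits, billed at the Developer rate. At $30/1M characters for TTS-2, that's 10 million characters per month, roughly 115 hours of audio. Compare that to ElevenLabs Scale ($330/mo for 2M credits, about 33 hours of Multilingual audio). By the hour estimates, Inworld gives you about 3.5x more audio at a similar price; by raw character count the gap is 5x (10M vs 2M). The two ratios differ because the hour figures imply different characters-per-hour conversions, so treat the character count as the firmer comparison.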
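The credit arithmetic above can be sketched in a few lines of Python. The function names are made up for this example, and the break-even figure rests on a simplifying assumption: it only compares a flat plan fee against on-demand spend, and ignores overage pricing and credit rollover, which the pricing summary here does not cover.

```python
def chars_from_credits(credits_usd: float, rate_per_million: float) -> int:
    """Characters that a credit balance buys at a given USD-per-1M rate."""
    return int(credits_usd * 1_000_000 / rate_per_million)


def break_even_chars(plan_fee_usd: float, on_demand_rate: float) -> int:
    """Monthly characters at which on-demand spend reaches the flat plan fee.

    Assumption: overage and rollover rules are ignored.
    """
    return int(plan_fee_usd * 1_000_000 / on_demand_rate)


developer_chars = chars_from_credits(300, 30)  # TTS-2 at the Developer rate
threshold = break_even_chars(300, 35)          # versus $35/1M on-demand

print(developer_chars)  # 10000000
print(threshold)        # 8571428
```

Under these assumptions, the Developer plan only pays off for TTS-2 once you consistently synthesize more than about 8.6 million characters a month.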
What Makes TTS-2 Worth the Premium
TTS-2 launched May 5, 2026 as a "closed-loop voice model" — the first TTS that actually listens to the conversation before speaking. That sounds like marketing, but the architecture is genuinely different from every other TTS on the market:
- Conversational awareness. TTS-2 takes the actual audio of prior conversation turns as input, not just a transcript. It picks up the user's tone, pacing, and emotional state, then adjusts its delivery accordingly. The same line sounds different after a joke versus after bad news.
- Natural-language voice direction. Instead of preset emotion tags, you write prose descriptions like "tired but warm after a long day" and TTS-2 interprets them. Think of it as a system prompt for voice.
- Inline non-verbal cues. Five commands render as actual audio: [laugh], [breathe], [clear_throat], [sigh], [cough]. Drop them anywhere in text and they produce natural sounds — not the text-artifact versions most TTS services output.
- 100+ languages with voice identity preservation. Switch languages mid-sentence and the speaker still sounds like the same person. Cross-lingual voice cloning is built in.
- Three stability modes. Control the tradeoff between expressiveness and consistency depending on your use case — more predictable for IVR systems, more expressive for companions.
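Since only five inline cues are listed, it can be worth checking scripts for unsupported tags before sending them for synthesis. The sketch below is a purely client-side check and does not call any Inworld API; the helper name is hypothetical, and how the service actually treats unknown bracketed tags is not something this guide can confirm.

```python
import re

# The five non-verbal cues named in the feature list above.
SUPPORTED_CUES = {"laugh", "breathe", "clear_throat", "sigh", "cough"}

# Matches lowercase bracketed tags such as [sigh] or [clear_throat].
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def unsupported_cues(text: str) -> list[str]:
    """Return bracketed tags in `text` that are not among the five listed cues."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag not in SUPPORTED_CUES]


line = "Well [sigh] that was a long day. [giggle] Anyway, how can I help?"
print(unsupported_cues(line))  # ['giggle']
```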
For the full technical analysis of how Inworld compares to competitors on quality, latency, and features, read our Inworld TTS 1.5 review. The review covers TTS 1.5 Max in depth; TTS-2 builds on that foundation with the closed-loop architecture.
Voice Agent Economics: What TTS Actually Costs in Production
Inworld's sweet spot is voice agents — AI-powered phone bots, customer support, sales qualification. Here's what production deployments actually cost at the Developer tier ($30/1M for TTS-2):
| Use Case | Daily Volume | Monthly Chars | Inworld TTS-2 | ElevenLabs Flash |
|---|---|---|---|---|
| Support bot (500 calls, 3 min avg) | 500 calls/day | ~11.3M | $339 | $678 |
| Sales qualifier (200 calls, 5 min avg) | 200 calls/day | ~7.5M | $225 | $450 |
| Virtual companion (8 hrs/day) | Continuous | ~18M | $540 | $1,080 |
| Contact center (10K calls, 4 min avg) | 10,000 calls/day | ~300M | $3,000 (Enterprise) | $18,000 |
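The calculation: the table assumes about 750 characters of TTS output per 3-minute call, or roughly 250 characters per call minute. That is a low-end figure: an agent speaking half the time at ~125 words/min and ~5 characters per word would produce closer to 940 characters per 3-minute call. Monthly estimate: calls/day × 30 days × average call minutes × 250 chars. The companion row is not call-based; its ~18M figure implies about 600K characters per day.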
At the enterprise contact center scale, Inworld's advantage is largest: $3,000/month at the best-case $10/1M enterprise rate versus $18,000 for ElevenLabs Flash. That's an 83% cost reduction, and you get the #1-ranked voice quality plus TTS-2's conversational awareness, which matters when your voice agent needs to respond to an upset customer differently than to a casual inquiry.
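The call-based rows can be reproduced with a short Python sketch. It assumes the article's estimate of 750 characters per 3-minute call (250 per call minute), a 30-day month, and the $60/1M ElevenLabs Flash rate implied by the table; the function names are illustrative only.

```python
CHARS_PER_CALL_MINUTE = 250  # 750 characters per 3-minute call, per the estimate above
DAYS_PER_MONTH = 30


def monthly_chars(calls_per_day: int, avg_minutes: float) -> float:
    """Estimated TTS characters per month for a call-based workload."""
    return calls_per_day * DAYS_PER_MONTH * avg_minutes * CHARS_PER_CALL_MINUTE


def monthly_cost(chars: float, rate_per_million: float) -> float:
    """USD cost for `chars` characters at a USD-per-1M rate."""
    return chars / 1_000_000 * rate_per_million


support = monthly_chars(500, 3)
print(support, monthly_cost(support, 30))  # 11250000 337.5

contact = monthly_chars(10_000, 4)
inworld = monthly_cost(contact, 10)  # best-case enterprise rate
flash = monthly_cost(contact, 60)
print(inworld, flash, round(1 - inworld / flash, 3))  # 3000.0 18000.0 0.833
```

The support bot comes out at $337.50 rather than the table's $339 because the table rounds 11.25M characters up to 11.3M before multiplying.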
Integration Costs: The Full Voice Agent Stack
TTS is just one piece of a voice agent. Inworld integrates with the major orchestration platforms, and each adds its own costs:
- Vapi — $0.05/min platform fee + telephony costs. The most popular voice agent platform. Inworld TTS is available as a provider option.
- LiveKit — Open-source with hosted option. No per-minute platform fee on self-hosted. Inworld TTS integrates via their Agents framework.
- Pipecat — Open-source Python framework by Daily.co. Free to use. Inworld is a supported TTS provider.
- NLX — Enterprise conversational AI. Custom pricing. Inworld is a partner for voice generation.
- Voximplant — Cloud communications platform. Per-minute telephony charges apply on top of TTS costs.
For the 500-call/day support bot scenario above (3 min average), total monthly costs might look like: Inworld TTS ($339) + Vapi ($0.05/min × 500 calls × 3 min × 30 days = $2,250) + STT ($500) + LLM ($200) = ~$3,300. In this example TTS is about 10% of the total, and the orchestration platform fee alone accounts for roughly two thirds, before telephony charges are added.
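Because the platform fee scales with call minutes, the total is sensitive to the call-length assumption. The Python sketch below makes that explicit; the function name and the default STT and LLM figures are taken from the rough numbers in this section and are placeholders, not quotes from any provider.

```python
def voice_agent_stack_cost(
    calls_per_day: int,
    avg_minutes: float,
    tts_usd: float,
    stt_usd: float = 500.0,              # placeholder from the example above
    llm_usd: float = 200.0,              # placeholder from the example above
    platform_fee_per_min: float = 0.05,  # Vapi platform fee quoted above
    days: int = 30,
) -> dict:
    """Rough monthly cost of a voice agent stack, excluding telephony."""
    platform_usd = round(platform_fee_per_min * calls_per_day * avg_minutes * days, 2)
    total_usd = round(tts_usd + platform_usd + stt_usd + llm_usd, 2)
    return {
        "platform_usd": platform_usd,
        "total_usd": total_usd,
        "tts_share": round(tts_usd / total_usd, 3),
    }


# Compare a 3-minute and a 6-minute average call length.
for minutes in (3, 6):
    print(minutes, voice_agent_stack_cost(500, minutes, tts_usd=339.0))
```

Doubling the average call length roughly doubles the platform fee while the other inputs stay fixed here, which is why TTS shrinks as a share of the total on longer calls. In practice TTS, STT, and LLM usage would also grow with call length, so treat this as a first-pass budget rather than a forecast.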
Hidden Costs and Gotchas
- No ongoing free tier. Unlike ElevenLabs (10K credits/month forever) or Amazon Polly (5M chars/12 months), Inworld gives you 40 free minutes to evaluate — then you pay. Good for testing, not for building a prototype.
- API-only — no consumer-facing product. There's no web editor, no drag-and-drop studio, no mobile app. You write code to use Inworld TTS. If you need a visual interface, look at Murf AI or ElevenLabs.
- TTS-2 is still in research preview. Production readiness may vary. The pricing is set, but availability and rate limits during the preview period may be more restrictive than TTS 1.5.
- Creator plan carries no rate discount. At $25/month, the Creator tier gives $25 in credits, but you still pay $35/1M for TTS-2. The lower rates start at Developer ($300/month, $30/1M) and Growth ($1,500/month, $25/1M).
- STT is separate. Inworld's speech-to-text (STT 1) costs $0.35/hour on-demand. For voice agents, you need both STT and TTS — factor in both costs.
TTS 1.5 vs TTS-2: Which Model to Use
| Feature | TTS-2 | TTS 1.5 Max | TTS 1.5 Mini |
|---|---|---|---|
| Price (On-Demand) | $35/1M | $35/1M | $25/1M |
| P90 Latency | <250ms | <250ms | <130ms |
| Conversational Awareness | Yes (closed-loop) | No | No |
| Voice Direction | Natural language prose | No | No |
| Non-Verbal Sounds | [laugh], [sigh], [cough], [breathe], [clear_throat] | No | No |
| Languages | 100+ (cross-lingual) | 15 | 15 |
| Best For | Voice agents, companions | High-quality content | Budget-sensitive agents |
My recommendation: Use TTS-2 for any voice agent or conversational AI where the system responds to real users in real time — the conversational awareness genuinely changes the experience. Use TTS 1.5 Mini for high-volume, cost-sensitive applications where quality is "good enough" (IVR, notifications, basic phone bots). Use TTS 1.5 Max for pre-rendered content where you want Inworld's quality without TTS-2's conversational overhead.
Inworld vs Competitors for Voice Agents
Voice agent builders care about three things: quality, latency, and cost. Here's how the options stack up:
| Service | Cost/1M Chars | Latency (P90) | Arena Rank | Emotion Awareness |
|---|---|---|---|---|
| Inworld TTS-2 | $25-$35 | <250ms | #1 (ELO 1,236) | Yes (closed-loop) |
| Cartesia Sonic 3 | ~$37 | 40ms | #10 (ELO 1,054) | No |
| ElevenLabs Flash | $60 | ~300ms | #4 (ELO 1,179) | No |
| Fish Audio S2 Pro | $15 | ~150ms | ELO 1,128 / #1 blind | 15,000+ emotion tags |
| OpenAI gpt-4o-mini-tts | ~$15 | ~300ms | Not ranked | Steerable via prompt |
| Gemini Flash TTS | ~$12 | ~250ms | #2 (ELO 1,211) | 200+ audio tags |
| Grok TTS (beta) | $4.20 | ~400ms | Not ranked | No |
The tradeoff is clear: Cartesia wins on raw latency (40ms), but Inworld wins on quality (#1 ELO) and has the unique closed-loop awareness that no competitor matches. For voice agents where the AI needs to respond empathetically — think healthcare, mental health, customer retention calls — TTS-2's conversational awareness isn't a nice-to-have, it's the product differentiator.
For pure cost optimization where quality is secondary, Grok ($4.20/1M) or Gemini Flash (~$12/1M) are hard to beat. For a deeper speed-vs-quality analysis, see our Cartesia vs ElevenLabs comparison.
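One way to work through the tradeoff is to set a latency ceiling first and then sort what remains by price. The Python sketch below does that with the figures from the comparison table. It treats the approximate values (`~`, `<`) as exact and uses Inworld's $35/1M on-demand rate, so the result is a rough shortlist rather than a benchmark.

```python
from typing import NamedTuple


class Option(NamedTuple):
    name: str
    usd_per_million: float
    p90_ms: int


# Figures copied from the comparison table; approximate values treated as exact.
OPTIONS = [
    Option("Inworld TTS-2", 35.0, 250),  # on-demand rate, upper bound of <250ms
    Option("Cartesia Sonic 3", 37.0, 40),
    Option("ElevenLabs Flash", 60.0, 300),
    Option("Fish Audio S2 Pro", 15.0, 150),
    Option("OpenAI gpt-4o-mini-tts", 15.0, 300),
    Option("Gemini Flash TTS", 12.0, 250),
    Option("Grok TTS (beta)", 4.20, 400),
]


def cheapest_within(max_latency_ms: int) -> list[str]:
    """Names of services at or under the latency ceiling, cheapest first."""
    eligible = [o for o in OPTIONS if o.p90_ms <= max_latency_ms]
    return [o.name for o in sorted(eligible, key=lambda o: o.usd_per_million)]


print(cheapest_within(250))
# ['Gemini Flash TTS', 'Fish Audio S2 Pro', 'Inworld TTS-2', 'Cartesia Sonic 3']
```

A filter like this ignores quality rank and emotion awareness, so it is best used to narrow the field before weighing those factors.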
Who Should (and Shouldn't) Choose Inworld TTS
Pick Inworld if
- Building voice agents that need empathetic, adaptive responses
- Quality is the top priority (Arena #1)
- You need 100+ language support with voice identity
- You want non-verbal cues (laughter, sighs) in conversation
- Enterprise scale with negotiable rates
Skip Inworld if
- You need a web editor or studio UI (use ElevenLabs)
- Budget is the #1 factor (use Grok or Gemini)
- You need sub-50ms latency (use Cartesia)
- You want a free tier for ongoing use (use ElevenLabs or Chatterbox)
- You're creating audiobooks or pre-rendered content (the closed-loop advantage is wasted)
Related Pricing Guides
Use our TTS cost calculator to estimate Inworld costs at your specific volume. For the complete quality and feature analysis, read our full Inworld TTS review.
Comparing TTS options for a voice agent project? Our best text-to-speech guide covers all major providers, and the Canva TTS guide covers simpler alternatives for non-developers.
By TextToLab Research Team · Pricing verified May 2026 against inworld.ai/pricing and Inworld TTS API docs. TTS-2 launched May 5, 2026. Arena rankings from Artificial Analysis.