Guide · 12 min read · May 20, 2026

By TextToLab Research Team

Inworld TTS Pricing 2026: TTS-2 API Costs, Plans & Voice Agent Economics

Inworld TTS costs $15-$35/1M characters with volume discounts to $5-$10/1M at enterprise scale. Full breakdown of TTS-2 vs 1.5 pricing, subscription tiers, voice agent cost examples, and competitor comparison.

Inworld TTS Pricing at a Glance (May 2026)

Inworld TTS costs $15-$35 per million characters depending on model and plan tier. TTS-2 (the new closed-loop voice model launched May 5, 2026) starts at $35/1M on-demand and drops to $25/1M at Growth tier. TTS 1.5 Mini starts at $25/1M and drops to $15/1M. Enterprise rates go as low as $10/1M for TTS-2 and $5/1M for Mini — making Inworld's #1-ranked voice AI surprisingly affordable at scale.

| Model | On-Demand | Developer (-20%) | Growth (-40%) | Enterprise |
|---|---|---|---|---|
| TTS-2 & 1.5 Max | $35/1M | $30/1M | $25/1M | As low as $10/1M |
| TTS 1.5 Mini | $25/1M | $20/1M | $15/1M | As low as $5/1M |
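As a quick sanity check on the tier labels, the sketch below computes the effective discount of each listed price against the on-demand price for the same model. The prices are copied from the table above; the dictionary layout and the function name `effective_discount` are illustrative choices, not part of any Inworld API.

```python
# Listed prices in USD per 1M characters, copied from the table above.
RATES = {
    "TTS-2 / 1.5 Max": {"on_demand": 35, "developer": 30, "growth": 25},
    "TTS 1.5 Mini": {"on_demand": 25, "developer": 20, "growth": 15},
}


def effective_discount(model: str, tier: str) -> float:
    """Fraction saved versus the on-demand price for the same model."""
    base = RATES[model]["on_demand"]
    return 1 - RATES[model][tier] / base


if __name__ == "__main__":
    for model in RATES:
        for tier in ("developer", "growth"):
            pct = effective_discount(model, tier)
            print(f"{model:16s} {tier:9s} {pct:.0%}")
```

Taking the listed prices at face value, the Mini row reproduces the 20% and 40% labels exactly, while the TTS-2 / 1.5 Max row works out to roughly 14% and 29% off the on-demand price.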

Why TTS-2 and 1.5 Max Cost the Same

Inworld prices TTS-2 identically to TTS 1.5 Max even though TTS-2 is a fundamentally different architecture. TTS-2 is a closed-loop model that listens to the conversation and adapts its tone in real time — a capability no other TTS service offers at any price. The pricing parity means you get a generational leap in technology for the same cost.

Subscription Plans and Volume Discounts

Inworld uses a credit-based subscription model where monthly payments convert to API credits at discounted rates. Higher tiers unlock steeper per-character discounts:

| Plan | Monthly Cost | Monthly Credits | TTS-2 Rate | Custom Voices |
|---|---|---|---|---|
| On-Demand | $0 (pay-per-use) | 40 free min trial | $35/1M | 5 |
| Creator | $25/mo | $25 in credits | $35/1M | 10 |
| Developer | $300/mo | $300 in credits | $30/1M (20% off) | 100 |
| Growth | $1,500/mo | $1,500 in credits | $25/1M (40% off) | 3,000 |
| Enterprise | Custom | Custom | As low as $10/1M | Custom |

The math on the Developer plan: $300/month buys $300 in credits at the discounted rate. At $30/1M characters for TTS-2, that is 10 million characters per month, roughly 115 hours of audio. Compare that to ElevenLabs Scale ($330/mo for 2M credits, about 33 hours of Multilingual audio). By those estimates, Inworld delivers about 3.5x more audio output at a similar price point, with the #1-ranked voice quality.
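The credit-to-character conversion above can be sketched in a few lines. This is a minimal illustration that assumes the whole credit balance is spent on TTS-2 at the plan's listed rate and that nothing else draws from the credits; the function name `chars_from_credits` is hypothetical.

```python
def chars_from_credits(credits_usd: float, rate_per_million_usd: float) -> int:
    """Characters purchasable with a credit balance at a given per-1M rate."""
    # Truncate to whole characters; a fractional character cannot be synthesized.
    return int(credits_usd / rate_per_million_usd * 1_000_000)


# (monthly credits in USD, TTS-2 rate in USD per 1M chars), from the plan table.
PLANS = {
    "Creator": (25, 35),
    "Developer": (300, 30),
    "Growth": (1500, 25),
}

if __name__ == "__main__":
    for name, (credits, rate) in PLANS.items():
        print(f"{name:10s} {chars_from_credits(credits, rate):>12,d} chars/month")
```

For the Developer plan this gives the 10 million characters quoted above; the Growth plan's $1,500 at $25/1M comes to 60 million.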

What Makes TTS-2 Worth the Premium

TTS-2 launched May 5, 2026 as a "closed-loop voice model", the first TTS that actually listens to the conversation before speaking. That sounds like marketing, but the architecture is genuinely different from every other TTS on the market.

For the full technical analysis of how Inworld compares to competitors on quality, latency, and features, read our Inworld TTS 1.5 review. The review covers TTS 1.5 Max in depth; TTS-2 builds on that foundation with the closed-loop architecture.

Voice Agent Economics: What TTS Actually Costs in Production

Inworld's sweet spot is voice agents — AI-powered phone bots, customer support, sales qualification. Here's what production deployments actually cost at the Developer tier ($30/1M for TTS-2):

| Use Case | Daily Volume | Monthly Chars | Inworld TTS-2 | ElevenLabs Flash |
|---|---|---|---|---|
| Support bot (500 calls, 3 min avg) | 500 calls/day | ~11.3M | $339 | $678 |
| Sales qualifier (200 calls, 5 min avg) | 200 calls/day | ~7.5M | $225 | $450 |
| Virtual companion (8 hrs/day) | Continuous | ~18M | $540 | $1,080 |
| Contact center (10K calls, 4 min avg) | 10,000 calls/day | ~300M | $3,000 (Enterprise) | $18,000 |

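The calculation: a typical 3-minute voice agent call generates about 750 characters of TTS output. At ~125 words/min (roughly 625 characters per minute of speech), that is a little over a minute of agent speech per call. Monthly estimate: calls/day × 30 days × characters per call, where characters per call scale with call length (750 for a 3-minute call, 1,250 for 5 minutes, 1,000 for 4 minutes).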
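The estimate above can be reproduced with a short sketch. It assumes TTS output scales linearly with call length at 250 characters per call minute (750 characters for a 3-minute call), which matches the per-call rows of the table; the constant and function names are illustrative.

```python
# Assumption: 750 characters of TTS output per 3-minute call, scaled linearly.
CHARS_PER_CALL_MINUTE = 250


def monthly_chars(calls_per_day: int, avg_call_minutes: float, days: int = 30) -> float:
    """Estimated TTS characters generated per month."""
    return calls_per_day * days * avg_call_minutes * CHARS_PER_CALL_MINUTE


def monthly_tts_cost(chars: float, rate_per_million_usd: float) -> float:
    """TTS spend in USD for a character volume at a per-1M-character rate."""
    return chars / 1_000_000 * rate_per_million_usd


if __name__ == "__main__":
    scenarios = [
        ("Support bot", 500, 3, 30),        # Developer rate, $30/1M
        ("Sales qualifier", 200, 5, 30),    # Developer rate, $30/1M
        ("Contact center", 10_000, 4, 10),  # Enterprise rate, $10/1M
    ]
    for name, calls, minutes, rate in scenarios:
        chars = monthly_chars(calls, minutes)
        print(f"{name:16s} {chars / 1e6:7.2f}M chars  ${monthly_tts_cost(chars, rate):,.2f}")
```

With these assumptions the support bot comes to 11.25M characters and $337.50 at $30/1M, which the table rounds to ~11.3M and $339.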

At the enterprise contact center scale, Inworld's advantage is massive: $3,000/month at $10/1M (enterprise rate) versus $18,000 for ElevenLabs Flash. That's an 83% cost reduction — and you get the #1-ranked voice quality plus TTS-2's conversational awareness, which genuinely matters when your voice agent needs to respond to an upset customer differently than a casual inquiry.

Integration Costs: The Full Voice Agent Stack

TTS is just one piece of a voice agent. Inworld integrates with the major orchestration platforms, and each adds its own costs.

For the 500-call/day support bot scenario above (3 min average call), total monthly costs might look like: Inworld TTS ($339) + Vapi ($0.05/min × 500 calls × 3 min × 30 days = $2,250) + STT ($500) + LLM ($200) = ~$3,300. TTS is roughly 10% of total voice agent costs in this example; the telephony and orchestration platform eat most of the budget.
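To rerun this kind of stack estimate with your own numbers, a minimal sketch follows. It assumes the orchestration platform bills a flat per-minute rate and that STT and LLM costs are supplied as monthly lump sums, as in the paragraph above; the function name and parameters are illustrative and not tied to any vendor's API.

```python
def monthly_stack_cost(
    calls_per_day: int,
    minutes_per_call: float,
    platform_rate_per_min: float,
    tts_cost: float,
    stt_cost: float,
    llm_cost: float,
    days: int = 30,
) -> tuple[float, float]:
    """Return (total monthly cost in USD, TTS share of that total)."""
    # Platform cost: flat per-minute fee over every billed call minute.
    platform_cost = platform_rate_per_min * calls_per_day * minutes_per_call * days
    total = tts_cost + platform_cost + stt_cost + llm_cost
    return total, tts_cost / total
```

Because the platform fee is billed per minute, the TTS share of the total shifts noticeably as the average call length changes, so it is worth plugging in the minutes your platform actually bills.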

Hidden Costs and Gotchas

TTS 1.5 vs TTS-2: Which Model to Use

| Feature | TTS-2 | TTS 1.5 Max | TTS 1.5 Mini |
|---|---|---|---|
| Price (On-Demand) | $35/1M | $35/1M | $25/1M |
| P90 Latency | <250ms | <250ms | <130ms |
| Conversational Awareness | Yes (closed-loop) | No | No |
| Voice Direction | Natural language prose | No | No |
| Non-Verbal Sounds | [laugh], [sigh], [cough], [breathe], [clear_throat] | No | No |
| Languages | 100+ (cross-lingual) | 15 | 15 |
| Best For | Voice agents, companions | High-quality content | Budget-sensitive agents |

Our recommendation: Use TTS-2 for any voice agent or conversational AI where the system responds to real users in real time; the conversational awareness genuinely changes the experience. Use TTS 1.5 Mini for high-volume, cost-sensitive applications where quality is "good enough" (IVR, notifications, basic phone bots). Use TTS 1.5 Max for pre-rendered content where you want Inworld's quality without TTS-2's conversational overhead.

Inworld vs Competitors for Voice Agents

Voice agent builders care about three things: quality, latency, and cost. Here's how the options stack up:

| Service | Cost/1M Chars | Latency (P90) | Arena Rank | Emotion Awareness |
|---|---|---|---|---|
| Inworld TTS-2 | $25-$35 | <250ms | #1 (ELO 1,236) | Yes (closed-loop) |
| Cartesia Sonic 3 | ~$37 | 40ms | #10 (ELO 1,054) | No |
| ElevenLabs Flash | $60 | ~300ms | #4 (ELO 1,179) | No |
| Fish Audio S2 Pro | $15 | ~150ms | ELO 1,128 / #1 blind | 15,000+ emotion tags |
| OpenAI gpt-4o-mini-tts | ~$15 | ~300ms | Not ranked | Steerable via prompt |
| Gemini Flash TTS | ~$12 | ~250ms | #2 (ELO 1,211) | 200+ audio tags |
| Grok TTS (beta) | $4.20 | ~400ms | Not ranked | No |

The tradeoff is clear: Cartesia wins on raw latency (40ms), but Inworld wins on quality (#1 ELO) and has closed-loop awareness that no competitor in this table matches. For voice agents where the AI needs to respond empathetically (healthcare, mental health, customer retention calls), TTS-2's conversational awareness isn't a nice-to-have; it's the product differentiator.

For pure cost optimization where quality is secondary, Grok ($4.20/1M) or Gemini Flash (~$12/1M) are hard to beat. For a deeper speed-vs-quality analysis, see our Cartesia vs ElevenLabs comparison.

Who Should (and Shouldn't) Choose Inworld TTS

Pick Inworld if

  • Building voice agents that need empathetic, adaptive responses
  • Quality is the top priority (Arena #1)
  • You need 100+ language support with voice identity
  • You want non-verbal cues (laughter, sighs) in conversation
  • Enterprise scale with negotiable rates

Skip Inworld if

  • You need a web editor or studio UI (use ElevenLabs)
  • Budget is the #1 factor (use Grok or Gemini)
  • You need sub-50ms latency (use Cartesia)
  • You want a free tier for ongoing use (use ElevenLabs or Chatterbox)
  • You're creating audiobooks or pre-rendered content (the closed-loop advantage is wasted)

Related Pricing Guides

Use our TTS cost calculator to estimate Inworld costs at your specific volume. For the complete quality and feature analysis, read our full Inworld TTS review.

Comparing TTS options for a voice agent project? Our best text-to-speech guide covers all major providers, and the Canva TTS guide covers simpler alternatives for non-developers.

By TextToLab Research Team · Pricing verified May 2026 against inworld.ai/pricing and Inworld TTS API docs. TTS-2 launched May 5, 2026. Arena rankings from Artificial Analysis.