Guide · 11 min read · May 20, 2026

By TextToLab Research Team

Fish Audio Pricing 2026: S2 Pro API at $15/1M — 11x Cheaper Than ElevenLabs

Fish Audio costs $0-$749/month across 4 plans with API pricing at $15/1M characters. S2 Pro ranked #1 in blind tests. Full plan breakdown, real-world cost examples, self-hosting economics, and 11-service comparison.

Fish Audio Pricing at a Glance (May 2026)

Fish Audio charges $15 per million UTF-8 bytes through its API, which works out to roughly $15 per million English characters. The subscription plans range from Free (8,000 credits, ~7 minutes) to Max ($749/month, 25 million credits). That $15/1M rate is 8x cheaper than ElevenLabs Multilingual v2 at $120/1M, and 11x cheaper than the $165/1M top of ElevenLabs' range. Fish Audio S2 Pro also ranked #1 in blind A/B tests, with a Bradley-Terry score 1.7x higher than the next best model.

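| Plan | Price/Month | Credits | ≈ Generation Time | Commercial Use |
| --- | --- | --- | --- | --- |
| Free | $0 | 8,000 | ~7 min | No |
| Plus | $11 | 250,000 | ~200 min | Yes |
| Pro | $75 | 2,000,000 | ~1,620 min (27 hrs) | Yes |
| Max | $749 | 25,000,000 | ~6,250 min (104 hrs) | Yes |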

The Credit Math

Fish Audio measures credits, not characters directly. Going by the plan table, one minute of generated audio costs roughly 1,250 credits: the Plus plan's 250,000 credits give you about 200 minutes. That is enough for 40 YouTube videos at 5 minutes each, or about 13 podcast episodes at 15 minutes. Annual billing saves 33% across all plans ($11/mo becomes ~$7.33/mo on Plus).
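The arithmetic above is easy to rerun for other content lengths. The following is a minimal sketch, not an official calculator: the function names are hypothetical, the 200-minute allowance is the approximate Plus figure quoted in this guide, and the "33%" annual discount is assumed to mean one third off.

```python
def whole_items(total_minutes: float, minutes_each: float) -> int:
    """Number of complete items of a given length that fit in the allowance."""
    return int(total_minutes // minutes_each)


def annual_monthly_price(monthly_price: float, discount: float = 1 / 3) -> float:
    """Effective monthly price under annual billing (discount assumed to be one third)."""
    return round(monthly_price * (1 - discount), 2)


PLUS_MINUTES = 200  # approximate generation time quoted for the Plus plan

print(whole_items(PLUS_MINUTES, 5))   # 5-minute videos
print(whole_items(PLUS_MINUTES, 15))  # 15-minute podcast episodes
print(annual_monthly_price(11))       # Plus with annual billing
```

Swapping in your own episode length or a different plan's minutes gives a quick feel for whether a tier covers a month of output.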

What Each Plan Actually Includes

Free Tier — Personal Testing Only

8,000 credits, 500 characters per generation, 3 public voice slots, and access to both S1 and S2 models. No commercial use. Standard voice cloning only. That 500-character limit per generation is the real constraint — you can't process anything longer than a couple of paragraphs at once. For comparison, ElevenLabs' free tier gives 10,000 credits with no per-generation character limit.

Plus ($11/month) — Where Most Creators Should Start

The jump from Free to Plus is massive. 250,000 credits (~200 minutes), 15,000 characters per generation, unlimited public voices plus 10 private voice slots, enhanced voice cloning, commercial use rights, and API access at pay-as-you-go rates. At $11/month, this is cheaper than Speechify Premium ($139/year = $11.58/mo) and gives you actual content creation tools rather than just a reading app.

Who it's for: Solo creators narrating blog posts, short videos, or social media content. If you're producing under 3 hours of audio per month, Plus covers it.

Pro ($75/month) — Production Volume

2 million credits, 30,000 characters per generation, unlimited voice slots, 3 team seats, and enhanced cloning. That 30K character limit means you can process a full book chapter in one shot. At $75/month for ~27 hours of audio, the effective cost is about $0.046/minute, under a quarter of ElevenLabs Pro's roughly $0.21/minute ($99/mo for ~8 hours of Multilingual audio).

Max ($749/month) — Enterprise-Adjacent

25 million credits, 10 team seats, everything in Pro plus priority generation. At $749/month for ~104 hours, you're paying about $0.12/minute, roughly 28% less than ElevenLabs Scale at about $0.17/minute ($330/mo for ~33 hours). The Max plan exists for content studios and SaaS products that burn through TTS at scale.
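The per-minute figures quoted for each plan come from dividing the monthly price by the approximate generation time. A small sketch of that calculation, using only the prices and minutes listed in the plan table (the dictionary and function names are illustrative, not part of any Fish Audio tooling):

```python
PLANS = {
    # plan: (price in USD per month, approximate minutes of audio from the plan table)
    "Plus": (11, 200),
    "Pro": (75, 1_620),
    "Max": (749, 6_250),
}


def cost_per_minute(price_usd: float, minutes: float) -> float:
    """Effective cost of one generated minute if the whole allowance is used."""
    return price_usd / minutes


for name, (price, minutes) in PLANS.items():
    print(f"{name}: ${cost_per_minute(price, minutes):.3f}/minute")
```

These are best-case figures: they assume every credit in the allowance is actually spent, so light usage pushes the real per-minute cost higher.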

API Pricing: Pay-as-You-Go Rates

Fish Audio's API uses pure pay-as-you-go pricing — no subscription required for API access (though subscription credits can supplement API usage). The rates are identical across models:

| Model | Rate | ≈ Cost Per Hour | Notes |
| --- | --- | --- | --- |
| S2 Pro (TTS) | $15/1M UTF-8 bytes | ~$1.25 | #1 blind test quality |
| S1 (TTS) | $15/1M UTF-8 bytes | ~$1.25 | Previous generation model |
| ASR (transcribe-1) | $0.36/audio hour | $0.36 | Speech recognition, billed per second |

UTF-8 Bytes vs Characters: Why It Matters

Fish Audio bills per UTF-8 byte, not per character. For English, 1 character = 1 byte, so $15/1M bytes = $15/1M characters. But for Chinese, Japanese, or Korean, each character uses 3 bytes. A million Chinese characters costs $45, not $15. If you're working in non-Latin scripts, factor this 3x multiplier into your cost estimates. Fish Audio's own docs note that 1M UTF-8 bytes ≈ 180,000 English words or 12 hours of speech.
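To see the byte multiplier on your own text, you can count UTF-8 bytes before sending anything to the API. This is a rough estimator under the $15 per million bytes rate quoted in this guide; the function names are hypothetical and it makes no API calls.

```python
RATE_USD_PER_MILLION_BYTES = 15.0  # rate quoted in this guide, not fetched from the API


def utf8_bytes(text: str) -> int:
    """Size of the text in UTF-8 bytes, the unit the guide says is billed."""
    return len(text.encode("utf-8"))


def estimate_cost_usd(text: str, rate: float = RATE_USD_PER_MILLION_BYTES) -> float:
    """Estimated charge for synthesizing the text at a flat per-byte rate."""
    return utf8_bytes(text) * rate / 1_000_000


english = "Hello, world"   # 12 characters, 1 byte each
chinese = "你好，世界"       # 5 characters, 3 bytes each

print(len(english), utf8_bytes(english))
print(len(chinese), utf8_bytes(chinese))
print(estimate_cost_usd("中" * 1_000_000))  # one million Chinese characters
```

Running the estimator over a representative sample of your real scripts is more reliable than a blanket 3x rule, since mixed text (CJK plus Latin names, digits, and punctuation) lands somewhere in between.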

Concurrency tiers unlock based on total prepaid spending: Starter (under $100 prepaid) gets 5 concurrent requests, Elevated ($100+) gets 15, High Volume ($1,000+) gets 50, and Enterprise gets custom limits. No per-request fees — just the per-byte charge.
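For capacity planning, the tier thresholds can be expressed as a simple lookup. This sketch encodes only the three spend-based tiers named above; Enterprise limits are custom and cannot be derived from spend, so they are left out. The function name is an assumption for illustration.

```python
def concurrency_tier(prepaid_usd: float) -> tuple[str, int]:
    """Map total prepaid spend to (tier name, concurrent request limit).

    Thresholds follow the tiers described in this guide; Enterprise is
    negotiated separately and is intentionally not modeled here.
    """
    if prepaid_usd >= 1_000:
        return ("High Volume", 50)
    if prepaid_usd >= 100:
        return ("Elevated", 15)
    return ("Starter", 5)


for spend in (25, 100, 2_500):
    tier, limit = concurrency_tier(spend)
    print(f"${spend} prepaid -> {tier}, {limit} concurrent requests")
```

A client can use the returned limit to size a worker pool or semaphore so that batch jobs stay under the cap.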

Self-Hosting: When the API Costs Nothing

Fish Audio publishes the S2 model weights on Hugging Face and inference code on GitHub (18,000+ stars). If you have your own GPU, the per-character API cost drops to zero and you pay only for compute.

The real cost of self-hosting: An NVIDIA H200 or A100 gets you sub-100ms latency. Cloud rental runs $1.50-$4.00/hour depending on provider (AWS, GCP, Lambda Labs). At $2/hour running 24/7, that's $1,440/month, nearly double the $749 Max plan. Measured against the API's ~$1.25 per audio hour, an always-on instance at that price only pays for itself at roughly 1,150 hours of generated audio per month ($1,440 ÷ $1.25); renting a GPU only while you generate lowers that threshold.
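A break-even estimate depends heavily on what you compare against. The sketch below takes one simplified view: an always-on rented GPU versus the pay-as-you-go API, using the $2/hour rental and ~$1.25 per audio hour figures quoted in this guide. It ignores GPU throughput, engineering time, and idle capacity, and the function names are hypothetical, so other assumptions (on-demand rental, comparing against a subscription plan) will give very different thresholds.

```python
HOURS_PER_MONTH = 24 * 30  # always-on instance, 30-day month


def monthly_gpu_cost(hourly_rate_usd: float, hours: float = HOURS_PER_MONTH) -> float:
    """Rental cost of a GPU instance kept running for the given hours."""
    return hourly_rate_usd * hours


def breakeven_audio_hours(gpu_cost_usd: float, api_cost_per_audio_hour: float) -> float:
    """Audio hours per month at which API spend would equal the GPU bill."""
    return gpu_cost_usd / api_cost_per_audio_hour


gpu = monthly_gpu_cost(2.00)
print(gpu)
print(breakeven_audio_hours(gpu, 1.25))
```

Plugging in a different hourly rate from the $1.50-$4.00 range, or fewer rental hours, shows how sensitive the result is to utilization.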

For most developers, the API at $15/1M is the better deal. Self-hosting only makes sense if you're a large operation generating 10M+ characters per month, or if you need data residency guarantees. Compare this to Dia TTS and Chatterbox, which also offer self-hosting but with lower quality benchmarks.

Fish Audio vs ElevenLabs: The 11x Price Gap Explained

The headline number — Fish Audio at $15/1M vs ElevenLabs at $120-$165/1M — tells only part of the story. Here's the honest comparison:

| Feature | Fish Audio | ElevenLabs |
| --- | --- | --- |
| API Price/1M chars | $15 | $60 (Flash) / $120 (Multilingual) |
| Blind Test Ranking | #1 (BT score 3.07, 1.7x next best) | Won 40% of head-to-head matchups |
| Arena ELO | 1,128 (Artificial Analysis) | 1,179 (#4 on Artificial Analysis) |
| Voice Library | Community voices + custom cloning | 4,000+ preset + community |
| Voice Cloning | 10-30s sample, cross-lingual (80+ langs) | Instant + PVC (30+ min audio) |
| Languages | 80+ | 70+ |
| Studio/Web Editor | Basic web interface | Full studio with Projects, dubbing, SFX |
| Open Source | Yes (self-hostable) | No |
| Company Scale | Startup (growing fast) | $330M+ ARR, $11B valuation |

The nuance: Fish Audio S2 Pro won 60% of blind head-to-head comparisons against ElevenLabs, with a Bradley-Terry score 1.7x higher. But ElevenLabs has a slightly higher ELO on the Artificial Analysis leaderboard (1,179 vs 1,128) and a dramatically more mature ecosystem — 4,000+ voices, a studio editor, dubbing, sound effects, and the deepest voice cloning in the market. For detailed voice quality and feature analysis, read our Fish Audio S2 Pro review.

When to pick Fish Audio: You care about per-character cost and voice quality above all else. You're a developer comfortable with APIs. You want cross-lingual cloning from short samples. You need to self-host.

When to pick ElevenLabs: You need a polished studio UI (no code). You need Professional Voice Cloning from 30+ minutes of audio. You're building audiobooks or long-form content that benefits from the Projects feature. You want the safety of an $11B company's infrastructure behind your production workload.

Real-World Cost Examples

| Use Case | Characters | Fish Audio API | ElevenLabs Flash | Savings |
| --- | --- | --- | --- | --- |
| Blog post (1,500 words) | ~8,500 | $0.13 | $0.51 | 75% |
| YouTube video (10 min) | ~15,000 | $0.23 | $0.90 | 75% |
| Full audiobook (80K words) | ~450,000 | $6.75 | $27.00 | 75% |
| SaaS integration (1M chars/mo) | 1,000,000 | $15.00 | $60.00 | 75% |
| Enterprise (10M chars/mo) | 10,000,000 | $150.00 | $600.00 | 75% |

The savings ratio is constant because the comparison uses flat per-character rates on both sides: 75% less than ElevenLabs Flash ($15 vs $60 per 1M), and about 87% less than ElevenLabs Multilingual v2 ($15 vs $120 per 1M), regardless of volume. This flat-rate simplicity is one of the Fish Audio API's genuine advantages: no tiered pricing traps, no credit conversions, no overage surcharges.
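The cost examples can be reproduced for any volume with a few lines. This sketch assumes English text (one byte per character) and the per-million rates quoted in this guide; it uses `Decimal` so that cent rounding is exact, and savings are computed from the unrounded rates. Function and constant names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

FISH_RATE = Decimal("15")    # USD per 1M characters, as quoted in this guide
FLASH_RATE = Decimal("60")   # ElevenLabs Flash, as quoted in this guide


def cost_usd(characters: int, rate_per_million: Decimal) -> Decimal:
    """Flat-rate cost for a character count, rounded to the nearest cent."""
    raw = Decimal(characters) * rate_per_million / Decimal(1_000_000)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def savings_percent(cheaper_rate: Decimal, pricier_rate: Decimal) -> Decimal:
    """Percentage saved by paying the cheaper flat rate instead of the pricier one."""
    return (pricier_rate - cheaper_rate) / pricier_rate * 100


for label, chars in [("Blog post", 8_500), ("YouTube video", 15_000), ("Audiobook", 450_000)]:
    print(label, cost_usd(chars, FISH_RATE), cost_usd(chars, FLASH_RATE))

print(savings_percent(FISH_RATE, FLASH_RATE))
```

Because both rates are flat, the savings percentage depends only on the two rates and never on the character count.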

Hidden Costs and Gotchas

Fish Audio vs 10 Competitors: Price per Million Characters

| Service | Cost/1M Chars | Arena/Blind Rank | Voice Cloning | Self-Host |
| --- | --- | --- | --- | --- |
| Fish Audio S2 Pro | $15 | #1 blind (BT 3.07) / ELO 1,128 | Yes (10-30s) | Yes |
| Grok TTS (beta) | $4.20 | Not ranked | No | No |
| Gemini Flash TTS | ~$12 | #2 Arena (ELO 1,211) | No | No |
| OpenAI TTS | $15-$30 | Not ranked | No | No |
| Amazon Polly Neural | $16 | Not ranked | No | No |
| Inworld TTS-2 | $25-$35 | #1 Arena (ELO 1,236) | Yes | No |
| Cartesia Sonic 3 | ~$37 | #10 Arena (ELO 1,054) | Yes (3s) | No |
| ElevenLabs Flash | $60 | #4 Arena (ELO 1,179) | Yes (instant + PVC) | No |
| Dia TTS | ~$40 (fal.ai) / Free (self-host) | Not ranked | No | Yes |
| Chatterbox Turbo | Free (self-host) | Not ranked | Yes | Yes |
| NaturalReader | $60-$110/year (subscription) | Not ranked | No | No |

Fish Audio occupies a unique position: the best quality-to-price ratio in the market. Only Grok ($4.20) and Gemini Flash (~$12) are cheaper, but neither offers voice cloning or self-hosting. Only Chatterbox and Dia are self-hostable, but neither matches S2 Pro's quality in blind tests. ElevenLabs and Inworld have higher Arena ELO scores, but ElevenLabs Flash costs 4x as much ($60) and Inworld roughly 2x ($25-$35).
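The comparison can also be queried programmatically. The sketch below transcribes the table into a small data structure and filters it; prices use the low end of each quoted range, services without a per-character API price are stored as `None`, and the type and field names are assumptions made for this example.

```python
from typing import NamedTuple, Optional


class Service(NamedTuple):
    name: str
    api_usd_per_million: Optional[float]  # low end of quoted range; None if no per-character API price
    cloning: bool
    self_host: bool


SERVICES = [
    Service("Fish Audio S2 Pro", 15.0, True, True),
    Service("Grok TTS (beta)", 4.20, False, False),
    Service("Gemini Flash TTS", 12.0, False, False),
    Service("OpenAI TTS", 15.0, False, False),
    Service("Amazon Polly Neural", 16.0, False, False),
    Service("Inworld TTS-2", 25.0, True, False),
    Service("Cartesia Sonic 3", 37.0, True, False),
    Service("ElevenLabs Flash", 60.0, True, False),
    Service("Dia TTS", 40.0, False, True),
    Service("Chatterbox Turbo", None, True, True),
    Service("NaturalReader", None, False, False),
]

fish = SERVICES[0]

# Hosted APIs priced below Fish Audio's per-million rate
cheaper = [
    s.name for s in SERVICES
    if s.api_usd_per_million is not None and s.api_usd_per_million < fish.api_usd_per_million
]

# Other services that can be self-hosted
self_hostable = [s.name for s in SERVICES if s.self_host and s is not fish]

print(cheaper)
print(self_hostable)
```

Adding fields such as Arena ELO would let you filter on quality as well, though the rankings in the table come from different sources and are not directly comparable.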

Who Should (and Shouldn't) Choose Fish Audio

Pick Fish Audio if

  • Cost per character matters most to you
  • You need cross-lingual voice cloning from short samples
  • You want the option to self-host later
  • You're building a multilingual product (80+ languages)
  • You're comfortable with API-first workflows

Skip Fish Audio if

  • You need a polished web studio (use ElevenLabs or Murf AI)
  • You need sub-50ms latency for voice agents (use Cartesia)
  • Vendor stability matters more than price (ElevenLabs at $330M ARR)
  • You work primarily in CJK languages (3x UTF-8 byte multiplier)
  • You're reading documents aloud, not creating audio (use Kindle TTS or Google Docs TTS)

Related Pricing Guides

Use our TTS cost calculator to compare Fish Audio against all alternatives for your specific volume. For the full quality assessment — blind test methodology, emotion range, latency benchmarks — read our complete Fish Audio S2 Pro review.

Considering alternatives? Our best text-to-speech comparison covers every major provider, and the ElevenLabs alternatives guide specifically reviews cheaper options.

Pricing verified May 2026 against fish.audio/plan and Fish Audio API docs. Blind test data from Fish Audio's 71,000-pair study (March-April 2026). Arena rankings from Artificial Analysis.