Fish Audio Pricing at a Glance (May 2026)
Fish Audio charges $15 per million UTF-8 bytes through its API, which works out to roughly $15 per million English characters. The subscription plans range from Free (8,000 credits, ~7 minutes) to Max ($749/month, 25 million credits). That $15/1M rate is 8x cheaper than ElevenLabs Multilingual v2 at $120/1M (11x at the $165/1M top of ElevenLabs' range), while Fish Audio S2 Pro ranked #1 in Fish Audio's own blind A/B tests with a Bradley-Terry score 1.7x higher than the next best model.
| Plan | Price/Month | Credits | ≈ Generation Time | Commercial Use |
|---|---|---|---|---|
| Free | $0 | 8,000 | ~7 min | No |
| Plus | $11 | 250,000 | ~200 min | Yes |
| Pro | $75 | 2,000,000 | ~1,620 min (27 hrs) | Yes |
| Max | $749 | 25,000,000 | ~6,250 min (104 hrs) | Yes |
The Credit Math
Fish Audio subscriptions meter credits rather than characters. Going by the plan table, one minute of generated audio costs roughly 1,150-1,250 credits on the Free, Plus, and Pro plans (Plus: 250,000 credits ÷ 200 minutes = 1,250). The Plus plan's 250,000 credits gives you about 200 minutes: enough for 40 YouTube videos at 5 minutes each, or about 13 podcast episodes at 15 minutes. Annual billing saves 33% across all plans ($11/mo becomes ~$7.33/mo on Plus).
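The credit arithmetic can be sketched in a few lines of Python. This is an illustration only, not Fish Audio's billing logic: the per-minute rate is an assumption inferred from the Plus row of the plan table (250,000 credits for ~200 minutes), and the function names are hypothetical.

```python
# Assumption: the credits-per-minute rate is inferred from the Plus row of the
# plan table (250,000 credits for ~200 minutes); it is not an official figure.
CREDITS_PER_MINUTE = 250_000 / 200  # 1,250 credits per minute


def minutes_from_credits(credits: int, rate: float = CREDITS_PER_MINUTE) -> float:
    """Estimate minutes of generated audio for a given credit balance."""
    return credits / rate


def items_covered(credits: int, minutes_per_item: float) -> int:
    """Whole videos or episodes of a given length that a credit balance covers."""
    return int(minutes_from_credits(credits) // minutes_per_item)


if __name__ == "__main__":
    plus_credits = 250_000
    print(f"Plus: ~{minutes_from_credits(plus_credits):.0f} minutes")
    print(f"5-minute videos: {items_covered(plus_credits, 5)}")
    print(f"15-minute episodes: {items_covered(plus_credits, 15)}")
```

Swapping in a different `rate` shows how sensitive the estimates are to the assumed credits-per-minute figure.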
What Each Plan Actually Includes
Free Tier — Personal Testing Only
8,000 credits, 500 characters per generation, 3 public voice slots, and access to both S1 and S2 models. No commercial use. Standard voice cloning only. That 500-character limit per generation is the real constraint: at roughly 80-100 English words, you can't process anything longer than a short paragraph at once. For comparison, ElevenLabs' free tier gives 10,000 credits with no per-generation character limit.
Plus ($11/month) — Where Most Creators Should Start
The jump from Free to Plus is massive. 250,000 credits (~200 minutes), 15,000 characters per generation, unlimited public voices plus 10 private voice slots, enhanced voice cloning, commercial use rights, and API access at pay-as-you-go rates. At $11/month, this is cheaper than Speechify Premium ($139/year = $11.58/mo) and gives you actual content creation tools rather than just a reading app.
Who it's for: Solo creators narrating blog posts, short videos, or social media content. If you're producing under 3 hours of audio per month, Plus covers it.
Pro ($75/month) — Production Volume
2 million credits, 30,000 characters per generation, unlimited voice slots, 3 team seats, and enhanced cloning. That 30K character limit means you can process a full book chapter in one shot. At $75/month for ~27 hours of audio, the effective per-minute cost is about $0.046/minute, less than a quarter of ElevenLabs Pro ($99/mo for ~8 hours of Multilingual audio, or about $0.21/minute).
Max ($749/month) — Enterprise-Adjacent
25 million credits, 10 team seats, everything in Pro plus priority generation. At $749/month for ~104 hours, you're paying about $0.12/minute, roughly 28% less than ElevenLabs Scale ($330/mo for ~33 hours, or about $0.17/minute). The Max plan exists for content studios and SaaS products that burn through TTS at scale.
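The per-minute figures quoted for each plan are simple divisions of monthly price by approximate generation time. A minimal sketch, using only the prices and minutes from the plan table (the dictionary and function names are illustrative):

```python
# Monthly price (USD) and approximate generation minutes, copied from the
# plan table in this article. The Free plan is omitted because it costs $0.
PLANS = {
    "Plus": (11, 200),
    "Pro": (75, 1_620),
    "Max": (749, 6_250),
}


def cost_per_minute(price_usd: float, minutes: float) -> float:
    """Effective cost of one generated minute if the full allowance is used."""
    return price_usd / minutes


if __name__ == "__main__":
    for name, (price, minutes) in PLANS.items():
        print(f"{name}: ${cost_per_minute(price, minutes):.3f}/min")
```

These are best-case figures: any unused credits in a month raise the effective per-minute cost.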
API Pricing: Pay-as-You-Go Rates
Fish Audio's API uses pure pay-as-you-go pricing: no subscription is required for API access (though subscription credits can supplement API usage). The rate is identical across both TTS models, and speech recognition is billed separately per audio hour:
| Model | Rate | ≈ Cost Per Hour | Notes |
|---|---|---|---|
| S2 Pro (TTS) | $15/1M UTF-8 bytes | ~$1.25 | #1 blind test quality |
| S1 (TTS) | $15/1M UTF-8 bytes | ~$1.25 | Previous generation model |
| ASR (transcribe-1) | $0.36/audio hour | $0.36 | Speech recognition, billed per second |
UTF-8 Bytes vs Characters: Why It Matters
Fish Audio bills per UTF-8 byte, not per character. For English, 1 character = 1 byte, so $15/1M bytes = $15/1M characters. But for Chinese, Japanese, or Korean, each character uses 3 bytes. A million Chinese characters costs $45, not $15. If you're working in non-Latin scripts, factor this 3x multiplier into your cost estimates. Fish Audio's own docs note that 1M UTF-8 bytes ≈ 180,000 English words or 12 hours of speech.
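To see the byte multiplier on your own text, you can measure its UTF-8 length before sending it. The sketch below uses only Python's built-in `str.encode`; the $15 rate is the figure quoted in this article, and the helper names are hypothetical rather than part of any Fish Audio SDK.

```python
RATE_USD_PER_MILLION_BYTES = 15.0  # API rate quoted in this article


def utf8_bytes(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def tts_cost_usd(text: str, rate: float = RATE_USD_PER_MILLION_BYTES) -> float:
    """Estimated cost of synthesizing the text at a per-million-byte rate."""
    return utf8_bytes(text) * rate / 1_000_000


if __name__ == "__main__":
    samples = {
        "English": "Hello, world",
        "Chinese": "你好，世界",
        "Russian": "Привет",
    }
    for label, text in samples.items():
        chars, size = len(text), utf8_bytes(text)
        print(f"{label}: {chars} chars, {size} bytes, {size / chars:.1f} bytes/char")
```

Mixed-script text lands somewhere between the 1x and 3x extremes, so measuring real samples of your content gives a better estimate than a flat multiplier.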
Concurrency tiers unlock based on total prepaid spending: Starter (under $100 prepaid) gets 5 concurrent requests, Elevated ($100+) gets 15, High Volume ($1,000+) gets 50, and Enterprise gets custom limits. No per-request fees — just the per-byte charge.
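For capacity planning, the tier thresholds above can be expressed as a small lookup. This is a sketch of the article's description, not an official API: the function name is hypothetical, and Enterprise limits are custom, so they are not modeled.

```python
def concurrency_limit(total_prepaid_usd: float) -> int:
    """Concurrent request limit for a given total prepaid spend.

    Thresholds follow the tiers described in this article. Enterprise
    limits are negotiated individually and are not represented here.
    """
    if total_prepaid_usd >= 1_000:
        return 50  # High Volume
    if total_prepaid_usd >= 100:
        return 15  # Elevated
    return 5       # Starter


if __name__ == "__main__":
    for spend in (0, 99.99, 100, 1_000):
        print(f"${spend}: {concurrency_limit(spend)} concurrent requests")
```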
Self-Hosting: When the API Costs Nothing
Fish Audio publishes the S2 model weights on HuggingFace and inference code on GitHub (18,000+ stars). If you have your own GPU, the per-character API cost drops to zero — you pay only for compute.
The real cost of self-hosting: An NVIDIA H200 or A100 gets you sub-100ms latency. Cloud rental runs $1.50-$4.00/hour depending on provider (AWS, GCP, Lambda Labs). At $2/hour running 24/7, that's $1,440/month, nearly double the $749 Max plan, so an always-on GPU only competes with Max once you outgrow the ~104 hours of audio that plan includes. Against the pay-as-you-go API the bar is higher still: at $15/1M, $1,440 buys about 96 million characters (roughly 1,150 hours of audio at ~$1.25/hour).
For most developers, the API at $15/1M is the better deal. Self-hosting only makes sense if you're a large operation generating on the order of 100M characters per month, if you can rent GPUs only while generating, or if you need data residency guarantees. Compare this to Dia TTS and Chatterbox, which also offer self-hosting but with lower quality benchmarks.
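A rough way to test the self-hosting decision is to compute how much API usage a month of GPU rental would buy. The sketch below assumes an always-on rental over a 30-day month and uses the article's $15/1M rate and the "1M bytes ≈ 12 hours" figure; it ignores GPU throughput, idle time, and engineering overhead, so treat the output as an upper bound on the break-even volume rather than a precise threshold.

```python
HOURS_PER_MONTH = 24 * 30            # assumption: always-on rental, 30-day month
API_RATE_USD_PER_MILLION = 15.0      # API rate quoted in this article
AUDIO_HOURS_PER_MILLION_BYTES = 12   # Fish Audio docs figure cited in this article


def monthly_gpu_cost(hourly_rate_usd: float, hours: float = HOURS_PER_MONTH) -> float:
    """Monthly rental cost for a GPU billed by the hour."""
    return hourly_rate_usd * hours


def breakeven_million_bytes(gpu_monthly_usd: float,
                            api_rate: float = API_RATE_USD_PER_MILLION) -> float:
    """Millions of UTF-8 bytes the API would synthesize for the same money."""
    return gpu_monthly_usd / api_rate


if __name__ == "__main__":
    for hourly in (1.50, 2.00, 4.00):
        gpu = monthly_gpu_cost(hourly)
        volume = breakeven_million_bytes(gpu)
        audio_hours = volume * AUDIO_HOURS_PER_MILLION_BYTES
        print(f"${hourly:.2f}/hr -> ${gpu:,.0f}/mo, "
              f"break-even ~{volume:.0f}M bytes (~{audio_hours:,.0f} audio hours)")
```

Passing a smaller `hours` value to `monthly_gpu_cost` models on-demand rental, which lowers the break-even volume considerably.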
Fish Audio vs ElevenLabs: The 11x Price Gap Explained
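The headline number, Fish Audio at $15/1M vs ElevenLabs at up to $165/1M, tells only part of the story: the gap is 11x at that top rate, 8x against Multilingual v2 at $120/1M, and 4x against Flash at $60/1M. Here's the honest comparison: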
| Feature | Fish Audio | ElevenLabs |
|---|---|---|
| API Price/1M chars | $15 | $60 (Flash) / $120 (Multilingual) |
| Blind Test Ranking | #1 (BT score 3.07, 1.7x next best) | Won 40% of head-to-head matchups |
| Arena ELO | 1,128 (Artificial Analysis) | 1,179 (#4 on Artificial Analysis) |
| Voice Library | Community voices + custom cloning | 4,000+ preset + community |
| Voice Cloning | 10-30s sample, cross-lingual (80+ langs) | Instant + PVC (30+ min audio) |
| Languages | 80+ | 70+ |
| Studio/Web Editor | Basic web interface | Full studio with Projects, dubbing, SFX |
| Open Source | Yes (self-hostable) | No |
| Company Scale | Startup (growing fast) | $330M+ ARR, $11B valuation |
The nuance: Fish Audio S2 Pro won 60% of blind head-to-head comparisons against ElevenLabs, with a Bradley-Terry score 1.7x higher. But ElevenLabs has a slightly higher ELO on the Artificial Analysis leaderboard (1,179 vs 1,128) and a dramatically more mature ecosystem — 4,000+ voices, a studio editor, dubbing, sound effects, and the deepest voice cloning in the market. For detailed voice quality and feature analysis, read our Fish Audio S2 Pro review.
When to pick Fish Audio: You care about per-character cost and voice quality above all else. You're a developer comfortable with APIs. You want cross-lingual cloning from short samples. You need to self-host.
When to pick ElevenLabs: You need a polished studio UI (no code). You need Professional Voice Cloning from 30+ minutes of audio. You're building audiobooks or long-form content that benefits from the Projects feature. You want the safety of an $11B company's infrastructure behind your production workload.
Real-World Cost Examples
| Use Case | Characters | Fish Audio API | ElevenLabs Flash | Savings |
|---|---|---|---|---|
| Blog post (1,500 words) | ~8,500 | $0.13 | $0.51 | 75% |
| YouTube video (10 min) | ~15,000 | $0.23 | $0.90 | 75% |
| Full audiobook (80K words) | ~450,000 | $6.75 | $27.00 | 75% |
| SaaS integration (1M chars/mo) | 1,000,000 | $15.00 | $60.00 | 75% |
| Enterprise (10M chars/mo) | 10,000,000 | $150.00 | $600.00 | 75% |
The savings ratio is constant, about 75% less than ElevenLabs Flash and 87% less than ElevenLabs Multilingual v2 regardless of volume, because the comparison uses a flat per-character rate for each provider. This flat-rate simplicity is one of the Fish Audio API's genuine advantages: no volume tiers, no credit conversion (credits apply only to the subscription plans), and no overage surcharges.
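The table rows can be reproduced with a short script. This is a sketch under the article's assumptions: English text (one byte per character) and the flat per-million rates quoted here; the names are illustrative.

```python
# Flat API rates in USD per million characters, as quoted in this article.
RATES = {
    "Fish Audio API": 15.0,
    "ElevenLabs Flash": 60.0,
    "ElevenLabs Multilingual v2": 120.0,
}


def cost_usd(characters: int, rate_per_million: float) -> float:
    """Cost of synthesizing a character count at a flat per-million rate."""
    return characters * rate_per_million / 1_000_000


def savings_pct(cheaper_rate: float, pricier_rate: float) -> float:
    """Percentage saved by paying the cheaper flat rate instead of the pricier one."""
    return (1 - cheaper_rate / pricier_rate) * 100


if __name__ == "__main__":
    use_cases = {"Blog post": 8_500, "Audiobook": 450_000, "SaaS (1M/mo)": 1_000_000}
    fish = RATES["Fish Audio API"]
    flash = RATES["ElevenLabs Flash"]
    for name, chars in use_cases.items():
        print(f"{name}: ${cost_usd(chars, fish):.2f} vs ${cost_usd(chars, flash):.2f} "
              f"({savings_pct(fish, flash):.0f}% less)")
```

Because `savings_pct` depends only on the two rates, the percentage is the same for every row; the character count cancels out.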
Hidden Costs and Gotchas
- UTF-8 byte billing hits non-English hard. Chinese, Japanese, and Korean characters use 3 bytes each, and Cyrillic characters use 2. A million Chinese characters costs $45 and a million Cyrillic characters costs $30, not $15. If you primarily work in these scripts, factor this in.
- Free tier is too limited to evaluate properly. 500 characters per generation means you can't hear how S2 Pro handles a full paragraph with natural pacing. Sign up for Plus ($11/mo) with the 7-day money-back guarantee to actually test it.
- Voice cloning rights verification. Fish Audio requires proof of rights for commercial use of cloned voices. If you clone a voice for a client project, make sure you have written permission. This isn't unique to Fish Audio, but they enforce it.
- No full studio editor. The fish.audio web interface exists but is basic. For production workflows (script editing, version management, timeline controls) you'll need to use the API directly or build your own tooling.
- Startup risk. Fish Audio is a fast-growing company, but it's not a $330M ARR operation like ElevenLabs. If you're betting your production pipeline on a TTS provider, consider the vendor risk. The open-source model (self-hostable) partially mitigates this.
Fish Audio vs 10 Competitors: Price per Million Characters
| Service | Cost/1M Chars | Arena/Blind Rank | Voice Cloning | Self-Host |
|---|---|---|---|---|
| Fish Audio S2 Pro | $15 | #1 blind (BT 3.07) / ELO 1,128 | Yes (10-30s) | Yes |
| Grok TTS (beta) | $4.20 | Not ranked | No | No |
| Gemini Flash TTS | ~$12 | #2 Arena (ELO 1,211) | No | No |
| OpenAI TTS | $15-$30 | Not ranked | No | No |
| Amazon Polly Neural | $16 | Not ranked | No | No |
| Inworld TTS-2 | $25-$35 | #1 Arena (ELO 1,236) | Yes | No |
| Cartesia Sonic 3 | ~$37 | #10 Arena (ELO 1,054) | Yes (3s) | No |
| ElevenLabs Flash | $60 | #4 Arena (ELO 1,179) | Yes (instant + PVC) | No |
| Dia TTS | ~$40 (fal.ai) / Free (self-host) | Not ranked | No | Yes |
| Chatterbox Turbo | Free (self-host) | Not ranked | Yes | Yes |
| NaturalReader | $60-$110/year (subscription) | Not ranked | No | No |
Fish Audio occupies a distinctive position: a top blind-test ranking at one of the lowest prices, plus both voice cloning and self-hosting. Only Grok ($4.20) and Gemini Flash (~$12) are cheaper, but neither offers voice cloning or self-hosting. Only Chatterbox and Dia are self-hostable, but neither matches S2 Pro's quality in blind tests. Gemini Flash, ElevenLabs, and Inworld have higher Arena ELO scores; Gemini Flash is slightly cheaper, while Inworld costs roughly 2x and ElevenLabs 4-8x as much.
Who Should (and Shouldn't) Choose Fish Audio
Pick Fish Audio if
- Cost per character matters most to you
- You need cross-lingual voice cloning from short samples
- You want the option to self-host later
- You're building a multilingual product (80+ languages)
- You're comfortable with API-first workflows
Skip Fish Audio if
- You need a polished web studio (use ElevenLabs or Murf AI)
- You need sub-50ms latency for voice agents (use Cartesia)
- Vendor stability matters more than price (ElevenLabs at $330M ARR)
- You work primarily in CJK languages (3x UTF-8 byte multiplier)
- You're reading documents aloud, not creating audio (use Kindle TTS or Google Docs TTS)
Related Pricing Guides
Use our TTS cost calculator to compare Fish Audio against all alternatives for your specific volume. For the full quality assessment — blind test methodology, emotion range, latency benchmarks — read our complete Fish Audio S2 Pro review.
Considering alternatives? Our best text-to-speech comparison covers every major provider, and the ElevenLabs alternatives guide specifically reviews cheaper options.
By TextToLab Research Team · Pricing verified May 2026 against fish.audio/plan and Fish Audio API docs. Blind test data from Fish Audio's 71,000-pair study (March-April 2026). Arena rankings from Artificial Analysis.