High-quality TTS with 9 expressive voices
OpenAI's text-to-speech API offers natural-sounding voices optimized for various use cases. With two model tiers (tts-1 for speed and tts-1-hd for quality) and 9 distinct voices, it provides flexibility for real-time applications and pre-rendered content alike.
What makes OpenAI TTS stand out.
OpenAI TTS is ideal for teams that need reliable, natural-sounding voices with minimal setup.
Simple to integrate: one endpoint, 9 voices, real-time streaming, and 6 output formats. A single HTTP request is enough, with no complex SDKs or configuration needed.
Consistent quality across all 9 voices with adjustable speed control. Cost-effective at $15/1M characters for high-volume audio production.
Clear pronunciation and predictable pacing make OpenAI TTS a reliable choice for screen readers, assistive tools, and accessible content.
Getting started takes minutes. Here's the typical workflow.
Sign up at platform.openai.com and generate an API key. TTS is available on all paid plans.
Pick from 9 voices: Alloy, Ash, Coral, Echo, Fable, Nova, Onyx, Sage, or Shimmer.
Use tts-1 for low latency or tts-1-hd for quality. Set speed from 0.25x to 4.0x.
Send text via the API. Get back MP3, WAV, OPUS, AAC, FLAC, or PCM. Stream in real-time or download.
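The workflow above can be sketched in Python using only the standard library. The endpoint path (/v1/audio/speech) and the JSON field names (model, input, voice, response_format, speed) are written here as assumptions about the public API, so check them against the current API reference before relying on them. The helper names build_speech_request and synthesize_to_file, and the OPENAI_API_KEY environment variable, are illustrative choices rather than part of the API.

```python
import json
import os
import urllib.request

# Assumed endpoint for speech generation; verify against the API reference.
API_URL = "https://api.openai.com/v1/audio/speech"

# Voices and formats as listed on this page.
VOICES = {"alloy", "ash", "coral", "echo", "fable", "nova", "onyx", "sage", "shimmer"}
FORMATS = {"mp3", "wav", "opus", "aac", "flac", "pcm"}


def build_speech_request(text, api_key, voice="alloy", model="tts-1",
                         response_format="mp3", speed=1.0):
    """Build (but do not send) a POST request for the speech endpoint."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    if response_format not in FORMATS:
        raise ValueError(f"unknown format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text, path, **options):
    """Send the request and save the returned audio bytes to `path`."""
    request = build_speech_request(text, os.environ["OPENAI_API_KEY"], **options)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

With a valid key in the environment, a call such as synthesize_to_file("Hello there", "hello.mp3", voice="nova") would write the audio to disk. Validating the voice, format, and speed locally catches typos before a billable request is made.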
See how OpenAI TTS stacks up against other TTS services.
OpenAI TTS includes 9 built-in voices: Alloy (neutral), Ash (warm), Coral (clear), Echo (deep), Fable (animated), Nova (bright), Onyx (bold), Sage (calm), and Shimmer (soft). Each voice has a distinct personality suited for different use cases.
tts-1 is optimized for speed and low latency, ideal for real-time applications like chatbots. tts-1-hd produces higher fidelity audio with better clarity, best for pre-rendered content like audiobooks and videos. tts-1-hd costs $30/1M characters vs $15/1M for tts-1.
OpenAI charges $15 per 1 million characters for tts-1 and $30 per 1 million characters for tts-1-hd. A typical 1,000-word blog post (about 5,000 characters) costs roughly $0.08 with tts-1 or $0.15 with tts-1-hd.
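The arithmetic behind those figures is easy to reproduce. The sketch below uses the per-million-character prices quoted above; the function name estimate_cost is hypothetical, and the prices should be rechecked against current pricing before budgeting.

```python
from decimal import Decimal

# Prices in USD per 1 million characters, as quoted on this page.
PRICE_PER_MILLION_CHARS = {
    "tts-1": Decimal("15"),
    "tts-1-hd": Decimal("30"),
}


def estimate_cost(char_count, model="tts-1"):
    """Return the estimated cost in USD for `char_count` characters."""
    return PRICE_PER_MILLION_CHARS[model] * char_count / Decimal(1_000_000)


# A 5,000-character blog post:
print(estimate_cost(5000, "tts-1"))     # 0.075, which rounds to about $0.08
print(estimate_cost(5000, "tts-1-hd"))  # 0.15
```

Decimal is used instead of float so that small per-request amounts add up without rounding drift across many requests.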
The API supports 6 output formats: MP3, WAV, OPUS, AAC, FLAC, and PCM. MP3 is the default and most widely compatible. OPUS is ideal for low-latency streaming, while FLAC and WAV are best for lossless audio.
No. OpenAI TTS does not support voice cloning or custom voice creation. You are limited to the 9 built-in voices. If you need voice cloning, consider ElevenLabs or Chatterbox Turbo.
Yes. You can set the speed parameter from 0.25x to 4.0x when generating audio. The speed is baked into the generated file. Lower speeds sound more deliberate, while higher speeds work for faster narration.
Yes. The API supports real-time audio streaming using chunked transfer encoding. Playback can begin before the full file is generated, making it suitable for chatbots and voice assistants.
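On the client side, streaming comes down to reading the response body in small pieces instead of waiting for all of it. The helper below is a minimal sketch with a hypothetical name, copy_audio_stream. It works with any file-like object, including the response returned by urllib.request.urlopen, which decodes chunked responses transparently. The in-memory stream stands in for a real HTTP response so the example runs without network access.

```python
import io


def copy_audio_stream(response, sink, chunk_size=4096):
    """Copy audio from a file-like response to `sink`, one chunk at a time.

    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty read means the stream has ended
            break
        sink.write(chunk)  # a player could start consuming audio here
        total += len(chunk)
    return total


# Stand-in for an HTTP response body: 10,000 bytes of placeholder data.
fake_response = io.BytesIO(b"\x00" * 10_000)
output = io.BytesIO()
copied = copy_audio_stream(fake_response, output)
print(copied)  # 10000
```

In a real client, `sink` would be an audio player's input buffer or an open file, so the first chunks are usable while later ones are still arriving.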
OpenAI TTS supports 57 languages, following the language coverage of the Whisper model. However, the voices are optimized primarily for English, so quality may vary in other languages. There is also no separate multilingual model to select, as some competitors offer.
Pay per character