Listen to all 9 voices. Pick a sample text, choose your model and speed, then hit play.
Same text, different voice. Pick two and listen back to back.
Find the right voice for your project with our curated recommendations.
Compare quality, latency, and cost between OpenAI's two text-to-speech models.
OpenAI's text-to-speech API converts written text into natural-sounding audio. It offers 9 built-in voices and two model variants: tts-1 for low-latency applications and tts-1-hd for higher fidelity output.
OpenAI TTS includes 9 voices: Alloy (neutral), Ash (warm), Coral (clear), Echo (deep), Fable (animated), Nova (bright), Onyx (bold), Sage (calm), and Shimmer (soft). Each has a distinct character suited for different use cases.
The tts-1 model is optimized for speed and low latency, which makes it ideal for real-time applications. The tts-1-hd model produces higher-quality audio with better clarity and is best for pre-rendered content.
Playback speed can be set from 0.25x to 4.0x when generating audio. Lower speeds sound more deliberate, while higher speeds are useful for faster narration. The speed is baked into the generated audio file.
The API supports MP3, WAV, OPUS, AAC, FLAC, and PCM output formats. MP3 is the default and most widely compatible. All samples on TextToLab use MP3.
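The model, voice, speed, and format options described above can be collected into a single request body. The following is a minimal sketch rather than official client code: `build_speech_request` is a hypothetical helper, the field names (`model`, `input`, `voice`, `speed`, `response_format`) are assumed to match OpenAI's speech endpoint, and the allowed values are copied from the descriptions on this page. It only builds and validates the payload; it does not send anything.

```python
import json

# Allowed values as listed on this page; check the current API reference
# before relying on them.
VOICES = {"alloy", "ash", "coral", "echo", "fable", "nova", "onyx", "sage", "shimmer"}
MODELS = {"tts-1", "tts-1-hd"}
FORMATS = {"mp3", "wav", "opus", "aac", "flac", "pcm"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_request(text, voice="alloy", model="tts-1", speed=1.0,
                         response_format="mp3"):
    """Validate the options and return a JSON-serializable request body.

    Hypothetical helper for illustration; not part of any SDK.
    """
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "speed": speed,
        "response_format": response_format,
    }


if __name__ == "__main__":
    body = build_speech_request(
        "Hello from TextToLab.", voice="nova", model="tts-1-hd", speed=1.25
    )
    print(json.dumps(body, indent=2))
```

Validating before sending means a typo in a voice name or an out-of-range speed fails locally instead of costing a round trip. Because the speed is baked into the generated file, changing it later means generating the audio again.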
OpenAI charges $15 per 1 million characters for tts-1 and $30 per 1 million characters for tts-1-hd. A typical 1,000-word blog post, at roughly 5,300 characters, costs about $0.08 with tts-1 or $0.16 with tts-1-hd.
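The per-character pricing above is easy to turn into a quick estimate. This sketch uses the two prices quoted on this page; the average of 5.3 characters per word (spaces included) is an assumption chosen for illustration, and `estimate_cost` is a hypothetical helper.

```python
# Prices in US dollars per 1 million characters, as quoted on this page.
PRICE_PER_MILLION_CHARS = {"tts-1": 15.00, "tts-1-hd": 30.00}

# Assumption: average English word length including the trailing space.
ASSUMED_CHARS_PER_WORD = 5.3


def estimate_cost(char_count, model="tts-1"):
    """Return the estimated cost in dollars for char_count characters."""
    return char_count * PRICE_PER_MILLION_CHARS[model] / 1_000_000


if __name__ == "__main__":
    chars = round(1000 * ASSUMED_CHARS_PER_WORD)  # a 1,000-word post
    for model in ("tts-1", "tts-1-hd"):
        print(f"{model}: ${estimate_cost(chars, model):.2f}")
```

For real text, pass `len(text)` instead of a word-based estimate, since billing is by character count.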
The best voice depends on your content. Alloy works for most general use cases. Echo and Onyx suit narration and announcements. Nova and Ash are friendly for apps and chatbots. Sage and Shimmer work well for calm, instructional content.
TextToLab is a free tool for previewing and comparing AI text-to-speech voices. All audio samples are pre-generated so you can quickly evaluate voices without needing an API key or spending credits.