Voice Cloning
Create Custom Voices
Ultravox Realtime includes multiple high-quality voices for all supported languages. The fastest way to try the included voices is in the Playground. You can also use the List Voices endpoint to see all voices and their details.
Creating a Custom Voice
You can create a custom voice by uploading an audio sample using the Create (Clone) Voice endpoint. This process allows you to generate a unique voice that matches the characteristics of your audio sample.
Prerequisites
- An Ultravox Realtime API key
- A single audio file containing a clear voice sample (30 seconds recommended)
- The audio file must be in .mp3 or .wav format
Using the API
To create a custom voice, send a POST request to the /api/voices endpoint with your audio file. Note: multiple files are not supported.
Here’s how to do it:
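The following is a minimal sketch in Python using only the standard library, not an official client. It builds a multipart/form-data POST to /api/voices that carries exactly one audio file. The host api.ultravox.ai, the X-API-Key header, and the form field names file and name are assumptions that this page does not state; check the Create (Clone) Voice endpoint reference before relying on them. The helper build_clone_request is a hypothetical name used for illustration.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Assumptions (not stated on this page): host, auth header, form field names.
API_URL = "https://api.ultravox.ai/api/voices"
API_KEY_HEADER = "X-API-Key"


def build_clone_request(audio_path, voice_name, api_key):
    """Build (but do not send) a multipart POST carrying one audio file."""
    path = Path(audio_path)
    boundary = uuid.uuid4().hex
    # Fall back to a generic type if the extension is not recognised.
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"

    body = b"".join([
        # Text field: the name for the new voice (assumed field name "name").
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="name"\r\n\r\n',
        voice_name.encode() + b"\r\n",
        # File field: the single audio sample (assumed field name "file").
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        path.read_bytes() + b"\r\n",
        # Closing boundary marks the end of the form.
        f"--{boundary}--\r\n".encode(),
    ])

    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            API_KEY_HEADER: api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Sending the request needs a valid API key and network access:
# request = build_clone_request("sample.wav", "my-voice", "YOUR_API_KEY")
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode())
```

Building the request separately from sending it makes it easy to inspect the headers and body before making a real call.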
Requirements for Audio Samples
For optimal results, ensure your audio sample meets these criteria:
- Clear, high-quality audio without background noise or echo
- Single speaker throughout the recording
- Natural speaking pace and tone
- No music or other voices in the background
- 30-60 seconds in length (longer samples typically do not produce better clones)
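The length criterion can be checked locally before uploading. The sketch below uses Python's standard wave module, so it only handles uncompressed .wav files; measuring an .mp3 would need a third-party library. The function names are hypothetical, and the 30 and 60 second bounds are taken from the list above.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        # Duration is the frame count divided by frames per second.
        return wav_file.getnframes() / wav_file.getframerate()


def is_recommended_length(seconds, minimum=30.0, maximum=60.0):
    """Check a duration against the recommended 30-60 second range."""
    return minimum <= seconds <= maximum
```

The other criteria (single speaker, no background noise or music) are not easy to verify in code and are best checked by listening to the sample.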
Limitations
- Maximum of one audio file per voice
- Maximum file size of 30 MB
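A short pre-upload check can catch format and size problems before the request is sent. This is a sketch under stated assumptions: it reads the 30 MB limit conservatively as 30,000,000 bytes (the service may count differently), it judges the format by file extension only, and find_upload_problems is a hypothetical helper name.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".wav"}
# Assumption: the 30 MB limit is read as 30,000,000 bytes (the smaller reading).
MAX_BYTES = 30 * 1000 * 1000


def find_upload_problems(path, max_bytes=MAX_BYTES):
    """Return a list of human-readable problems; an empty list means none found."""
    path = Path(path)
    problems = []
    # Extension check only; the file contents are not inspected.
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {path.suffix or '(none)'}")
    size = path.stat().st_size
    if size > max_bytes:
        problems.append(f"file is {size} bytes; limit is {max_bytes}")
    return problems
```

Because only one audio file is accepted per voice, the check takes a single path rather than a list.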