Voice Cloning

Ultravox Realtime includes multiple high-quality voices for all supported languages. The fastest way to try the included voices is the Voices explorer in the web console. You can also use the List Voices endpoint to see all voices and their details.

Creating a Custom Voice

Currently, we support one cloned voice per account. If you need more cloned voices, please reach out.

You can create a custom voice by uploading an audio sample using the Create (Clone) Voice endpoint. This process allows you to generate a unique voice that matches the characteristics of your audio sample.

Prerequisites

An Ultravox Realtime API key
A single audio file containing a clear voice sample (30-60 seconds recommended)
The audio file must be in .mp3 or .wav format

Using the API

To create a custom voice, send a POST request to the /api/voices endpoint with your audio file. Note: multiple files are not supported. Here’s how to do it:

curl --request POST https://api.ultravox.ai/api/voices \
  --header 'Content-Type: multipart/form-data' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --form 'file=@"/path/to/your/audio-sample.wav"' \
  --form 'name=My Custom Voice' \
  --form 'description=Voice recorded on Jan 1, 2024'
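In a script, it can help to wrap the same request in a function that checks the HTTP status. The following is a minimal sketch, not an official client: the function name create_voice and the ULTRAVOX_API_KEY environment variable are assumptions made for illustration, and the response body is printed as-is because its format is not described here.

```shell
#!/usr/bin/env bash
# Sketch: upload one sample and fail loudly on a non-2xx response.
# create_voice and ULTRAVOX_API_KEY are hypothetical names, not official ones.
create_voice() {
  local file="$1" name="$2" description="$3"
  local body status
  body=$(mktemp)
  # --write-out prints only the HTTP status; the response body goes to $body.
  # --form-string sends the value literally, even if it starts with @ or <.
  status=$(curl --silent --show-error --request POST https://api.ultravox.ai/api/voices \
    --header "X-API-Key: ${ULTRAVOX_API_KEY}" \
    --form "file=@\"${file}\"" \
    --form-string "name=${name}" \
    --form-string "description=${description}" \
    --output "$body" --write-out '%{http_code}')
  cat "$body"
  rm -f "$body"
  case "$status" in
    2??) return 0 ;;
    *) echo "upload failed with HTTP $status" >&2; return 1 ;;
  esac
}
```

A call might look like `create_voice ./audio-sample.wav "My Custom Voice" "Voice recorded on Jan 1, 2024"`. Because curl generates the multipart Content-Type header (including the boundary) whenever --form is used, the sketch does not set that header by hand.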

Requirements for Audio Samples

For optimal results, ensure your audio sample meets these criteria:

Clear, high-quality audio without background noise or echo
Single speaker throughout the recording
Natural speaking pace and tone
No music or other voices in the background
30-60 seconds in length (longer samples do not typically lead to better clones)
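The length criterion above can be checked locally before uploading. This sketch assumes ffprobe (part of FFmpeg, not part of Ultravox) is installed; the function name sample_duration_ok is hypothetical. It does not assess noise, echo, or the number of speakers.

```shell
#!/usr/bin/env bash
# Sketch: check that a sample falls within the suggested 30-60 second range.
# Assumes ffprobe from FFmpeg is on PATH; sample_duration_ok is a made-up name.
sample_duration_ok() {
  local file="$1" seconds
  # Ask ffprobe for the container duration in seconds, printed as a bare number.
  seconds=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$file") || return 1
  # awk does the floating-point comparison; exit status 0 means "in range".
  awk -v s="$seconds" 'BEGIN { exit !((s + 0) >= 30 && (s + 0) <= 60) }'
}
```

For example, `sample_duration_ok ./audio-sample.wav || echo "sample is outside 30-60 seconds"` prints a warning for a recording that is too short or too long.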

Limitations

Maximum of one audio file per voice
10MB file size maximum
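The format and size limits on this page can be verified before sending a request. The sketch below is a local pre-check only; the function name check_voice_sample is hypothetical, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption, since the exact byte limit is not stated here.

```shell
#!/usr/bin/env bash
# Sketch: local pre-upload check for format and size.
# Assumption: "10MB" means 10 * 1024 * 1024 bytes.
MAX_BYTES=$((10 * 1024 * 1024))

check_voice_sample() {
  local file="$1" size
  if [ ! -f "$file" ]; then
    echo "not a file: $file" >&2
    return 1
  fi
  # Accept only .mp3 and .wav, ignoring the case of the extension.
  case "$(printf '%s' "$file" | tr '[:upper:]' '[:lower:]')" in
    *.mp3|*.wav) ;;
    *) echo "unsupported format: $file" >&2; return 1 ;;
  esac
  # wc -c gives the byte count on both Linux and macOS.
  size=$(wc -c < "$file" | tr -d '[:space:]')
  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "file too large: $size bytes" >&2
    return 1
  fi
  return 0
}
```

Running `check_voice_sample ./audio-sample.wav` returns a non-zero status and prints the reason to stderr when the file would be rejected under these assumptions.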
