⚠️ SIP Billing Starts November 10, 2025 - See Ultravox Pricing for details.
Gets details for the specified voice
curl --request GET \
  --url https://api.ultravox.ai/api/voices/{voice_id} \
  --header 'X-API-Key: <api-key>'

{
  "voiceId": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "name": "<string>",
  "previewUrl": "<string>",
  "ownership": "public",
  "billingStyle": "VOICE_BILLING_STYLE_INCLUDED",
  "provider": "<string>",
  "definition": {
    "elevenLabs": {
      "voiceId": "<string>",
      "model": "<string>",
      "speed": 123,
      "useSpeakerBoost": true,
      "style": 123,
      "similarityBoost": 123,
      "stability": 123,
      "pronunciationDictionaries": [
        {
          "dictionaryId": "<string>",
          "versionId": "<string>"
        }
      ],
      "optimizeStreamingLatency": 123,
      "maxSampleRate": 123
    },
    "cartesia": {
      "voiceId": "<string>",
      "model": "<string>",
      "speed": 123,
      "emotion": "<string>",
      "emotions": [
        "<string>"
      ],
      "generationConfig": {
        "volume": 123,
        "speed": 123,
        "emotion": "<string>"
      }
    },
    "lmnt": {
      "voiceId": "<string>",
      "model": "<string>",
      "speed": 123,
      "conversational": true
    },
    "google": {
      "voiceId": "<string>",
      "speakingRate": 123
    },
    "generic": {
      "url": "<string>",
      "headers": {},
      "body": {},
      "responseSampleRate": 123,
      "responseWordsPerMinute": 123,
      "responseMimeType": "<string>",
      "jsonAudioFieldPath": "<string>",
      "jsonByteEncoding": "JSON_BYTE_ENCODING_UNSPECIFIED"
    }
  },
  "description": "<string>",
  "primaryLanguage": "<string>"
}

X-API-Key (header): API key.
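The curl request above can also be issued from code. A minimal sketch in Python: the URL and X-API-Key header come from the example above, while the helper name and sample values are purely illustrative (with the requests library installed, the commented call would perform the actual fetch):

```python
# Build the "get voice" request pieces from the curl example above.
def build_get_voice_request(voice_id: str, api_key: str):
    url = f"https://api.ultravox.ai/api/voices/{voice_id}"
    headers = {"X-API-Key": api_key}
    return url, headers

url, headers = build_get_voice_request(
    "3c90c3cc-0d44-4b50-8888-8dd25736052a", "<api-key>"
)
# With the requests library available, the call would be:
#   resp = requests.get(url, headers=headers)
#   voice = resp.json()
print(url)
```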
ownership: One of public, private.

billingStyle: How billing works for this voice. VOICE_BILLING_STYLE_INCLUDED - The cost of this voice is included in the call cost. There are no additional charges for it. VOICE_BILLING_STYLE_EXTERNAL - This voice requires an API key for its provider, who will bill for usage separately. Allowed values: VOICE_BILLING_STYLE_INCLUDED, VOICE_BILLING_STYLE_EXTERNAL.

definition: A voice not known to Ultravox Realtime that can nonetheless be used for a call. Such voices are significantly less validated than normal voices, and you'll be responsible for your own TTS-related errors. Exactly one field must be set.
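A minimal sketch of acting on the billingStyle values described above; the voice dict mirrors the response shape, and the helper name and values are illustrative:

```python
def needs_provider_key(voice: dict) -> bool:
    # EXTERNAL voices are billed by their provider and need your own API
    # key for that provider; INCLUDED voices are covered by the call cost.
    return voice.get("billingStyle") == "VOICE_BILLING_STYLE_EXTERNAL"

voice = {"voiceId": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
         "billingStyle": "VOICE_BILLING_STYLE_INCLUDED"}
print(needs_provider_key(voice))  # False: cost is included in the call cost
```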
A voice served by ElevenLabs.
The ID of the voice in ElevenLabs.
The ElevenLabs model to use.
The speaking rate. Must be between 0.7 and 1.2. Defaults to 1. See https://elevenlabs.io/docs/api-reference/text-to-speech/convert#request.body.voice_settings.speed
The maximum sample rate Ultravox will try to use. ElevenLabs limits your allowed sample rate based on your tier. See https://elevenlabs.io/pricing#pricing-table (and click "Show API details")
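A sketch of assembling an elevenLabs voice definition matching the fields shown in the response example. The speed check encodes the documented 0.7–1.2 range; the helper name and all concrete values are illustrative:

```python
def make_elevenlabs_definition(voice_id: str, model: str,
                               speed: float = 1.0) -> dict:
    # Speed must be between 0.7 and 1.2 per the ElevenLabs docs; 1.0 is
    # the default.
    if not 0.7 <= speed <= 1.2:
        raise ValueError("speed must be between 0.7 and 1.2")
    return {"elevenLabs": {"voiceId": voice_id, "model": model, "speed": speed}}

definition = make_elevenlabs_definition("voice-123", "eleven_turbo_v2")
print(definition["elevenLabs"]["speed"])  # 1.0 (the documented default)
```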
A voice served by Cartesia.
The ID of the voice in Cartesia.
The Cartesia model to use.
(Deprecated) The speaking rate. Must be between -1 and 1. Defaults to 0.
(Deprecated) Use generation_config.emotion instead.
(Deprecated) Use generation_config.emotion instead.
Configure the various attributes of the generated speech.
Adjust the volume of the generated speech between 0.5x and 2.0x the original volume (default is 1.0x). Valid values are between [0.5, 2.0] inclusive.
Adjust the speed of the generated speech between 0.6x and 1.5x the original speed (default is 1.0x). Valid values are between [0.6, 1.5] inclusive.
The primary emotions are neutral, calm, angry, content, sad, scared. For more options, see Prompting Sonic-3.
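The generationConfig constraints above can be sketched as a small validating builder; the documented ranges (volume in [0.5, 2.0], speed in [0.6, 1.5]) and primary emotions are from the text, while the helper name and values are illustrative:

```python
PRIMARY_EMOTIONS = {"neutral", "calm", "angry", "content", "sad", "scared"}

def make_generation_config(volume: float = 1.0, speed: float = 1.0,
                           emotion: str = "neutral") -> dict:
    if not 0.5 <= volume <= 2.0:
        raise ValueError("volume must be in [0.5, 2.0]")
    if not 0.6 <= speed <= 1.5:
        raise ValueError("speed must be in [0.6, 1.5]")
    # Prompting Sonic-3 supports more emotions; only the primary ones
    # listed above are checked here.
    if emotion not in PRIMARY_EMOTIONS:
        raise ValueError(f"unrecognized primary emotion: {emotion}")
    return {"volume": volume, "speed": speed, "emotion": emotion}

print(make_generation_config(speed=1.2, emotion="calm"))
```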
A voice served by LMNT.
The ID of the voice in LMNT.
The LMNT model to use.
The speaking rate. Must be between 0.25 and 2. Defaults to 1. See https://docs.lmnt.com/api-reference/speech/synthesize-speech-bytes#body-speed
A voice served by Google, using bidirectional streaming. (For non-streaming or output-only streaming, use generic.)
The ID (name) of the voice in Google, e.g. "en-US-Chirp3-HD-Charon".
The speaking rate. Must be between 0.25 and 2. Defaults to 1. See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig
A voice served by a generic REST-based TTS API.
The endpoint to which requests are sent.
The request body to send. Some field should include the placeholder {text}, which will be replaced with the text to synthesize.
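A sketch of the {text} substitution: the body template below is illustrative (the field names are not any real provider's), and the JSON round-trip is just one way to replace the placeholder wherever it appears while keeping string escaping correct:

```python
import json

body_template = {"input": {"text": "{text}"}, "format": "wav"}

def fill_text(template: dict, text: str) -> dict:
    # Serialize, replace the placeholder with the JSON-escaped text, and
    # parse back, so quotes and backslashes in the text stay valid JSON.
    raw = json.dumps(template)
    escaped = json.dumps(text)[1:-1]  # strip the surrounding quotes
    return json.loads(raw.replace("{text}", escaped))

print(fill_text(body_template, "Hello, world!"))
```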
The sample rate of the audio returned by the API.
An estimate of the speaking rate of the returned audio in words per minute. This is used for transcript timing while audio is streamed in the response. (Once the response is complete, Ultravox Realtime uses the real audio duration to adjust the timing.) Defaults to 150 and is unused for non-streaming responses.
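The timing estimate above is simple arithmetic: at the default 150 words per minute, estimated duration is word count / 150 * 60 seconds. A sketch (helper name is illustrative):

```python
def estimated_duration_seconds(text: str, words_per_minute: int = 150) -> float:
    # 150 wpm is the documented default for responseWordsPerMinute.
    return len(text.split()) / words_per_minute * 60

# 25 words at 150 wpm -> 10 seconds of estimated audio.
print(estimated_duration_seconds(" ".join(["word"] * 25)))  # 10.0
```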
The real mime type of the content returned by the API. If unset, the Content-Type response header will be used. This is useful for APIs whose response bodies don't strictly adhere to what the API claims via header. For example, if your API claims to return audio/wav but omits the WAV header (thus really returning raw PCM), set this to audio/l16. Similarly, if your API claims to return JSON but actually streams JSON Lines, set this to application/jsonl.
For JSON responses, the path to the field containing base64-encoded audio data. The data must be PCM audio, optionally with a WAV header.
For JSON responses, how audio bytes are encoded into the json_audio_field_path string. Defaults to base64. Also supports hex.
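A sketch of pulling audio bytes out of a JSON response using these two fields. The response shape is illustrative, and treating jsonAudioFieldPath as a dotted path is an assumption here, not something the docs specify:

```python
import base64

def extract_audio(response: dict, field_path: str,
                  encoding: str = "JSON_BYTE_ENCODING_BASE64") -> bytes:
    value = response
    for part in field_path.split("."):  # assumed dotted-path convention
        value = value[part]
    if encoding == "JSON_BYTE_ENCODING_HEX":
        return bytes.fromhex(value)
    # base64 is the documented default (also used for UNSPECIFIED).
    return base64.b64decode(value)

resp = {"audio": {"data": base64.b64encode(b"\x00\x01\x02").decode()}}
print(extract_audio(resp, "audio.data"))  # b'\x00\x01\x02'
```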
Allowed values: JSON_BYTE_ENCODING_UNSPECIFIED, JSON_BYTE_ENCODING_BASE64, JSON_BYTE_ENCODING_HEX.

primaryLanguage: BCP-47 language code for the primary language supported by this voice.