⚠️ SIP Billing Starts November 10, 2025 - See Ultravox Pricing for details.
Lists all stages that occurred during the specified call.
curl --request GET \
--url https://api.ultravox.ai/api/calls/{call_id}/stages \
--header 'X-API-Key: <api-key>'
{
"results": [
{
"callId": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
"callStageId": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
"created": "2023-11-07T05:31:56Z",
"temperature": 123,
"errorCount": 123,
"experimentalSettings": "<unknown>",
"initialState": {},
"inactivityMessages": [
{
"duration": "<string>",
"message": "<string>",
"endBehavior": "END_BEHAVIOR_UNSPECIFIED"
}
],
"languageHint": "<string>",
"model": "fixie-ai/ultravox",
"systemPrompt": "<string>",
"timeExceededMessage": "<string>",
"voice": "<string>",
"externalVoice": {
"elevenLabs": {
"voiceId": "<string>",
"model": "<string>",
"speed": 123,
"useSpeakerBoost": true,
"style": 123,
"similarityBoost": 123,
"stability": 123,
"pronunciationDictionaries": [
{
"dictionaryId": "<string>",
"versionId": "<string>"
}
],
"optimizeStreamingLatency": 123,
"maxSampleRate": 123
},
"cartesia": {
"voiceId": "<string>",
"model": "<string>",
"speed": 123,
"emotion": "<string>",
"emotions": [
"<string>"
],
"generationConfig": {
"volume": 123,
"speed": 123,
"emotion": "<string>"
}
},
"lmnt": {
"voiceId": "<string>",
"model": "<string>",
"speed": 123,
"conversational": true
},
"google": {
"voiceId": "<string>",
"speakingRate": 123
},
"generic": {
"url": "<string>",
"headers": {},
"body": {},
"responseSampleRate": 123,
"responseWordsPerMinute": 123,
"responseMimeType": "<string>",
"jsonAudioFieldPath": "<string>",
"jsonByteEncoding": "JSON_BYTE_ENCODING_UNSPECIFIED"
}
}
}
],
"next": "http://api.example.org/accounts/?cursor=cD00ODY%3D",
"previous": "http://api.example.org/accounts/?cursor=cj0xJnA9NDg3",
"total": 123
}
API key
The pagination cursor value.
Number of results to return per page.
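Since the response is cursor-paginated, a client typically follows each page's `next` URL until it is null. The sketch below is a hypothetical helper, not part of any Ultravox SDK; `fetch_page` stands in for whatever HTTP layer you use.

```python
# Hypothetical pagination sketch (not from an official SDK): walk the
# cursor-based pages by following each response's "next" URL until it
# is null. fetch_page is any callable returning the decoded JSON body.

def collect_all_results(fetch_page, start_url):
    """Accumulate `results` across pages; stops when `next` is null."""
    results = []
    url = start_url
    while url:
        page = fetch_page(url)
        results.extend(page.get("results", []))
        url = page.get("next")  # None on the last page ends the loop
    return results

# With the requests library it might be wired up like this (API_KEY and
# call_id are placeholders you supply):
#   import requests
#   fetch = lambda u: requests.get(u, headers={"X-API-Key": API_KEY}).json()
#   stages = collect_all_results(
#       fetch, f"https://api.ultravox.ai/api/calls/{call_id}/stages")
```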
The number of errors in this call stage.
Experimental settings for this call stage.
Messages spoken by the agent when the user is inactive for the specified duration. Durations are cumulative, so a message m > 1 with duration 30s will be spoken 30 seconds after message m-1.
The duration after which the message should be spoken.
The message to speak.
The behavior to exhibit when the message is finished being spoken.
END_BEHAVIOR_UNSPECIFIED, END_BEHAVIOR_HANG_UP_SOFT, END_BEHAVIOR_HANG_UP_STRICT
BCP47 language code that may be used to guide speech recognition.
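The cumulative-duration rule for inactivity messages can be sketched as a running sum: each message's duration is measured from the previous message, so the absolute trigger time is the total of all durations so far. The helper below is purely illustrative.

```python
# Illustrative sketch of the cumulative-duration rule for
# inactivityMessages: each duration is relative to the previous
# message, so absolute trigger times are a running sum.

def absolute_trigger_seconds(durations):
    """Convert per-message durations (seconds) into absolute offsets
    from the start of user inactivity."""
    total, offsets = 0, []
    for d in durations:
        total += d
        offsets.append(total)
    return offsets

# Three messages with durations 30s, 15s, 15s fire at 30s, 45s, and
# 60s of continuous inactivity.
```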
A voice not known to Ultravox Realtime that can nonetheless be used for a call. Such voices are significantly less validated than normal voices and you'll be responsible for your own TTS-related errors. Exactly one field must be set.
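The "exactly one field must be set" constraint can be checked client-side before sending a request. The provider keys below come from the response schema; `validate_external_voice` is a hypothetical helper, not an SDK function.

```python
# Minimal sketch of the "exactly one field must be set" constraint on
# externalVoice. validate_external_voice is illustrative only.

PROVIDER_KEYS = ("elevenLabs", "cartesia", "lmnt", "google", "generic")

def validate_external_voice(external_voice):
    """Return the single provider key that is set, or raise ValueError."""
    set_keys = [k for k in PROVIDER_KEYS if external_voice.get(k) is not None]
    if len(set_keys) != 1:
        raise ValueError(f"exactly one provider must be set, got {set_keys}")
    return set_keys[0]
```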
A voice served by ElevenLabs.
The ID of the voice in ElevenLabs.
The ElevenLabs model to use.
The speaking rate. Must be between 0.7 and 1.2. Defaults to 1. See https://elevenlabs.io/docs/api-reference/text-to-speech/convert#request.body.voice_settings.speed
The maximum sample rate Ultravox will try to use. ElevenLabs limits your allowed sample rate based on your tier. See https://elevenlabs.io/pricing#pricing-table (and click "Show API details")
A voice served by Cartesia.
The ID of the voice in Cartesia.
The Cartesia model to use.
(Deprecated) The speaking rate. Must be between -1 and 1. Defaults to 0.
(Deprecated) Use generation_config.emotion instead.
(Deprecated) Use generation_config.emotion instead.
Configure the various attributes of the generated speech.
Adjust the volume of the generated speech between 0.5x and 2.0x the original volume (default is 1.0x). Valid values are between [0.5, 2.0] inclusive.
Adjust the speed of the generated speech between 0.6x and 1.5x the original speed (default is 1.0x). Valid values are between [0.6, 1.5] inclusive.
The primary emotions are neutral, calm, angry, content, sad, scared. For more options, see Prompting Sonic-3.
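A small range check on these values before building the Cartesia `generationConfig` object can catch mistakes early. This sketch assumes the bracketed ranges stated above; `build_generation_config` is a hypothetical helper, not an SDK call.

```python
# Illustrative range checks for a Cartesia generationConfig object,
# using the ranges stated above: volume in [0.5, 2.0], speed in
# [0.6, 1.5]. build_generation_config is a hypothetical helper.

def build_generation_config(volume=1.0, speed=1.0, emotion="neutral"):
    if not 0.5 <= volume <= 2.0:
        raise ValueError("volume must be in [0.5, 2.0]")
    if not 0.6 <= speed <= 1.5:
        raise ValueError("speed must be in [0.6, 1.5]")
    return {"volume": volume, "speed": speed, "emotion": emotion}
```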
A voice served by LMNT.
The ID of the voice in LMNT.
The LMNT model to use.
The speaking rate. Must be between 0.25 and 2. Defaults to 1. See https://docs.lmnt.com/api-reference/speech/synthesize-speech-bytes#body-speed
A voice served by Google, using bidirectional streaming. (For non-streaming or output-only streaming, use generic.)
The ID (name) of the voice in Google, e.g. "en-US-Chirp3-HD-Charon".
The speaking rate. Must be between 0.25 and 2. Defaults to 1. See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig
A voice served by a generic REST-based TTS API.
The endpoint to which requests are sent.
The request body to send. Some field in the body must include the placeholder {text}, which will be replaced with the text to synthesize.
The sample rate of the audio returned by the API.
An estimate of the speaking rate of the returned audio in words per minute. This is used for transcript timing while audio is streamed in the response. (Once the response is complete, Ultravox Realtime uses the real audio duration to adjust the timing.) Defaults to 150 and is unused for non-streaming responses.
The real mime type of the content returned by the API. If unset, the Content-Type response header will be used. This is useful for APIs whose response bodies don't strictly adhere to what the API claims via header. For example, if your API claims to return audio/wav but omits the WAV header (thus really returning raw PCM), set this to audio/l16. Similarly, if your API claims to return JSON but actually streams JSON Lines, set this to application/jsonl.
For JSON responses, the path to the field containing base64-encoded audio data. The data must be PCM audio, optionally with a WAV header.
For JSON responses, how audio bytes are encoded into the json_audio_field_path string. Defaults to base64. Also supports hex.
JSON_BYTE_ENCODING_UNSPECIFIED, JSON_BYTE_ENCODING_BASE64, JSON_BYTE_ENCODING_HEX
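To make the jsonAudioFieldPath/jsonByteEncoding pair concrete, here is a hedged sketch of how a client might extract audio bytes from a JSON TTS response. It assumes the field path is dot-separated; `decode_json_audio` is illustrative and not part of Ultravox.

```python
# Hypothetical sketch: pull base64- or hex-encoded audio bytes out of
# a JSON TTS response, given a dot-separated field path (an assumed
# convention) and a jsonByteEncoding value. Base64 is the default.
import base64

def decode_json_audio(payload, field_path,
                      encoding="JSON_BYTE_ENCODING_BASE64"):
    value = payload
    for key in field_path.split("."):  # descend into nested objects
        value = value[key]
    if encoding == "JSON_BYTE_ENCODING_HEX":
        return bytes.fromhex(value)
    return base64.b64decode(value)  # base64 covers the default case
```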