Authorizations
API key
Query Parameters
Adds a prompt for a greeting if there's not an initial message that the model would naturally respond to (a user message or tool result).
The UUID of a prior call. When specified, the new call will use the same properties as the prior call unless overridden in this request's body. The new call will also use the prior call's message history as its own initial_messages. (It is an error to also set initial_messages in the body.)
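The prior-call constraint can be checked on the client before a request is sent. The following is a minimal sketch, not part of any Ultravox SDK: the function name `check_prior_call_request` is hypothetical, and since this page only names the field as initial_messages, the camelCase spelling `initialMessages` is an assumption included in case the JSON body uses it.

```python
import uuid
from typing import Any, Mapping, Optional

# The page spells the field "initial_messages"; the camelCase form is an
# assumption about how a JSON body might spell it.
INITIAL_MESSAGE_KEYS = ("initial_messages", "initialMessages")


def check_prior_call_request(prior_call_id: Optional[str], body: Mapping[str, Any]) -> None:
    """Raise ValueError if the prior-call ID and the request body conflict."""
    if prior_call_id is None:
        return
    # uuid.UUID raises ValueError when the string is not a well-formed UUID.
    uuid.UUID(prior_call_id)
    for key in INITIAL_MESSAGE_KEYS:
        if key in body:
            raise ValueError(f"{key} cannot be set together with a prior call ID")
```

Other body fields remain allowed alongside a prior call ID, since they simply override the inherited properties.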
Body
A request to start a call.
The system prompt provided to the model during generations.
The model temperature, between 0 and 1. Defaults to 0.
The model used for generations. Defaults to fixie-ai/ultravox.
The ID (or name if unique) of the voice the agent should use for this call.
A voice not known to Ultravox Realtime that can nonetheless be used for this call.
Your account must have an API key set for the provider of the voice.
Either this or voice may be set, but not both.
Such voices are significantly less validated than normal voices and you'll be responsible for your own TTS-related errors.
Exactly one field must be set.
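The two rules above (voice and the external voice are mutually exclusive, and the external voice object must have exactly one field set) can be sketched as a client-side check. This is an illustration under assumptions: the key `externalVoice` and the helper names are hypothetical, since this page does not give the JSON key for the external voice; only `voice` is named in the text.

```python
from typing import Any, Mapping


def count_set_fields(obj: Mapping[str, Any]) -> int:
    """Count the fields whose value is not None."""
    return sum(1 for value in obj.values() if value is not None)


def check_voice_selection(body: Mapping[str, Any]) -> None:
    """Raise ValueError if the voice-related fields break the documented rules."""
    voice = body.get("voice")
    external = body.get("externalVoice")  # hypothetical key name
    if voice is not None and external is not None:
        raise ValueError("set either voice or the external voice, not both")
    if external is not None and count_set_fields(external) != 1:
        raise ValueError("exactly one field of the external voice must be set")
```

A body with neither field passes the check, since this page does not say that a voice is required.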
A BCP47 language code that may be used to guide speech recognition and synthesis.
The conversation history to start from for this call.
A timeout for joining the call. Defaults to 30 seconds.
The maximum duration of the call. Defaults to 1 hour.
What the agent should say immediately before hanging up if the call's time limit is reached.
Messages spoken by the agent when the user is inactive for the specified duration. Durations are cumulative, so a message m > 1 with duration 30s will be spoken 30 seconds after message m-1.
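Because the durations are cumulative, the moment each message is spoken is the running sum of the durations up to and including it. The sketch below illustrates that arithmetic only; the function name `inactivity_schedule` and the `(seconds, text)` tuple shape are assumptions for illustration, not the API's wire format.

```python
from itertools import accumulate
from typing import List, Tuple


def inactivity_schedule(messages: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Convert per-message durations into offsets from the start of inactivity.

    Each input item is (duration_seconds, text). Each output item is
    (seconds_since_the_user_went_quiet, text).
    """
    offsets = accumulate(duration for duration, _ in messages)
    return [(offset, text) for offset, (_, text) in zip(offsets, messages)]


schedule = inactivity_schedule([
    (30, "Are you still there?"),
    (15, "I'll hang up soon if I don't hear from you."),
    (10, "Goodbye."),
])
# schedule == [(30, "Are you still there?"),
#              (45, "I'll hang up soon if I don't hear from you."),
#              (55, "Goodbye.")]
```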
The tools available to the agent for (the first stage of) this call.
The medium used for this call, which determines the call's protocol. By default, calls occur over WebRTC using the Ultravox client SDK. Setting a different call medium will prepare the server for a call using a different protocol. At most one call medium may be set.
Whether the call should be recorded.
Who should talk first when the call starts. Typically set to FIRST_SPEAKER_USER for outgoing calls and left as the default (FIRST_SPEAKER_AGENT) otherwise.
Deprecated. Prefer firstSpeakerSettings. If both are set, they must match.
FIRST_SPEAKER_UNSPECIFIED, FIRST_SPEAKER_AGENT, FIRST_SPEAKER_USER
Indicates whether a transcript is optional for the call.
The medium to use for the call initially. May be altered by the client later. Defaults to voice.
MESSAGE_MEDIUM_UNSPECIFIED, MESSAGE_MEDIUM_VOICE, MESSAGE_MEDIUM_TEXT
Call-level VAD (voice activity detection) settings.
The settings for the initial message to get a conversation started.
Defaults to agent: {}, which means the agent will start the conversation with an (interruptible) greeting generated based on the system prompt and any initial messages.
(If first_speaker is set and this is not, first_speaker will be used instead.)
Exactly one of user or agent should be set. The default is agent (unless firstSpeaker is also set, in which case the default will match that).
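The interaction between the deprecated first-speaker field and the first-speaker settings can be sketched as a small resolver. This is an illustration of the rules stated above, not the server's implementation: the function name `resolve_first_speaker` is hypothetical, and treating FIRST_SPEAKER_UNSPECIFIED and an empty settings object as "not set" is an assumption.

```python
from typing import Any, Mapping, Optional

LEGACY_TO_ROLE = {
    "FIRST_SPEAKER_AGENT": "agent",
    "FIRST_SPEAKER_USER": "user",
}


def resolve_first_speaker(
    settings: Optional[Mapping[str, Any]],
    first_speaker: Optional[str] = None,
) -> str:
    """Return "agent" or "user" according to the documented rules."""
    # Assumption: None and FIRST_SPEAKER_UNSPECIFIED both mean "not set".
    legacy = LEGACY_TO_ROLE.get(first_speaker) if first_speaker else None
    if not settings:
        # Settings not given: the deprecated field wins, else the default (agent).
        return legacy or "agent"
    chosen = [role for role in ("user", "agent") if settings.get(role) is not None]
    if len(chosen) != 1:
        raise ValueError("exactly one of user or agent should be set")
    if legacy is not None and legacy != chosen[0]:
        raise ValueError("the first speaker and the first-speaker settings must match")
    return chosen[0]
```

Note that `{"agent": {}}` counts as setting agent, because the empty object is still a value.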
Experimental settings for the call.
Optional metadata key-value pairs to associate with the call. All values must be strings. Keys may not start with "ultravox.", which is reserved for system-provided metadata.
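The metadata rules (string values only, no keys under the reserved "ultravox." prefix) are easy to verify before sending a request. The following is a minimal sketch; the function name `validate_metadata` is hypothetical and the server may apply further checks not described here.

```python
from typing import Any, Mapping

RESERVED_PREFIX = "ultravox."


def validate_metadata(metadata: Mapping[Any, Any]) -> None:
    """Raise ValueError if the metadata violates the documented rules."""
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise ValueError(f"metadata key {key!r} must be a string")
        if key.startswith(RESERVED_PREFIX):
            raise ValueError(f"metadata key {key!r} uses the reserved prefix")
        if not isinstance(value, str):
            raise ValueError(f"metadata value for {key!r} must be a string")
```

Numeric or boolean values need to be converted with `str()` first, for example `{"attempt": str(3)}`.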
The initial state of the call stage which is readable/writable by tools.
Data connection configuration. A data connection enables an auxiliary WebSocket for streaming data messages.
Response
The version of the client that joined this call.
The reason the call ended.
unjoined - Client never joined
hangup - Client hung up
agent_hangup - Agent hung up
timeout - Call timed out
connection_error - Connection error
system_error - System error
unjoined, hangup, agent_hangup, timeout, connection_error, system_error
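Client code that inspects the end reason may find it convenient to model the values as an enumeration. The sketch below uses only the six values and descriptions listed on this page; the class name `EndReason` and the helper `describe_end_reason` are hypothetical.

```python
from enum import Enum


class EndReason(str, Enum):
    """The documented reasons a call can end."""
    UNJOINED = "unjoined"
    HANGUP = "hangup"
    AGENT_HANGUP = "agent_hangup"
    TIMEOUT = "timeout"
    CONNECTION_ERROR = "connection_error"
    SYSTEM_ERROR = "system_error"


END_REASON_DESCRIPTIONS = {
    EndReason.UNJOINED: "Client never joined",
    EndReason.HANGUP: "Client hung up",
    EndReason.AGENT_HANGUP: "Agent hung up",
    EndReason.TIMEOUT: "Call timed out",
    EndReason.CONNECTION_ERROR: "Connection error",
    EndReason.SYSTEM_ERROR: "System error",
}


def describe_end_reason(value: str) -> str:
    """Map a raw end-reason string to its description.

    Raises ValueError for a value that is not one of the documented reasons.
    """
    return END_REASON_DESCRIPTIONS[EndReason(value)]
```

Parsing through the enum means an unexpected value fails loudly instead of being silently ignored.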
Who was supposed to talk first when the call started. Typically set to FIRST_SPEAKER_USER for outgoing calls and left as the default (FIRST_SPEAKER_AGENT) otherwise.
FIRST_SPEAKER_AGENT, FIRST_SPEAKER_USER
Settings for the initial message to get the call started. Exactly one of user or agent should be set. The default is agent (unless firstSpeaker is also set, in which case the default will match that).
The medium used initially by the agent. May later be changed by the client.
MESSAGE_MEDIUM_VOICE, MESSAGE_MEDIUM_TEXT
The number of errors in this call.
A short summary of the call.
A summary of the call.
The agent used for this call.
The ID of the agent used for this call.
Experimental settings for the call.
Optional metadata key-value pairs to associate with the call. All values must be strings.
The initial state of the call which is readable/writable by tools.
Messages spoken by the agent when the user is inactive for the specified duration. Durations are cumulative, so a message m > 1 with duration 30s will be spoken 30 seconds after message m-1.
BCP47 language code that may be used to guide speech recognition.
16
Details about a call's protocol. By default, calls occur over WebRTC using the Ultravox client SDK. Setting a different call medium will prepare the server for a call using a different protocol. At most one call medium may be set.
0 <= x <= 1
A voice not known to Ultravox Realtime that can nonetheless be used for a call. Such voices are significantly less validated than normal voices and you'll be responsible for your own TTS-related errors. Exactly one field must be set.
Indicates whether a transcript is optional for the call.
Call-level VAD (voice activity detection) settings.
Settings for exchanging data messages with an additional participant. A data connection enables an auxiliary WebSocket for streaming data messages.