List Agents
Returns details for all agents
curl --request GET \
  --url https://api.ultravox.ai/api/agents \
  --header 'X-API-Key: <api-key>'
{
  "next": "http://api.example.org/accounts/?cursor=cD00ODY%3D\"",
  "previous": "http://api.example.org/accounts/?cursor=cj0xJnA9NDg3",
  "results": [
    {
      "agentId": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
      "name": "<string>",
      "created": "2023-11-07T05:31:56Z",
      "callTemplate": {
        "name": "<string>",
        "created": "2023-11-07T05:31:56Z",
        "updated": "2023-11-07T05:31:56Z",
        "medium": {
          "webRtc": {},
          "twilio": {},
          "serverWebSocket": {
            "inputSampleRate": 123,
            "outputSampleRate": 123,
            "clientBufferSizeMs": 123
          },
          "telnyx": {},
          "plivo": {},
          "exotel": {}
        },
        "initialOutputMedium": "MESSAGE_MEDIUM_UNSPECIFIED",
        "joinTimeout": "<string>",
        "maxDuration": "<string>",
        "vadSettings": {
          "turnEndpointDelay": "<string>",
          "minimumTurnDuration": "<string>",
          "minimumInterruptionDuration": "<string>",
          "frameActivationThreshold": 123
        },
        "recordingEnabled": true,
        "firstSpeakerSettings": {
          "user": {
            "fallback": {
              "delay": "<string>",
              "text": "<string>"
            }
          },
          "agent": {
            "uninterruptible": true,
            "text": "<string>",
            "delay": "<string>"
          }
        },
        "systemPrompt": "<string>",
        "temperature": 123,
        "model": "<string>",
        "voice": "<string>",
        "languageHint": "<string>",
        "timeExceededMessage": "<string>",
        "inactivityMessages": [
          {
            "duration": "<string>",
            "message": "<string>",
            "endBehavior": "END_BEHAVIOR_UNSPECIFIED"
          }
        ],
        "selectedTools": [
          {
            "toolId": "<string>",
            "toolName": "<string>",
            "temporaryTool": {
              "modelToolName": "<string>",
              "description": "<string>",
              "dynamicParameters": [
                {
                  "name": "<string>",
                  "location": "PARAMETER_LOCATION_UNSPECIFIED",
                  "schema": {},
                  "required": true
                }
              ],
              "staticParameters": [
                {
                  "name": "<string>",
                  "location": "PARAMETER_LOCATION_UNSPECIFIED",
                  "value": "<any>"
                }
              ],
              "automaticParameters": [
                {
                  "name": "<string>",
                  "location": "PARAMETER_LOCATION_UNSPECIFIED",
                  "knownValue": "KNOWN_PARAM_UNSPECIFIED"
                }
              ],
              "requirements": {
                "httpSecurityOptions": {
                  "options": [
                    {
                      "requirements": {},
                      "ultravoxCallTokenRequirement": {
                        "scopes": [
                          "<string>"
                        ]
                      }
                    }
                  ]
                },
                "requiredParameterOverrides": [
                  "<string>"
                ]
              },
              "timeout": "<string>",
              "precomputable": true,
              "http": {
                "baseUrlPattern": "<string>",
                "httpMethod": "<string>"
              },
              "client": {},
              "defaultReaction": "AGENT_REACTION_UNSPECIFIED",
              "staticResponse": {
                "responseText": "<string>"
              }
            },
            "nameOverride": "<string>",
            "authTokens": {},
            "parameterOverrides": {},
            "transitionId": "<string>"
          }
        ],
        "contextSchema": {}
      }
    }
  ],
  "total": 123
}
Authorizations
API key, sent in the X-API-Key request header.
Query Parameters
The pagination cursor value.
Number of results to return per page.
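Because each response carries `next` and `previous` cursor URLs, a client can collect all agents by following `next` page by page. The following is a minimal sketch of that loop, not an official client. `fetch_page` is a hypothetical callable standing in for an authenticated GET (with the X-API-Key header) that returns the decoded JSON body, and the assumption that the last page has a null or missing `next` is not stated in this reference.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_agents(
    fetch_page: Callable[[str], Page],
    first_url: str = "https://api.ultravox.ai/api/agents",
) -> Iterator[Dict[str, Any]]:
    """Yield every agent by following the `next` URL of each page.

    `fetch_page` is a hypothetical callable that performs an authenticated
    GET (X-API-Key header) and returns the decoded JSON body.
    """
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("results", [])
        # Assumption: the last page has a null or missing `next`.
        url = page.get("next")


# Usage with canned pages instead of real HTTP calls.
_pages = {
    "https://api.ultravox.ai/api/agents": {
        "next": "https://api.ultravox.ai/api/agents?cursor=abc",
        "previous": None,
        "results": [{"agentId": "a1", "name": "first"}],
        "total": 2,
    },
    "https://api.ultravox.ai/api/agents?cursor=abc": {
        "next": None,
        "previous": "https://api.ultravox.ai/api/agents",
        "results": [{"agentId": "a2", "name": "second"}],
        "total": 2,
    },
}
agent_ids = [agent["agentId"] for agent in iter_agents(_pages.__getitem__)]
print(agent_ids)
```

Injecting `fetch_page` keeps the paging logic independent of any particular HTTP library, so the same loop can be exercised with canned pages as shown.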
Response
A CallTemplate that can be used to create Ultravox calls with shared properties.
The name of the call template.
When the call template was created.
When the call template was last modified.
The medium used for calls by default.
The call will use WebRTC with the Ultravox client SDK. This is the default.
The call will use Twilio's "Media Streams" protocol. Once you have a join URL from starting a call, include it in your TwiML like so: <Connect><Stream url=${your-join-url} /></Connect> This works for both inbound and outbound calls.
The call will use a plain websocket connection. This is unlikely to yield an acceptable user experience if used from a browser or mobile client, but may be suitable for a server-to-server connection. This option provides a simple way to connect your own server to an Ultravox inference instance.
The sample rate for input (user) audio. Required.
The desired sample rate for output (agent) audio. If unset, defaults to the input_sample_rate.
The size of the client-side audio buffer in milliseconds. Smaller buffers allow for faster interruptions but may cause audio underflow if network latency fluctuates too greatly. For the best of both worlds, set this to some large value (e.g. 30000) and implement support for playback_clear_buffer messages. Defaults to 60.
The call will use Telnyx's media streaming protocol. Once you have a join URL from starting a call, include it in your TexML like so: <Connect><Stream url=${your-join-url} bidirectionalMode="rtp" /></Connect> This works for both inbound and outbound calls.
The call will use Plivo's AudioStreams protocol. Once you have a join URL from starting a call, include it in your Plivo XML like so: <Stream keepCallAlive="true" bidirectional="true" contentType="audio/x-l16;rate=16000">${your-join-url}</Stream> This works for both inbound and outbound calls.
The call will use Exotel's "Voicebot" protocol. Once you have a join URL from starting a call, provide it to Exotel as the wss target URL for your Voicebot (either directly or more likely dynamically from your own server).
The medium initially used for calls by default. Defaults to voice.
MESSAGE_MEDIUM_UNSPECIFIED, MESSAGE_MEDIUM_VOICE, MESSAGE_MEDIUM_TEXT
A default timeout for joining calls. Defaults to 30 seconds.
The default maximum duration of calls. Defaults to 1 hour.
The default voice activity detection settings for calls.
The minimum amount of time the agent will wait to respond after the user seems to be done speaking. Increasing this value will make the agent less eager to respond, which may increase perceived response latency but will also make the agent less likely to jump in before the user is really done speaking.
Built-in VAD currently operates on 32ms frames, so only multiples of 32ms are meaningful. (Anything from 1ms to 31ms will produce the same result.)
Defaults to "0.384s" (384ms) as a starting point, but there's nothing special about this value aside from it corresponding to 12 VAD frames.
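The frame arithmetic above can be checked with a short sketch. It assumes durations are written as seconds strings such as "0.384s", as in the defaults quoted here; the helper names are illustrative and not part of the API. It only reports how many 32 ms frames a value spans and whether it is an exact multiple; how the service rounds non-aligned values is not specified in this reference.

```python
from decimal import Decimal

VAD_FRAME_MS = 32  # built-in VAD frame length stated above


def duration_to_ms(duration: str) -> Decimal:
    """Convert a seconds string such as "0.384s" to milliseconds."""
    if not duration.endswith("s"):
        raise ValueError(f"expected a value like '0.384s', got {duration!r}")
    return Decimal(duration[:-1]) * 1000


def vad_frames(duration: str) -> Decimal:
    """Return how many 32 ms VAD frames the duration spans (may be fractional)."""
    return duration_to_ms(duration) / VAD_FRAME_MS


def is_frame_aligned(duration: str) -> bool:
    """True when the duration is an exact multiple of the 32 ms frame."""
    return duration_to_ms(duration) % VAD_FRAME_MS == 0


default_frames = vad_frames("0.384s")  # the default turnEndpointDelay
print(default_frames, is_frame_aligned("0.384s"), is_frame_aligned("0.4s"))
```

`Decimal` is used instead of `float` so that values like 0.384 convert to milliseconds without binary rounding error.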
The minimum duration of user speech required to be considered a user turn. Increasing this value will cause the agent to ignore short user audio. This may be useful in particularly noisy environments, but it comes at the cost of possibly ignoring very short user responses such as "yes" or "no".
Defaults to "0s" meaning the agent considers all user audio inputs (that make it through built-in noise cancellation).
The minimum duration of user speech required to interrupt the agent. This works the same way as minimumTurnDuration, but allows for a higher threshold for interrupting the agent. (This value will be ignored if it is less than minimumTurnDuration.)
Defaults to "0.09s" (90ms) as a starting point, but there's nothing special about this value.
The threshold for the VAD to consider a frame as speech. This is a value between 0.1 and 1.
Minimum value is 0.1, which is the default value.
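The VAD constraints described above can be sanity-checked on the client before a template is saved. This is a sketch under assumptions: the function names and the returned keys are hypothetical, the defaults are the ones quoted in this section, and treating the effective interruption threshold as the larger of the two durations is an interpretation of "ignored if it is less than minimumTurnDuration", not documented server behavior.

```python
from typing import Any, Dict


def seconds(duration: str) -> float:
    """Parse a seconds string such as "0.09s" into a float."""
    return float(duration.rstrip("s"))


def check_vad_settings(vad: Dict[str, Any]) -> Dict[str, float]:
    """Validate vadSettings and report the values that would take effect."""
    threshold = vad.get("frameActivationThreshold", 0.1)
    if not 0.1 <= threshold <= 1:
        raise ValueError("frameActivationThreshold must be between 0.1 and 1")

    turn = seconds(vad.get("minimumTurnDuration", "0s"))
    interruption = seconds(vad.get("minimumInterruptionDuration", "0.09s"))
    return {
        "frameActivationThreshold": threshold,
        # Assumption: an interruption threshold below the turn threshold
        # is ignored, so the larger of the two applies.
        "effectiveInterruptionSeconds": max(turn, interruption),
    }


report = check_vad_settings(
    {"minimumTurnDuration": "0.2s", "minimumInterruptionDuration": "0.09s"}
)
print(report)
```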
Whether calls are recorded by default.
The default settings for the initial message to get a conversation started for calls.
Defaults to agent: {}, which means the agent will start the conversation with an (interruptible) greeting generated based on the system prompt and any initial messages.
If set, the user should speak first.
If set, the agent will start the conversation itself if the user doesn't start speaking within the given delay.
If set, the agent should speak first.
Whether the user should be prevented from interrupting the agent's first message. Defaults to false (meaning the agent is interruptible as usual).
What the agent should say. If unset, the model will generate a greeting.
If set, the agent will wait this long before starting its greeting. This may be useful for ensuring the user is ready.
The system prompt used for generations. If multiple stages are defined for the call, this will be used only for stages without their own systemPrompt.
The model temperature, between 0 and 1. Defaults to 0. If multiple stages are defined for the call, this will be used only for stages without their own temperature.
The model used for generations. Defaults to fixie-ai/ultravox. If multiple stages are defined for the call, this will be used only for stages without their own model.
The name or ID of the voice the agent should use for this call. If multiple stages are defined for the call, this will be used only for stages without their own voice.
A BCP47 language code that may be used to guide speech recognition and synthesis. If multiple stages are defined for the call, this will be used only for stages without their own languageHint.
What the agent should say immediately before hanging up if the call's time limit is reached. If multiple stages are defined for the call, this will be used only for stages without their own timeExceededMessage.
Messages spoken by the agent when the user is inactive for the specified duration. Durations are cumulative, so a message m > 1 with duration 30s will be spoken 30 seconds after message m-1. If multiple stages are defined for the call, this will be used only for stages without their own inactivityMessages.
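The per-stage fallback rule repeated in the descriptions above (a template value applies only to stages that do not define their own) can be sketched as a simple lookup. The shape of a stage is not defined in this reference, so the `stage` dictionary below, which reuses the template's field names, is an assumption for illustration, as is the function name.

```python
from typing import Any, Dict

# Fields that this reference describes as falling back to the template value.
STAGE_OVERRIDABLE = (
    "systemPrompt",
    "temperature",
    "model",
    "voice",
    "languageHint",
    "timeExceededMessage",
    "inactivityMessages",
    "selectedTools",
)


def effective_settings(
    template: Dict[str, Any], stage: Dict[str, Any]
) -> Dict[str, Any]:
    """Use the stage's own value when it has one, else the call template's."""
    resolved: Dict[str, Any] = {}
    for field in STAGE_OVERRIDABLE:
        if field in stage:
            resolved[field] = stage[field]
        elif field in template:
            resolved[field] = template[field]
    return resolved


template = {"systemPrompt": "You are a helpful agent.", "temperature": 0, "voice": "default-voice"}
stage = {"systemPrompt": "You are now collecting payment details."}  # hypothetical stage
resolved = effective_settings(template, stage)
print(resolved)
```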
A message the agent should say after some duration. The duration's meaning varies depending on the context.
The duration after which the message should be spoken.
The message to speak.
The behavior to exhibit when the message is finished being spoken.
END_BEHAVIOR_UNSPECIFIED, END_BEHAVIOR_HANG_UP_SOFT, END_BEHAVIOR_HANG_UP_STRICT
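Since inactivity durations are cumulative, the moment each message is spoken is the running sum of the durations up to and including it. A minimal sketch of that arithmetic follows; it assumes durations are seconds strings such as "30s", and the function name and message texts are illustrative.

```python
from typing import Dict, List, Tuple


def inactivity_schedule(messages: List[Dict[str, str]]) -> List[Tuple[float, str]]:
    """Return (seconds of inactivity, message) pairs from cumulative durations."""
    elapsed = 0.0
    schedule: List[Tuple[float, str]] = []
    for item in messages:
        # Each duration counts from the previous message, not from zero.
        elapsed += float(item["duration"].rstrip("s"))
        schedule.append((elapsed, item["message"]))
    return schedule


schedule = inactivity_schedule(
    [
        {"duration": "30s", "message": "Are you still there?"},
        {"duration": "15s", "message": "I will hang up soon."},
        {
            "duration": "10s",
            "message": "Goodbye.",
            "endBehavior": "END_BEHAVIOR_HANG_UP_SOFT",
        },
    ]
)
print(schedule)
```

With these inputs the three messages fall at 30, 45, and 55 seconds of inactivity.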
The tools available to the agent for this call. The following fields are treated as templates when converting to a CallTool.
- description
- static_parameters.value
- http.auth_headers.value
- http.auth_query_params.value
If multiple stages are defined for the call, this will be used only for stages without their own selectedTools.
A tool selected for a particular call. Exactly one of tool_id, tool_name, or temporary_tool should be set.
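The "exactly one" rule can be enforced on the client before sending a request. The sketch below uses the camelCase keys from the sample response (`toolId`, `toolName`, `temporaryTool`); the function name is hypothetical, and treating a null value as "not set" is an assumption.

```python
from typing import Any, Dict

SELECTORS = ("toolId", "toolName", "temporaryTool")


def tool_selector(selected_tool: Dict[str, Any]) -> str:
    """Return the single selector that is set, or raise if zero or several are."""
    # Assumption: a key that is missing or null counts as unset.
    present = [key for key in SELECTORS if selected_tool.get(key) is not None]
    if len(present) != 1:
        raise ValueError(
            f"exactly one of {', '.join(SELECTORS)} must be set, got {present}"
        )
    return present[0]


chosen = tool_selector({"toolName": "lookupOrder", "parameterOverrides": {}})
print(chosen)
```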
The ID of an existing base tool.
The name of an existing base tool. The name must uniquely identify the tool.
A temporary tool definition, available only for this call (and subsequent calls created using priorCallId without overriding selected tools).
The name of the tool, as presented to the model. Must match ^[a-zA-Z0-9_-]{1,64}$.
The description of the tool.
The parameters that the tool accepts.
A dynamic parameter the tool accepts that may be set by the model.
The static parameters added when the tool is invoked.
A static parameter that is unconditionally added when the tool is invoked. This parameter is not exposed to or set by the model.
Additional parameters that are automatically set by the system when the tool is invoked.
A parameter that is automatically set by the system.
Requirements that must be fulfilled when creating a call for the tool to be used.
The maximum amount of time the tool is allowed for execution. The conversation is frozen while tools run, so prefer sticking to the default unless you're comfortable with that consequence. If your tool is too slow for the default and can't be made faster, still try to keep this timeout as low as possible.
The tool is guaranteed to be non-mutating, repeatable, and free of side-effects. Such tools can safely be executed speculatively, reducing their effective latency. However, the fact they were called may not be reflected in the call history if their result ends up unused.
Details for an HTTP tool.
Details for a client-implemented tool. Only body parameters are allowed for client tools.
Indicates the default for how the agent should proceed after the tool is invoked. Can be overridden by the tool implementation via the X-Ultravox-Agent-Reaction header.
AGENT_REACTION_UNSPECIFIED, AGENT_REACTION_SPEAKS, AGENT_REACTION_LISTENS, AGENT_REACTION_SPEAKS_ONCE
Static response to a tool. When this is used, this response will be returned without waiting for the tool's response.
An override for the model_tool_name. This is primarily useful when using multiple instances of the same durable tool (presumably with different parameter overrides). The set of tools used within a call must have a unique set of model names, and every name must match this pattern: ^[a-zA-Z0-9_-]{1,64}$.
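Both naming constraints, the pattern and uniqueness within a call, are easy to check locally. The following sketch takes the model-facing names already resolved for a call (however they were obtained) and lists any problems; the function name and the example tool names are illustrative.

```python
import re
from collections import Counter
from typing import List

# Pattern quoted in this reference for model-facing tool names.
MODEL_TOOL_NAME = re.compile(r"^[a-zA-Z0-9_-]{1,64}$")


def tool_name_problems(names: List[str]) -> List[str]:
    """Describe every name that breaks the pattern or is used more than once."""
    problems = []
    for name in names:
        # fullmatch also rejects a trailing newline, which `$` alone would allow.
        if MODEL_TOOL_NAME.fullmatch(name) is None:
            problems.append(f"invalid name: {name!r}")
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"duplicate name: {name!r} used {count} times")
    return problems


problems = tool_name_problems(["lookup_order", "lookup order", "refund", "refund"])
print(problems)
```

When two instances of the same durable tool are selected, giving each a distinct `nameOverride` is what keeps the second check from firing.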
Auth tokens used to satisfy the tool's security requirements.
Static values to use in place of dynamic parameters. Any parameter included here will be hidden from the model and the static value will be used instead. Some tools may require certain parameters to be overridden, but any parameter can be overridden regardless of whether it is required to be.
Represents a dynamically typed value which can be either null, a number, a string, a boolean, a recursive struct value, or a list of values.
For internal use. Relates this tool to a stage transition definition within a call template for attribution.
JSON schema for the variables used in string templates. If unset, a default schema will be created from the variables used in the string templates. Call creation requests must provide context adhering to this schema. The following fields are treated as templates:
- system_prompt
- language_hint
- time_exceeded_message
- inactivity_messages.message
- selected_tools.description
- selected_tools.static_parameters.value
- selected_tools.http.auth_headers.value
- selected_tools.http.auth_query_params.value
If multiple stages are defined for the call, each must define its own context schema (or use the generated one).