List Calls
Returns details for all calls.
Authorizations
API key
Query Parameters
The pagination cursor value.
Number of results to return per page.
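The cursor and page-size parameters above describe standard cursor pagination. As an illustrative sketch (the response keys `results` and `next`, and the `fetch_page` signature, are assumptions for illustration, not confirmed by this page), a client might walk every page like this:

```python
# Cursor-pagination sketch for the List Calls endpoint.
# The response shape ({"results": [...], "next": cursor-or-None}) is assumed.
def list_all_calls(fetch_page, page_size=50):
    """Yield every call, following the pagination cursor until exhausted."""
    cursor = None
    while True:
        page = fetch_page(cursor, page_size)
        yield from page["results"]
        cursor = page.get("next")
        if not cursor:
            break
```

Here `fetch_page` would wrap an HTTP GET that supplies your API key and the cursor and page-size query parameters.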
Response
The version of the client that joined this call.
The reason the call ended.
unjoined - Client never joined
hangup - Client hung up
agent_hangup - Agent hung up
timeout - Call timed out
connection_error - Connection error
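For example, calls returned by this endpoint could be tallied by end reason. The call-object shape and the `endReason` key below are assumptions based on the field described above:

```python
from collections import Counter

# Hypothetical call objects; only the endReason values come from the enum above.
calls = [
    {"endReason": "hangup"},
    {"endReason": "timeout"},
    {"endReason": "hangup"},
]
by_reason = Counter(call["endReason"] for call in calls)
```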
The number of errors in this call.
Who was supposed to talk first when the call started. Typically set to FIRST_SPEAKER_USER for outgoing calls and left as the default (FIRST_SPEAKER_AGENT) otherwise.
FIRST_SPEAKER_AGENT, FIRST_SPEAKER_USER
Settings for the initial message to get the call started.
If set, the agent should speak first.
What the agent should say. If unset, the model will generate a greeting.
Whether the user should be prevented from interrupting the agent's first message. Defaults to false (meaning the agent is interruptible as usual).
If set, the user should speak first.
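Put together, the two mutually exclusive shapes above might look like this (the key names are assumptions inferred from the descriptions, not confirmed field names):

```python
# Agent speaks first: optional greeting text, interruptible by default.
agent_first = {
    "agent": {
        "text": "Hi! How can I help today?",  # omit to let the model generate a greeting
        "uninterruptible": False,             # defaults to false
    }
}

# User speaks first: the agent waits for the user.
user_first = {"user": {}}
```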
The medium used initially by the agent. May later be changed by the client.
MESSAGE_MEDIUM_VOICE, MESSAGE_MEDIUM_TEXT
A short summary of the call.
A summary of the call.
Messages spoken by the agent when the user is inactive for the specified duration. Durations are cumulative, so a message m > 1 with duration 30s will be spoken 30 seconds after message m-1.
The duration after which the message should be spoken.
The behavior to exhibit when the message is finished being spoken.
END_BEHAVIOR_UNSPECIFIED, END_BEHAVIOR_HANG_UP_SOFT, END_BEHAVIOR_HANG_UP_STRICT
The message to speak.
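Since durations are cumulative, the absolute time at which message m fires is the sum of durations 1 through m. A small sketch:

```python
def absolute_fire_times(durations_s):
    """Map cumulative per-message durations to offsets from the start of inactivity."""
    times, total = [], 0.0
    for d in durations_s:
        total += d
        times.append(total)
    return times

# Three messages with durations 30s, 30s, 15s fire at 30s, 60s, and 75s.
absolute_fire_times([30, 30, 15])  # → [30.0, 60.0, 75.0]
```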
BCP47 language code that may be used to guide speech recognition.
Details about a call's protocol. By default, calls occur over WebRTC using the Ultravox client SDK. Setting a different call medium will prepare the server for a call using a different protocol. At most one call medium may be set.
The call will use Plivo's AudioStreams protocol. Once you have a join URL from starting a call, include it in your Plivo XML like so: <Stream keepCallAlive="true" bidirectional="true" contentType="audio/x-l16;rate=16000">${your-join-url}</Stream> This works for both inbound and outbound calls.
The call will use a plain websocket connection. This is unlikely to yield an acceptable user experience if used from a browser or mobile client, but may be suitable for a server-to-server connection. This option provides a simple way to connect your own server to an Ultravox inference instance.
The size of the client-side audio buffer in milliseconds. Smaller buffers allow for faster interruptions but may cause audio underflow if network latency fluctuates too greatly. For the best of both worlds, set this to some large value (e.g. 30000) and implement support for playback_clear_buffer messages. Defaults to 60.
The sample rate for input (user) audio. Required.
The desired sample rate for output (agent) audio. If unset, defaults to the input_sample_rate.
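A plain-websocket medium configuration following the recommendations above might look like this (the key names are assumptions based on the parameter descriptions):

```python
# Sketch of a server-websocket call medium. Uses a large client buffer per the
# recommendation above, which assumes the client handles playback_clear_buffer.
medium = {
    "serverWebSocket": {
        "inputSampleRate": 48000,     # required: sample rate of user audio
        "outputSampleRate": 48000,    # optional: defaults to inputSampleRate
        "clientBufferSizeMs": 30000,  # large buffer for smooth playback
    }
}
```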
The call will use Telnyx's media streaming protocol. Once you have a join URL from starting a call, include it in your TexML like so: <Connect><Stream url=${your-join-url} bidirectionalMode="rtp" /></Connect> This works for both inbound and outbound calls.
The call will use Twilio's "Media Streams" protocol. Once you have a join URL from starting a call, include it in your TwiML like so: <Connect><Stream url=${your-join-url} /></Connect> This works for both inbound and outbound calls.
The call will use WebRTC with the Ultravox client SDK. This is the default.
0 < x < 1
Indicates whether a transcript is optional for the call.
VAD settings for the call.
The minimum duration of user speech required to interrupt the agent. This works the same way as minimumTurnDuration, but allows for a higher threshold for interrupting the agent. (This value will be ignored if it is less than minimumTurnDuration.)
Defaults to "0.09s" (90ms) as a starting point, but there's nothing special about this value.
The minimum duration of user speech required to be considered a user turn. Increasing this value will cause the agent to ignore short user audio. This may be useful in particularly noisy environments, but it comes at the cost of possibly ignoring very short user responses such as "yes" or "no".
Defaults to "0s" meaning the agent considers all user audio inputs (that make it through built-in noise cancellation).
The minimum amount of time the agent will wait to respond after the user seems to be done speaking. Increasing this value will make the agent less eager to respond, which may increase perceived response latency but will also make the agent less likely to jump in before the user is really done speaking.
Built-in VAD currently operates on 32ms frames, so only multiples of 32ms are meaningful. (Anything from 1ms to 31ms will produce the same result.)
Defaults to "0.384s" (384ms) as a starting point, but there's nothing special about this value aside from it corresponding to 12 VAD frames.
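Because the built-in VAD operates on 32 ms frames, a configured turn-end delay is only meaningful at frame boundaries. Assuming the delay rounds up to the next whole frame (an assumption; the page only states that values within the same frame produce the same result), the effective delay can be sketched as:

```python
import math

FRAME_MS = 32  # built-in VAD frame size

def effective_delay_ms(requested_ms):
    """Round a requested turn-end delay up to a whole number of VAD frames.

    The round-up direction is an assumption; the docs only say that values
    from 1 ms to 31 ms all produce the same result.
    """
    return math.ceil(requested_ms / FRAME_MS) * FRAME_MS
```

Under this sketch the 384 ms default maps to exactly 12 frames, and any value from 1 ms to 31 ms behaves like a single frame.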