Data messages are used to communicate non-audio information during Ultravox calls. These messages enable real-time control and interaction with ongoing conversations.

Communication Methods

  • Client Data Channels → Used by our SDKs and WebSocket connections for bi-directional, real-time message exchange during calls. This is the primary method for client apps to interact with calls.
  • Data Connection → Add a data connection to your call to receive messages via a separate WebSocket connection. This is particularly useful for:
    • Telephony integrations where the client doesn’t support WebRTC
    • Server-side applications that need to monitor call events or route data to external systems
  • REST API → Inject messages into active calls via HTTP POST requests. See Sending Messages to Live Calls via REST API below for detailed implementation guidelines.

Messages at a Glance

Details on each message type appear below in Data Message Details.

Client-to-Server Messages

| Type | Message | Description |
| --- | --- | --- |
| Agent Behavior | ForcedAgentMessage | Forces the agent to say a specific message or invoke tools. |
| Call Control | HangUp | Instructs the agent to end the call with an optional farewell message. |
| Call Control | SetOutputMedium | Sets server’s output medium to text or voice. |
| System | Ping | Measures round-trip data latency. |
| Tools | ClientToolResult and DataConnectionToolResult | Contains the result of a tool invocation. |
| User Input | UserTextMessage | Used to send a user message to the agent. |

Server-to-Client Messages

| Type | Message | Description |
| --- | --- | --- |
| Conversation | CallStarted | Provides some basic information about the call at its start. |
| Conversation | Transcript | Contains text for an utterance made during the call. |
| System | Debug | Useful for application debugging. Excluded by default. |
| System | PlaybackClearBuffer | Used to clear buffered output audio. WebSocket only. |
| System | Pong | Server reply to a ping message. |
| System | State | Indicates the server’s current state. |
| Tools | ClientToolInvocation and DataConnectionToolInvocation | Asks the client or data connection to invoke a tool. |

Data Message Details

All messages are JSON objects with camelCase keys containing:
  • A required type field identifying the message type
  • Additional fields specific to each message type
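
As a minimal sketch of the envelope described above, the following decodes one message, checks the required type field, and routes it to a handler. The function names `parse_data_message` and `dispatch` are hypothetical, and silently ignoring unknown types is an assumption rather than documented behavior.

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Any]


def parse_data_message(raw: str) -> Dict[str, Any]:
    """Decode one data message and check the required "type" field."""
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("data message must be a JSON object")
    if not isinstance(message.get("type"), str):
        raise ValueError('data message is missing a string "type" field')
    return message


def dispatch(raw: str, handlers: Dict[str, Handler]) -> Any:
    """Route a message to the handler registered for its type."""
    message = parse_data_message(raw)
    handler = handlers.get(message["type"])
    if handler is None:
        # Assumption: unknown message types are ignored rather than rejected.
        return None
    return handler(message)
```

A caller would register one handler per message type it cares about, for example `{"state": on_state, "transcript": on_transcript}`.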

Ping

A message sent by the client to measure round-trip data message latency.
Message Structure
{
  "type": "ping",
  "timestamp": 1234567890.123
}
Fields
timestamp
float
required
Client-side Unix timestamp with millisecond precision, used for latency measurement.

Pong

A message sent by the server in response to a PingMessage. The timestamp is copied from the PingMessage.
Message Structure
{
  "type": "pong",
  "timestamp": 1234567890.123
}
Fields
timestamp
float
required
Echoed timestamp from the original ping message.
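
Because the server echoes the client's timestamp, latency can be computed entirely on the client clock. The sketch below assumes the timestamp is expressed in seconds with a fractional part, as the example value suggests; `make_ping` and `round_trip_seconds` are hypothetical helper names.

```python
import json
import time
from typing import Optional


def make_ping(now: Optional[float] = None) -> str:
    """Build a ping message stamped with the client's current time."""
    timestamp = time.time() if now is None else now
    return json.dumps({"type": "ping", "timestamp": timestamp})


def round_trip_seconds(pong_raw: str, now: Optional[float] = None) -> float:
    """Return the elapsed time between sending the ping and receiving the pong."""
    pong = json.loads(pong_raw)
    if pong.get("type") != "pong":
        raise ValueError("expected a pong message")
    received_at = time.time() if now is None else now
    # The pong carries the timestamp copied from the original ping.
    return received_at - pong["timestamp"]
```

The optional `now` parameter exists only so the calculation can be exercised with fixed values.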

State

A message sent by the server to indicate its current state.
Message Structure
{
  "type": "state",
  "state": "listening"
}
Fields
state
string
required
Current session state. One of: idle, listening, thinking, or speaking.

Transcript

A message containing text transcripts of user and agent utterances.
Message Structure
{
  "type": "transcript",
  "role": "agent",
  "medium": "voice",
  "text": "Full transcript so far",  // Either text or delta will be set
  "delta": null,
  "final": false,
  "ordinal": 1
}
Fields
role
string
required
Who emitted the utterance. Must be either “user” or “agent”.
medium
string
default:"voice"
The medium through which the utterance was emitted. Either “text” or “voice”.
text
string
The full text of the transcript so far. Either this or delta will be set, but not both.
delta
string
The additional transcript text added since the last transcript message. Either this or text will be set, but not both.
final
boolean
required
Whether this is the final transcript message for this conversation round. When false, expect additional transcript messages.
ordinal
integer
required
The ordinal of the message within the current call, used for ordering transcripts.
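
One way a client might combine the text and delta variants is sketched below. It assumes that messages sharing an ordinal belong to the same utterance, which the field description implies but does not state outright; `TranscriptBuffer` is a hypothetical name.

```python
from typing import Any, Dict, List


class TranscriptBuffer:
    """Accumulates transcript messages into one string per ordinal."""

    def __init__(self) -> None:
        self.texts: Dict[int, str] = {}
        self.finals: Dict[int, bool] = {}

    def apply(self, message: Dict[str, Any]) -> str:
        """Apply one transcript message and return the utterance text so far."""
        ordinal = message["ordinal"]
        text = message.get("text")
        delta = message.get("delta")
        if text is not None:
            # "text" carries the full transcript so far, so it replaces the buffer.
            self.texts[ordinal] = text
        elif delta is not None:
            # "delta" carries only the newly added text, so it is appended.
            self.texts[ordinal] = self.texts.get(ordinal, "") + delta
        self.finals[ordinal] = bool(message["final"])
        return self.texts.get(ordinal, "")

    def ordered(self) -> List[str]:
        """Return all utterances sorted by ordinal."""
        return [self.texts[o] for o in sorted(self.texts)]
```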

UserTextMessage

A user message sent via text. The message appears to the agent as if it came from the user.
Message Structure
{
  "type": "user_text_message",
  "text": "Your message here",
  "urgency": "soon"  // Optional, defaults to "soon"
}
Fields
text
string
required
The content of the user message.
urgency
string
default:"soon"
Determines whether this message can interrupt the agent and whether it should trigger a generation. Options:
  • immediate → Interrupts the agent if speaking and starts a new generation immediately.
  • soon → Doesn’t interrupt but starts a generation at the next opportunity.
  • later → Message is considered during the next natural generation without forcing a new generation.
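
A small builder can reject unsupported urgency values before a message is sent. This is only a sketch: `user_text_message` is a hypothetical helper, and the client-side validation is a convenience rather than a documented requirement.

```python
import json

# The three urgency options listed for UserTextMessage.
USER_MESSAGE_URGENCIES = ("immediate", "soon", "later")


def user_text_message(text: str, urgency: str = "soon") -> str:
    """Serialize a user_text_message, validating the urgency option."""
    if urgency not in USER_MESSAGE_URGENCIES:
        raise ValueError(f"urgency must be one of {USER_MESSAGE_URGENCIES}")
    return json.dumps({"type": "user_text_message", "text": text, "urgency": urgency})
```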

SetOutputMedium

Message sent by the client to set the server’s output medium.
Message Structure
{
  "type": "set_output_medium",
  "medium": "voice"
}
Fields
medium
string
required
Output medium to use. Must be either “voice” or “text”.

ClientToolInvocation and DataConnectionToolInvocation

Sent by the server to ask the client or data connection to invoke a tool with the given parameters. The client or data connection is expected to send back a ClientToolResultMessage or DataConnectionToolResultMessage with a matching invocationId.
Message Structure
{
  "type": "client_tool_invocation", // Or "data_connection_tool_invocation" for data connections
  "toolName": "get_weather",
  "invocationId": "unique-invocation-id",
  "parameters": {
    "location": "Seattle"
  }
}
Fields
toolName
string
required
Name of the tool to invoke.
invocationId
string
required
Unique identifier for this invocation. Must be included in the corresponding result.
parameters
object
required
Tool-specific parameters as a JSON object.

ClientToolResult and DataConnectionToolResult

Contains the result of a tool invocation.
Message Structure
{
  "type": "client_tool_result", // Or "data_connection_tool_result" for data connections
  "invocationId": "matching-invocation-id",
  "result": "Tool execution result",
  "responseType": "tool-response",
  "agentReaction": "speaks",
  "errorType": null,
  "errorMessage": null,
  "updateCallState": null
}
Fields
invocationId
string
required
Must match the invocationId from the corresponding invocation.
result
string
Typically the tool execution result as viewed by the agent, which is often a JSON string. May be omitted for errors. For responseTypes other than tool-response, this may be a JSON string for an object that further specifies how the response should be handled. See special response types.
responseType
string
default:"tool-response"
Type of response being provided. See special response types.
agentReaction
string
default:"speaks"
How the agent should react. Options: “speaks” (default), “listens”, or “speaks-once”. See Agent Responses to Tools for more.
errorType
string
Error classification if the tool failed. Should be omitted when result is set. Options:
  • undefined → Tool with the given name does not exist
  • implementation-error → Tool exists but execution failed
errorMessage
string
Human-readable error description if the tool failed. This is not seen by the model but may be used for debugging.
updateCallState
object
Optional state updates to apply to the call. See Tool State for more.
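
The invocation and result messages pair up through invocationId. The following sketch shows one possible handler that runs a locally registered tool and builds the matching result, including the two documented error types. The `handle_tool_invocation` name and the `tools` registry are hypothetical, and serializing non-string results with `json.dumps` is an assumption based on the note that results are often JSON strings.

```python
import json
from typing import Any, Callable, Dict


def handle_tool_invocation(
    invocation: Dict[str, Any],
    tools: Dict[str, Callable[..., Any]],
) -> Dict[str, Any]:
    """Run the requested tool and build the corresponding result message."""
    # "client_tool_invocation" -> "client_tool_result", and likewise for
    # "data_connection_tool_invocation" -> "data_connection_tool_result".
    result_type = invocation["type"].replace("_invocation", "_result")
    reply: Dict[str, Any] = {
        "type": result_type,
        "invocationId": invocation["invocationId"],
    }
    tool = tools.get(invocation["toolName"])
    if tool is None:
        reply["errorType"] = "undefined"
        reply["errorMessage"] = f"no tool named {invocation['toolName']}"
        return reply
    try:
        output = tool(**invocation["parameters"])
    except Exception as exc:
        reply["errorType"] = "implementation-error"
        reply["errorMessage"] = str(exc)
        return reply
    # Assumption: non-string outputs are sent to the agent as JSON strings.
    reply["result"] = output if isinstance(output, str) else json.dumps(output)
    return reply
```

Optional fields such as responseType and agentReaction are left out here so that their defaults apply.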

Debug

A message sent by the server to communicate debug information. Disabled by default.
Message Structure
{
  "type": "debug",
  "message": "Debug information here"
}
Fields
message
string
required
Debug information or diagnostic details.
Debug messages are disabled by default and must be explicitly enabled for debugging purposes.

CallStarted

Basic call metadata shared by the server when a call begins.
Message Structure
{
  "type": "call_started",
  "callId": "550e8400-e29b-41d4-a716-446655440000"
}
Fields
callId
string
required
The UUID of the call that has started.

PlaybackClearBuffer

Message sent by the server to clear buffered output audio. Integrators should drop as much unplayed output audio as possible for interruptions to function properly.
Message Structure
{
  "type": "playback_clear_buffer"
}
This message is only used with WebSocket connections. Handling this message allows for larger client buffers while maintaining responsive interrupts. Larger client buffers make choppy audio less likely in the presence of network disruption or resource contention.
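
To illustrate the buffering behavior described above, here is a sketch of a client-side queue that discards all unplayed audio when playback_clear_buffer arrives. `PlaybackBuffer` is a hypothetical class; a real integration would hook `next_chunk` into its audio output callback.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional


class PlaybackBuffer:
    """Queue of output audio chunks that can be flushed on interruption."""

    def __init__(self) -> None:
        self._chunks: Deque[bytes] = deque()

    def enqueue(self, chunk: bytes) -> None:
        """Store audio received from the server until it is played."""
        self._chunks.append(chunk)

    def next_chunk(self) -> Optional[bytes]:
        """Return the next chunk to play, or None if the buffer is empty."""
        return self._chunks.popleft() if self._chunks else None

    def on_message(self, message: Dict[str, Any]) -> int:
        """Drop unplayed audio on playback_clear_buffer; return bytes dropped."""
        if message.get("type") != "playback_clear_buffer":
            return 0
        dropped = sum(len(chunk) for chunk in self._chunks)
        self._chunks.clear()
        return dropped
```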

ForcedAgentMessage

Forces the agent to say a specific message or invoke tools.
Message Structure
{
  "type": "forced_agent_message",
  "content": "Text for the agent to say",  // Optional (default: "")
  "toolCalls": [  // Optional array of tool calls
    {
      "id": "unique-invocation-id",  // Optional, generated if not provided
      "name": "tool_name",
      "arguments": {
        "param1": "value1"
      }
    }
  ],
  "uninterruptible": false,  // Optional (default: false)
  "urgency": "soon"  // Optional: "immediate" or "soon" (default: "soon")
}
Fields
content
string
default:""
Text content the agent should speak.
toolCalls
array
Array of tool invocations to execute.
uninterruptible
boolean
If true, prevents interruption while the agent speaks this message. (Note that tools are always uninterruptible.)
urgency
string
default:"soon"
Controls when the message is processed. Must be either “immediate” (may interrupt the user or agent) or “soon” (process at next opportunity).
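
A builder for this message might look like the sketch below. Note that, unlike UserTextMessage, only "immediate" and "soon" are listed as urgency options here. `forced_agent_message` is a hypothetical helper, and omitting toolCalls when none are given is an assumption based on the field being optional.

```python
import json
from typing import Any, Dict, List, Optional


def forced_agent_message(
    content: str = "",
    tool_calls: Optional[List[Dict[str, Any]]] = None,
    uninterruptible: bool = False,
    urgency: str = "soon",
) -> str:
    """Serialize a forced_agent_message with the documented defaults."""
    if urgency not in ("immediate", "soon"):
        raise ValueError('urgency must be "immediate" or "soon"')
    message: Dict[str, Any] = {
        "type": "forced_agent_message",
        "content": content,
        "uninterruptible": uninterruptible,
        "urgency": urgency,
    }
    if tool_calls:
        # Each entry has a "name", "arguments", and optionally an "id".
        message["toolCalls"] = list(tool_calls)
    return json.dumps(message)
```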

HangUp

Instructs the agent to end the call with an optional farewell message.
Message Structure
{
  "type": "hang_up",
  "message": "Goodbye!"
}
Fields
message
string
default:""
Final message to speak before ending the call.

Sending Messages to Live Calls via REST API

The Send Data Message to Call endpoint allows you to inject messages into active calls (calls that are joined and not yet ended).

Supported Message Types

Responses

Messages sent successfully via the REST API receive a 204 No Content response with an empty body. Potential error responses include:
  • 401 Unauthorized: Missing or invalid API key
  • 403 Forbidden: Insufficient authorization
  • 422 Unprocessable Entity: Call is not active (either not joined yet or already ended)
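
The status codes above can be turned into explicit errors on the caller's side. This sketch covers only response handling and deliberately leaves out the request itself; `check_send_response` is a hypothetical helper that takes the HTTP status code returned by whatever client made the POST.

```python
# Error responses listed for the Send Data Message to Call endpoint.
SEND_ERRORS = {
    401: "missing or invalid API key",
    403: "insufficient authorization",
    422: "call is not active (not joined yet or already ended)",
}


def check_send_response(status_code: int) -> None:
    """Return quietly on 204 No Content; raise with a reason otherwise."""
    if status_code == 204:
        return
    reason = SEND_ERRORS.get(status_code, "unexpected response")
    raise RuntimeError(f"send data message failed ({status_code}): {reason}")
```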