Protocol & Data Messages

Data messages are used to communicate non-audio information during Ultravox calls. These messages enable real-time control and interaction with ongoing conversations.

Communication Methods

Client Data Channels → Used by our SDKs and WebSocket connections for bi-directional, real-time message exchange during calls. This is the primary method for client apps to interact with calls.
Data Connection → Add a data connection to your call to receive messages via a separate WebSocket connection. This is particularly useful for:
- Telephony integrations where the client doesn’t support WebRTC
- Server-side applications that need to monitor call events or route data to external systems
REST API → Inject messages into active calls via HTTP POST requests. See Sending Messages to Live Calls via REST API below for details.

Messages at a Glance

Details on each message type appear below in Data Message Details.

Client-to-Server Messages

Type	Message	Description
Agent Behavior	ForcedAgentMessage	Forces the agent to say a specific message or invoke tools.
Call Control	HangUp	Instructs the agent to end the call with an optional farewell message.
Call Control	SetOutputMedium	Sets server’s output medium to text or voice.
System	Ping	Measures round-trip data latency.
Tools	ClientToolResult and DataConnectionToolResult	Contains the result of a tool invocation.
User Input	UserTextMessage	Used to send a user message to the agent.

Server-to-Client Messages

Type	Message	Description
Conversation	CallStarted	Provides some basic information about the call at its start.
Conversation	Transcript	Contains text for an utterance made during the call.
System	Debug	Useful for application debugging. Excluded by default.
System	PlaybackClearBuffer	Used to clear buffered output audio. WebSocket only.
System	Pong	Server reply to a ping message.
System	State	Indicates the server’s current state.
Tools	ClientToolInvocation and DataConnectionToolInvocation	Asks the client or data connection to invoke a tool.

Data Message Details

All messages are JSON objects with camelCase keys containing:

A required type field identifying the message type
Additional fields specific to each message type
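Because every message carries a `type` field, a client can route incoming messages with a simple lookup table. The following is a minimal Python sketch, assuming messages arrive as JSON text frames. The `MessageRouter` class and its methods are hypothetical helpers for illustration, not part of any Ultravox SDK.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical handler type: receives the decoded message as a dict.
Handler = Callable[[Dict[str, Any]], None]


class MessageRouter:
    """Dispatches decoded data messages to handlers keyed by their "type" field."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def on(self, message_type: str, handler: Handler) -> None:
        """Register a handler for one message type, e.g. "state" or "transcript"."""
        self._handlers[message_type] = handler

    def dispatch(self, raw: str) -> bool:
        """Parse one JSON message and call its handler.

        Returns True if a handler ran and False if no handler is registered.
        Raises ValueError if the payload is not an object with a string "type".
        """
        message = json.loads(raw)
        if not isinstance(message, dict) or not isinstance(message.get("type"), str):
            raise ValueError("data message must be a JSON object with a string 'type' field")
        handler = self._handlers.get(message["type"])
        if handler is None:
            # Ignore unhandled types so that new server messages don't break the client.
            return False
        handler(message)
        return True
```

Ignoring unknown types, rather than raising, keeps a client working if it receives a message type it was not written for, such as debug messages once they are enabled.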

Ping

A message sent by the client to measure round-trip data message latency.

Message Structure

{
  "type": "ping",
  "timestamp": 1234567890.123
}

Fields

timestamp

float

required

Client-side Unix timestamp in seconds, with millisecond precision, used to measure latency.

Pong

A message sent by the server in response to a Ping message. The timestamp is copied from the Ping message.

Message Structure

{
  "type": "pong",
  "timestamp": 1234567890.123
}

Fields

timestamp

float

required

Echoed timestamp from the original ping message.
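Because the server echoes the client's own timestamp, the client can compute round-trip latency entirely from its own clock, so clock skew between client and server does not matter. A minimal Python sketch follows. The function names are hypothetical, and the optional `now` parameter exists only to make the functions testable.

```python
import json
import time
from typing import Optional


def make_ping(now: Optional[float] = None) -> str:
    """Build a ping message stamped with the current Unix time in seconds."""
    timestamp = time.time() if now is None else now
    return json.dumps({"type": "ping", "timestamp": timestamp})


def round_trip_seconds(pong_raw: str, now: Optional[float] = None) -> float:
    """Compute round-trip latency from a pong, which echoes the ping's timestamp."""
    pong = json.loads(pong_raw)
    if pong.get("type") != "pong":
        raise ValueError("expected a pong message")
    received_at = time.time() if now is None else now
    return received_at - float(pong["timestamp"])
```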

State

A message sent by the server to indicate its current state.

Message Structure

{
  "type": "state",
  "state": "listening"
}

Fields

state

string

required

Current session state. One of: idle, listening, thinking, or speaking.
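A client typically keeps the latest state so it can drive UI cues, such as a "thinking" indicator. The following minimal Python sketch tracks the current state and its history. `StateTracker` is a hypothetical helper, and it rejects values outside the four documented states.

```python
import json
from typing import List, Optional

# The states listed for the State message.
VALID_STATES = {"idle", "listening", "thinking", "speaking"}


class StateTracker:
    """Keeps the most recent server state and the sequence of transitions."""

    def __init__(self) -> None:
        self.current: Optional[str] = None  # No state message received yet.
        self.history: List[str] = []

    def handle(self, raw: str) -> str:
        """Apply one state message and return the new state."""
        message = json.loads(raw)
        state = message["state"]
        if state not in VALID_STATES:
            raise ValueError(f"unexpected state: {state!r}")
        self.current = state
        self.history.append(state)
        return state
```

A production client might log unknown states instead of raising, so that it tolerates states added later.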

Transcript

A message containing text transcripts of user and agent utterances.

Message Structure

{
  "type": "transcript",
  "role": "agent",
  "medium": "voice",
  "text": "Full transcript so far",  // Either text or delta will be set
  "delta": null,
  "final": false,
  "ordinal": 1
}

Fields

role

string

required

Who emitted the utterance. Must be either “user” or “agent”.

medium

string

default:"voice"

The medium through which the utterance was emitted. Either “text” or “voice”.

text

string

The full text of the transcript so far. Either this or delta will be set, but not both.

delta

string

The additional transcript text added since the last transcript message. Either this or text will be set, but not both.

final

boolean

required

Whether this is the last transcript message for this conversation round. If false, expect additional transcript messages for it.

ordinal

integer

required

The ordinal of the message within the current call, used for ordering transcripts.
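Because a message carries either the full `text` so far or an incremental `delta`, a client needs to replace text in the first case and append it in the second. The following minimal Python sketch assumes that all transcript messages for one utterance share the same `ordinal`, and it uses that value as the key. `Utterance` and `TranscriptAssembler` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Utterance:
    role: str
    medium: str
    text: str = ""
    final: bool = False


class TranscriptAssembler:
    """Rebuilds utterances from transcript messages, keyed by ordinal (assumed stable per utterance)."""

    def __init__(self) -> None:
        self._by_ordinal: Dict[int, Utterance] = {}

    def handle(self, raw: str) -> Utterance:
        msg = json.loads(raw)
        ordinal = msg["ordinal"]
        utt = self._by_ordinal.get(ordinal)
        if utt is None:
            utt = Utterance(role=msg["role"], medium=msg.get("medium", "voice"))
            self._by_ordinal[ordinal] = utt
        if msg.get("text") is not None:
            utt.text = msg["text"]  # Full transcript so far: replace.
        elif msg.get("delta") is not None:
            utt.text += msg["delta"]  # Incremental text: append.
        utt.final = bool(msg["final"])
        return utt

    def ordered(self) -> List[Utterance]:
        """All utterances sorted by ordinal, i.e. in call order."""
        return [self._by_ordinal[k] for k in sorted(self._by_ordinal)]
```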

UserTextMessage

A user message sent via text. The message appears to the agent as if it came from the user.

Message Structure

{
  "type": "user_text_message",
  "text": "Your message here",
  "urgency": "soon"  // Optional, defaults to "soon"
}

Fields

text

string

required

The content of the user message.

urgency

string

default:"soon"

Determines whether this message can interrupt the agent and whether it should trigger a generation. Options:

immediate → Interrupts the agent if speaking and starts a new generation immediately.
soon → Doesn’t interrupt but starts a generation at the next opportunity.
later → Message is considered during the next natural generation without forcing a new generation.
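The following minimal Python sketch builds this message and checks `urgency` against the three documented options before sending. The function name `user_text_message` is hypothetical.

```python
import json

# The documented urgency options.
URGENCIES = ("immediate", "soon", "later")


def user_text_message(text: str, urgency: str = "soon") -> str:
    """Serialize a user_text_message, rejecting unknown urgency values."""
    if urgency not in URGENCIES:
        raise ValueError(f"urgency must be one of {URGENCIES}, got {urgency!r}")
    return json.dumps({"type": "user_text_message", "text": text, "urgency": urgency})
```

For example, `user_text_message("What's the weather?", urgency="immediate")` produces a message that interrupts the agent if it is speaking.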

SetOutputMedium

Message sent by the client to set the server’s output medium.

Message Structure

{
  "type": "set_output_medium",
  "medium": "voice"
}

Fields

medium

string

required

Output medium to use. Must be either “voice” or “text”.

ClientToolInvocation and DataConnectionToolInvocation

Sent by the server to ask the client or data connection to invoke a tool with the given parameters. The client or data connection is expected to send back a ClientToolResult or DataConnectionToolResult message with a matching invocationId.

Message Structure

{
  "type": "client_tool_invocation", // Or "data_connection_tool_invocation" for data connections
  "toolName": "get_weather",
  "invocationId": "unique-invocation-id",
  "parameters": {
    "location": "Seattle"
  }
}

Fields

toolName

string

required

Name of the tool to invoke.

invocationId

string

required

Unique identifier for this invocation. Must be included in the corresponding result.

parameters

object

required

Tool-specific parameters as a JSON object.

ClientToolResult and DataConnectionToolResult

Contains the result of a tool invocation.

Message Structure

{
  "type": "client_tool_result", // Or "data_connection_tool_result" for data connections
  "invocationId": "matching-invocation-id",
  "result": "Tool execution result",
  "responseType": "tool-response",
  "agentReaction": "speaks",
  "errorType": null,
  "errorMessage": null,
  "updateCallState": null
}

Fields

invocationId

string

required

Must match the invocationId from the corresponding invocation.

result

string

Typically the tool execution result as viewed by the agent, which is often a JSON string. May be omitted for errors. For responseTypes other than tool-response, this may be a JSON string for an object that further specifies how the response should be handled. See special response types.

responseType

string

default:"tool-response"

Type of response being provided. See special response types.

agentReaction

string

default:"speaks"

How the agent should react. Options: “speaks” (default), “listens”, or “speaks-once”. See Agent Responses to Tools for more.

errorType

string

Error classification if the tool failed. Should be omitted when result is set. Options:

undefined → Tool with the given name does not exist
implementation-error → Tool exists but execution failed

errorMessage

string

Human-readable error description if the tool failed. This is not seen by the model but may be used for debugging.

updateCallState

object

Optional state updates to apply to the call. See Tool State for more.
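The invocation and result messages above form a request/response pair. The following minimal Python sketch handles one invocation: it looks up the tool in a hypothetical local registry, runs it, and returns the matching result message. The sketch maps a missing tool to `undefined` and an exception to `implementation-error`. It omits `responseType` and `agentReaction`, so the documented defaults (`tool-response` and `speaks`) apply. The name `handle_tool_invocation` and the registry shape are assumptions for illustration.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry: tool name -> Python callable taking keyword parameters.
ToolFn = Callable[..., Any]

# Result message type for each invocation message type.
RESULT_TYPE = {
    "client_tool_invocation": "client_tool_result",
    "data_connection_tool_invocation": "data_connection_tool_result",
}


def handle_tool_invocation(raw: str, tools: Dict[str, ToolFn]) -> str:
    """Run the requested tool and build the matching result message."""
    invocation = json.loads(raw)
    reply: Dict[str, Any] = {
        "type": RESULT_TYPE[invocation["type"]],
        "invocationId": invocation["invocationId"],  # Must echo the invocation's ID.
    }
    tool = tools.get(invocation["toolName"])
    if tool is None:
        reply["errorType"] = "undefined"  # No tool with this name exists.
        reply["errorMessage"] = f"unknown tool {invocation['toolName']!r}"
        return json.dumps(reply)
    try:
        output = tool(**invocation["parameters"])
    except Exception as exc:  # The tool exists but its execution failed.
        reply["errorType"] = "implementation-error"
        reply["errorMessage"] = str(exc)  # For debugging; not shown to the model.
        return json.dumps(reply)
    # The agent sees `result` as a string, so structured output is JSON-encoded.
    reply["result"] = output if isinstance(output, str) else json.dumps(output)
    return json.dumps(reply)
```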

Debug

A message sent by the server to communicate debug information. Disabled by default.

Message Structure

{
  "type": "debug",
  "message": "Debug information here"
}

Fields

message

string

required

Debug information or diagnostic details.

Debug messages are disabled by default and must be explicitly enabled for debugging purposes.

CallStarted

Basic call metadata shared by the server when a call begins.

Message Structure

{
  "type": "call_started",
  "callId": "550e8400-e29b-41d4-a716-446655440000"
}

Fields

callId

string

required

The UUID of the call that has started.

PlaybackClearBuffer

Message sent by the server to clear buffered output audio. Integrators should drop as much unplayed output audio as possible for interruptions to function properly.

Message Structure

{
  "type": "playback_clear_buffer"
}

This message is only used with WebSocket connections. Handling this message allows for larger client buffers while maintaining responsive interrupts. Larger client buffers make choppy audio less likely in the presence of network disruption or resource contention.
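The following minimal Python sketch shows a client-side playback queue that supports this message. Audio chunks from the server are queued, the audio output pulls them in order, and `clear()` drops everything not yet played when `playback_clear_buffer` arrives. `PlaybackBuffer` is a hypothetical class. It treats audio as opaque byte chunks and does not model audio already handed to the sound device, which a real client would also need to flush where possible.

```python
from collections import deque
from typing import Deque, Optional


class PlaybackBuffer:
    """A client-side queue of output audio chunks awaiting playback."""

    def __init__(self) -> None:
        self._chunks: Deque[bytes] = deque()

    def enqueue(self, chunk: bytes) -> None:
        """Queue audio received from the server."""
        self._chunks.append(chunk)

    def next_chunk(self) -> Optional[bytes]:
        """Return the next chunk to play, or None if nothing is buffered."""
        return self._chunks.popleft() if self._chunks else None

    def buffered_bytes(self) -> int:
        """Total unplayed audio currently held, in bytes."""
        return sum(len(c) for c in self._chunks)

    def clear(self) -> int:
        """Drop all unplayed audio (on playback_clear_buffer); return bytes dropped."""
        dropped = self.buffered_bytes()
        self._chunks.clear()
        return dropped
```

Because `clear()` discards the whole queue at once, the buffer can be sized generously to absorb network jitter without delaying interruptions.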

ForcedAgentMessage

Forces the agent to say a specific message or invoke tools.

Message Structure

{
  "type": "forced_agent_message",
  "content": "Text for the agent to say",  // Optional (default: "")
  "toolCalls": [  // Optional array of tool calls
    {
      "id": "unique-invocation-id",  // Optional, generated if not provided
      "name": "tool_name",
      "arguments": {
        "param1": "value1"
      }
    }
  ],
  "uninterruptible": false,  // Optional (default: false)
  "urgency": "soon"  // Optional: "immediate" or "soon" (default: "soon")
}

Fields

content

string

default:""

Text content the agent should speak.

toolCalls

array

Array of tool invocations to execute.

uninterruptible

boolean

If true, prevents interruption while the agent speaks this message. (Note that tools are always uninterruptible.)

urgency

string

default:"soon"

Controls when the message is processed. Must be either “immediate” (may interrupt the user or agent) or “soon” (process at next opportunity).
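The following minimal Python sketch builds this message and enforces the documented constraints. Here `urgency` accepts only `immediate` or `soon`, unlike UserTextMessage, which also allows `later`. This sketch requires `name` and `arguments` on each tool call, because the example marks only `id` as optional. The function name and the `lookup_order` tool in the usage note are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional


def forced_agent_message(
    content: str = "",
    tool_calls: Optional[List[Dict[str, Any]]] = None,
    uninterruptible: bool = False,
    urgency: str = "soon",
) -> str:
    """Serialize a forced_agent_message after checking the documented constraints."""
    if urgency not in ("immediate", "soon"):  # "later" is not accepted here.
        raise ValueError(f"urgency must be 'immediate' or 'soon', got {urgency!r}")
    message: Dict[str, Any] = {
        "type": "forced_agent_message",
        "content": content,
        "uninterruptible": uninterruptible,
        "urgency": urgency,
    }
    if tool_calls:
        for call in tool_calls:
            # "id" may be omitted (the server generates one); this sketch requires the rest.
            if "name" not in call or "arguments" not in call:
                raise ValueError("each tool call needs 'name' and 'arguments'")
        message["toolCalls"] = tool_calls
    return json.dumps(message)
```

For example, `forced_agent_message(tool_calls=[{"name": "lookup_order", "arguments": {"orderId": "42"}}])` asks the agent to invoke a tool without saying anything first.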

HangUp

Instructs the agent to end the call with an optional farewell message.

Message Structure

{
  "type": "hang_up",
  "message": "Goodbye!"
}

Fields

message

string

default:""

Final message to speak before ending the call.

Sending Messages to Live Calls via REST API

The Send Data Message to Call endpoint allows you to inject messages into active calls (calls that are joined and not yet ended).

Responses

Successful messages sent via the REST API will receive a 204 No Content response with an empty body. Potential error responses include:

401 Unauthorized: Missing or invalid API key
403 Forbidden: Insufficient authorization
422 Unprocessable Entity: Call is not active (either not joined yet or already ended)
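The following minimal Python sketch uses only the standard library to build such a request and interpret the documented status codes. The base URL, the `/calls/{call_id}/send_data_message` path, and the `X-API-Key` header are assumptions here, so confirm them against the Send Data Message to Call API reference. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed base URL and path; verify against the API reference before use.
API_BASE = "https://api.ultravox.ai/api"


def build_send_data_message_request(call_id: str, api_key: str, message: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that injects `message` into a live call."""
    return urllib.request.Request(
        url=f"{API_BASE}/calls/{call_id}/send_data_message",
        data=json.dumps(message).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},  # Assumed auth header.
        method="POST",
    )


# Meanings of the documented status codes.
STATUS_MEANINGS = {
    204: "sent",
    401: "missing or invalid API key",
    403: "insufficient authorization",
    422: "call is not active (not joined yet or already ended)",
}


def describe_status(status: int) -> str:
    """Translate an HTTP status from this endpoint into a short explanation."""
    return STATUS_MEANINGS.get(status, f"unexpected status {status}")
```

To send the request, pass it to `urllib.request.urlopen`. Error statuses such as 422 raise `urllib.error.HTTPError`, and the exception's `code` attribute can be passed to `describe_status`.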
