Data Messages
Data messages are used to communicate non-audio information between your client and an Ultravox server during calls. These messages work across WebRTC data channels and WebSocket connections.
All messages are JSON objects with camelCase keys containing:
- A required type field identifying the message type
- Additional fields specific to each message type
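As a minimal sketch of this shape, a client might decode each incoming data message and confirm that the type field is present before dispatching on it. The helper name parse_data_message is hypothetical and is not part of any Ultravox SDK.

```python
import json
from typing import Any


def parse_data_message(raw: str) -> dict[str, Any]:
    """Decode a data message and verify it has the required `type` field.

    Hypothetical helper for illustration; not part of any Ultravox SDK.
    """
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("data message must be a JSON object")
    if not isinstance(message.get("type"), str):
        raise ValueError("data message is missing a string 'type' field")
    return message
```

Rejecting malformed payloads at this boundary keeps the per-message handlers free of repeated shape checks.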
Messages at a Glance
This table summarizes every message. Details on each message type appear below. The Sender column indicates the direction: client messages are sent from the client to the server, and server messages are sent from the server to the client.
Message | Sender | Description |
---|---|---|
Ping | Client | Measures round-trip data latency. |
Pong | Server | Server reply to a ping message. |
State | Server | Indicates the server’s current state. |
Transcript | Server | Contains text for an utterance made during the call. |
InputTextMessage | Client | Used to send a user message to the agent via text. |
SetOutputMedium | Client | Sets server’s output medium to text or voice. |
ClientToolInvocation | Server | Asks the client to invoke a client tool. |
ClientToolResult | Client | Contains the result of a client tool invocation. |
Debug | Server | Useful for application debugging. |
PlaybackClearBuffer | Server | Used to clear buffered output audio. WebSocket only. |
Ping
A message sent by the client to measure round-trip data message latency.
- type: "ping"
- timestamp: Float. Client timestamp for latency measurement.
Pong
A message sent by the server in response to a PingMessage. The timestamp is copied from the PingMessage.
- type: "pong"
- timestamp: Float. Matching ping timestamp.
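Because the server copies the ping timestamp into the pong, the client can compute round-trip latency by subtracting the echoed value from its current clock reading. The sketch below assumes timestamps are in seconds; the section does not state a unit, and the function names are hypothetical.

```python
import json


def build_ping(now: float) -> str:
    """Serialize a ping carrying the client's current clock reading."""
    return json.dumps({"type": "ping", "timestamp": now})


def latency_from_pong(raw_pong: str, now: float) -> float:
    """Return round-trip latency given a pong and the current clock reading.

    Assumes `now` comes from the same clock that stamped the ping
    (seconds are an assumption; the unit is not specified above).
    """
    pong = json.loads(raw_pong)
    if pong.get("type") != "pong":
        raise ValueError("expected a pong message")
    return now - float(pong["timestamp"])
```

A monotonic clock such as Python's time.monotonic() is a reasonable source for both readings, since only the difference between them matters.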
State
A message sent by the server to indicate its current state.
- type: "state"
- state: Current session state.
Transcript
A message containing text transcripts of user and agent utterances.
- type: "transcript"
- role: “user” or “agent”. Who emitted the utterance.
- medium: “text” or “voice”. The medium through which the utterance was emitted.
- text: String. The full text of the transcript so far. Either this or delta will be set, never both.
- delta: String. The additional transcript text added since the last agent transcript message. Either this or text will be set, never both.
- final: Boolean. Whether more updates are expected for this utterance.
- ordinal: Integer. Used for ordering transcripts within a call.
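Since a transcript message carries either the full text or an incremental delta, a client has to handle both cases when building up the text of an utterance. The following is a minimal sketch, assuming that all updates for one utterance share the same ordinal (the section only says the ordinal is used for ordering). The class name TranscriptAccumulator is hypothetical.

```python
from typing import Any


class TranscriptAccumulator:
    """Tracks the latest text for each utterance, keyed by ordinal.

    Assumption: updates to the same utterance reuse its ordinal.
    """

    def __init__(self) -> None:
        self._texts: dict[int, str] = {}

    def apply(self, message: dict[str, Any]) -> str:
        """Apply one transcript message and return the utterance text so far."""
        ordinal = message["ordinal"]
        if message.get("text") is not None:
            # Full text replaces whatever was accumulated before.
            self._texts[ordinal] = message["text"]
        elif message.get("delta") is not None:
            # Delta is appended to the text accumulated so far.
            self._texts[ordinal] = self._texts.get(ordinal, "") + message["delta"]
        return self._texts.get(ordinal, "")

    def ordered(self) -> list[str]:
        """Return utterance texts sorted by ordinal."""
        return [self._texts[key] for key in sorted(self._texts)]
```

Keying by ordinal lets the same structure double as the call's ordered transcript.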
InputTextMessage
A user message sent via text.
- type: "input_text_message"
- text: String. The content of the user message.
SetOutputMedium
Message sent by the client to set the server’s output medium.
- type: "set_output_medium"
- medium: Either “voice” or “text”.
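The two client-sent messages described above, InputTextMessage and SetOutputMedium, are small enough to build with plain JSON serialization. This sketch uses hypothetical builder functions and validates the medium locally before sending; how the resulting string is written to the data channel or WebSocket is left to the caller.

```python
import json

# The two output mediums named in this section.
VALID_MEDIUMS = ("voice", "text")


def input_text_message(text: str) -> str:
    """Serialize a user message to be sent to the agent as text."""
    return json.dumps({"type": "input_text_message", "text": text})


def set_output_medium(medium: str) -> str:
    """Serialize a request to switch the server's output medium."""
    if medium not in VALID_MEDIUMS:
        raise ValueError(f"medium must be one of {VALID_MEDIUMS}, got {medium!r}")
    return json.dumps({"type": "set_output_medium", "medium": medium})
```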
ClientToolInvocation
Sent by the server to ask the client to invoke a client-implemented tool with the given parameters. The client is expected to send back a ClientToolResultMessage with a matching invocation_id.
- type: "client_tool_invocation"
- tool_name: String. Tool to invoke.
- invocation_id: String. Unique invocation ID.
- parameters: Dict[String, Any]. Tool parameters.
ClientToolResult
Contains the result of a client-implemented tool invocation.
- type: "client_tool_result"
- invocation_id: String. Matches the corresponding invocation.
- result: String. Tool execution result. Often a JSON string. May be omitted for errors.
- response_type: String. Defaults to “tool-response”.
- error_type: Optional string. Should be omitted when result is set. Otherwise, should be “undefined” if a tool with the given name does not exist or “implementation-error” otherwise.
- error_message: String. Error details if failed (optional).
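The invocation and result messages form a request/response pair, which can be sketched as a single handler that looks up the tool, runs it, and fills in either result or the error fields. This is an illustration, not an SDK implementation: the handler name and the tools registry are hypothetical. The code also assumes the multi-word keys are spelled in camelCase on the wire (toolName, invocationId, errorType, errorMessage), following the statement that all messages use camelCase keys; verify the spelling against real payloads.

```python
import json
from typing import Any, Callable

# Hypothetical registry type: tool name -> function taking the parameters dict.
ToolFn = Callable[[dict[str, Any]], Any]


def handle_tool_invocation(
    message: dict[str, Any], tools: dict[str, ToolFn]
) -> dict[str, Any]:
    """Build a client_tool_result for one client_tool_invocation.

    Assumption: keys are camelCase on the wire (e.g. `invocationId`).
    """
    result: dict[str, Any] = {
        "type": "client_tool_result",
        "invocationId": message["invocationId"],
    }
    tool = tools.get(message["toolName"])
    if tool is None:
        # No tool registered under this name.
        result["errorType"] = "undefined"
        result["errorMessage"] = f"No client tool named {message['toolName']!r}"
        return result
    try:
        output = tool(message.get("parameters", {}))
    except Exception as exc:
        # The tool exists but failed while running.
        result["errorType"] = "implementation-error"
        result["errorMessage"] = str(exc)
        return result
    # `result` is a string; non-string output is JSON-encoded.
    result["result"] = output if isinstance(output, str) else json.dumps(output)
    return result
```

The response type is left out so that the documented default of “tool-response” applies.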
Debug
A message sent by the server to communicate debug information.
- type: "debug"
- message: String. Debug information.
- Disabled by default
PlaybackClearBuffer
Message sent by our server to clear buffered output audio. Integrators should drop as much unplayed output audio as possible in order for interruptions to function properly.
- type: "playback_clear_buffer"
- WebSocket connections only
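For a WebSocket integration, handling this message amounts to discarding any output audio that has been received but not yet played. The sketch below assumes the client queues output audio as byte chunks ahead of playback; the PlaybackBuffer class and its method names are hypothetical.

```python
from collections import deque
from typing import Any, Optional


class PlaybackBuffer:
    """Queue of output audio chunks waiting to be played.

    Hypothetical client-side structure; a real integration may buffer
    audio inside its audio library instead.
    """

    def __init__(self) -> None:
        self._chunks: deque[bytes] = deque()

    def enqueue(self, chunk: bytes) -> None:
        """Add received output audio to the end of the queue."""
        self._chunks.append(chunk)

    def next_chunk(self) -> Optional[bytes]:
        """Return the next chunk to play, or None if nothing is queued."""
        return self._chunks.popleft() if self._chunks else None

    def handle_data_message(self, message: dict[str, Any]) -> None:
        """Drop all unplayed audio when the server asks for a clear."""
        if message.get("type") == "playback_clear_buffer":
            self._chunks.clear()
```

Audio already handed to the sound device cannot be recalled this way, so keeping the chunks small limits how much audio plays after an interruption.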