Data messages are used to communicate non-audio information between your client and an Ultravox server during calls. These messages work across WebRTC data channels and WebSocket connections.

All messages are JSON objects with camelCase keys containing:

  • A required type field identifying the message type
  • Additional fields specific to each message type
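
As an illustration of how a client might consume these messages, the following Python sketch decodes one message, checks the required type field, and routes it by type. The names parse_data_message and dispatch are hypothetical, and the transport is assumed to deliver each message as a JSON string.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical handler signature: receives the decoded message object.
Handler = Callable[[Dict[str, Any]], None]


def parse_data_message(raw: str) -> Dict[str, Any]:
    """Decode one data message and check the required type field."""
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("data message must be a JSON object")
    if not isinstance(message.get("type"), str):
        raise ValueError("data message is missing a string 'type' field")
    return message


def dispatch(raw: str, handlers: Dict[str, Handler]) -> bool:
    """Route a message to the handler registered for its type.

    Returns False when no handler is registered, so unknown message
    types are ignored rather than treated as errors.
    """
    message = parse_data_message(raw)
    handler = handlers.get(message["type"])
    if handler is None:
        return False
    handler(message)
    return True
```

Ignoring unknown types is a design choice in this sketch, not a documented requirement; it keeps a client working if new message types appear.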

Messages at a Glance

The table below summarizes every message; details on each message type appear in the sections that follow. The Sender column shows who sends the message: client messages go from the client to the server, and server messages go from the server to the client.

Message               Sender  Description
Ping                  Client  Measures round-trip data latency.
Pong                  Server  Server reply to a ping message.
State                 Server  Indicates the server’s current state.
Transcript            Server  Contains text for an utterance made during the call.
InputTextMessage      Client  Used to send a user message to the agent via text.
SetOutputMedium       Client  Sets server’s output medium to text or voice.
ClientToolInvocation  Server  Asks the client to invoke a client tool.
ClientToolResult      Client  Contains the result of a client tool invocation.
Debug                 Server  Useful for application debugging.
PlaybackClearBuffer   Server  Used to clear buffered output audio. WebSocket only.

Ping

A message sent by the client to measure round-trip data message latency.

  • type: "ping"
  • timestamp: Float. Client timestamp for latency measurement.

Pong

A message sent by the server in response to a PingMessage. The timestamp is copied from the PingMessage.

  • type: "pong"
  • timestamp: Float. Matching ping timestamp.
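
Because the pong carries the ping's timestamp back unchanged, the client can compute latency by subtracting that value from its current clock reading. The sketch below shows one way to do this. The helper names make_ping and round_trip_seconds are hypothetical, and the use of seconds as the unit is an assumption, since the field list only says the value is a float.

```python
import json
from typing import Any, Dict


def make_ping(now: float) -> str:
    """Build a ping message stamped with the client's clock reading."""
    return json.dumps({"type": "ping", "timestamp": now})


def round_trip_seconds(pong: Dict[str, Any], now: float) -> float:
    """Return elapsed time between sending the ping and receiving the pong.

    Assumes `now` comes from the same client clock, in the same unit,
    that produced the original ping timestamp.
    """
    if pong.get("type") != "pong":
        raise ValueError("expected a pong message")
    return now - float(pong["timestamp"])
```

A caller might pass time.monotonic() as now on both ends, so that wall-clock adjustments do not distort the measurement.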

State

A message sent by the server to indicate its current state.

  • type: "state"
  • state: Current session state

Transcript

A message containing text transcripts of user and agent utterances.

  • type: "transcript"
  • role: “user” or “agent”. Who emitted the utterance.
  • medium: “text” or “voice”. The medium through which the utterance was emitted.
  • text: String. The full text of the transcript so far. Either this or delta is set, never both.
  • delta: String. The transcript text added since the last agent transcript message. Either this or text is set, never both.
  • final: Boolean. True when no more updates are expected for this utterance.
  • ordinal: Integer. Used for ordering transcripts within a call.
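
The text and delta fields imply two update paths: text replaces what has been accumulated, while delta appends to it. The following sketch shows one way a client might fold transcript messages into per-utterance state. The Utterance and TranscriptLog names are hypothetical, and treating ordinal as a unique key per utterance is an assumption; the field list only says it is used for ordering.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Utterance:
    role: str
    medium: str
    text: str = ""
    final: bool = False


class TranscriptLog:
    """Accumulates transcript messages, keyed by ordinal (an assumption)."""

    def __init__(self) -> None:
        self._by_ordinal: Dict[int, Utterance] = {}

    def apply(self, message: Dict[str, Any]) -> Utterance:
        ordinal = message["ordinal"]
        utterance = self._by_ordinal.get(ordinal)
        if utterance is None:
            utterance = Utterance(role=message["role"], medium=message["medium"])
            self._by_ordinal[ordinal] = utterance
        if message.get("text") is not None:
            # Full text replaces whatever was accumulated so far.
            utterance.text = message["text"]
        elif message.get("delta") is not None:
            # A delta is appended to the existing text.
            utterance.text += message["delta"]
        utterance.final = bool(message.get("final", False))
        return utterance

    def ordered(self) -> List[Utterance]:
        """Return utterances sorted by ordinal."""
        return [self._by_ordinal[key] for key in sorted(self._by_ordinal)]
```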

InputTextMessage

A user message sent via text.

  • type: "input_text_message"
  • text: String. The content of the user message.

SetOutputMedium

Message sent by the client to set the server’s output medium.

  • type: "set_output_medium"
  • medium: Either “voice” or “text”.
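
Both InputTextMessage and SetOutputMedium are small client-to-server messages, so building them is mostly a matter of serializing the right fields. A minimal sketch follows; the helper names are hypothetical, and rejecting media other than "voice" or "text" on the client side is a choice made here, not documented behavior.

```python
import json


def input_text_message(text: str) -> str:
    """Serialize a user text message for the agent."""
    return json.dumps({"type": "input_text_message", "text": text})


def set_output_medium(medium: str) -> str:
    """Serialize a request to switch the server's output medium."""
    if medium not in ("voice", "text"):
        raise ValueError("medium must be 'voice' or 'text'")
    return json.dumps({"type": "set_output_medium", "medium": medium})
```

The returned strings would then be sent over whichever transport the call uses.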

ClientToolInvocation

Sent by the server to ask the client to invoke a client-implemented tool with the given parameters. The client is expected to send back a ClientToolResultMessage with a matching invocation_id.

  • type: "client_tool_invocation"
  • tool_name: String. Tool to invoke
  • invocation_id: String. Unique invocation ID
  • parameters: Dict[String, Any]. Tool parameters

ClientToolResult

Contains the result of a client-implemented tool invocation.

  • type: "client_tool_result"
  • invocation_id: String. Matches corresponding invocation.
  • result: String. Tool execution result. Often a JSON string. May be omitted for errors.
  • response_type: String. Defaults to “tool-response”.
  • error_type: Optional string. Should be omitted when result is set. Otherwise, should be “undefined” if a tool with the given name does not exist, or “implementation-error” for any other failure.
  • error_message: Optional string. Details of the error if the invocation failed.
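
The invocation and result messages form a request and reply pair tied together by the invocation ID. The sketch below shows one way a client might answer a ClientToolInvocation, covering the success case and both error types. The function handle_tool_invocation and the tools registry are hypothetical. Key names are copied from the field lists in this section; since the introduction says keys are camelCase, the names on the wire may differ (for example invocationId), so treat them as an assumption. The response_type field is left out so that the server default applies.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool signature: takes the parameters dict, returns any value.
Tool = Callable[[Dict[str, Any]], Any]


def handle_tool_invocation(
    invocation: Dict[str, Any], tools: Dict[str, Tool]
) -> Dict[str, Any]:
    """Run the requested client tool and build the matching result message."""
    name = invocation["tool_name"]
    reply: Dict[str, Any] = {
        "type": "client_tool_result",
        # Echo the ID so the server can match the result to its request.
        "invocation_id": invocation["invocation_id"],
    }
    tool = tools.get(name)
    if tool is None:
        # No tool registered under this name.
        reply["error_type"] = "undefined"
        reply["error_message"] = "no client tool named " + repr(name)
        return reply
    try:
        outcome = tool(invocation.get("parameters", {}))
    except Exception as exc:
        # The tool exists but failed while running.
        reply["error_type"] = "implementation-error"
        reply["error_message"] = str(exc)
        return reply
    # The result field is a string, so non-string values are JSON-encoded.
    reply["result"] = outcome if isinstance(outcome, str) else json.dumps(outcome)
    return reply
```

Catching every exception from the tool is deliberate here: the server is waiting for a result, so reporting a failure is usually better than sending nothing.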

Debug

A message sent by the server to communicate debug information.

  • type: "debug"
  • message: String. Debug information
  • Debug messages are disabled by default.

PlaybackClearBuffer

Message sent by the server to clear buffered output audio. Integrators should drop as much unplayed output audio as possible so that interruptions work properly.

  • type: "playback_clear_buffer"
  • WebSocket connections only
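
As a sketch of what dropping unplayed audio could look like on the client, the following assumes output audio is held in a queue of byte chunks before playback. The PlaybackQueue class and its methods are hypothetical; a real integration would also need to flush whatever its audio device has already buffered, which is outside this example.

```python
from collections import deque
from typing import Any, Deque, Dict


class PlaybackQueue:
    """Holds agent audio chunks that have been received but not yet played."""

    def __init__(self) -> None:
        self._chunks: Deque[bytes] = deque()

    def enqueue(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    def next_chunk(self) -> bytes:
        """Return the next chunk to play, or empty bytes if nothing is queued."""
        return self._chunks.popleft() if self._chunks else bytes()

    def handle_message(self, message: Dict[str, Any]) -> int:
        """Drop all queued audio on playback_clear_buffer.

        Returns the number of chunks dropped; other message types are ignored.
        """
        if message.get("type") != "playback_clear_buffer":
            return 0
        dropped = len(self._chunks)
        self._chunks.clear()
        return dropped
```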