Protocol documentation for messages exchanged between client and server during Ultravox calls.
Every message includes a type field identifying the message type.

Message | Sender | Description |
---|---|---|
Ping | Client | Measures round-trip data latency. |
Pong | Server | Server reply to a ping message. |
State | Server | Indicates the server’s current state. |
Transcript | Server | Contains text for an utterance made during the call. |
UserTextMessage | Client | Used to send a user message to the agent via text. |
SetOutputMedium | Client | Sets server’s output medium to text or voice. |
ClientToolInvocation | Server | Asks the client to invoke a client tool. |
ClientToolResult | Client | Contains the result of a client tool invocation. |
Debug | Server | Useful for application debugging. Excluded by default. |
CallStarted | Server | Notifies that a call has started. |
PlaybackClearBuffer | Server | Used to clear buffered output audio. WebSocket only. |
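
Because every message carries a type field, a client can route incoming messages with a simple lookup. The sketch below is illustrative only: it assumes messages arrive as JSON text, and the names `dispatch` and `TranscriptLog` are hypothetical, not part of any Ultravox SDK. The transcript handler follows the text/delta rule from the transcript field descriptions on this page, and it further assumes that updates to the same utterance share a role and ordinal.

```python
import json


class TranscriptLog:
    """Collects utterance text, keyed by (role, ordinal).

    Assumption: all updates for one utterance carry the same role and ordinal.
    """

    def __init__(self):
        self.utterances = {}

    def on_transcript(self, msg):
        key = (msg["role"], msg["ordinal"])
        if msg.get("text") is not None:
            # "text" is the full transcript so far, so it replaces what we have.
            self.utterances[key] = msg["text"]
        elif msg.get("delta") is not None:
            # "delta" is only the newly added text, so it is appended.
            self.utterances[key] = self.utterances.get(key, "") + msg["delta"]


def dispatch(raw, handlers):
    """Parse one message and call the handler registered for its type.

    Returns True if a handler ran, False if no handler matched.
    """
    message = json.loads(raw)
    handler = handlers.get(message.get("type"))
    if handler is None:
        return False
    handler(message)
    return True


log = TranscriptLog()
handlers = {"transcript": log.on_transcript}
dispatch('{"type": "transcript", "role": "agent", "medium": "voice", '
         '"delta": "Hel", "final": false, "ordinal": 1}', handlers)
dispatch('{"type": "transcript", "role": "agent", "medium": "voice", '
         '"delta": "lo", "final": true, "ordinal": 1}', handlers)
# log.utterances == {("agent", 1): "Hello"}
```

Messages whose type has no registered handler are ignored rather than raising, which keeps the client tolerant of message types it does not care about.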

type: "ping"
timestamp
: Float. Client timestamp for latency measurement.

type: "pong"
timestamp
: Float. The timestamp from the matching ping.

type: "state"
state
: Current session state.

type: "transcript"
role
: “user” or “agent”. Who emitted the utterance.
medium
: “text” or “voice”. The medium through which the utterance was emitted.
text
: String. The full text of the transcript so far. Exclusive with delta: either this or delta will be set.
delta
: String. The additional transcript text added since the last agent transcript message. Exclusive with text.
final
: Boolean. True once no more updates are expected for this utterance.
ordinal
: Integer. Used for ordering transcripts within a call.

type: "user_text_message"
text
: String. The content of the user message.
urgency
: “immediate” | “soon” | “later”. Optional, defaults to “soon”. Determines whether this message can interrupt the agent (only “immediate” will interrupt) and whether it should trigger a generation in the absence of other user input (“later” will wait until the next natural generation).

Use case | Description |
---|---|
Emphasizing the Task | Provide instructions to the agent to emphasize what it should do (e.g. "Translate, don't respond"). Have your application send information about what the user is doing or looking at (e.g. "User is viewing property #123-456"). |
Moving the Conversation Forward | After N turns, tell the agent to try to close the sale or end the call. |
Running Background Processes | Allow the conversation to proceed while a background process runs, then have the agent become aware of the result. |
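
As a rough illustration of the urgency field, the following sketch builds a user_text_message payload. It assumes the message is serialized as JSON; the helper name `user_text_message` is hypothetical. When urgency is omitted, the field is left out so that the documented default (“soon”) applies.

```python
import json

# The three urgency values documented for user_text_message.
URGENCIES = ("immediate", "soon", "later")


def user_text_message(text, urgency=None):
    """Return a JSON string for a user_text_message.

    urgency is optional; if None, the field is omitted and the
    documented default ("soon") applies.
    """
    if urgency is not None and urgency not in URGENCIES:
        raise ValueError(f"urgency must be one of {URGENCIES}, got {urgency!r}")
    message = {"type": "user_text_message", "text": text}
    if urgency is not None:
        message["urgency"] = urgency
    return json.dumps(message)


# Background context: "later" waits for the next natural generation.
context_update = user_text_message(
    "User is viewing property #123-456", urgency="later"
)

# Only "immediate" can interrupt the agent.
interruption = user_text_message("Translate, don't respond", urgency="immediate")
```

Validating urgency on the client side is a design choice made here so that typos fail early; how an out-of-range value is treated by the server is not specified in this document.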

type: "set_output_medium"
medium
: Either “voice” or “text”.

type: "client_tool_invocation"
toolName
: String. Tool to invoke.
invocationId
: String. Unique invocation ID.
parameters
: Dict[String, Any]. Tool parameters.

type: "client_tool_result"
agentReaction
: String. Optional. Must be one of the following: “speaks” (the default), “listens”, or “speaks-once”. See Agent Responses to Tools for more.
invocationId
: String. Matches the corresponding invocation.
result
: String. Tool execution result. Often a JSON string. May be omitted for errors.
responseType
: String. Defaults to “tool-response”.
errorType
: String. Optional. Should be omitted when result is set. Otherwise, should be “undefined” if a tool with the given name does not exist or “implementation-error” otherwise.
errorMessage
: String. Optional. Error details if the tool failed.

type: "debug"
message
: String. Debug information.

type: "call_started"
callId
: String. The ID of the call that has started.

type: "playback_clear_buffer"
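
The client_tool_invocation and client_tool_result messages described above pair up through invocationId. The sketch below shows one way a client might turn an invocation into a result, following the errorType rules stated for client_tool_result. The function `handle_tool_invocation`, the `tools` registry, and the `get_weather` tool are hypothetical names used only for illustration; optional fields such as agentReaction and responseType are left out so that their defaults apply.

```python
import json


def handle_tool_invocation(invocation, tools):
    """Run the named client tool and build a client_tool_result dict."""
    result = {
        "type": "client_tool_result",
        "invocationId": invocation["invocationId"],  # must match the invocation
    }
    name = invocation["toolName"]
    tool = tools.get(name)
    if tool is None:
        # No tool with the given name exists.
        result["errorType"] = "undefined"
        result["errorMessage"] = f"No client tool named {name!r}"
        return result
    try:
        value = tool(**(invocation.get("parameters") or {}))
    except Exception as exc:
        # Any other failure is reported as an implementation error.
        result["errorType"] = "implementation-error"
        result["errorMessage"] = str(exc)
        return result
    # result is a string, here a JSON string; errorType is omitted on success.
    result["result"] = json.dumps(value)
    return result


# Hypothetical tool registry: tool name -> callable taking keyword parameters.
tools = {"get_weather": lambda city: {"city": city, "forecast": "sunny"}}

reply = handle_tool_invocation(
    {
        "type": "client_tool_invocation",
        "toolName": "get_weather",
        "invocationId": "inv-1",
        "parameters": {"city": "Oslo"},
    },
    tools,
)
```

The returned dict would then be serialized and sent back to the server over whichever transport the call uses.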