Communication Methods
- Client Data Channels → Used by our SDKs and WebSocket connections for bi-directional, real-time message exchange during calls. This is the primary method for client apps to interact with calls.
- Data Connection → Add a data connection to your call to receive messages via a separate WebSocket connection. This is particularly useful for:
  - Telephony integrations where the client doesn’t support WebRTC
  - Server-side applications that need to monitor call events or route data to external systems
- REST API → Inject messages into active calls via HTTP POST requests. See Sending Messages to Live Calls via REST API below for detailed implementation guidelines.
Messages at a Glance
Details on each message type appear below in Data Message Details.
Client-to-Server Messages
| Type | Message | Description |
|---|---|---|
| Agent Behavior | ForcedAgentMessage | Forces the agent to say a specific message or invoke tools. |
| Call Control | HangUp | Instructs the agent to end the call with an optional farewell message. |
| Call Control | SetOutputMedium | Sets the server’s output medium to text or voice. |
| System | Ping | Measures round-trip data latency. |
| Tools | ClientToolResult and DataConnectionToolResult | Contains the result of a tool invocation. |
| User Input | UserTextMessage | Used to send a user message to the agent. |
Server-to-Client Messages
| Type | Message | Description |
|---|---|---|
| Conversation | CallStarted | Provides some basic information about the call at its start. |
| Conversation | Transcript | Contains text for an utterance made during the call. |
| System | Debug | Useful for application debugging. Excluded by default. |
| System | PlaybackClearBuffer | Used to clear buffered output audio. WebSocket only. |
| System | Pong | Server reply to a ping message. |
| System | State | Indicates the server’s current state. |
| Tools | ClientToolInvocation and DataConnectionToolInvocation | Asks the client or data connection to invoke a tool. |
Data Message Details
All messages are JSON objects with camelCase keys containing:
- A required type field identifying the message type
- Additional fields specific to each message type
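The envelope described above can be sketched as a pair of helpers. This is a minimal illustration, not the SDK's implementation: the function names and the "ping" type string used in the usage note are assumptions, since this page does not show the literal type values.

```python
import json
from typing import Any, Dict


def encode_message(message_type: str, **fields: Any) -> str:
    """Serialize a data message: a JSON object with a required `type` key."""
    if not message_type:
        raise ValueError("every data message needs a non-empty type")
    # Message-specific fields sit alongside `type` and use camelCase keys.
    return json.dumps({"type": message_type, **fields})


def decode_message(raw: str) -> Dict[str, Any]:
    """Parse an incoming data message and check that it carries a type."""
    message = json.loads(raw)
    if not isinstance(message, dict) or "type" not in message:
        raise ValueError("data messages must be JSON objects with a type field")
    return message
```

For example, `decode_message(encode_message("ping", timestamp=1700000000.0))` round-trips to a dict with the two keys `type` and `timestamp` (both hypothetical names here).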
Ping
A message sent by the client to measure round-trip data message latency.
Message Structure
Unix timestamp with millisecond precision. Client timestamp for latency measurement.
Pong
A message sent by the server in response to a PingMessage. The timestamp is copied from the PingMessage.
Message Structure
Echoed timestamp from the original ping message.
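Because the server echoes the client's timestamp, latency can be computed from the pong alone. The sketch below assumes the field is named `timestamp`, that the type string is "ping", and that the client expresses time in seconds rounded to milliseconds; none of these are confirmed by this page. Since the value is only echoed back, the calculation works as long as the client uses the same unit for sending and receiving.

```python
import time
from typing import Optional


def make_ping(now: Optional[float] = None) -> dict:
    """Build a ping carrying the client's send time (seconds, ms precision)."""
    sent_at = time.time() if now is None else now
    # "ping" and "timestamp" are assumed names, used here for illustration.
    return {"type": "ping", "timestamp": round(sent_at, 3)}


def round_trip_seconds(pong: dict, now: Optional[float] = None) -> float:
    """Latency is the receive time minus the timestamp echoed in the pong."""
    received_at = time.time() if now is None else now
    return received_at - pong["timestamp"]
```

The optional `now` argument exists only to make the helpers deterministic in tests; real callers would omit it.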
State
A message sent by the server to indicate its current state.
Message Structure
Current session state. One of: idle, listening, thinking, or speaking.
Transcript
A message containing text transcripts of user and agent utterances.
Message Structure
Who emitted the utterance. Must be either “user” or “agent”.
The medium through which the utterance was emitted. Either “text” or “voice”.
The full text of the transcript so far. Either this or delta will be set, but not both.
The additional transcript text added since the last transcript message. Either this or text will be set, but not both.
Whether to expect additional transcript messages for this conversation round.
The ordinal of the message within the current call, used for ordering transcripts.
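Since a transcript message carries either the full text so far or only a delta, a client has to merge the two forms per utterance. The following sketch shows one way to do that, assuming the fields are named `text`, `delta`, and `ordinal` (the names are not shown on this page and are hypothetical here).

```python
from typing import Dict, List


class TranscriptAssembler:
    """Merges full-text and delta transcript messages, keyed by ordinal."""

    def __init__(self) -> None:
        self._utterances: Dict[int, str] = {}

    def apply(self, message: dict) -> str:
        """Fold one transcript message in and return that utterance's text."""
        ordinal = message["ordinal"]
        text = message.get("text")
        delta = message.get("delta")
        if text is not None:
            # Full text replaces whatever was accumulated so far.
            self._utterances[ordinal] = text
        elif delta is not None:
            # A delta is appended to the text seen so far for this ordinal.
            self._utterances[ordinal] = self._utterances.get(ordinal, "") + delta
        return self._utterances.get(ordinal, "")

    def ordered(self) -> List[str]:
        """Return all utterances sorted by their ordinal within the call."""
        return [self._utterances[key] for key in sorted(self._utterances)]
```

Keying on the ordinal means messages that arrive out of order still end up in call order when `ordered()` is read.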
UserTextMessage
A user message sent via text. The message appears to the agent as if it came from the user.
Message Structure
The content of the user message.
Determines whether this message can interrupt the agent and whether it should trigger a generation. Options:
- immediate → Interrupts the agent if speaking and starts a new generation immediately.
- soon → Doesn’t interrupt but starts a generation at the next opportunity.
- later → Message is considered during the next natural generation without forcing a new generation.
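As an illustration of the three urgency options, the helper below builds a user text message and rejects any other value. The type string "user_text_message" and the field names `text` and `urgency` are assumptions; only the option values immediate, soon, and later come from this page. The field is omitted when no urgency is given, because the page does not state a default.

```python
from typing import Optional

# The three urgency options listed above.
URGENCY_LEVELS = ("immediate", "soon", "later")


def user_text_message(text: str, urgency: Optional[str] = None) -> dict:
    """Build a user text message, validating the urgency if one is given."""
    if urgency is not None and urgency not in URGENCY_LEVELS:
        raise ValueError(f"urgency must be one of {URGENCY_LEVELS}, got {urgency!r}")
    # Type string and key names are hypothetical, for illustration only.
    message = {"type": "user_text_message", "text": text}
    if urgency is not None:
        message["urgency"] = urgency
    return message
```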
SetOutputMedium
Message sent by the client to set the server’s output medium.
Message Structure
Output medium to use. Must be either “voice” or “text”.
ClientToolInvocation and DataConnectionToolInvocation
Sent by the server to ask the client or data connection to invoke a tool with the given parameters. The client or data connection is expected to send back a ClientToolResultMessage or DataConnectionToolResultMessage with a matching invocationId.
Message Structure
Name of the tool to invoke.
Unique identifier for this invocation. Must be included in the corresponding result.
Tool-specific parameters as a JSON object.
ClientToolResult and DataConnectionToolResult
Contains the result of a tool invocation.
Message Structure
Must match the invocationId from the corresponding invocation.
Typically the tool execution result as viewed by the agent, which is often a JSON string. May be omitted for errors.
For responseTypes other than tool-response, this may be a JSON string for an object that further specifies how the response should be handled. See special response types.
Type of response being provided. See special response types.
How the agent should react. Options: “speaks” (default), “listens”, or “speaks-once”. See Agent Responses to Tools for more.
Error classification if the tool failed. Should be omitted when result is set. Options:
- undefined → Tool with the given name does not exist
- implementation-error → Tool exists but execution failed
Human-readable error description if the tool failed. This is not seen by the model but may be used for debugging.
Optional state updates to apply to the call. See Tool State for more.
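The invocation and result messages pair up through invocationId. The sketch below shows a dispatcher that answers an invocation with a matching result, using the two error classifications listed above. Apart from `invocationId`, the key names (`toolName`, `parameters`, `result`, `errorType`, `errorMessage`) and the "client_tool_result" type string are assumptions made for illustration.

```python
import json
from typing import Any, Callable, Dict


def handle_invocation(invocation: dict, tools: Dict[str, Callable[..., Any]]) -> dict:
    """Run the requested tool and build a result with the same invocationId."""
    reply = {
        "type": "client_tool_result",  # assumed type string
        "invocationId": invocation["invocationId"],
    }
    tool = tools.get(invocation["toolName"])
    if tool is None:
        # No tool with the given name exists.
        reply["errorType"] = "undefined"
        reply["errorMessage"] = f"no tool named {invocation['toolName']!r}"
        return reply
    try:
        outcome = tool(**invocation.get("parameters", {}))
    except Exception as exc:
        # The tool exists but its execution failed; result is omitted.
        reply["errorType"] = "implementation-error"
        reply["errorMessage"] = str(exc)
        return reply
    # The result is typically a JSON string as viewed by the agent.
    reply["result"] = json.dumps(outcome)
    return reply
```

A data connection would follow the same pattern with its own result message type.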
Debug
A message sent by the server to communicate debug information.
Message Structure
Debug information or diagnostic details.
Debug messages are disabled by default and must be explicitly enabled for debugging purposes.
CallStarted
Basic call metadata shared by the server when a call begins.
Message Structure
The UUID of the call that has started.
PlaybackClearBuffer
Message sent by the server to clear buffered output audio. Integrators should drop as much unplayed output audio as possible for interruptions to function properly.
Message Structure
This message is only used with WebSocket connections. Handling this message allows for larger client buffers while maintaining responsive interrupts. Larger client buffers make choppy audio less likely in the presence of network disruption or resource contention.
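A client-side buffer that honors this message might look like the sketch below. The class, its method names, and the "playback_clear_buffer" type string are all hypothetical; the point is only that every queued but unplayed chunk is dropped when the message arrives.

```python
from collections import deque
from typing import Deque, Optional


class PlaybackBuffer:
    """Queue of output audio chunks waiting to be played."""

    def __init__(self) -> None:
        self._chunks: Deque[bytes] = deque()

    def enqueue(self, chunk: bytes) -> None:
        """Store a chunk of agent audio received from the server."""
        self._chunks.append(chunk)

    def next_chunk(self) -> Optional[bytes]:
        """Hand the oldest chunk to the audio device, or None if empty."""
        return self._chunks.popleft() if self._chunks else None

    def pending_bytes(self) -> int:
        """Total size of audio that has not been played yet."""
        return sum(len(chunk) for chunk in self._chunks)

    def handle_message(self, message: dict) -> None:
        """Drop all unplayed audio when the server asks for a clear."""
        if message.get("type") == "playback_clear_buffer":  # assumed type string
            self._chunks.clear()
```

With this in place the buffer can be sized generously to absorb network jitter, since an interruption still empties it immediately.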
ForcedAgentMessage
Forces the agent to say a specific message or invoke tools.
Message Structure
Text content the agent should speak.
Array of tool invocations to execute.
If true, prevents interruption while the agent speaks this message. (Note that tools are always uninterruptible.)
Controls when the message is processed. Must be either “immediate” (may interrupt the user or agent) or “soon” (process at next opportunity).
HangUp
Instructs the agent to end the call with an optional farewell message.
Message Structure
Final message to speak before ending the call.
Sending Messages to Live Calls via REST API
The Send Data Message to Call endpoint allows you to inject messages into active calls (calls that are joined and not yet ended).
Supported Message Types
Responses
Messages sent successfully via the REST API receive a 204 No Content response with an empty body.
Potential error responses include:
- 401 Unauthorized: Missing or invalid API key
- 403 Forbidden: Insufficient authorization
- 422 Unprocessable Entity: Call is not active (either not joined yet or already ended)
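The status codes above can be mapped to an outcome on the caller's side. The sketch below interprets only the status code, so it works with any HTTP client; the function name and return shape are illustrative, and the endpoint URL and request format are deliberately left out because they are not given in this section.

```python
from typing import Tuple

# Error responses documented for the Send Data Message to Call endpoint.
ERROR_REASONS = {
    401: "missing or invalid API key",
    403: "insufficient authorization",
    422: "call is not active (not joined yet or already ended)",
}


def interpret_send_response(status_code: int) -> Tuple[bool, str]:
    """Return (accepted, explanation) for a REST message-injection response."""
    if status_code == 204:
        # Success carries no body, so there is nothing further to parse.
        return True, "message accepted"
    reason = ERROR_REASONS.get(status_code, "unexpected status")
    return False, f"{status_code}: {reason}"
```

A 422 is worth handling separately from the authorization errors, since retrying after the call has been joined may succeed while a 401 or 403 will not resolve on its own.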