Advanced Feature
Threads are designed for use cases that require background processing or parallel reasoning during a live call. For many applications, the main conversation is sufficient and you won’t need to worry about multi-threading. Make sure you’re comfortable with Tools and Data Messages before diving in.
When to Use Threads
Threads are useful when you need the agent to do work in the background without disrupting the flow of the conversation. Common scenarios include:
- Background Research → While the agent talks with the user, a thread can query APIs, search databases, or perform analysis and have results ready when needed.
- Delegated Tasks → Hand off subtasks like generating images or changing code to a thread while the conversation continues.
- Observer → Have a separate process monitor the conversation to provide real-time status updates or intervene if certain conditions are met.
How to Think About Threads
Ultravox uses the term “thread” to draw on two familiar concepts: the communication app concept of a branch of a conversation and the computer science concept of parallel execution. Both are instructive here. Like conversation threads, Ultravox threads rely on context from the parent conversation, but once forked they follow their own path. Like computer science threads (or, even more so, Dart’s isolates), Ultravox threads run independently and don’t share state with each other except through well-defined channels.
If you’re familiar with mobile app development, you’re probably used to a “main” or “UI” thread that handles user interactions and your primary code flow. The same idea applies in Ultravox. When a call starts, you can think of it as running on the “UI” thread (“UI” is the ID used to identify this thread in Ultravox). If you block the UI thread (e.g. with a slow tool call), the conversation will be frozen much like a mobile application waiting on a network request. If the conversation can reasonably continue while that tool runs, you should put that work on a background thread to keep your conversation responsive.
In Ultravox, only the UI thread can interface with the user. Only the UI thread receives audio input and produces audio output. In addition, only the UI thread may end the call or start a stage change. Background threads may pass messages to the UI thread to cause it to speak or perform other actions, but they cannot directly produce audio or end the call themselves.
How Threads Work
When you spawn a thread, the system:
- Forks the conversation history → The thread gets a copy of the parent’s full message history, including system prompt and tools, at the point of spawning. (It also gets a copy of the parent’s tool state, if any.)
- Starts running the thread independently → The thread enters its own generate-and-act loop, processing messages and invoking tools on its own.
A thread can be in one of the following states:
- IDLE → The thread has no work to do, typically because its last response was text with no tool calls. The thread will remain idle until it receives a new message.
- GENERATING → The thread is running LLM inference.
- CALLING_TOOL → The thread is executing one or more tool calls.
- FAILED → The thread encountered an error or exceeded its limits. This is a terminal state. A threadTerminated data message with the failure reason is sent when a thread enters this state.
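The states above can be read as a small state machine. The sketch below is only a local model for reasoning about thread behavior, not part of the Ultravox API: the state names come from this page, while the event names (message_received, text_only_response, tool_calls_requested, tool_results_ready, error, limit_exceeded) are hypothetical labels, and the CALLING_TOOL to GENERATING transition is inferred from the generate-and-act loop description.

```python
from enum import Enum


class ThreadState(Enum):
    IDLE = "IDLE"
    GENERATING = "GENERATING"
    CALLING_TOOL = "CALLING_TOOL"
    FAILED = "FAILED"


# Hypothetical event names; only the states are taken from the documentation.
_TRANSITIONS = {
    (ThreadState.IDLE, "message_received"): ThreadState.GENERATING,
    (ThreadState.GENERATING, "text_only_response"): ThreadState.IDLE,
    (ThreadState.GENERATING, "tool_calls_requested"): ThreadState.CALLING_TOOL,
    (ThreadState.CALLING_TOOL, "tool_results_ready"): ThreadState.GENERATING,
}


def next_state(state: ThreadState, event: str) -> ThreadState:
    """Return the state that follows `state` when `event` occurs."""
    if state is ThreadState.FAILED:
        return ThreadState.FAILED  # FAILED is terminal
    if event in ("error", "limit_exceeded"):
        return ThreadState.FAILED
    # Events that do not apply to the current state leave it unchanged.
    return _TRANSITIONS.get((state, event), state)
```

A model like this can be handy in tests for client code that reacts to thread activity, since it makes the terminal nature of FAILED explicit.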
Spawning a Thread
Threads are created by sending a spawnThread data message. You will receive a threadSpawned message on success or a threadRejected message if the thread cannot be created.
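As a rough illustration, a spawn message could be assembled like this. The field names newThreadId and parentThreadId and the spawn_thread message name appear on this page; the use of a type key and the overall JSON shape are assumptions, so check the Data Messages reference for the exact format before relying on it.

```python
import json


def build_spawn_thread_message(new_thread_id: str, parent_thread_id: str = "UI") -> dict:
    """Build a spawn message as a plain dict (assumed shape, see lead-in)."""
    if not new_thread_id:
        raise ValueError("new_thread_id must be a non-empty string")
    return {
        "type": "spawn_thread",  # assumed discriminator key and value
        "newThreadId": new_thread_id,
        "parentThreadId": parent_thread_id,  # "UI" forks from the main conversation
    }


# Serialize for whatever transport your client or data connection uses.
payload = json.dumps(build_spawn_thread_message("research"))
```

How the payload is actually delivered depends on your client SDK or data connection and is not shown here.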
Providing Initial Context
The additionalMessages field lets you give the thread its initial instructions. These are appended to the forked conversation history before the thread starts generating. You can use:
- user_text_message → Adds a user message.
- forced_agent_message → Adds agent messages and tool call/result messages.
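A minimal sketch of attaching an initial instruction to a spawn message follows. It assumes messages are plain dicts with a type key and that a user_text_message carries its content in a text field; both are assumptions rather than something this page confirms. The helper name add_initial_instructions is hypothetical.

```python
def add_initial_instructions(spawn_message: dict, instructions: str) -> dict:
    """Return a copy of `spawn_message` with one more user message appended.

    The "text" key is an assumed field name for user_text_message.
    """
    updated = dict(spawn_message)
    messages = list(spawn_message.get("additionalMessages", []))
    messages.append({"type": "user_text_message", "text": instructions})
    updated["additionalMessages"] = messages
    return updated
```

Returning a copy rather than mutating the input keeps a base spawn message reusable across several threads with different instructions.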
Setting Thread Limits
You can constrain a thread’s resource usage with the limits field. If a thread exceeds one of its limits, it is terminated and a threadTerminated data message with the reason is sent.
Concurrency Limit
There is also a platform-enforced concurrency limit that cannot be configured. A call can create as many threads as are useful, but the number actively generating at the same time may be limited. If a thread attempts a generation when the concurrency limit is reached, its generation request will be enqueued and will run once another thread finishes generating. (This concurrency limit is unrelated to the number of calls that can occur at the same time; it only affects side thread generations within a single call.)
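The queuing behavior can be pictured with a toy scheduler. This is a local simulation written for illustration only: the class name GenerationScheduler, the first-in-first-out ordering, and the limit value are all assumptions, since the platform does not expose its actual limit or queue order here.

```python
from collections import deque
from typing import Optional


class GenerationScheduler:
    """Toy model of a per-call cap on concurrent side-thread generations."""

    def __init__(self, max_concurrent: int):
        self.max_concurrent = max_concurrent
        self.running = set()
        self.waiting = deque()

    def request(self, thread_id: str) -> bool:
        """Start a generation if a slot is free; otherwise enqueue it."""
        if len(self.running) < self.max_concurrent:
            self.running.add(thread_id)
            return True
        self.waiting.append(thread_id)
        return False

    def finish(self, thread_id: str) -> Optional[str]:
        """Mark a generation finished and start the next queued one, if any."""
        self.running.discard(thread_id)
        if self.waiting and len(self.running) < self.max_concurrent:
            next_thread = self.waiting.popleft()
            self.running.add(next_thread)
            return next_thread
        return None
```

The practical takeaway is that a side thread may wait before it starts generating, so latency-sensitive work should not assume an immediate start.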
Filtering Available Tools
Use toolFilter to restrict which tools a thread can access.
Thread Communication
Threads communicate with each other (including the UI thread) through tools. Other participants may communicate with threads using data messages.
Pushing Messages from one Thread to Another
Tools may return a send-to-thread response type, which routes a message to the specified thread. This allows threads to:
- Report findings back to their parent
- Delegate tasks to other threads
- Coordinate multi-step workflows
- Synchronize state
- Effect a change on the UI thread (e.g., speak a message, end the call)
A send-to-thread response includes a callingThreadResultText that determines the tool result message added to the calling thread’s conversation history and a dataMessage field that specifies the message to send and the target thread.
The message can be a UserTextMessage, a ForcedAgentMessage, or a SpawnThreadMessage.
For spawn messages, the message is handled by the thread indicated by the parentThreadId. If that id is the same as the calling thread’s, the conversation history is forked before the tool call and response are added.
For user and agent messages, the message is added to the relevant thread’s queue. For the UI thread, the message’s urgency determines whether the
message can interrupt (“immediate” urgency) and whether it can cause a generation on its own or should be deferred until the next natural generation
(“later” urgency). For side threads, messages are always treated as if they had “soon” urgency. They never interrupt what the thread is doing,
and they can always cause a new generation (or tool invocation). If the side thread is IDLE, it will transition to handle the message immediately.
Otherwise it will add the message to its conversation history at the next opportunity (e.g., after finishing its current generation or tool call).
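The urgency rules above can be summarized in a few lines. This is a sketch of the described behavior, not platform code: the Delivery type and the delivery_for function are hypothetical names, and it assumes that an “immediate” message can also trigger a generation.

```python
from dataclasses import dataclass

UI_THREAD_ID = "UI"


@dataclass(frozen=True)
class Delivery:
    can_interrupt: bool
    can_trigger_generation: bool


def delivery_for(thread_id: str, urgency: str) -> Delivery:
    """Describe how a queued message is treated by the target thread."""
    if thread_id != UI_THREAD_ID:
        urgency = "soon"  # side threads treat every message as "soon"
    if urgency == "immediate":
        return Delivery(can_interrupt=True, can_trigger_generation=True)
    if urgency == "soon":
        return Delivery(can_interrupt=False, can_trigger_generation=True)
    if urgency == "later":
        return Delivery(can_interrupt=False, can_trigger_generation=False)
    raise ValueError(f"unknown urgency: {urgency!r}")
```

For example, an “immediate” message aimed at a side thread is still handled as “soon”: it will not interrupt the thread, but it can start a new generation once the thread is free.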
The _PARENT thread id
Tools invoked on side threads may use the special value _PARENT as the target thread id to send a message to their parent thread, allowing messages to be pushed to the parent thread even if its id is not known by the child thread.
Sending Data Messages to a Thread
You can send messages to a thread by including threadId on a user_text_message or forced_agent_message. Similarly, you can choose which thread handles a spawn by including parentThreadId on a spawn_thread message.
If threadId (or parentThreadId for spawning) is not provided, the message is handled by the UI thread by default.
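The defaulting rule can be expressed as a small helper. The function name handling_thread is hypothetical, and treating messages as dicts with a type key is an assumption about the wire format; the threadId and parentThreadId fields and the “UI” default come from the text above.

```python
UI_THREAD_ID = "UI"


def handling_thread(message: dict) -> str:
    """Return the id of the thread that would handle `message`."""
    if message.get("type") == "spawn_thread":
        # Spawns are handled by the parent being forked.
        return message.get("parentThreadId") or UI_THREAD_ID
    # User and agent messages are routed by threadId.
    return message.get("threadId") or UI_THREAD_ID
```

A helper like this is mainly useful for logging or tests, where you want to confirm which thread a message will reach before sending it.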
Pulling Thread State
In addition to pushed messages, threads can pull the state of other threads by invoking a tool that uses the THREAD_STATES automatic parameter. This can be useful for monitoring progress or coordinating actions across threads. For example, if a user asks how some subtask is progressing, the UI thread may check the relevant thread’s state to let the user know how it’s doing.
When the THREAD_STATES automatic parameter is requested, the tool call will include an object with an entry for each thread_id (excluding “UI”) and the thread’s current state. If the thread is idle, the tool will also receive the thread’s last response text, if any.
Observability
During a call
During a call, your client or data connection can listen for data messages to know what threads are doing. See Thread Messages for the full message reference.
Thread Messages
| Message | Description |
|---|---|
| threadSpawned | A thread was successfully spawned. |
| threadRejected | A thread spawn was rejected (e.g., due to duplicate ID). |
| threadTerminated | A thread was terminated due to failure or cancellation. |
Generation Messages
| Message | Description |
|---|---|
| sideGenerationDelta | Streaming text deltas as the thread generates (similar to transcript deltas). |
| sideGenerationCompleted | The full response text and any tool calls once generation finishes. |
After a call
Thread activity is recorded as call events that you can query after the call ends. These are useful for debugging and monitoring thread behavior.
Thread Lifecycle Events
| Event | Description |
|---|---|
| thread.canceled | The thread was canceled (e.g., call ended or stage changed). |
| thread.limit_reached | The thread exceeded one of its configured resource limits. |
| thread.failed | The thread failed for another reason. |
These events include total_input_tokens, total_output_tokens, and total_generations in their extras.
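If you export call events for analysis, the usage figures in the lifecycle events can be totaled with a few lines. The event shape used here (a dict with type and extras keys) is an assumption about how you have loaded the events, and summarize_thread_usage is a hypothetical helper; only the event names and the three extras fields come from this page.

```python
LIFECYCLE_EVENTS = {"thread.canceled", "thread.limit_reached", "thread.failed"}
USAGE_FIELDS = ("total_input_tokens", "total_output_tokens", "total_generations")


def summarize_thread_usage(events: list) -> dict:
    """Sum the usage extras across thread lifecycle events."""
    totals = {field: 0 for field in USAGE_FIELDS}
    for event in events:
        if event.get("type") not in LIFECYCLE_EVENTS:
            continue  # skip generation and communication events
        extras = event.get("extras") or {}
        for field in USAGE_FIELDS:
            totals[field] += extras.get(field, 0)
    return totals
```

Usage with two recorded events might look like summarize_thread_usage([{"type": "thread.canceled", "extras": {...}}, ...]), where the extras values are whatever your call actually reported.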
Side Generation Events
| Event | Description |
|---|---|
| side_generation.completed | A thread’s generation succeeded. |
| side_generation.failed | A thread’s generation encountered an error. |
| side_generation.canceled | A thread’s generation was canceled. |
Thread Communication Events
| Event | Description |
|---|---|
| thread_comms.non_existent_thread | A message was sent to a thread that doesn’t exist. |
| thread_comms.failed_thread | A message was sent to a thread that has already failed. |
| thread_comms.unsupported_call_change | A thread attempted an unsupported call change (e.g., hang-up or stage change directly from a side thread). |
Thread Billing
Threads may increase the cost of a call based on their additional model usage. Invoking tools is free. Generations are charged using traditional token-based pricing, except that cached input tokens are free. Since threads are always forked from a parent, input tokens for existing messages will be cached, so only the additional messages added when spawning plus any messages created by the thread itself contribute to input token usage. See our pricing page for the cost of uncached input tokens and output tokens.
Examples
Here are some concrete examples of specific use cases. When tools are involved, the examples use client tools to show the response succinctly, but the same principles apply equally to data connection and HTTP tools.
Moving a tool to a background thread
Suppose you currently have a tool that is very slow but whose response the agent doesn’t necessarily need in order to continue the conversation. You can build a better experience for your users by moving the tool’s work to a background thread so the conversation can continue while the tool runs.
- Add the THREAD_ID automatic parameter to your tool definition so that your tool implementation knows which thread is invoking it.
- When the tool is invoked, if the thread id is “UI”, immediately respond with a send-to-thread response type with a SpawnThread message that includes the same tool call. This will cause the tool call to be re-executed on a background thread.
- When your tool is called again, the thread id automatic parameter argument will now be your newThreadId instead of “UI”, so you can execute the tool logic as normal without worrying about blocking the conversation. When the tool finishes, the thread can send its results back to the UI thread with another send-to-thread response type.
How you deliver the result is up to you. For example, if your agent has a checkThreadState tool available, you may choose to structure the response as a ForcedAgentMessage with a tool call to checkThreadState and a known result that includes the tool result. That way the model thinks it just asked for the thread’s state and doesn’t have to reason about the message’s source. Alternatively, you could use a ForcedAgentMessage with the real tool call and result, though that may lead the model to believe it can call this tool again and get an immediate response. It’s important to consider your agent’s specific circumstances when evaluating these trade-offs.
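The control flow of the steps above might look like the following sketch. It is deliberately loose about wire formats: the responseType key, the text field, the dict shapes, and the names send_to_thread, handle_slow_tool and replay_messages are all assumptions or hypothetical helpers. In particular, replay_messages stands in for whatever messages re-issue the same tool call on the new thread, whose exact format is not shown on this page. The result is reported as a plain user text message here for brevity.

```python
UI_THREAD_ID = "UI"


def send_to_thread(result_text: str, data_message: dict) -> dict:
    """Build a send-to-thread tool response (assumed shape)."""
    return {
        "responseType": "send-to-thread",  # assumed way to signal the response type
        "callingThreadResultText": result_text,
        "dataMessage": data_message,
    }


def handle_slow_tool(thread_id, args, do_work, replay_messages, new_thread_id="slow-tool"):
    """Re-dispatch to a background thread when invoked on the UI thread."""
    if thread_id == UI_THREAD_ID:
        # First invocation: hand the same tool call to a new background thread.
        spawn = {
            "type": "spawn_thread",
            "parentThreadId": UI_THREAD_ID,
            "newThreadId": new_thread_id,
            "additionalMessages": list(replay_messages),
        }
        return send_to_thread("Working on it in the background.", spawn)
    # Second invocation, now on the background thread: do the slow work.
    result = do_work(args)
    report = {
        "type": "user_text_message",
        "threadId": UI_THREAD_ID,
        "text": f"Background result: {result}",  # "text" is an assumed field name
    }
    return send_to_thread("Result sent to the UI thread.", report)
```

Branching on the thread id keeps a single tool definition for both invocations, at the cost of the handler needing to know how to replay its own call.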
Causing hang-up from a side thread
Suppose you have a background thread that can use slow tools or additional reasoning to decide whether a call is worth continuing. At some point, this thread may determine that the call should be ended. Background threads aren’t allowed to end a call directly, but they can force the UI thread to do so. Instead of responding to a tool call with hang-up, respond with send-to-thread and a ForcedAgentMessage with the relevant tool result.
Having a thread re-sync conversation state following a tool call
Suppose you want an observer to periodically check how a call is going and then call a tool to report progress or effect changes. Instead of spawning a new single-shot thread for each check, you can have the thread re-sync its conversation history after each tool invocation by responding with a send-to-thread response with a SpawnThread message that uses “UI” as the parentThreadId, sets its own threadId as the newThreadId, and sets ifExists to “replace”. This will cause the thread to be terminated and re-spawned with a fresh copy of the UI thread’s conversation history.
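A sketch of such a response is shown below. The field names parentThreadId, newThreadId, ifExists, callingThreadResultText and dataMessage are taken from this page; the responseType key, the type key, and the helper name resync_response are assumptions made for illustration.

```python
UI_THREAD_ID = "UI"


def resync_response(own_thread_id: str, result_text: str) -> dict:
    """Build a tool response that re-spawns the calling thread from the UI thread."""
    return {
        "responseType": "send-to-thread",  # assumed way to signal the response type
        "callingThreadResultText": result_text,
        "dataMessage": {
            "type": "spawn_thread",  # assumed discriminator key and value
            "parentThreadId": UI_THREAD_ID,  # fork from the live conversation
            "newThreadId": own_thread_id,    # reuse this thread's own id
            "ifExists": "replace",           # terminate and replace the old thread
        },
    }
```

The observer’s tool would pass in the id it received through the THREAD_ID automatic parameter, so each check starts from the latest UI conversation history.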
Next Steps
Data Messages Reference
Full reference for all thread-related data messages and their fields.
Async Tools
Learn about other ways to speed up your tools and keep your conversations responsive.