Threads - Ultravox Docs

Threads allow you to spawn independent, parallel conversations that run alongside the main call. Each thread forks from the parent’s conversation history and state, then executes autonomously, generating responses and calling tools without interrupting or blocking the user-facing conversation.

Advanced Feature: Threads are designed for use cases that require background processing or parallel reasoning during a live call. For many applications, the main conversation is sufficient and you won’t need to worry about multi-threading. Make sure you’re comfortable with Tools and Data Messages before diving in.

When to Use Threads

Threads are useful when you need the agent to do work in the background without disrupting the flow of the conversation. Common scenarios include:

Background Research → While the agent talks with the user, a thread can query APIs, search databases, or perform analysis and have results ready when needed.
Delegated Tasks → Hand off subtasks like generating images or changing code to a thread while the conversation continues.
Observer → Have a separate process monitor the conversation to provide real-time status updates or intervene if certain conditions are met.

How to Think About Threads

Ultravox uses the term “thread” to draw on two familiar concepts: the communication app concept of a branch of a conversation and the computer science concept of parallel execution. Both are instructive here. Like conversation threads, Ultravox threads rely on context from the parent conversation but follow their own path once forked. Like computer science threads (or, even more so, Dart’s isolates), Ultravox threads run independently and don’t share state with each other except through well-defined channels.

If you’re familiar with mobile app development, you’re probably used to a “main” or “UI” thread that handles user interactions and your primary code flow. The same idea applies in Ultravox. When a call starts, you can think of it as running on the “UI” thread (“UI” is the ID used to identify this thread in Ultravox). If you block the UI thread (e.g. with a slow tool call), the conversation will freeze, much like a mobile application waiting on a network request. If the conversation can reasonably continue while that tool runs, you should put that work on a background thread to keep your conversation responsive.

In Ultravox, only the UI thread can interface with the user: it alone receives audio input and produces audio output. In addition, only the UI thread may end the call or start a stage change. Background threads may pass messages to the UI thread to cause it to speak or perform other actions, but they cannot directly produce audio or end the call themselves.

How Threads Work

When you spawn a thread, the system:

Forks the conversation history — The thread gets a copy of the parent’s full message history, including system prompt and tools, at the point of spawning. (It also gets a copy of the parent’s tool state if any.)
Starts running the thread independently — The thread enters its own generate-and-act loop, processing messages and invoking tools on its own.

As a thread runs, it progresses through a series of states:

IDLE → The thread has no work to do, typically because its last response was text with no tool calls. The thread will remain idle until it receives a new message.
GENERATING → The thread is running LLM inference.
CALLING_TOOL → The thread is executing one or more tool calls.
FAILED → The thread encountered an error or exceeded its limits. This is a terminal state. A threadTerminated data message with the failure reason is sent when a thread enters this state.

Unless a thread fails, it will remain available until the call ends or a stage change occurs. All side threads are canceled when the call ends or a stage change occurs.

Spawning a Thread

Threads are created by sending a spawnThread data message. Here’s a basic example:

// Spawn a thread to research a topic in the background
{
  type: "spawn_thread",
  newThreadId: "research-task-1",
  additionalMessages: [
    {
      type: "user_text_message",
      text: "Look up the latest pricing for the Enterprise plan and summarize it."
    }
  ]
}

The server responds with a threadSpawned message on success or a threadRejected message if the thread cannot be created.
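As a minimal sketch, the spawn message above could be assembled by a small helper before it is sent over whatever channel your client uses for data messages. The helper name buildSpawnThreadMessage is hypothetical and not part of Ultravox; only the field names come from the example above.

```javascript
// Hypothetical helper (not part of any SDK): builds a spawn_thread data
// message carrying a single user instruction, mirroring the example above.
function buildSpawnThreadMessage(newThreadId, instruction) {
  if (typeof instruction !== "string" || instruction.length === 0) {
    throw new Error("instruction must be a non-empty string");
  }
  const message = {
    type: "spawn_thread",
    additionalMessages: [{ type: "user_text_message", text: instruction }],
  };
  // Only set newThreadId when the caller supplied one.
  if (newThreadId) {
    message.newThreadId = newThreadId;
  }
  return message;
}

const spawnMessage = buildSpawnThreadMessage(
  "research-task-1",
  "Look up the latest pricing for the Enterprise plan and summarize it."
);
console.log(JSON.stringify(spawnMessage, null, 2));
```

Keeping message construction in one place makes it easier to validate inputs before anything is sent.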

Providing Initial Context

The additionalMessages field lets you give the thread its initial instructions. These are appended to the forked conversation history before the thread starts generating. You can use:

user_text_message — Adds a user message.
forced_agent_message — Adds agent messages and tool call/result messages.

If a spawned thread’s history ends with a user message or tool result, it will begin in GENERATING state and immediately start generating a response. If it ends with a tool call, the thread will begin in CALLING_TOOL state and immediately start executing that tool. If it ends with an agent message (or an empty history), the thread will begin in IDLE state and wait for a new message before it does any work.

{
  type: "spawn_thread",
  additionalMessages: [
    {
      type: "forced_agent_message",
      toolCalls: [
        {
          toolName: "searchDatabase",
          parameters: { query: "customer billing history for the last 3 months" }
        }
      ]
    }
  ]
}
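The start-state rule described in this section can be sketched as a small function. The message shape used here (a role plus optional toolCalls) is a simplified stand-in invented for illustration, not the Ultravox message format.

```javascript
// Simplified, hypothetical history entries:
//   { role: "user" | "agent" | "tool_result", toolCalls?: [...] }
function initialThreadState(history) {
  if (history.length === 0) {
    return "IDLE"; // empty history: nothing to respond to yet
  }
  const last = history[history.length - 1];
  if (last.role === "user" || last.role === "tool_result") {
    return "GENERATING"; // the model owes a response
  }
  if (last.role === "agent" && Array.isArray(last.toolCalls) && last.toolCalls.length > 0) {
    return "CALLING_TOOL"; // a pending tool call must be executed first
  }
  return "IDLE"; // plain agent message: wait for new input
}

console.log(initialThreadState([{ role: "user" }])); // GENERATING
console.log(initialThreadState([{ role: "agent", toolCalls: [{ toolName: "searchDatabase" }] }])); // CALLING_TOOL
console.log(initialThreadState([{ role: "agent" }])); // IDLE
```

Under this model, the spawn example above ends with an agent tool call, so its thread would start in CALLING_TOOL.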

Setting Thread Limits

You can constrain a thread’s resource usage with the limits field:

{
  type: "spawn_thread",
  newThreadId: "bounded-task",
  limits: {
    generationLimit: 5,               // Max 5 LLM generation rounds
    threadOutputTokenLimit: 2000,      // Max 2000 output tokens total
    generationOutputTokenLimit: 500    // Max 500 output tokens per generation
  },
  additionalMessages: [
    { type: "user_text_message", text: "Summarize the conversation so far in 2-3 sentences." }
  ]
}

See the data message for all the limits that may be imposed. Note that the per-generation limits may be additionally capped by Ultravox. When any limit is exceeded, the thread is marked FAILED and a threadTerminated data message with the reason is sent.
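To make the limit semantics concrete, here is a rough client-side model of how the three limits from the example might interact. It is an illustration only: the ThreadBudget class and its "ACTIVE" label are hypothetical, the platform enforces the real limits itself, and the sketch assumes a limit counts as exceeded once usage goes past the configured value.

```javascript
// Hypothetical model of thread limits, for reasoning about budgets only.
class ThreadBudget {
  constructor(limits = {}) {
    this.limits = limits;
    this.generations = 0;
    this.outputTokens = 0;
    this.failureReason = null; // set once a limit is exceeded
  }

  // "ACTIVE" is this model's label for any non-failed state.
  get state() {
    return this.failureReason ? "FAILED" : "ACTIVE";
  }

  // Record one generation and the number of output tokens it produced.
  recordGeneration(generationTokens) {
    if (this.failureReason) return this.state; // FAILED is terminal
    this.generations += 1;
    this.outputTokens += generationTokens;
    const { generationLimit, threadOutputTokenLimit, generationOutputTokenLimit } = this.limits;
    if (generationLimit !== undefined && this.generations > generationLimit) {
      this.failureReason = "generationLimit";
    } else if (generationOutputTokenLimit !== undefined && generationTokens > generationOutputTokenLimit) {
      this.failureReason = "generationOutputTokenLimit";
    } else if (threadOutputTokenLimit !== undefined && this.outputTokens > threadOutputTokenLimit) {
      this.failureReason = "threadOutputTokenLimit";
    }
    return this.state;
  }
}

const budget = new ThreadBudget({
  generationLimit: 5,
  threadOutputTokenLimit: 2000,
  generationOutputTokenLimit: 500,
});
for (let i = 0; i < 5; i++) budget.recordGeneration(400); // 5 rounds, 2000 tokens total
console.log(budget.state); // ACTIVE
console.log(budget.recordGeneration(10), budget.failureReason); // FAILED generationLimit
```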

Concurrency Limit: There is also a platform-enforced concurrency limit that cannot be configured. A call can create as many threads as are useful, but the number actively generating at the same time may be limited. If a thread attempts a generation when the concurrency limit is reached, its generation request will be enqueued and will run once another thread finishes generating. (This concurrency limit is unrelated to the number of calls that can occur at the same time; it only affects side thread generations within a single call.)
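The queuing behavior described above resembles a counting semaphore with a first-in, first-out wait list. The sketch below is an analogy written for this guide, not the platform's implementation; the GenerationGate class and the limit of 2 are made up for the example.

```javascript
// Hypothetical model: at most maxConcurrent threads generate at once;
// additional requests wait in FIFO order.
class GenerationGate {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent;
    this.active = new Set();
    this.waiting = [];
  }

  // Returns true if the thread starts generating now, false if it was queued.
  request(threadId) {
    if (this.active.size < this.maxConcurrent) {
      this.active.add(threadId);
      return true;
    }
    this.waiting.push(threadId);
    return false;
  }

  // Marks a generation as finished. Returns the id of the queued thread
  // that starts next, or null if nothing was waiting.
  finish(threadId) {
    if (!this.active.delete(threadId)) return null; // was not generating
    const next = this.waiting.shift();
    if (next === undefined) return null;
    this.active.add(next);
    return next;
  }
}

const gate = new GenerationGate(2);
console.log(gate.request("a")); // true
console.log(gate.request("b")); // true
console.log(gate.request("c")); // false (queued)
console.log(gate.finish("a")); // c
```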

Filtering Available Tools

Use toolFilter to restrict which tools a thread can access:

{
  type: "spawn_thread",
  toolFilter: {
    allowedTools: ["searchDatabase", "formatResponse"]  // Only these tools
  },
  additionalMessages: [
    { type: "user_text_message", text: "Search for recent orders." }
  ]
}

See the data message for filtering details. Note that since threads inherit their parent’s conversation history, they are also often aware of tools they aren’t allowed to use. If the model attempts to call a filtered tool, it will immediately get a response indicating the tool is unavailable so it can start a new generation while adjusting accordingly.
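The effect of allowedTools can be pictured as a simple membership check performed before a tool runs. This is a sketch of the behavior described above, not Ultravox code: the function name and the wording of the unavailable response are invented, and it assumes a thread with no filter may use every tool it inherited.

```javascript
// Hypothetical check mirroring the allowedTools behavior described above.
function checkToolAccess(toolFilter, toolName) {
  const allowed = toolFilter && toolFilter.allowedTools;
  // Assumption: no filter means every inherited tool is available.
  if (!allowed || allowed.includes(toolName)) {
    return { allowed: true };
  }
  // The thread gets an immediate result so it can adjust and generate again.
  return {
    allowed: false,
    result: `Tool "${toolName}" is unavailable on this thread.`,
  };
}

const filter = { allowedTools: ["searchDatabase", "formatResponse"] };
console.log(checkToolAccess(filter, "searchDatabase").allowed); // true
console.log(checkToolAccess(filter, "hangUp").allowed); // false
```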

Thread Communication

Threads communicate with each other (including the UI thread) through tools. Other participants may communicate with threads using data messages.

Pushing Messages from one Thread to Another

Tools may return a send-to-thread response type, which routes a message to the specified thread. This allows threads to:

Report findings back to their parent
Delegate tasks to other threads
Coordinate multi-step workflows
Synchronize state
Effect a change on the UI thread (e.g., speak a message, end the call)

A send-to-thread response includes a callingThreadResultText that determines the tool result message added to the calling thread’s conversation history and a dataMessage field that specifies the message to send and the target thread. The message can be a UserTextMessage, a ForcedAgentMessage, or a SpawnThreadMessage.

For spawn messages, the message is handled by the thread indicated by the parentThreadId. If that id is the same as the calling thread, the conversation history is forked before the tool call and response are added.

For user and agent messages, the message is added to the relevant thread’s queue. For the UI thread, the message’s urgency determines whether the message can interrupt (“immediate” urgency) and whether it can cause a generation on its own or should be deferred until the next natural generation (“later” urgency).

For side threads, messages are always treated as if they had “soon” urgency. They never interrupt what the thread is doing, and they can always cause a new generation (or tool invocation). If the side thread is IDLE, it will transition to handle the message immediately. Otherwise it will add the message to its conversation history at the next opportunity (e.g., after finishing its current generation or tool call).

The _PARENT thread id: Tools invoked on side threads may use the special value _PARENT as the target thread id to send a message to their parent thread, allowing messages to be pushed to the parent thread even if its id is not known by the child thread.
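A small wrapper can keep send-to-thread results consistent. The sketch below follows the client tool result shape used in the Examples section of this guide. The helper name is hypothetical, and placing the target in a threadId field on the data message is an assumption based on how threadId is used on user_text_message elsewhere in this guide.

```javascript
// Hypothetical helper: wraps a data message in a send-to-thread client
// tool result. Not part of any SDK.
function sendToThreadResult(invocationId, callingThreadResultText, dataMessage, agentReaction = "listens") {
  return {
    type: "client_tool_result",
    invocationId,
    // The result payload is serialized, as in the examples in this guide.
    result: JSON.stringify({ callingThreadResultText, dataMessage }),
    responseType: "send-to-thread",
    agentReaction,
  };
}

const reply = sendToThreadResult("invocation-123", "Sent findings to parent.", {
  type: "user_text_message",
  text: "The Enterprise plan pricing summary is ready.",
  threadId: "_PARENT", // assumed placement of the target thread id
});
console.log(reply.responseType); // send-to-thread
```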

Sending Data Messages to a Thread

You can send messages to a thread by including threadId on a user_text_message or forced_agent_message:

// Send a follow-up message to an existing thread
{
  type: "user_text_message",
  text: "Now check for any pending refunds.",
  threadId: "research-task-1"
}

Similarly, you can spawn a new thread from a side thread by including parentThreadId on a spawn_thread message:

// Spawn a new thread from an existing thread
{
  type: "spawn_thread",
  parentThreadId: "research-task-1",
  newThreadId: "research-task-1-subtask",
  additionalMessages: [
    {
      type: "user_text_message",
      text: "Also look up the refund policy for our top 3 competitors then call the `onComplete` tool with your findings."
    }
  ]
}

In either case, the message is handled exactly as if another thread had sent it via a tool response. If threadId (or parentThreadId for spawning) is not provided, the message is handled by the UI thread by default.

Pulling Thread State

In addition to pushed messages, threads can pull the state of other threads by invoking a tool that uses the THREAD_STATES automatic parameter. This can be useful for monitoring progress or coordinating actions across threads. For example, if a user asks how some subtask is progressing, the UI thread may check the relevant thread’s state to let the user know how it’s doing.

When the THREAD_STATES automatic parameter is requested, the tool call will include an object with an entry for each thread id (excluding “UI”) and the thread’s current state. If the thread is idle, the tool will also receive the thread’s last response text, if any. For example:

{
  "research-task-1": {
    "state": "GENERATING"
  },
  "research-task-1-subtask": {
    "state": "IDLE",
    "lastResponse": "Our competitors each offer 30-day no-fault returns."
  }
}
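A tool that receives this object could turn it into short status lines for the agent to relay. The following is a minimal sketch that assumes the parameter has exactly the shape shown above; the function name and the output wording are made up.

```javascript
// Hypothetical formatter for the THREAD_STATES object shown above.
function describeThreadStates(threadStates) {
  return Object.entries(threadStates).map(([threadId, info]) => {
    if (info.state === "IDLE" && info.lastResponse) {
      return `${threadId} is idle; last response: ${info.lastResponse}`;
    }
    // Turn e.g. CALLING_TOOL into "calling tool" for readability.
    const label = info.state.toLowerCase().replace(/_/g, " ");
    return `${threadId} is ${label}`;
  });
}

const statusLines = describeThreadStates({
  "research-task-1": { state: "GENERATING" },
  "research-task-1-subtask": {
    state: "IDLE",
    lastResponse: "Our competitors each offer 30-day no-fault returns.",
  },
});
console.log(statusLines.join("\n"));
```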

Observability

During a call

During a call, your client or data connection can listen for data messages to know what threads are doing. See Thread Messages for the full message reference.

Thread Messages

Message	Description
`threadSpawned`	A thread was successfully spawned.
`threadRejected`	A thread spawn was rejected (e.g., due to duplicate ID).
`threadTerminated`	A thread was terminated due to failure or cancellation.

Generation Messages

Message	Description
`sideGenerationDelta`	Streaming text deltas as the thread generates (similar to transcript deltas).
`sideGenerationCompleted`	The full response text and any tool calls once generation finishes.

After a call

Thread activity is recorded as call events that you can query after the call ends. These are useful for debugging and monitoring thread behavior.

Thread Lifecycle Events

Event	Description
`thread.canceled`	The thread was canceled (e.g., call ended or stage changed).
`thread.limit_reached`	The thread exceeded one of its configured resource limits.
`thread.failed`	The thread failed for another reason.

These events include total_input_tokens, total_output_tokens, and total_generations in their extras.
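Once you have fetched a call's events, the lifecycle extras can be rolled up into per-call totals. The sketch below assumes each event is an object with a type string and an extras object; that shape is a guess made for illustration, so adapt the field access to what the events API actually returns.

```javascript
// Hypothetical roll-up of thread lifecycle events. The event shape
// ({ type, extras }) is an assumption, not a documented format.
const LIFECYCLE_TYPES = new Set(["thread.canceled", "thread.limit_reached", "thread.failed"]);

function summarizeThreadUsage(events) {
  const totals = { threads: 0, inputTokens: 0, outputTokens: 0, generations: 0 };
  for (const event of events) {
    if (!LIFECYCLE_TYPES.has(event.type)) continue; // skip unrelated events
    const extras = event.extras || {};
    totals.threads += 1;
    totals.inputTokens += extras.total_input_tokens || 0;
    totals.outputTokens += extras.total_output_tokens || 0;
    totals.generations += extras.total_generations || 0;
  }
  return totals;
}

// Made-up sample data for demonstration.
const usage = summarizeThreadUsage([
  { type: "thread.canceled", extras: { total_input_tokens: 1200, total_output_tokens: 300, total_generations: 2 } },
  { type: "thread.limit_reached", extras: { total_input_tokens: 800, total_output_tokens: 2000, total_generations: 5 } },
  { type: "side_generation.completed", extras: {} },
]);
console.log(usage);
```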

Side Generation Events

Event	Description
`side_generation.completed`	A thread’s generation succeeded.
`side_generation.failed`	A thread’s generation encountered an error.
`side_generation.canceled`	A thread’s generation was canceled.

These events include token usage in their extras.

Thread Communication Events

Event	Description
`thread_comms.non_existent_thread`	A message was sent to a thread that doesn’t exist.
`thread_comms.failed_thread`	A message was sent to a thread that has already failed.
`thread_comms.unsupported_call_change`	A thread attempted an unsupported call change (e.g., hang-up or stage change directly from a side thread).

Thread Billing

Threads may increase the cost of a call based on their additional model usage. Invoking tools is free. Generations are charged using traditional token-based pricing, except that cached input tokens are free. Since threads are always forked from a parent, input tokens for existing messages will be cached, so only the additional messages added when spawning plus any messages created by the thread itself contribute to input token usage. See our pricing page for the cost of uncached input tokens and output tokens.
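As a worked example of the rule above, the function below estimates a thread's cost from its token counts. The prices are placeholders invented for the arithmetic (expressed in dollars per million tokens) and are not Ultravox's actual rates; see the pricing page for real values.

```javascript
// Hypothetical cost estimate. Cached input tokens and tool calls are free,
// so only uncached input tokens and output tokens contribute.
function estimateThreadCost(usage, pricesPerMillion) {
  const billable =
    usage.uncachedInputTokens * pricesPerMillion.input +
    usage.outputTokens * pricesPerMillion.output;
  return billable / 1000000; // dollars
}

// Placeholder prices, for illustration only.
const examplePrices = { input: 1, output: 4 };
const cost = estimateThreadCost(
  { cachedInputTokens: 20000, uncachedInputTokens: 1500, outputTokens: 800 },
  examplePrices
);
// (1500 * 1 + 800 * 4) / 1,000,000 = 0.0047
console.log(cost.toFixed(4)); // 0.0047
```

Note that the 20,000 cached input tokens inherited from the parent do not affect the result.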

Examples

Here are some concrete examples of specific use cases. When tools are involved, the examples use client tools to show the response succinctly, but the same principles apply equally to data connection and HTTP tools.

Moving a tool to a background thread

Suppose you have a very slow tool whose result the agent doesn’t need in order to continue the conversation. You can build a better experience for your users by moving the tool’s work to a background thread so the conversation can continue while the tool runs.

Add the THREAD_ID automatic parameter to your tool definition so that your tool implementation knows which thread is invoking it.

When the tool is invoked, if the thread id is “UI”, immediately respond with a send-to-thread response type with a SpawnThread message that includes the same tool call. This will cause the tool call to be re-executed on a background thread.

const newThreadId = crypto.randomUUID();
{
  "type": "client_tool_result",
  "invocationId": "matching-invocation-id",
  "result": JSON.stringify({
    "callingThreadResultText": `Tool started in background thread ${newThreadId}. The result will be added once it's ready.`,
    "dataMessage": {
      "type": "spawn_thread",
      "newThreadId": newThreadId,
      "additionalMessages": [
        {
          "type": "forced_agent_message",
          "toolCalls": [
            {
              "name": "myTool",
              "arguments": { /* same parameters as original tool call */ }
            }
          ]
        }
      ],
      "limits": {
        "generationLimit": 0 // Optional. Disallows any generations since this thread only exists to execute this tool call.
      }
    }
  }),
  "responseType": "send-to-thread",
  "agentReaction": "speaks"
}

When your tool is called again, the THREAD_ID automatic parameter will be your newThreadId instead of “UI”, so you can execute the tool logic as normal without worrying about blocking the conversation. When the tool finishes, the thread can send its results back to the UI thread with another send-to-thread response type.

{
  "type": "client_tool_result",
  "invocationId": "new-matching-invocation-id",
  "result": JSON.stringify({
    "callingThreadResultText": "Sent result to UI thread.",
    "dataMessage": {
      "type": "user_text_message",
      "text": `<prior_tool_result toolName="myTool" threadId=${newThreadId}>The tool's real result.</prior_tool_result>`,
    }
  }),
  "responseType": "send-to-thread",
  "agentReaction": "listens" // Causes the thread to switch to IDLE instead of GENERATING since it doesn't need to respond to this tool result.
}

Note that there’s nothing special about the format for this user data message response. Any format you want may work, but ideally you should mention the format in your system prompt so that the model knows what to expect and how to handle it. If your UI thread has a checkThreadState tool available, you may choose to structure the response as a ForcedAgentMessage with a tool call to checkThreadState and a known result that includes the tool result. That way the model thinks it just asked for the thread’s state and doesn’t have to reason about the message’s source. Alternatively, you could use a ForcedAgentMessage with the real tool call and result, though that may lead the model to believe it can call this tool again and get an immediate response. It’s important to consider your agent’s specific circumstances when evaluating these trade-offs.

Causing hang-up from a side thread

Suppose you have a background thread that is able to use slow tools or more reasoning to decide whether a call is worth continuing. At some point, this thread may determine that the call should be ended. Background threads aren’t allowed to end a call directly, but they can force the UI thread to do so. Instead of responding to a tool call with hang-up, respond with send-to-thread and a ForcedAgentMessage with the relevant tool result:

{
  "type": "client_tool_result",
  "invocationId": "matching-invocation-id",
  "result": JSON.stringify({
    "callingThreadResultText": "Sent result to UI thread.",
    "dataMessage": {
      "type": "forced_agent_message",
      "urgency": "immediate", // Interrupt whatever is currently happening on the UI thread.
      "toolCalls": [
        {
          "id": "myId",
          "name": "hangUp",
          "arguments": {}
        }
      ],
      "knownToolResults": [
        {
          "invocationId": "myId",
          "result": {
            "strict": true,
            "message": "Goodbye."
          },
          "responseType": "hang-up"
        }
      ]
    }
  }),
  "responseType": "send-to-thread",
  "agentReaction": "listens"
}

The same pattern may be used to force the agent to speak a particular message or perform other actions that need to be synchronized on the UI thread, such as transferring the call or starting a stage change.

Having a thread re-sync conversation state following a tool call

Suppose you want another observer to periodically check how a call is going and then call a tool to report progress or effect changes. Instead of spawning a new single-shot thread for each check, you can have the thread re-sync its conversation history after each tool invocation by responding with a send-to-thread response with a SpawnThread message that uses “UI” as the parentThreadId, sets its own threadId as the newThreadId, and sets ifExists to “replace”. This will cause the thread to be terminated and re-spawned with a fresh copy of the UI thread’s conversation history.

{
  "type": "client_tool_result",
  "invocationId": "matching-invocation-id",
  "result": JSON.stringify({
    "callingThreadResultText": "OK",
    "dataMessage": {
      "type": "spawn_thread",
      "parentThreadId": "UI",
      "newThreadId": "monitor", // Same as existing thread id
      "ifExists": "replace", // Terminate existing thread and spawn a new one with the same ID
      "additionalMessages": [
        // If there are messages you want to retain from the thread's current history, include them here.
        // You can use the CONVERSATION_HISTORY automatic parameter as usual to get the thread's history.
        {
          "type": "user_text_message",
          // Restart the analysis to continue the loop
          "text": "Analyze the conversation and <do something>. Take your time and think through your response then invoke myTool to report in."
        }
      ]
    }
  }),
  "responseType": "send-to-thread",
  "agentReaction": "listens"
}

Next Steps

Data Messages Reference

Full reference for all thread-related data messages and their fields.

Async Tools

Learn about other ways to speed up your tools and keep your conversations responsive.