# Threads

> Run parallel agent conversations alongside your main call.

Threads allow you to spawn independent, parallel conversations that run alongside the main call. Each thread forks from the parent's conversation history and state,
then executes autonomously, generating responses and calling tools without interrupting or blocking the user-facing conversation.

<Note>
  <b>Advanced Feature</b>

  Threads are designed for use cases that require background processing or parallel reasoning during a live call. For many applications, the main
  conversation is sufficient and you won't need to worry about multi-threading. **Make sure you're comfortable with [Tools](/tools/overview)
  and [Data Messages](/apps/datamessages) before diving in.**
</Note>

## When to Use Threads

Threads are useful when you need the agent to do work in the background without disrupting the flow of the conversation. Common scenarios include:

**Background Research** → While the agent talks with the user, a thread can query APIs, search databases, or perform analysis and have results ready when needed.

**Delegated Tasks** → Hand off subtasks like generating images or changing code to a thread while the conversation continues.

**Observer** → Have a separate process monitor the conversation to provide real-time status updates or intervene if certain conditions are met.

## How to Think About Threads

Ultravox uses the term "thread" to draw on familiar concepts: the communication app concept for a branch of a conversation and the computer science
concept for parallel execution. Both are instructive here.

Like conversation threads, Ultravox threads rely on context from the parent conversation, but once forked follow their own path.
Like computer science threads (or even more so Dart's [isolates](https://dart.dev/language/isolates)), Ultravox threads run independently and
don't share state with each other except through well-defined channels.

If you're familiar with mobile app development, you're probably used to a "main" or "UI" thread that handles user interactions and your primary
code flow. The same idea applies in Ultravox. When a call starts, you can think of it as running on the "UI" thread ("UI" is the ID used to
identify this thread in Ultravox). If you block the UI thread (e.g. with a slow tool call), the conversation will be frozen much like a mobile
application waiting on a network request. If the conversation can reasonably continue while that tool runs, you should put that work on a
background thread to keep your conversation responsive.

In Ultravox, only the UI thread can interface with the user. Only the UI thread receives audio input and produces audio output. In addition,
only the UI thread may end the call or start a stage change. Background threads may pass messages to the UI thread to cause it to speak or
perform other actions, but they cannot directly produce audio or end the call themselves.

## How Threads Work

When you spawn a thread, the system:

1. **Forks the conversation history** — The thread gets a copy of the parent's full message history, including system prompt and tools, at the point of spawning. (It also gets a copy of the parent's [tool state](/agents/guiding-agents#tool-state) if any.)
2. **Starts running the thread independently** — The thread enters its own generate-and-act loop, processing messages and invoking tools on its own.

As a thread runs, it progresses through a series of states:

* **IDLE** → The thread has no work to do, typically because its last response was text with no tool calls. The thread will remain idle until it receives a new message.
* **GENERATING** → The thread is running LLM inference.
* **CALLING\_TOOL** → The thread is executing one or more tool calls.
* **FAILED** → The thread encountered an error or exceeded its limits. This is a terminal state. A [`threadTerminated`](/apps/datamessages#threadterminated) data message with the failure reason is sent when a thread enters this state.

Unless a thread fails, it will remain available until the call ends or a stage change occurs. **All side threads
are canceled when the call ends or a stage change occurs.**

## Spawning a Thread

Threads are created by sending a [`spawnThread`](/apps/datamessages#spawnthread) data message. Here's a basic example:

```js theme={null}
// Spawn a thread to research a topic in the background
{
  type: "spawn_thread",
  newThreadId: "research-task-1",
  additionalMessages: [
    {
      type: "user_text_message",
      text: "Look up the latest pricing for the Enterprise plan and summarize it."
    }
  ]
}
```

The server responds with a [`threadSpawned`](/apps/datamessages#threadspawned) message on success or a [`threadRejected`](/apps/datamessages#threadrejected) message if the thread cannot be created.

### Providing Initial Context

The `additionalMessages` field lets you give the thread its initial instructions. These are appended to the forked conversation history before the thread starts generating. You can use:

* **`user_text_message`** — Adds a user message.
* **`forced_agent_message`** — Adds agent messages and tool call/result messages.

If a spawned thread's history ends with a user message or tool result, it will begin in GENERATING state and immediately start generating a response. If it ends with a tool call,
the thread will begin in CALLING\_TOOL state and immediately start executing that tool. If it ends with an agent message (or an empty history), the thread will begin in IDLE state
and wait for a new message before it does any work.

```js theme={null}
{
  type: "spawn_thread",
  additionalMessages: [
    {
      type: "forced_agent_message",
      toolCalls: [
        {
          toolName: "searchDatabase",
          parameters: { query: "customer billing history for the last 3 months" }
        }
      ]
    }
  ]
}
```
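
The start-state rule described above can be summarized as a small function. This is only an illustrative model: the `role` labels and the `initialThreadState` helper are hypothetical and not part of the Ultravox API. They simply stand in for the kind of entry that ends the spawned thread's history.

```javascript
// Illustrative model only. The `role` values are hypothetical labels for the
// kind of entry that ends the spawned thread's conversation history.
function initialThreadState(history) {
  const last = history[history.length - 1];
  if (!last) return "IDLE"; // Empty history: nothing to respond to yet.
  switch (last.role) {
    case "user":
    case "tool_result":
      return "GENERATING"; // There is something to respond to.
    case "tool_call":
      return "CALLING_TOOL"; // The pending tool call is executed first.
    default:
      return "IDLE"; // Ends with an agent message: wait for a new message.
  }
}

console.log(initialThreadState([{ role: "agent" }, { role: "tool_call" }]));
```

In this model, the forced tool call in the example above would correspond to a history ending in `tool_call`, so the thread would start in CALLING\_TOOL.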

### Setting Thread Limits

You can constrain a thread's resource usage with the `limits` field:

```js theme={null}
{
  type: "spawn_thread",
  newThreadId: "bounded-task",
  limits: {
    generationLimit: 5,               // Max 5 LLM generation rounds
    threadOutputTokenLimit: 2000,      // Max 2000 output tokens total
    generationOutputTokenLimit: 500    // Max 500 output tokens per generation
  },
  additionalMessages: [
    { type: "user_text_message", text: "Summarize the conversation so far in 2-3 sentences." }
  ]
}
```

See the [data message](/apps/datamessages#spawnthread) for all the limits that may be imposed. Note that the per-generation
limits may be additionally capped by Ultravox.

When any limit is exceeded, the thread is marked FAILED and a `threadTerminated` data message with the reason is sent.
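
If you build spawn messages in several places, a small helper can catch obviously invalid limits before the message is sent. The sketch below is a client-side convenience only: `buildBoundedSpawn` is a hypothetical name, and the assumption that every limit is a non-negative integer is ours, not something the platform is documented to require.

```javascript
// Hypothetical client-side helper; not part of the Ultravox SDK.
// Assumption: each limit value should be a non-negative integer.
function buildBoundedSpawn(newThreadId, prompt, limits = {}) {
  for (const [key, value] of Object.entries(limits)) {
    if (!Number.isInteger(value) || value < 0) {
      throw new Error(`${key} must be a non-negative integer, got ${value}`);
    }
  }
  return {
    type: "spawn_thread",
    newThreadId,
    limits,
    additionalMessages: [{ type: "user_text_message", text: prompt }],
  };
}

const boundedSpawnExample = buildBoundedSpawn(
  "bounded-task",
  "Summarize the conversation so far in 2-3 sentences.",
  { generationLimit: 5, threadOutputTokenLimit: 2000 }
);
console.log(JSON.stringify(boundedSpawnExample, null, 2));
```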

<Note>
  <b>Concurrency Limit</b>

  There is also a platform-enforced concurrency limit that cannot be configured. A call can create as many threads as are useful, but the number
  actively generating at the same time may be limited. If a thread attempts a generation when the concurrency limit is reached,
  its generation request will be enqueued and will run once another thread finishes generating. (This concurrency limit is unrelated
  to the number of calls that can occur at the same time; it only affects side thread generations within a single call.)
</Note>
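
To build intuition for the queueing behavior, here is a toy model of a concurrency gate. It is not how Ultravox implements the limit, and `createGenerationGate` is a hypothetical name. It only illustrates that excess generation requests wait their turn rather than fail.

```javascript
// Toy model of "enqueue when the concurrency limit is reached".
// Not Ultravox's implementation; the limit value here is arbitrary.
function createGenerationGate(limit) {
  let active = 0;
  const waiting = [];
  return async function runGeneration(task) {
    if (active < limit) {
      active += 1; // A slot is free: start right away.
    } else {
      // No free slot: wait until a finishing generation hands its slot over.
      await new Promise((resolve) => waiting.push(resolve));
    }
    try {
      return await task();
    } finally {
      const next = waiting.shift();
      if (next) next(); // Pass the slot directly to the next queued request.
      else active -= 1; // Nobody is waiting: free the slot.
    }
  };
}

// Usage: at most two "generations" run at once; the third is queued.
const demoGate = createGenerationGate(2);
for (const id of ["a", "b", "c"]) {
  demoGate(async () => {
    await new Promise((resolve) => setTimeout(resolve, 10));
    console.log(`thread ${id} finished generating`);
  });
}
```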

### Filtering Available Tools

Use `toolFilter` to restrict which tools a thread can access:

```js theme={null}
{
  type: "spawn_thread",
  toolFilter: {
    allowedTools: ["searchDatabase", "formatResponse"]  // Only these tools
  },
  additionalMessages: [
    { type: "user_text_message", text: "Search for recent orders." }
  ]
}
```

See the [data message](/apps/datamessages#spawnthread) for filtering details. Note that since threads inherit their parent's
conversation history, they are also often aware of tools they aren't allowed to use. If the model attempts to call a filtered tool,
it will immediately get a response indicating the tool is unavailable so it can start a new generation while adjusting accordingly.
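
Conceptually, the filter behaves like an allow-list check in front of tool dispatch. The sketch below models that idea under stated assumptions: `dispatchToolCall`, the `status` field, and the wording of the "unavailable" result are all hypothetical, not the platform's actual behavior or message text.

```javascript
// Toy model of an allow-list check in front of tool dispatch.
// The result shape and message wording are assumptions for illustration.
function dispatchToolCall(call, allowedTools, implementations) {
  if (!allowedTools.includes(call.name)) {
    // The model gets an immediate result and can adjust on its next generation.
    return {
      status: "unavailable",
      result: `Tool "${call.name}" is not available on this thread.`,
    };
  }
  return { status: "ok", result: implementations[call.name](call.arguments) };
}

const demoTools = {
  searchDatabase: ({ query }) => `results for ${query}`,
  sendEmail: () => "sent",
};
console.log(dispatchToolCall({ name: "sendEmail", arguments: {} }, ["searchDatabase"], demoTools));
```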

## Thread Communication

Threads communicate with each other (including the UI thread) through tools. Other participants may communicate
with threads using data messages.

### Pushing Messages from One Thread to Another

Tools may return a `send-to-thread` response type, which routes a message to the specified thread. This allows threads to:

* Report findings back to their parent
* Delegate tasks to other threads
* Coordinate multi-step workflows
* Synchronize state
* Effect a change on the UI thread (e.g., speak a message, end the call)

A `send-to-thread` response includes a `callingThreadResultText` that determines the tool result message added to the calling thread's
conversation history and a `dataMessage` field that specifies the message to send and the target thread.
The message can be a `UserTextMessage`, a `ForcedAgentMessage`, or a `SpawnThreadMessage`.

For spawn messages, the message is handled by the thread indicated by the `parentThreadId`. If that id is the same as the calling thread's id,
the conversation history is forked *before* the tool call and response are added.

For user and agent messages, the message is added to the relevant thread's queue. For the UI thread, the message's urgency determines how it is
handled: "immediate" messages can interrupt whatever the thread is doing, "soon" messages don't interrupt but can cause a generation on their own,
and "later" messages are deferred until the next natural generation. For side threads, messages are always treated as if they had "soon" urgency:
they never interrupt what the thread is doing, and they can always cause a new generation (or tool invocation). If the side thread is IDLE, it will
transition to handle the message immediately. Otherwise, it will add the message to its conversation history at the next opportunity (e.g., after
finishing its current generation or tool call).

<Note>
  <b>The `_PARENT` thread id</b>

  Tools invoked on side threads may use the special value `_PARENT` as the target thread id to send a message to their parent thread,
  allowing messages to be pushed to the parent thread even if its id is not known by the child thread.
</Note>
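
As a rough sketch, a client tool running on a side thread could report back to its parent like this. The `reportToParent` helper is hypothetical, and placing the target in a `threadId` field on the data message is an assumption based on how `threadId` is used on data messages elsewhere on this page.

```javascript
// Hypothetical helper for a client tool implementation running on a side thread.
function reportToParent(invocationId, findings) {
  return {
    type: "client_tool_result",
    invocationId,
    responseType: "send-to-thread",
    agentReaction: "listens", // Let the side thread go IDLE after reporting.
    result: JSON.stringify({
      callingThreadResultText: "Findings sent to parent thread.",
      dataMessage: {
        type: "user_text_message",
        threadId: "_PARENT", // Assumption: the target thread is set via `threadId`.
        text: findings,
      },
    }),
  };
}

console.log(reportToParent("matching-invocation-id", "Enterprise pricing summary is ready."));
```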

### Sending Data Messages to a Thread

You can send messages to a thread by including `threadId` on a `user_text_message` or `forced_agent_message`:

```js theme={null}
// Send a follow-up message to an existing thread
{
  type: "user_text_message",
  text: "Now check for any pending refunds.",
  threadId: "research-task-1"
}
```

Similarly, you can spawn a new thread from a side thread by including `parentThreadId` on a `spawn_thread` message:

```js theme={null}
// Spawn a new thread from an existing thread
{
  type: "spawn_thread",
  parentThreadId: "research-task-1",
  newThreadId: "research-task-1-subtask",
  additionalMessages: [
    {
      type: "user_text_message",
      text: "Also look up the refund policy for our top 3 competitors then call the `onComplete` tool with your findings."
    }
  ]
}
```

In either case, the message is handled exactly as if another thread had sent it via a tool response. If `threadId` (or
`parentThreadId` for spawning) is not provided, the message is handled by the UI thread by default.

### Pulling Thread State

In addition to pushed messages, threads can pull the state of other threads by invoking a tool that uses the THREAD\_STATES automatic parameter. This can
be useful for monitoring progress or coordinating actions across threads. For example, if a user asks how some subtask is progressing, the UI thread may
check the relevant thread's state to let the user know how it's doing.

When the THREAD\_STATES automatic parameter is requested, the tool call will include an object with an entry for each thread ID (excluding "UI") that gives the
thread's current state. If the thread is idle, the tool will also receive the thread's last response text, if any. For example:

```json theme={null}
{
  "research-task-1": {
    "state": "GENERATING"
  },
  "research-task-1-subtask": {
    "state": "IDLE",
    "lastResponse": "Our competitors each offer 30-day no-fault returns."
  }
}
```
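
A tool that receives this parameter could turn it into a short status summary for the UI thread to relay. The sketch below assumes the object shape shown above. The `summarizeThreadStates` helper and its output wording are hypothetical.

```javascript
// Turns a THREAD_STATES-style object into human-readable status lines.
// Assumes the shape { [threadId]: { state, lastResponse? } } shown above.
function summarizeThreadStates(threadStates) {
  return Object.entries(threadStates).map(([threadId, info]) => {
    const label = info.state.toLowerCase().replace(/_/g, " ");
    if (info.state === "IDLE" && info.lastResponse) {
      return `${threadId}: ${label}, last response: "${info.lastResponse}"`;
    }
    return `${threadId}: ${label}`;
  });
}

const demoStates = {
  "research-task-1": { state: "GENERATING" },
  "research-task-1-subtask": {
    state: "IDLE",
    lastResponse: "Our competitors each offer 30-day no-fault returns.",
  },
};
console.log(summarizeThreadStates(demoStates).join("\n"));
```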

## Observability

### During a call

During a call, your client or data connection can listen for data messages to know what threads are doing.
See [Thread Messages](/apps/datamessages#thread-messages) for the full message reference.

#### Thread Messages

| Message            | Description                                              |
| ------------------ | -------------------------------------------------------- |
| `threadSpawned`    | A thread was successfully spawned.                       |
| `threadRejected`   | A thread spawn was rejected (e.g., due to duplicate ID). |
| `threadTerminated` | A thread was terminated due to failure or cancellation.  |

#### Generation Messages

| Message                   | Description                                                                   |
| ------------------------- | ----------------------------------------------------------------------------- |
| `sideGenerationDelta`     | Streaming text deltas as the thread generates (similar to transcript deltas). |
| `sideGenerationCompleted` | The full response text and any tool calls once generation finishes.           |

### After a call

Thread activity is recorded as [call events](/api-reference/calls/calls-events-list) that you can query after the call ends. These are useful for debugging and monitoring thread behavior.

#### Thread Lifecycle Events

| Event                  | Description                                                  |
| ---------------------- | ------------------------------------------------------------ |
| `thread.canceled`      | The thread was canceled (e.g., call ended or stage changed). |
| `thread.limit_reached` | The thread exceeded one of its configured resource limits.   |
| `thread.failed`        | The thread failed for another reason.                        |

These events include `total_input_tokens`, `total_output_tokens`, and `total_generations` in their extras.

#### Side Generation Events

| Event                       | Description                                 |
| --------------------------- | ------------------------------------------- |
| `side_generation.completed` | A thread's generation succeeded.            |
| `side_generation.failed`    | A thread's generation encountered an error. |
| `side_generation.canceled`  | A thread's generation was canceled.         |

These events include token usage in their extras.

#### Thread Communication Events

| Event                                  | Description                                                                                                |
| -------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
| `thread_comms.non_existent_thread`     | A message was sent to a thread that doesn't exist.                                                         |
| `thread_comms.failed_thread`           | A message was sent to a thread that has already failed.                                                    |
| `thread_comms.unsupported_call_change` | A thread attempted an unsupported call change (e.g., hang-up or stage change directly from a side thread). |

## Thread Billing

Threads may increase the cost of a call based on their additional model usage. Invoking tools is free. Generations
are charged using traditional token-based pricing, except that cached input tokens are free. Since threads are always
forked from a parent, input tokens for existing messages will be cached, so only the additional messages added when
spawning plus any messages created by the thread itself contribute to input token usage. See our
[pricing page](https://ultravox.ai/pricing) for the cost of uncached input tokens and output tokens.
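
For a back-of-the-envelope estimate, the rule above can be written as simple arithmetic. All token counts and prices below are made up for illustration, `estimateThreadCost` is a hypothetical helper, and the model treats each uncached token as billed once, which is a simplification. Use the pricing page for real rates.

```javascript
// Rough cost model for one thread. All numbers are hypothetical.
// Forked history is assumed cached (free); only new input and output are billed.
function estimateThreadCost(usage, prices) {
  const billableInputTokens = usage.spawnMessageTokens + usage.threadCreatedTokens;
  const inputCost = (billableInputTokens / 1e6) * prices.inputPerMillion;
  const outputCost = (usage.outputTokens / 1e6) * prices.outputPerMillion;
  return { billableInputTokens, cost: inputCost + outputCost };
}

const demoEstimate = estimateThreadCost(
  {
    forkedHistoryTokens: 12000, // Cached, so it does not affect the cost.
    spawnMessageTokens: 500,
    threadCreatedTokens: 1500,
    outputTokens: 1000,
  },
  { inputPerMillion: 1, outputPerMillion: 2 } // Placeholder prices, not real rates.
);
console.log(demoEstimate.billableInputTokens, demoEstimate.cost.toFixed(4));
```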

## Examples

Here are some concrete examples of specific use cases. When tools are involved, the examples use client tools
to show the response succinctly, but the same principles apply equally to data connection and HTTP tools.

### Moving a tool to a background thread

Suppose you currently have a tool that is very slow, but the agent doesn't necessarily need its response to continue
the conversation. You can build a better experience for your users by moving the tool's work to a background thread so the
conversation can continue while the tool runs.

1. Add the THREAD\_ID automatic parameter to your tool definition so that your tool implementation knows which thread is invoking it.

2. When the tool is invoked, if the thread id is "UI", immediately respond with a `send-to-thread` response type with a `SpawnThread` message that includes the same tool call. This will cause the tool call to be re-executed on a background thread.

   ```js theme={null}
   const newThreadId = crypto.randomUUID();
   // Tool result to send back for the original (UI thread) invocation
   const toolResult = {
     "type": "client_tool_result",
     "invocationId": "matching-invocation-id",
     "result": JSON.stringify({
       "callingThreadResultText": `Tool started in background thread ${newThreadId}. The result will be added once it's ready.`,
       "dataMessage": {
         "type": "spawn_thread",
         "newThreadId": newThreadId,
         "additionalMessages": [
           {
             "type": "forced_agent_message",
             "toolCalls": [
               {
                 "name": "myTool",
                 "arguments": { /* same parameters as original tool call */ }
               }
             ]
           }
         ],
         "limits": {
           "generationLimit": 0 // Optional. Disallows any generations since this thread only exists to execute this tool call.
         }
       }
     }),
     "responseType": "send-to-thread",
     "agentReaction": "speaks"
   }
   ```

3. When your tool is called again, the THREAD\_ID automatic parameter will be your `newThreadId` instead of "UI", so you can execute the tool logic as normal without worrying about blocking the conversation. When the tool finishes, the thread can send its results back to the UI thread with another `send-to-thread` response type.

   ```js theme={null}
   {
     "type": "client_tool_result",
     "invocationId": "new-matching-invocation-id",
     "result": JSON.stringify({
       "callingThreadResultText": "Sent result to UI thread.",
       "dataMessage": {
         "type": "user_text_message",
         "text": `<prior_tool_result toolName="myTool" threadId="${newThreadId}">The tool's real result.</prior_tool_result>`,
       }
     }),
     "responseType": "send-to-thread",
     "agentReaction": "listens" // Causes the thread to switch to IDLE instead of GENERATING since it doesn't need to respond to this tool result.
   }
   ```

Note that there's nothing special about the format of this user message. Any format can work, but ideally you should describe the format
in your system prompt so that the model knows what to expect and how to handle it. If your UI thread has a `checkThreadState` tool available, you may choose to
structure the response as a `ForcedAgentMessage` with a tool call to `checkThreadState` and a known result that includes the tool result. That way the model thinks
it just asked for the thread's state and doesn't have to reason about the message's source. Alternatively, you could use a `ForcedAgentMessage` with the real tool
call and result, though that may lead the model to believe it can call this tool again and get an immediate response. It's important to consider your agent's
specific circumstances when evaluating these trade-offs.
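
Putting steps 2 and 3 together, a single client tool handler might branch on the calling thread's id. This is a sketch, not a definitive implementation: `handleMyTool`, `runSlowWork`, and `makeThreadId` are hypothetical names, and the `name`/`arguments` keys inside `toolCalls` simply follow the example above.

```javascript
// Sketch of one handler covering both invocations of the same tool.
// `runSlowWork` (the real tool logic) and `makeThreadId` are hypothetical.
async function handleMyTool(
  { threadId, invocationId, args },
  runSlowWork,
  makeThreadId = () => globalThis.crypto.randomUUID()
) {
  if (threadId === "UI") {
    // First invocation: re-issue the same tool call on a background thread.
    const newThreadId = makeThreadId();
    return {
      type: "client_tool_result",
      invocationId,
      responseType: "send-to-thread",
      agentReaction: "speaks",
      result: JSON.stringify({
        callingThreadResultText: `Tool started in background thread ${newThreadId}.`,
        dataMessage: {
          type: "spawn_thread",
          newThreadId,
          additionalMessages: [
            { type: "forced_agent_message", toolCalls: [{ name: "myTool", arguments: args }] },
          ],
          limits: { generationLimit: 0 }, // The thread only exists to run this tool.
        },
      }),
    };
  }
  // Second invocation: we are on the background thread, so slow work is fine.
  const output = await runSlowWork(args);
  return {
    type: "client_tool_result",
    invocationId,
    responseType: "send-to-thread",
    agentReaction: "listens",
    result: JSON.stringify({
      callingThreadResultText: "Sent result to UI thread.",
      dataMessage: {
        type: "user_text_message",
        text: `<prior_tool_result toolName="myTool" threadId="${threadId}">${output}</prior_tool_result>`,
      },
    }),
  };
}
```

Branching on the thread id keeps the slow logic in one place, at the cost of the tool being invoked twice for each user-visible request.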

### Causing hang-up from a side thread

Suppose you have a background thread that is able to use slow tools or more reasoning to decide whether a call is worth continuing.
At some point, this thread may determine that the call should be ended.

Background threads aren't allowed to end a call directly, but they can force the UI thread to do so. Instead of responding to a tool call with `hang-up`,
respond with `send-to-thread` and a `ForcedAgentMessage` with the relevant tool result:

```js theme={null}
{
  "type": "client_tool_result",
  "invocationId": "matching-invocation-id",
  "result": JSON.stringify({
    "callingThreadResultText": "Sent result to UI thread.",
    "dataMessage": {
      "type": "forced_agent_message",
      "urgency": "immediate", // Interrupt whatever is currently happening on the UI thread.
      "toolCalls": [
        {
          "id": "myId",
          "name": "hangUp",
          "arguments": {}
        }
      ],
      "knownToolResults": [
        {
          "invocationId": "myId",
          "result": {
            "strict": true,
            "message": "Goodbye."
          },
          "responseType": "hang-up"
        }
      ]
    }
  }),
  "responseType": "send-to-thread",
  "agentReaction": "listens"
}
```

The same pattern may be used to force the agent to speak a particular message or perform other actions that need to be synchronized
on the UI thread, such as transferring the call or starting a stage change.

### Having a thread re-sync conversation state following a tool call

Suppose you want another observer to periodically check how a call is going and then call a tool to report progress or effect changes.
Instead of spawning a new single-shot thread for each check, you can have the thread re-sync its conversation history after each tool invocation by
responding with a `send-to-thread` response with a `SpawnThread` message that uses "UI" as the `parentThreadId`, sets its own threadId as the
`newThreadId`, and sets `ifExists` to "replace". This will cause the thread to be terminated and re-spawned with a fresh copy of the UI thread's
conversation history.

```js theme={null}
{
  "type": "client_tool_result",
  "invocationId": "matching-invocation-id",
  "result": JSON.stringify({
    "callingThreadResultText": "OK",
    "dataMessage": {
      "type": "spawn_thread",
      "parentThreadId": "UI",
      "newThreadId": "monitor", // Same as existing thread id
      "ifExists": "replace", // Terminate existing thread and spawn a new one with the same ID
      "additionalMessages": [
        // If there are messages you want to retain from the thread's current history, include them here.
        // You can use the CONVERSATION_HISTORY automatic parameter as usual to get the thread's history.
        {
          "type": "user_text_message",
          // Restart the analysis to continue the loop
          "text": "Analyze the conversation and <do something>. Take your time and think through your response then invoke myTool to report in."
        }
      ]
    }
  }),
  "responseType": "send-to-thread",
  "agentReaction": "listens"
}
```

## Next Steps

<CardGroup cols={2}>
  <Card title="Data Messages Reference" icon="network-wired" href="/apps/datamessages#thread-messages">
    Full reference for all thread-related data messages and their fields.
  </Card>

  <Card title="Async Tools" icon="clock" href="/tools/async-tools">
    Learn about other ways to speed up your tools and keep your conversations responsive.
  </Card>
</CardGroup>