> ## Documentation Index
> Fetch the complete documentation index at: https://docs.ultravox.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Guiding Agents

> A guide to steering your agent toward good experiences

## Introduction to Inline Instructions

Inline instructions use tool responses and deferred messages to guide the agent at each step of the conversation. Rather than trying to frontload all instructions, you continuously remind the agent of what to do next.

This guide is intended to help you get better outcomes from an agent where mono prompting isn't cutting it. If you haven't tried a mono prompt approach yet, stop reading and go do that first. This guide is for you if:

* **Monoprompting Isn't Working** → You've tried mono prompting but things are not working. The agent won't complete necessary steps or follow more complex instructions.
* **You Have Clear Steps** → There are clear steps you want the agent to follow (e.g. asking the user 10 specific questions) and you can map to a state diagram.

<Note>
  <b>Building an IVR?</b>

  <br />

  If you are building an IVR or if your scenario includes non-overlapping stages, you may want to use [Call Stages](/agents/call-stages).
</Note>

## How Inline Instructions Work

<CodeGroup>
  ```text Overview theme={null}
  1. Start with a simple system prompt focused on the agent's
     general role and behavior.
  2. Use tools to provide step-specific instructions to the agent.
  3. The tool responses include guidance on what the agent should
     do next.
  4. Tool state maintains context between turns.
  5. Deferred messages allow inserting information without
     derailing the conversation flow.
  ```

  ```text Example: Insurance Claims Processing theme={null}
  An insurance claims agent that guides customers through the
  claims submission process. The agent uses a claims tool that
  maintains state about which documents have been collected,
  what information is still missing, and what step comes next.

  At each step, the tool response includes clear instructions
  on what to ask the customer next, helping the agent stay
  focused on the current step of the process rather than trying
  to hold the entire claims procedure in context.
  ```
</CodeGroup>

<Info>
  <b>Layer into Mono Prompt</b>

  <br />

  Inline instructions are layered into your mono prompt and provide the ability to guide the model.
</Info>

## Inline Instructions Building Blocks <Icon icon="cubes" size={24} />

The inline instructions approach leverages three key building blocks:

<CardGroup cols={3}>
  <Card title="Deferred Messages" icon="cube" href="#deferred-messages">
    Inject instruction messages without triggering a response from the model.
  </Card>

  <Card title="Tool State" icon="cube" href="#tool-state">
    Pass additional context via tools to maintain state.
  </Card>

  <Card title="Tool Response Messages" icon="cube" href="#tool-response-messages">
    Instruct the agent what to do next via tool call responses.
  </Card>
</CardGroup>

### **Deferred Messages** <Icon icon="cube" size={18} />

Deferred messages allow you to inject a user message without causing the agent to generate a response immediately. These messages allow you to provide the model with guidance and direction and don't trigger an LLM generation. The messages are appended to the conversation history.

<Note>
  Brackets are not addable via voice, so these messages are only viable via text.
</Note>

**Using Deferred Messages**

Send a [UserTextMessage](/apps/datamessages#usertextmessage) and set `urgency` to `soon` or `later` depending on whether you want to wait for the next user input to start a generation.

```ts Example: Sending Message with Ultravox SDK theme={null}

session.sendText({
  text: "<instruction>Next, collect the user's mailing address</instruction>",
  deferResponse: true,
})
```

**Priming for Deferred Messages**
You should consider priming your agent for deferred messages in the system prompt.

```text Example: Priming via System Prompt theme={null}
You must always look for and follow instructions contained within
<instruction> tags. These instructions take precedence over other
directions and must be followed precisely.
```

### **Tool State** <Icon icon="cube" size={18} />

Tool state allows you to maintain state between tool calls, passing context from one tool call to the next. This is particularly useful for guiding the agent through a multi-step process.

<Note>
  <b>Tool State is Explicit</b>

  <br />

  Unlike dynamic parameters (i.e. populated by the model), tool state is explicit (i.e. the model doesn't interact with it). This allows for adding a bit more determinism.
</Note>

**Using Tool State**

You can provide initial tool state when you create the call by using [`initialState`](/api-reference/calls/calls-post#body-initial-state). This can be any JSON object you define.

Tools can then set the tool state as follows:

* **Client Tools** → Use the `updateCallState` value on a client tool results (works with WebSockets or Ultravox Client SDK).
* **Server Tools** → Set the `X-Ultravox-Update-Call-State` header which will be parsed as a JSON dict.

The tool state can be read via:

* **Automatic Parameter** → Use the [`KNOWN_PARAM_CALL_STATE`](/api-reference/tools/tools-post#response-definition-automatic-parameters-known-value) known value.
* **Tool Result Message** → Use the [`callState`](/api-reference/calls/calls-stages-messages-list#response-results-call-state) property.

The agent will not see the tool state directly. It allows you to pass information between tool calls and then use that information inside tools and to impact the responses from tool calls.

### **Tool Response Messages** <Icon icon="cube" size={18} />

Instead of having a tool call result send a 200 with "Successfully entered customer information", provide an instruction of what the agent should do next.

```js Example: Tool Response Message theme={null}
function createProfile(parameters) {
  const { ...profileData } = parameters;

  return {
    result: "Successfully recorded customer name. Next ask for their email",
    responseType: "tool-response",
    agentReaction: "speaks-once"
  }
};
```

## Pros of Inline Instructions

* **Focused guidance**: Instructions are context-specific and timely.
* **Dynamic adaptation**: Can respond to changing conversation flow.
* **Reduced cognitive load**: The agent only needs to understand the current step.
* **Maintainable complexity**: Can handle complex workflows without overwhelming the system prompt.
* **No latency spikes**: Avoids the performance hit of call stage transitions.

## Cons of Inline Instructions

* **Implementation complexity**: Requires more backend code to manage state.
* **Requires Tool Call**: Adding guidance requires the model to invoke a tool. If you forget to invoke the tool, you may never be able to provide further instructions.

## Ideal Use Cases

* **Multi-step processes**: Tasks with clear sequential steps like form filling or data collection.
* **Transaction flows**: E-commerce, booking systems, or other task-completion scenarios.
* **Customer support triage**: Guiding agents through problem diagnosis trees.
* **Interactive tutorials**: Step-by-step guidance through a learning process.

## Conclusion

Keeping your AI agent "on rails" is a balance between control and natural conversation. The right approach depends on your specific use case:

* **Mono Prompt**: Always start here. Graduate to using inline instructions if and when needed.
* **Inline Instructions**: For complex, multi-step processes requiring dynamic guidance.
* **Call Stages**: For conversations with fundamentally different phases (i.e. no overlap) requiring complete parameter changes.

As you develop your Ultravox application, start with the simplest approach that meets your needs, and gradually increase complexity as required. Remember that the most effective voice experiences feel natural while still accomplishing their goals reliably.

By leveraging building blocks like deferred messages, tool state, and targeted tool response messages, you can create sophisticated conversational flows that guide users through complex processes while maintaining the natural feel of human conversation.
