Introduction to Inline Instructions

Inline instructions use tool responses and deferred messages to guide the agent at each step of the conversation. Rather than trying to frontload all instructions, you continuously remind the agent of what to do next.

This guide is intended to help you get better outcomes from an agent where mono prompting isn’t cutting it. If you haven’t tried a mono prompt approach yet, stop reading and go do that first. This guide is for you if:

  • Monoprompting Isn’t Working → You’ve tried mono prompting but things are not working. The agent won’t complete necessary steps or follow more complex instructions.
  • You Have Clear Steps → There are clear steps you want the agent to follow (e.g. asking the user 10 specific questions) and you can map to a state diagram.
Building an IVR?

If you are building an IVR or if your scenario includes non-overlapping stages, you may want to use Call Stages.

How Inline Instructions Work

1. Start with a simple system prompt focused on the agent's
   general role and behavior.
2. Use tools to provide step-specific instructions to the agent.
3. The tool responses include guidance on what the agent should
   do next.
4. Tool state maintains context between turns.
5. Deferred messages allow inserting information without
   derailing the conversation flow.
Layer into Mono Prompt

Inline instructions are layered into your mono prompt and provide the ability to guide the model.

Inline Instructions Building Blocks

The inline instructions approach leverages three key building blocks:

Deferred Messages

Deferred messages allow you to inject a user message without causing the agent to generate a response immediately. These messages allow you to provide the model with guidance and direction and don’t trigger an LLM generation. The messages are appended to the conversation history.

Brackets are not addable via voice, so these messages are only viable via text.

Using Deferred Messages

Send an InputTextMessage and set defer_response to true.

Example: Sending Message with Ultravox SDK

session.sendText({
  text: "<instruction>Next, collect the user's mailing address</instruction>",
  deferResponse: true,
})

Priming for Deferred Messages You should consider priming your agent for deferred messages in the system prompt.

Example: Priming via System Prompt
You must always look for and follow instructions contained within 
<instruction> tags. These instructions take precedence over other 
directions and must be followed precisely.

Tool State

Tool state allows you to maintain state between tool calls, passing context from one tool call to the next. This is particularly useful for guiding the agent through a multi-step process.

Tool State is Explicit

Unlike dynamic parameters (i.e. populated by the model), tool state is explicit (i.e. the model doesn’t interact with it). This allows for adding a bit more determinism.

Using Tool State

You can provide initial tool state when you create the call by using initialState. This can be any JSON object you define.

Tools can then set the tool state as follows:

  • Client Tools → Use the updateCallState value on a client tool results (works with WebSockets or Ultravox Client SDK).
  • Server Tools → Set the X-Ultravox-Update-Call-State header which will be parsed as a JSON dict.

The tool state can be read via:

The agent will not see the tool state directly. It allows you to pass information between tool calls and then use that information inside tools and to impact the responses from tool calls.

Tool Response Messages

Instead of having a tool call result send a 200 with “Successfully entered customer information”, provide an instruction of what the agent should do next.

Example: Tool Response Message
function createProfile(parameters) {
  const { ...profileData } = parameters;

  return {
    result: "Successfully recorded customer name. Next ask for their email",
    responseType: "tool-response",
    agentReaction: "speaks-once"
  }
};

Pros of Inline Instructions

  • Focused guidance: Instructions are context-specific and timely.
  • Dynamic adaptation: Can respond to changing conversation flow.
  • Reduced cognitive load: The agent only needs to understand the current step.
  • Maintainable complexity: Can handle complex workflows without overwhelming the system prompt.
  • No latency spikes: Avoids the performance hit of call stage transitions.

Cons of Inline Instructions

  • Implementation complexity: Requires more backend code to manage state.
  • Requires Tool Call: Adding guidance requires the model to invoke a tool. If you forget to invoke the tool, you may never be able to provide further instructions.

Ideal Use Cases

  • Multi-step processes: Tasks with clear sequential steps like form filling or data collection.
  • Transaction flows: E-commerce, booking systems, or other task-completion scenarios.
  • Customer support triage: Guiding agents through problem diagnosis trees.
  • Interactive tutorials: Step-by-step guidance through a learning process.

Conclusion

Keeping your AI agent “on rails” is a balance between control and natural conversation. The right approach depends on your specific use case:

  • Mono Prompt: Always start here. Graduate to using inline instructions if and when needed.
  • Inline Instructions: For complex, multi-step processes requiring dynamic guidance.
  • Call Stages: For conversations with fundamentally different phases (i.e. no overlap) requiring complete parameter changes.

As you develop your Ultravox application, start with the simplest approach that meets your needs, and gradually increase complexity as required. Remember that the most effective voice experiences feel natural while still accomplishing their goals reliably.

By leveraging building blocks like deferred messages, tool state, and targeted tool response messages, you can create sophisticated conversational flows that guide users through complex processes while maintaining the natural feel of human conversation.