Guiding Agents
A guide to steering your agent toward good experiences
Introduction to Inline Instructions
Inline instructions use tool responses and deferred messages to guide the agent at each step of the conversation. Rather than trying to frontload all instructions, you continuously remind the agent of what to do next.
This guide is intended to help you get better outcomes from an agent where mono prompting isn’t cutting it. If you haven’t tried a mono prompt approach yet, stop reading and go do that first. This guide is for you if:
- Mono Prompting Isn’t Working → You’ve tried mono prompting, but the agent won’t complete necessary steps or follow more complex instructions.
- You Have Clear Steps → There are clear steps you want the agent to follow (e.g. asking the user 10 specific questions) that you can map to a state diagram.
If you are building an IVR or if your scenario includes non-overlapping stages, you may want to use Call Stages.
How Inline Instructions Work
Inline instructions are layered on top of your mono prompt. The system prompt stays in place, and additional guidance reaches the model as the conversation progresses.
Inline Instructions Building Blocks
The inline instructions approach leverages three key building blocks:
Deferred Messages
Inject instruction messages without triggering a response from the model.
Tool State
Pass additional context via tools to maintain state.
Tool Response Messages
Instruct the agent what to do next via tool call responses.
Deferred Messages
Deferred messages allow you to inject a user message without causing the agent to generate a response immediately. Use them to give the model guidance and direction: each message is appended to the conversation history without triggering an LLM generation.
Brackets cannot be added via voice, so these messages can only be sent as text.
Using Deferred Messages
Send an InputTextMessage and set defer_response to true.
Priming for Deferred Messages
You should consider priming your agent for deferred messages in the system prompt.
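As a rough illustration, a deferred message could be assembled like this before it is sent over your data connection. Only `defer_response` comes from this guide. The `type` and `text` field names and the `<instruction>` wrapper are assumptions made for the sketch, so check them against the API reference before relying on them.

```python
import json


def build_deferred_message(instruction: str) -> str:
    """Serialize a text message that should not trigger an immediate response."""
    payload = {
        # Assumed wire name for InputTextMessage; verify against the API reference.
        "type": "input_text_message",
        # Assumed convention: wrap guidance in brackets so the agent can tell it
        # apart from what the user actually said.
        "text": f"<instruction>{instruction}</instruction>",
        # From this guide: the message is added to history without a generation.
        "defer_response": True,
    }
    return json.dumps(payload)
```

If you adopt a bracket convention like this, the system prompt is the natural place to tell the agent what bracketed messages mean.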
Tool State
Tool state allows you to maintain state between tool calls, passing context from one tool call to the next. This is particularly useful for guiding the agent through a multi-step process.
Unlike dynamic parameters, which are populated by the model, tool state is explicit: the model never interacts with it. This makes multi-step flows a bit more deterministic.
Using Tool State
You can provide initial tool state when you create the call by using initialState. This can be any JSON object you define.
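A minimal sketch of a call-creation body that seeds tool state might look like the following. `initialState` is the field named above. The `systemPrompt` field name and the shape of the state object (`step`, `answers`) are assumptions chosen for illustration.

```python
import json


def build_call_request(system_prompt: str, initial_state: dict) -> str:
    """Build a JSON body for creating a call with starting tool state."""
    body = {
        "systemPrompt": system_prompt,  # assumed field name, not from this guide
        "initialState": initial_state,  # any JSON object you define
    }
    return json.dumps(body)


# Hypothetical starting state for a data-collection flow.
request_body = build_call_request(
    "You are a helpful intake agent.",
    {"step": 0, "answers": {}},
)
```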
Tools can then set the tool state as follows:
- Client Tools → Use the updateCallState value on a client tool result (works with WebSockets or the Ultravox Client SDK).
- Server Tools → Set the X-Ultravox-Update-Call-State header, which will be parsed as a JSON dict.
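For a server tool, setting the state could be sketched as below. The helper is framework-agnostic and hypothetical: it returns a status, headers, and body that you would hand to whatever web framework you use. Only the header name comes from this guide. The sketch sends the complete state object so that it does not depend on how updates are combined with existing state.

```python
import json

CALL_STATE_HEADER = "X-Ultravox-Update-Call-State"


def build_tool_response(body: str, new_state: dict) -> tuple[int, dict, str]:
    """Return (status, headers, body) for a server tool HTTP response."""
    headers = {
        "Content-Type": "text/plain",
        # The header value must be a JSON-encoded dict.
        CALL_STATE_HEADER: json.dumps(new_state),
    }
    return 200, headers, body
```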
The tool state can be read via:
- Automatic Parameter → Use the KNOWN_PARAM_CALL_STATE known value.
- Tool Result Message → Use the callState property.
The agent does not see the tool state directly. Instead, tool state lets you pass information from one tool call to the next, use it inside your tools, and shape the responses those tools return.
Tool Response Messages
Instead of having a tool call result send a 200 with “Successfully entered customer information”, provide an instruction of what the agent should do next.
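One way to picture this is a tool handler that reads the current step from tool state, stores the answer, and returns the next instruction as its response text. Everything here is a hypothetical sketch: the `QUESTIONS` list, the `step` and `answers` keys, and the `record_answer` function are illustrative names, not part of the Ultravox API.

```python
# Hypothetical list of fields the agent must collect, in order.
QUESTIONS = ["full name", "email address", "shipping address"]


def record_answer(call_state: dict, answer: str) -> tuple[dict, str]:
    """Store the answer for the current step; return (new_state, instruction)."""
    step = call_state.get("step", 0)
    if step >= len(QUESTIONS):
        # Nothing left to collect; leave the state untouched.
        return call_state, "All information has already been collected. Wrap up the call."

    answers = {**call_state.get("answers", {}), QUESTIONS[step]: answer}
    new_state = {"step": step + 1, "answers": answers}

    if step + 1 < len(QUESTIONS):
        # Tell the agent exactly what to do next, not just that the save worked.
        message = (
            f"Saved the {QUESTIONS[step]}. "
            f"Now ask the user for their {QUESTIONS[step + 1]}."
        )
    else:
        message = (
            "All information is collected. "
            "Read the details back to the user and ask them to confirm."
        )
    return new_state, message
```

The returned state would be written back as tool state, and the returned message becomes the tool result the agent reads, so each call hands the agent only the next step.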
Pros of Inline Instructions
- Focused guidance: Instructions are context-specific and timely.
- Dynamic adaptation: Can respond to changing conversation flow.
- Reduced cognitive load: The agent only needs to understand the current step.
- Maintainable complexity: Can handle complex workflows without overwhelming the system prompt.
- No latency spikes: Avoids the performance hit of call stage transitions.
Cons of Inline Instructions
- Implementation complexity: Requires more backend code to manage state.
- Requires a tool call: Adding guidance through a tool response requires the model to invoke a tool. If the model never invokes the tool, you may never get the chance to provide further instructions.
Ideal Use Cases
- Multi-step processes: Tasks with clear sequential steps like form filling or data collection.
- Transaction flows: E-commerce, booking systems, or other task-completion scenarios.
- Customer support triage: Guiding agents through problem diagnosis trees.
- Interactive tutorials: Step-by-step guidance through a learning process.
Conclusion
Keeping your AI agent “on rails” is a balance between control and natural conversation. The right approach depends on your specific use case:
- Mono Prompt: Always start here. Graduate to using inline instructions if and when needed.
- Inline Instructions: For complex, multi-step processes requiring dynamic guidance.
- Call Stages: For conversations with fundamentally different phases (i.e. no overlap) requiring complete parameter changes.
As you develop your Ultravox application, start with the simplest approach that meets your needs, and gradually increase complexity as required. Remember that the most effective voice experiences feel natural while still accomplishing their goals reliably.
By leveraging building blocks like deferred messages, tool state, and targeted tool response messages, you can create sophisticated conversational flows that guide users through complex processes while maintaining the natural feel of human conversation.