Prompting Quickstart
A guide to prompting for great voice AI experiences
Introduction
As we say in our guide, it’s all about prompting. Prompting is how we get Voice AI agents to do what we want, but all models work a little bit differently. Under the hood, the Realtime platform runs a version of the Ultravox model built on Llama 3.3 70B, so we recommend looking at the Llama Prompting Guides as a good starting point.
Below, we try to lay out the core patterns that we see working well at scale, but you’ll probably need to customize these approaches based on your particular use case.
Remember that prompting is the most effective tool we have for controlling LLMs. In the majority of cases, the answer to “How do I get the model to do X?” is “You need to prompt it to do X.”
Default Prompting Note: Unlike many other Voice AI offerings, Ultravox does not append a default prompt to your input. This means that you should always provide a complete prompt, including any context or information that you want the model to consider. We do this to ensure you have full control over what the model does. We don’t want anything hidden from view.
Prompt As-If It’s a Text-Based LLM
It’s important to understand that during training, the underlying LLM (Llama 3.3 70B in our default case) is frozen. This means that you should prompt the model as though it’s a text model. For most scenarios, we recommend telling the model at the top of your prompt that you’re talking to it as a voice model.
Here’s an example that works well:
General Guidance
-
Start simple: It’s best to always start simple and then add complexity as needed. Begin by outlining in a few paragraphs what you want the model to do. Then have a chat with it, see where it needs work, and then iterate from there.
-
Be clear: Llama is a very literal instruction follower. So if you want the model to do something, you need to be very clear about it. If you’re trying to write a set of step-by-step flows, be sure to break them down into very clear, concise steps. Note that you don’t HAVE to provide clear step-by-step instructions. General guidance works very well if you’re looking for a more conversational output (but the model will exert more control in driving the conversation).
-
Use examples: The model learns very well from examples. So after describing the high-level flow you want the model to follow, it can be helpful to provide a few examples of what you’re looking for.
-
Iterate: Prompting is an iterative process. You’ll need to prompt the model, see how it does, and then adjust your prompts based on the results. It can take time to get things right, so be patient.
Common Prompting Patterns
This section includes patterns and example prompts for dealing with common challenges.
Tools
Tools are how your model interacts with the outside world, but you have to help the model understand when and how those tools should be used. Here’s a good pattern for prompting the model to use a tool:
Use Good Tool Definitions
It’s critical to remember that the entire tool definition is seen by the LLM, and the LLM will use those definitions to guide its behavior. Make sure that your tool definitions are clear and concise.
Prompt for More Context
Give the LLM additional context or guidance on whenever the tool should be used. For example, if you want the model to look up information from an address book, you might add something like this to your prompt:
Numbers
Text to speech engines can sometimes have trouble with numbers, but we can help them by asking the LLM to output numbers in a more voice-friendly format. A pattern that we see that works well is to ask the LLM to separate numbers into individual digits, separated by an ellipisis.
Dates & Times
Similar to numbers, dates and times can be tricky for speech generation, so it can be helpful to provide clearer guidance on how to produce the correct date/time format for effective speech generation.
Jailbreaks
Jailbreaking is where the user engaging with your agent tries to get the agent to do or say things outside the scope of what you’ve designed it to do. There is still no perfect system for preventing jailbreaking, but some simple prompting can make it much harder to jailbreak. Here’s a simple pattern that works well:
Creating More Natural Pauses
If you’d like to create more natural pauses, a simple but effective technique is to ask the model to add an ellipsis between sentences or after punctuation.
Step-by-Step Instructions
Often times in customer support scenarios, you want the LLM to give instructions one at a time. You can achieve this by providing an example or two.