Introduction

As we say in our guide, it’s all about prompting. Prompting is how we get Voice AI agents to do what we want, but all models work a little bit differently. Under the hood, the Realtime platform runs a version of the Ultravox model built on Llama 3.3 70B, so we recommend looking at the Llama Prompting Guides as a good starting point.

Below, we try to lay out the core patterns that we see working well at scale, but you’ll probably need to customize these approaches based on your particular use case.

Remember that prompting is the most effective tool we have for controlling LLMs. In the majority of cases, the answer to “How do I get the model to do X?” is “You need to prompt it to do X.”

Default Prompting Note: Unlike many other Voice AI offerings, Ultravox does not append a default prompt to your input. This means that you should always provide a complete prompt, including any context or information that you want the model to consider. We do this to ensure you have full control over what the model does. We don’t want anything hidden from view.

Prompt As-If It’s a Text-Based LLM

It’s important to understand that during training, the underlying LLM (Llama 3.3 70B in our default case) is frozen. This means that you should prompt the model as though it’s a text model. For most scenarios, we recommend telling the model at the top of your prompt that you’re talking to it as a voice model.

Here’s an example that works well:

Example: General Voice Prompt
You are [Name], a friendly AI [customer service agent / helper / etc].
You're interacting with the user over voice, so speak casually.
Keep your responses short and to the point, much like someone would in dialogue. 
Since this is a voice conversation, do not use lists, bullets, emojis, or other 
things that do not translate to voice. In addition, do not use stage directions 
or otherwise engage in action-based roleplay (e.g., "(pauses), "*laughs").

General Guidance

  • Start simple: It’s best to always start simple and then add complexity as needed. Begin by outlining in a few paragraphs what you want the model to do. Then have a chat with it, see where it needs work, and then iterate from there.

  • Be clear: Llama is a very literal instruction follower. So if you want the model to do something, you need to be very clear about it. If you’re trying to write a set of step-by-step flows, be sure to break them down into very clear, concise steps. Note that you don’t HAVE to provide clear step-by-step instructions. General guidance works very well if you’re looking for a more conversational output (but the model will exert more control in driving the conversation).

  • Use examples: The model learns very well from examples. So after describing the high-level flow you want the model to follow, it can be helpful to provide a few examples of what you’re looking for.

  • Iterate: Prompting is an iterative process. You’ll need to prompt the model, see how it does, and then adjust your prompts based on the results. It can take time to get things right, so be patient.

Common Prompting Patterns

This section includes patterns and example prompts for dealing with common challenges.

Tools

Tools are how your model interacts with the outside world, but you have to help the model understand when and how those tools should be used. Here’s a good pattern for prompting the model to use a tool:

1

Use Good Tool Definitions

It’s critical to remember that the entire tool definition is seen by the LLM, and the LLM will use those definitions to guide its behavior. Make sure that your tool definitions are clear and concise.

2

Prompt for More Context

Give the LLM additional context or guidance on whenever the tool should be used. For example, if you want the model to look up information from an address book, you might add something like this to your prompt:

Example: Providing More Tool Context
You have access to an address book that contains personnel information. 
If someone asks for information for a particular person, you MUST use 
the lookUpAddressBook tool to find that information before replying.

Numbers

Text to speech engines can sometimes have trouble with numbers, but we can help them by asking the LLM to output numbers in a more voice-friendly format. A pattern that we see that works well is to ask the LLM to separate numbers into individual digits, separated by an ellipisis.

Example: Speaking Numbers
When you're reading out numbers, please separate them into individual digits, 
separated by an ellipsis. For example, the number 1234 should be read 
as "1...2...3...4".

Dates & Times

Similar to numbers, dates and times can be tricky for speech generation, so it can be helpful to provide clearer guidance on how to produce the correct date/time format for effective speech generation.

Example: Reading Out Dates
When you're outputting dates, output them as individual components. For example, 
the date 12/25/2022 should be read as "December 25th 2022". For times, 
"10:00 AM" should be outputted as "10 AM".

Jailbreaks

Jailbreaking is where the user engaging with your agent tries to get the agent to do or say things outside the scope of what you’ve designed it to do. There is still no perfect system for preventing jailbreaking, but some simple prompting can make it much harder to jailbreak. Here’s a simple pattern that works well:

Example: Minimizing Jailbreaking
Your only job is to [primary job of your agent]. If someone asks you a question 
that is not related to [the thing you're asking the model to do], politely 
decline and redirect the conversation back to the task at hand.

Creating More Natural Pauses

If you’d like to create more natural pauses, a simple but effective technique is to ask the model to add an ellipsis between sentences or after punctuation.

Example: Speaking with Pauses
You want to speak slowly and clearly, so you must inject pauses between sentences. 
Do this by emitting "..." at the end of a sentence but before any final 
punctuation (e.g., “Wow, that's really interesting… can you tell me a bit more 
about that…?”. You should do this more when the topic is complex or requires 
special attention.

Step-by-Step Instructions

Often times in customer support scenarios, you want the LLM to give instructions one at a time. You can achieve this by providing an example or two.

Example: Step-by-Step Help
Example: User asks for help changing their password
- You will call the "searchArticle" tool
- Response from tool: {"content": "1. Click "Forgot Password" on the login screen 2. Enter your email address and click "Submit" 3. Check your email for the reset link 4. Click the link and enter your new password 5. Log in with your new password"}
- You will then use this information and proceed step-by-step with the user like this:
  * agent: "There are a few steps we need to go through."
  * agent: "The first step is Click on Forgot Password on the login screen. Let me know when you're there."
  * user: "OK done."
  * agent: "Great. next you need to enter your email address and click Submit."
  * user: "got it."
  * agent: "Now check your email for the reset link"
  * user: "uh huh."
- Repeat in this manner until you complete the entire process.