Telephony
Use Ultravox to make and receive calls using WebRTC, via Telnyx, Twilio, Plivo, or jambonz.
The Ultravox API allows you to create AI-powered voice applications that can interact through various protocols:
- WebRTC → Default protocol for browser and mobile applications.
- Regular Phone Numbers → Connect Ultravox to phone calls you make or receive (via Telnyx, Twilio, Plivo, or jambonz).
- WebSockets → Direct server-to-server integration.
Choosing a Protocol
Choose your integration method based on your needs:
- WebRTC: Best for most integrations, especially for any client deployment (for example, browsers or mobile clients). This is the default. Get started with the Ultravox client SDK.
- Phone: For traditional phone network integration. Ultravox integrates directly with Telnyx, Twilio, Plivo, and jambonz (bring your own carrier).
- WebSocket: For server-to-server integration, especially when you already have high-bandwidth connections between your server and clients. Check out the WebSocket Integration guide for more information.
Phone Integration
Ultravox integrates with multiple telephony providers, enabling you to create AI-powered voice applications that can interact through regular phone networks. You can build AI agents that can make outgoing calls and answer incoming calls, opening up possibilities for customer service, automated outreach, and other voice-based AI applications.
Supported Providers
- Twilio → Uses Twilio Media Streams.
- Telnyx → Uses Telnyx Media Streaming.
- Plivo → Uses Plivo AudioStream.
- jambonz → Uses the jambonz llm verb and connects to the carrier of your choice.
- Voximplant → See the integration guide.
Prerequisites
Make sure you have:
- An active account with your chosen provider (Telnyx, Twilio, Plivo, or any carrier when connecting via jambonz)
- A phone number purchased from your provider
- A jambonz account (if connecting via jambonz)
Connecting Ultravox to a Phone Call
Ultravox Realtime uses the term “call” for any voice session between an AI agent and another party, which can be confusing in a telephony context. For phone calls, you accept incoming calls or place outgoing calls through your chosen telephony provider, then connect each of them to an Ultravox call.
Creating an Ultravox call that you can connect with a phone call is similar to creating a WebRTC call, but requires specific parameters in the Create Call command:
- medium: Tells Ultravox which protocol to use. For phone calls, must be set to one of {"telnyx": {}}, {"twilio": {}}, or {"plivo": {}}. Defaults to {"webRtc": {}}.
- firstSpeaker: Tells Ultravox who should speak first. For outgoing calls, typically set to "FIRST_SPEAKER_USER". The default is "FIRST_SPEAKER_AGENT".
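As a rough illustration of these two parameters, the sketch below builds a Create Call body and posts it. The endpoint URL, the X-API-Key header, and the systemPrompt field are assumptions not stated in this section, and the helper names are hypothetical; check them against the Ultravox API reference before relying on them.

```python
import json
import urllib.request

# Assumed endpoint; verify against the Ultravox API reference.
ULTRAVOX_CALLS_URL = "https://api.ultravox.ai/api/calls"


def build_call_request(provider: str, outgoing: bool, system_prompt: str) -> dict:
    """Build a Create Call body for a phone call carried by `provider`."""
    if provider not in ("telnyx", "twilio", "plivo"):
        raise ValueError(f"unsupported telephony provider: {provider}")
    return {
        "systemPrompt": system_prompt,  # assumed field name
        # medium selects the protocol; leaving it out gives the webRtc default.
        "medium": {provider: {}},
        # Outgoing calls typically let the user speak first.
        "firstSpeaker": "FIRST_SPEAKER_USER" if outgoing else "FIRST_SPEAKER_AGENT",
    }


def create_call(api_key: str, body: dict) -> str:
    """POST the body and return the joinUrl from the JSON response."""
    request = urllib.request.Request(
        ULTRAVOX_CALLS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["joinUrl"]
```

Keeping the body builder separate from the network call makes the payload easy to inspect before any request is sent.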
jambonz has integrated Ultravox into its llm verb, so you won’t need to create an Ultravox call yourself; this is all done for you. Just follow the instructions in the “Provider-Specific Integration” section below.
Provider-Specific Integration
Outgoing Calls with Telnyx
Create an Ultravox Call
Create a new call as shown above with medium: { "telnyx": {} }, firstSpeaker: "FIRST_SPEAKER_USER", and get a joinUrl.
Connect Ultravox to the Telnyx Phone Call
Use the joinUrl with a TeXML <Stream>.
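A minimal sketch of the TeXML this step refers to, generated from Python. The <Response>/<Connect> wrapper and the bidirectionalMode="rtp" attribute are assumptions based on common TeXML usage, and telnyx_stream_texml is a hypothetical helper name; confirm the attributes in the Telnyx documentation.

```python
import xml.etree.ElementTree as ET


def telnyx_stream_texml(join_url: str) -> str:
    """Return TeXML that streams the phone call's audio to an Ultravox joinUrl."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # bidirectionalMode="rtp" is an assumption; see the Telnyx <Stream> docs.
    ET.SubElement(connect, "Stream", {"url": join_url, "bidirectionalMode": "rtp"})
    # ElementTree escapes characters such as "&" in the URL automatically.
    return ET.tostring(response, encoding="unicode")
```

The resulting string would be supplied to Telnyx when placing the outgoing call.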
Incoming Calls with Telnyx
Create an Ultravox Call
Create a new call with medium: { "telnyx": {} } and firstSpeaker set to "FIRST_SPEAKER_AGENT".
Connect the Inbound Telnyx Call to Ultravox
Use the joinUrl with a TeXML <Stream>.
codec
Telnyx allows setting both codec and bidirectionalCodec. The former controls user audio, while the latter controls agent audio. When used with Ultravox, these must have the same value because Telnyx only reports one of them to Ultravox. If your users are mostly in Europe, you’ll likely want to set both to PCMA. Otherwise, setting both to PCMU is preferred, but leaving both unset is fine to get started.
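One way to keep the two settings in sync is to derive both from a single value. The helper below is a hypothetical sketch: it assumes codec and bidirectionalCodec are passed as <Stream> attributes, and it only accepts the two codecs named above, leaving out any others Telnyx may support.

```python
from typing import Dict, Optional


def telnyx_codec_attributes(codec: Optional[str] = None) -> Dict[str, str]:
    """Return matching codec attributes for a Telnyx <Stream>, or none if unset."""
    if codec is None:
        # Leaving both unset is fine to get started.
        return {}
    if codec not in ("PCMU", "PCMA"):
        raise ValueError(f"this sketch only handles PCMU and PCMA, got {codec!r}")
    # Both values must match because only one of them is reported to Ultravox.
    return {"codec": codec, "bidirectionalCodec": codec}
```

For example, telnyx_codec_attributes("PCMA") could be merged into the attribute dictionary of the <Stream> element for callers who are mostly in Europe.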
For more details, see the Telnyx documentation.
Outgoing Calls with Twilio
Create an Ultravox Call
Create a new call as shown above with medium: { "twilio": {} }, firstSpeaker: "FIRST_SPEAKER_USER", and get a joinUrl.
Connect Ultravox to the Twilio Phone Call
Use the joinUrl with a Twilio <Stream>.
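A minimal sketch of TwiML that connects a call to the joinUrl, built with the standard library. twilio_stream_twiml is a hypothetical helper name, and the joinUrl in the usage note is a placeholder.

```python
import xml.etree.ElementTree as ET


def twilio_stream_twiml(join_url: str) -> str:
    """Return TwiML that connects the phone call to an Ultravox joinUrl."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # <Connect><Stream> opens a bidirectional media stream to the given URL.
    ET.SubElement(connect, "Stream", {"url": join_url})
    return ET.tostring(response, encoding="unicode")
```

With Twilio's Python helper library installed, this string could plausibly be passed as the twiml argument when creating the outgoing call, along with the to and from numbers.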
Incoming Calls with Twilio
Create an Ultravox Call
Create a new call with medium: { "twilio": {} } and firstSpeaker set to "FIRST_SPEAKER_AGENT".
Connect the Inbound Twilio Call to Ultravox
Use the joinUrl with a Twilio <Stream>.
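For inbound calls, the TwiML is usually returned from the webhook that Twilio requests when the call arrives. The sketch below is a small WSGI application under that assumption; create_join_url is a hypothetical callable that creates the Ultravox call and returns its joinUrl.

```python
from typing import Callable
from xml.sax.saxutils import quoteattr


def make_inbound_app(create_join_url: Callable[[], str]):
    """Build a WSGI app that answers the Twilio voice webhook with TwiML."""

    def app(environ, start_response):
        # Create a fresh Ultravox call for every inbound phone call.
        join_url = create_join_url()
        # quoteattr escapes the URL and adds the surrounding quotes.
        twiml = (
            "<Response><Connect>"
            f"<Stream url={quoteattr(join_url)} />"
            "</Connect></Response>"
        )
        body = twiml.encode("utf-8")
        headers = [("Content-Type", "text/xml"), ("Content-Length", str(len(body)))]
        start_response("200 OK", headers)
        return [body]

    return app
```

For local experiments, the app can be served with wsgiref.simple_server.make_server from the standard library and exposed to Twilio through a public URL.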
For more details, see the Twilio documentation.
Outgoing Calls with Plivo
Create an Ultravox Call
Create a new call as shown above with medium: { "plivo": {} }, firstSpeaker: "FIRST_SPEAKER_USER", and get a joinUrl.
Connect Ultravox to the Plivo Phone Call
Use the joinUrl with AudioStream. The answer URL should return the AudioStream XML that points to the joinUrl.
Note: For best audio quality, we recommend audio/x-l16;rate=16000. However, any contentType supported by Plivo will work with Ultravox.
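A sketch of what the answer URL might return, using the recommended contentType. The keepCallAlive and bidirectional attributes, and placing the URL as the element text, are assumptions about Plivo's <Stream> XML; plivo_stream_xml is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def plivo_stream_xml(join_url: str, content_type: str = "audio/x-l16;rate=16000") -> str:
    """Return Plivo XML that streams call audio to an Ultravox joinUrl."""
    response = ET.Element("Response")
    stream = ET.SubElement(
        response,
        "Stream",
        {
            # Assumed attributes; confirm them in the Plivo AudioStream docs.
            "keepCallAlive": "true",
            "bidirectional": "true",
            "contentType": content_type,
        },
    )
    # Assumed: Plivo reads the WebSocket URL from the element text.
    stream.text = join_url
    return ET.tostring(response, encoding="unicode")
```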
Incoming Calls with Plivo
Create an Ultravox Call
Create a new call with medium: { "plivo": {} } and firstSpeaker set to "FIRST_SPEAKER_AGENT".
Connect the Inbound Plivo Call to Ultravox
Use the joinUrl with AudioStream.
For more details, see the Plivo documentation.
jambonz Portal Setup
jambonz is a “bring your own everything” open-source telephony platform that integrates Ultravox directly via its llm verb. This lets you use the carrier of your choice; you just need to add it in your jambonz dashboard.
Add Your Carrier in jambonz
In jambonz, we use the terms “carrier” and “SIP trunk” interchangeably. jambonz is a “bring your own carrier” platform, which means that you can connect any SIP network provider or device. Add your carrier of choice in your jambonz dashboard to get started.
Add a Speech Provider in jambonz
Next, you need to add speech credentials for your chosen vendor.
Create a New jambonz Application
A jambonz application configured via the jambonz portal defines how calls are handled by linking them to your custom logic through webhooks or WebSocket endpoints. When you create an application, you specify:
- Call webhook URL: Where jambonz sends call events.
- Call status webhook URL: For receiving call status updates.
- Speech vendors: Your chosen TTS/STT providers.
Once saved, you can associate phone numbers or SIP trunks with this application, ensuring that incoming calls are routed to your specified logic. This setup allows you to implement features like speech recognition, text-to-speech, call routing, and integration with AI services.
Add a Phone Number in jambonz
Finally, you need to add a phone number provisioned from your carrier of choice. At the bottom of the page select the jambonz application you just created to link your new virtual number to that application.
Incoming Calls with jambonz
For more details, see the llm verb in the jambonz docs.
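For an incoming call, the application's call webhook would answer with an llm verb. The sketch below shows one plausible payload; every field name other than the verb itself (vendor, model, auth, llmOptions and its contents) is an assumption to be checked against the jambonz docs, and jambonz_llm_verbs is a hypothetical helper name.

```python
import json
from typing import Dict, List


def jambonz_llm_verbs(api_key: str, system_prompt: str) -> List[Dict]:
    """Return the verb list a jambonz call webhook could respond with."""
    return [
        {
            "verb": "llm",
            "vendor": "ultravox",  # assumed vendor identifier
            "model": "fixie-ai/ultravox",  # assumed model name
            "auth": {"apiKey": api_key},  # assumed auth shape
            "llmOptions": {
                "systemPrompt": system_prompt,
                # The agent speaks first on inbound calls.
                "firstSpeaker": "FIRST_SPEAKER_AGENT",
            },
        }
    ]


def webhook_response_body(api_key: str, system_prompt: str) -> str:
    """Serialize the verb list as the JSON body of the webhook response."""
    return json.dumps(jambonz_llm_verbs(api_key, system_prompt))
```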
Outgoing Calls with jambonz
In addition to the inbound scenario, you’ll have to create a call that connects to the destination number (phoneNumber) and points to the jambonz application that defines how the call should be handled. Find the APPLICATION_SID in the jambonz portal by clicking on the application you created during the setup process.
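A sketch of the request that would start such an outgoing call. The REST path (/v1/Accounts/{account_sid}/Calls) and the body field names are assumptions about the jambonz REST API, and base_url, account_sid, and from_number are hypothetical inputs not mentioned above.

```python
from typing import Dict, Tuple


def jambonz_outbound_call(
    base_url: str,
    account_sid: str,
    application_sid: str,
    from_number: str,
    phone_number: str,
) -> Tuple[str, Dict]:
    """Return the (url, json_body) pair for creating an outgoing jambonz call."""
    # Assumed path; check the jambonz REST API reference.
    url = f"{base_url.rstrip('/')}/v1/Accounts/{account_sid}/Calls"
    body = {
        # APPLICATION_SID of the application that handles the call.
        "application_sid": application_sid,
        "from": from_number,
        # Destination number (phoneNumber) to dial.
        "to": {"type": "phone", "number": phone_number},
    }
    return url, body
```

The pair would then be sent as an authenticated POST request with whichever HTTP client you prefer.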
For more details, see the jambonz documentation and example code.
We currently integrate with Telnyx, Twilio, Plivo, and jambonz (bring your own carrier voice platform). Please let us know if there’s another integration you’d like to see.
DTMF
Ultravox provides comprehensive support for DTMF (Dual-Tone Multi-Frequency) tones, enabling both sending and receiving tones during phone calls. This enables AI agents to interact with traditional phone systems and allows you to build voice applications that can respond to keypad inputs.
Due to the audio codec used in WebRTC connections, DTMF tones are inaudible when using WebRTC. The playDtmfSounds tool is intended for use with telephony integrations.
Receiving DTMF Tones
Ultravox automatically converts incoming DTMF tones to text, making it easy to build interactive voice applications that respond to keypad input. When a caller presses keys on their phone keypad, the tones are converted to text that your AI agent can understand and respond to.
For example, if a caller presses “5” on their keypad, your agent will receive this as text and can respond accordingly.
Sending DTMF Tones
The built-in playDtmfSounds tool allows your AI agent to send DTMF tones, which is useful for navigating Interactive Voice Response (IVR) systems or other phone trees. To enable the tool, add it to the selectedTools array when creating a call or call stage.
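As a sketch, enabling the tool amounts to adding one entry to selectedTools in the call body. The toolName key for referencing a built-in tool is an assumption here, and with_dtmf_tool is a hypothetical helper name.

```python
def with_dtmf_tool(call_body: dict) -> dict:
    """Return a copy of a call body with playDtmfSounds in selectedTools."""
    tools = list(call_body.get("selectedTools", []))
    # Avoid adding the tool twice if it is already selected.
    if not any(tool.get("toolName") == "playDtmfSounds" for tool in tools):
        tools.append({"toolName": "playDtmfSounds"})  # assumed entry shape
    # The original dictionary is left unmodified.
    return {**call_body, "selectedTools": tools}
```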
The playDtmfSounds tool accepts a string parameter named digits and works with the following tones: 0-9, *, #, A-D.
For example:
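A minimal sketch of the arguments the tool would receive, with a check against the supported tones. dtmf_arguments is a hypothetical helper for illustration and is not part of Ultravox; the agent normally supplies digits itself.

```python
import re

# Supported tones: 0-9, *, #, and A-D.
_DTMF_PATTERN = re.compile(r"[0-9*#A-D]+")


def dtmf_arguments(digits: str) -> dict:
    """Validate a tone sequence and wrap it as playDtmfSounds arguments."""
    if not _DTMF_PATTERN.fullmatch(digits):
        raise ValueError(f"unsupported DTMF sequence: {digits!r}")
    return {"digits": digits}
```

For instance, dtmf_arguments("1234#") yields the arguments for entering 1234 followed by the pound key, while a sequence containing any other character is rejected.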
Note: The playDtmfSounds tool uses an automatic parameter that sends the proper sample rate of the source audio. Treat this parameter as an implementation detail.
Common Use Cases
- Building interactive phone trees or IVR systems
- Creating agents that can navigate existing phone systems
- Enabling quick responses through keypad input
- Collecting numeric input (e.g., account numbers, PIN codes)
- Building hybrid voice/keypad interfaces