Telephony
Use Ultravox over WebRTC, or make and receive phone calls via Telnyx, Twilio, or Plivo.
The Ultravox API allows you to create AI-powered voice applications that can interact through various protocols:
- WebRTC → Default protocol for browser and mobile applications.
- Regular Phone Numbers → Connect Ultravox to phone calls you make or receive (via Telnyx, Twilio, or Plivo).
- WebSockets → Direct server-to-server integration.
Choosing a Protocol
Choose your integration method based on your needs:
- WebRTC: Best for most integrations, especially for any client deployment (for example, browsers or mobile clients). This is the default. Get started with the Ultravox client SDK.
- Phone: For traditional phone network integration. Ultravox integrates directly with Telnyx, Twilio, and Plivo.
- WebSocket: For server-to-server integration, especially when you already have high-bandwidth connections between your server and clients. Check out the WebSocket Integration guide for more information.
Phone Integration
Ultravox integrates with multiple telephony providers, enabling you to create AI-powered voice applications that can interact through regular phone networks. You can build AI agents that can make outgoing calls and answer incoming calls, opening up possibilities for customer service, automated outreach, and other voice-based AI applications.
Supported Providers
- Twilio → Uses Twilio Media Streams.
- Telnyx → Uses Telnyx Media Streaming.
- Plivo → Uses Plivo AudioStream.
Prerequisites
Make sure you have:
- An active account with your chosen provider (Telnyx, Twilio, or Plivo)
- A phone number purchased from your provider
Connecting Ultravox to a Phone Call
The terminology can be confusing: Ultravox Realtime uses the word “call” to mean a voice session between an AI agent and another party. For phone calls, you accept incoming calls or make outgoing calls via your chosen telephony provider and then connect those calls to an Ultravox call.
Creating an Ultravox call that you can connect with a phone call is similar to creating a WebRTC call, but requires specific parameters in the Create Call command:
- medium: Tells Ultravox which protocol to use. For phone calls, must be set to one of {"telnyx": {}}, {"twilio": {}}, or {"plivo": {}}. Defaults to {"webRtc": {}}.
- firstSpeaker: Tells Ultravox who should speak first. For outgoing calls, typically set to "FIRST_SPEAKER_USER". The default is "FIRST_SPEAKER_AGENT".
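The parameters above can be combined into a Create Call request. The sketch below is a minimal Python version using only the standard library. It assumes the endpoint https://api.ultravox.ai/api/calls, authentication via an X-API-Key header, and a response containing joinUrl; the helper names build_call_body and create_call are hypothetical, so check the Ultravox API reference before relying on them.

```python
import json
import urllib.request

# Assumed endpoint and auth header; confirm against the Ultravox API reference.
ULTRAVOX_CALLS_URL = "https://api.ultravox.ai/api/calls"

PHONE_PROVIDERS = {"telnyx", "twilio", "plivo"}


def build_call_body(system_prompt: str, provider: str, outgoing: bool) -> dict:
    """Build a Create Call body for an Ultravox call that will be bridged to a phone call."""
    if provider not in PHONE_PROVIDERS:
        raise ValueError(f"unsupported telephony provider: {provider!r}")
    return {
        "systemPrompt": system_prompt,
        # e.g. {"telnyx": {}}; if medium were omitted, Ultravox would default to {"webRtc": {}}
        "medium": {provider: {}},
        # On outgoing calls the person who answers usually speaks first.
        "firstSpeaker": "FIRST_SPEAKER_USER" if outgoing else "FIRST_SPEAKER_AGENT",
    }


def create_call(api_key: str, body: dict) -> str:
    """POST the body to Ultravox and return the joinUrl from the JSON response."""
    request = urllib.request.Request(
        ULTRAVOX_CALLS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["joinUrl"]
```

For example, create_call(api_key, build_call_body("You are a helpful agent.", "telnyx", outgoing=True)) would return the joinUrl that the provider-specific steps connect to.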
Provider-Specific Integration
Outgoing Calls with Telnyx
Create an Ultravox Call
Create a new call as shown above with medium: { "telnyx": {} } and firstSpeaker: "FIRST_SPEAKER_USER", and save the joinUrl from the response.
Connect Ultravox to the Telnyx Phone Call
Use the joinUrl as the URL of a TeXML <Stream> for the outbound Telnyx call.
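A minimal sketch of generating that TeXML in Python. It assumes Telnyx's <Connect><Stream> form with bidirectionalMode="rtp" so the agent's audio is played back to the person on the phone; the helper name telnyx_stream_texml is hypothetical, and the attribute names should be checked against the Telnyx TeXML reference.

```python
from xml.sax.saxutils import quoteattr


def telnyx_stream_texml(join_url: str) -> str:
    """Return a TeXML document that streams the phone call's audio to Ultravox."""
    # quoteattr escapes &, <, > and wraps the value in quotes, so query strings
    # in the joinUrl stay valid XML.
    # bidirectionalMode="rtp" is an assumption: it asks Telnyx to play the
    # audio sent back over the stream (the agent's voice) to the callee.
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<Response>\n"
        "  <Connect>\n"
        f'    <Stream url={quoteattr(join_url)} bidirectionalMode="rtp" />\n'
        "  </Connect>\n"
        "</Response>"
    )
```

Your server would return this document from whichever endpoint Telnyx fetches TeXML instructions from for the outbound call.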
Incoming Calls with Telnyx
Create an Ultravox Call
Create a new call with medium: { "telnyx": {} } and firstSpeaker set to "FIRST_SPEAKER_AGENT".
Connect the Inbound Telnyx Call to Ultravox
Use the joinUrl as the URL of a TeXML <Stream> in your response to the inbound Telnyx call.
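The order of operations for an inbound call can be sketched as a single handler: create the Ultravox call first (with the agent speaking first), then answer Telnyx with TeXML that points at the new joinUrl. The function answer_inbound_call and the injected create_ultravox_call callable are hypothetical, and the <Stream> attributes are the same assumption about Telnyx TeXML syntax; wiring this into a web framework is left out.

```python
from typing import Callable
from xml.sax.saxutils import quoteattr


def answer_inbound_call(create_ultravox_call: Callable[[dict], str]) -> str:
    """Handle a Telnyx inbound-call webhook: create an Ultravox call, then return TeXML.

    create_ultravox_call is a hypothetical callable that sends a Create Call
    request and returns the joinUrl from the response.
    """
    join_url = create_ultravox_call({
        "systemPrompt": "You are a friendly receptionist. Greet the caller.",
        "medium": {"telnyx": {}},
        # The caller dialed in, so the agent greets them first.
        "firstSpeaker": "FIRST_SPEAKER_AGENT",
    })
    # Connect the inbound call's audio to Ultravox (attribute choice is an assumption).
    return (
        "<Response><Connect>"
        f'<Stream url={quoteattr(join_url)} bidirectionalMode="rtp" />'
        "</Connect></Response>"
    )
```

Injecting the call-creation function keeps the handler easy to test without network access; in production it would wrap a real Create Call request, and the returned string would be sent with an XML content type.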
For more details, see the Telnyx documentation.
We currently integrate with Telnyx, Twilio, and Plivo. Please let us know if there’s another integration you’d like to see.
DTMF
Ultravox supports both sending and receiving DTMF (Dual-Tone Multi-Frequency) tones during phone calls. This lets AI agents interact with traditional phone systems and lets you build voice applications that respond to keypad input.
Due to the audio codec used in WebRTC connections, DTMF tones are inaudible when using WebRTC. The playDtmfSounds tool is intended for use with telephony integrations.
Receiving DTMF Tones
Ultravox automatically converts incoming DTMF tones to text, making it easy to build interactive voice applications that respond to keypad input. When a caller presses keys on their phone keypad, the tones are converted to text that your AI agent can understand and respond to.
For example, if a caller presses “5” on their keypad, your agent receives this as text and can respond accordingly.
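Because key presses reach the agent as text, one way to build a keypad menu is simply to describe it in the system prompt. The sketch below assembles such a prompt; the helper keypad_menu_prompt and the prompt wording are hypothetical and illustrative, not a format Ultravox requires.

```python
def keypad_menu_prompt(options: dict[str, str]) -> str:
    """Build a system prompt telling the agent how to react to keypad presses.

    Ultravox converts incoming DTMF tones to text, so the agent only needs
    to be told what each key means.
    """
    lines = [
        "You are a phone menu assistant.",
        "Callers may speak or press keys; treat a key press like a spoken choice.",
    ]
    # Sort keys so the menu is described in a stable order.
    for key, action in sorted(options.items()):
        lines.append(f"If the caller presses {key}, {action}.")
    return "\n".join(lines)
```

The resulting string would be passed as systemPrompt when creating the call.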
Sending DTMF Tones
The built-in playDtmfSounds tool allows your AI agent to send DTMF tones, which is useful for navigating Interactive Voice Response (IVR) systems or other phone trees. To enable the tool, add it to the selectedTools array when creating a call or call stage.
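A minimal Create Call body that enables the tool might look like the following. It assumes built-in tools are selected with a {"toolName": ...} entry; the prompt text and the choice of Twilio are illustrative only.

```python
import json

# Create Call body for an outgoing phone call that enables playDtmfSounds.
# The {"toolName": ...} entry shape is an assumption based on how Ultravox
# selects built-in tools; the prompt wording is illustrative.
call_body = {
    "systemPrompt": "When you reach an automated menu, use playDtmfSounds to choose options.",
    "medium": {"twilio": {}},
    "firstSpeaker": "FIRST_SPEAKER_USER",
    "selectedTools": [
        {"toolName": "playDtmfSounds"},
    ],
}

print(json.dumps(call_body, indent=2))
```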
The playDtmfSounds tool accepts a string parameter named digits and supports the following tones: 0-9, *, #, and A-D.
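If your application supplies digit strings for the agent to dial (an extension number in the prompt, for instance), a check against the supported tone set can catch mistakes early. The helper is_valid_dtmf below is hypothetical and deliberately strict: it accepts only 0-9, *, #, and uppercase A-D, and rejects the empty string.

```python
import re

# Tones supported by playDtmfSounds: 0-9, *, #, A-D (uppercase only in this sketch).
_DTMF_PATTERN = re.compile(r"[0-9*#A-D]+")


def is_valid_dtmf(digits: str) -> bool:
    """Return True if digits is non-empty and every character is a supported tone."""
    return _DTMF_PATTERN.fullmatch(digits) is not None


# The tool's arguments would take this shape, e.g. dialing extension 1234 then #.
example_args = {"digits": "1234#"}
```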
Note: the playDtmfSounds tool also uses an automatic parameter that passes the sample rate of the source audio; treat it as an implementation detail.
Common Use Cases
- Building interactive phone trees or IVR systems
- Creating agents that can navigate existing phone systems
- Enabling quick responses through keypad input
- Collecting numeric input (e.g., account numbers, PIN codes)
- Building hybrid voice/keypad interfaces