The Ultravox API allows you to create AI-powered voice applications that can interact through various protocols:

  • WebRTC → Default protocol for browser and mobile applications.
  • Regular Phone Numbers → Connect Ultravox to phone calls you make or receive (via Telnyx, Twilio, or Plivo).
  • WebSockets → Direct server-to-server integration.

Choosing a Protocol

Choose your integration method based on your needs:

  • WebRTC: Best for most integrations, especially for any client deployment (for example, browsers or mobile clients). This is the default. Get started with the Ultravox client SDK.
  • Phone: For traditional phone network integration. Ultravox integrates directly with Telnyx, Twilio, and Plivo.
  • WebSocket: For server-to-server integration, especially when you already have high-bandwidth connections between your server and clients. Check out the WebSocket Integration guide for more information.

Phone Integration

Ultravox integrates with multiple telephony providers, enabling you to create AI-powered voice applications that can interact through regular phone networks. You can build AI agents that can make outgoing calls and answer incoming calls, opening up possibilities for customer service, automated outreach, and other voice-based AI applications.

Prerequisites

Make sure you have:

  1. An active account with your chosen provider (Telnyx, Twilio, or Plivo)
  2. A phone number purchased from your provider

Connecting Ultravox to a Phone Call

“Calls” is Overloaded

The terminology can be confusing: Ultravox Realtime uses the concept of a “call” to mean a voice session between an AI agent and another party. For phone calls, you accept incoming or make outgoing calls through your chosen telephony provider and then connect those calls to an Ultravox call.

Creating an Ultravox call that you can connect with a phone call is similar to creating a WebRTC call, but requires specific parameters in the Create Call command:

medium (object, default: {"webRtc": {}})

Tells Ultravox which protocol to use. For phone calls, must be set to one of:

{"telnyx": {}}, {"twilio": {}}, or {"plivo": {}}.

Defaults to {"webRtc": {}}.

firstSpeaker (string, default: "FIRST_SPEAKER_AGENT")

Tells Ultravox who should speak first.

For outgoing calls, typically set to "FIRST_SPEAKER_USER".

The default is "FIRST_SPEAKER_AGENT".

Example: Outgoing Call via Telnyx
{
  "systemPrompt": "You are a helpful assistant...",
  ...
  "medium": {
    "telnyx": {} // or "twilio": {} or "plivo": {}
  },
  "firstSpeaker": "FIRST_SPEAKER_USER"
}
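The two parameters above can be assembled programmatically. The following is a minimal sketch, not part of any Ultravox SDK: the helper name `buildPhoneCallBody` and its `direction` argument are hypothetical, and the firstSpeaker choice simply follows the guidance in this section (user first for outgoing calls, agent first for incoming calls).

```javascript
// Hypothetical helper for illustration only; not an Ultravox API.
const PROVIDERS = ["telnyx", "twilio", "plivo"];

function buildPhoneCallBody(systemPrompt, provider, direction) {
  if (!PROVIDERS.includes(provider)) {
    throw new Error(`Unsupported provider: ${provider}`);
  }
  if (direction !== "outgoing" && direction !== "incoming") {
    throw new Error(`Unknown direction: ${direction}`);
  }
  return {
    systemPrompt,
    // Computed key yields {"telnyx": {}}, {"twilio": {}}, or {"plivo": {}}.
    medium: { [provider]: {} },
    // Outgoing calls typically let the person who answers speak first.
    firstSpeaker:
      direction === "outgoing" ? "FIRST_SPEAKER_USER" : "FIRST_SPEAKER_AGENT",
  };
}

const body = buildPhoneCallBody("You are a helpful assistant...", "telnyx", "outgoing");
console.log(JSON.stringify(body, null, 2));
```

The resulting object would be sent as the JSON body of the Create Call request; any other fields your call needs are added alongside these three.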

Provider-Specific Integration

Outgoing Calls with Telnyx

1. Create an Ultravox Call

Create a new call as shown above with medium: { "telnyx": {} }, firstSpeaker: "FIRST_SPEAKER_USER", and get a joinUrl.

2. Connect Ultravox to the Telnyx Phone Call

Pass the joinUrl as the stream URL when you place the Telnyx call:

// Example using the telnyx node library
const call = await telnyx.calls.create({
  connection_id: "uuid",
  to: phoneNumber,
  from: telnyxPhoneNumber,
  stream_url: joinUrl,
  stream_track: "both_tracks"
});

Or using TeXML:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="${joinUrl}" />
  </Connect>
</Response>

Incoming Calls with Telnyx

1. Create an Ultravox Call

Create a new call with medium: { "telnyx": {} } and firstSpeaker set to "FIRST_SPEAKER_AGENT", and get a joinUrl.

2. Connect the Inbound Telnyx Call to Ultravox

Use the joinUrl with a TeXML <Stream>:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="${joinUrl}" />
  </Connect>
</Response>
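If you generate the TeXML response in code, the joinUrl has to be escaped before it is placed in the url attribute, since a URL with a query string may contain "&". The sketch below shows one way to do that; `escapeXmlAttr` and `buildStreamTexml` are hypothetical helper names, and the example URL is a placeholder rather than a real joinUrl.

```javascript
// Escape characters that cannot appear verbatim in an XML attribute value.
function escapeXmlAttr(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: renders the TeXML document shown above for a given joinUrl.
function buildStreamTexml(joinUrl) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="${escapeXmlAttr(joinUrl)}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}

// Placeholder URL for illustration; use the joinUrl returned for your call.
console.log(buildStreamTexml("wss://example.invalid/join?a=1&b=2"));
```

Your webhook handler would return this string as the body of its response to the provider.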

For more details, see the Telnyx documentation.

Provider Support

We currently integrate with Telnyx, Twilio, and Plivo. Please let us know if there’s another integration you’d like to see.

DTMF

Ultravox supports both sending and receiving DTMF (Dual-Tone Multi-Frequency) tones during phone calls. AI agents can use them to interact with traditional phone systems, and you can build voice applications that respond to keypad input.

DTMF and WebRTC

Due to the audio codec used in WebRTC connections, DTMF tones are inaudible when using WebRTC. The playDtmfSounds tool is intended for use with telephony integrations.

Receiving DTMF Tones

Ultravox automatically converts incoming DTMF tones to text, making it easy to build interactive voice applications that respond to keypad input. When a caller presses keys on their phone keypad, the tones are converted to text that your AI agent can understand and respond to.

For example, if a caller presses “5” on their keypad, your agent will receive this as text and can respond accordingly:

// Example system prompt for an agent that handles DTMF input
{
  "systemPrompt": `You are an automated phone system.
    When a caller joins, say: "Welcome! Press 1 for sales, 2 for support, or 3 for billing."
    If they press 1, transfer them to sales using the transfer tool.
    If they press 2, transfer them to support.
    If they press 3, transfer them to billing.
    If they press any other key, ask them to try again with a valid option.`
}

Sending DTMF Tones

The built-in playDtmfSounds tool allows your AI agent to send DTMF tones, which is useful for navigating Interactive Voice Response (IVR) systems or other phone trees. To enable the tool, add it to the selectedTools array when creating a call or call stage:

// Example request body for creating a call with DTMF capability
{
  "systemPrompt": "You are a helpful assistant. When prompted to dial an extension, use the 'playDtmfSounds' tool to send the appropriate tones.",
  "selectedTools": [
    { "toolName": "playDtmfSounds" }
  ]
}

The playDtmfSounds tool accepts a string parameter named digits and works with the following tones: 0-9, *, #, A-D.

For example:

// Example of using the playDtmfSounds tool to dial an extension
{
  "digits": "123#"  // Will play tones for 1, 2, 3, and # in sequence
}
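If your application builds the digits string itself, it can be checked against the tone set listed above before use. This is a minimal sketch: `isValidDtmfDigits` is a hypothetical helper, and it assumes only the uppercase letters A-D are accepted, since the text does not say whether lowercase is allowed.

```javascript
// Tones listed in this section: 0-9, *, #, A-D (uppercase assumed).
const DTMF_PATTERN = /^[0-9*#A-D]+$/;

// Hypothetical helper: true only for a non-empty string of supported tones.
function isValidDtmfDigits(digits) {
  return typeof digits === "string" && DTMF_PATTERN.test(digits);
}

console.log(isValidDtmfDigits("123#"));
console.log(isValidDtmfDigits("12E"));
```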

Note: the playDtmfSounds tool also uses an automatic parameter that supplies the sample rate of the source audio. Treat this parameter as an implementation detail.

Common Use Cases

  • Building interactive phone trees or IVR systems
  • Creating agents that can navigate existing phone systems
  • Enabling quick responses through keypad input
  • Collecting numeric input (e.g., account numbers, PIN codes)
  • Building hybrid voice/keypad interfaces