Connection Options: WebRTC, Telephony, and WebSockets

The Ultravox API allows you to create AI-powered voice applications that can interact through various protocols:

  • WebRTC → Default protocol for browser and mobile applications.
  • Regular Phone Numbers → Receive incoming or make outgoing phone calls (via Telnyx, Twilio, or Plivo).
  • WebSockets → Direct server-to-server integration.

Choosing a Protocol

Choose your integration method based on your needs:

  • WebRTC: Best for most integrations, especially for any client deployment (for example, browsers or mobile clients). This is the default. Get started with the Ultravox client SDK.
  • Phone: For traditional phone network integration. Ultravox integrates directly with Telnyx, Twilio, and Plivo.
  • WebSocket: For server-to-server integration, especially when you already have high-bandwidth connections between your server and clients.

Phone Integration

Ultravox integrates with multiple telephony providers, enabling you to create AI-powered voice applications that can interact through regular phone networks. You can build AI agents that make outgoing calls and answer incoming calls, opening up possibilities for customer service, automated outreach, and other voice-based AI applications.

Supported Providers

Ultravox integrates with three telephony providers: Telnyx, Twilio, and Plivo.

Prerequisites

To place or receive phone calls you need an Ultravox API key, plus an account and phone number with one of the supported providers.

Creating a Phone Call

Creating an Ultravox call that works with phone integration is similar to creating a WebRTC call, but requires specific parameters in the Create Call command:

  • medium (object): Tells Ultravox which protocol to use. For phone calls, must be set to one of: {"telnyx": {}}, {"twilio": {}}, or {"plivo": {}}. Defaults to {"webRtc": {}}.
  • firstSpeaker (string): Tells Ultravox who should speak first. For outgoing calls, typically set to "FIRST_SPEAKER_USER". The default is "FIRST_SPEAKER_AGENT".

Example request body for an outgoing phone call:

{
  "systemPrompt": "You are a helpful assistant...",
  ...
  "medium": {
    "telnyx": {}  // or "twilio": {} or "plivo": {}
  },
  "firstSpeaker": "FIRST_SPEAKER_USER"
}

Provider-Specific Integration

Outgoing Calls with Telnyx

  1. Create an Ultravox Call → Create a new call as shown above with medium: { "telnyx": {} }, firstSpeaker: "FIRST_SPEAKER_USER", and get a joinUrl (see the combined sketch after these steps).

  2. Initiate Telnyx Phone Call → Pass the joinUrl as the call's stream_url, or connect it with a TeXML <Stream>:

    // Example using the telnyx node library
    const call = await telnyx.calls.create({
      connection_id: "uuid",
      to: phoneNumber,
      from: telnyxPhoneNumber,
      stream_url: joinUrl,
      stream_track: "both_tracks"
    });

    Or using TeXML:

    <?xml version="1.0" encoding="UTF-8"?>
    <Response>
      <Connect>
        <Stream url="${joinUrl}" />
      </Connect>
    </Response>
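
Putting the two steps above together, a minimal outgoing-call sketch in Node might look like the following. The Create Call request mirrors the examples on this page; ULTRAVOX_API_KEY, TELNYX_CONNECTION_ID, destinationNumber, and telnyxPhoneNumber are placeholders, and the telnyx client is assumed to already be initialized with your Telnyx API key.

// 1. Create the Ultravox call and get its joinUrl.
const response = await fetch('https://api.ultravox.ai/api/calls', {
  method: 'POST',
  headers: {
    'X-API-Key': process.env.ULTRAVOX_API_KEY,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    systemPrompt: "You are a helpful assistant...",
    medium: { telnyx: {} },
    firstSpeaker: "FIRST_SPEAKER_USER"
  })
});
const { joinUrl } = await response.json();

// 2. Dial out through Telnyx, streaming the call audio to the joinUrl.
const call = await telnyx.calls.create({
  connection_id: process.env.TELNYX_CONNECTION_ID,
  to: destinationNumber,
  from: telnyxPhoneNumber,
  stream_url: joinUrl,
  stream_track: "both_tracks"
});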

Incoming Calls with Telnyx

  1. Create an Ultravox Call → Create a new call with medium: { "telnyx": {} } and firstSpeaker set to "FIRST_SPEAKER_AGENT".

  2. Handle Inbound Call → Use the joinUrl with a TeXML <Stream> (a webhook sketch follows these steps):

    <?xml version="1.0" encoding="UTF-8"?>
    <Response>
      <Connect>
        <Stream url="${joinUrl}" />
      </Connect>
    </Response>
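
In practice, both steps happen in the webhook that your Telnyx number (via a TeXML application) points at: the handler creates the Ultravox call, then answers with TeXML that streams the caller's audio to the joinUrl. Below is a minimal sketch using Express; the framework, route path, body parsing, and ULTRAVOX_API_KEY are assumptions, and the Create Call request mirrors the examples on this page.

import express from 'express';

const app = express();
app.use(express.urlencoded({ extended: true }));  // webhook payload parsing (assumption)

app.post('/telnyx/inbound', async (req, res) => {
  // Create the Ultravox call for this inbound phone call.
  const response = await fetch('https://api.ultravox.ai/api/calls', {
    method: 'POST',
    headers: {
      'X-API-Key': process.env.ULTRAVOX_API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      systemPrompt: "You are a helpful assistant...",
      medium: { telnyx: {} },
      firstSpeaker: "FIRST_SPEAKER_AGENT"
    })
  });
  const { joinUrl } = await response.json();

  // Answer with TeXML that connects the caller's audio to the Ultravox call.
  res.type('text/xml').send(`<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="${joinUrl}" />
  </Connect>
</Response>`);
});

app.listen(3000);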

For more details, see the Telnyx documentation.

WebSocket Integration

Creating a WebSocket Call

Creating a WebSocket-based call with Ultravox requires setting medium to serverWebSocket and passing in parameters for sample rates and buffer size.

  • inputSampleRate (required): Sample rate for input (user) audio (e.g., 48000).
  • outputSampleRate (optional): Sample rate for output (agent) audio (defaults to inputSampleRate).
  • clientBufferSizeMs (optional): Size of the client-side audio buffer in milliseconds. Smaller buffers allow for faster interruptions but may cause audio underflow if network latency fluctuates too greatly. For the best of both worlds, set this to a large value (e.g., 30000) and implement support for PlaybackClearBuffer messages (a handling sketch appears after the joining example below). Defaults to 60.

Example: Creating an Ultravox Call with WebSockets

const response = await fetch('https://api.ultravox.ai/api/calls', {
  method: 'POST',
  headers: {
    'X-API-Key': 'your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    systemPrompt: "You are a helpful assistant...",
    model: "fixie-ai/ultravox",
    voice: "Mark",
    medium: {
      serverWebSocket: {
        inputSampleRate: 48000,
        outputSampleRate: 48000,
        clientBufferSizeMs: 30000
      }
    }
  })
});
const { joinUrl } = await response.json();

Example: Joining a Call with WebSockets

See Data Messages for more information on all available messages.

import asyncio
import websockets

async def _send_audio(socket: websockets.WebSocketClientProtocol):
    async for chunk in some_audio_source:
        # chunk should be a bytes object containing s16le PCM audio from the user
        await socket.send(chunk)

socket = await websockets.connect(join_url)
audio_send_task = asyncio.create_task(_send_audio(socket))
async for message in socket:
    if isinstance(message, bytes):
        ...  # Handle agent audio data
    else:
        ...  # Handle a data message (e.g. PlaybackClearBuffer); see "Data Messages"
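
The same loop sketched in Node, using the ws package (an assumption; any WebSocket client works). It also shows the PlaybackClearBuffer handling recommended above when clientBufferSizeMs is large: the exact message type string used here is an assumption, so confirm it against the Data Messages reference.

import WebSocket from 'ws';

const socket = new WebSocket(joinUrl);
const playbackQueue = [];  // stand-in for your real agent-audio playback buffer

socket.on('open', () => {
  // Start streaming user audio here: socket.send(chunk) with s16le PCM bytes.
});

socket.on('message', (data, isBinary) => {
  if (isBinary) {
    // Agent audio data to queue for playback.
    playbackQueue.push(data);
  } else {
    const msg = JSON.parse(data.toString());
    // Assumption: PlaybackClearBuffer arrives with this type value; see "Data Messages".
    if (msg.type === 'playback_clear_buffer') {
      playbackQueue.length = 0;  // drop queued agent audio so interruptions cut in immediately
    }
  }
});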