How to scale call volume with Ultravox.
Plan Type | Concurrency Cap | Priority Access |
---|---|---|
Free / PAYGO | 5 calls | ❌ |
Pro | No hard cap* | ❌ |
Scale | No hard cap* | ✅ Up to 100 |
*Still subject to infra limits under extreme load.
When a new call cannot be accepted, the API responds with an HTTP 429 status code (“Too Many Requests”) so you know to try again after a short wait.
This system is designed to help customers scale without having to overpay for concurrency. Most customers don’t need the same amount of concurrency 24 hours a day. Ultravox Realtime is designed to scale with you, and we balance load with 429s to keep the system fair for everyone. More on how to handle 429s below.
For customers that have high, sustained load, we offer priority call concurrency on our Scale plan. We also offer dedicated capacity as part of our enterprise plans.
Time | Active Calls | New Request | Status | Concurrent Count |
---|---|---|---|---|
0s | - | Create Call 1 | ✅ Success | 1/5 |
2s | Call 1 | Create Call 2 | ✅ Success | 2/5 |
3s | Call 1,2 | Create Call 3 | ✅ Success | 3/5 |
4s | Call 1,2,3 | Create Call 4 | ✅ Success | 4/5 |
5s | Call 1,2,3,4 | Create Call 5 | ✅ Success | 5/5 (At Limit) |
6s | Call 1,2,3,4,5 | Create Call 6 | ❌ HTTP 429 | 5/5 (Rejected) |
7s | Call 2,3,4,5 | Create Call 7 | ❌ HTTP 429 | 5/5 (Rejected) |
8s | Call 2,3,5 | Create Call 8 | ✅ Success | 4/5 |
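The admission rule the table illustrates can be modeled in a few lines. This is a simplified client-side sketch, assuming the cap is a plain count of active calls; `CallSlots` and its methods are hypothetical names, not part of the Ultravox API, and the server's own accounting may differ.

```python
import threading


class CallSlots:
    """Hypothetical client-side model of a concurrency cap (not an Ultravox API)."""

    def __init__(self, cap: int = 5) -> None:
        self._cap = cap
        self._active: set[str] = set()
        self._lock = threading.Lock()

    def try_start(self, call_id: str) -> bool:
        """Reserve a slot; False is the local analogue of an HTTP 429."""
        with self._lock:
            if len(self._active) >= self._cap:
                return False
            self._active.add(call_id)
            return True

    def end(self, call_id: str) -> None:
        """Release the slot held by a finished call (no-op if unknown)."""
        with self._lock:
            self._active.discard(call_id)

    @property
    def active(self) -> int:
        """Number of calls currently holding a slot."""
        with self._lock:
            return len(self._active)
```

Tracking slots locally like this can reduce the number of 429 responses you trigger, but it does not replace handling them, since the service can still reject requests under load.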
When you receive a 429 response, use the Retry-After header to implement a proper retry strategy and avoid overwhelming the system. The Retry-After header gives the number of seconds to wait before making any new requests.
Here’s an example of how to do that with exponential backoff and retry handling:
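A minimal Python sketch of this pattern, under stated assumptions: `send` is a hypothetical callable that performs the call-creation request with whatever HTTP client you use and returns `(status_code, headers, body)`. The function names, the 1 second base delay, the 30 second ceiling, and the attempt limit are illustrative choices, not values documented by Ultravox.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# Hypothetical transport result: (status_code, headers, body).
Response = Tuple[int, Mapping[str, str], bytes]


def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before the next attempt (attempt is zero-based)."""
    # Header names are case-insensitive, so compare in lowercase.
    value = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
    if value is not None:
        try:
            return max(0.0, float(value))  # the server's hint takes priority
        except ValueError:
            pass  # not a number of seconds; fall back to backoff
    # Exponential backoff with full jitter: random wait in [0, min(cap, base * 2^attempt)].
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def create_call_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                           sleep: Callable[[float], None] = time.sleep) -> Tuple[int, bytes]:
    """Call send() until it returns something other than 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(retry_delay(headers, attempt))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

The jitter spreads retries from many clients over time so they do not all return at the same instant. Passing `sleep` as a parameter keeps the loop easy to test without real waiting.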
(Expected July 2025)