Install the Ultravox client SDK for your platform:

| SDK | Install command |
|---|---|
| Flutter | `flutter pub add ultravox_client` |
| JavaScript | `npm install ultravox-client` |
| Kotlin | |
| Python | `pip install ultravox-client` |
The Ultravox REST API is used to create calls, but you must use one of the Ultravox client SDKs to join and end calls. This page primarily uses JavaScript examples; the concepts are the same across all of the SDK implementations.
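For context, calls are typically created server-side via the REST API, and the response includes the `joinUrl` the client SDK needs. A minimal sketch, assuming the standard create-call endpoint and `X-API-Key` header (the `systemPrompt` body below is purely illustrative):

```js
// Server-side: create a call and obtain a joinUrl for the client SDK.
const response = await fetch('https://api.ultravox.ai/api/calls', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': process.env.ULTRAVOX_API_KEY, // keep your API key server-side
  },
  body: JSON.stringify({
    systemPrompt: 'You are a helpful assistant.', // illustrative
  }),
});
const { joinUrl } = await response.json();
```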
The core of the SDK is the `UltravoxSession`. The session is used to join and leave calls. The `UltravoxSession` contains methods for joining/leaving a call, for sending text messages to the model, and for muting the microphone/speaker.
| Method | Description |
|---|---|
| `joinCall` | Joins a call. Requires a `joinUrl` (string). Returns an `UltravoxSessionState`. |
| `leaveCall` | Leaves the current call. Returns a promise (with no return value) that resolves when the call has successfully been left. |
| `sendText` | Sends a message to the model. Requires the text message (string). The model replies via the text transcript. |
| `isMicMuted` | Returns a boolean indicating whether the end user's microphone is muted. This is scoped to the Ultravox SDK and does not detect muting done by the user outside of your application. |
| `isSpeakerMuted` | Returns a boolean indicating whether the speaker (the agent's voice output) is muted. This is scoped to the Ultravox SDK and does not detect muting done by the user outside of your application. |
| `muteMic` | Mutes the end user's microphone. This is scoped to the Ultravox SDK. |
| `unmuteMic` | Unmutes the end user's microphone. This is scoped to the Ultravox SDK. |
| `muteSpeaker` | Mutes the speaker (the agent's voice output). This is scoped to the Ultravox SDK. |
| `unmuteSpeaker` | Unmutes the speaker (the agent's voice output). This is scoped to the Ultravox SDK. |
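Putting this together, a minimal sketch of creating a session and joining a call (the `joinUrl` below is a placeholder for the one returned when the call was created):

```js
import { UltravoxSession } from 'ultravox-client';

const session = new UltravoxSession();

// Join using the joinUrl returned by the REST API when the call was created.
const state = session.joinCall('wss://example.com/your-join-url');

// ...later, when the conversation is over:
await session.leaveCall();
```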
When a call is joined, an `UltravoxSessionState` is returned. This object exposes the current status, can be used to get text transcripts of the call, and surfaces debug messages that are helpful when building your application.
The session status is based on the `UltravoxSessionStatus` enum and can be one of the following:
| Status | Description |
|---|---|
| `disconnected` | Session is not connected. This is the initial state, prior to `joinCall`. |
| `disconnecting` | Session is in the process of disconnecting. |
| `connecting` | Session is establishing the connection. |
| `idle` | Session is connected but not yet active. |
| `listening` | Listening to the end user. |
| `thinking` | The model is processing/thinking. |
| `speaking` | The model is speaking. |
The status can be retrieved by adding an event listener to the state. Building on what we did above:
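A minimal sketch, assuming the state object emits an `ultravoxSessionStatusChanged` event (treat the event name and payload shape as assumptions; check the SDK for the exact names):

```js
// Listen for status changes on the state returned by joinCall.
state.addEventListener('ultravoxSessionStatusChanged', (event) => {
  const status = event.state;
  console.log('Session status changed:', status);
});
```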
Sometimes you may want to augment the audio with text transcripts (e.g. if you want to show the end user the model’s output in real-time). Transcripts can be retrieved by adding an event listener to state:
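Another sketch, assuming an `ultravoxTranscriptsChanged` event (again, the event name is an assumption):

```js
// Keep a live view of the conversation's transcripts.
state.addEventListener('ultravoxTranscriptsChanged', (event) => {
  const transcripts = event.transcripts;
  // e.g. render the most recent transcript text in your UI
  console.log(transcripts[transcripts.length - 1]?.text);
});
```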
Transcripts are an array of transcript objects. Each transcript has the following properties:
| Property | Type | Definition |
|---|---|---|
| `text` | string | Text transcript of the speech from the end user or the agent. |
| `isFinal` | boolean | True if the transcript represents a complete utterance. False if it is a fragment of an utterance that is still underway. |
| `speaker` | `Role` | Either "user" or "agent". Denotes who was speaking. |
| `medium` | `Medium` | Either "voice" or "text". Denotes how the message was sent. |
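For example, a single transcript entry might look like this (the values are illustrative):

```js
// Illustrative only: the shape of one transcript object.
const transcript = {
  text: 'Hi! How can I help you today?',
  isFinal: true,
  speaker: 'agent',
  medium: 'voice',
};
```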
The state object also provides debug messages. Debug messages must be enabled when creating the `UltravoxSession` and are then available via an event listener, similar to status and transcripts:
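A sketch, assuming debug messages are enabled via an `experimentalMessages` option and surfaced through an `ultravoxExperimentalMessage` event (both names are assumptions; consult the SDK for the exact option and event names):

```js
// Enable debug messages when creating the session...
const session = new UltravoxSession({ experimentalMessages: new Set(['debug']) });
const state = session.joinCall('wss://example.com/your-join-url');

// ...then listen for them on the state.
state.addEventListener('ultravoxExperimentalMessage', (msg) => {
  console.log('Debug message:', JSON.stringify(msg));
});
```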
When the agent invokes a tool, the message contains the function, all arguments, and an invocation ID:
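For instance, a tool invocation might surface a debug message along these lines (the tool name, arguments, and invocation ID below are hypothetical):

```
LLM response: Tool calls: [FunctionCall(name='createProfile', args='{"firstName":"Ron","lastName":"Burgundy"}', invocation_id='call_abc123')]
```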
When the tool call completes, the message contains an array of messages. Multiple tools can be invoked by the model; this message array will contain all of the calls followed by all of the results. These messages are also available via List Call Messages.
Here’s an example of what we might see from a single tool invocation:
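Continuing the hypothetical `createProfile` example above, the completed invocation might produce a call/result pair shaped roughly like this (field names and values are illustrative, not the SDK's exact schema):

```
Tool call complete. Messages: [
  { role: 'tool_call',   toolName: 'createProfile', invocationId: 'call_abc123', text: '{"firstName":"Ron","lastName":"Burgundy"}' },
  { role: 'tool_result', toolName: 'createProfile', invocationId: 'call_abc123', text: '{"status":"created"}' }
]
```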