# Azure OpenAI Realtime client library for Java (experimental)

This preview introduces a new `/realtime` API endpoint for the `gpt-4o-realtime-preview` model family. `/realtime`:

- Supports low-latency, "speech in, speech out" conversational interactions
- Works with text messages, function tool calling, and many other existing capabilities from other endpoints like `/chat/completions`
- Is a great fit for support agents, assistants, translators, and other use cases that need highly responsive back-and-forth with a user

`/realtime` is built on [the WebSockets API](https://developer.mozilla.org/docs/Web/API/WebSockets_API) to facilitate fully asynchronous streaming communication between the end user and model. It's designed to be used in the context of a trusted, intermediate service that manages both connections to end users and model endpoint connections; it **is not** designed to be used directly from untrusted end user devices, and device details like capturing and rendering audio data are outside the scope of the `/realtime` API.

At a summary level, the architecture of an experience built atop `/realtime` looks something like the following (noting that the user interactions, as previously mentioned, are not part of the API itself):

```mermaid
sequenceDiagram
    actor User as End User
    participant MiddleTier as /realtime host
    participant AOAI as Azure OpenAI
    User->>MiddleTier: Begin interaction
    MiddleTier->>MiddleTier: Authenticate/Validate User
    MiddleTier--)User: audio information
    User--)MiddleTier:
    MiddleTier--)User: text information
    User--)MiddleTier:
    MiddleTier--)User: control information
    User--)MiddleTier:
    MiddleTier->>AOAI: connect to /realtime
    MiddleTier->>AOAI: configure session
    AOAI->>MiddleTier: session start
    MiddleTier--)AOAI: send/receive WS commands
    AOAI--)MiddleTier:
    AOAI--)MiddleTier: create/start conversation responses
    AOAI--)MiddleTier: (within responses) create/start/add/finish items
    AOAI--)MiddleTier: (within items) create/stream/finish content parts
```

Note that `/realtime` is in **public preview**. API changes, code updates, and occasional service disruptions are expected.

This client library is currently made available **only in our dev feed**. For detailed instructions, see the [dev feed documentation][dev_feed_instructions].

## Getting started

### Prerequisites

- [Java Development Kit (JDK)][jdk] version 8 or above
- [Azure Subscription][azure_subscription]
- [Azure OpenAI access][azure_openai_access]
- [Quickstart: GPT-4o Realtime API for speech and audio (Preview)][quickstart]

### Adding the package to your project

This project is currently only available in the dev feed. For detailed instructions on how to set up your project to consume the dev feed,
please visit the [dev feed documentation page][dev_feed_instructions]. There you can find the steps for both the Maven and Gradle setup.

#### Maven dev feed setup

##### Step 1: Get a PAT (Personal Access Token)

Generate a [Personal Access Token](https://dev.azure.com/azure-sdk/_details/security/tokens) with *Packaging* read & write scopes.

##### Step 2: Project setup

Add the repository to **both** the `<repositories>` and `<distributionManagement>` sections of your `pom.xml`:

```xml
<repository>
    <id>azure-sdk-for-java</id>
    <url>https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-java/maven/v1</url>
    <releases>
        <enabled>true</enabled>
    </releases>
    <snapshots>
        <enabled>true</enabled>
    </snapshots>
</repository>
```

Add or edit the `settings.xml` file in `${user.home}/.m2`, adding the following entry under its `<servers>` section:

```xml
    <server>
        <id>azure-sdk-for-java</id>
        <username>azure-sdk</username>
        <password>[PERSONAL_ACCESS_TOKEN]</password>
    </server>
```

Replace `[PERSONAL_ACCESS_TOKEN]` in the `<password>` tag with the PAT you generated in [step 1](#step-1-get-a-pat-personal-access-token).

##### Step 3: Add project dependency

Add the following to your project's `pom.xml` file:

[//]: # ({x-version-update-start;com.azure:azure-ai-openai-realtime;current})
```xml
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-openai-realtime</artifactId>
    <version>1.0.0-beta.1</version>
</dependency>
```
[//]: # ({x-version-update-end})

Then run:

```commandline
mvn install
```

#### Gradle setup

##### Step 1: Get a PAT (identical to the step for the Maven setup)

Generate a [Personal Access Token](https://dev.azure.com/azure-sdk/_details/security/tokens) with *Packaging* read & write scopes.

##### Step 2: Project setup

Add this section to your `build.gradle` file in **both** the `repositories` and `publishing.repositories` containers:

```groovy
maven {
    url 'https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-java/maven/v1'
    name 'azure-sdk-for-java'
    credentials(PasswordCredentials)
    authentication {
        basic(BasicAuthentication)
    }
}
```

Add or edit the `gradle.properties` file in `${user.home}/.gradle`:

```groovy
azure-sdk-for-javaUsername=azure-sdk
azure-sdk-for-javaPassword=PERSONAL_ACCESS_TOKEN
```

Replace the `PERSONAL_ACCESS_TOKEN` value assigned to `azure-sdk-for-javaPassword` with the PAT you generated in [step 1](#step-1-get-a-pat-personal-access-token).

##### Step 3: Add project dependency

Add the following to your project's dependencies (the legacy `compile` configuration was removed in Gradle 7; use `implementation` instead):

```groovy
implementation group: 'com.azure', name: 'azure-ai-openai-realtime', version: '1.0.0-beta.1'
```

Then run:

```commandline
gradle build
```

### Authentication

In order to interact with the Azure OpenAI Service you'll need to create an instance of a client class,
[RealtimeAsyncClient][realtime_client_async] or [RealtimeClient][realtime_client_sync], by using
[RealtimeClientBuilder][realtime_client_builder]. To configure a client for use with
Azure OpenAI, provide a valid endpoint URI to an Azure OpenAI resource along with a corresponding key credential or
token credential.

#### Example: Azure OpenAI

Get an Azure OpenAI `key` credential from the Azure Portal.

```java readme-sample-createSyncAzureClientKeyCredential
RealtimeClient client = new RealtimeClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();
```

Alternatively, to build an async client:

```java readme-sample-createAsyncAzureClientKeyCredential
RealtimeAsyncClient client = new RealtimeClientBuilder()
    .credential(new KeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildAsyncClient();
```

#### Example: non-Azure OpenAI

If we omit the `endpoint` parameter, the built client will assume we are operating against the non-Azure OpenAI service.

```java readme-sample-createSyncNonAzureClientKeyCredential
RealtimeClient client = new RealtimeClientBuilder()
    .credential(new KeyCredential("{key}"))
    .buildClient();
```

Alternatively, to build an async client:

```java readme-sample-createAsyncNonAzureClientKeyCredential
RealtimeAsyncClient client = new RealtimeClientBuilder()
    .credential(new KeyCredential("{key}"))
    .buildAsyncClient();
```

## Key concepts

For a more detailed guide please refer to the [Azure OpenAI realtime][aoai_samples_readme] general API guide.

- A caller establishes a connection to `/realtime`, which starts a new `session`
- The `session` can be configured to customize input and output audio behavior, voice activity detection behavior, and other shared settings
- A `session` automatically creates a default `conversation`
  - Note: in the future, multiple concurrent conversations may be supported -- this is not currently available
- The `conversation` accumulates input signals until a `response` is started, either via a direct command by the caller or automatically by voice-activity-based turn detection
- Each `response` consists of one or more `items`, which can encapsulate messages, function calls, and other information
- Message `item`s have `content_part`s, allowing multiple modalities (text, audio) to be represented across a single item
- The `session` manages configuration of caller input handling (e.g. user audio) and common output/generation handling
- Each caller-initiated `response.create` can override some of the output `response` behavior, if desired
- Server-created `item`s and the `content_part`s in messages can be populated asynchronously and in parallel, e.g. receiving audio, text, and function information concurrently (round-robin)

## Examples

We can set up the Realtime session to return both text and audio.

```java readme-sample-sessionUpdate
client.sendMessage(new SessionUpdateEvent(
    new RealtimeRequestSession()
        .setVoice(RealtimeVoice.ALLOY)
        .setTurnDetection(
            new RealtimeServerVadTurnDetection()
                .setThreshold(0.5)
                .setPrefixPaddingMs(300)
                .setSilenceDurationMs(200)
        ).setInputAudioTranscription(new RealtimeAudioInputTranscriptionSettings(
            RealtimeAudioInputTranscriptionModel.WHISPER_1)
        ).setModalities(Arrays.asList(RealtimeRequestSessionModality.AUDIO, RealtimeRequestSessionModality.TEXT))
));
```

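To make the turn-detection parameters concrete: `threshold` is the activity level treated as speech, `silenceDurationMs` is how long the input must stay below it before a turn is considered finished, and `prefixPaddingMs` is audio retained from before speech was detected. The toy sketch below illustrates what the first two knobs control; it is not the service's actual algorithm, and `findTurnEndMs` is a hypothetical helper for illustration only.

```java
public class VadSketch {

    /**
     * Returns the millisecond at which a turn is considered finished: the point
     * where energy has stayed below `threshold` for `silenceDurationMs` in a row.
     * Returns -1 if no turn boundary is detected.
     */
    static int findTurnEndMs(double[] energyPerMs, double threshold, int silenceDurationMs) {
        int silentRun = 0;
        for (int ms = 0; ms < energyPerMs.length; ms++) {
            if (energyPerMs[ms] < threshold) {
                silentRun++;
                if (silentRun >= silenceDurationMs) {
                    return ms + 1; // enough consecutive silence: the turn ends here
                }
            } else {
                silentRun = 0; // speech resets the silence counter
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        double[] energy = new double[500];
        java.util.Arrays.fill(energy, 0, 250, 0.8); // 250 ms of speech, then silence
        // With threshold 0.5 and 200 ms of required silence, the turn ends at 450 ms.
        System.out.println(findTurnEndMs(energy, 0.5, 200)); // prints 450
    }
}
```
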
With the Azure OpenAI Realtime Audio client library, one can provide a prompt as an audio file.

```java readme-sample-uploadAudioFile
RealtimeClient client = new RealtimeClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();

String audioFilePath = "{path to audio file}";
byte[] audioBytes = Files.readAllBytes(Paths.get(audioFilePath));

client.addOnResponseDoneEventHandler(event -> {
    System.out.println("Response done");
});

client.start();
client.sendMessage(new InputAudioBufferAppendEvent(audioBytes));
```

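Note that `InputAudioBufferAppendEvent` carries raw audio bytes: with the default `pcm16` session format the service expects uncompressed 16-bit PCM rather than a WAV container, so a WAV file's header generally needs to be stripped first (check your session configuration for the exact format). A self-contained sketch of that unwrapping step using the JDK's `javax.sound.sampled` follows; it synthesizes a test tone in memory, so no audio file is assumed, and `WavToPcm` is a hypothetical helper class.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class WavToPcm {

    /** Unwraps a WAV container and returns the raw PCM bytes inside it. */
    static byte[] extractPcm(byte[] wavBytes) {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(new ByteArrayInputStream(wavBytes))) {
            ByteArrayOutputStream pcm = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                pcm.write(buf, 0, n);
            }
            return pcm.toByteArray();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** One second of a 440 Hz tone as a 16-bit mono 24 kHz WAV, built in memory. */
    static byte[] buildTestWav() {
        int sampleRate = 24_000;
        byte[] samples = new byte[sampleRate * 2]; // 2 bytes per 16-bit sample
        for (int i = 0; i < sampleRate; i++) {
            short s = (short) (Math.sin(2 * Math.PI * 440 * i / sampleRate) * Short.MAX_VALUE * 0.3);
            samples[2 * i] = (byte) (s & 0xFF);            // little-endian low byte
            samples[2 * i + 1] = (byte) ((s >> 8) & 0xFF); // high byte
        }
        AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, false);
        try {
            AudioInputStream source = new AudioInputStream(
                new ByteArrayInputStream(samples), format, sampleRate);
            ByteArrayOutputStream wav = new ByteArrayOutputStream();
            AudioSystem.write(source, AudioFileFormat.Type.WAVE, wav);
            return wav.toByteArray();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] pcm = extractPcm(buildTestWav());
        // `pcm` is now a candidate payload for InputAudioBufferAppendEvent
        System.out.println(pcm.length + " raw PCM bytes");
    }
}
```
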
To consume the text and audio produced by the server, we set up the following callbacks in an async scenario.

```java readme-sample-consumeSpecificEventsAsync
RealtimeAsyncClient client = new RealtimeClientBuilder()
    .credential(new KeyCredential("{key}"))
    .buildAsyncClient();

Disposable.Composite disposables = Disposables.composite();

disposables.addAll(Arrays.asList(
    client.getServerEvents()
        .takeUntil(serverEvent -> serverEvent instanceof ResponseAudioDoneEvent)
        .ofType(ResponseAudioDeltaEvent.class)
        .subscribe(this::consumeAudioDelta, this::consumeError, this::onAudioResponseCompleted),
    client.getServerEvents()
        .takeUntil(serverEvent -> serverEvent instanceof ResponseAudioTranscriptDoneEvent)
        .ofType(ResponseAudioTranscriptDeltaEvent.class)
        .subscribe(this::consumeAudioTranscriptDelta, this::consumeError, this::onAudioResponseTranscriptCompleted)
));
```

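If the Reactor operators are unfamiliar: `takeUntil` keeps the stream open until the terminal `...Done` event arrives, while `ofType` keeps only the delta events and drops everything else. The same selection logic, sketched with plain Java collections and hypothetical stand-in event types (not the SDK's actual classes):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EventFilterSketch {
    // Hypothetical stand-ins for the SDK's server event types.
    interface ServerEvent { }
    static class AudioDelta implements ServerEvent {
        final String chunk;
        AudioDelta(String chunk) { this.chunk = chunk; }
    }
    static class AudioDone implements ServerEvent { }
    static class Transcript implements ServerEvent { }

    /** Take events until the terminal AudioDone, keeping only AudioDelta payloads. */
    static List<String> collectAudioDeltas(List<ServerEvent> events) {
        List<String> deltas = new ArrayList<>();
        for (ServerEvent e : events) {
            if (e instanceof AudioDone) {
                break; // takeUntil: the stream completes at the terminal event
            }
            if (e instanceof AudioDelta) { // ofType: every other event type is dropped
                deltas.add(((AudioDelta) e).chunk);
            }
        }
        return deltas;
    }

    public static void main(String[] args) {
        List<ServerEvent> stream = Arrays.asList(
            new AudioDelta("a"), new Transcript(), new AudioDelta("b"),
            new AudioDone(), new AudioDelta("ignored"));
        System.out.println(collectAudioDeltas(stream)); // prints [a, b]
    }
}
```
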
## Troubleshooting

### Enable client logging
You can set the `AZURE_LOG_LEVEL` environment variable to view logging statements made in the client library. For
example, setting `AZURE_LOG_LEVEL=2` would show all informational, warning, and error log messages. The log levels can
be found here: [log levels][log_levels].

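For example, in a POSIX shell (the variable must be exported before the JVM starts):

```shell
# 2 corresponds to the informational level; lower values are more verbose
export AZURE_LOG_LEVEL=2
```
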
### Default HTTP Client
All client libraries by default use the Netty HTTP client. Adding the above dependency will automatically configure
the client library to use the Netty HTTP client. Configuring or changing the HTTP client is detailed in the
[HTTP clients wiki](https://learn.microsoft.com/azure/developer/java/sdk/http-client-pipeline#http-clients).

### Default SSL library
All client libraries, by default, use the Tomcat-native Boring SSL library to enable native-level performance for SSL
operations. The Boring SSL library is an uber jar containing native libraries for Linux / macOS / Windows, and provides
better performance compared to the default SSL implementation within the JDK. For more information, including how to
reduce the dependency size, refer to the [performance tuning][performance_tuning] section of the wiki.

## Next steps

- Samples are explained in detail [here][samples_readme].

## Contributing

For details on contributing to this repository, see the [contributing guide](https://github.com/Azure/azure-sdk-for-java/blob/main/CONTRIBUTING.md).

1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new Pull Request

<!-- LINKS -->
[aoai_samples_readme]: https://github.com/Azure-Samples/aoai-realtime-audio-sdk/blob/main/README.md
[aoai_samples_readme_api_concepts]: https://github.com/Azure-Samples/aoai-realtime-audio-sdk/blob/main/README.md#api-concepts
[azure_subscription]: https://azure.microsoft.com/free/
[azure_openai_access]: https://learn.microsoft.com/azure/cognitive-services/openai/overview#how-do-i-get-access-to-azure-openai
[jdk]: https://docs.microsoft.com/java/azure/jdk/
[dev_feed_instructions]: https://github.com/Azure/azure-sdk-for-java/blob/main/CONTRIBUTING.md#dev-feed
[log_levels]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/core/azure-core/src/main/java/com/azure/core/util/logging/ClientLogger.java
[performance_tuning]: https://github.com/Azure/azure-sdk-for-java/wiki/Performance-Tuning
[samples_readme]: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/openai/azure-ai-openai-realtime/src/samples
[quickstart]: https://learn.microsoft.com/azure/ai-services/openai/realtime-audio-quickstart
[realtime_client_async]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/openai/azure-ai-openai-realtime/src/main/java/com/azure/ai/openai/realtime/RealtimeAsyncClient.java
[realtime_client_sync]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/openai/azure-ai-openai-realtime/src/main/java/com/azure/ai/openai/realtime/RealtimeClient.java
[realtime_client_builder]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/openai/azure-ai-openai-realtime/src/main/java/com/azure/ai/openai/realtime/RealtimeClientBuilder.java