Commit 832f06c

jpalvarezl, srnagar, glecaros
authored
[OpenAI] Azure OpenAI realtime client library for Java first version (#42707)
* WIP: trying to get code gen to work * added code gen classes * Project compiles * Added BaseOpenAIClient to be able to pass for feature clients * Re-structure packages * Adding classes for websocket comms * Added placeholder classes * Added .env to gitignore list * Client impls wired together * Added more required classes for ws protocol * Added clarifying comments * Cleaned up pointless interface * Added classes to provide service specific configurations * Adjusted ClientEndpointConfiguration for our usecase * Protocol correction in hardocoded URL * Added subprotocol and bearer prefix for header * Wired RealtimeClientBuilder with RealtimeAsyncClient * minor renames * small cleanup * Project compiles and fields are propagated to the async client * Finally getting a 404 * Got finally a 200 * Adding graceful stop and server message emission * Finished up handlers * WIP: added close implementation * WIP: sample is still in progress, serialization works * WIP: trying to send audio * WIP: sample works as in other languages * Restored Azure OpenAI Inference SDK * Added AOAI realtime SDK * Trying out solution for frames assumed to be contiguous in wss session * Restored doc with TODOs * Fixed config for nonAzure OAI * Refactored LowLevelSample * Added unit tests and cleaned up sample a bit * Cleaned up samples * Re-structured test packages * Trying to cleanup implementation for frame collection * Replaced print with log statements * Restored print statements * Added support for more generalized authentication types * Removed last references in inference of realtime client * Removed old readme file * Setup WebsocketFrameAggregator from Netty * Replaced ackId with String eventId * WIP: cleanup * Added license header to source files * Files for Readme and its samples * Added sync client bare bones impl * More cleanup * cleaned up versions * Adding documentation * adding more tests * Item manipulation test passing * Fixed textOnly test bad config * Trying to make a 
tool/audio test * Test green * Ported final test from .NET * Blocking on StepVerifier * Ported tests for nonAzure case * Added more eventHandler methods for the sync client * Added the remove handler analogous methods for the sync client * Added first blocking test * Updated netty and add exclussion directive for bannedDeps * style check * mvn packaging works (by skipping most things anyway) * ported canConfigureSession to blocking client tests * Added blocking version of textOnly test * ItemManipulation test sync passing * Added tests for tool with audio file * Finalized sync Azure tests * Added nonAzure blocking * initial tsp-location * Corrections * Commit hash update * Added some customizations to work around code gen issues * Added package info to utils package * Cleaned up imports * Reverting changes in other libraries * Reverted changes * Regen with problems * Fixed customization problems * WIP: main readme * Added to Flux.error mapper, convenience ctx, new sample * Customized VAD detection to pass ms values as numeric JSON * File upload * Introduced AudioFile to handle different sample rates, etc. 
* Corrected values for file in sample * Sample sends file correctly * ClienSample completed with audio verified * cleanup * method rename * Sample documentation * More documentation * More sample documentation * Added sync usage sample * Updated files according to APIView feedback * Re-run code gen * samples/README.md setup ready * Adding scaffoldings for the readme * Added dev feed instruction setup to README * Added readme samples and completed main readme * Update sdk/openai/azure-ai-openai-realtime/README.md Co-authored-by: Srikanta <[email protected]> * First round of feedback * Added traits to builder * Disabled tests * Renamed all events * Re-added code customizations lost in the code regen * mvn clean install run * Style checks passing * Disabling jacoco for now * Update sdk/openai/azure-ai-openai-realtime/src/main/java/com/azure/ai/openai/realtime/OpenAIRealtimeServiceVersion.java Co-authored-by: Srikanta <[email protected]> * bumping versions * bump * adding library * current version * temporary fix * cspell ignore * locale * metadata * workaround * fix * javadoc --------- Co-authored-by: Srikanta <[email protected]> Co-authored-by: Gerardo Lecaros <[email protected]>
1 parent 91d4103 commit 832f06c

File tree

152 files changed

+18614
-2
lines changed

.github/CODEOWNERS

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -659,6 +659,9 @@
659659
# PRLabel: %OpenAI
660660
/sdk/openai/azure-ai-openai-assistants/ @brandom-msft @jpalvarezl @mssfang
661661

662+
# PRLabel: %OpenAI
663+
/sdk/openai/azure-ai-openai-realtime/ @brandom-msft @jpalvarezl @mssfang
664+
662665
# ServiceLabel: %Operational Insights
663666
# ServiceOwners: @AzmonLogA
664667

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ temp/
1515

1616
# Sensitive files
1717
*.json.env
18+
.env
1819

1920
#javadoc overview files generated from README.md
2021
readme_overview.html

.vscode/cspell.json

Lines changed: 2 additions & 1 deletion
@@ -214,7 +214,8 @@
214214
"sdk/search/azure-search-documents/src/main/java/com/azure/search/documents/indexes/models/StemmerTokenFilterLanguage.java",
215215
"sdk/search/azure-search-documents/src/main/java/com/azure/search/documents/indexes/models/SnowballTokenFilterLanguage.java",
216216
"sdk/search/azure-search-documents/src/main/java/com/azure/search/documents/indexes/models/TextTranslationSkillLanguage.java",
217-
"sdk/search/azure-search-documents/src/main/java/com/azure/search/documents/indexes/models/TokenFilterName.java"
217+
"sdk/search/azure-search-documents/src/main/java/com/azure/search/documents/indexes/models/TokenFilterName.java",
218+
"sdk/openai/azure-ai-openai-realtime/tsp-location.yaml"
218219
],
219220
"words": [
220221
"adal",

eng/versioning/version_client.txt

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ com.azure:azure-ai-metricsadvisor;1.2.3;1.3.0-beta.1
4747
com.azure:azure-ai-metricsadvisor-perf;1.0.0-beta.1;1.0.0-beta.1
4848
com.azure:azure-ai-openai;1.0.0-beta.12;1.0.0-beta.13
4949
com.azure:azure-ai-openai-assistants;1.0.0-beta.4;1.0.0-beta.5
50+
com.azure:azure-ai-openai-realtime;1.0.0-beta.1;1.0.0-beta.1
5051
com.azure:azure-ai-personalizer;1.0.0-beta.1;1.0.0-beta.2
5152
com.azure:azure-ai-textanalytics;5.5.3;5.6.0-beta.1
5253
com.azure:azure-ai-textanalytics-perf;1.0.0-beta.1;1.0.0-beta.1
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
### Other Changes
2+
3+
#### Dependency Updates
4+
5+
## 1.0.0-beta.1 (TBD)
6+
7+
- Azure OpenAI Realtime client library for Java.
Lines changed: 316 additions & 0 deletions
@@ -0,0 +1,316 @@
1+
# Azure OpenAI Realtime client library for Java (experimental)
2+
3+
This preview introduces a new `/realtime` API endpoint for the `gpt-4o-realtime-preview` model family. `/realtime`:
4+
5+
- Supports low-latency, "speech in, speech out" conversational interactions
6+
- Works with text messages, function tool calling, and many other existing capabilities from other endpoints like `/chat/completions`
7+
- Is a great fit for support agents, assistants, translators, and other use cases that need highly responsive back-and-forth with a user
8+
9+
`/realtime` is built on [the WebSockets API](https://developer.mozilla.org/docs/Web/API/WebSockets_API) to facilitate fully asynchronous streaming communication between the end user and model. It's designed to be used in the context of a trusted, intermediate service that manages both connections to end users and model endpoint connections; it **is not** designed to be used directly from untrusted end user devices, and device details like capturing and rendering audio data are outside the scope of the `/realtime` API.
10+
11+
At a summary level, the architecture of an experience built atop `/realtime` looks something like the following (noting that the user interactions, as previously mentioned, are not part of the API itself):
12+
13+
```mermaid
14+
sequenceDiagram
15+
actor User as End User
16+
participant MiddleTier as /realtime host
17+
participant AOAI as Azure OpenAI
18+
User->>MiddleTier: Begin interaction
19+
MiddleTier->>MiddleTier: Authenticate/Validate User
20+
MiddleTier--)User: audio information
21+
User--)MiddleTier:
22+
MiddleTier--)User: text information
23+
User--)MiddleTier:
24+
MiddleTier--)User: control information
25+
User--)MiddleTier:
26+
MiddleTier->>AOAI: connect to /realtime
27+
MiddleTier->>AOAI: configure session
28+
AOAI->>MiddleTier: session start
29+
MiddleTier--)AOAI: send/receive WS commands
30+
AOAI--)MiddleTier:
31+
AOAI--)MiddleTier: create/start conversation responses
32+
AOAI--)MiddleTier: (within responses) create/start/add/finish items
33+
AOAI--)MiddleTier: (within items) create/stream/finish content parts
34+
```
35+
36+
Note that `/realtime` is in **public preview**. API changes, code updates, and occasional service disruptions are expected.
37+
38+
This client library is currently made available **only in our dev feed**. For detailed instructions see the [dev feed documentation.][dev_feed_instructions]
39+
40+
## Getting started
41+
42+
### Prerequisites
43+
44+
- [Java Development Kit (JDK)][jdk] with version 8 or above
45+
- [Azure Subscription][azure_subscription]
46+
- [Azure OpenAI access][azure_openai_access]
47+
- [Quickstart: GPT-4o Realtime API for speech and audio (Preview)][quickstart]
48+
49+
### Adding the package to your project
50+
51+
This project is currently only available in the dev feed. For detailed instructions on how to set up your project to consume the dev feed
52+
please visit the [dev feed documentation page.][dev_feed_instructions] There you can find the steps for the `maven` and `gradle` setup.
53+
54+
#### Maven dev feed setup
55+
56+
##### Step 1: get a PAT (Personal Access Token)
57+
58+
Generate a [Personal Access Token](https://dev.azure.com/azure-sdk/_details/security/tokens) with *Packaging* read & write scopes.
59+
60+
##### Step 2: Project setup
61+
62+
Add the repo to **both** the `<repositories>` and `<distributionManagement>` sections of your pom.xml:
63+
64+
```xml
65+
<repository>
66+
<id>azure-sdk-for-java</id>
67+
<url>https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-java/maven/v1</url>
68+
<releases>
69+
<enabled>true</enabled>
70+
</releases>
71+
<snapshots>
72+
<enabled>true</enabled>
73+
</snapshots>
74+
</repository>
75+
```
76+
77+
Add or edit the `settings.xml` file in `${user.home}/.m2`
78+
79+
```xml
80+
<server>
81+
<id>azure-sdk-for-java</id>
82+
<username>azure-sdk</username>
83+
<password>[PERSONAL_ACCESS_TOKEN]</password>
84+
</server>
85+
```
86+
87+
Replace `[PERSONAL_ACCESS_TOKEN]` in the `<password>` tag with the PAT you generated in [step 1.](#step-1-get-a-pat-personal-access-token)
88+
89+
##### Step 3: Add project dependency
90+
91+
Add to your project's pom.xml file
92+
93+
[//]: # ({x-version-update-start;com.azure:azure-ai-openai-realtime;current})
94+
```xml
95+
<dependency>
96+
<groupId>com.azure</groupId>
97+
<artifactId>azure-ai-openai-realtime</artifactId>
98+
<version>1.0.0-beta.1</version>
99+
</dependency>
100+
```
101+
[//]: # ({x-version-update-end})
102+
103+
Then run:
104+
105+
```commandline
106+
mvn install
107+
```
108+
109+
#### Gradle setup
110+
111+
##### Step 1: get a PAT (Identical to the step for Maven setup)
112+
113+
Generate a [Personal Access Token](https://dev.azure.com/azure-sdk/_details/security/tokens) with *Packaging* read & write scopes.
114+
115+
##### Step 2: Project setup
116+
117+
Add this section to your `build.gradle` file in **both** the `repositories` and `publishing.repositories` containers.
118+
119+
```groovy
120+
maven {
121+
url 'https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-java/maven/v1'
122+
name 'azure-sdk-for-java'
123+
credentials(PasswordCredentials)
124+
authentication {
125+
basic(BasicAuthentication)
126+
}
127+
}
128+
```
129+
130+
Add or edit the `gradle.properties` file in `${user.home}/.gradle`
131+
132+
```properties
133+
azure-sdk-for-javaUsername=azure-sdk
134+
azure-sdk-for-javaPassword=PERSONAL_ACCESS_TOKEN
135+
```
136+
137+
Replace the `PERSONAL_ACCESS_TOKEN` value assigned to `azure-sdk-for-javaPassword` with the PAT you generated in [step 1.](#step-1-get-a-pat-personal-access-token)
138+
139+
##### Step 3: Add project dependency
140+
141+
Add to your project setup
142+
143+
```groovy
144+
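// The `compile` configuration was removed in Gradle 7; `implementation` is its replacement.
implementation(group: 'com.azure', name: 'azure-ai-openai-realtime', version: '1.0.0-beta.1')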
145+
```
146+
Then run:
147+
148+
```commandline
149+
gradle build
150+
```
151+
152+
### Authentication
153+
154+
To interact with the Azure OpenAI Service you'll need to create an instance of a client class,
155+
[RealtimeAsyncClient][realtime_client_async] or [RealtimeClient][realtime_client_sync] by using
156+
[RealtimeClientBuilder][realtime_client_builder]. To configure a client for use with
157+
Azure OpenAI, provide a valid endpoint URI to an Azure OpenAI resource along with a corresponding key credential or
158+
token credential.
159+
160+
#### Example: Azure OpenAI
161+
162+
Get an Azure OpenAI `key` credential from the Azure Portal.
163+
164+
```java readme-sample-createSyncAzureClientKeyCredential
165+
RealtimeClient client = new RealtimeClientBuilder()
166+
.credential(new AzureKeyCredential("{key}"))
167+
.endpoint("{endpoint}")
168+
.buildClient();
169+
```
170+
171+
Alternatively, to build an async client:
172+
173+
```java readme-sample-createAsyncAzureClientKeyCredential
174+
RealtimeAsyncClient client = new RealtimeClientBuilder()
175+
.credential(new KeyCredential("{key}"))
176+
.endpoint("{endpoint}")
177+
.buildAsyncClient();
178+
```
179+
180+
#### Example: non-Azure OpenAI
181+
182+
If we omit the `endpoint` parameter, the client built will assume we are operating against the non-Azure OpenAI server:
183+
184+
```java readme-sample-createSyncNonAzureClientKeyCredential
185+
RealtimeClient client = new RealtimeClientBuilder()
186+
.credential(new KeyCredential("{key}"))
187+
.buildClient();
188+
```
189+
190+
Alternatively, to build an async client:
191+
192+
```java readme-sample-createAsyncNonAzureClientKeyCredential
193+
RealtimeAsyncClient client = new RealtimeClientBuilder()
194+
.credential(new KeyCredential("{key}"))
195+
.buildAsyncClient();
196+
```
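
The samples above hard-code `{key}` and `{endpoint}` placeholders. A minimal sketch of how those values might be read from the environment instead is shown below. It does not use the client library at all: `RealtimeSettings` and the variable names `OPENAI_KEY` and `AZURE_OPENAI_ENDPOINT` are hypothetical, chosen only for illustration. It mirrors the rule described above, where a missing endpoint means the non-Azure OpenAI service.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of azure-ai-openai-realtime.
class RealtimeSettings {
    final String key;
    final String endpoint; // null means "non-Azure OpenAI"

    RealtimeSettings(String key, String endpoint) {
        this.key = key;
        this.endpoint = endpoint;
    }

    boolean isAzure() {
        return endpoint != null;
    }

    // Accepts a map shaped like System.getenv(); the variable names are illustrative.
    static RealtimeSettings fromEnvironment(Map<String, String> env) {
        String key = env.get("OPENAI_KEY");
        if (key == null || key.trim().isEmpty()) {
            throw new IllegalStateException("OPENAI_KEY is not set");
        }
        String endpoint = env.get("AZURE_OPENAI_ENDPOINT");
        if (endpoint != null && endpoint.trim().isEmpty()) {
            endpoint = null; // treat a blank endpoint as absent
        }
        return new RealtimeSettings(key, endpoint);
    }

    public static void main(String[] args) {
        // A sample map stands in for System.getenv() so the sketch runs anywhere.
        Map<String, String> env = new HashMap<>();
        env.put("OPENAI_KEY", "example-key");
        RealtimeSettings settings = fromEnvironment(env);
        System.out.println(settings.isAzure() ? "Azure OpenAI" : "non-Azure OpenAI");
    }
}
```

In a real application the resulting `key` and `endpoint` would be passed to `RealtimeClientBuilder` as in the samples above, calling `.endpoint(...)` only when `isAzure()` is true.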
197+
198+
## Key concepts
199+
200+
For a more detailed guide please refer to the [Azure OpenAI realtime][aoai_samples_readme] general API guide.
201+
202+
- A caller establishes a connection to `/realtime`, which starts a new `session`
203+
- The `session` can be configured to customize input and output audio behavior, voice activity detection behavior, and other shared settings
204+
- A `session` automatically creates a default `conversation`
205+
- Note: in the future, multiple concurrent conversations may be supported -- this is not currently available
206+
- The `conversation` accumulates input signals until a `response` is started, either via a direct command by the caller or automatically by voice-activity-based turn detection
207+
- Each `response` consists of one or more `items`, which can encapsulate messages, function calls, and other information
208+
- Message `item`s have `content_part`s, allowing multiple modalities (text, audio) to be represented across a single item
209+
- The `session` manages configuration of caller input handling (e.g. user audio) and common output/generation handling
210+
- Each caller-initiated `response.create` can override some of the output `response` behavior, if desired
211+
- Server-created `item`s and the `content_part`s in messages can be populated asynchronously and in parallel, e.g. receiving audio, text, and function information concurrently (round-robin)
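
The containment described in the list above (session, conversation, response, item, content part) can be sketched as a small object model. Every class below is hypothetical and exists only to illustrate the relationships; none of them are the library's generated types, and the "response" produced here is a stand-in for what the service would stream back.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: these classes are not the library's types.
class RealtimeConcepts {
    // One modality-specific piece of a message item, for example "text" or "audio".
    static class ContentPart {
        final String modality;
        final String data;

        ContentPart(String modality, String data) {
            this.modality = modality;
            this.data = data;
        }
    }

    // A message, function call, or other unit inside a response.
    static class Item {
        final String kind;
        final List<ContentPart> contentParts = new ArrayList<>();

        Item(String kind) {
            this.kind = kind;
        }
    }

    static class Response {
        final List<Item> items = new ArrayList<>();
    }

    static class Conversation {
        final List<String> pendingInput = new ArrayList<>();
        final List<Response> responses = new ArrayList<>();

        // Input accumulates until a response is started.
        void addInput(String input) {
            pendingInput.add(input);
        }

        // Starting a response consumes the accumulated input and yields one message item.
        Response startResponse() {
            Response response = new Response();
            Item message = new Item("message");
            message.contentParts.add(
                new ContentPart("text", "received " + pendingInput.size() + " input(s)"));
            response.items.add(message);
            pendingInput.clear();
            responses.add(response);
            return response;
        }
    }

    // A session holds shared configuration and one default conversation.
    static class Session {
        String voice = "alloy";
        final Conversation conversation = new Conversation();
    }

    public static void main(String[] args) {
        Session session = new Session();
        session.conversation.addInput("audio chunk 1");
        session.conversation.addInput("audio chunk 2");
        Response response = session.conversation.startResponse();
        System.out.println(response.items.get(0).contentParts.get(0).data);
    }
}
```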
212+
213+
## Examples
214+
215+
We can set up the Realtime session to return both text and audio.
216+
```java readme-sample-sessionUpdate
217+
client.sendMessage(new SessionUpdateEvent(
218+
new RealtimeRequestSession()
219+
.setVoice(RealtimeVoice.ALLOY)
220+
.setTurnDetection(
221+
new RealtimeServerVadTurnDetection()
222+
.setThreshold(0.5)
223+
.setPrefixPaddingMs(300)
224+
.setSilenceDurationMs(200)
225+
).setInputAudioTranscription(new RealtimeAudioInputTranscriptionSettings(
226+
RealtimeAudioInputTranscriptionModel.WHISPER_1)
227+
).setModalities(Arrays.asList(RealtimeRequestSessionModality.AUDIO, RealtimeRequestSessionModality.TEXT))
228+
));
229+
```
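
For orientation, the sketch below builds by hand the JSON that a session update like the one above would plausibly put on the wire. It assumes the event follows the public Realtime API's `session.update` shape, with snake_case field names such as `turn_detection` and `silence_duration_ms`; those names, and the `SessionUpdateJson` class itself, are assumptions for illustration and are not taken from this library's serializer. The millisecond values are written as JSON numbers rather than strings.

```java
import java.util.Locale;

// Hypothetical helper: hand-built JSON, not the library's serialization code.
class SessionUpdateJson {
    // Assumes `voice` contains no characters that need JSON escaping.
    static String build(String voice, double threshold, int prefixPaddingMs, int silenceDurationMs) {
        return String.format(Locale.ROOT,
            "{\"type\":\"session.update\",\"session\":{"
                + "\"voice\":\"%s\","
                + "\"modalities\":[\"audio\",\"text\"],"
                + "\"input_audio_transcription\":{\"model\":\"whisper-1\"},"
                + "\"turn_detection\":{\"type\":\"server_vad\",\"threshold\":%s,"
                + "\"prefix_padding_ms\":%d,\"silence_duration_ms\":%d}}}",
            voice, Double.toString(threshold), prefixPaddingMs, silenceDurationMs);
    }

    public static void main(String[] args) {
        // Same values as the README sample: threshold 0.5, 300 ms prefix padding, 200 ms silence.
        System.out.println(build("alloy", 0.5, 300, 200));
    }
}
```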
230+
231+
With the Azure OpenAI Realtime Audio client library, one can provide a prompt as an audio file.
232+
233+
```java readme-sample-uploadAudioFile
234+
RealtimeClient client = new RealtimeClientBuilder()
235+
.credential(new AzureKeyCredential("{key}"))
236+
.endpoint("{endpoint}")
237+
.buildClient();
238+
239+
String audioFilePath = "{path to audio file}";
240+
byte[] audioBytes = Files.readAllBytes(Paths.get(audioFilePath));
241+
242+
client.addOnResponseDoneEventHandler(event -> {
243+
System.out.println("Response done");
244+
});
245+
246+
client.start();
247+
client.sendMessage(new InputAudioBufferAppendEvent(audioBytes));
248+
```
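
The sample above sends the whole file in a single event. For longer recordings it may be preferable to send the audio in smaller pieces. The sketch below shows one way to split a byte array into fixed-size chunks and base64-encode each one. It is standalone and does not call the client library: `AudioChunker` is a hypothetical helper, and the 24 kHz, 16-bit mono PCM format used to size the chunks is an assumption about the input audio, not something this README states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

// Hypothetical helper, not part of azure-ai-openai-realtime.
class AudioChunker {
    // Bytes needed for `durationMs` of 16-bit (2 bytes per sample) mono PCM.
    static int bytesFor(int sampleRateHz, int durationMs) {
        return sampleRateHz * 2 * durationMs / 1000;
    }

    // Splits the audio into chunks of at most `chunkSize` bytes and base64-encodes each.
    static List<String> toBase64Chunks(byte[] audio, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        for (int offset = 0; offset < audio.length; offset += chunkSize) {
            int end = Math.min(audio.length, offset + chunkSize);
            byte[] slice = Arrays.copyOfRange(audio, offset, end);
            chunks.add(Base64.getEncoder().encodeToString(slice));
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] audio = new byte[10_000];             // stand-in for Files.readAllBytes(...)
        int chunkSize = bytesFor(24_000, 100);       // 100 ms of assumed 24 kHz PCM16 mono
        List<String> chunks = toBase64Chunks(audio, chunkSize);
        System.out.println(chunks.size() + " chunks of up to " + chunkSize + " bytes");
    }
}
```

Each decoded chunk could then be wrapped in its own `InputAudioBufferAppendEvent`, in the same way the sample above wraps the full `audioBytes` array.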
249+
250+
To consume the text and audio produced by the server, we set up the following callbacks in an async scenario.
251+
252+
```java readme-sample-consumeSpecificEventsAsync
253+
RealtimeAsyncClient client = new RealtimeClientBuilder()
254+
.credential(new KeyCredential("{key}"))
255+
.buildAsyncClient();
256+
257+
Disposable.Composite disposables = Disposables.composite();
258+
259+
disposables.addAll(Arrays.asList(
260+
client.getServerEvents()
261+
.takeUntil(serverEvent -> serverEvent instanceof ResponseAudioDoneEvent)
262+
.ofType(ResponseAudioDeltaEvent.class)
263+
.subscribe(this::consumeAudioDelta, this::consumeError, this::onAudioResponseCompleted),
264+
client.getServerEvents()
265+
.takeUntil(serverEvent -> serverEvent instanceof ResponseAudioTranscriptDoneEvent)
266+
.ofType(ResponseAudioTranscriptDeltaEvent.class)
267+
.subscribe(this::consumeAudioTranscriptDelta, this::consumeError, this::onAudioResponseTranscriptCompleted)
268+
));
269+
```
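
The `takeUntil(...)` plus `ofType(...)` combination above keeps only the delta events and stops once the matching "done" event arrives. The same filtering logic is sketched below with plain Java collections so it can be run without Reactor or the client library. The event classes here are hypothetical stand-ins for `ResponseAudioTranscriptDeltaEvent` and `ResponseAudioTranscriptDoneEvent`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the library's server event types.
class TranscriptCollector {
    interface ServerEvent { }

    static class TranscriptDelta implements ServerEvent {
        final String delta;

        TranscriptDelta(String delta) {
            this.delta = delta;
        }
    }

    static class TranscriptDone implements ServerEvent { }

    static class OtherEvent implements ServerEvent { }

    // Concatenates transcript deltas until the first "done" event is seen.
    static String collect(Iterable<ServerEvent> events) {
        StringBuilder transcript = new StringBuilder();
        for (ServerEvent event : events) {
            if (event instanceof TranscriptDone) {
                break; // plays the role of takeUntil(...)
            }
            if (event instanceof TranscriptDelta) { // plays the role of ofType(...)
                transcript.append(((TranscriptDelta) event).delta);
            }
        }
        return transcript.toString();
    }

    public static void main(String[] args) {
        List<ServerEvent> events = Arrays.asList(
            new TranscriptDelta("Hel"), new OtherEvent(), new TranscriptDelta("lo"),
            new TranscriptDone(), new TranscriptDelta("ignored"));
        System.out.println(collect(events));
    }
}
```

Events after the "done" marker are never inspected, which is why each subscription in the Reactor sample completes on its own once its part of the response is finished.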
270+
271+
## Troubleshooting
272+
273+
### Enable client logging
274+
You can set the `AZURE_LOG_LEVEL` environment variable to view logging statements made in the client library. For
275+
example, setting `AZURE_LOG_LEVEL=2` would show all informational, warning, and error log messages. The log levels can
276+
be found here: [log levels][log_levels].
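
As a rough illustration of how a numeric level acts as a threshold, the sketch below assumes the levels are ordered 1 (verbose), 2 (informational), 3 (warning), 4 (error), which matches the `AZURE_LOG_LEVEL=2` example above. `LogLevelSketch` is hypothetical; the linked log levels source remains the authoritative reference.

```java
// Hypothetical illustration of level filtering, not the library's logger.
class LogLevelSketch {
    static final int VERBOSE = 1;
    static final int INFORMATIONAL = 2;
    static final int WARNING = 3;
    static final int ERROR = 4;

    // A message is shown when its level is at or above the configured level.
    static boolean isEnabled(int configuredLevel, int messageLevel) {
        return messageLevel >= configuredLevel;
    }

    public static void main(String[] args) {
        int configured = INFORMATIONAL; // as if AZURE_LOG_LEVEL=2
        System.out.println("verbose shown: " + isEnabled(configured, VERBOSE));
        System.out.println("warning shown: " + isEnabled(configured, WARNING));
    }
}
```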
277+
278+
### Default HTTP Client
279+
All client libraries by default use the Netty HTTP client. Adding the above dependency will automatically configure
280+
the client library to use the Netty HTTP client. Configuring or changing the HTTP client is detailed in the
281+
[HTTP clients wiki](https://learn.microsoft.com/azure/developer/java/sdk/http-client-pipeline#http-clients).
282+
283+
### Default SSL library
284+
All client libraries, by default, use the Tomcat-native Boring SSL library to enable native-level performance for SSL
285+
operations. The Boring SSL library is an uber jar containing native libraries for Linux / macOS / Windows, and provides
286+
better performance compared to the default SSL implementation within the JDK. For more information, including how to
287+
reduce the dependency size, refer to the [performance tuning][performance_tuning] section of the wiki.
288+
289+
## Next steps
290+
291+
- Samples are explained in detail [here][samples_readme].
292+
293+
## Contributing
294+
295+
For details on contributing to this repository, see the [contributing guide](https://github.com/Azure/azure-sdk-for-java/blob/main/CONTRIBUTING.md).
296+
297+
1. Fork it
298+
2. Create your feature branch (`git checkout -b my-new-feature`)
299+
3. Commit your changes (`git commit -am 'Add some feature'`)
300+
4. Push to the branch (`git push origin my-new-feature`)
301+
5. Create new Pull Request
302+
303+
<!-- LINKS -->
304+
[aoai_samples_readme]: https://github.com/Azure-Samples/aoai-realtime-audio-sdk/blob/main/README.md
305+
[aoai_samples_readme_api_concepts]: https://github.com/Azure-Samples/aoai-realtime-audio-sdk/blob/main/README.md#api-concepts
306+
[azure_subscription]: https://azure.microsoft.com/free/
307+
[azure_openai_access]: https://learn.microsoft.com/azure/cognitive-services/openai/overview#how-do-i-get-access-to-azure-openai
308+
[jdk]: https://docs.microsoft.com/java/azure/jdk/
309+
[dev_feed_instructions]: https://github.com/Azure/azure-sdk-for-java/blob/main/CONTRIBUTING.md#dev-feed
310+
[log_levels]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/core/azure-core/src/main/java/com/azure/core/util/logging/ClientLogger.java
311+
[performance_tuning]: https://github.com/Azure/azure-sdk-for-java/wiki/Performance-Tuning
312+
[samples_readme]: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/openai/azure-ai-openai-realtime/src/samples
313+
[quickstart]: https://learn.microsoft.com/azure/ai-services/openai/realtime-audio-quickstart
314+
[realtime_client_async]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/openai/azure-ai-openai-realtime/src/main/java/com/azure/ai/openai/realtime/RealtimeAsyncClient.java
315+
[realtime_client_sync]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/openai/azure-ai-openai-realtime/src/main/java/com/azure/ai/openai/realtime/RealtimeClient.java
316+
[realtime_client_builder]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/openai/azure-ai-openai-realtime/src/main/java/com/azure/ai/openai/realtime/RealtimeClientBuilder.java
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
# Troubleshooting OpenAI issues
2+
3+
This troubleshooting guide covers failure investigation techniques, common errors for the credential types in the Azure
4+
OpenAI Realtime Java client library, and mitigation steps to resolve these errors. The common best practice sample can be found
5+
in [Best Practice Samples][best_practice_samples].
6+
7+
## Get additional help
8+
9+
Additional information on ways to reach out for support can be found in the [SUPPORT.md][support] at the root of the repo.
10+
11+
<!-- Links -->
12+
[best_practice_samples]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/openai/azure-ai-openai-realtime/src/samples/README.md
13+
[support]: https://github.com/Azure/azure-sdk-for-java/blob/main/SUPPORT.md
