
Python: draft initial implementation of Realtime API #10127

Draft · wants to merge 23 commits into main
Conversation

eavanvalkenburg (Member)

Motivation and Context

Implements the OpenAI Realtime API with Semantic Kernel

Description

Implements a separate service client class with its own ExecutionSettings, still based on ChatCompletionClientBase.
Only supports streaming operations, with additional public methods for sending data to the conversation.
TBD whether that is the way to move forward with it.

TODO:

  • lots of comments
  • tests
  • cleanup

Contribution Checklist

@eavanvalkenburg eavanvalkenburg requested a review from a team as a code owner January 8, 2025 16:04
@eavanvalkenburg eavanvalkenburg marked this pull request as draft January 8, 2025 16:04
@markwallace-microsoft markwallace-microsoft added the python Pull requests for the Python Semantic Kernel label Jan 8, 2025
markwallace-microsoft (Member) commented Jan 9, 2025

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
|------|------:|-----:|------:|---------|
| **semantic_kernel/connectors/ai** | | | | |
| chat_completion_client_base.py | 127 | 2 | 98% | 408, 418 |
| function_calling_utils.py | 51 | 10 | 80% | 156–181 |
| realtime_client_base.py | 31 | 9 | 71% | 12, 41, 60–62, 134, 141–142, 146 |
| **semantic_kernel/connectors/ai/open_ai/services** | | | | |
| open_ai_realtime.py | 30 | 10 | 67% | 28–30, 72–84 |
| **semantic_kernel/connectors/ai/open_ai/services/realtime** | | | | |
| open_ai_realtime_base.py | 97 | 49 | 49% | 65–108, 118–134, 138–142, 148, 155–181, 187, 191, 197, 201 |
| open_ai_realtime_webrtc.py | 170 | 131 | 23% | 67–68, 72–156, 167–215, 220–227, 232–263, 271–279, 283–302 |
| open_ai_realtime_websocket.py | 114 | 80 | 30% | 57–86, 90–179, 189–192, 197–200 |
| utils.py | 7 | 4 | 43% | 20–26, 36 |
| **semantic_kernel/connectors/ai/utils** | | | | |
| \_\_init\_\_.py | 2 | 2 | 0% | 3–5 |
| realtime_helpers.py | 129 | 129 | 0% | 3–218 |
| **semantic_kernel/contents** | | | | |
| audio_content.py | 25 | 2 | 92% | 81, 86 |
| binary_content.py | 106 | 9 | 92% | 80, 119, 137–138, 179–183 |
| function_call_content.py | 106 | 2 | 98% | 197, 225 |
| streaming_chat_message_content.py | 71 | 1 | 99% | 227 |
| **semantic_kernel/contents/utils** | | | | |
| data_uri.py | 101 | 4 | 96% | 44–45, 63, 128 |
| **TOTAL** | 17434 | 2213 | 87% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|------:|--------:|---------:|-------:|-----:|
| 3010 | 4 💤 | 0 ❌ | 0 🔥 | 1m 10s ⏱️ |

```python
content = data["item"]
for item in content.items:
    match item:
        case TextContent():
```
Contributor:

There looks to be similar logic for handling the `.CONVERSATION_ITEM_CREATE` SK types in both this class and the webrtc class. Would it be worth creating a shared helper (maybe in OpenAIRealtimeBase, or in a utils module) to remove some duplication?

Member Author:

I thought about that, but since the websocket implementation calls methods in the OpenAI package on the next line, while webrtc sends dicts to the data channel, a shared helper would mostly complicate typing etc.
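To make the divergence concrete, here is a rough, hypothetical sketch (none of these helper names or the SDK attribute chain come from the PR): building the payload is easy to share, but the websocket path hands a typed call to the OpenAI SDK connection while the webrtc path serializes a plain dict onto the data channel.

```python
# Illustrative only: why a shared CONVERSATION_ITEM_CREATE helper is awkward.
# build_conversation_item, send_websocket_event, and send_webrtc_event are
# hypothetical names, not from the PR.
import json
from typing import Any


def build_conversation_item(text: str) -> dict[str, Any]:
    # The payload construction itself is trivially shareable.
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        },
    }


async def send_websocket_event(connection: Any, text: str) -> None:
    # Websocket path: the SDK connection exposes typed send methods
    # (attribute chain shown here is an assumption for illustration).
    await connection.conversation.item.create(
        item=build_conversation_item(text)["item"]
    )


def send_webrtc_event(data_channel: Any, text: str) -> str:
    # WebRTC path: the data channel only accepts serialized strings/bytes.
    payload = json.dumps(build_conversation_item(text))
    data_channel.send(payload)
    return payload
```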

```python
pass

@override
async def start_sending(self, **kwargs: Any) -> None:
```
Contributor:

In the start_sending child classes: is there anything we'd need to add to better handle shutdowns, or the case where there is no more data to handle?

Member Author:

Both sending and listening are loops that run until they are stopped, and both just react to things coming in. We could look at somewhat nicer propagation of a session close, but for sending that would just mean the queue is empty, and the queue can also be empty mid-session while the service keeps sending until we call close, so we don't want to stop in that case either (I think).
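The point about "empty queue" not meaning "done" can be sketched as a minimal send loop (this is an assumption of how such a loop could look, not the PR's code): the loop blocks on the queue while it is empty and only exits on an explicit close sentinel.

```python
# Minimal sketch (not the PR's implementation) of a sending loop that runs
# until explicitly closed, rather than stopping when the queue is merely empty.
import asyncio

_CLOSE = object()  # sentinel enqueued by a hypothetical close() to end the loop


async def start_sending(queue: asyncio.Queue, send) -> None:
    while True:
        event = await queue.get()  # blocks while the queue is empty mid-session
        if event is _CLOSE:
            break
        await send(event)


async def demo() -> list:
    sent = []

    async def send(e):
        sent.append(e)

    q: asyncio.Queue = asyncio.Queue()
    for e in ("a", "b"):
        q.put_nowait(e)
    q.put_nowait(_CLOSE)  # without this, the loop would wait forever
    await start_sending(q, send)
    return sent
```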

- developer judgement needs to be made (or exposed with parameters) on what is returned through the async generator and what is passed to the event handlers

### 2. Event buffers/queues that are exposed to the developer, start sending and start receiving methods, that just initiate the sending and receiving of events and thereby the filling of the buffers
This would mean that the there are two queues, one for sending and one for receiving, and the developer can listen to the receiving queue and send to the sending queue. Internal things like parsing events to content types and auto-function calling are processed first, and the result is put in the queue, the content type should use inner_content to capture the full event and these might add a message to the send queue as well.
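The two-queue option above could be sketched roughly as follows (class and field names are illustrative assumptions, not the PR's API): internal processing parses each raw service event into a content object first, preserving the full event in inner_content, and only then exposes it on the receive queue.

```python
# Rough sketch of option 2: two developer-facing queues, with internal
# event-to-content parsing before anything reaches the receive queue.
# RealtimeContent and TwoQueueClient are hypothetical names.
import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass
class RealtimeContent:
    text: str
    inner_content: Any  # full raw event, preserved for the developer


class TwoQueueClient:
    def __init__(self) -> None:
        self.send_queue: asyncio.Queue = asyncio.Queue()     # developer -> service
        self.receive_queue: asyncio.Queue = asyncio.Queue()  # service -> developer

    async def _on_service_event(self, event: dict) -> None:
        # Internal step: parse the raw event into a content type first;
        # the developer listens on receive_queue and never sees raw events
        # except through inner_content.
        content = RealtimeContent(text=event.get("text", ""), inner_content=event)
        await self.receive_queue.put(content)
```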
Contributor:

Suggested change

```diff
- This would mean that the there are two queues, one for sending and one for receiving, and the developer can listen to the receiving queue and send to the sending queue. Internal things like parsing events to content types and auto-function calling are processed first, and the result is put in the queue, the content type should use inner_content to capture the full event and these might add a message to the send queue as well.
+ This would mean that there are two queues, one for sending and one for receiving, and the developer can listen to the receiving queue and send to the sending queue. Internal things like parsing events to content types and auto-function calling are processed first, and the result is put in the queue, the content type should use inner_content to capture the full event and these might add a message to the send queue as well.
```

docs/decisions/00XX-realtime-api-clients.md (resolved review comments)

# Content and Events

## Considered Options - Content and Events
Contributor:

Should we call out whether the “control” versus “content” distinction is a fundamental part of real-time interaction or just an implementation detail? For example, OpenAI distinguishes control events (input_audio_buffer.committed) from content events (conversation.item.create), while Google appears to treat everything as part of a unified content stream (BidiGenerateContent*).

This distinction might influence our decision in a few ways:

  • If the distinction is inherent to real-time systems, separating control from content may result in a cleaner, more flexible design.
  • However, if it’s just a specific quirk of OpenAI’s API, enforcing it could complicate support for providers like Google that don’t make the same distinction.
  • On the other hand, ignoring OpenAI’s finer-grained controls might limit the ability to fully utilize other features in the future.

I think it would make sense to call this out explicitly in the doc; it could provide additional context for why we're choosing one approach over the other.
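One way the comment's distinction could be made explicit without forcing it on every provider is a small classifier (a sketch, not a proposal from the doc): OpenAI event types that are clearly control-plane are tagged, and providers that treat everything as one content stream simply never produce control events. The membership of the set below is an assumption for illustration.

```python
# Illustrative only: tag events as "control" vs "content". Event type strings
# follow OpenAI's realtime event naming; which types count as control is an
# assumed mapping, not taken from the ADR.
CONTROL_EVENT_TYPES = {
    "input_audio_buffer.committed",
    "session.updated",
}


def classify_event(event_type: str) -> str:
    # A provider like Google that treats everything as a unified content
    # stream (BidiGenerateContent*) would simply never yield "control" here.
    return "control" if event_type in CONTROL_EVENT_TYPES else "content"
```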

Labels: documentation, python (Pull requests for the Python Semantic Kernel)

4 participants