
Python: Introduce the chat history reducer #10190

Open · wants to merge 11 commits into base: main

Conversation

@moonbox3 (Contributor) commented Jan 15, 2025

Motivation and Context

The SK Python framework has been missing the ability to configure a chat history reducer of type ChatHistoryTruncationReducer or ChatHistorySummarizationReducer, both of which have existed in the .NET SK Agent framework for some time.

The goal of this PR is to introduce the chat history reducers and make them available not only to the agent framework but also to anything else that uses a chat history (chat completion, for example). Because ChatHistoryReducer extends the ChatHistory class, it is simple to include a reducer and message-reduction logic while managing the chat history, whether in an agent framework setting or a chat completion setting.

Description

This PR:

  • Introduces the chat history reducer functionality in Python -- both the ChatHistoryTruncationReducer and ChatHistorySummarizationReducer.
  • Adds unit tests for code coverage.
  • Adds a sample Chat Completion History Reducer to show how to configure both reducers and what each parameter does.
  • Updates the Agent SelectionStrategy, KernelFunctionSelectionStrategy, and KernelFunctionTerminationStrategy to use the reducer.
    • Additionally updates the classes above to use a new select_agent abstract method so that one can define an initial agent to run in a particular scenario.
  • Removes the deprecated FunctionCallBehavior class, and removes some circular dependencies that had been lurking in the code base for some time. FunctionCallBehavior has carried a deprecation warning for 6+ months, and all samples and docs have moved over to FunctionChoiceBehavior; developers using FunctionCallBehavior should have had enough time to switch.
  • Closes Python Agents: ChatHistoryReducer #7969
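For readers new to the concept, a truncation reducer trims a chat history back to a target size once a threshold is exceeded. A minimal sketch of the idea in plain Python (the class and parameter names here are illustrative, not the actual SK API):

```python
from dataclasses import dataclass, field

@dataclass
class TruncationReducerSketch:
    """Illustrative only: keeps at most target_count messages,
    reducing only once the history grows past target + threshold."""
    target_count: int
    threshold_count: int = 0
    messages: list = field(default_factory=list)

    def add(self, message: str) -> None:
        self.messages.append(message)

    def reduce(self) -> bool:
        # Only reduce when the history exceeds target + threshold.
        if len(self.messages) <= self.target_count + self.threshold_count:
            return False
        # Keep the newest target_count messages.
        self.messages = self.messages[-self.target_count:]
        return True

history = TruncationReducerSketch(target_count=3, threshold_count=2)
for i in range(7):
    history.add(f"msg-{i}")
reduced = history.reduce()
print(reduced, history.messages)  # True ['msg-4', 'msg-5', 'msg-6']
```

The threshold acts as hysteresis so the reducer does not fire on every single message past the target.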

Contribution Checklist

@moonbox3 moonbox3 self-assigned this Jan 15, 2025
@moonbox3 moonbox3 requested a review from a team as a code owner January 15, 2025 10:40
@markwallace-microsoft markwallace-microsoft added the python (Pull requests for the Python Semantic Kernel) and documentation labels Jan 15, 2025
@markwallace-microsoft (Member) commented Jan 15, 2025

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
|------|------:|-----:|------:|---------|
| semantic_kernel/agents | | | | |
| &nbsp;&nbsp;agent.py | 45 | 11 | 76% | 48–59, 72–73 |
| semantic_kernel/agents/chat_completion | | | | |
| &nbsp;&nbsp;chat_completion_agent.py | 84 | 1 | 99% | 81 |
| semantic_kernel/agents/strategies/selection | | | | |
| &nbsp;&nbsp;kernel_function_selection_strategy.py | 55 | 2 | 96% | 62–63 |
| &nbsp;&nbsp;selection_strategy.py | 18 | 1 | 94% | 41 |
| semantic_kernel/agents/strategies/termination | | | | |
| &nbsp;&nbsp;kernel_function_termination_strategy.py | 45 | 2 | 96% | 54–55 |
| semantic_kernel/connectors/ai | | | | |
| &nbsp;&nbsp;chat_completion_client_base.py | 122 | 2 | 98% | 391, 401 |
| semantic_kernel/connectors/ai/anthropic/services | | | | |
| &nbsp;&nbsp;anthropic_chat_completion.py | 162 | 8 | 95% | 159, 165, 178, 184, 188, 245–247, 380 |
| &nbsp;&nbsp;utils.py | 45 | 3 | 93% | 67, 100–103 |
| semantic_kernel/connectors/ai/azure_ai_inference/services | | | | |
| &nbsp;&nbsp;azure_ai_inference_chat_completion.py | 104 | 6 | 94% | 110–113, 122, 144, 168 |
| semantic_kernel/connectors/ai/bedrock/services | | | | |
| &nbsp;&nbsp;bedrock_chat_completion.py | 135 | 14 | 90% | 117, 139, 164, 168–171, 229, 247–266, 325 |
| semantic_kernel/connectors/ai/bedrock/services/model_provider | | | | |
| &nbsp;&nbsp;utils.py | 78 | 20 | 74% | 70, 73, 104, 108–119, 136–154, 175–178 |
| semantic_kernel/connectors/ai/google/google_ai/services | | | | |
| &nbsp;&nbsp;google_ai_chat_completion.py | 118 | 4 | 97% | 126, 153, 179, 181 |
| semantic_kernel/connectors/ai/google/vertex_ai/services | | | | |
| &nbsp;&nbsp;vertex_ai_chat_completion.py | 117 | 4 | 97% | 123, 150, 176, 178 |
| semantic_kernel/connectors/ai/mistral_ai/services | | | | |
| &nbsp;&nbsp;mistral_ai_chat_completion.py | 120 | 38 | 68% | 121–124, 134, 149–152, 167, 183–187, 202–210, 227–235, 248–261, 267, 276–280, 325–328 |
| semantic_kernel/connectors/ai/ollama/services | | | | |
| &nbsp;&nbsp;ollama_chat_completion.py | 137 | 34 | 75% | 116, 141, 145–146, 156, 169, 186, 206–207, 211, 224–234, 245–247, 258–267, 279, 289–290, 312, 323–324, 350, 359–367 |
| &nbsp;&nbsp;utils.py | 47 | 28 | 40% | 31, 46–54, 66–88, 100–104, 125–133 |
| semantic_kernel/connectors/ai/open_ai/prompt_execution_settings | | | | |
| &nbsp;&nbsp;open_ai_prompt_execution_settings.py | 82 | 1 | 99% | 131 |
| semantic_kernel/connectors/ai/open_ai/services | | | | |
| &nbsp;&nbsp;open_ai_chat_completion_base.py | 124 | 7 | 94% | 69, 79, 100, 120, 141, 177, 283 |
| semantic_kernel/contents/history_reducer | | | | |
| &nbsp;&nbsp;chat_history_reducer_utils.py | 36 | 2 | 94% | 28, 75 |
| &nbsp;&nbsp;chat_history_summarization_reducer.py | 85 | 7 | 92% | 104–106, 133–134, 142, 150, 179 |
| &nbsp;&nbsp;chat_history_truncation_reducer.py | 34 | 3 | 91% | 52–55, 71 |
| TOTAL | 16802 | 1807 | 89% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|------:|--------:|---------:|-------:|-----:|
| 3005 | 4 💤 | 0 ❌ | 0 🔥 | 1m 11s ⏱️ |

@moonbox3 moonbox3 added the agents and experimental (Associated with an experimental feature) labels Jan 15, 2025
@eavanvalkenburg (Member) left a comment


A couple of questions raised, most importantly: why isn't this part of the whole framework instead of just part of agents?

@moonbox3 moonbox3 changed the title Python: Introduce the agent chat history reducer Python: Introduce the chat history reducer Jan 16, 2025
chat_history.messages.append(response)
print(f"# {response.role} - {response.name}: '{response.content}'")

index += 2
Contributor:

Why add 2 instead of 1? Is it because the message_count is the sum of user messages and assistant messages?

Contributor Author:

In the sample, the user inputs a number, the model responds in the next number (in Spanish, per the prompt), then we skip to the next...

user: 1
model: dos
user: 3
model: cuatro
...

@TaoChenOSU (Contributor) Jan 17, 2025:

Got it. A comment here would be nice, or the expected output of the sample.
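For context, the sample advances the index by 2 because each loop turn produces one user message and one assistant message. A toy version of that pairing (illustrative, not the sample's actual code):

```python
# Toy illustration: each loop turn appends a user message and a
# model response, so the running count advances by 2 per turn.
messages = []
index = 1
while index <= 5:
    messages.append(("user", str(index)))           # user sends a number
    messages.append(("assistant", str(index + 1)))  # model replies with the next one
    index += 2  # skip to the next user number (the model already said index + 1)

print(messages)
```

After three turns the history holds six messages, alternating user/assistant, which is why counting by pairs matches the reducer's message_count.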


# If history was reduced, print summaries
if is_reduced:
    self._print_summaries_from_front(chat_history.messages)
Contributor:

It looks like this would not print anything if the reducer is an instance of ChatHistoryTruncationReducer.

Contributor Author:

?

Contributor:

The _print_summaries_from_xxx helpers look for the __summary__ attribute in the metadata of the messages. The ChatHistoryTruncationReducer doesn't place that attribute in the metadata, so the function won't print anything.
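The behavior described above can be illustrated with a small sketch: the printing helper emits only messages whose metadata carries the summary key, so a history reduced by truncation (which flags nothing) prints nothing. Names here are illustrative, not the actual SK helpers:

```python
SUMMARY_KEY = "__summary__"  # assumed key name, per the discussion above

def print_summaries(messages):
    """Print and return only messages flagged as summaries in metadata."""
    found = []
    for msg in messages:
        if SUMMARY_KEY in msg.get("metadata", {}):
            print("Summary:", msg["content"])
            found.append(msg["content"])
    return found

# A summarization reducer marks the summary message it injects...
summarized = [{"content": "Earlier chat condensed.", "metadata": {SUMMARY_KEY: True}}]
# ...but a truncation reducer just drops messages and flags nothing.
truncated = [{"content": "Hi", "metadata": {}}]
```

Calling print_summaries(summarized) prints the one summary; print_summaries(truncated) prints nothing, matching the reviewer's observation.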


@abstractmethod
async def next(self, agents: list["Agent"], history: list["ChatMessageContent"]) -> "Agent":
Contributor:

This is a breaking change. Maybe we should deprecate it first?

Contributor Author:

Agent framework is experimental, breaking changes can occur.

Contributor Author:

This is also to align with .Net, which made the change months ago. We never did, so it's best to align now.

Contributor Author:

@TaoChenOSU, btw, why do you consider this a breaking change? The agent group chat still calls into next which exists as a concrete method in that base class. From there, it calls an overridden select_agent method to select the agent.

@TaoChenOSU (Contributor) Jan 17, 2025:

If someone has subclassed this for a custom selection strategy, their implementation will break.
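The shape of the change being debated: next becomes a concrete template method on the base class that delegates to a new abstract select_agent, so subclasses that only overrode next no longer satisfy the abstract interface. A simplified sketch (not the actual SK classes):

```python
import asyncio
from abc import ABC, abstractmethod

class SelectionStrategySketch(ABC):
    """Simplified stand-in for the agent selection strategy base class."""

    async def next(self, agents: list, history: list):
        # Concrete entry point; shared bookkeeping could live here.
        return await self.select_agent(agents, history)

    @abstractmethod
    async def select_agent(self, agents: list, history: list):
        """Subclasses now implement this instead of next()."""

class RoundRobin(SelectionStrategySketch):
    def __init__(self) -> None:
        self._index = -1

    async def select_agent(self, agents: list, history: list):
        self._index = (self._index + 1) % len(agents)
        return agents[self._index]

strategy = RoundRobin()
first = asyncio.run(strategy.next(["writer", "critic"], []))
second = asyncio.run(strategy.next(["writer", "critic"], []))
print(first, second)  # writer critic
```

A pre-existing subclass that implemented only next() would fail to instantiate here, which is the breaking change the reviewer is pointing at.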

chat_history = ChatHistory(messages=messages)
chat_history.add_system_message(self.summarization_instructions)

settings = self.service.get_prompt_execution_settings_class()(service_id=self.service_id)
Contributor:

Question: why is the service_id required in this case?

Contributor Author:

We need to tie the execution settings to an AI service that was registered on the kernel. Passing in the service_id, which will be either DEFAULT or a specified value, allows us to do that.

@TaoChenOSU (Contributor) Jan 17, 2025:

Since you're calling get_chat_message_content on the service directly, is there a need to specify the service_id? The kernel doesn't play any role here.

message_index = total_count - target_count

# Move backward to avoid cutting through function call/results
while message_index >= offset_count:
Contributor:

Do we need to handle cases where the offset_count lands between a function call content and a function result content?

@moonbox3 (Contributor Author) Jan 16, 2025:

That is the purpose of this loop: so that we don't separate the two content types.

Contributor:

Oh no, what I meant was cases where offset_count sits between a function call and a function result.
Say this is the history older to newer:
0: system message, 1: user message, 2: assistant message with function call, 3: assistant message with function result, ...
when offset_count is 3, the minimum value of message_index is also 3, which will cut right between the two assistant messages.
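The concern can be made concrete with a sketch of the backward walk (illustrative types and function, not the SK implementation): starting from total_count - target_count, the index steps backward past function-result messages so the cut never separates a call from its result, but the offset bounds how far back it may go.

```python
def find_safe_cut(kinds, target_count, offset_count):
    """Sketch of the backward walk discussed above; 'kinds' is a list of
    message-type strings, oldest first."""
    message_index = len(kinds) - target_count
    # Move backward to avoid cutting through function call/results.
    while message_index >= offset_count:
        if kinds[message_index] != "function_result":
            break
        message_index -= 1
    return message_index

# 0: system, 1: user, 2: function_call, 3: function_result, 4: assistant
kinds = ["system", "user", "function_call", "function_result", "assistant"]

cut = find_safe_cut(kinds, target_count=2, offset_count=0)
print(cut)   # 2: the cut lands before the call, keeping call + result together

# Reviewer's scenario: offset_count = 3 forbids stepping back past the pair,
# so the walk falls below the offset instead of finding a clean boundary.
cut2 = find_safe_cut(kinds, target_count=2, offset_count=3)
print(cut2)  # 2: below offset_count, illustrating the boundary case raised above
```

In this sketch the second call exits the loop with an index below the offset, which is exactly the ambiguous situation the reviewer asks about.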

older_range_start = 0 if self.use_single_summary else insertion_point
older_range_end = truncation_index

messages_to_summarize = extract_range(
Contributor:

Question:
Say my chat history currently has 11 messages, and my target count, threshold count, and offset count are 5, 5, and 1, respectively.

The insertion point will be 11, and the truncation index will be 6 assuming there is no function call and user message in the threshold window. Then the range will be [11, 6), which will create an empty list of messages, while we are trying to summarize messages in [1, 6]. Am I missing something?
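The arithmetic in the reviewer's scenario can be checked directly (pure illustration of the numbers quoted above; the index values are assumed from the comment, not computed by the real reducer):

```python
# Reviewer's numbers: 11 messages, all already carrying the summary key,
# target/threshold/offset = 5/5/1 (values taken from the comment above).
insertion_point = 11   # boundary lands past every message
truncation_index = 6
use_single_summary = False

older_range_start = 0 if use_single_summary else insertion_point
older_range_end = truncation_index

# extract_range would slice [older_range_start, older_range_end):
messages_to_summarize = list(range(older_range_start, older_range_end))
print(messages_to_summarize)  # [] -- the empty range the reviewer points out
```

range(11, 6) is empty, so nothing would be summarized in this configuration.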

Contributor Author:

This edge case could occur if all messages in the history are marked with the SUMMARY_METADATA_KEY. This is the logic that exists in .NET. Let me look at improving it in Python. I see two ways that could work:

1. We mark only the newest summary message:

# For each message in the history except the newly created summary:
for msg in self.messages[:-1]:
    if SUMMARY_METADATA_KEY in msg.metadata:
        del msg.metadata[SUMMARY_METADATA_KEY]

2. We add a safeguard before we summarize:

insertion_point = locate_summarization_boundary(history)
if insertion_point == len(history):
    # Fallback: force the boundary to something reasonable
    insertion_point = 0

Contributor Author:

I've been thinking about this more and I'm not super satisfied with these options. Since this appears to be an unlikely edge case, I do want to move forward with this summarizer and later iterate on it to improve the scenario. I have some thoughts but we should discuss them outside of this PR as improvements. I will create a work item to track.

Contributor:

Is what I described an edge case? It seems like it's going to be common in agent group chat, where all the messages in the threshold window are assistant messages without function content.

Do we have test cases to verify?

logger.info("Performing chat history summarization check...")

# 1. Identify where existing summary messages end
insertion_point = locate_summarization_boundary(history)
Member:

Shouldn't we always skip a SYSTEM/DEVELOPER message if it's the first one?

self.threshold_count,
offset_count=insertion_point,
)
if truncation_index < 0:
Member:

Suggested change:
- if truncation_index < 0:
+ if truncation_index is None:

async def _summarize(self, messages: list[ChatMessageContent]) -> ChatMessageContent | None:
"""Use the ChatCompletion service to generate a single summary message."""
chat_history = ChatHistory(messages=messages)
chat_history.add_system_message(self.summarization_instructions)
Member:

Given that o1 does not support a system message, should this be configurable?

Contributor Author:

Good point. Let me see.

older_range_start = 0 if self.use_single_summary else insertion_point
older_range_end = truncation_index

messages_to_summarize = extract_range(
Member:

Another question here: why are a function call and its output not used for summaries? They might contain relevant background that was not used in the directly following response but was used in subsequent responses, which would then be missing context (think multiple sequential FCCs with the summary landing in the middle).

Labels
agents · documentation · experimental (Associated with an experimental feature) · python (Pull requests for the Python Semantic Kernel)
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Python Agents: ChatHistoryReducer
4 participants