
Python: Introduce the chat history reducer #10190

Open · wants to merge 11 commits into base: main

Conversation

@moonbox3 (Contributor) commented Jan 15, 2025

Motivation and Context

The SK Python framework has been missing the ability to configure a chat history reducer of type ChatHistoryTruncationReducer or ChatHistorySummarizationReducer, both of which have existed in the .NET SK Agent framework for some time.

The goal of this PR is to introduce the chat history reducers and make them available not only to the agent framework but also to anything else that uses a chat history (chat completion, for example). Because ChatHistoryReducer extends the ChatHistory class, it is simple to include a reducer and message-reduction logic while managing the chat history, whether in an agent framework setting or a chat completion setting.

Description

This PR:

  • Introduces the chat history reducer functionality in Python -- both the ChatHistoryTruncationReducer and ChatHistorySummarizationReducer.
  • Adds unit tests for code coverage.
  • Adds a sample Chat Completion History Reducer to show how to configure both reducers and what each parameter does.
  • Updates the Agent SelectionStrategy, KernelFunctionSelectionStrategy, and KernelFunctionTerminationStrategy to use the reducer.
    • Additionally updates the classes above to use a new select_agent abstract method so that one can define an initial agent to run in a particular scenario.
  • Removes the deprecated FunctionCallBehavior class, and removes some circular dependencies that had been lurking in the code base for some time. FunctionCallBehavior has carried a deprecation warning for 6+ months, and all samples and docs have moved over to FunctionChoiceBehavior; developers using FunctionCallBehavior should have had enough time to switch.
  • Closes Python Agents: ChatHistoryReducer #7969
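For readers new to the concept, a truncation reducer trims a chat history back to a target size once a threshold is exceeded. A minimal sketch of the idea in plain Python (the class and parameter names here are illustrative, not the actual SK API):

```python
from dataclasses import dataclass, field

@dataclass
class TruncationReducerSketch:
    """Illustrative only: keeps at most target_count messages,
    reducing only once the history grows past target + threshold."""
    target_count: int
    threshold_count: int = 0
    messages: list = field(default_factory=list)

    def add(self, message: str) -> None:
        self.messages.append(message)

    def reduce(self) -> bool:
        # Only reduce when the history exceeds target + threshold.
        if len(self.messages) <= self.target_count + self.threshold_count:
            return False
        # Keep the newest target_count messages.
        self.messages = self.messages[-self.target_count:]
        return True

history = TruncationReducerSketch(target_count=3, threshold_count=2)
for i in range(7):
    history.add(f"msg-{i}")
reduced = history.reduce()
print(reduced, history.messages)  # True ['msg-4', 'msg-5', 'msg-6']
```

The threshold acts as hysteresis so the reducer does not fire on every single message past the target.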

Contribution Checklist

@moonbox3 moonbox3 self-assigned this Jan 15, 2025
@moonbox3 moonbox3 requested a review from a team as a code owner January 15, 2025 10:40
@markwallace-microsoft markwallace-microsoft added the python (Pull requests for the Python Semantic Kernel) and documentation labels Jan 15, 2025
@markwallace-microsoft (Member) commented Jan 15, 2025

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
|------|------:|-----:|------:|---------|
| semantic_kernel/agents | | | | |
| &nbsp;&nbsp;agent.py | 45 | 11 | 76% | 48–59, 72–73 |
| semantic_kernel/agents/chat_completion | | | | |
| &nbsp;&nbsp;chat_completion_agent.py | 84 | 1 | 99% | 81 |
| semantic_kernel/agents/strategies/selection | | | | |
| &nbsp;&nbsp;kernel_function_selection_strategy.py | 55 | 2 | 96% | 62–63 |
| &nbsp;&nbsp;selection_strategy.py | 18 | 1 | 94% | 41 |
| semantic_kernel/agents/strategies/termination | | | | |
| &nbsp;&nbsp;kernel_function_termination_strategy.py | 45 | 2 | 96% | 54–55 |
| semantic_kernel/connectors/ai | | | | |
| &nbsp;&nbsp;chat_completion_client_base.py | 122 | 2 | 98% | 391, 401 |
| semantic_kernel/connectors/ai/anthropic/services | | | | |
| &nbsp;&nbsp;anthropic_chat_completion.py | 162 | 8 | 95% | 159, 165, 178, 184, 188, 245–247, 380 |
| &nbsp;&nbsp;utils.py | 45 | 3 | 93% | 67, 100–103 |
| semantic_kernel/connectors/ai/azure_ai_inference/services | | | | |
| &nbsp;&nbsp;azure_ai_inference_chat_completion.py | 104 | 6 | 94% | 110–113, 122, 144, 168 |
| semantic_kernel/connectors/ai/bedrock/services | | | | |
| &nbsp;&nbsp;bedrock_chat_completion.py | 135 | 14 | 90% | 117, 139, 164, 168–171, 229, 247–266, 325 |
| semantic_kernel/connectors/ai/bedrock/services/model_provider | | | | |
| &nbsp;&nbsp;utils.py | 78 | 20 | 74% | 70, 73, 104, 108–119, 136–154, 175–178 |
| semantic_kernel/connectors/ai/google/google_ai/services | | | | |
| &nbsp;&nbsp;google_ai_chat_completion.py | 118 | 4 | 97% | 126, 153, 179, 181 |
| semantic_kernel/connectors/ai/google/vertex_ai/services | | | | |
| &nbsp;&nbsp;vertex_ai_chat_completion.py | 117 | 4 | 97% | 123, 150, 176, 178 |
| semantic_kernel/connectors/ai/mistral_ai/services | | | | |
| &nbsp;&nbsp;mistral_ai_chat_completion.py | 120 | 38 | 68% | 121–124, 134, 149–152, 167, 183–187, 202–210, 227–235, 248–261, 267, 276–280, 325–328 |
| semantic_kernel/connectors/ai/ollama/services | | | | |
| &nbsp;&nbsp;ollama_chat_completion.py | 137 | 34 | 75% | 116, 141, 145–146, 156, 169, 186, 206–207, 211, 224–234, 245–247, 258–267, 279, 289–290, 312, 323–324, 350, 359–367 |
| &nbsp;&nbsp;utils.py | 47 | 28 | 40% | 31, 46–54, 66–88, 100–104, 125–133 |
| semantic_kernel/connectors/ai/open_ai/prompt_execution_settings | | | | |
| &nbsp;&nbsp;open_ai_prompt_execution_settings.py | 82 | 1 | 99% | 131 |
| semantic_kernel/connectors/ai/open_ai/services | | | | |
| &nbsp;&nbsp;open_ai_chat_completion_base.py | 124 | 7 | 94% | 69, 79, 100, 120, 141, 177, 283 |
| semantic_kernel/contents/history_reducer | | | | |
| &nbsp;&nbsp;chat_history_reducer_utils.py | 36 | 2 | 94% | 28, 75 |
| &nbsp;&nbsp;chat_history_summarization_reducer.py | 85 | 7 | 92% | 104–106, 133–134, 142, 150, 179 |
| &nbsp;&nbsp;chat_history_truncation_reducer.py | 34 | 3 | 91% | 52–55, 71 |
| TOTAL | 16802 | 1807 | 89% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|------:|--------:|---------:|-------:|-----:|
| 3005 | 4 💤 | 0 ❌ | 0 🔥 | 1m 11s ⏱️ |

@moonbox3 moonbox3 added the agents and experimental (Associated with an experimental feature) labels Jan 15, 2025
@eavanvalkenburg (Member) left a comment


A couple of questions raised, most importantly: why isn't this part of the whole framework instead of just part of agents?

@moonbox3 moonbox3 changed the title Python: Introduce the agent chat history reducer Python: Introduce the chat history reducer Jan 16, 2025
chat_history.messages.append(response)
print(f"# {response.role} - {response.name}: '{response.content}'")

index += 2
Contributor:

Why add 2 instead of 1? Is it because the message_count is the sum of user messages and assistant messages?

Contributor Author:

In the sample, the user inputs a number, the model responds in the next number (in Spanish, per the prompt), then we skip to the next...

user: 1
model: dos
user: 3
model: cuatro
...

@TaoChenOSU (Contributor) Jan 17, 2025:

Got it. A comment here would be nice, or the expected output of the sample.
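For context, the sample advances the index by 2 because each loop turn produces one user message and one assistant message. A toy version of that pairing (illustrative, not the sample's actual code):

```python
# Toy illustration: each loop turn appends a user message and a
# model response, so the running count advances by 2 per turn.
messages = []
index = 1
while index <= 5:
    messages.append(("user", str(index)))           # user sends a number
    messages.append(("assistant", str(index + 1)))  # model replies with the next one
    index += 2  # skip to the next user number (the model already said index + 1)

print(messages)
```

After three turns the history holds six messages, alternating user/assistant, which is why counting by pairs matches the reducer's message_count.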


# If history was reduced, print summaries
if is_reduced:
    self._print_summaries_from_front(chat_history.messages)
Contributor:

It looks like this would not print anything if the reducer is an instance of ChatHistoryTruncationReducer.

Contributor Author:

?

Contributor:

The _print_summaries_from_xxx helpers look for the __summary__ attribute in the metadata of the messages. The ChatHistoryTruncationReducer doesn't place that attribute in the metadata, so the function won't print anything.
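The behavior described above can be illustrated with a small sketch: the printing helper emits only messages whose metadata carries the summary key, so a history reduced by truncation (which flags nothing) prints nothing. Names here are illustrative, not the actual SK helpers:

```python
SUMMARY_KEY = "__summary__"  # assumed key name, per the discussion above

def print_summaries(messages):
    """Print and return only messages flagged as summaries in metadata."""
    found = []
    for msg in messages:
        if SUMMARY_KEY in msg.get("metadata", {}):
            print("Summary:", msg["content"])
            found.append(msg["content"])
    return found

# A summarization reducer marks the summary message it injects...
summarized = [{"content": "Earlier chat condensed.", "metadata": {SUMMARY_KEY: True}}]
# ...but a truncation reducer just drops messages and flags nothing.
truncated = [{"content": "Hi", "metadata": {}}]
```

Calling print_summaries(summarized) prints the one summary; print_summaries(truncated) prints nothing, matching the reviewer's observation.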


@abstractmethod
async def next(self, agents: list["Agent"], history: list["ChatMessageContent"]) -> "Agent":
Contributor:

This is a breaking change. Maybe we should deprecate it first?

Contributor Author:

Agent framework is experimental, breaking changes can occur.

Contributor Author:

This is also to align with .Net, which made the change months ago. We never did, so it's best to align now.

Contributor Author:

@TaoChenOSU, btw, why do you consider this a breaking change? The agent group chat still calls into next which exists as a concrete method in that base class. From there, it calls an overridden select_agent method to select the agent.

@TaoChenOSU (Contributor) Jan 17, 2025:

If someone has subclassed this for a custom selection strategy, their implementation will break.
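The shape of the change being debated: next becomes a concrete template method on the base class that delegates to a new abstract select_agent, so subclasses that only overrode next no longer satisfy the abstract interface. A simplified sketch (not the actual SK classes):

```python
import asyncio
from abc import ABC, abstractmethod

class SelectionStrategySketch(ABC):
    """Simplified stand-in for the agent selection strategy base class."""

    async def next(self, agents: list, history: list):
        # Concrete entry point; shared bookkeeping could live here.
        return await self.select_agent(agents, history)

    @abstractmethod
    async def select_agent(self, agents: list, history: list):
        """Subclasses now implement this instead of next()."""

class RoundRobin(SelectionStrategySketch):
    def __init__(self) -> None:
        self._index = -1

    async def select_agent(self, agents: list, history: list):
        self._index = (self._index + 1) % len(agents)
        return agents[self._index]

strategy = RoundRobin()
first = asyncio.run(strategy.next(["writer", "critic"], []))
second = asyncio.run(strategy.next(["writer", "critic"], []))
print(first, second)  # writer critic
```

A pre-existing subclass that implemented only next() would fail to instantiate here, which is the breaking change the reviewer is pointing at.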

chat_history = ChatHistory(messages=messages)
chat_history.add_system_message(self.summarization_instructions)

settings = self.service.get_prompt_execution_settings_class()(service_id=self.service_id)
Contributor:

Question: why is the service_id required in this case?

Contributor Author:

We need to tie the execution settings to an AI service that was registered on the kernel. Passing in the service_id, which will be either DEFAULT or a specified value, allows us to do that.

@TaoChenOSU (Contributor) Jan 17, 2025:

Since you're calling get_chat_message_content on the service directly, is there a need to specify the service_id? The kernel doesn't play any role here.

message_index = total_count - target_count

# Move backward to avoid cutting through function call/results
while message_index >= offset_count:
Contributor:

Do we need to handle cases where the offset_count lands between a function call content and a function result content?

@moonbox3 (Contributor Author) Jan 16, 2025:

That is the purpose of this loop: so that we don't separate the two content types.

Contributor:

Oh no, what I meant was cases where offset_count sits between a function call and a function result.
Say this is the history older to newer:
0: system message, 1: user message, 2: assistant message with function call, 3: assistant message with function result, ...
when offset_count is 3, the minimum value of message_index is also 3, which will cut right between the two assistant messages.
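The concern can be made concrete with a sketch of the backward walk (illustrative types and function, not the SK implementation): starting from total_count - target_count, the index steps backward past function-result messages so the cut never separates a call from its result, but the offset bounds how far back it may go.

```python
def find_safe_cut(kinds, target_count, offset_count):
    """Sketch of the backward walk discussed above; 'kinds' is a list of
    message-type strings, oldest first."""
    message_index = len(kinds) - target_count
    # Move backward to avoid cutting through function call/results.
    while message_index >= offset_count:
        if kinds[message_index] != "function_result":
            break
        message_index -= 1
    return message_index

# 0: system, 1: user, 2: function_call, 3: function_result, 4: assistant
kinds = ["system", "user", "function_call", "function_result", "assistant"]

cut = find_safe_cut(kinds, target_count=2, offset_count=0)
print(cut)   # 2: the cut lands before the call, keeping call + result together

# Reviewer's scenario: offset_count = 3 forbids stepping back past the pair,
# so the walk falls below the offset instead of finding a clean boundary.
cut2 = find_safe_cut(kinds, target_count=2, offset_count=3)
print(cut2)  # 2: below offset_count, illustrating the boundary case raised above
```

In this sketch the second call exits the loop with an index below the offset, which is exactly the ambiguous situation the reviewer asks about.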

older_range_start = 0 if self.use_single_summary else insertion_point
older_range_end = truncation_index

messages_to_summarize = extract_range(
Contributor:

Question:
Say my chat history currently has 11 messages, and my target count, threshold count, and offset count are 5, 5, and 1, respectively.

The insertion point will be 11, and the truncation index will be 6 assuming there is no function call and user message in the threshold window. Then the range will be [11, 6), which will create an empty list of messages, while we are trying to summarize messages in [1, 6]. Am I missing something?
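The arithmetic in the reviewer's scenario can be checked directly (pure illustration of the numbers quoted above; the index values are assumed from the comment, not computed by the real reducer):

```python
# Reviewer's numbers: 11 messages, all already carrying the summary key,
# target/threshold/offset = 5/5/1 (values taken from the comment above).
insertion_point = 11   # boundary lands past every message
truncation_index = 6
use_single_summary = False

older_range_start = 0 if use_single_summary else insertion_point
older_range_end = truncation_index

# extract_range would slice [older_range_start, older_range_end):
messages_to_summarize = list(range(older_range_start, older_range_end))
print(messages_to_summarize)  # [] -- the empty range the reviewer points out
```

range(11, 6) is empty, so nothing would be summarized in this configuration.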

Contributor Author:

This edge case could occur if all messages in the history are marked with the SUMMARY_METADATA_KEY. This is the logic that exists in .NET. Let me look at improving it in Python. I see two ways that could work:

1. We mark only the newest summary message:

# For each message in the history except the newly created summary:
for msg in self.messages[:-1]:
    if SUMMARY_METADATA_KEY in msg.metadata:
        del msg.metadata[SUMMARY_METADATA_KEY]

2. We add a safeguard before we summarize:

insertion_point = locate_summarization_boundary(history)
if insertion_point == len(history):
    # Fallback: force the boundary to something reasonable
    insertion_point = 0

Contributor Author:

I've been thinking about this more and I'm not super satisfied with these options. Since this appears to be an unlikely edge case, I do want to move forward with this summarizer and later iterate on it to improve the scenario. I have some thoughts but we should discuss them outside of this PR as improvements. I will create a work item to track.

Contributor:

Is what I described an edge case? It seems like it's going to be common in agent group chat, where all the messages in the threshold window are assistant messages without function content.

Do we have test cases to verify?

logger.info("Performing chat history summarization check...")

# 1. Identify where existing summary messages end
insertion_point = locate_summarization_boundary(history)
Member:

Shouldn't we always skip a SYSTEM/DEVELOPER message if it's the first one?

self.threshold_count,
offset_count=insertion_point,
)
if truncation_index < 0:
Member:

Suggested change:
- if truncation_index < 0:
+ if truncation_index is None:

async def _summarize(self, messages: list[ChatMessageContent]) -> ChatMessageContent | None:
"""Use the ChatCompletion service to generate a single summary message."""
chat_history = ChatHistory(messages=messages)
chat_history.add_system_message(self.summarization_instructions)
Member:

Given that o1 does not support a system message, should this be configurable?

Contributor Author:

Good point. Let me see.

older_range_start = 0 if self.use_single_summary else insertion_point
older_range_end = truncation_index

messages_to_summarize = extract_range(
Member:

Another question here: why are a function call and its output not used for summaries? They might contain relevant background that was not used in the directly following response but was used in subsequent responses, which would then be missing context (think multiple sequential FCCs with the summary landing in the middle).

Labels
agents · documentation · experimental (Associated with an experimental feature) · python (Pull requests for the Python Semantic Kernel)
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Python Agents: ChatHistoryReducer
4 participants