Conversation

@sunyuhan1998 (Contributor) commented on Jun 3, 2025

Fixes #3423
As mentioned in #3423, a ChatMemory that limits the window based on token size would be more practical. This PR provides such an implementation: TokenWindowChatMemory.

This implementation limits the chat memory based on the total number of tokens. When the total token count exceeds the limit, the oldest messages are evicted.

Messages are indivisible, meaning that if the token limit is exceeded, the entire oldest message is removed.
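
For readers skimming the thread, here is a minimal standalone sketch of the eviction policy described above. It is not the PR's actual code: the `TokenWindowSketch` class, its `Message` record, and the length-based token estimate are made-up stand-ins for Spring AI's message types and token counting.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Minimal sketch of a token-window eviction policy; not the PR's actual code.
// The Message record and the length-based token estimate are simplified stand-ins.
final class TokenWindowSketch {

    record Message(String role, String text) {
        // Rough token estimate; a real implementation would use a proper tokenizer.
        int tokens() {
            return Math.max(1, text.length() / 4);
        }
    }

    private final Deque<Message> window = new ArrayDeque<>();
    private final int maxTokens;

    TokenWindowSketch(int maxTokens) {
        this.maxTokens = maxTokens;
    }

    void add(Message message) {
        window.addLast(message);
        // Evict whole messages from the oldest end until the budget is met.
        // Messages are indivisible: they are dropped entirely, never truncated.
        while (totalTokens() > maxTokens && !window.isEmpty()) {
            window.removeFirst();
        }
    }

    List<Message> messages() {
        return new ArrayList<>(window);
    }

    private int totalTokens() {
        return window.stream().mapToInt(Message::tokens).sum();
    }
}
```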

Commit: …ding window limit: TokenWindowChatMemory.

Signed-off-by: Sun Yuhan <[email protected]>
@sunyuhan1998 (Contributor, Author)

Hi @ilayaperumalg @markpollack, what do you think of this?

@sp-yang commented on Jul 3, 2025

@sunyuhan1998:
I have 2 questions:

  1. What should be done if the token size of one message exceeds the max token limit, e.g. when different documents are returned and their combined content is very large? Will the content of this single message be truncated to fit within the max token limit?
  2. Should a message pair, rather than only one message, be removed when the context exceeds the token limit? For chat it is usually a question/answer pair, which should be removed together, right?

@sunyuhan1998 (Contributor, Author)

> @sunyuhan1998: I have 2 questions:
>
>   1. What should be done if the token size of one message exceeds the max token limit, e.g. when different documents are returned and their combined content is very large? Will the content of this single message be truncated to fit within the max token limit?
>   2. Should a message pair, rather than only one message, be removed when the context exceeds the token limit? For chat it is usually a question/answer pair, which should be removed together, right?

Regarding the first point: in the current design of this PR, messages will not be truncated. If adding a message causes the token limit to be exceeded, entire messages are removed rather than trimmed to fit.

As for the second point, I see the conversation context as the complete memory of a single interaction between us and the model, and we all want the model to retain as much context as possible. Therefore, even if we only manage to keep one additional response from the model (or one additional user question), the information it contains might provide valuable reference material for generating the next response. Hence, I think that when the token limit is reached we should remove a single message rather than an entire message pair.
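
To make the single-message-versus-pair point concrete, here is a small usage example of the `TokenWindowSketch` from earlier in the thread; the token budget and message texts are made up for illustration. With a tight budget, eviction drops only the oldest message, so an assistant answer can stay in memory even after the user question it belongs to has been evicted.

```java
// Usage of the TokenWindowSketch above. With a ~20-token budget, whole messages
// are evicted one at a time, oldest first, so the assistant's answer can stay in
// memory even after the user question it belongs to has already been dropped.
public class TokenWindowSketchDemo {
    public static void main(String[] args) {
        TokenWindowSketch memory = new TokenWindowSketch(20);
        memory.add(new TokenWindowSketch.Message("user", "A long first question about windowing and eviction policies"));
        memory.add(new TokenWindowSketch.Message("assistant", "A long first answer describing the eviction behaviour here"));
        memory.add(new TokenWindowSketch.Message("user", "Short follow-up"));
        // Expected: the first user message has been evicted, the other two remain.
        memory.messages().forEach(m -> System.out.println(m.role() + ": " + m.text()));
    }
}
```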

However, this PR has been open for some time without being reviewed. I'm wondering whether it perhaps doesn't align with the expectations of the Spring AI team?

@sp-yang commented on Jul 10, 2025

@sunyuhan1998: If the entire message, which may be the first and only message, is removed, the user will get no response. That is not what the user expects, is it?

@sunyuhan1998 (Contributor, Author)

> @sunyuhan1998: If the entire message, which may be the first and only message, is removed, the user will get no response. That is not what the user expects, is it?

I understand what you're saying, and indeed this could be confusing for users. However, I'd like to clarify one point: ChatMemory is not designed to be a "chat log" or "chat history" per se; it simply represents the token context of your interaction with the LLM. Therefore, I believe that when considering its functionality and setting its window size, we should align it as closely as possible with the LLM's context token limit. For an LLM, if the generated content exceeds the token limit, the model itself will generally truncate the response to ensure it stays within the allowed token count.
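
As a rough illustration of that sizing idea, the memory budget would be chosen somewhat below the model's context window so that the system prompt and the new reply still fit. The figures and class names below are hypothetical and not taken from the PR or any model's documentation.

```java
// Hypothetical sizing; the numbers are illustrative, not taken from the PR or any model.
public class TokenBudgetSizingDemo {
    public static void main(String[] args) {
        int modelContextTokens = 128_000;       // the model's advertised context window
        int reservedForPromptAndReply = 8_000;  // system prompt plus head-room for the answer
        TokenWindowSketch memory = new TokenWindowSketch(modelContextTokens - reservedForPromptAndReply);
        System.out.println("Chat memory token budget: " + (modelContextTokens - reservedForPromptAndReply));
    }
}
```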

@domipieg commented on Sep 2, 2025

Is there any update on this feature? Do we know when and if it is going to be merged?
