An implementation of ChatMemory based on token size control #3424
Conversation
…ding window limit: TokenWindowChatMemory. This implementation limits the chat memory based on the total number of tokens. When the total token count exceeds the limit, the oldest messages are evicted. Messages are indivisible, meaning that if the token limit is exceeded, the entire oldest message is removed. Signed-off-by: Sun Yuhan <[email protected]>
Hi @ilayaperumalg @markpollack, what do you think of this?
@sunyuhan1998:
Regarding the first point: in the current design of this PR, messages are not truncated. If adding a message causes the token limit to be exceeded, the entire oldest message is removed. As for the second point, I believe the conversation context is the complete memory of a single interaction between us and the model, and we all want the model to retain as much context as possible. Even if we only manage to keep one additional response from the model (or one user question), the information it contains might provide valuable reference material for generating the next response. Hence, I think that when the token limit is reached, we should remove a single message rather than an entire message pair. However, this PR has been open for some time without being addressed, so I'm wondering whether it perhaps doesn't align with the Spring AI team's expectations.
@sunyuhan1998: If the entire message, which may be the first and only message, is removed, the user will get no response. That is not what the user expects, is it?
I understand what you're saying, and indeed this could be confusing for users. However, I'd like to clarify one point: ChatMemory is not designed to be a "chat log" or "chat history" per se. It simply represents the token context of your interaction with the LLM. Therefore, when considering its functionality and setting its window size, I believe we should align it as closely as possible with the LLM's context token limit. For an LLM, if the generated content exceeds the token limit, the model itself will generally truncate the response to keep it within the allowed token count.
Is there any update on this feature? Do we know if and when it is going to be merged?
Fixes #3423
As #3423 mentioned, a ChatMemory that limits the window based on token size would be more practical. This PR provides such an implementation: TokenWindowChatMemory.
This implementation limits the chat memory based on the total number of tokens. When the total token count exceeds the limit, the oldest messages are evicted.
Messages are indivisible, meaning that if the token limit is exceeded, the entire oldest message is removed.
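The eviction policy described above can be sketched in plain Java, independent of Spring AI's actual ChatMemory interface. This is a minimal illustration, not the PR's implementation: the TokenWindowBuffer class and its ToIntFunction-based token counter are hypothetical names, and a real implementation would plug in a tokenizer-backed estimator and Spring AI's Message type instead of raw strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.ToIntFunction;

/**
 * Illustrative sketch of token-window eviction: messages are kept whole, and
 * the oldest ones are dropped once the running token total exceeds maxTokens.
 */
final class TokenWindowBuffer {

    private final Deque<String> messages = new ArrayDeque<>();
    private final ToIntFunction<String> tokenCounter; // assumption: some tokenizer-backed estimator
    private final int maxTokens;
    private int currentTokens = 0;

    TokenWindowBuffer(ToIntFunction<String> tokenCounter, int maxTokens) {
        this.tokenCounter = tokenCounter;
        this.maxTokens = maxTokens;
    }

    void add(String message) {
        this.messages.addLast(message);
        this.currentTokens += this.tokenCounter.applyAsInt(message);
        // Evict whole messages from the head until the total is back under the limit.
        // Keeping at least the newest message is a choice made in this sketch only.
        while (this.currentTokens > this.maxTokens && this.messages.size() > 1) {
            String evicted = this.messages.removeFirst();
            this.currentTokens -= this.tokenCounter.applyAsInt(evicted);
        }
    }

    List<String> get() {
        return new ArrayList<>(this.messages);
    }
}
```

Retaining at least the most recent message even when it alone exceeds the budget is one possible answer to the "first and only message" concern raised in the conversation; whether the PR takes that approach or leaves final truncation to the model is exactly what is being discussed above.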