An implementation of ChatMemory based on token size control #3424
Conversation
…ding window limit: TokenWindowChatMemory. This implementation limits the chat memory based on the total number of tokens. When the total token count exceeds the limit, the oldest messages are evicted. Messages are indivisible, meaning that if the token limit is exceeded, the entire oldest message is removed. Signed-off-by: Sun Yuhan <[email protected]>
Hi @ilayaperumalg @markpollack, what do you think of this?
@sunyuhan1998:
Regarding the first point: in the current design of this PR, messages are not truncated. If adding a message causes the token limit to be exceeded, the entire oldest message is removed. As for the second point, I believe the conversation context is the complete memory of a single interaction between us and the model, and we all want the model to retain as much context as possible. Even if we only manage to keep one additional response from the model (or one user question), the information it contains might provide valuable reference material for generating the next response. Hence, I think that when the token limit is reached, we should remove a single message rather than an entire message pair. However, this PR has been open for some time without being addressed, so I'm wondering whether it perhaps doesn't align with the Spring AI team's expectations.
@sunyuhan1998: If the entire message, which may be the first and only message, is removed, the user will get no response. That is not what the user expects, is it?
I understand what you're saying, and indeed this could be confusing for users. However, I'd like to clarify one point: ChatMemory is not designed to be a "chat log" or "chat history" per se. It simply represents the token context of your interaction with the LLM. Therefore, when considering its functionality and setting its window size, I believe we should align it as closely as possible with the LLM's context token limit. For an LLM, if the generated content exceeds the token limit, the model itself will generally truncate the response to keep it within the allowed token count.
Is there any update on this feature? Do we know if and when it is going to be merged?
Fixes #3423
As #3423 mentioned, a ChatMemory that limits the window based on token size would be more practical. This PR provides such an implementation: TokenWindowChatMemory.
This implementation limits the chat memory based on the total number of tokens. When the total token count exceeds the limit, the oldest messages are evicted.
Messages are indivisible, meaning that if the token limit is exceeded, the entire oldest message is removed.
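The eviction policy described above can be sketched in plain Java, independent of Spring AI's actual ChatMemory interface. This is a minimal illustration, not the PR's implementation: the TokenWindowBuffer class and its ToIntFunction-based token counter are hypothetical names, and a real implementation would plug in a tokenizer-backed estimator and Spring AI's Message type instead of raw strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.ToIntFunction;

/**
 * Illustrative sketch of token-window eviction: messages are kept whole, and
 * the oldest ones are dropped once the running token total exceeds maxTokens.
 */
final class TokenWindowBuffer {

    private final Deque<String> messages = new ArrayDeque<>();
    private final ToIntFunction<String> tokenCounter; // assumption: some tokenizer-backed estimator
    private final int maxTokens;
    private int currentTokens = 0;

    TokenWindowBuffer(ToIntFunction<String> tokenCounter, int maxTokens) {
        this.tokenCounter = tokenCounter;
        this.maxTokens = maxTokens;
    }

    void add(String message) {
        this.messages.addLast(message);
        this.currentTokens += this.tokenCounter.applyAsInt(message);
        // Evict whole messages from the head until the total is back under the limit.
        // Keeping at least the newest message is a choice made in this sketch only.
        while (this.currentTokens > this.maxTokens && this.messages.size() > 1) {
            String evicted = this.messages.removeFirst();
            this.currentTokens -= this.tokenCounter.applyAsInt(evicted);
        }
    }

    List<String> get() {
        return new ArrayList<>(this.messages);
    }
}
```

Retaining at least the most recent message even when it alone exceeds the budget is one possible answer to the "first and only message" concern raised in the conversation; whether the PR takes that approach or leaves final truncation to the model is exactly what is being discussed above.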