Updated message trim to handle messages better and support tool calling. #28

Open
cableman wants to merge 1 commit into os2ai:develop from AarhusAI:feature/message-trim-update

Conversation

@cableman (Contributor) commented May 6, 2026

See https://github.com/AarhusAI/aarhusai-docker/blob/main/guardrails/README.md for more information about the message trim guardrail.

New configuration knobs

  • default_max_context_tokens (default 8192) and a max_context_tokens_by_model map -- the context-window size is now resolved per model, falling back through: per-model map --> litellm.get_max_tokens --> global default (sketched after this list). This removes the silent 8192 fallback footgun on fleets that mix 8k, 32k, and 128k models.
  • pop_trailing_tool_messages (global) and pop_trailing_tool_messages_by_model -- opt-in for strict chat templates that reject tool-terminal conversations. Defaults to preserving trailing tool messages, since User --> Asst{tool_calls} --> Tool is the normal agent-loop shape.
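
As a rough illustration of the resolution order described above (a sketch only; the function name and signature here are assumptions, not the guardrail's actual code):

    import litellm

    def resolve_max_context_tokens(
        model: str,
        per_model: dict[str, int],
        default: int = 8192,
    ) -> int:
        # 1. An explicit per-model override always wins.
        override = per_model.get(model)
        if override:
            return override
        # 2. Otherwise ask litellm for the model's known context window.
        try:
            known = litellm.get_max_tokens(model)
            if known:
                return known
        except Exception:
            pass  # unknown model: fall through to the configured default
        # 3. Last resort: the configured global default.
        return default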

New tool-calling repair logic

  • _repair_tool_call_pairings -- after trim_messages truncates the history, drops orphan role: tool messages and strips orphan tool_calls from assistant messages so vLLM doesn't reject the conversation (both repair helpers are sketched after this list).
  • _ensure_last_is_user / _sanitize_messages -- guarantee the conversation doesn't end on an assistant message, appending {"role": "user", "content": "Please continue"} when needed. Sanitize now runs on every request, not just trimmed ones.
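
A minimal sketch of both helpers, reconstructed from this description and the diff excerpts quoted in the review below; the real implementation may differ in its details:

    def repair_tool_call_pairings(messages: list) -> list:
        # Ids of tool_calls declared by surviving assistant messages.
        call_ids = {
            tc.get("id")
            for m in messages
            if m.get("role") == "assistant"
            for tc in (m.get("tool_calls") or [])
        }
        # Ids that actually received a tool response.
        satisfied_ids = {
            m.get("tool_call_id") for m in messages if m.get("role") == "tool"
        }
        repaired = []
        for msg in messages:
            role = msg.get("role")
            if role == "tool":
                # Keep only tool results whose originating call survived.
                if msg.get("tool_call_id") in call_ids:
                    repaired.append(msg)
                continue
            if role == "assistant" and msg.get("tool_calls"):
                kept_tcs = [
                    tc for tc in msg["tool_calls"]
                    if tc.get("id") in satisfied_ids
                ]
                content = msg.get("content")
                if not kept_tcs and not (content or "").strip():
                    # No content and every tool_call orphaned: drop it.
                    continue
                new_msg = dict(msg)  # shallow copy, leave the input intact
                if kept_tcs:
                    new_msg["tool_calls"] = kept_tcs
                else:
                    new_msg.pop("tool_calls", None)
                repaired.append(new_msg)
                continue
            repaired.append(msg)
        return repaired

    def ensure_last_is_user(messages: list) -> list:
        # Strict chat templates reject histories ending on an assistant turn.
        if messages and messages[-1].get("role") == "assistant":
            return messages + [{"role": "user", "content": "Please continue"}]
        return messages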

Behavior changes

  • Removed the dependency on litellm.litellm_core_utils.prompt_templates.common_utils.get_completion_messages; the ensure_alternating_roles path is replaced by the in-house sanitize.
  • Token recount now happens after sanitize (whether or not trimming occurred), and max_tokens is recalculated accordingly.
  • Debug log line updated from max(512, … * 0.90) to max(256, … * 0.75) (note: this is only the log string -- the actual calculation lives in _calculate_safe_completion_tokens; a sketch follows this list).
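
Reading the formula off the updated log string, the computation presumably reduces to something like the following (a sketch; the authoritative code is _calculate_safe_completion_tokens and may differ):

    def calculate_safe_completion_tokens(
        requested_completion: int,
        max_context_tokens: int,
        current_tokens: int,
        safety_buffer: int,
    ) -> int:
        # Leave 25% headroom on the context remaining after the prompt and
        # safety buffer, floor at 256 tokens, and never exceed the request.
        remaining = max_context_tokens - int(current_tokens) - safety_buffer
        return min(requested_completion, max(256, int(remaining * 0.75)))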

@cableman force-pushed the feature/message-trim-update branch from c0b3259 to 2061b20 on May 6, 2026 at 13:26
@lilosti requested review from hypesystem and lasseborly on May 8, 2026 at 07:36
@lilosti commented May 8, 2026

This is a potential fix for os2ai/Feedback#1.
The guardrail has been changed to accommodate the use of tool calls.
These changes also alter the general behavior of the guardrail, and hopefully resolve the original issue as well.
We cannot reproduce the original issue anywhere but on Holstebro's installation, and therefore cannot test whether the fix works.

@SigneA-hm commented

@lasseborly will you do a code review? We are running it outside the ongoing RC as a hotfix, to see whether it can help solve Holstebro's problem.


@lasseborly left a comment


I have a hard time wrapping my head around the actual logic and how it solves the apparent problems. Not that it doesn't; I am just very removed from the problem space.

        # Context-window resolution: per-model map wins, then litellm's
        # built-in get_max_tokens, then this global default.
        self.default_max_context_tokens = default_config.get(
            "default_max_context_tokens", 8192

Can we document inline the reason for this magic number?

        )
        data["max_tokens"] = safe_completion_tokens

    def _repair_tool_call_pairings(self, messages: list) -> list:

My gut tells me we could make this two levels flatter by negating the if conditions and using continue a bit more. Makes for an easier read.


Also, maybe make use of the "new" match case instead of the if-else statements.
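
For illustration only, on a hypothetical excerpt rather than the actual diff, the suggested shape combines early continue with a match on the role:

    # Hypothetical sketch of the suggested restructuring; names are invented.
    def drop_orphan_tool_messages(messages: list, call_ids: set) -> list:
        kept = []
        for msg in messages:
            match msg.get("role"):
                case "tool" if msg.get("tool_call_id") not in call_ids:
                    continue  # orphan: its assistant tool_call was trimmed
                case _:
                    kept.append(msg)
        return kept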

                data, safe_completion_tokens, has_max_tokens, has_max_completion
            )
            current_tokens = final_tokens
        except Exception as e:

            result.append(msg)
        return result

    def _ensure_last_is_user(

I might be missing something, but are we even using this anywhere?

self._log_debug(f"Safe completion tokens: {safe_completion_tokens}")
self._log_debug(
f"Calculation: min({requested_completion}, max(512, ({max_context_tokens} - {int(current_tokens)} - {self.safety_buffer}) * 0.90))"
f"Calculation: min({requested_completion}, max(256, ({max_context_tokens} - {int(current_tokens)} - {self.safety_buffer}) * 0.75))"

I do not understand this change. Why?

"Dropping assistant message with no content and all tool_calls orphaned"
)
continue
new_msg = dict(msg)

msg.copy() better reflects intent.

            if tcs:
                kept_tcs = [tc for tc in tcs if tc.get("id") in satisfied_ids]
                content = msg.get("content")
                if not kept_tcs and not (content or "").strip():

Suggested change:

                content_empty = not (content or "").strip()
                if not kept_tcs and content_empty:

        )
        self.max_context_tokens_by_model = default_config.get(
            "max_context_tokens_by_model", {}
        ) or {}

The or {} seems redundant?
