Updated message trim to handle messages better and support tool calling. #28

Open
cableman wants to merge 1 commit into os2ai:develop from AarhusAI:feature/message-trim-update

Conversation

@cableman (Contributor) commented May 6, 2026

See https://github.com/AarhusAI/aarhusai-docker/blob/main/guardrails/README.md for more information about the message trim guardrail.

New configuration knobs

  • default_max_context_tokens (default 8192) and a max_context_tokens_by_model map -- the context-window size is now resolved per model, falling back through: per-model map --> litellm.get_max_tokens --> global default (sketched after this list). This removes the silent 8192 fallback footgun on fleets that mix 8k, 32k, and 128k models.
  • pop_trailing_tool_messages (global) and pop_trailing_tool_messages_by_model -- opt-in for strict chat templates that reject tool-terminal conversations. Defaults to preserving trailing tool messages, since User --> Asst{tool_calls} --> Tool is the normal agent-loop shape.
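
As a rough illustration of the resolution order described above (a sketch only; the function name and signature here are assumptions, not the guardrail's actual code):

    import litellm

    def resolve_max_context_tokens(
        model: str,
        per_model: dict[str, int],
        default: int = 8192,
    ) -> int:
        # 1. An explicit per-model override always wins.
        override = per_model.get(model)
        if override:
            return override
        # 2. Otherwise ask litellm for the model's known context window.
        try:
            known = litellm.get_max_tokens(model)
            if known:
                return known
        except Exception:
            pass  # unknown model: fall through to the configured default
        # 3. Last resort: the configured global default.
        return default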

New tool-calling repair logic

  • _repair_tool_call_pairings -- after trim_messages truncates the history, drops orphan role: tool messages and strips orphan tool_calls from assistant messages so vLLM doesn't reject the conversation (both repair helpers are sketched after this list).
  • _ensure_last_is_user / _sanitize_messages -- guarantee the conversation doesn't end on an assistant message, appending {"role": "user", "content": "Please continue"} when needed. Sanitize now runs on every request, not just trimmed ones.
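
A minimal sketch of both helpers, reconstructed from this description and the diff excerpts quoted in the review below; the real implementation may differ in its details:

    def repair_tool_call_pairings(messages: list) -> list:
        # Ids of tool_calls declared by surviving assistant messages.
        call_ids = {
            tc.get("id")
            for m in messages
            if m.get("role") == "assistant"
            for tc in (m.get("tool_calls") or [])
        }
        # Ids that actually received a tool response.
        satisfied_ids = {
            m.get("tool_call_id") for m in messages if m.get("role") == "tool"
        }
        repaired = []
        for msg in messages:
            role = msg.get("role")
            if role == "tool":
                # Keep only tool results whose originating call survived.
                if msg.get("tool_call_id") in call_ids:
                    repaired.append(msg)
                continue
            if role == "assistant" and msg.get("tool_calls"):
                kept_tcs = [
                    tc for tc in msg["tool_calls"]
                    if tc.get("id") in satisfied_ids
                ]
                content = msg.get("content")
                if not kept_tcs and not (content or "").strip():
                    # No content and every tool_call orphaned: drop it.
                    continue
                new_msg = dict(msg)  # shallow copy, leave the input intact
                if kept_tcs:
                    new_msg["tool_calls"] = kept_tcs
                else:
                    new_msg.pop("tool_calls", None)
                repaired.append(new_msg)
                continue
            repaired.append(msg)
        return repaired

    def ensure_last_is_user(messages: list) -> list:
        # Strict chat templates reject histories ending on an assistant turn.
        if messages and messages[-1].get("role") == "assistant":
            return messages + [{"role": "user", "content": "Please continue"}]
        return messages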

Behavior changes

  • Removed the dependency on litellm.litellm_core_utils.prompt_templates.common_utils.get_completion_messages; the ensure_alternating_roles path is replaced by the in-house sanitize.
  • Token recount now happens after sanitize (whether or not trimming occurred), and max_tokens is recalculated accordingly.
  • Debug log line updated from max(512, … * 0.90) to max(256, … * 0.75) (note: this is only the log string -- the actual calculation lives in _calculate_safe_completion_tokens; a sketch follows this list).
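
Reading the formula off the updated log string, the computation presumably reduces to something like the following (a sketch; the authoritative code is _calculate_safe_completion_tokens and may differ):

    def calculate_safe_completion_tokens(
        requested_completion: int,
        max_context_tokens: int,
        current_tokens: int,
        safety_buffer: int,
    ) -> int:
        # Leave 25% headroom on the context remaining after the prompt and
        # safety buffer, floor at 256 tokens, and never exceed the request.
        remaining = max_context_tokens - int(current_tokens) - safety_buffer
        return min(requested_completion, max(256, int(remaining * 0.75)))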

@cableman force-pushed the feature/message-trim-update branch from c0b3259 to 2061b20 on May 6, 2026 at 13:26
@lilosti requested review from hypesystem and lasseborly on May 8, 2026 at 07:36
@lilosti commented May 8, 2026

This is a potential fix for os2ai/Feedback#1.
The guardrail has been changed to accommodate the use of tool calls.
These changes also alter the general behavior of the guardrail, and hopefully resolve the original issue as well.
We cannot reproduce the original issue anywhere but on Holstebro's installation, and therefore cannot test whether the fix works.

@SigneA-hm commented

@lasseborly will you do a code review? We are running it outside the ongoing RC as a hotfix, to see whether it can help solve Holstebro's problem.


@lasseborly left a comment


I have a hard time wrapping my head around the actual logic and how it solves the apparent problems. Not that it doesn't; I am just very removed from the problem space.

        # Context-window resolution: per-model map wins, then litellm's
        # built-in get_max_tokens, then this global default.
        self.default_max_context_tokens = default_config.get(
            "default_max_context_tokens", 8192

Can we document inline the reason for this magic number?

        )
        data["max_tokens"] = safe_completion_tokens

    def _repair_tool_call_pairings(self, messages: list) -> list:

My gut tells me we could make this two levels flatter by negating the if conditions and using continue a bit more. Makes for an easier read.


Also, maybe make use of the "new" match case instead of the if-else statements.
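
For illustration only, on a hypothetical excerpt rather than the actual diff, the suggested shape combines early continue with a match on the role:

    # Hypothetical sketch of the suggested restructuring; names are invented.
    def drop_orphan_tool_messages(messages: list, call_ids: set) -> list:
        kept = []
        for msg in messages:
            match msg.get("role"):
                case "tool" if msg.get("tool_call_id") not in call_ids:
                    continue  # orphan: its assistant tool_call was trimmed
                case _:
                    kept.append(msg)
        return kept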

                data, safe_completion_tokens, has_max_tokens, has_max_completion
            )
            current_tokens = final_tokens
        except Exception as e:

            result.append(msg)
        return result

    def _ensure_last_is_user(

I might be missing something, but are we even using this anywhere?

self._log_debug(f"Safe completion tokens: {safe_completion_tokens}")
self._log_debug(
f"Calculation: min({requested_completion}, max(512, ({max_context_tokens} - {int(current_tokens)} - {self.safety_buffer}) * 0.90))"
f"Calculation: min({requested_completion}, max(256, ({max_context_tokens} - {int(current_tokens)} - {self.safety_buffer}) * 0.75))"

I do not understand this change. Why?

"Dropping assistant message with no content and all tool_calls orphaned"
)
continue
new_msg = dict(msg)

msg.copy() better reflects intent.

            if tcs:
                kept_tcs = [tc for tc in tcs if tc.get("id") in satisfied_ids]
                content = msg.get("content")
                if not kept_tcs and not (content or "").strip():

Suggested change:

                content_empty = not (content or "").strip()
                if not kept_tcs and content_empty:

        )
        self.max_context_tokens_by_model = default_config.get(
            "max_context_tokens_by_model", {}
        ) or {}

The or {} seems redundant?
