Enforce per-type safe-output max count at MCP invocation time (MCE4)#40348
Conversation
Add session-scoped per-type operation counters inside createHandlers that enforce the user-configured max at tool-invocation time (the MCP-server half of MCE4 dual enforcement). Previously the count cap was only applied downstream in collect_ndjson_output.cjs / safe_output_processor.cjs, so the agent received no in-loop feedback and silently over-produced outputs that were then dropped. Changes: - operationCounts Map tracks how many items of each type have been appended during the current MCP session. - getExplicitMax(type) reads the user's explicit max: setting via getSafeOutputsToolConfig; returns null when no explicit limit is set or when max: -1 (unlimited), so only intentional limits are enforced. - enforcePerTypeMax(type) throws a JSON-RPC -32602 error with E002 code and actionable guidance when the configured limit has already been reached. - appendSafeOutputCounted(entry) wraps appendSafeOutput: checks the limit, delegates to the real append, then increments the counter only on success (mirrors the inlineReviewCommentCount pattern). - All appendSafeOutput(entry) call sites within createHandlers replaced with appendSafeOutputCounted(entry). Downstream enforcement in collect_ndjson_output.cjs and safe_output_processor.cjs is preserved unchanged as the processor-time defence-in-depth layer (MCE4). 10 new tests added covering: limit enforcement, key normalisation, addCommentHandler, independent per-type budgets, unlimited (-1), unconfigured types, error message content, failed-write counter isolation, and fresh counter per createHandlers() call. Closes #40311 Co-authored-by: dsyme <7204669+dsyme@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the safe-outputs MCP server handlers to enforce per-safe-output-type max limits at tool invocation time (in addition to the existing downstream processor-time enforcement), aiming to prevent agents from over-emitting operations that will later be truncated.
Changes:
- Add a session-scoped per-type operation counter in
createHandlers()and enforce a configuredmaxbefore appending each safe-output entry. - Introduce
getExplicitMax,enforcePerTypeMax, and anappendSafeOutputCountedwrapper, and replace internalappendSafeOutput(...)call sites to route through the new enforcement. - Add a new test suite covering per-type max enforcement behavior, key normalization, unlimited semantics, and counter behavior on write failures / new sessions.
Show a summary per file
| File | Description |
|---|---|
| actions/setup/js/safe_outputs_handlers.cjs | Adds invocation-time per-type max enforcement and routes handler writes through a counted append wrapper. |
| actions/setup/js/safe_outputs_handlers.test.cjs | Adds tests for per-type max enforcement, including normalization, isolation across types, and counter behavior. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 4
| function getExplicitMax(type) { | ||
| const toolConfig = getSafeOutputsToolConfig(config, type); | ||
| if (!toolConfig || typeof toolConfig !== "object") return null; | ||
| if (!("max" in toolConfig)) return null; | ||
| const maxVal = toolConfig.max; | ||
| if (maxVal === -1) return null; // -1 means unlimited | ||
| if (typeof maxVal === "number" && Number.isInteger(maxVal) && maxVal > 0) { | ||
| return maxVal; |
| * Return the explicitly user-configured max for a safe-output type, or null if not set / unlimited. | ||
| * Uses getSafeOutputsToolConfig for consistent key-normalisation (hyphens → underscores). | ||
| * Does NOT fall back to validation-config defaults: MCP-time enforcement is only | ||
| * applied when the user has explicitly set a limit; downstream enforcement covers defaults. | ||
| * Per Safe Outputs Specification MCE5: the same config source as the processor. | ||
| * @param {string} type - normalised safe-output type name (e.g. "add_comment") | ||
| * @returns {number | null} | ||
| */ | ||
| function getExplicitMax(type) { | ||
| const toolConfig = getSafeOutputsToolConfig(config, type); | ||
| if (!toolConfig || typeof toolConfig !== "object") return null; | ||
| if (!("max" in toolConfig)) return null; | ||
| const maxVal = toolConfig.max; | ||
| if (maxVal === -1) return null; // -1 means unlimited | ||
| if (typeof maxVal === "number" && Number.isInteger(maxVal) && maxVal > 0) { | ||
| return maxVal; | ||
| } | ||
| return null; |
| guidance: | ||
| `You have used all ${maxAllowed} ${type} operations for this run. ` + | ||
| `Further ${type} calls will be ignored. Prioritize the most important items ` + | ||
| `(e.g. consolidate multiple updates into one), or call noop. ` + |
| it("rejects immediately when max is 0 and config uses hyphen-keyed type (key normalisation)", () => { | ||
| // Ensure getSafeOutputsToolConfig's hyphen→underscore lookup works for max checks | ||
| const h = createHandlers(mockServer, mockAppendSafeOutput, { | ||
| "add-labels": { max: 1 }, | ||
| }); |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #40348 does not have the 'implementation' label (has_implementation_label=false) and has 0 new lines of code in business logic directories (≤100 threshold; requires_adr_by_default_volume=false). |
|
✅ Test Quality Sentinel completed test quality analysis. |
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd, /diagnose, and /zoom-out — requesting changes on 2 correctness issues and 4 secondary concerns.
📋 Key Themes & Highlights
Key Themes
- Misleading guidance text (line 256): The E002 error tells agents "Further calls will be ignored" but they actually throw — this is a correctness issue in the agent-facing error contract.
- Undocumented
max: 0behaviour (line 228):max: 0is silently treated as unlimited (falls through themaxVal > 0guard). No comment or test documents this intent, and it creates a potential MCE4/MCE5 consistency gap with downstream enforcement. - Resource leak in large-content path (line 333):
writeLargeContentToFileruns beforeappendSafeOutputCountedenforces the limit — an orphaned file is left on disk if the budget is already exhausted. - Misleading test description (line 2770): The second test says "when max is 0" but configures
max: 1.
Positive Highlights
- ✅ Clean, minimal implementation:
operationCountsMap + three well-scoped helpers, all insidecreateHandlers()scope. - ✅ Counter increment only happens after a successful write — the write-error isolation test confirms this correctly.
- ✅ Strong test suite: independent budgets,
max: -1, empty config, fresh counters percreateHandlers()call are all covered. - ✅ All 13
appendSafeOutputcall sites were updated — mechanical but critical, and well-executed. - ✅
getExplicitMaxcorrectly defers togetSafeOutputsToolConfigfor hyphen→underscore normalisation, ensuring MCE5 config-source consistency.
Blocking Issues
- Guidance text —
"Further calls will be ignored"should be"Further calls will fail with this error"to accurately describe the thrown exception behaviour. - Large-content path ordering — The file write should happen after (or be guarded by) the enforcement check to avoid orphaned temp files.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 100.4 AIC · ⌖ 12.9 AIC · ⊞ 6.9K
| limit: maxAllowed, | ||
| guidance: | ||
| `You have used all ${maxAllowed} ${type} operations for this run. ` + | ||
| `Further ${type} calls will be ignored. Prioritize the most important items ` + |
There was a problem hiding this comment.
[/tdd] Guidance says "will be ignored" but calls actually throw — misleading error message.
The phrase Further ${type} calls will be ignored implies silent no-ops, but every subsequent call to a type-enforced handler also throws -32602. An agent reading this guidance and assuming its next call will be silently dropped will be surprised when it also fails.
💡 Suggested fix
Change:
`Further ${type} calls will be ignored. Prioritize the most important items ` +To:
`Further ${type} calls will fail with this error. Prioritize the most important items ` +This accurately describes the runtime behavior and removes the risk of the agent misunderstanding the enforcement mode.
| if (!("max" in toolConfig)) return null; | ||
| const maxVal = toolConfig.max; | ||
| if (maxVal === -1) return null; // -1 means unlimited | ||
| if (typeof maxVal === "number" && Number.isInteger(maxVal) && maxVal > 0) { |
There was a problem hiding this comment.
[/tdd] max: 0 is silently treated as unlimited — undocumented edge case.
The maxVal > 0 guard causes getExplicitMax to return null for max: 0, which disables invocation-time enforcement for that type. This is surprising if a user sets max: 0 expecting to block a tool entirely. The downstream processor may handle it differently, creating an MCE4/MCE5 consistency gap.
💡 Suggested options
Option A — Document the intent with an inline comment:
// max: 0 is treated as unlimited here; downstream processor handles 0-limit separately
if (typeof maxVal === "number" && Number.isInteger(maxVal) && maxVal > 0) {Option B — Add a test that explicitly asserts max: 0 is treated as unlimited:
it("treats max: 0 as unlimited (no invocation-time enforcement)", () => {
const h = createHandlers(mockServer, mockAppendSafeOutput, { add_labels: { max: 0 } });
// Should NOT throw — downstream processor is the only guard for max: 0
for (let i = 0; i < 3; i++) {
expect(h.defaultHandler("add_labels")({ labels: ["ok"] })).not.toHaveProperty("isError");
}
});Either way, the current silent behavior should be intentional and documented.
| @@ -258,7 +332,7 @@ function createHandlers(server, appendSafeOutput, config = {}) { | |||
|
|
|||
| const fileInfo = writeLargeContentToFile(largeContent); | |||
There was a problem hiding this comment.
[/diagnose] File is written to disk before the enforcement check runs — orphaned file on limit hit.
When maybeHandleLargeContent is invoked, writeLargeContentToFile(largeContent) runs on line 333 before appendSafeOutputCounted(entry) on line 335 enforces the limit. If the per-type count is already at max, the file write succeeds, the enforcement throws, and the temporary file is left on disk with no corresponding safe-output entry.
💡 Suggested fix
Enforce the limit before writing the file:
// Check limit before committing the expensive file write
if (entry?.type) enforcePerTypeMax(entry.type);
const fileInfo = writeLargeContentToFile(largeContent);
entry[largeFieldName] = `[Content too large, saved to file: ${fileInfo.filename}]`;
appendSafeOutputCounted(entry);Alternatively, call getExplicitMax and check the counter before entering the file-write path.
This also needs a regression test: a test that verifies the file is NOT written when the limit is already reached.
| expect(mockAppendSafeOutput).toHaveBeenCalledTimes(2); | ||
| }); | ||
|
|
||
| it("rejects immediately when max is 0 and config uses hyphen-keyed type (key normalisation)", () => { |
There was a problem hiding this comment.
[/tdd] Test description says "when max is 0" but the config uses max: 1 — misleading spec.
The test title references max: 0 which suggests a zero-limit test, but the body sets "add-labels": { max: 1 }. The test is actually verifying hyphen→underscore key normalisation with max: 1. A developer reading the description would expect this test to document max: 0 behaviour (which is currently silently unlimited — see related comment on getExplicitMax).
💡 Suggested fix
Rename the test to match what it actually asserts:
it("enforces max via hyphen-keyed config (key normalisation: hyphen→underscore)", () => {If you also want to cover max: 0, add a separate test that documents the explicit intent (e.g., max: 0 is treated as unlimited at invocation time).
| if (maxAllowed === null) return; // no explicit limit configured | ||
| const current = operationCounts.get(type) || 0; | ||
| if (current >= maxAllowed) { | ||
| throw { |
There was a problem hiding this comment.
[/zoom-out] Throwing a plain object instead of an Error instance — potential catch-block mismatch.
enforcePerTypeMax throws { code: -32602, message: ..., data: ... } — a plain object, not an Error. Any error handling middleware that checks err instanceof Error or accesses err.stack will silently fail to match. The existing inlineReviewCommentCount pattern (which this PR mirrors) should be checked for consistency.
💡 Context
This is likely intentional for the JSON-RPC protocol (the MCP framework converts thrown objects directly to JSON-RPC error responses). If so, a brief comment would confirm the intent:
// Throw a plain JSON-RPC error object (not Error) — the MCP framework
// catches this and returns it directly as the JSON-RPC error payload.
throw {
code: -32602,
...
};If the framework does rely on instanceof Error, a custom class (class SafeOutputLimitError extends Error) would be safer.
| * @param {Record<string, any>} entry | ||
| */ | ||
| const appendSafeOutputCounted = entry => { | ||
| const type = entry?.type; |
There was a problem hiding this comment.
[/tdd] Entries without a type field silently bypass enforcement and counting.
if (type) enforcePerTypeMax(type) means any entry where entry.type is falsy skips both the limit check and the counter increment. In the current handlers this is safe because all call sites set type, but there's no test confirming this assumption, and a future handler that forgets to set type would silently evade MCE4 enforcement.
💡 Suggested test or guard
Add a defensive assertion, or at least a test:
it("does not count or enforce when entry.type is missing", () => {
const h = createHandlers(mockServer, mockAppendSafeOutput, { add_labels: { max: 1 } });
// Direct call to appendSafeOutputCounted with no type (internal/unusual)
// Verify it appends without throwing and does not consume budget
// ... (requires exposing appendSafeOutputCounted or testing via a handler that passes a typeless entry)
});Alternatively, throw an internal error when type is missing to catch regressions early.
🧪 Test Quality Sentinel Report✅ Test Quality Score: 80/100 — Excellent
📊 Metrics & Test Classification (10 tests analyzed)
Go: 0 ( Score breakdown: Behavioral 40/40 · Edge/error 30/30 · Duplication 20/20 · Inflation −10 (2.19:1 ratio) i️ Test inflation noteThe ratio of 191 new test lines to 87 new production lines (2.19:1) marginally exceeds the 2:1 threshold, triggering the binary inflation penalty (−10 pts). In context this is acceptable: the production change introduces a non-trivial enforcement subsystem ( Verdict
References: §27842529116
|
There was a problem hiding this comment.
Two new issues found beyond existing review comments. One is a correctness bug that re-opens the silent-truncation hole this PR was written to close.
### Blocking issues
max: 0 bypasses MCP-time enforcement (HIGH) — getExplicitMax uses maxVal > 0, so max: 0 returns null (unlimited). If downstream treats max: 0 as "block all", the MCE4 dual enforcement gap is exactly the one described in the PR motivation: agents emit freely at MCP time and discover truncation only after the run. Fix: change > 0 → >= 0. The test named "rejects immediately when max is 0" (line 2770) actually uses max: 1, so there is zero CI coverage for this value.
appendSafeOutputCounted silently skips enforcement for typeless entries (MEDIUM) — both enforcePerTypeMax and the counter increment are gated on if (type). Missing entry.type silently bypasses all limits. All current callers set type, but the contract is invisible; a future handler can silently degrade enforcement with no test failure.
🔎 Code quality review by PR Code Quality Reviewer · 120.3 AIC · ⌖ 7.65 AIC · ⊞ 5.1K
| if (!("max" in toolConfig)) return null; | ||
| const maxVal = toolConfig.max; | ||
| if (maxVal === -1) return null; // -1 means unlimited | ||
| if (typeof maxVal === "number" && Number.isInteger(maxVal) && maxVal > 0) { |
There was a problem hiding this comment.
max: 0 silently falls through to unlimited — MCE4 dual enforcement is broken for this value. The maxVal > 0 guard means max: 0 returns null (no limit), so MCP-time allows unlimited calls while the downstream processor would block them all.
💡 Impact and fix
getExplicitMax is supposed to be the MCP-side mirror of downstream enforcement. If downstream treats max: 0 as "block all", the two layers are inconsistent: agents emit entries freely at MCP time and discover they are all truncated only after the run — the exact silent-truncation problem this PR was written to prevent.
There is also no regression test for this: the test named "rejects immediately when max is 0" (line 2770 in the test file) uses max: 1, not max: 0, so the bug would slip through CI undetected.
// current — max: 0 returns null (unlimited)
if (typeof maxVal === "number" && Number.isInteger(maxVal) && maxVal > 0) {
// fix — include 0 so max: 0 blocks all calls
if (typeof maxVal === "number" && Number.isInteger(maxVal) && maxVal >= 0) {Also add a dedicated test:
it("blocks all calls when max is 0", () => {
const h = createHandlers(mockServer, mockAppendSafeOutput, {
add_labels: { max: 0 },
});
expect(() => h.defaultHandler("add_labels")({ labels: ["x"] })).toThrow(
expect.objectContaining({ code: -32602, message: expect.stringContaining("E002") })
);
expect(mockAppendSafeOutput).not.toHaveBeenCalled();
});| * @param {Record<string, any>} entry | ||
| */ | ||
| const appendSafeOutputCounted = entry => { | ||
| const type = entry?.type; |
There was a problem hiding this comment.
appendSafeOutputCounted silently bypasses all enforcement when entry.type is falsy — limit checks and counter increments are both skipped, making the enforcement contract invisible to callers that forget to set type.
💡 Details and suggested hardening
Currently all callers set type (either via defaultHandler spreading {type: toolName} or explicitly in named handlers). But the function signature accepts any Record<string, any> and silently no-ops when type is missing. A future handler that calls appendSafeOutputCounted without first setting entry.type will bypass all per-type limits with no error or warning — a silent regression.
The minimal fix is a debug-level warning:
const appendSafeOutputCounted = entry => {
const type = entry?.type;
if (!type) {
// enforcement impossible without a type — log and fall through
server.debug("appendSafeOutputCounted: entry.type is missing; per-type max not enforced");
appendSafeOutput(entry);
return;
}
enforcePerTypeMax(type);
appendSafeOutput(entry);
operationCounts.set(type, (operationCounts.get(type) || 0) + 1);
};A stronger fix would be to throw:
if (!type) throw new Error("appendSafeOutputCounted: entry.type is required for limit enforcement");This would surface integration bugs immediately in tests rather than silently degrading enforcement in production.
|
``
|
|
``
|
…I failure Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Fixed in 2e7e62b — ran Prettier on |
Per-type
maxlimits (e.g.add-comment.max: 10) were only enforced downstream after the agent finished, meaning every MCP tool call returnedsuccessregardless of how many of that type had already been emitted. This violated MCE4 (Dual Enforcement) and caused silent truncation — agents would over-produce output, discover nothing at call time, and end up in inconsistent state (e.g. labels applied without companion comments).Changes
safe_outputs_handlers.cjsoperationCounts: Map<string, number>— session-scoped per-type counter, one percreateHandlers()instance.getExplicitMax(type)— reads the user's explicitmax:from workflow config viagetSafeOutputsToolConfig(handles hyphen/underscore key variants). Returnsnullfor unset ormax: -1(unlimited), so only intentional limits are enforced; no regression for unconfigured types.enforcePerTypeMax(type)— throws-32602withE002code and actionable guidance when the limit is already reached.appendSafeOutputCounted(entry)— wrapsappendSafeOutput: checks limit → appends → increments counter only on successful write (mirrors the existinginlineReviewCommentCountpattern).appendSafeOutput(entry)call sites insidecreateHandlersreplaced withappendSafeOutputCounted(entry).Downstream enforcement in
collect_ndjson_output.cjsandsafe_output_processor.cjsis unchanged — it remains as the processor-time defence-in-depth layer.Example error thrown on the (max+1)th call:
{ "code": -32602, "message": "E002: add_comment limit reached — 10 of 10 already used this run", "data": { "constraint": "max", "type": "add_comment", "limit": 10, "guidance": "You have used all 10 add_comment operations for this run. ..." } }safe_outputs_handlers.test.cjs— 10 new tests covering: limit enforcement viadefaultHandler, hyphen→underscore key normalisation,addCommentHandler, independent per-type budgets,max: -1unlimited, unconfigured types, error message shape, write-error counter isolation, and fresh counters percreateHandlers()call.