[safeoutputs] submit_pull_request_review: live pull_request_number warning is insufficient — agents now substitute item_number

### Summary

Analysis of the last 24 hours (2026-06-02, 60 runs) found **5 schema errors**, all on the `submit_pull_request_review` safe-output tool, across **2 reviewer workflows**. The previously filed fix is now fully deployed and live, yet the error persists and has **mutated into a new variant**. This is a tool-description problem (the reviewer prompts are correct), and it is **not** a duplicate of the closed #35579 (which added the warning) or #36000 (which fixed the deployment gap).

The key new evidence: the warning currently names a single forbidden field (`pull_request_number`). Agents either ignore it or route around it to the next plausible identifier (`item_number`). Blocking one field name without stating that no targeting field is accepted just shifts the mistake to a different name.

### What changed since the last report

- The 2026-05-30 deployment gap (#36000, now closed) is **resolved**: `actions/setup/js/safe_outputs_tools.json` is in sync with `pkg/workflow/js/safe_outputs_tools.json`.
- The warning *"do NOT pass pull_request_number ... this tool will silently strip it"* is **verified live** in today's agent prompts (present in `workflow-logs/3_agent.txt` of the affected runs).
- Despite that, `submit_pull_request_review` still received a stray PR-targeting field 5 times today.

<details>
<summary>🔍 Error analysis details</summary>

#### Variant 1 — `pull_request_number` (known field, persists despite the live warning)

**Occurrences**: 3 — workflow `Matt Pocock Skills Reviewer` (copilot)

| Run | PR |
|---|---|
| [§26849261648](https://github.com/github/gh-aw/actions/runs/26849261648) | #36527 |
| [§26842959523](https://github.com/github/gh-aw/actions/runs/26842959523) | #36507 |
| [§26844205505](https://github.com/github/gh-aw/actions/runs/26844205505) | #36506 |

The agent passes `pull_request_number` on its inner `create_pull_request_review_comment` payloads — which is **valid** for that tool — then carries the same field onto the final `submit_pull_request_review` payload, where it is silently stripped:

```json
{ "pull_request_number": 36527, "event": "COMMENT", "body": "### Skills-Based Review ..." }
```

#### Variant 2 — `item_number` (NEW substitution, not covered by the warning)

**Occurrences**: 2 — workflow `Test Quality Sentinel` (copilot)

| Run | PR | Result |
|---|---|---|
| [§26849261660](https://github.com/github/gh-aw/actions/runs/26849261660) | #36527 | APPROVE — stripped, "PR review submitted as APPROVE", exit 0 |
| [§26844204022](https://github.com/github/gh-aw/actions/runs/26844204022) | #36506 | APPROVE |

Actual invocation (run 26849261660):

```
safeoutputs submit_pull_request_review --item_number 36527 --event APPROVE --body "✅ Test Quality Sentinel: 70/100 ..."
```

The agent **correctly avoided** `pull_request_number` (it heeded the warning) but substituted `item_number` — the field used by `add_comment` / `update_issue`. `item_number` is not in the `submit_pull_request_review` schema, so it is silently stripped too. The review still auto-targets the triggering PR and posts, so no error surfaces.

</details>

### Current tool description

<details>
<summary>submit_pull_request_review description (pkg/ and actions/setup/js — identical)</summary>

```
Submit a pull request review with a status decision. This tool auto-targets the pull request that triggered the workflow — do NOT pass pull_request_number (unlike create_pull_request_review_comment and reply_to_pull_request_review_comment, which accept it; this tool will silently strip it). REQUIRED: ...
```

inputSchema accepts only: `body`, `event`, `secrecy`, `integrity` (`additionalProperties: false`).

</details>

### Root cause

1. The prohibition names exactly one field (`pull_request_number`). It does not tell the agent that **no** targeting parameter is accepted, so the agent reaches for the next plausible one (`item_number`).
2. The prohibition is buried mid-sentence after the auto-target clause, where it is easy to overlook: `pull_request_number` still recurs even though the warning is present in the prompt.
3. Sibling tools (`create_pull_request_review_comment`, `reply_to_pull_request_review_comment`, `add_comment`) all accept a PR/item identifier, so the parity assumption is strong; agents also carry the field over from their valid review-comment payloads.

### Recommended description improvements

In `pkg/workflow/js/safe_outputs_tools.json` (and propagate to the runtime `actions/setup/js/safe_outputs_tools.json` copy in the same change — that copy is what runs):

1. **Lead with the prohibition and generalize it to all identifiers.** Suggested opening:
   > "Submit a pull request review. This tool ALWAYS auto-targets the pull request that triggered the workflow and accepts NO targeting parameter — do not pass `pull_request_number`, `item_number`, `pr_number`, or `issue_number`; any such field is silently stripped. Only `body` and `event` are accepted (plus optional `secrecy`/`integrity`)."
2. **Explicitly contrast the sibling tools** so the parity assumption is addressed: state that `create_pull_request_review_comment` / `reply_to_pull_request_review_comment` DO accept `pull_request_number`, but this tool does not.
3. Keep the existing REQUIRED-body / inline-comment guidance.
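
One way to keep the generalized wording from regressing is a small lint over the description text. The sketch below is hypothetical and not part of gh-aw: `missingFromLead`, the 400-character lead window, and the two abridged description strings are assumptions made for illustration. It only checks that every stray identifier named in this issue appears near the start of the description.

```javascript
// Hypothetical lint, not an existing gh-aw check.
// Field names come from the suggested opening in this issue.
const FORBIDDEN_FIELDS = [
  "pull_request_number",
  "item_number",
  "pr_number",
  "issue_number",
];

// Returns the forbidden fields that the first `leadLength` characters
// of the description fail to mention. The window size is an assumption.
function missingFromLead(description, fields, leadLength = 400) {
  const lead = description.slice(0, leadLength);
  return fields.filter((field) => !lead.includes(field));
}

// Abridged paraphrases of the current and the suggested wording.
const singleFieldWarning =
  "Submit a pull request review with a status decision. " +
  "Do NOT pass pull_request_number; this tool will silently strip it.";
const generalizedWarning =
  "Submit a pull request review. Accepts NO targeting parameter: do not pass " +
  "pull_request_number, item_number, pr_number, or issue_number.";

const missingNow = missingFromLead(singleFieldWarning, FORBIDDEN_FIELDS);
const missingAfter = missingFromLead(generalizedWarning, FORBIDDEN_FIELDS);

console.log(missingNow);   // [ 'item_number', 'pr_number', 'issue_number' ]
console.log(missingAfter); // []
```

A check like this would not prove that agents heed the wording, only that the description still names every identifier observed so far.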

### Secondary observation (out of scope for description-only fix)

The validator **silently strips** the unknown field rather than returning `ERR_VALIDATION`. Because the review still posts, the agent never receives corrective feedback, which is why a deployed description fix has now twice failed to stop the pattern. If description changes alone do not move the needle, consider surfacing a soft validation warning when a known-stray identifier is stripped from `submit_pull_request_review`. (Validator change, not a description change — noted for whoever picks this up.)
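
To make the soft-warning idea concrete, here is a minimal sketch of a sanitizer that still strips stray fields but reports the known identifiers back to the caller. It is not the actual gh-aw validator: `sanitizeReviewPayload`, the warning text, and the `KNOWN_STRAY_IDENTIFIERS` list are assumptions for illustration; only the allowed keys are taken from the input schema quoted in this issue.

```javascript
// Hypothetical sketch, not the real gh-aw validator.
// Allowed keys mirror the inputSchema listed in this issue.
const ALLOWED_KEYS = new Set(["body", "event", "secrecy", "integrity"]);

// Identifiers seen in, or suggested by, this issue as stray targeting fields.
const KNOWN_STRAY_IDENTIFIERS = new Set([
  "pull_request_number",
  "item_number",
  "pr_number",
  "issue_number",
]);

// Strips every key outside the schema, and records a warning for each
// stripped key that looks like a PR/item identifier.
function sanitizeReviewPayload(payload) {
  const cleaned = {};
  const warnings = [];
  for (const [key, value] of Object.entries(payload)) {
    if (ALLOWED_KEYS.has(key)) {
      cleaned[key] = value;
    } else if (KNOWN_STRAY_IDENTIFIERS.has(key)) {
      warnings.push(
        `submit_pull_request_review ignored "${key}": ` +
          "this tool always targets the triggering pull request"
      );
    }
    // Any other unknown key is dropped without a warning, as today.
  }
  return { cleaned, warnings };
}

// Payload shaped like the Variant 2 invocation (body shortened).
const result = sanitizeReviewPayload({
  item_number: 36527,
  event: "APPROVE",
  body: "Test Quality Sentinel review",
});

console.log(result.cleaned);
console.log(result.warnings);
```

Returning the warnings alongside the cleaned payload keeps the review posting as it does now while giving the agent the corrective feedback it currently never sees.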

### Affected workflows

- `Matt Pocock Skills Reviewer` — 3 (`pull_request_number`)
- `Test Quality Sentinel` — 2 (`item_number`)

All other write safe-outputs in the window were schema-clean (`add_comment` with `item_number`/`pr_number`, `create_issue`, `create_pull_request_review_comment` with valid `pull_request_number`, smoke-test outputs).

### Implementation checklist

- [ ] Generalize and front-load the `submit_pull_request_review` prohibition in `pkg/workflow/js/safe_outputs_tools.json`
- [ ] Propagate the identical description to `actions/setup/js/safe_outputs_tools.json` in the same PR
- [ ] Confirm `TestSafeOutputsToolsJSONInSync` compares full description content (added per #36000) so the two copies cannot drift again
- [ ] `make build` / `make recompile` / `make test`
- [ ] Monitor reviewer-workflow runs for 2–3 days for recurrence of either variant
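
For the sync item in the checklist, the comparison that matters is the full description string per tool, not just the set of tool names. The sketch below illustrates that check in JavaScript; it is not the `TestSafeOutputsToolsJSONInSync` implementation, and it assumes (without confirmation from the source) that each JSON file parses to an array of entries with `name` and `description` fields. The sample data is invented.

```javascript
// Hypothetical sketch of a full-description drift check.
// Assumes each file parses to an array of { name, description } entries.
function findDescriptionDrift(sourceTools, runtimeTools) {
  const runtimeByName = new Map(
    runtimeTools.map((tool) => [tool.name, tool.description])
  );
  const drift = [];
  for (const tool of sourceTools) {
    if (!runtimeByName.has(tool.name)) {
      drift.push({ name: tool.name, reason: "missing in runtime copy" });
    } else if (runtimeByName.get(tool.name) !== tool.description) {
      drift.push({ name: tool.name, reason: "description differs" });
    }
  }
  return drift;
}

// Invented sample data standing in for the two parsed JSON files.
const sourceCopy = [
  { name: "add_comment", description: "Add a comment." },
  { name: "submit_pull_request_review", description: "Accepts NO targeting parameter." },
];
const runtimeCopy = [
  { name: "add_comment", description: "Add a comment." },
  { name: "submit_pull_request_review", description: "Do NOT pass pull_request_number." },
];

const drift = findDescriptionDrift(sourceCopy, runtimeCopy);
console.log(drift);
```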

**References:** [§26849261660](https://github.com/github/gh-aw/actions/runs/26849261660), [§26849261648](https://github.com/github/gh-aw/actions/runs/26849261648), [§26844204022](https://github.com/github/gh-aw/actions/runs/26844204022) · related: #35579, #36000




> Generated by [⚡ Daily Safe Output Tool Optimizer](https://github.com/github/gh-aw/actions/runs/26849880526) · opus48 3.1M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-safe-output-optimizer%22&type=issues)
> - [x] expires on Jun 4, 2026, 9:57 PM UTC
