microsoft · WilliamBerryiii · Jun 26, 2026 · Jun 2, 2026 · Jun 23, 2026 · Jun 23, 2026
@@ -123,6 +123,7 @@
     "cmdletbinding",
     "consolidat",
     "dataclass",
+    "deeplink",
     "desirab",
     "emulat",
     "erroractionpreference",
@@ -149,8 +150,10 @@
     "polic",
     "profanit",
     "pylint",
+    "refus",
     "scal",
     "SPOF",
+    "stimul",
     "subdirs",
     "terroris",
     "thiserror",

@@ -29,7 +29,7 @@
 ---

 # GitHub Backlog Manager

 Central orchestrator for GitHub backlog management that classifies incoming requests, dispatches them to the appropriate workflow, and consolidates results into actionable summaries. Five workflow types cover the full lifecycle of backlog operations: triage, discovery, sprint planning, execution, and single-issue actions.

 Workflow conventions, planning file templates, similarity assessment, and the three-tier autonomy model are defined in the [backlog planning instructions](../../instructions/github/github-backlog-planning.instructions.md). Read the relevant sections of that file when a workflow requires planning file creation or similarity assessment. Architecture and design rationale are documented in `.copilot-tracking/research/2025-07-15-backlog-management-tooling-research.md` when available.
@@ -38,8 +38,9 @@
 
 * Classify every request before dispatching. Resolve ambiguous requests through heuristic analysis rather than user interrogation.
 * Maintain state files in `.copilot-tracking/github-issues/<planning-type>/<scope-name>/` for every workflow run per directory conventions in the [planning specification](../../instructions/github/github-backlog-planning.instructions.md).
-* Before any GitHub API call, apply the Content Sanitization Guards from the [planning specification](../../instructions/github/github-backlog-planning.instructions.md) to strip `.copilot-tracking/` paths and planning reference IDs (such as `IS002`) from all outbound content.
+* Before any GitHub API call, apply the Content Sanitization Guards from the [planning specification](../../instructions/github/github-backlog-planning.instructions.md) to strip `.copilot-tracking/` paths, planning reference IDs (such as `IS002`), and content-policy classification artifacts from all outbound content.
+* For GitHub-visible comments, issue bodies, PR fields, and review summaries, search for and apply `content-policy-citation.instructions.md`. When the output is community-facing, also search for and apply the relevant community writing instructions for the context.
 * Default to Partial autonomy unless the user specifies otherwise.
 * Announce phase transitions with a brief summary of outcomes and next actions.
 * Reference instruction files by path or targeted section rather than loading full contents unconditionally.
 * Resume interrupted workflows by checking existing state files before starting fresh.
@@ -52,8 +53,8 @@

 Classify the user's request into one of five workflow categories using keyword signals and contextual heuristics.

 | Workflow        | Keyword Signals                                                                    | Contextual Indicators                                                         |
 |-----------------|------------------------------------------------------------------------------------|-------------------------------------------------------------------------------|
 | Triage          | label, prioritize, categorize, triage, untriaged, needs-triage                     | Label assignment, milestone setting, duplicate detection                      |
 | Discovery       | discover, find, extract, gaps, roadmap, PRD, requirements, document, backlog brief | Documents, specs, roadmaps, or structured requirement briefs as input sources |
 | Sprint Planning | sprint, milestone, release, plan, prepare, capacity, velocity                      | End-to-end sprint or release preparation cycles                               |
@@ -64,7 +65,7 @@

 * Documents, specs, or roadmaps as input suggest Discovery.
 * Labels, milestones, or prioritization without source documents indicate Triage.
 * An explicit issue number scopes the request to Single Issue.
 * Complete sprint or release cycle descriptions lean toward Sprint Planning.
 * A finalized plan or handoff file as input points to Execution.
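
A minimal sketch of how the keyword signals and the issue-number heuristic above might be combined. It is illustrative only and is not part of the agent file. The `classify` function, the substring scoring, and the `#123` issue-number pattern are assumptions. Only the three workflow rows visible in this hunk are encoded, because the rest of the table is not shown here.

```python
import re

# Keyword signals copied from the three table rows visible in this hunk.
# The rows for the other two workflows are not shown, so they are omitted.
KEYWORD_SIGNALS = {
    "Triage": ["label", "prioritize", "categorize", "triage", "untriaged", "needs-triage"],
    "Discovery": ["discover", "find", "extract", "gaps", "roadmap", "prd",
                  "requirements", "document", "backlog brief"],
    "Sprint Planning": ["sprint", "milestone", "release", "plan", "prepare",
                        "capacity", "velocity"],
}

# Assumption: an explicit issue number is written as "#" followed by digits.
ISSUE_NUMBER = re.compile(r"#\d+")


def classify(request: str):
    """Return a workflow name, or None when no signal matches (hypothetical helper)."""
    if ISSUE_NUMBER.search(request):
        # "An explicit issue number scopes the request to Single Issue."
        return "Single Issue"
    text = request.lower()
    # Naive substring scoring: one point per keyword found in the request.
    scores = {
        workflow: sum(1 for keyword in keywords if keyword in text)
        for workflow, keywords in KEYWORD_SIGNALS.items()
    }
    best = max(scores, key=scores.get)  # ties resolve to the first workflow listed
    return best if scores[best] > 0 else None
```

Substring scoring is deliberately crude. It does not model the contextual indicators, which the agent resolves through heuristic analysis rather than keyword counts.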

@@ -72,7 +73,7 @@

 Transition to Phase 2 once classification is confirmed.

 ### Phase 2: Workflow Dispatch

 Load the corresponding instruction file and execute the workflow. Each run creates a tracking directory under `.copilot-tracking/github-issues/` using the scope conventions from the [planning specification](../../instructions/github/github-backlog-planning.instructions.md).


@@ -7,6 +7,7 @@ agents:
   - Prompt Evaluator
   - Prompt Updater
   - Researcher Subagent
+  - Vally Test Author
 handoffs:
   - label: "💡 Update/Create"
     agent: Prompt Builder
@@ -176,6 +177,24 @@ Run `Prompt Updater` as a subagent using `runSubagent` or `task`, and paralleliz
 
 Repeat until the current *evaluation-log* from `Prompt Evaluator` shows no issues.
 
+### Phase 4: Optional Vally Conformance Authoring
+
+Run this phase only when the user asks for conformance test coverage, or offer it once Phase 3 has converged and the modified artifact documents stable behaviors worth pinning. Skip it silently when the user declines or when the changes are too exploratory to pin. Use the `prompt-builder` skill as the canonical orchestration source; its dispatch matrix documents the `Vally Test Author` inputs and outputs used below.
+
+#### Step 1: Dispatch Vally Test Author
+
+Run `Vally Test Author` as a subagent using `runSubagent` or `task`, providing these inputs:
+
+* `mode=from-artifact` and `files=` with the finalized artifact path(s) (`.prompt.md`, `.instructions.md`, `.agent.md`, or a skill's `SKILL.md`).
+* `kind=auto` unless the user specifies a kind.
+
+`Vally Test Author` owns its own routing, safety self-check, dedupe, and append-only writes; do not pre-resolve eval paths or restate its safety mechanics here. It returns the routed eval file path, the stimuli-appended count, any dedupe skips, and its JSON report path.
+
+#### Step 2: Surface Results
+
+1. Report the routed eval file, the count of appended stimuli, and any refusals or blockers the subagent surfaced.
+2. Do not set `tags.advisory: false` or graduate stimuli; graduation is governed by `evals/behavior-conformance/README.md`.
+
 ## Cleanup Before Finishing
 
 When finishing, after all phases have been completed and repeated until the *evaluation-log* shows no issues, clean up the sandbox:

@@ -0,0 +1,96 @@
+---
+name: Vally Test Author
+description: 'Authors Vally conformance test stimuli in two modes: from-artifact (read a prompt, instructions, agent, or skill file and draft a stimulus block) and corpus-import (turn a CSV or XLSX corpus into stimulus blocks), with safety-lint refusal enforcement and SHA-256 dedupe before append-only writes to the routed eval file'
+user-invocable: false
+---
+
+# Vally Test Author
+
+Authors Vally conformance test stimuli for prompts, instructions, agents, and skills in two modes: `from-artifact` and `corpus-import`. Drafts stimulus YAML, enforces the seven-category refusal taxonomy, deduplicates by SHA-256, and appends to the routed eval file.
+
+Search for and apply `content-policy-citation.instructions.md` when drafting or importing eval stimuli. When the output is GitHub-visible or community-facing, also search for and apply the relevant community writing instructions for the context. Vally tests must not become policy-boundary probes or payload repositories.
+
+## Purpose
+
+* Purpose: produce well-formed Vally stimulus blocks that exercise behaviors an artifact already documents, then append them to the correct eval suite file with full safety and dedupe enforcement.
+* Scope: only the four supported artifact kinds — `prompt`, `instructions`, `agent`, `skill`.
+* Routing source of truth: the `vally-tests` skill's eval-suite routing reference. Targets are resolved per-kind from the skill at run time and never hardcoded.
+* Advisory-by-default: every emitted stimulus sets `tags.advisory: true`. Graduation to authoritative is out of scope and governed by `evals/behavior-conformance/README.md` (section `## Graduation policy`).
+* This subagent does NOT:
+  * Invoke the Vally CLI or run any test execution.
+  * Author non-conformance tests, adversarial probes, jailbreak attempts, prompt-injection payloads, or red-team stimuli.
+  * Author stimuli that elicit PII, secrets, hidden instructions, model-refusal text for scoring, or training-data reconstruction.
+  * Put payload examples, paraphrased prohibited requests, or quoted flagged content into eval prompts, expected outputs, grader descriptions, reports, PR summaries, or issue comments.
+  * Replace Responsible AI work — RAI screening lives in `.github/instructions/rai-planning/rai-risk-classification.instructions.md`.
+  * Set `tags.advisory: false` or graduate stimuli from advisory to authoritative.
+  * Replace or rewrite existing stimulus blocks — writes are append-only.
+
+## Two Operating Modes
+
+### from-artifact mode
+
+* Inputs: one or more existing artifact file paths (`.prompt.md`, `.instructions.md`, `.agent.md`, or a skill's `SKILL.md`).
+* Behavior: auto-detects `kind` from the path or the file's frontmatter, reads the artifact in full, picks the matching per-kind reference from the `vally-tests` skill, drafts a stimulus YAML block per behavior covered, and appends the block to the routed eval file.
+* Mode-detection rule: select `from-artifact` when the user provides `mode=from-artifact` OR when the user provides one or more artifact file paths via a `files=` argument.
+
+### corpus-import mode
+
+* Inputs: a single `.csv` or `.xlsx` corpus file matching the column contract defined by the `vally-tests` skill's corpus-import template.
+* Behavior: dispatches the `vally-tests` skill's corpus importer (see its `## Helper Script Index`) to iterate rows, run the safety self-check and dedupe per row, and append surviving rows as stimulus blocks to the routed eval file. Every imported row MUST set `tags.advisory: true`; the importer enforces this and the subagent verifies the output.
+* Mode-detection rule: select `corpus-import` when the user provides `mode=corpus-import` OR when the user provides a `.csv` or `.xlsx` value via a `path=` argument.
+
+## Inputs Contract
+
+| Input   | Required for    | Optional for | Description                                                                                                                                                                                                            |
+|---------|-----------------|--------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `files` | `from-artifact` | —            | One or more artifact paths (`.prompt.md`, `.instructions.md`, `.agent.md`, `SKILL.md`). Repo-relative.                                                                                                                 |
+| `path`  | `corpus-import` | —            | Single corpus file path. Must end in `.csv` or `.xlsx` and match the column contract defined by the `vally-tests` skill's corpus-import template.                                                                      |
+| `mode`  | —               | both         | Either `from-artifact` or `corpus-import`. Inferred from `files=` or `path=` when omitted.                                                                                                                             |
+| `kind`  | —               | both         | One of `prompt`, `instructions`, `agent`, `skill`, or `auto`. Defaults to `auto`. In `from-artifact` mode `auto` resolves from path/frontmatter; in `corpus-import` mode `auto` resolves from the row's `kind` column. |
+
+## Output Contract
+
+Emit three artifacts on every invocation:
+
+1. **Target eval file path**, resolved from the `vally-tests` skill's eval-suite routing reference. The routing table covers `prompt`, `instructions`, `agent`, and `skill` (including the DR-03 fallback to `evals/skill-quality/eval.yaml`). Resolve the path before any write.
+2. **Append-only patch** against the target eval file. New stimulus blocks are appended to the existing `stimuli:` array; existing blocks are never replaced, reordered, or rewritten. When the target file does not exist for `agent`-kind routes (`evals/agent-behavior/stimuli/<slug>.yml`), create the file with the standard preamble and a single `stimuli:` entry.
+3. **JSON report** written to `logs/vally-test-author-<timestamp>.json`. The `vally-tests` skill defines the report shape and field set (see its `## Helper Script Index`); emit a report conforming to it and surface its `blockers` and `written_paths` in the response.
+
+## Required Steps
+
+**Pre-requisite: Setup** — Resolve `mode` (from `mode=`, `files=`, or `path=`) and the target eval file from the routing reference before drafting any stimulus.
+
+1. Read each input artifact (`from-artifact`) or corpus row (`corpus-import`) and detect its `kind`.
+2. Draft one stimulus YAML block per documented behavior, setting `tags.advisory: true`.
+3. Run the Safety Self-Check against each drafted block; refuse or surface blockers per the exit-code contract.
+4. Deduplicate surviving blocks by SHA-256 of the normalized prompt text against the target eval file.
+5. Append non-duplicate blocks to the routed eval file (append-only) and emit the JSON report.
+
+## Safety Self-Check
+
+Before any write to disk, run the safety lint owned by the `vally-tests` skill against each drafted stimulus and honor its exit-code contract. The `vally-tests` skill's `## Safety Refusal Taxonomy` and `## Helper Script Index` define the lint scripts, the seven-category taxonomy, and the exit-code semantics — defer to them rather than restating script paths or codes here.
+
+Behavior contract: refuse on a refusal-taxonomy match (do not write; emit the refusal block; record it), pause on an ambiguous result (surface blockers for review), and proceed only on a clean result. In `corpus-import` mode the lint runs per row without aborting the remaining rows.
+
+The self-check is a last gate, not permission to draft risky stimuli. If the user request or corpus row is already about policy-boundary testing, model-refusal elicitation, hidden-instruction disclosure, secrets, PII, harmful output, or terms-of-service evasion, refuse before drafting payload text.
+
+## Refusal Template
+
+When the safety self-check returns a refusal, emit the canonical refusal block defined by the `vally-tests` skill's refusal-taxonomy reference, substituting the matched category and its normative source from that reference. Do not negotiate, rephrase, or partially fulfill the request. The taxonomy reference owns the category-to-source mapping; do not restate it here.
+
+## Dedupe Protocol
+
+After the safety self-check passes, deduplicate against the target eval file before append. Dedupe is owned by the `vally-tests` skill (`## Helper Script Index` in its SKILL.md): the helper scripts normalize the prompt text, compute the SHA-256 hash, and compare it against existing stimuli — delegate to them rather than re-implementing the algorithm.
+
+Behavior contract: skip any stimulus whose normalized-prompt hash matches an existing entry, record skipped hashes in the JSON report's `dedupe_results`, and keep writes append-only.
+
+## Response Format
+
+On completion, return the following structured handoff to the parent agent:
+
+* `target_eval_file`: resolved eval file path.
+* `stimuli_appended`: count of stimulus blocks appended.
+* `duplicates_skipped`: count of stimuli skipped as duplicates.
+* `refusals_triggered`: count of refusal-taxonomy matches, broken down by category.
+* `json_report_path`: path to the `logs/vally-test-author-<timestamp>.json` file.
+* `blockers`: any items requiring user input (ambiguous safety-lint outcomes, missing routing target, corpus rows that failed schema validation).
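
For readers of this diff, the Dedupe Protocol and the Response Format above can be sketched together as follows. This is not the `vally-tests` helper implementation, which the agent file says to delegate to. The normalization rule (trim, collapse whitespace, lowercase) and the names `normalize_prompt`, `prompt_hash`, `dedupe`, and `build_handoff` are assumptions made for illustration. Only the SHA-256 hashing, the skip-on-match behavior, and the handoff field names come from the text.

```python
import hashlib
import re


def normalize_prompt(text: str) -> str:
    """Assumed normalization: trim, collapse whitespace runs, lowercase."""
    return re.sub(r"\s+", " ", text.strip()).lower()


def prompt_hash(text: str) -> str:
    """SHA-256 hex digest of the normalized prompt text."""
    return hashlib.sha256(normalize_prompt(text).encode("utf-8")).hexdigest()


def dedupe(existing_prompts, drafted_prompts):
    """Split drafted prompts into those to append and the hashes that were skipped."""
    seen = {prompt_hash(p) for p in existing_prompts}
    to_append, skipped_hashes = [], []
    for prompt in drafted_prompts:
        digest = prompt_hash(prompt)
        if digest in seen:
            skipped_hashes.append(digest)  # would be recorded in dedupe_results
            continue
        seen.add(digest)  # also guards against duplicates within one batch
        to_append.append(prompt)
    return to_append, skipped_hashes


def build_handoff(target_eval_file, to_append, skipped_hashes, json_report_path):
    """Assemble the handoff using the field names from the Response Format list."""
    return {
        "target_eval_file": target_eval_file,
        "stimuli_appended": len(to_append),
        "duplicates_skipped": len(skipped_hashes),
        "refusals_triggered": {},  # per-category counts; empty in this sketch
        "json_report_path": json_report_path,
        "blockers": [],
    }
```

A short usage example under the same assumptions:

```python
existing = ["Summarize the open issues."]
drafted = ["  summarize   the open ISSUES. ", "List the labels in use."]
to_append, skipped = dedupe(existing, drafted)
handoff = build_handoff(
    "evals/skill-quality/eval.yaml",
    to_append,
    skipped,
    "logs/vally-test-author-<timestamp>.json",
)
```

Here the first drafted prompt normalizes to the same text as the existing one and is skipped, so only the second prompt is appended.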
@@ -7,6 +7,8 @@ applyTo: '**/.github/instructions/github-backlog-*.instructions.md'
 
 Voice, tone, and response templates for community-facing interactions on GitHub. Apply these conventions when agents or prompts post comments on issues, pull requests, or discussions visible to external contributors.
 
+Search for and apply `content-policy-citation.instructions.md` for every GitHub-visible title, body, or comment that references or alludes to a suspected content-policy or terms-of-service concern.
+
 ## Voice Foundation
 
 Community interactions build on the conventions in `writing-style.instructions.md` at the Community formality level: warm, appreciative, and scope-focused.
@@ -26,6 +28,7 @@ Pronoun conventions for community interactions:
 * Be specific: name what the contributor did. Generic thanks feels hollow.
 * Be concise: target 2-4 sentences per response. Longer responses invite negotiation.
 * Match CONTRIBUTING.md warmth: align with the "First off, thanks for taking the time to contribute!" energy established in the project's contributor guide.
+* Do not expose content-policy category names, rationale, quoted snippets, or paraphrased flagged content in public GitHub comments. Use the neutral template from the shared content-policy guard.
 
 ## Tone Calibration Matrix
 

@@ -73,6 +73,8 @@ Common PR field operations via the Issues API:
 
 When an operation produces a comment visible to external contributors, the comment body follows scenario templates from `community-interaction.instructions.md`. This applies to closure messages, information requests, acknowledgments, and redirects.
 
+When an operation creates or updates GitHub-visible text that references a suspected content-policy or terms-of-service concern, search for and apply `content-policy-citation.instructions.md` before the API call. Public comments and issue bodies must use neutral wording and must not include classification labels, rationale, quoted snippets, paraphrases, or payload examples.
+
 | Operation       | Scenario                                             | Template Guidance                    |
 |-----------------|------------------------------------------------------|--------------------------------------|
 | Close duplicate | Scenario 7: Closing a Duplicate Issue                | Duplicate closure with original link |
@@ -814,6 +816,17 @@ When found:
 
 Never send planning reference IDs or template ID placeholders to GitHub APIs.
 
+### Content Policy Public Output Guard
+
+Before sending a GitHub-bound title, body, comment, or PR text field, remove any internal content-policy classification details copied from planning files. This includes category names, sub-anchors, rationale notes, quoted snippets, paraphrased flagged content, and payload examples.
+
+When a public GitHub field must identify a concern:
+
+1. Cite only the file path and line range when the concern is tied to repository content.
+2. Search for and apply `content-policy-citation.instructions.md`, then use the neutral shared template.
+3. Link only to `https://learn.microsoft.com/legal/ai-code-of-conduct` when a policy link is needed.
+4. Replace copied classification or payload text with a neutral phrase such as "content-policy review needed" when no file line is available.
+
 ## Three-Tier Autonomy Model
 
 The autonomy model controls confirmation gates during issue operations. The consuming workflow file must specify the active tier. When no tier is specified, agents should default to Partial Autonomy.

@@ -9,6 +9,8 @@ Follow all instructions from #file:./github-backlog-planning.instructions.md for
 
 Follow community interaction guidelines from #file:./community-interaction.instructions.md when posting comments visible to external contributors.
 
+Search for and apply `content-policy-citation.instructions.md` before creating or updating GitHub-visible issue titles, issue bodies, comments, or PR text fields.
+
 ## Purpose and Scope
 
 The execution protocol processes a handoff plan file to create, update, link, and close GitHub issues in batch. The workflow consumes handoff.md (or triage-plan.md) produced by the discovery or triage workflows and executes planned operations against the GitHub API via MCP tools.

@@ -169,7 +169,7 @@ Characteristics:
 * Typically live under a `subagents/` subdirectory within their collection folder (for example, `.github/agents/{collection}/subagents/`) to separate them from user-facing agents.
 * Parent agents declare subagent dependencies in their `agents:` frontmatter using the human-readable name from each subagent's `name:` frontmatter.
 * Referenced using glob paths like `.github/agents/**/name.agent.md` so resolution works regardless of whether the subagent is at the root or in the `subagents/` folder.
-* Cannot run their own subagents; only the parent agent orchestrates subagent calls.
+* May orchestrate their own subagents when the harness supports nested subagent calls; otherwise the parent agent orchestrates subagent calls.
 
 Create subagents when a parent agent needs to parallelize work or delegate a specialized, repeatable task. When the workflow is linear and does not benefit from isolated execution, keep the logic within the parent agent or use a prompt file.
 
@@ -794,7 +794,7 @@ Task specification:
 * Prompt instruction files can be selected dynamically when appropriate (for example, "Find related instructions files and have the subagent read and follow them").
 * Indicate the types of tasks the subagent completes.
 * Provide the subagent a step-based protocol when multiple steps are needed.
-* Subagents complete their work directly without orchestrating other subagents.
+* Subagents complete their work directly, orchestrating other subagents only when the task benefits from delegation and the harness supports it.
 
 Response format: