feat(agents): add Vally evaluation agents and prompts by WilliamBerryiii · Pull Request #1834 · microsoft/hve-core

WilliamBerryiii · 2026-06-02T03:43:11Z

Pull Request

Description

Added the Vally-facing AI artifacts plus the runner and collection-tooling support behind them.

The content-policy-citation agent (content-policy-citation.agent.md) provides citation discretion rules for the CI agentic PR-review workflow.
The vally-test-author subagent (vally-test-author.agent.md) drives test authoring against the vally-tests skill, supporting from-artifact and corpus-import modes with a seven-category refusal taxonomy (including tos-violation).
Two companion prompts, evals-import.prompt.md and vally-test-write.prompt.md, expose import and authoring entry points.
The Vally runner (VallyRunner.psm1, Invoke-VallyEvals.ps1) gains tag-aware run plans, spec backlink counting, and -Tag filtering for Invoke-VallySpec.
Collection tooling (CollectionHelpers.psm1, Validate-Collections.ps1) gains strict-safe maturity propagation via Get-CollectionMaturityVocabulary, Get-CollectionMaturityRank, and Resolve-StrictSafeMaturity.
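
To illustrate how the three maturity helpers named above might fit together, here is a minimal Python sketch. It is not the PowerShell code in CollectionHelpers.psm1: the vocabulary values, their ordering, and the "resolve to the least mature value" rule are assumptions made for illustration, and the Python function names are hypothetical analogues of the cmdlets.

```python
# Assumed vocabulary, ordered from least to most mature (hypothetical values).
MATURITY_VOCABULARY = ("deprecated", "experimental", "preview", "stable")


def maturity_rank(value):
    """Return the position of a maturity value in the assumed ordering."""
    normalized = value.strip().lower()
    if normalized not in MATURITY_VOCABULARY:
        raise ValueError(f"unknown maturity: {value!r}")
    return MATURITY_VOCABULARY.index(normalized)


def resolve_strict_safe(*values):
    """Pick the least mature input so the result never overstates maturity."""
    if not values:
        raise ValueError("at least one maturity value is required")
    return min((v.strip().lower() for v in values), key=maturity_rank)


print(resolve_strict_safe("stable", "preview"))  # prints: preview
```

Under these assumptions, an unknown value raises an error instead of silently defaulting, which is one plausible reading of "strict".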

Related Issue(s)

Closes #1819

Type of Change

Select all that apply:

Code & Documentation:

Bug fix (non-breaking change fixing an issue)
New feature (non-breaking change adding functionality)
Breaking change (fix or feature causing existing functionality to change)
Documentation update

Infrastructure & Configuration:

AI Artifacts:

Reviewed contribution with prompt-builder agent and addressed all feedback
Copilot instructions (.github/instructions/*.instructions.md)
Copilot prompt (.github/prompts/*.prompt.md)
Copilot agent (.github/agents/*.agent.md)
Copilot skill (.github/skills/*/SKILL.md)

Other:

Script/automation (.ps1, .sh, .py)
Other (please describe):

Sample Prompts (for AI Artifact Contributions)

User Request: "Write Vally tests for the prompt-builder agent."

Execution Flow: The vally-test-write.prompt.md entry point dispatches the vally-test-author subagent, which loads the vally-tests skill, scaffolds stimuli and expectations, selects graders, and runs the safety linter.

Output Artifacts: Stimulus and expectation YAML files under the relevant eval corpus.

Success Indicators: Generated specs pass Test-EvalSpec.ps1 and the safety linter reports no findings.

Testing

Validated via npm run lint:all (exit 0), including npm run lint:frontmatter, npm run lint:ps, and npm run lint:ai-artifacts. PowerShell coverage added and passing via npm run test:ps:

CollectionHelpers.Tests.ps1 — strict-safe maturity vocabulary, rank, and resolution (+267 lines).
Invoke-VallyEvals.Tests.ps1 — tag-aware run-plan and backlink behavior (+187 lines), with a new stub-vally.ps1 fixture.
Lint-VallyTestSafety.Tests.ps1 — tos-violation refusal category coverage.

Checklist

Required Checks

Documentation is updated (if applicable)
Files follow existing naming conventions
Changes are backwards compatible (if applicable)
Tests added for new functionality (if applicable)

AI Artifact Contributions

Used /prompt-analyze to review contribution
Addressed all feedback from prompt-builder review
Verified contribution follows common standards and type-specific requirements

Required Automated Checks

Markdown linting: npm run lint:md
Spell checking: npm run spell-check
Frontmatter validation: npm run lint:frontmatter
Skill structure validation: npm run validate:skills
Link validation: npm run lint:md-links
PowerShell analysis: npm run lint:ps
Plugin freshness: npm run plugin:generate
Docusaurus tests: npm run docs:test

Security Considerations

This PR does not contain any sensitive or NDA information
Any new dependencies have been reviewed for security issues
Security-related scripts follow the principle of least privilege

Additional Notes

Seventh PR in the #1637 stack. Base branch: feat/1637-l5-corpora. New agent, subagent, and prompts at stable maturity. Collection registration and plugin regeneration land in a later PR in this stack (#1821).

- add vally-test-author subagent and content-policy-citation agent
- add evals-import and vally-test-write prompts
✨ - Generated by Copilot

The base branch was changed.

- align vally-test-author subagent with canonical template sections and JSON report path
- whitelist Vally Test Author in prompt-builder and allow nested subagent calls
- decouple skill paths and unify JSON output path in evals-import and vally-test-write prompts
- drop attribution suffix and set disable-model-invocation on content-policy-citation
🔧 - Generated by Copilot

github-actions · 2026-06-23T05:17:37Z

Eval Execution

✅ Status: Passed — no merge-blocking failures (31 advisory assertion failure(s) present)

Artifacts evaluated: 14
Specs run: 14
Assertions passed: 37
Assertions failed (blocking): 0
Assertions failed (advisory): 31
Failed specs (merge-blocking): 0

Artifact	Kind	Status	Specs	Passed	Failed (advisory)
`github-backlog-manager`	agent	⚠️ advisory-fail	1	2	2
`prompt-builder`	agent	⚠️ advisory-fail	1	3	1
`vally-test-author`	agent	⚠️ advisory-fail	1	2	5
`community-interaction`	instruction	⚠️ advisory-fail	1	0	4
`github-backlog-planning`	instruction	⚠️ advisory-fail	1	3	1
`github-backlog-update`	instruction	⚠️ advisory-fail	1	2	2
`prompt-builder`	instruction	⚠️ advisory-fail	1	3	1
`pull-request`	instruction	⚠️ advisory-fail	1	3	1
`content-policy-citation`	instruction	⚠️ advisory-fail	1	0	7
`evals-import`	prompt	⚠️ advisory-fail	1	3	1
`pull-request`	prompt	⚠️ advisory-fail	1	3	1
`vally-test-write`	prompt	⚠️ advisory-fail	1	3	1
`prompt-builder`	skill	⚠️ advisory-fail	1	3	1
`vally-tests`	skill	⚠️ advisory-fail	1	7	3

Legend — ✅ clean · ⚠️ advisory failures only (non-blocking) · ⏭️ skipped · ❌ merge-blocking failure

Only the "Failed specs (merge-blocking)" count gates this PR. Advisory assertion failures are signal-quality checks captured during iteration; review them, but they do not block merge and may be acceptable.
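
The totals and the gating rule in this comment can be checked with a short Python sketch. The per-artifact numbers are copied from the table above; the `ArtifactRow` type, the `blocking_failed_specs` field (which is not a table column and is set to 0 to match the reported total), and the `summarize` helper are hypothetical names, not the workflow's real code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtifactRow:
    name: str
    kind: str
    passed: int
    advisory_failed: int
    blocking_failed_specs: int = 0  # assumption: not shown per row in the table


# (artifact, kind, passed, failed-advisory) as listed in the table.
ROWS = [
    ArtifactRow("github-backlog-manager", "agent", 2, 2),
    ArtifactRow("prompt-builder", "agent", 3, 1),
    ArtifactRow("vally-test-author", "agent", 2, 5),
    ArtifactRow("community-interaction", "instruction", 0, 4),
    ArtifactRow("github-backlog-planning", "instruction", 3, 1),
    ArtifactRow("github-backlog-update", "instruction", 2, 2),
    ArtifactRow("prompt-builder", "instruction", 3, 1),
    ArtifactRow("pull-request", "instruction", 3, 1),
    ArtifactRow("content-policy-citation", "instruction", 0, 7),
    ArtifactRow("evals-import", "prompt", 3, 1),
    ArtifactRow("pull-request", "prompt", 3, 1),
    ArtifactRow("vally-test-write", "prompt", 3, 1),
    ArtifactRow("prompt-builder", "skill", 3, 1),
    ArtifactRow("vally-tests", "skill", 7, 3),
]


def summarize(rows):
    """Total the rows; only blocking spec failures affect the status."""
    blocking = sum(r.blocking_failed_specs for r in rows)
    return {
        "artifacts": len(rows),
        "passed": sum(r.passed for r in rows),
        "advisory_failed": sum(r.advisory_failed for r in rows),
        "blocking_failed_specs": blocking,
        "status": "Passed" if blocking == 0 else "Failed",
    }


summary = summarize(ROWS)
print(summary)
```

The sums reproduce the reported 37 passed and 31 advisory failures, and the status stays "Passed" because advisory failures never enter the gating condition.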

codecov-commenter · 2026-06-23T05:20:53Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.26%. Comparing base (aaef669) to head (7740baa).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1834      +/-   ##
==========================================
+ Coverage   81.25%   81.26%   +0.01%     
==========================================
  Files         127      127              
  Lines       18839    18850      +11     
  Branches       12       12              
==========================================
+ Hits        15308    15319      +11     
  Misses       3528     3528              
  Partials        3        3

Flag	Coverage Δ
docusaurus	`61.84% <ø> (ø)`
pester	`86.09% <100.00%> (+0.03%)`	⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
scripts/collections/Modules/CollectionHelpers.psm1	`99.52% <100.00%> (+0.56%)`	⬆️
scripts/collections/Validate-Collections.ps1	`93.88% <100.00%> (ø)`

... and 2 files with indirect coverage changes
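
The project percentages in the diff above follow from the Hits and Lines rows. The Python sketch below recomputes them, assuming percentages are truncated (not rounded) to two decimals; that assumption matches the figures shown, but it is inferred from the numbers and not stated in the report. The function name is hypothetical.

```python
def coverage_percent(hits, lines):
    """Coverage as a percentage, truncated to two decimal places."""
    return (hits * 10_000 // lines) / 100


base = coverage_percent(15308, 18839)  # main
head = coverage_percent(15319, 18850)  # this PR
delta = round(head - base, 2)

print(base, head, delta)  # prints: 81.25 81.26 0.01
```

Note that Lines equals Hits + Misses + Partials in both columns (15308 + 3528 + 3 = 18839 and 15319 + 3528 + 3 = 18850), so the 11 added lines are all covered.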

🔧 - Generated by Copilot

- add Vally Test Author subagent with from-artifact and corpus-import modes
- add vally-test-write and evals-import commands
- enforce tos-violation safety lint coverage in vally-tests
- update collections, plugins, and evals/collections scripts
🧪 - Generated by Copilot

- capture error message in a variable for clarity
- conditionally call Write-CIAnnotation if the command exists
🔧 - Generated by Copilot

…tions
- move content-policy-citation from agent to shared instructions
- wire references across github backlog agents, prompts, and skills
- regenerate collections and plugins; add vally-test-author stimulus
✨ - Generated by Copilot

- tally totals per spec to avoid double-counting and add Specs column
- index only specs declaring a top-level stimuli key
- add regression tests for summary totals and stimulus indexing
🧪 - Generated by Copilot

# Conflicts:
#   .github/instructions/hve-core/prompt-builder.instructions.md
#   evals/behavior-conformance/skill-behavior.eval.yaml

- re-import CIHelpers in Prepare-Extension tests so Write-CIAnnotation resolves
- add deeplink/refus/stimul stems to cspell words
- add community-interaction instruction eval stimulus for coverage
🔧 - Generated by Copilot

- wire Vally Test Author into prompt-builder skill and dispatch matrix
- move routing, safety lint, dedupe, and report ownership into vally-tests skill
- split eval failures into gating vs advisory and surface them in the PR comment
✅ - Generated by Copilot

📐 - Generated by Copilot

…re absent
- reconcile unattributed failures by the spec's overall advisory posture
- guard the exit-code fallback so all-advisory specs never block merge
- parse quoted advisory tag values so a `false` graduates correctly
- add stub fail-noname mode and a Pester case for the empty perStimulus path
🐛 - Generated by Copilot

… specs
- demote per-trial failures to advisory when vally exits 0 (aggregate met threshold)
- keeps real aggregate failures (exit non-zero) gating as before
- add Pester coverage and run Integration-tagged eval tests
🐛 - Generated by Copilot
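
The demotion rule in this commit can be sketched in a few lines of Python. This is an illustration of the rule as the commit message states it, not the runner's PowerShell code: the function name and the shape of its inputs are hypothetical, and the sketch deliberately ignores any per-stimulus advisory tags.

```python
def classify_failures(exit_code, failed_trials):
    """Split failed trials into gating and advisory buckets by exit code.

    Exit code 0 is taken to mean the aggregate met its threshold, so the
    individual trial failures are reported but do not gate the merge.
    """
    if exit_code == 0:
        return {"gating": [], "advisory": list(failed_trials)}
    return {"gating": list(failed_trials), "advisory": []}


# Hypothetical trial names, for illustration only.
print(classify_failures(0, ["trial-2"]))
print(classify_failures(1, ["trial-2", "trial-3"]))
```

The design choice is that the tool's own aggregate verdict (its exit code) decides whether a spec blocks, and per-trial detail is kept only as a signal.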

- implement TagFilter parameter for scoped advisory mapping
- enhance logic to return filtered advisory stimuli based on tags
- add unit tests for tag scoping behavior
🔍 - Generated by Copilot
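
As a rough illustration of tag-scoped advisory mapping, the Python sketch below returns the advisory stimuli of a spec, optionally limited to those carrying at least one requested tag. The stimulus shape (`name`, `tags`, `advisory` keys), the any-tag-matches semantics, and the function name are assumptions; the commit message does not describe the actual data model.

```python
def advisory_stimuli(stimuli, tag_filter=None):
    """Return names of advisory stimuli, scoped to tag_filter when given."""
    wanted = set(tag_filter or [])
    names = []
    for stimulus in stimuli:
        if not stimulus.get("advisory", False):
            continue  # gating stimuli are never part of the advisory map
        if wanted and not wanted.intersection(stimulus.get("tags", [])):
            continue  # outside the requested tag scope
        names.append(stimulus["name"])
    return names


# Hypothetical stimuli, for illustration only.
stimuli = [
    {"name": "s1", "tags": ["agent"], "advisory": True},
    {"name": "s2", "tags": ["prompt"], "advisory": True},
    {"name": "s3", "tags": ["agent"], "advisory": False},
]

print(advisory_stimuli(stimuli))             # prints: ['s1', 's2']
print(advisory_stimuli(stimuli, ["agent"]))  # prints: ['s1']
```

With no filter the full advisory set is returned, so an unscoped caller sees the same result whether or not it knows about tags.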

…e-core into feat/1637-l6-agents

- allow loading of helper functions without running the main workflow
🔧 - Generated by Copilot

## Description

Resolves a batch of documentation drift issues where prose docs and READMEs fell behind code, agent, and path changes from recent PRs. Each fix realigns stale references with the current source of truth:

* **BRD/PRD output path move** (PR #2098): updated lifecycle, role, and planning docs to point at `docs/project-planning/` instead of the retired `docs/brds/` and `docs/prds/`.
* **Eval CI behavior** (PRs #1834, #1949): documented shared-spec tag-aware run behavior, the missing eval npm commands, and the `mixed` stub mode plus sub-threshold `advisory-fail` status.
* **Copyright tooling** (PR #2169): documented the new `CopyrightHeader.psm1` module and the `Test-CopyrightHeaders.ps1` `-Fix` switch and canonical 2026 header format.
* **Collection helpers** (PR #1834): documented the new strict-safe maturity vocabulary functions in `CollectionHelpers.psm1`.
* **rai-license-posture.instructions.md**: fix dead link and added rai-license-posture instruction conformance stimulus to `instructions.eval.yml`

## Related Issue(s)

Closes #2179
Closes #2180
Closes #2181
Closes #2182
Closes #2183
Closes #2187
Closes #2188
Closes #2191

## Type of Change

Select all that apply:

**Code & Documentation:**

* [ ] Bug fix (non-breaking change fixing an issue)
* [ ] New feature (non-breaking change adding functionality)
* [ ] Breaking change (fix or feature causing existing functionality to change)
* [x] Documentation update

**Infrastructure & Configuration:**

* [ ] GitHub Actions workflow
* [ ] Linting configuration (markdown, PowerShell, etc.)
* [ ] Security configuration
* [ ] DevContainer configuration
* [ ] Dependency update

**AI Artifacts:**

* [ ] Reviewed contribution with `prompt-builder` agent and addressed all feedback
* [ ] Copilot instructions (`.github/instructions/*.instructions.md`)
* [ ] Copilot prompt (`.github/prompts/*.prompt.md`)
* [ ] Copilot agent (`.github/agents/*.agent.md`)
* [ ] Copilot skill (`.github/skills/*/SKILL.md`)
* [ ] Copilot hook (`.github/hooks/*/*.json`)
* [x] Eval spec added/updated for changed AI artifacts (`evals/`)

**Other:**

* [ ] Script/automation (`.ps1`, `.sh`, `.py`)
* [ ] Other (please describe):

## Testing

## Checklist

### Required Checks

* [x] Documentation is updated (if applicable)
* [x] Files follow existing naming conventions
* [x] Changes are backwards compatible (if applicable)
* [ ] Tests added for new functionality (if applicable) (N/A - documentation-only)

### Required Automated Checks

The following validation commands must pass before merging:

* [x] Markdown linting: `npm run lint:md`
* [x] Spell checking: `npm run spell-check`
* [x] Frontmatter validation: `npm run lint:frontmatter`
* [ ] Skill structure validation: `npm run validate:skills`
* [x] Link validation: `npm run lint:md-links`
* [ ] PowerShell analysis: `npm run lint:ps`
* [ ] Plugin freshness: `npm run plugin:generate`
* [ ] Docusaurus tests: `npm run docs:test`

## Security Considerations

* [x] This PR does not contain any sensitive or NDA information
* [ ] Any new dependencies have been reviewed for security issues (N/A - no dependency changes)
* [ ] Security-related scripts follow the principle of least privilege (N/A - no security scripts modified)

## Additional Notes

Documentation-only change. #2186 (DT Coach session-path rename) is deliberately excluded from this batch.

---------

Co-authored-by: Bill Berry <WilliamBerryiii@users.noreply.github.com>

WilliamBerryiii requested a review from a team as a code owner June 2, 2026 03:43

rezatnoMsirhC previously approved these changes Jun 8, 2026

agreaves-ms reviewed Jun 9, 2026

Comment thread .github/agents/hve-core/subagents/vally-test-author.agent.md Outdated

agreaves-ms reviewed Jun 9, 2026

Comment thread .github/agents/content-policy-citation.agent.md Outdated

agreaves-ms reviewed Jun 9, 2026

Comment thread .github/prompts/hve-core/evals-import.prompt.md Outdated

agreaves-ms reviewed Jun 9, 2026

Comment thread .github/prompts/hve-core/vally-test-write.prompt.md Outdated

agreaves-ms previously approved these changes Jun 9, 2026

feat(agents): add Vally evaluation agents and prompts

4afaf12

- add vally-test-author subagent and content-policy-citation agent
- add evals-import and vally-test-write prompts
✨ - Generated by Copilot

WilliamBerryiii force-pushed the feat/1637-l6-agents branch from d4901bf to 4afaf12 Compare June 23, 2026 02:17

WilliamBerryiii changed the base branch from feat/1637-l5-corpora to main June 23, 2026 02:17

WilliamBerryiii and others added 9 commits June 23, 2026 09:59

fix(agents): correct user-invokable typo to user-invocable in schema

d8304f0

🔧 - Generated by Copilot

Merge branch 'main' into feat/1637-l6-agents

ba2ee67

Merge branch 'main' into feat/1637-l6-agents

2ec7df8

fix(plugins): improve error handling in plugin generation process

90156b2

- capture error message in a variable for clarity
- conditionally call Write-CIAnnotation if the command exists
🔧 - Generated by Copilot

fix(evals): count eval summary totals by unique spec runs

f246469

- tally totals per spec to avoid double-counting and add Specs column
- index only specs declaring a top-level stimuli key
- add regression tests for summary totals and stimulus indexing
🧪 - Generated by Copilot
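
The two fixes in this commit can be pictured with a small Python sketch: count each spec once even when several artifacts share it, and index only documents that declare a top-level `stimuli` key. The dictionary shapes, file paths, and function names below are hypothetical; the real logic lives in the PowerShell runner and may differ.

```python
def index_specs(specs):
    """Keep only parsed specs that declare a top-level 'stimuli' key."""
    return {path: doc for path, doc in specs.items() if "stimuli" in doc}


def tally_by_spec(runs):
    """Sum assertion counts once per spec path to avoid double-counting."""
    unique = {}
    for run in runs:
        unique.setdefault(run["spec"], run)  # first result for a spec wins
    return {
        "specs": len(unique),
        "passed": sum(r["passed"] for r in unique.values()),
        "failed": sum(r["failed"] for r in unique.values()),
    }


# Hypothetical data: artifacts "a" and "b" are covered by the same shared spec.
runs = [
    {"artifact": "a", "spec": "evals/shared.eval.yaml", "passed": 3, "failed": 1},
    {"artifact": "b", "spec": "evals/shared.eval.yaml", "passed": 3, "failed": 1},
    {"artifact": "c", "spec": "evals/other.eval.yaml", "passed": 2, "failed": 0},
]
totals = tally_by_spec(runs)
indexed = index_specs({
    "evals/shared.eval.yaml": {"stimuli": []},
    "evals/config.yaml": {"settings": {}},
})

print(totals)         # prints: {'specs': 2, 'passed': 5, 'failed': 1}
print(list(indexed))  # prints: ['evals/shared.eval.yaml']
```

Summing per artifact row instead would report 8 passed and 2 failed for the same data, which is the double-counting the commit describes.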

Merge remote-tracking branch 'origin/main' into feat/1637-l6-agents

ffb84c2

# Conflicts:
#   .github/instructions/hve-core/prompt-builder.instructions.md
#   evals/behavior-conformance/skill-behavior.eval.yaml

fix(ci): resolve PR validation and eval build failures

97ef1fa

- re-import CIHelpers in Prepare-Extension tests so Write-CIAnnotation resolves
- add deeplink/refus/stimul stems to cspell words
- add community-interaction instruction eval stimulus for coverage
🔧 - Generated by Copilot