feat(agents): add Vally evaluation agents and prompts #1834
Merged
rezatnoMsirhC previously approved these changes on Jun 8, 2026
agreaves-ms reviewed on Jun 9, 2026 (four reviews)
agreaves-ms previously approved these changes on Jun 9, 2026
- add vally-test-author subagent and content-policy-citation agent
- add evals-import and vally-test-write prompts
✨ - Generated by Copilot
d4901bf to 4afaf12
The base branch was changed.
- align vally-test-author subagent with canonical template sections and JSON report path
- whitelist Vally Test Author in prompt-builder and allow nested subagent calls
- decouple skill paths and unify JSON output path in evals-import and vally-test-write prompts
- drop attribution suffix and set disable-model-invocation on content-policy-citation
🔧 - Generated by Copilot
Eval Execution
✅ Status: Passed, no merge-blocking failures (31 advisory assertion failure(s) present)
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #1834 +/- ##
==========================================
+ Coverage 81.25% 81.26% +0.01%
==========================================
Files 127 127
Lines 18839 18850 +11
Branches 12 12
==========================================
+ Hits 15308 15319 +11
Misses 3528 3528
Partials 3 3
Flags with carried forward coverage won't be shown.
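The Codecov numbers can be checked by hand. Codecov computes coverage as hits / (hits + misses + partials). By default (`coverage.precision: 2`, `coverage.round: down` in `codecov.yml`), it truncates the result to two decimals. The sketch below reproduces both percentages and the +0.01% delta from the counts in the table. `coverage_pct` is an illustrative name, not part of any tool.

```python
from decimal import Decimal, ROUND_DOWN


def coverage_pct(hits: int, misses: int, partials: int) -> Decimal:
    """Coverage as Codecov reports it: partial lines count as not fully covered."""
    tracked = hits + misses + partials
    pct = Decimal(hits) * 100 / Decimal(tracked)
    # Truncate (round down) to two decimal places, matching Codecov's default settings.
    return pct.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


base = coverage_pct(15308, 3528, 3)  # main: 18839 tracked lines
head = coverage_pct(15319, 3528, 3)  # #1834: 18850 tracked lines
print(base, head, head - base)  # prints: 81.25 81.26 0.01
```

Rounding to nearest instead of truncating would have reported main as 81.26%, and the change would have shown no delta.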
🔧 - Generated by Copilot
- add Vally Test Author subagent with from-artifact and corpus-import modes
- add vally-test-write and evals-import commands
- enforce tos-violation safety lint coverage in vally-tests
- update collections, plugins, and evals/collections scripts
🧪 - Generated by Copilot
- capture error message in a variable for clarity
- conditionally call Write-CIAnnotation if the command exists
🔧 - Generated by Copilot
…tions
- move content-policy-citation from agent to shared instructions
- wire references across github backlog agents, prompts, and skills
- regenerate collections and plugins; add vally-test-author stimulus
✨ - Generated by Copilot
- tally totals per spec to avoid double-counting and add Specs column
- index only specs declaring a top-level stimuli key
- add regression tests for summary totals and stimulus indexing
🧪 - Generated by Copilot
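The two fixes in this commit can be illustrated with a small, hypothetical model of the summary step. The sketch makes three assumptions:

- each result row carries a spec name, a stimulus id, and a pass flag;
- a stimulus counts as failed if any of its rows failed;
- specs arrive already parsed into dicts.

The real summary script and its data shapes are not shown in this PR. `TrialResult`, `summarize`, and `index_stimulus_specs` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class TrialResult:
    spec: str      # hypothetical: eval spec file the row belongs to
    stimulus: str  # hypothetical: stimulus id within that spec
    passed: bool


def summarize(results):
    """Tally once per (spec, stimulus) so repeated rows are not double-counted."""
    per_spec = {}
    for r in results:
        stimuli = per_spec.setdefault(r.spec, {})
        # A stimulus stays passing only if every one of its rows passed.
        stimuli[r.stimulus] = stimuli.get(r.stimulus, True) and r.passed
    total = sum(len(s) for s in per_spec.values())
    passed = sum(sum(1 for ok in s.values() if ok) for s in per_spec.values())
    return {"specs": len(per_spec), "total": total, "passed": passed, "failed": total - passed}


def index_stimulus_specs(parsed_specs):
    """Keep only specs whose parsed document declares a top-level 'stimuli' key."""
    return [name for name, doc in parsed_specs.items()
            if isinstance(doc, dict) and "stimuli" in doc]
```

Keying the tally by spec is also what makes a per-spec "Specs" column cheap to render.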
# Conflicts:
# .github/instructions/hve-core/prompt-builder.instructions.md
# evals/behavior-conformance/skill-behavior.eval.yaml
- re-import CIHelpers in Prepare-Extension tests so Write-CIAnnotation resolves
- add deeplink/refus/stimul stems to cspell words
- add community-interaction instruction eval stimulus for coverage
🔧 - Generated by Copilot
agreaves-ms reviewed on Jun 25, 2026 (five reviews)
agreaves-ms approved these changes on Jun 25, 2026
- wire Vally Test Author into prompt-builder skill and dispatch matrix
- move routing, safety lint, dedupe, and report ownership into vally-tests skill
- split eval failures into gating vs advisory and surface them in the PR comment
✅ - Generated by Copilot
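The sketch below shows the gating/advisory split and the status line it produces. It assumes each failure already carries an advisory flag resolved from its stimulus tags. `Failure`, `split_failures`, and `status_line` are hypothetical names, and the real workflow's comment wording may differ.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    stimulus: str
    advisory: bool  # hypothetical: True when the failing stimulus is tagged advisory


def split_failures(failures):
    """Partition failures into (gating, advisory) lists."""
    gating = [f for f in failures if not f.advisory]
    advisory = [f for f in failures if f.advisory]
    return gating, advisory


def status_line(failures):
    """Render a one-line PR comment status from the partitioned failures."""
    gating, advisory = split_failures(failures)
    if gating:
        return f"❌ Status: Failed, {len(gating)} merge-blocking failure(s)"
    suffix = f" ({len(advisory)} advisory assertion failure(s) present)" if advisory else ""
    return f"✅ Status: Passed, no merge-blocking failures{suffix}"
```

With 31 advisory-only failures, this returns a passing line that notes the 31 advisory failures. That is the same shape as the Eval Execution comment on this PR.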
📐 - Generated by Copilot
…re absent
- reconcile unattributed failures by the spec's overall advisory posture
- guard the exit-code fallback so all-advisory specs never block merge
- parse quoted advisory tag values so a `"false"` value graduates correctly
- add stub fail-noname mode and a Pester case for the empty perStimulus path
🐛 - Generated by Copilot
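The sketch below models the reconciliation rules in this commit:

- advisory tag values may arrive as quoted strings;
- failures without a stimulus name fall back to the spec's overall posture;
- when there is no per-stimulus detail at all, a non-zero exit code blocks merge only if the spec is not entirely advisory.

Function names, argument shapes, and the choice to treat unknown stimulus ids as gating are assumptions for illustration, not the runner's code.

```python
def parse_advisory(value) -> bool:
    """Interpret an advisory tag value, accepting booleans or (quoted) strings."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        # Strip whitespace and surrounding quote characters, e.g. '"false"' -> 'false'.
        text = value.strip().strip("'\"").strip().lower()
        if text == "true":
            return True
        if text == "false":
            return False
    raise ValueError(f"unrecognized advisory value: {value!r}")


def should_block(exit_code, advisory_by_stimulus, failed_stimuli, unattributed_failures):
    """Decide whether a spec's eval run blocks merge (hypothetical model).

    advisory_by_stimulus: stimulus id -> parsed advisory flag
    failed_stimuli: ids of stimuli with attributed failures
    unattributed_failures: count of failures that carry no stimulus name
    """
    all_advisory = bool(advisory_by_stimulus) and all(advisory_by_stimulus.values())
    # Attributed failures gate unless their stimulus is advisory (unknown ids gate).
    gating = any(not advisory_by_stimulus.get(s, False) for s in failed_stimuli)
    # Unattributed failures inherit the spec's overall posture.
    if unattributed_failures > 0:
        gating = gating or not all_advisory
    if failed_stimuli or unattributed_failures > 0:
        return gating
    # No per-stimulus detail: fall back to the exit code, guarded for all-advisory specs.
    return exit_code != 0 and not all_advisory
```

Without the quote stripping, a string like `'"false"'` would be unrecognized, and the stimulus could not graduate from advisory to gating.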
… specs
- demote per-trial failures to advisory when vally exits 0 (aggregate met threshold)
- keep real aggregate failures (exit non-zero) gating as before
- add Pester coverage and run Integration-tagged eval tests
🐛 - Generated by Copilot
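The demotion rule can be sketched as follows. It assumes per-trial failures arrive as (stimulus id, advisory flag) pairs, and that a zero exit code from vally means the aggregate pass-rate threshold was met. `classify_trial_failures` is a hypothetical name, not the runner's actual function.

```python
def classify_trial_failures(exit_code, failures):
    """Split per-trial failures into (gating, advisory) stimulus id lists.

    failures: list of (stimulus_id, is_advisory) tuples.
    """
    if exit_code == 0:
        # Aggregate threshold met: individual trial misses are informational only.
        return [], [sid for sid, _ in failures]
    # Real aggregate failure: keep each stimulus's own advisory posture.
    gating = [sid for sid, advisory in failures if not advisory]
    advisory = [sid for sid, advisory in failures if advisory]
    return gating, advisory
```

For example, `classify_trial_failures(0, [("s1", False), ("s2", True)])` returns `([], ["s1", "s2"])`. The same input with exit code 1 keeps `s1` gating.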
- implement TagFilter parameter for scoped advisory mapping
- enhance logic to return filtered advisory stimuli based on tags
- add unit tests for tag scoping behavior
🔍 - Generated by Copilot
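The sketch below illustrates tag-scoped advisory mapping, with `tag_filter` standing in for the `TagFilter` parameter. It makes three assumptions:

- each stimulus is a dict with `id`, `tags`, and `advisory` keys;
- a stimulus is in scope when it shares at least one tag with the filter;
- an empty or missing filter means no scoping.

This PR does not state the real any-match versus all-match semantics, and `advisory_stimuli` is an illustrative name.

```python
def advisory_stimuli(stimuli, tag_filter=None):
    """Return ids of advisory stimuli, optionally scoped to a set of tags.

    stimuli: list of dicts with hypothetical keys 'id', 'tags' (list[str]), 'advisory' (bool).
    tag_filter: optional iterable of tags; None or empty means "no scoping".
    """
    wanted = set(tag_filter) if tag_filter else None
    result = []
    for s in stimuli:
        # Skip stimuli that share no tag with the filter (any-match assumption).
        if wanted is not None and not wanted.intersection(s.get("tags", [])):
            continue
        if s.get("advisory", False):
            result.append(s["id"])
    return result
```

Scoping the advisory map to the same tags as the run keeps an advisory stimulus that was not executed from affecting how a tag-filtered run is reported.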
…e-core into feat/1637-l6-agents
- allow loading of helper functions without running the main workflow
🔧 - Generated by Copilot
jkim323 approved these changes on Jun 26, 2026
This was referenced Jun 26, 2026
WilliamBerryiii added a commit that referenced this pull request on Jun 29, 2026
## Description
Resolves a batch of documentation drift issues where prose docs and READMEs fell behind code, agent, and path changes from recent PRs. Each fix realigns stale references with the current source of truth:
* **BRD/PRD output path move** (PR #2098): updated lifecycle, role, and planning docs to point at `docs/project-planning/` instead of the retired `docs/brds/` and `docs/prds/`.
* **Eval CI behavior** (PRs #1834, #1949): documented shared-spec tag-aware run behavior, the missing eval npm commands, and the `mixed` stub mode plus sub-threshold `advisory-fail` status.
* **Copyright tooling** (PR #2169): documented the new `CopyrightHeader.psm1` module, the `Test-CopyrightHeaders.ps1` `-Fix` switch, and the canonical 2026 header format.
* **Collection helpers** (PR #1834): documented the new strict-safe maturity vocabulary functions in `CollectionHelpers.psm1`.
* **rai-license-posture.instructions.md**: fixed a dead link and added a rai-license-posture instruction conformance stimulus to `instructions.eval.yml`.
## Related Issue(s)
Closes #2179
Closes #2180
Closes #2181
Closes #2182
Closes #2183
Closes #2187
Closes #2188
Closes #2191
## Type of Change
Select all that apply:
**Code & Documentation:**
* [ ] Bug fix (non-breaking change fixing an issue)
* [ ] New feature (non-breaking change adding functionality)
* [ ] Breaking change (fix or feature causing existing functionality to change)
* [x] Documentation update
**Infrastructure & Configuration:**
* [ ] GitHub Actions workflow
* [ ] Linting configuration (markdown, PowerShell, etc.)
* [ ] Security configuration
* [ ] DevContainer configuration
* [ ] Dependency update
**AI Artifacts:**
* [ ] Reviewed contribution with `prompt-builder` agent and addressed all feedback
* [ ] Copilot instructions (`.github/instructions/*.instructions.md`)
* [ ] Copilot prompt (`.github/prompts/*.prompt.md`)
* [ ] Copilot agent (`.github/agents/*.agent.md`)
* [ ] Copilot skill (`.github/skills/*/SKILL.md`)
* [ ] Copilot hook (`.github/hooks/*/*.json`)
* [x] Eval spec added/updated for changed AI artifacts (`evals/`)
**Other:**
* [ ] Script/automation (`.ps1`, `.sh`, `.py`)
* [ ] Other (please describe):
## Testing
<!-- Add manual testing descriptions when applicable. Run the documentation validation commands below before merging. -->
## Checklist
### Required Checks
* [x] Documentation is updated (if applicable)
* [x] Files follow existing naming conventions
* [x] Changes are backwards compatible (if applicable)
* [ ] Tests added for new functionality (if applicable) (N/A: documentation-only)
### Required Automated Checks
The following validation commands must pass before merging:
* [x] Markdown linting: `npm run lint:md`
* [x] Spell checking: `npm run spell-check`
* [x] Frontmatter validation: `npm run lint:frontmatter`
* [ ] Skill structure validation: `npm run validate:skills`
* [x] Link validation: `npm run lint:md-links`
* [ ] PowerShell analysis: `npm run lint:ps`
* [ ] Plugin freshness: `npm run plugin:generate`
* [ ] Docusaurus tests: `npm run docs:test`
## Security Considerations
* [x] This PR does not contain any sensitive or NDA information
* [ ] Any new dependencies have been reviewed for security issues (N/A: no dependency changes)
* [ ] Security-related scripts follow the principle of least privilege (N/A: no security scripts modified)
## Additional Notes
Documentation-only change. #2186 (DT Coach session-path rename) is deliberately excluded from this batch.

---------
Co-authored-by: Bill Berry <WilliamBerryiii@users.noreply.github.com>
Pull Request
Description
Added the Vally-facing AI artifacts plus the runner and collection-tooling support behind them.
tos-violation).
`-Tag` filtering for `Invoke-VallySpec`.
`Get-CollectionMaturityVocabulary`, `Get-CollectionMaturityRank`, and `Resolve-StrictSafeMaturity`.
Related Issue(s)
Closes #1819
Type of Change
Select all that apply:
Code & Documentation:
Infrastructure & Configuration:
AI Artifacts:
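Reviewed contribution with `prompt-builder` agent and addressed all feedback
Copilot instructions (`.github/instructions/*.instructions.md`)
Copilot prompt (`.github/prompts/*.prompt.md`)
Copilot agent (`.github/agents/*.agent.md`)
Copilot skill (`.github/skills/*/SKILL.md`)
Other:
Script/automation (`.ps1`, `.sh`, `.py`)
Sample Prompts (for AI Artifact Contributions)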
User Request: "Write Vally tests for the prompt-builder agent."
Execution Flow: The vally-test-write.prompt.md entry point dispatches the vally-test-author subagent, which loads the vally-tests skill, scaffolds stimuli and expectations, selects graders, and runs the safety linter.
Output Artifacts: Stimulus and expectation YAML files under the relevant eval corpus.
Success Indicators: Generated specs pass `Test-EvalSpec.ps1` and the safety linter reports no findings.
Testing
Validated via `npm run lint:all` (exit 0), including `npm run lint:frontmatter`, `npm run lint:ps`, and `npm run lint:ai-artifacts`. PowerShell coverage added and passing via `npm run test:ps`:
`tos-violation` refusal category coverage.
Checklist
Required Checks
AI Artifact Contributions
`/prompt-analyze` to review contribution
`prompt-builder` review
Required Automated Checks
`npm run lint:md`
`npm run spell-check`
`npm run lint:frontmatter`
`npm run validate:skills`
`npm run lint:md-links`
`npm run lint:ps`
`npm run plugin:generate`
`npm run docs:test`
Security Considerations
Additional Notes
Seventh PR in the #1637 stack. Base branch: `feat/1637-l5-corpora`. New agent, subagent, and prompts at `stable` maturity. Collection registration and plugin regeneration land in a later PR in this stack (#1821).