feat(agents): add Vally evaluation agents and prompts #1834

Merged
WilliamBerryiii merged 19 commits into main from feat/1637-l6-agents
Jun 26, 2026

Conversation

@WilliamBerryiii WilliamBerryiii commented Jun 2, 2026

Member

Pull Request

Description

Added the Vally-facing AI artifacts plus the runner and collection-tooling support behind them.

  • The content-policy-citation agent (content-policy-citation.agent.md) provides citation discretion rules for the CI agentic PR-review workflow.
  • The vally-test-author subagent (vally-test-author.agent.md) drives test authoring against the vally-tests skill, supporting from-artifact and corpus-import modes with a seven-category refusal taxonomy (including tos-violation).
  • Two companion prompts, evals-import.prompt.md and vally-test-write.prompt.md, expose import and authoring entry points.
  • The Vally runner (VallyRunner.psm1, Invoke-VallyEvals.ps1) gains tag-aware run plans, spec backlink counting, and -Tag filtering for Invoke-VallySpec.
  • Collection tooling (CollectionHelpers.psm1, Validate-Collections.ps1) gains strict-safe maturity propagation via Get-CollectionMaturityVocabulary, Get-CollectionMaturityRank, and Resolve-StrictSafeMaturity.
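One possible reading of "strict-safe maturity propagation" is that an effective maturity level is capped at the least mature level involved. The Python sketch below illustrates only that reading. The vocabulary (apart from `stable`, which this PR mentions), its ordering, and the function names are hypothetical stand-ins for `Get-CollectionMaturityVocabulary`, `Get-CollectionMaturityRank`, and `Resolve-StrictSafeMaturity`; the real PowerShell behavior is not shown in this description and may differ.

```python
# Hypothetical vocabulary, ordered from least to most mature.
# Only "stable" is named in this PR; the other levels are assumptions.
MATURITY_VOCABULARY = ["experimental", "preview", "stable"]


def maturity_rank(level: str) -> int:
    """Return the position of a level in the vocabulary (higher = more mature)."""
    try:
        return MATURITY_VOCABULARY.index(level)
    except ValueError:
        raise ValueError(f"unknown maturity level: {level!r}") from None


def resolve_strict_safe(declared: str, dependencies: list) -> str:
    """Cap the declared level at the least mature dependency (assumed rule)."""
    candidates = [declared, *dependencies]
    return min(candidates, key=maturity_rank)
```

Under this assumed rule, `resolve_strict_safe("stable", ["preview", "stable"])` resolves to `"preview"`, and an unknown level raises instead of being silently accepted.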

Related Issue(s)

Closes #1819

Type of Change

Select all that apply:

Code & Documentation:

  • Bug fix (non-breaking change fixing an issue)
  • New feature (non-breaking change adding functionality)
  • Breaking change (fix or feature causing existing functionality to change)
  • Documentation update

Infrastructure & Configuration:

  • GitHub Actions workflow
  • Linting configuration (markdown, PowerShell, etc.)
  • Security configuration
  • DevContainer configuration
  • Dependency update

AI Artifacts:

  • Reviewed contribution with prompt-builder agent and addressed all feedback
  • Copilot instructions (.github/instructions/*.instructions.md)
  • Copilot prompt (.github/prompts/*.prompt.md)
  • Copilot agent (.github/agents/*.agent.md)
  • Copilot skill (.github/skills/*/SKILL.md)

Other:

  • Script/automation (.ps1, .sh, .py)
  • Other (please describe):

Sample Prompts (for AI Artifact Contributions)

User Request: "Write Vally tests for the prompt-builder agent."

Execution Flow: The vally-test-write.prompt.md entry point dispatches the vally-test-author subagent, which loads the vally-tests skill, scaffolds stimuli and expectations, selects graders, and runs the safety linter.

Output Artifacts: Stimulus and expectation YAML files under the relevant eval corpus.

Success Indicators: Generated specs pass Test-EvalSpec.ps1 and the safety linter reports no findings.

Testing

Validated via npm run lint:all (exit 0), including npm run lint:frontmatter, npm run lint:ps, and npm run lint:ai-artifacts. PowerShell coverage added and passing via npm run test:ps:

  • CollectionHelpers.Tests.ps1 — strict-safe maturity vocabulary, rank, and resolution (+267 lines).
  • Invoke-VallyEvals.Tests.ps1 — tag-aware run-plan and backlink behavior (+187 lines), with a new stub-vally.ps1 fixture.
  • Lint-VallyTestSafety.Tests.ps1: tos-violation refusal category coverage.

Checklist

Required Checks

  • Documentation is updated (if applicable)
  • Files follow existing naming conventions
  • Changes are backwards compatible (if applicable)
  • Tests added for new functionality (if applicable)

AI Artifact Contributions

  • Used /prompt-analyze to review contribution
  • Addressed all feedback from prompt-builder review
  • Verified contribution follows common standards and type-specific requirements

Required Automated Checks

  • Markdown linting: npm run lint:md
  • Spell checking: npm run spell-check
  • Frontmatter validation: npm run lint:frontmatter
  • Skill structure validation: npm run validate:skills
  • Link validation: npm run lint:md-links
  • PowerShell analysis: npm run lint:ps
  • Plugin freshness: npm run plugin:generate
  • Docusaurus tests: npm run docs:test

Security Considerations

  • This PR does not contain any sensitive or NDA information
  • Any new dependencies have been reviewed for security issues
  • Security-related scripts follow the principle of least privilege

Additional Notes

Seventh PR in the #1637 stack. Base branch: feat/1637-l5-corpora. New agent, subagent, and prompts at stable maturity. Collection registration and plugin regeneration land in a later PR in this stack (#1821).

@WilliamBerryiii WilliamBerryiii requested a review from a team as a code owner June 2, 2026 03:43
rezatnoMsirhC previously approved these changes Jun 8, 2026
Comment thread .github/agents/hve-core/subagents/vally-test-author.agent.md Outdated
Comment thread .github/prompts/hve-core/evals-import.prompt.md Outdated
Comment thread .github/agents/content-policy-citation.agent.md Outdated
Comment thread .github/agents/hve-core/subagents/vally-test-author.agent.md Outdated
Comment thread .github/agents/hve-core/subagents/vally-test-author.agent.md Outdated
Comment thread .github/agents/content-policy-citation.agent.md Outdated
Comment thread .github/prompts/hve-core/evals-import.prompt.md Outdated
Comment thread .github/prompts/hve-core/vally-test-write.prompt.md Outdated
agreaves-ms previously approved these changes Jun 9, 2026
- add vally-test-author subagent and content-policy-citation agent
- add evals-import and vally-test-write prompts

✨ - Generated by Copilot
@WilliamBerryiii WilliamBerryiii changed the base branch from feat/1637-l5-corpora to main June 23, 2026 02:17
@WilliamBerryiii WilliamBerryiii dismissed stale reviews from agreaves-ms and rezatnoMsirhC June 23, 2026 02:17

The base branch was changed.

- align vally-test-author subagent with canonical template sections and JSON report path
- whitelist Vally Test Author in prompt-builder and allow nested subagent calls
- decouple skill paths and unify JSON output path in evals-import and vally-test-write prompts
- drop attribution suffix and set disable-model-invocation on content-policy-citation

🔧 - Generated by Copilot

github-actions Bot commented Jun 23, 2026

Contributor

Eval Execution

Status: Passed — no merge-blocking failures (31 advisory assertion failure(s) present)

  • Artifacts evaluated: 14
  • Specs run: 14
  • Assertions passed: 37
  • Assertions failed (blocking): 0
  • Assertions failed (advisory): 31
  • Failed specs (merge-blocking): 0

| Artifact | Kind | Status | Specs | Passed | Failed (blocking) | Failed (advisory) |
|---|---|---|---|---|---|---|
| github-backlog-manager | agent | ⚠️ advisory-fail | 1 | 2 | 0 | 2 |
| prompt-builder | agent | ⚠️ advisory-fail | 1 | 3 | 0 | 1 |
| vally-test-author | agent | ⚠️ advisory-fail | 1 | 2 | 0 | 5 |
| community-interaction | instruction | ⚠️ advisory-fail | 1 | 0 | 0 | 4 |
| github-backlog-planning | instruction | ⚠️ advisory-fail | 1 | 3 | 0 | 1 |
| github-backlog-update | instruction | ⚠️ advisory-fail | 1 | 2 | 0 | 2 |
| prompt-builder | instruction | ⚠️ advisory-fail | 1 | 3 | 0 | 1 |
| pull-request | instruction | ⚠️ advisory-fail | 1 | 3 | 0 | 1 |
| content-policy-citation | instruction | ⚠️ advisory-fail | 1 | 0 | 0 | 7 |
| evals-import | prompt | ⚠️ advisory-fail | 1 | 3 | 0 | 1 |
| pull-request | prompt | ⚠️ advisory-fail | 1 | 3 | 0 | 1 |
| vally-test-write | prompt | ⚠️ advisory-fail | 1 | 3 | 0 | 1 |
| prompt-builder | skill | ⚠️ advisory-fail | 1 | 3 | 0 | 1 |
| vally-tests | skill | ⚠️ advisory-fail | 1 | 7 | 0 | 3 |

Legend — ✅ clean · ⚠️ advisory failures only (non-blocking) · ⏭️ skipped · ❌ merge-blocking failure

Only Failed specs (merge-blocking) gates this PR. Advisory assertion failures are signal-quality checks captured during iteration; review them, but they do not block merge and may be acceptable.
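The gating rule described above can be sketched as follows. This is a minimal Python illustration, not the runner's actual code: the `ArtifactResult` type, the function names, and the `"fail"` label are hypothetical, while `advisory-fail` and `clean` follow the status column and legend in this comment. It also assumes a spec counts as merge-blocking whenever it has at least one blocking assertion failure.

```python
from dataclasses import dataclass


@dataclass
class ArtifactResult:
    name: str
    passed: int
    failed_blocking: int
    failed_advisory: int


def artifact_status(result: ArtifactResult) -> str:
    """Map assertion counts to a status label; blocking failures take precedence."""
    if result.failed_blocking > 0:
        return "fail"  # hypothetical label for the merge-blocking case
    if result.failed_advisory > 0:
        return "advisory-fail"
    return "clean"


def merge_blocked(results) -> bool:
    """Only blocking failures gate the merge; advisory failures never do."""
    return any(r.failed_blocking > 0 for r in results)


# Two rows taken from the report in this comment.
rows = [
    ArtifactResult("vally-test-author", passed=2, failed_blocking=0, failed_advisory=5),
    ArtifactResult("content-policy-citation", passed=0, failed_blocking=0, failed_advisory=7),
]
statuses = [artifact_status(r) for r in rows]
blocked = merge_blocked(rows)
```

With these two rows, both statuses are `advisory-fail` and `blocked` is `False`, which matches the "Passed" status reported here.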

codecov-commenter commented Jun 23, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.26%. Comparing base (aaef669) to head (7740baa).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1834      +/-   ##
==========================================
+ Coverage   81.25%   81.26%   +0.01%     
==========================================
  Files         127      127              
  Lines       18839    18850      +11     
  Branches       12       12              
==========================================
+ Hits        15308    15319      +11     
  Misses       3528     3528              
  Partials        3        3              

| Flag | Coverage Δ |
|---|---|
| docusaurus | 61.84% <ø> (ø) |
| pester | 86.09% <100.00%> (+0.03%) ⬆️ |

| Files with missing lines | Coverage Δ |
|---|---|
| scripts/collections/Modules/CollectionHelpers.psm1 | 99.52% <100.00%> (+0.56%) ⬆️ |
| scripts/collections/Validate-Collections.ps1 | 93.88% <100.00%> (ø) |

... and 2 files with indirect coverage changes


WilliamBerryiii and others added 9 commits June 23, 2026 09:59
- add Vally Test Author subagent with from-artifact and corpus-import modes
- add vally-test-write and evals-import commands
- enforce tos-violation safety lint coverage in vally-tests
- update collections, plugins, and evals/collections scripts

🧪 - Generated by Copilot
- capture error message in a variable for clarity
- conditionally call Write-CIAnnotation if the command exists

🔧 - Generated by Copilot
…tions

- move content-policy-citation from agent to shared instructions
- wire references across github backlog agents, prompts, and skills
- regenerate collections and plugins; add vally-test-author stimulus

✨ - Generated by Copilot
- tally totals per spec to avoid double-counting and add Specs column
- index only specs declaring a top-level stimuli key
- add regression tests for summary totals and stimulus indexing

🧪 - Generated by Copilot
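The per-spec tally in the commit above can be pictured with a small Python sketch. It assumes a spec may be shared by several artifacts, so that summing per-artifact rows would count its assertions more than once. The row shape, the spec paths, and the `summarize` name are hypothetical; the runner's real data model is not shown on this page.

```python
def summarize(rows):
    """rows: iterable of (artifact, spec_path, passed, failed) tuples."""
    per_spec = {}
    for _artifact, spec_path, passed, failed in rows:
        # A shared spec appears once per artifact; keep a single copy of its counts.
        per_spec.setdefault(spec_path, (passed, failed))
    return {
        "specs": len(per_spec),
        "passed": sum(p for p, _ in per_spec.values()),
        "failed": sum(f for _, f in per_spec.values()),
    }


# Two artifacts share shared.eval.yaml, so its counts are tallied once.
totals = summarize([
    ("artifact-a", "shared.eval.yaml", 3, 1),
    ("artifact-b", "shared.eval.yaml", 3, 1),
    ("artifact-c", "other.eval.yaml", 2, 0),
])
```

Here `totals` reports 2 specs, 5 passed, and 1 failed, where a naive per-row sum would have reported 8 passed and 2 failed.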
# Conflicts:
#	.github/instructions/hve-core/prompt-builder.instructions.md
#	evals/behavior-conformance/skill-behavior.eval.yaml
- re-import CIHelpers in Prepare-Extension tests so Write-CIAnnotation resolves
- add deeplink/refus/stimul stems to cspell words
- add community-interaction instruction eval stimulus for coverage

🔧 - Generated by Copilot
Comment thread .github/instructions/hve-core/prompt-builder.instructions.md
Comment thread .github/agents/hve-core/prompt-builder.agent.md
Comment thread .github/agents/hve-core/subagents/vally-test-author.agent.md Outdated
Comment thread .github/agents/hve-core/subagents/vally-test-author.agent.md Outdated
Comment thread .github/agents/hve-core/subagents/vally-test-author.agent.md Outdated
WilliamBerryiii and others added 8 commits June 25, 2026 22:03
- wire Vally Test Author into prompt-builder skill and dispatch matrix

- move routing, safety lint, dedupe, and report ownership into vally-tests skill

- split eval failures into gating vs advisory and surface them in the PR comment

✅ - Generated by Copilot
…re absent

- reconcile unattributed failures by the spec's overall advisory posture

- guard the exit-code fallback so all-advisory specs never block merge

- parse quoted advisory tag values so a "false" graduates correctly

- add stub fail-noname mode and a Pester case for the empty perStimulus path

🐛 - Generated by Copilot
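The quoted-value fix in the commit above could look roughly like the Python sketch below. The `parse_advisory` name is hypothetical, and the rule that only a literal `true` marks a stimulus as advisory is an assumption, since the actual tag syntax is not shown on this page.

```python
def parse_advisory(raw: str) -> bool:
    """Interpret an advisory tag value, tolerating surrounding quotes and case."""
    value = raw.strip().strip("\"'").lower()
    # Assumed rule: anything other than "true" means the stimulus is gating.
    return value == "true"
```

Under this sketch a quoted `"false"` and a bare `false` both parse as not advisory, so a stimulus graduates to gating either way.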
… specs

- demote per-trial failures to advisory when vally exits 0 (aggregate met threshold)

- keeps real aggregate failures (exit non-zero) gating as before

- add Pester coverage and run Integration-tagged eval tests

🐛 - Generated by Copilot
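The demotion rule in the commit above can be sketched as below. It is a Python illustration with hypothetical names, assuming each failure is identified by a stimulus name and that the advisory stimuli are known in advance; the real logic lives in the PowerShell runner and may differ.

```python
def classify_failures(exit_code, failed_trials, advisory_stimuli=frozenset()):
    """Split failed trials into (blocking, advisory) lists.

    exit_code == 0 is taken to mean the aggregate met its threshold, so every
    per-trial failure is demoted to advisory. A non-zero exit keeps failures
    gating unless the stimulus is itself marked advisory.
    """
    blocking, advisory = [], []
    for name in failed_trials:
        if exit_code == 0 or name in advisory_stimuli:
            advisory.append(name)
        else:
            blocking.append(name)
    return blocking, advisory
```

For example, `classify_failures(0, ["a", "b"])` yields no blocking failures, while `classify_failures(1, ["a", "b"], {"b"})` keeps `a` blocking and reports `b` as advisory.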
- implement TagFilter parameter for scoped advisory mapping
- enhance logic to return filtered advisory stimuli based on tags
- add unit tests for tag scoping behavior

🔍 - Generated by Copilot
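The tag scoping described in the commit above might be sketched like this. The stimulus shape (a dict with `name`, `tags`, and `advisory` keys), the tag values, and the function name are all hypothetical; the real `TagFilter` parameter belongs to the PowerShell runner and its exact semantics are not shown on this page.

```python
def advisory_stimuli(stimuli, tag_filter=None):
    """Return names of advisory stimuli, optionally limited to matching tags."""
    wanted = set(tag_filter or [])
    names = []
    for stimulus in stimuli:
        if not stimulus.get("advisory", False):
            continue
        # With a filter, keep only stimuli sharing at least one requested tag.
        if wanted and not wanted & set(stimulus.get("tags", [])):
            continue
        names.append(stimulus["name"])
    return names


stimuli = [
    {"name": "s1", "tags": ["agent"], "advisory": True},
    {"name": "s2", "tags": ["prompt"], "advisory": True},
    {"name": "s3", "tags": ["agent"], "advisory": False},
]
scoped = advisory_stimuli(stimuli, tag_filter=["agent"])
unscoped = advisory_stimuli(stimuli)
```

In this example `scoped` contains only `s1`, while `unscoped` contains `s1` and `s2`; `s3` is never returned because it is not advisory.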
- allow loading of helper functions without running the main workflow

🔧 - Generated by Copilot
@WilliamBerryiii WilliamBerryiii merged commit 18ce4c6 into main Jun 26, 2026
81 checks passed
WilliamBerryiii added a commit that referenced this pull request Jun 29, 2026
## Description

Resolves a batch of documentation drift issues where prose docs and
READMEs fell behind code, agent, and path changes from recent PRs. Each
fix realigns stale references with the current source of truth:

* **BRD/PRD output path move** (PR #2098): updated lifecycle, role, and
planning docs to point at `docs/project-planning/` instead of the
retired `docs/brds/` and `docs/prds/`.
* **Eval CI behavior** (PRs #1834, #1949): documented shared-spec
tag-aware run behavior, the missing eval npm commands, and the `mixed`
stub mode plus sub-threshold `advisory-fail` status.
* **Copyright tooling** (PR #2169): documented the new
`CopyrightHeader.psm1` module and the `Test-CopyrightHeaders.ps1` `-Fix`
switch and canonical 2026 header format.
* **Collection helpers** (PR #1834): documented the new strict-safe
maturity vocabulary functions in `CollectionHelpers.psm1`.
* **rai-license-posture.instructions.md**: fixed a dead link and added a
rai-license-posture instruction conformance stimulus to
`instructions.eval.yml`.

## Related Issue(s)

Closes #2179
Closes #2180
Closes #2181
Closes #2182
Closes #2183
Closes #2187
Closes #2188
Closes #2191

## Type of Change

Select all that apply:

**Code & Documentation:**

* [ ] Bug fix (non-breaking change fixing an issue)
* [ ] New feature (non-breaking change adding functionality)
* [ ] Breaking change (fix or feature causing existing functionality to
change)
* [x] Documentation update

**Infrastructure & Configuration:**

* [ ] GitHub Actions workflow
* [ ] Linting configuration (markdown, PowerShell, etc.)
* [ ] Security configuration
* [ ] DevContainer configuration
* [ ] Dependency update

**AI Artifacts:**

* [ ] Reviewed contribution with `prompt-builder` agent and addressed
all feedback
* [ ] Copilot instructions (`.github/instructions/*.instructions.md`)
* [ ] Copilot prompt (`.github/prompts/*.prompt.md`)
* [ ] Copilot agent (`.github/agents/*.agent.md`)
* [ ] Copilot skill (`.github/skills/*/SKILL.md`)
* [ ] Copilot hook (`.github/hooks/*/*.json`)
* [x] Eval spec added/updated for changed AI artifacts (`evals/`)

**Other:**

* [ ] Script/automation (`.ps1`, `.sh`, `.py`)
* [ ] Other (please describe):

## Testing

<!-- Add manual testing descriptions when applicable. Run the
documentation validation commands below before merging. -->

## Checklist

### Required Checks

* [x] Documentation is updated (if applicable)
* [x] Files follow existing naming conventions
* [x] Changes are backwards compatible (if applicable)
* [ ] Tests added for new functionality (if applicable) (N/A —
documentation-only)

### Required Automated Checks

The following validation commands must pass before merging:

* [x] Markdown linting: `npm run lint:md`
* [x] Spell checking: `npm run spell-check`
* [x] Frontmatter validation: `npm run lint:frontmatter`
* [ ] Skill structure validation: `npm run validate:skills`
* [x] Link validation: `npm run lint:md-links`
* [ ] PowerShell analysis: `npm run lint:ps`
* [ ] Plugin freshness: `npm run plugin:generate`
* [ ] Docusaurus tests: `npm run docs:test`

## Security Considerations

* [x] This PR does not contain any sensitive or NDA information
* [ ] Any new dependencies have been reviewed for security issues (N/A —
no dependency changes)
* [ ] Security-related scripts follow the principle of least privilege
(N/A — no security scripts modified)

## Additional Notes

Documentation-only change. #2186 (DT Coach session-path rename) is
deliberately excluded from this batch.

---------

Co-authored-by: Bill Berry <WilliamBerryiii@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Add Vally evaluation agents and prompts

5 participants