feat(agents): add named research subagents to Task Researcher#2165
Draft
engineersamuel wants to merge 47 commits into
Add lane orchestration to Task Researcher with research lanes for codebase locator, analyzer, pattern finder, and external research. Includes lane trigger matrix, prompt templates, and synthesis rules. Preserves Researcher Subagent as the only default child agent. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add optional research lane parameter to inputs - Support lane names: Codebase locator, Codebase analyzer, Codebase pattern finder, External research - Add lane-specific output requirements for each lane - Update response format to include lane and status - Record research lane in document using 'Research lane: <lane name>' format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add mode and subagents command inputs with documentation for trigger-based selection, focused vs lane-enabled research modes, and subagent fan-out preferences. Extends argument-hint with new optional parameters and adds requirement for trigger matrix evaluation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add task-researcher-lane-fanout stimulus for research lane fan-out behavior - Add task-researcher-focused-mode stimulus for lightweight research mode - Add task-researcher-generic-subagent stimulus for subagent preservation Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🧪 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🧪 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🧪 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…de quality - Remove committed hve_core_task_researcher_comparison.egg-info directory - Add *.egg-info/ to .gitignore to prevent future packaging artifacts - Tighten build_metrics() return type from list[LazyGEval | GEval] to list[LazyGEval] - Add comprehensive docstrings to public functions: require_deepeval_llm_enabled, build_comparison_test_case, build_metrics - Fix docstring whitespace issues (ruff W293) 🧪 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🧪 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add explicit error handling to catch subprocess.CalledProcessError when the runner command fails, providing context about scenario and variant with user-friendly error messages to stderr and returning nonzero exit code. 🐛 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🧪 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…collection - Fix npm run eval:task-researcher:compare to explicitly specify tests path - Prevents pytest from collecting 2979 unrelated repository tests - Tests now pass cleanly: 6 passed, 1 skipped - Resolves review finding: root command now passes deterministically 🧪 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🧪 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🧪 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔌 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- add locator, analyzer, pattern, and web research agents - keep each agent scoped to a single research lane 🤖 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update Task Researcher to invoke named subagents in lane mode. 🧭 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- update task-research prompt fan-out guidance - package codebase and web research subagents in hve-core 🧭 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- update stimulus, fixtures, and static scoring for named lanes - refresh comparison README and fixture outputs 🔍 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- restore pre-existing model arrays for unrelated agents - add headers to task-researcher comparison Python files - clarify fixtures are synthetic and live verification is separate 🛠️ - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔒 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Named lane subagents no longer create per-lane artifact files - Remove edit tools from codebase-locator, codebase-analyzer, codebase-pattern-finder, web-search-researcher - Update output sections to return structured findings in chat response - Remove file-writing prerequisites; replace with read-only setup steps - Remove lane-specific output requirements from Researcher Subagent - Update Task Researcher delegation and synthesis rules for read-only lanes - Add synthesize-into-main-doc rule to task-research prompt 🔬 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔧 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add one focused-mode return sentence to keep the delegation section symmetric without reintroducing lane artifacts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> 🔧 - Generated by Copilot
🔒 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace names-all-four-subagents grader with per-lane and conditional web-search graders in task-researcher.yml stimuli - Update codebase-lane required_evidence to list three local subagent files; remove Web Search Researcher - Update external-api required_evidence to include web-search-researcher subagent path and Web Search Researcher - Replace NAMED_SUBAGENT_MARKERS with LOCAL_LANE_MARKERS + WEB_LANE_MARKER constants - Rewrite _score_mode_compliance: codebase-lane penalizes web search, focused penalizes any lane fanout, external requires both lanes and web search plus FAR evidence - Update fixtures to match new scoring: remove web lane from codebase-lane, add FAR note and web lane to external-api, strip lane markers from focused-local - Add test_focused_case_penalizes_any_lane_fanout test 🔬 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔒 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- add runner_argv_from_env() parsing TASK_RESEARCHER_RUNNER_ARGV as JSON array - replace shell=True subprocess with safe argv list execution - add ValueError handler for malformed TASK_RESEARCHER_RUNNER_ARGV - add test_capture.py covering build_prompt and runner_argv_from_env 🔒 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- catch invalid runner argv before subprocess execution - align empty env handling and add parser coverage - restore README note about live captures 🛠️ - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ath guards - add validate_plugin_id() to reject unsafe plugin ID slugs - add safe_installed_path() to refuse delete targets outside INSTALL_ROOT - verify named Task Researcher subagents in verify_source_plugin() - update scripts/plugins/README.md with security description 🔒 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep dry-run from mutating the install root while preserving path safety checks. 🔒 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nstalled_path - safe_installed_path() now resolves both INSTALL_ROOT and candidate parent via pwd -P; dry-run falls back to lexical check when dirs do not exist - add mkdir -p INSTALL_ROOT and INSTALL_ROOT/_direct in non-dry-run mode before safe path checks, satisfying spec requirement 🔒 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…bagents - regenerate plugins/hve-core and plugins/hve-core-all with codebase-analyzer, codebase-locator, codebase-pattern-finder, and web-search-researcher subagents - regenerate extension package.json and README artifacts for named subagents - update evals/README.md ms.date to 2026-06-24 - rename evals/agent-behavior/stimuli/adr-creator.yml to adr-creation.yml so the agent backlink tag resolves to adr-creation.agent.md - regenerate evals/agent-behavior/eval.yaml from corrected partials - fix Test-EvalSpec.ps1 null-safe array wrap on Test-EvalSpecCompliance return 🔄 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔗 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phrases like "unrelated" and "not relevant" describe intentional exclusions, which signal good noise control. Scoring 0 on their presence inverted the metric. Remove the word-match guard; rely on word-count only. 🔬 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔒 - Generated by Copilot Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
A model field silently de-registers a subagent from the dispatchable agent-type registry, so the named Task Researcher lanes never fanned out. Removing it aligns source with the already-regenerated plugin copies that omit it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Lane Trigger Matrix is the sole lane selector; the separate Mode Selection block duplicated it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Match report recommendation logic to the documented coverage/actionability/noise rubric, label the static scorers as lexical signal and proxy heuristics, and add recommendation boundary tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Both terms are introduced by the Task Researcher comparison harness and lane stimuli. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Main's vally schema rejects boolean tag values, so the task-researcher advisory tags become strings. Regenerate the agent-behavior eval spec and hve-core plugin readme to include the named lanes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
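For reviewers, the safe-argv change described in the commits above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: only the names runner_argv_from_env and TASK_RESEARCHER_RUNNER_ARGV come from the commit messages, while run_capture, its parameters, the exit codes, and the treatment of an empty variable as unset are assumptions.

```python
import json
import os
import subprocess
import sys

ENV_VAR = "TASK_RESEARCHER_RUNNER_ARGV"  # name taken from the commit message


def runner_argv_from_env(env=None):
    """Parse the runner command from a JSON array of strings.

    Assumption: an unset or empty variable returns None; anything that is not
    a non-empty JSON array of strings raises ValueError.
    """
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR, "").strip()
    if not raw:
        return None
    try:
        argv = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{ENV_VAR} is not valid JSON: {exc}") from exc
    if not isinstance(argv, list) or not argv or not all(isinstance(a, str) for a in argv):
        raise ValueError(f"{ENV_VAR} must be a non-empty JSON array of strings")
    return argv


def run_capture(prompt, scenario, variant, env=None):
    """Hypothetical caller: run the runner without a shell, report errors on stderr."""
    try:
        argv = runner_argv_from_env(env)
    except ValueError as exc:
        # Malformed argv is caught before any subprocess is started.
        print(f"error: {exc}", file=sys.stderr)
        return 2
    if argv is None:
        print(f"error: {ENV_VAR} is not set", file=sys.stderr)
        return 2
    try:
        # An argv list with the default shell=False avoids shell interpolation.
        subprocess.run(argv, input=prompt, text=True, check=True)
    except subprocess.CalledProcessError as exc:
        print(
            f"error: runner failed for {scenario}/{variant} (exit {exc.returncode})",
            file=sys.stderr,
        )
        return 1
    return 0
```

Passing a list keeps prompt text and scenario names out of any shell parsing, which is the point of dropping shell=True.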
f69af70 to
3c660f8
Compare
Author
@microsoft-github-policy-service agree company="Microsoft" |
Description
This PR gives Task Researcher a set of named, parallel research subagents ("lanes") plus a single public control to turn them on or off. When enabled, the agent fans out to specialized read-only subagents in parallel and then synthesizes their findings into one consolidated research document, so the user still receives a single .copilot-tracking/research artifact backed by deeper, broader investigation.
The change also tightens the untrusted-content boundary so lane and web findings are always treated as data, adds a deterministic comparison harness that scores research quality with subagents disabled versus enabled, and regenerates collections and plugin outputs so the agent and all five subagents install together.
Named research lanes
Four read-only, chat-only subagents return structured findings for the parent to synthesize; they write no files of their own.
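The fan-out and synthesis flow described above can be pictured with a small Python sketch. It is illustrative only: the real orchestration lives in the agent and prompt markdown files, and run_lane, fan_out, synthesize, and the shape of the findings dictionary are hypothetical. Only the lane names come from this PR.

```python
import asyncio

# Lane names come from this PR; everything else here is a hypothetical sketch.
LOCAL_LANES = ["Codebase Locator", "Codebase Analyzer", "Codebase Pattern Finder"]
WEB_LANE = "Web Search Researcher"


async def run_lane(lane: str, question: str) -> dict:
    """Stand-in for a read-only lane: returns findings in chat, writes no files."""
    await asyncio.sleep(0)  # placeholder for the real subagent call
    return {
        "lane": lane,
        "status": "complete",
        "findings": [f"{lane} notes on: {question}"],
    }


async def fan_out(question: str, needs_web: bool) -> list[dict]:
    """Run the selected lanes concurrently and collect their structured findings."""
    lanes = LOCAL_LANES + ([WEB_LANE] if needs_web else [])
    return await asyncio.gather(*(run_lane(lane, question) for lane in lanes))


def synthesize(question: str, results: list[dict]) -> str:
    """Merge lane findings into one document; findings are copied as data only."""
    lines = [f"# Research: {question}", ""]
    for result in results:
        lines.append(f"## {result['lane']} ({result['status']})")
        lines.extend(f"- {finding}" for finding in result["findings"])
        lines.append("")
    return "\n".join(lines)


results = asyncio.run(fan_out("How are plugins installed?", needs_web=False))
document = synthesize("How are plugins installed?", results)
```

Only the parent produces the final document; each lane returns a value and has no file-writing step of its own, which mirrors the read-only constraint.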
Single public control
The slash command in .github/prompts/hve-core/task-research.prompt.md exposes one knob and keeps the internal lane vocabulary out of the user interface:
- auto (default) lets the agent's Lane Trigger Matrix pick the lightest sufficient path.
- true forces the named lanes to fan out in parallel.
- false keeps research direct unless that makes the request impossible.
The earlier mode={auto|focused|lanes} input was removed, so lane selection is now internal orchestration. The redundant Mode Selection block in task-researcher.agent.md was dropped, leaving the Lane Trigger Matrix as the single selector.
Lane subagent dispatch fix
During validation the lanes were not firing. A model: field on a subagent definition silently dropped it from the dispatchable agent-type registry. Removing the model: arrays from the four lane subagents restored registration and dispatch, which was confirmed end-to-end in a clean local install.
Safety, evals, and packaging
The remaining work hardened the surrounding boundary, evaluation, and packaging surface.
- .copilot-tracking/research/**, so lane and web findings are synthesized as data rather than executed as instructions.
Related Issue(s)
None
Type of Change
Select all that apply:
Code & Documentation:
Infrastructure & Configuration:
AI Artifacts:
- prompt-builder agent and addressed all feedback
- (.github/instructions/*.instructions.md)
- (.github/prompts/*.prompt.md)
- (.github/agents/*.agent.md)
- (.github/skills/*/SKILL.md)
Other:
- (.ps1, .sh, .py)
Sample Prompts (for AI Artifact Contributions)
User Request:
Execution Flow:
- auto selects lane mode and fans out to the Codebase Locator, Codebase Analyzer, and Codebase Pattern Finder in parallel; Web Search Researcher is omitted because the request is local-only.
- .copilot-tracking/research/<date>/.
Output Artifacts:
A single research markdown document, for example:
Success Indicators:
- subagent.started entries for the expected lanes.
Testing
- npm run eval:task-researcher:compare ran the comparison harness tests: 17 passed, 1 skipped (the DeepEval LLM-judge test skips unless DEEPEVAL_RUN_LLM=1).
- npm run lint:md, npm run spell-check, npm run lint:frontmatter, npm run validate:skills, npm run lint:md-links, and npm run lint:ps all passed.
- ruff check on the comparison harness reported no issues, and npm run lint:ai-artifacts reported no issues.
- subagent.started session events for the four named lanes.
- subagents=true and subagents=false in isolated instances and verified fan-out at the event level (see Additional Notes).
Checklist
Required Checks
AI Artifact Contributions
- /prompt-analyze to review contribution
- prompt-builder review
Required Automated Checks
The following validation commands must pass before merging:
- npm run lint:md
- npm run spell-check
- npm run lint:frontmatter
- npm run validate:skills
- npm run lint:md-links
- npm run lint:ps
- npm run plugin:generate
- npm run docs:test
Security Considerations
Additional Notes
- .copilot-tracking/research/... document.
- subagents is the only user-facing knob. The direct, focused, and lane behaviors are internal orchestration driven by the Lane Trigger Matrix.
- with run fanned out the expected local lanes and correctly omitted Web Search Researcher for local-only prompts. The sample shows a stable direction rather than a tight confidence interval, and inline runs remained strong on narrow, single-file questions, which is the split auto is designed to route.
- deepeval and pytest as dev dependencies under scripts/evals/task-researcher-comparison; the DeepEval LLM-judge path is opt-in via DEEPEVAL_RUN_LLM=1.
- Nothing under docs/ changed in this PR, so npm run docs:test was not run.
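As a reading aid for the single subagents knob, the three values can be modeled roughly like this. The sketch is hypothetical: the actual selection is done by the Lane Trigger Matrix in the agent file, not by code, and parse_subagents, choose_path, matrix_wants_lanes, and direct_is_possible are names invented here. In particular, falling back to lanes when direct research is impossible is an assumption; the PR only says false keeps research direct unless that makes the request impossible.

```python
from enum import Enum


class Subagents(Enum):
    """The three user-facing values named in this PR."""

    AUTO = "auto"
    TRUE = "true"
    FALSE = "false"


def parse_subagents(value=None):
    """Parse the user-facing value; a missing value falls back to auto, the default."""
    if value is None or not value.strip():
        return Subagents.AUTO
    try:
        return Subagents(value.strip().lower())
    except ValueError:
        raise ValueError(f"subagents must be auto, true, or false; got {value!r}") from None


def choose_path(setting, matrix_wants_lanes, direct_is_possible=True):
    """Return "lanes" or "direct".

    matrix_wants_lanes stands in for the Lane Trigger Matrix decision.
    direct_is_possible stands in for the "unless that makes the request
    impossible" clause; falling back to lanes in that case is an assumption.
    """
    if setting is Subagents.TRUE:
        return "lanes"
    if setting is Subagents.FALSE:
        return "direct" if direct_is_possible else "lanes"
    return "lanes" if matrix_wants_lanes else "direct"
```

Under this reading, a local-only, single-file question with subagents left at its default resolves to the direct path, while subagents=true always fans out.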