feat(compile): runtime prompt rendering via composable bash + awk (supersedes #617)#623
jamesadevine wants to merge 1 commit into
Default behaviour: the agent body is no longer embedded in the compiled pipeline YAML. The compiled Agent job emits a single "Render agent prompt" bash step that, at pipeline runtime, cats the source .md from the workspace, appends extension supplements as labelled heredocs (visible directly in the lock yaml), strips the front matter via awk, and runs a single-pass awk substitution program. Body-only edits to the source .md no longer require recompiling the pipeline. Set inlined-imports: true in front matter to opt out and keep the legacy heredoc-embedded behaviour.

The single-pass awk substitution recognises four token shapes (backslash-escaped $(VAR), parameters expressions, $(VAR), and $[...]) in priority order; replacement values are looked up in awk ENVIRON and inserted verbatim, never re-scanned. This blocks the queue-with-malicious-parameter-value chaining attack without requiring a Node bundle.

Supersedes #617. Restructures the v2 design after observing that gh-aw renders prompts via composable inline shell steps rather than a Node bundle; that approach is far more transparent in the generated YAML and avoids a Node install on the Agent VM.

Also folds in:
- Flattened ado-script.zip layout: top-level gate.js, /tmp/ado-aw-scripts/gate.js runtime path.
- Shared needs_scripts_bundle() trait method on CompilerExtension to dedupe the NodeTool@0 + scripts-zip download once per consuming job.
- inlined-imports field on FrontMatter.
- strip_prefix(" ") fix in the inlined branch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
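A minimal sketch of the single-pass substitution described in this commit message, assuming a POSIX awk. The function name substitute_prompt, the token regexes, and the choice to leave a token untouched when its variable is unset are illustrative assumptions, not the compiled step. Handling for $[...] is left out, so in this sketch such tokens are simply copied through.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustrative name): reads a prompt on stdin and writes
# the substituted prompt to stdout, scanning each line exactly once.
substitute_prompt() {
  awk '
    # Return ENVIRON[key] when set; otherwise keep the token unchanged
    # (assumption: the handling of unset variables is not stated in the PR).
    function lookup(key, tok) { return (key in ENVIRON) ? ENVIRON[key] : tok }
    {
      out = ""; rest = $0
      while (rest != "") {
        n = 1; piece = substr(rest, 1, 1)          # default: copy one character
        if (piece == "\\" && match(substr(rest, 2), /^[$][(][A-Za-z_][A-Za-z0-9_.]*[)]/)) {
          # Escaped token: drop the backslash and emit the token as literal text.
          piece = substr(rest, 2, RLENGTH); n = RLENGTH + 1
        } else if (match(rest, /^[$][{][{] *parameters[.][A-Za-z_][A-Za-z0-9_]* *[}][}]/)) {
          # Parameter expression: look up ADO_AW_PARAM_<UPPER>.
          n = RLENGTH; tok = substr(rest, 1, n); name = tok
          sub(/^[$][{][{] *parameters[.]/, "", name); sub(/ *[}][}]$/, "", name)
          piece = lookup("ADO_AW_PARAM_" toupper(name), tok)
        } else if (match(rest, /^[$][(][A-Za-z_][A-Za-z0-9_.]*[)]/)) {
          # Variable macro: upper-case the name and map dots to underscores.
          n = RLENGTH; tok = substr(rest, 1, n); name = substr(tok, 3, n - 3)
          gsub(/[.]/, "_", name)
          piece = lookup(toupper(name), tok)
        }
        out = out piece              # the replacement is appended, never re-scanned
        rest = substr(rest, n + 1)   # resume scanning after the consumed token
      }
      print out
    }
  '
}
```

With ADO_AW_PARAM_TARGET set to the string $(System.AccessToken), the input ${{ parameters.target }} renders as the literal text $(System.AccessToken): the value goes straight to the output and scanning resumes after the original token, so the inserted text is never matched as a token itself.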
Closing per author request: the v3 design dropped the
For future reference, what v3 does:
What v3 does NOT do (and what motivated this PR being closed):
Future work should reintroduce the token mechanism (likely as an awk/sed pass or a small dedicated bundle, depending on how complex the resolution semantics need to be: nested imports, cycle detection, etc.).

🔍 Rust PR Review
Summary: Looks good overall. The bash+awk design is sound and the security property (single-pass ENVIRON substitution) is well-preserved. A few small issues worth fixing before merge.
Findings
🐛 Bugs / Logic Issues
Summary
Supersedes #617. Same goal (body-only edits to the agent .md should not require recompiling the pipeline) but a materially different mechanism. After inspecting how gh-aw renders prompts in its generated lock yamls (.github/workflows/*.lock.yml), it was clear that their approach is significantly more transparent and avoids the heavyweight Node bundle. This PR rebuilds the runtime-rendering path along the gh-aw lines.

What the compiled Agent job now emits (default, inlined-imports: false):

Everything is visible in the lock yaml: which body file is loaded, which supplements are appended (and their contents), how front matter is stripped, and how substitution works.
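The compose and strip stages can be sketched roughly as follows. This is not the emitted step: the function names, the heredoc label, and the supplement text are made up for illustration, and the front matter is assumed to be a block delimited by --- lines at the very top of the source .md.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; the real step is emitted inline by the compiler.

# Drop a leading front matter block delimited by "---" lines and pass
# everything else through, including any later "---" lines in the body.
strip_front_matter() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }   # opening delimiter on line 1
    in_fm && $0 == "---"   { in_fm = 0; next }   # closing delimiter
    !in_fm                 { print }
  '
}

# Compose the prompt: source body first, then each supplement as a labelled
# heredoc, then strip the front matter from the assembled text.
render_prompt() {
  local source_md="$1" out="$2"
  {
    cat "$source_md"
    # Illustrative supplement; the label and contents are assumptions.
    cat <<'ADO_AW_SUPPLEMENT_EXAMPLE_EOF'

## Safe outputs
Report results through the safe-output tools.
ADO_AW_SUPPLEMENT_EXAMPLE_EOF
  } | strip_front_matter > "$out"
}
```

Because the supplement sits in a quoted heredoc, the shell writes its text out verbatim with no expansion, so a reader of the YAML sees exactly what gets appended. A substitution pass would then run over the file that render_prompt writes.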
Why v3 supersedes v2 (#617)
PR #617 shipped a prompt.js ncc bundle driven by a base64-encoded PromptSpec. It worked, but:

- v2 (#617): node /tmp/ado-aw-scripts/prompt.js + 4KB base64 spec
- v2 (#617): prompt_supplement() → goes into PromptSpec.supplements JSON array
- v3 (this PR): prompt_supplement() → embeds as a labelled heredoc directly in YAML

The gh-aw lock yamls in this repo (e.g. .github/workflows/rust-pr-reviewer.lock.yml lines 160-220) use exactly this composable-bash shape. After looking at one carefully, v2's "neat symmetric IR" turned out to be solving a problem nobody had.

Single-pass substitution (security property preserved)
The awk substitution program walks the assembled prompt once, matching at each position any of:
- \$(VAR) → $(VAR) literal.
- ${{ parameters.NAME }} → ENVIRON["ADO_AW_PARAM_<UPPER>"]
- $(VAR) / $(VAR.SUB) → ENVIRON["<UPPER>"] (dot→underscore)
- $[ ... ]

The walker uses substring slicing rather than gsub() so replacement text is never re-scanned. This blocks the same chaining attack flagged in #395's bot review: if a caller queues with target = "$(System.AccessToken)", the substituted value lands in the rendered prompt as the literal string $(System.AccessToken), not the access token. Unit and integration tests assert this property directly.

Scope
- Runtime prompt rendering via composable bash + awk.
- Single-pass awk substitution (blocks the chaining attack).
- inlined-imports: true escape hatch (gh-aw-compatible field name).
- Flattened ado-script.zip layout: top-level gate.js, /tmp/ado-aw-scripts/gate.js.
- Shared needs_scripts_bundle() trait method on CompilerExtension (gate consumes; future bundles join the same dedupe).
- strip_prefix(" ") fix in the inlined heredoc branch (preserves author-supplied leading whitespace).
- No prompt.js bundle. No PromptSpec IR. No ADO_AW_PROMPT_SPEC env. No ExportPromptSchema CLI. No types-prompt.gen.ts.

Files
Rust
- src/compile/common.rs: collect_prompt_supplements, generate_prepare_agent_prompt (both branches), PromptSupplement struct (local, not in an IR file), supplement_delimiter. Templates' {{ agent_content }} → {{ prepare_agent_prompt }}.
- src/compile/extensions/mod.rs: needs_scripts_bundle() trait method, node_tool_step / scripts_download_step / scripts_install_steps_if_needed shared helpers.
- src/compile/extensions/trigger_filters.rs: refactored to declare needs_scripts_bundle().
- src/compile/filter_ir.rs: gate path constant updated.
- src/compile/types.rs: inlined-imports: bool field on FrontMatter.
- src/data/{base,1es-base,job-base,stage-base}.yml: {{ agent_content }} → {{ prepare_agent_prompt }}.

Tests
- tests/fixtures/runtime-prompt-default-agent.md (new)
- tests/fixtures/runtime-prompt-inlined-agent.md (new)
- tests/compiler_tests.rs: 4 new integration tests asserting body absence, awk substitution presence, supplement heredoc structure, and inlined-mode heredoc.
- src/compile/common.rs (test module): 8 unit tests for generate_prepare_agent_prompt covering both branches, parameter env mappings, supplement embedding, the \$(...) escape, and the chaining-attack regression.
- src/compile/types.rs (test module): 4 round-trip tests for the inlined-imports field.

CI / release
- .github/workflows/release.yml: flatten zip layout (top-level gate.js).
- .github/workflows/ado-script.yml: unchanged from origin/main (no types-prompt.gen.ts drift check needed).

Docs
- docs/ado-script.md: reworded intro to clarify prompt rendering is NOT a bundle; pointer to template-markers.md and front-matter.md.
- docs/front-matter.md: inlined-imports field documented; explanation describes the bash+awk shape.
- docs/template-markers.md: {{ agent_content }} → {{ prepare_agent_prompt }} with full description of the compose + strip + substitute pipeline.
- docs/extending.md: needs_scripts_bundle() trait method; supplement delivery via labelled heredocs.
- AGENTS.md: source-tree overview updated (no prompt_ir.rs, no prompt.js).

Test plan
- cargo build ✓
- cargo test: 1602 lib + 96 compiler_tests + sundry, 0 failures ✓
- cargo clippy --all-targets --all-features: no new lints (baseline pre-existed) ✓
- tests/fixtures/minimal-agent.md: cat "$(Build.SourcesDirectory)/m.md", includes the SafeOutputs supplement heredoc, awk strip program, and awk substitution program. Body line is absent from YAML. No NodeTool@0, no scripts.zip download for prompt rendering, no JS bundle.
- inlined-imports: true. YAML contains the body verbatim inside an AGENT_PROMPT_EOF heredoc; supplements emitted as separate cat >> steps via the unchanged wrap_prompt_append.

The awk substitution program is structurally identical to a battle-tested pattern (single-pass walker with substring slicing). I do not have a Linux awk locally to do an end-to-end runtime verification, but cargo test covers the YAML emission shape and the unit tests cover the single-pass property by construction (replacement text is fed through ENVIRON, never re-injected into the regex match).

Migration
Default behaviour changes. Existing compiled .lock.yml files will fail ado-aw check after this lands until consumers recompile. inlined-imports: true is the documented escape hatch for users who can't recompile immediately, need a fully self-contained YAML, or are compiling outside the trigger repo.