Skip to content

Feature: Include pre-rendered prompt template in artifact output for differential analysis #30450

@davidslater

Description

@davidslater

Feature: Include pre-rendered prompt template in artifact output for differential analysis

Context

Related to #29283

Problem

Currently, the gh-aw artifact output only includes the final rendered prompt.txt — the fully interpolated prompt that was sent to the model. The threat detection system (gh-aw-threat-detection) must analyze this single file to determine whether content is:

  1. Trusted — authored by the workflow developer in their prompt template
  2. Untrusted — interpolated at runtime from repo files ({{#runtime-import}}), PR metadata (${{ github.event.* }}), or other dynamic sources

Without the original template, the detector cannot distinguish between a <system> block that the workflow author intentionally wrote and one that was injected by untrusted content. This leads to:

  • Heuristic-based detection that is inherently fragile and prone to false positives/negatives
  • Inability to precisely identify injection boundaries — we can't tell where trusted template ends and untrusted interpolation begins
  • No way to verify rendering correctness — if the rendering pipeline has a bug (like the `$`` replacement issue), we can't detect it without the template for comparison

Current State

The artifact directory structure is:

artifacts/
├── aw-prompts/
│   └── prompt.txt              # Final rendered prompt (template + interpolated content)
├── agent_output.json
└── aw-*.patch / aw-*.bundle

gh-aw-threat-detection reads prompt.txt from aw-prompts/prompt.txt and must use pattern-matching heuristics to identify threats.

Target State

artifacts/
├── aw-prompts/
│   ├── prompt-template.txt     # Raw template with placeholders intact (NEW)
│   └── prompt.txt              # Final rendered prompt
├── agent_output.json
└── aw-*.patch / aw-*.bundle

The prompt-template.txt file should contain the exact template source before any interpolation occurs, preserving:

  • {{#runtime-import src="..."}} blocks as-is
  • ${{ ... }} expression placeholders as-is (or their pre-expansion form)
  • All static content (system blocks, instructions, etc.) exactly as the workflow author wrote them

Approach

In gh-aw (prompt rendering pipeline):

  1. Before interpolation begins, write the raw template content to prompt-template.txt in the aw-prompts/ artifact directory.
  2. After interpolation completes, write the rendered content to prompt.txt (existing behavior).
  3. Both files must be written atomically — if the rendered prompt is produced, the template must also be present.

Implementation location:

Wherever the prompt rendering pipeline:

  • Reads the template from .github/prompts/*.prompt or equivalent
  • Performs {{#runtime-import}} expansion
  • Performs ${{ }} expression substitution
  • Writes the final prompt.txt

Add a step immediately before expansion that copies the template verbatim to prompt-template.txt.

Content requirements for prompt-template.txt:

  • Must include the full assembled template after all template files are concatenated/composed but before any runtime content is substituted
  • Placeholder syntax must be preserved exactly (e.g., {{#runtime-import src="path/to/file"}})
  • If multiple template sources are combined (e.g., base + custom), the combined result before interpolation should be captured
  • File encoding: UTF-8, matching prompt.txt

Downstream Consumer Changes (in gh-aw-threat-detection):

Once this artifact is available, gh-aw-threat-detection will be updated to:

  1. Load prompt-template.txt alongside prompt.txt
  2. Parse placeholder locations in the template to identify interpolation boundaries
  3. Diff template vs. rendered to extract only the untrusted interpolated segments
  4. Analyze only the interpolated segments for prompt injection — eliminating false positives from trusted template content
  5. Detect structural corruption (e.g., <system> blocks in rendered that don't exist in template) definitively rather than heuristically

Testing

  1. Unit test: Verify that prompt-template.txt is written with placeholders intact when a prompt template contains {{#runtime-import}} blocks.
  2. Unit test: Verify that prompt-template.txt content exactly matches the template source before interpolation.
  3. Integration test: Run a full agent workflow with runtime imports and verify both files are present in artifacts with correct content.
  4. Backward compatibility: Ensure existing consumers of prompt.txt are not affected.
  5. Edge cases:
    • Template with no placeholders → prompt-template.txt equals prompt.txt
    • Template with multiple {{#runtime-import}} blocks → all preserved
    • Template composed from multiple files → combined pre-interpolation output captured

Security Considerations

  • prompt-template.txt contains only trusted content (authored by the workflow developer), so it has no injection risk itself.
  • The template may reveal internal prompt engineering strategies — this is acceptable since artifacts are already accessible to the repository owner.
  • The template file must not be modifiable by the target repository being analyzed (it comes from the workflow definition, not from the repo's working tree).

Metadata

Metadata

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions