
Activation/agent artifact-name collision when multiple reusable gh-aw workflows are dispatched in parallel from one orchestrator #30338

@theletterf


Summary

When an orchestrator workflow dispatches multiple gh-aw reusable workflows in parallel via workflow_call, the compiler-generated artifact prefix is identical across all of them — both the <prefix>-activation.zip and <prefix>-agent.zip artifacts collide. As a result, every parallel agent ends up loading the same prompt (whichever activation upload won the race), regardless of which lock.yml it was compiled from.

This silently corrupts every fan-out pattern: each downstream agent runs the wrong prompt, calls the wrong skills, and produces output for the wrong sweep — but the workflow run looks "successful" because each sweep still creates an issue with the correct labels (safe-outputs config is per-workflow and unaffected).

Reproduction

Orchestrator (in elastic/docs-content's .github/workflows/docs-quality-sweep.yml) dispatches seven gh-aw reusable workflows in parallel from a single workflow_dispatch:

jobs:
  frontmatter:
    uses: elastic/docs-actions/.github/workflows/gh-aw-docs-frontmatter-sweep.lock.yml@v1
  applies-to:
    uses: elastic/docs-actions/.github/workflows/gh-aw-docs-applies-to-sweep.lock.yml@v1
  openings:
    uses: elastic/docs-actions/.github/workflows/gh-aw-docs-openings-sweep.lock.yml@v1
  style:
    uses: elastic/docs-actions/.github/workflows/gh-aw-docs-style-sweep.lock.yml@v1
  # ...etc, seven sweeps total

Each lock.yml contains a different prompt (different skill imports, different skill(skill: …) invocations, different categories, different "Generated by" line).

Real failing run on gh-aw v0.71.1: https://github.com/elastic/docs-content/actions/runs/25369724670 (resulting issues: elastic/docs-content#6266 / #6267 / #6268 / #6269).

Evidence

1. Artifact upload-name collision

$ gh api repos/.../actions/jobs/<job-id>/logs | grep "Uploading artifact"

style       activation: Uploading artifact: cfbe6cb9-activation.zip
openings    activation: Uploading artifact: cfbe6cb9-activation.zip
applies-to  activation: Uploading artifact: cfbe6cb9-activation.zip
frontmatter activation: Uploading artifact: cfbe6cb9-activation.zip

style       agent: Uploading artifact: cfbe6cb9-agent.zip
openings    agent: Uploading artifact: cfbe6cb9-agent.zip
applies-to  agent: Uploading artifact: cfbe6cb9-agent.zip
frontmatter agent: Uploading artifact: cfbe6cb9-agent.zip

The prefix cfbe6cb9 is identical across all four parallel reusable workflows. It does not appear as a literal in any lock.yml.
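A collision like this can be detected mechanically. The following is a minimal sketch, assuming log lines in the summarised `<job> <step>: Uploading artifact: <name>` shape shown above (raw `gh api` job logs are formatted differently and would need their own pattern). The `find_collisions` helper is hypothetical and not part of gh-aw.

```python
import re
from collections import defaultdict

# Excerpt copied from the evidence above, in the summarised
# "<job> <step>: Uploading artifact: <name>" shape (an assumption about layout).
LOG = """\
style       activation: Uploading artifact: cfbe6cb9-activation.zip
openings    activation: Uploading artifact: cfbe6cb9-activation.zip
applies-to  activation: Uploading artifact: cfbe6cb9-activation.zip
frontmatter activation: Uploading artifact: cfbe6cb9-activation.zip
"""

LINE = re.compile(r"^(?P<job>\S+)\s+\S+: Uploading artifact: (?P<name>\S+)$")


def find_collisions(log: str) -> dict[str, list[str]]:
    """Map each artifact name to its uploading jobs, keeping names with more than one uploader."""
    uploaders = defaultdict(list)
    for line in log.splitlines():
        match = LINE.match(line.strip())
        if match:
            uploaders[match["name"]].append(match["job"])
    return {name: jobs for name, jobs in uploaders.items() if len(jobs) > 1}


collisions = find_collisions(LOG)
for name, jobs in collisions.items():
    print(f"{name} uploaded by {len(jobs)} jobs: {', '.join(jobs)}")
```

Any artifact name reported here was written by more than one job in the same run, which is the precondition for the overwrite race described in this issue.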

2. All agents downloaded the same artifact ID + size

style       agent downloads: cfbe6cb9-activation (ID: 6803476521, Size: 5835)
openings    agent downloads: cfbe6cb9-activation (ID: 6803476521, Size: 5835)
applies-to  agent downloads: cfbe6cb9-activation (ID: 6803476521, Size: 5835)
frontmatter agent downloads: cfbe6cb9-activation (ID: 6803476521, Size: 5835)

Same artifact ID 6803476521. Single source of truth, four consumers.

3. All agents loaded an identical-size prompt file

style       agent prompt: size=10851B
openings    agent prompt: size=10851B
applies-to  agent prompt: size=10851B
frontmatter agent prompt: size=10851B

But the lock.yml prompts have different sizes:

style-sweep.lock.yml      prompt heredocs total =  8432B
openings-sweep.lock.yml   prompt heredocs total = 10048B
applies-to-sweep.lock.yml prompt heredocs total =  7704B
frontmatter-sweep.lock.yml prompt heredocs total =  9979B

So the runtime prompt did not come from each agent's own lock.yml — it came from one shared activation artifact. After leading-whitespace stripping and assembly, the frontmatter activation produced the 10851-byte file that everyone downloaded.

4. Skills called confirm the wrong prompt

Every agent's log shows only:

skill(skill: docs-frontmatter-audit)
skill(skill: docs-frontmatter-description)

These are the literals from the frontmatter sweep's prompt. The style sweep should call skill(skill: docs-check-style); the openings sweep should call skill(skill: docs-page-opening-optimizer); the applies-to sweep should call skill(skill: docs-applies-to-tagging). None of those literals appear in any of the four agents' logs.

The openings agent additionally tried skill(skill: docs-frontmatter-description) and got Skill not found: docs-frontmatter-description — the openings lock.yml never imports that skill, so the APM bundle didn't install it; but the prompt the agent received told it to call that skill anyway.

5. Three issue bodies are byte-identical except for gh-aw footer

diff <(gh issue view 6266 --json body | jq -r .body) <(gh issue view 6267 --json body | jq -r .body)
# only differences: the post-agent gh-aw footer "Generated by [Docs <name> sweep agent]"
# and the <!-- gh-aw-workflow-id --> HTML comment
# the actual findings list is character-identical
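The same comparison can be scripted so that footer-only differences are ignored. This is a rough sketch: the two footer markers are taken from the diff comments above and are assumptions about the exact text gh-aw emits, and the sample bodies are invented for illustration rather than copied from the real issues.

```python
import difflib

# Assumed footer markers (from the diff comments above); the real footer may differ.
FOOTER_MARKERS = ("Generated by", "<!-- gh-aw-workflow-id")


def strip_footer(body: str) -> list[str]:
    """Drop every line containing a footer marker; keep the rest in order."""
    return [
        line
        for line in body.splitlines()
        if not any(marker in line for marker in FOOTER_MARKERS)
    ]


def findings_diff(body_a: str, body_b: str) -> list[str]:
    """Unified diff of two issue bodies with footer lines removed (empty if identical)."""
    return list(
        difflib.unified_diff(strip_footer(body_a), strip_footer(body_b), lineterm="")
    )


# Hypothetical issue bodies: same findings, different per-workflow footer.
body_style = (
    "- page-a.md: missing description\n"
    "> Generated by [Docs style sweep agent]\n"
    "<!-- gh-aw-workflow-id: style -->"
)
body_openings = (
    "- page-a.md: missing description\n"
    "> Generated by [Docs openings sweep agent]\n"
    "<!-- gh-aw-workflow-id: openings -->"
)

print("findings identical:", findings_diff(body_style, body_openings) == [])
```

A findings line that itself contains one of the marker strings would also be dropped, so the filter is only as precise as the markers chosen.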

Root cause hypothesis

The artifact prefix cfbe6cb9 looks like an 8-character hash. It's computed at the compiled workflow level, presumably as a digest of inputs that are the same for every reusable workflow_call from the same orchestrator run — e.g. sha(github.workflow + github.run_id)[0:8]. Both of those values are identical for every reusable invocation, since github.workflow resolves to the caller's name in workflow_call context.

The artifact prefix needs to also include the called workflow's identity — the source workflow file path, the lock.yml SHA, or the reusable's job-id (gh-aw-docs-style-sweep etc.). Anything that varies per reusable workflow.

Suggested fix

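Mix the source workflow's lock-file digest, or another identifier that is unique per called workflow, into the artifact-prefix hash. github.workflow_ref is a candidate, but it should be verified first: GitHub documents the github context as belonging to the caller in workflow_call runs, so it may resolve to the orchestrator just as github.workflow does. With that change, parallel reusable workflows in one orchestrator run get distinct artifact names and no longer cross-contaminate.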
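To make the hypothesis and the fix concrete, here is a small sketch. It is not gh-aw's actual derivation, which this report has not confirmed: the `prefix` helper, the use of SHA-256, and the caller name "Docs quality sweep" are all assumptions for illustration. Only the run ID and the lock-file paths come from the report.

```python
import hashlib


def prefix(*parts: str) -> str:
    """First 8 hex characters of SHA-256 over the NUL-joined parts (hypothetical scheme)."""
    return hashlib.sha256("\0".join(parts).encode()).hexdigest()[:8]


caller_workflow = "Docs quality sweep"  # hypothetical caller `name:`
run_id = "25369724670"                  # from the failing run URL above

called_workflows = [
    "elastic/docs-actions/.github/workflows/gh-aw-docs-frontmatter-sweep.lock.yml@v1",
    "elastic/docs-actions/.github/workflows/gh-aw-docs-applies-to-sweep.lock.yml@v1",
    "elastic/docs-actions/.github/workflows/gh-aw-docs-openings-sweep.lock.yml@v1",
    "elastic/docs-actions/.github/workflows/gh-aw-docs-style-sweep.lock.yml@v1",
]

# Hypothesised current behaviour: only caller-level inputs are hashed,
# so every called workflow derives the same prefix.
current = {wf: prefix(caller_workflow, run_id) for wf in called_workflows}

# Suggested behaviour: the called workflow's identity is mixed in as well.
fixed = {wf: prefix(caller_workflow, run_id, wf) for wf in called_workflows}

print("distinct prefixes today:   ", len(set(current.values())))
print("distinct prefixes with fix:", len(set(fixed.values())))
```

Under the hypothesised scheme all four sweeps share one prefix, while adding any per-workflow input (path, lock-file digest, or job ID) separates them.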

Why this matters

This makes the orchestrator-fanout pattern silently broken for any non-trivial agent prompts. Every dispatch produces N issues that all reflect the same prompt — the surface is multiple workflows, but the substance is a single one. The failure mode is invisible at the GitHub Actions UI level (every job is green; safe-outputs are correctly labeled per workflow) so users won't notice without inspecting issue bodies side by side.

Workarounds

For consumers, the only workarounds I've found:

  1. Don't fan out from a single orchestrator — install each reusable workflow as its own consumer-side workflow file with its own name:. Each then has a distinct github.workflow, distinct prefix hash, distinct artifacts. Loses the "one place to dispatch all sweeps" UX.
  2. Run sweeps sequentially — make each sub-job needs: [previous] so artifact uploads don't overlap in time. The overwrite race goes away but at the cost of parallelism.
  3. Stagger dispatches by hand: gh workflow run … && sleep 60 && gh workflow run …. Same trade-off as (2).

None of these is clean. A fix in gh-aw's prefix derivation would be far preferable.

Happy to provide more reproduction artifacts (full job logs, full lock.yml dumps, side-by-side issue body diffs) if useful.
