Skip to content

fix(flue-review): unstick the reviewer on @flue 1.0 (sandbox AbortSignal hang)#1520

Merged
ascorbic merged 3 commits into
mainfrom
fix/flue-review-sandbox-hang
Jun 17, 2026
Merged

fix(flue-review): unstick the reviewer on @flue 1.0 (sandbox AbortSignal hang)#1520
ascorbic merged 3 commits into
mainfrom
fix/flue-review-sandbox-hang

Conversation

@ascorbic

@ascorbic ascorbic commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

What does this PR do?

Fixes the automated PR reviewer (infra/flue-review), which hung on every run after the @flue 1.0 upgrade.

Root cause: @flue's cloudflareSandbox adapter forwards an AbortSignal to getSandbox(...).exec(), which is a Worker → Durable Object RPC call. An AbortSignal created outside the target DO can't cross that boundary, so the call never dispatches the command and hangs forever. @flue attaches a signal to every shell call (via createCallHandle) and to the agent's own bash/grep tools, so the workflow died right after the workspace scan, before the git checkout could run. Verified: an identical exec with no signal returns in ~50ms; with a live signal it never returns — across @cloudflare/sandbox 0.10.3/0.12.1 and both the http and rpc transports (so it is not a version/transport issue). Standalone repro: https://github.com/ascorbic/flue-sandbox-repro

This PR:

  • Signal-strip sandbox factory — wraps cloudflareSandbox(...) in a SandboxFactory that drops the AbortSignal before exec crosses the DO RPC boundary. We don't need cooperative exec cancellation; withCapacityRetry still bounds the model call and the container has its own lifecycle. This also covers the agent's own tool calls (same path).
  • AGENTS.md discovery — checks out the PR before init() (via the raw sandbox stub, no signal) so @flue's init-time workspace scan discovers the repo's AGENTS.md and .agents/skills/. Previously the checkout ran after init(), so the scan saw an empty /workspace and the review skill's "check against AGENTS.md conventions" had no AGENTS.md.
  • Deps/transportagents 0.13 → 0.14 (required by @flue 1.0), @cloudflare/sandbox 0.10.3 → 0.12.1 (+ matching image), and SANDBOX_TRANSPORT: "rpc" (the http/websocket transports are deprecated, removed after 2026-07-09).
  • FlueRegistry reset migration — upgrading @flue in place left the run-index DO's flue_registry_runs table with a pre-1.0 owner_kind NOT NULL column that 1.0 no longer writes, so every run-index write failed the constraint (non-fatal, but log spam + broken flue logs). Cloudflare won't delete-class a bound DO, so this is applied as a two-step deploy: deploy once with the FLUE_REGISTRY binding removed + deleted_classes, then a normal deploy recreates it fresh + rebinds.

Closes: n/a (internal infra, no tracking issue).

Type of change

  • Bug fix
  • Feature (requires maintainer-approved Discussion)
  • Refactor (no behavior change)
  • Translation
  • Documentation
  • Performance improvement
  • Tests
  • Chore (dependencies, CI, tooling)

Checklist

  • I have read CONTRIBUTING.md
  • pnpm typecheck passes (tsc --noEmit in infra/flue-review)
  • pnpm lint passes (oxlint, 0 warnings/errors on the changed files)
  • pnpm test passes — n/a: flue-review has no test suite. Verified end-to-end instead: triggered reviews against a local flue dev and the deployed Worker; the agent now checks out the PR, discovers AGENTS.md, and traces the diff with git/rg without hanging.
  • pnpm format — n/a: oxfmt targets packages//demos/, not infra/. Files use tabs, consistent with the existing sources.
  • i18n — n/a: no admin UI strings.
  • Changeset — n/a: @emdash-cms/flue-review is private/unpublished.
  • Discussion — n/a: bug fix.

AI-generated code disclosure

  • This PR includes AI-generated code — model/tool: Claude Opus 4.8 (OpenCode)

Screenshots / test output

Deployed and verified on the live Worker (emdash-flue-review): the registry-reset two-step deploy applied cleanly (v3-reset-registry-dropv4-reset-registry-new, cursor at v4), the Worker is healthy (401 on unsigned webhook), and a triggered review checks out the repo and runs git diff / rg against the checkout with no hang.


Try this PR

Open a fresh playground →

A full working EmDash site, deployed from this branch. Each visit gets its own session-scoped sandbox: no login needed and no shared state. Try the admin, edit content, hit the public site.

Tracks fix/flue-review-sandbox-hang. Updated automatically when the playground redeploys.

The reviewer hung on every run after the @flue 1.0 upgrade. Root cause:
@flue's cloudflareSandbox adapter forwards an AbortSignal to
getSandbox(...).exec(), a Worker->Durable Object RPC call the signal can't
cross, so every container command (our git checkout and the agent's own
bash/grep tools) hung forever.

- Wrap cloudflareSandbox in a SandboxFactory that strips the AbortSignal
  before exec crosses the DO RPC boundary (we don't need exec cancellation;
  withCapacityRetry still bounds the model call).
- Check out the PR before init() via the raw sandbox stub so @flue's
  init-time workspace scan discovers the repo's AGENTS.md and .agents/skills.
- Bump agents 0.13->0.14 and @cloudflare/sandbox 0.10->0.12 (+image), and set
  SANDBOX_TRANSPORT=rpc (the http/websocket transports are deprecated).
- Reset the FlueRegistry run-index DO: upgrading @flue in place left its
  flue_registry_runs table with a pre-1.0 owner_kind NOT NULL column that
  1.0 no longer writes, breaking every run-index write. Applied via a
  two-step deploy (drop with the binding removed, then recreate + rebind).
@changeset-bot

changeset-bot Bot commented Jun 17, 2026

Copy link
Copy Markdown

⚠️ No Changeset found

Latest commit: 69e97ac

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@github-actions

Copy link
Copy Markdown
Contributor

Scope check

This PR changes 865 lines across 8 files. Large PRs are harder to review and more likely to be closed without review.

If this scope is intentional, no action needed. A maintainer will review it. If not, please consider splitting this into smaller PRs.

See CONTRIBUTING.md for contribution guidelines.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 17, 2026

Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
✅ Deployment successful!
View logs
docs 69e97ac Jun 17 2026, 12:55 PM

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 17, 2026

Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
✅ Deployment successful!
View logs
emdash-playground 69e97ac Jun 17 2026, 12:55 PM

@github-actions github-actions Bot added the review/needs-review No maintainer or bot review yet label Jun 17, 2026
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 17, 2026

Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
✅ Deployment successful!
View logs
emdash-demo-cache 69e97ac Jun 17 2026, 12:55 PM

@pkg-pr-new

pkg-pr-new Bot commented Jun 17, 2026

Copy link
Copy Markdown

Open in StackBlitz

@emdash-cms/admin

npm i https://pkg.pr.new/@emdash-cms/admin@1520

@emdash-cms/auth

npm i https://pkg.pr.new/@emdash-cms/auth@1520

@emdash-cms/auth-atproto

npm i https://pkg.pr.new/@emdash-cms/auth-atproto@1520

@emdash-cms/blocks

npm i https://pkg.pr.new/@emdash-cms/blocks@1520

@emdash-cms/cloudflare

npm i https://pkg.pr.new/@emdash-cms/cloudflare@1520

@emdash-cms/contentful-to-portable-text

npm i https://pkg.pr.new/@emdash-cms/contentful-to-portable-text@1520

emdash

npm i https://pkg.pr.new/emdash@1520

create-emdash

npm i https://pkg.pr.new/create-emdash@1520

@emdash-cms/gutenberg-to-portable-text

npm i https://pkg.pr.new/@emdash-cms/gutenberg-to-portable-text@1520

@emdash-cms/plugin-cli

npm i https://pkg.pr.new/@emdash-cms/plugin-cli@1520

@emdash-cms/plugin-types

npm i https://pkg.pr.new/@emdash-cms/plugin-types@1520

@emdash-cms/registry-client

npm i https://pkg.pr.new/@emdash-cms/registry-client@1520

@emdash-cms/registry-lexicons

npm i https://pkg.pr.new/@emdash-cms/registry-lexicons@1520

@emdash-cms/sandbox-workerd

npm i https://pkg.pr.new/@emdash-cms/sandbox-workerd@1520

@emdash-cms/x402

npm i https://pkg.pr.new/@emdash-cms/x402@1520

@emdash-cms/plugin-ai-moderation

npm i https://pkg.pr.new/@emdash-cms/plugin-ai-moderation@1520

@emdash-cms/plugin-atproto

npm i https://pkg.pr.new/@emdash-cms/plugin-atproto@1520

@emdash-cms/plugin-audit-log

npm i https://pkg.pr.new/@emdash-cms/plugin-audit-log@1520

@emdash-cms/plugin-color

npm i https://pkg.pr.new/@emdash-cms/plugin-color@1520

@emdash-cms/plugin-embeds

npm i https://pkg.pr.new/@emdash-cms/plugin-embeds@1520

@emdash-cms/plugin-field-kit

npm i https://pkg.pr.new/@emdash-cms/plugin-field-kit@1520

@emdash-cms/plugin-forms

npm i https://pkg.pr.new/@emdash-cms/plugin-forms@1520

@emdash-cms/plugin-webhook-notifier

npm i https://pkg.pr.new/@emdash-cms/plugin-webhook-notifier@1520

commit: 69e97ac

A thorough agentic review makes many tool calls and turns and legitimately
runs well over 6 minutes (the per-file stat round-trips through the container
RPC alone add up), so the 6m per-attempt timeout killed real reviews mid-flight.
It was a guard against the now-fixed sandbox hang, not a budget for the review.
Flue's submission durability still caps the whole run at 1h.
@ascorbic ascorbic added the bot:review Trigger an emdashbot code review on this PR label Jun 17, 2026

@emdashbot emdashbot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach

This is the right change for the right problem. The root-cause analysis (an AbortSignal created in the Worker can't cross the Sandbox DO RPC boundary, so @flue's always-attached signal hangs every exec) is credible and well-documented, and the three-part fix matches it: strip the signal in a SandboxFactory wrapper, move the git checkout ahead of init() so Flue's workspace scan actually finds AGENTS.md/.agents/skills/, and reset the FlueRegistry DO whose pre-1.0 schema no longer accepts 1.0 writes. The signal-strip also correctly covers the agent's own shell tool calls (same exec path), and the model-call timeout in withCapacityRetry is left intact (it bounds the inference call, which is a Worker→Workers-AI call, not a DO RPC). For an internal, unpublished infra tool this is sound and fits the codebase. Note the PR also bundles a model swap (kimi → glm-5.2) and a 6m→20m per-attempt timeout bump beyond the stated "hang fix"; that's reasonable latitude for the maintainer's own tool, just flagging it's wider than the title.

What I checked

  • review.ts: traced the reviewSandbox wrapper, the raw containerStub.exec(setup) pre-init path, the createAgent(({ id, env }) => …) wiring, session.skill("review", …), and the finally reaction cleanup. Security is intact — assertSafe still validates owner/repo (NAME) and baseRef/headRef (REF, rejects ..) before any interpolation into the git script, and prNumber is a positive int, so the git fetch pull/N/head and cloneUrl are injection-safe. The signal: undefined override reaches the SDK via the options bag, consistent with the documented hang mechanism.
  • github.ts: the summary.trim() || FALLBACK_SUMMARY guard correctly prevents the GitHub 422-on-blank-COMMENT-body failure on both the with-comments and body-only paths, and the body-only fallback now folds findings inline (previously dropped). Logic is correct; the first request failing (422) means no review was created, so the retry can't duplicate.
  • wrangler.jsonc: the v3 deleted_classes + v4 new_sqlite_classes reset is append-only history (already applied, cursor at v4 per the description); SANDBOX_TRANSPORT: "rpc" and the @cloudflare/sandbox 0.12.1 image bump line up.
  • Dockerfile: rg install is reasonable given the agent reaches for rg; --no-install-recommends + apt-list cleanup is clean. Files use tabs throughout (no oxfmt target for infra/, consistent with existing sources).
  • Changeset/i18n/tests: correctly n/a — @emdash-cms/flue-review is private/unpublished, no admin UI, no test harness (author verified end-to-end instead).

Verification gaps (couldn't statically confirm, no node_modules)

I could not inspect the @flue/runtime or @cloudflare/sandbox type definitions. I reasoned around this: the author's stated tsc --noEmit pass implies createAgent's factory context exposes id, session.skill accepts a string, Sandbox.exec's return type has exitCode, and cloudflareSandbox(...) returns a SandboxFactory with createSessionEnv — any of those being wrong would be a type error. The one runtime assumption tsc can't cover is that the agent-factory id equals the workflow ctx.id (so setup and the agent share one container); the description's end-to-end claim ("traces the diff against the checkout") only holds if they match, so I'm taking that on the author's verification.

Conclusion

Implementation is solid and the fix is well-targeted. One low-confidence robustness suggestion below on the session-env wrapping pattern; nothing blocking.

Comment on lines +77 to +80
return {
...sessionEnv,
exec: (command, execOptions) => exec(command, { ...execOptions, signal: undefined }),
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[suggestion] The wrapper builds a fresh plain object via { ...sessionEnv, exec } and returns that. Object spread copies only own enumerable properties, so if Flue's SessionEnv is a class instance this drops every prototype method and every private (#) field from the object Flue's runtime (or the agent's tools) subsequently calls against. The exercised path (the agent's shell tools → exec) survives because exec is the one member you re-define, but any other session-env method reached now or by a future Flue version would resolve to undefined or throw on private-field access.

I couldn't confirm this breaks today (no node_modules to inspect the SessionEnv shape, and the e2e verification only exercises exec), so treat this as low-confidence. If you want to remove the latent footgun, shadow exec on the original object so its prototype, private fields, and identity are preserved (or, if the framework returns a frozen instance, wrap it in a Proxy that intercepts only exec):

Suggested change
return {
...sessionEnv,
exec: (command, execOptions) => exec(command, { ...execOptions, signal: undefined }),
};
sessionEnv.exec = (command, execOptions) => exec(command, { ...execOptions, signal: undefined });
return sessionEnv;

@github-actions github-actions Bot added review/awaiting-author Reviewed; waiting on the author to respond and removed review/needs-review No maintainer or bot review yet labels Jun 17, 2026
@ascorbic ascorbic merged commit b9139ee into main Jun 17, 2026
48 of 49 checks passed
@ascorbic ascorbic deleted the fix/flue-review-sandbox-hang branch June 17, 2026 14:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bot:review Trigger an emdashbot code review on this PR review/awaiting-author Reviewed; waiting on the author to respond size/XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant