fix(flue-review): unstick the reviewer on @flue 1.0 (sandbox AbortSignal hang) #1520
The reviewer hung on every run after the @flue 1.0 upgrade. Root cause: @flue's cloudflareSandbox adapter forwards an AbortSignal to getSandbox(...).exec(), a Worker->Durable Object RPC call the signal can't cross, so every container command (our git checkout and the agent's own bash/grep tools) hung forever.

- Wrap cloudflareSandbox in a SandboxFactory that strips the AbortSignal before exec crosses the DO RPC boundary (we don't need exec cancellation; withCapacityRetry still bounds the model call).
- Check out the PR before init() via the raw sandbox stub so @flue's init-time workspace scan discovers the repo's AGENTS.md and .agents/skills.
- Bump agents 0.13->0.14 and @cloudflare/sandbox 0.10->0.12 (+image), and set SANDBOX_TRANSPORT=rpc (the http/websocket transports are deprecated).
- Reset the FlueRegistry run-index DO: upgrading @flue in place left its flue_registry_runs table with a pre-1.0 owner_kind NOT NULL column that 1.0 no longer writes, breaking every run-index write. Applied via a two-step deploy (drop with the binding removed, then recreate + rebind).
Scope check
This PR changes 865 lines across 8 files. Large PRs are harder to review and more likely to be closed without review. If this scope is intentional, no action is needed and a maintainer will review it. If not, please consider splitting this into smaller PRs. See CONTRIBUTING.md for contribution guidelines.
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! | docs | 69e97ac | Jun 17 2026, 12:55 PM |
| ✅ Deployment successful! | emdash-playground | 69e97ac | Jun 17 2026, 12:55 PM |
| ✅ Deployment successful! | emdash-demo-cache | 69e97ac | Jun 17 2026, 12:55 PM |
@emdash-cms/admin
@emdash-cms/auth
@emdash-cms/auth-atproto
@emdash-cms/blocks
@emdash-cms/cloudflare
@emdash-cms/contentful-to-portable-text
emdash
create-emdash
@emdash-cms/gutenberg-to-portable-text
@emdash-cms/plugin-cli
@emdash-cms/plugin-types
@emdash-cms/registry-client
@emdash-cms/registry-lexicons
@emdash-cms/sandbox-workerd
@emdash-cms/x402
@emdash-cms/plugin-ai-moderation
@emdash-cms/plugin-atproto
@emdash-cms/plugin-audit-log
@emdash-cms/plugin-color
@emdash-cms/plugin-embeds
@emdash-cms/plugin-field-kit
@emdash-cms/plugin-forms
@emdash-cms/plugin-webhook-notifier
A thorough agentic review makes many tool calls and turns and legitimately runs well over 6 minutes (the per-file stat round-trips through the container RPC alone add up), so the 6m per-attempt timeout killed real reviews mid-flight. It was a guard against the now-fixed sandbox hang, not a budget for the review. Flue's submission durability still caps the whole run at 1h.
Approach
This is the right change for the right problem. The root-cause analysis (an AbortSignal created in the Worker can't cross the Sandbox DO RPC boundary, so @flue's always-attached signal hangs every exec) is credible and well-documented, and the three-part fix matches it: strip the signal in a SandboxFactory wrapper, move the git checkout ahead of init() so Flue's workspace scan actually finds AGENTS.md/.agents/skills/, and reset the FlueRegistry DO whose pre-1.0 schema no longer accepts 1.0 writes. The signal-strip also correctly covers the agent's own shell tool calls (same exec path), and the model-call timeout in withCapacityRetry is left intact (it bounds the inference call, which is a Worker→Workers-AI call, not a DO RPC). For an internal, unpublished infra tool this is sound and fits the codebase. Note the PR also bundles a model swap (kimi → glm-5.2) and a 6m→20m per-attempt timeout bump beyond the stated "hang fix"; that's reasonable latitude for the maintainer's own tool, just flagging it's wider than the title.
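The signal-strip described above can be sketched in isolation. This is a minimal illustration rather than the PR's actual code: `ExecOptions`, `ExecResult`, `SessionEnv`, and `stripExecSignal` are hypothetical names, and the real `@flue` and `@cloudflare/sandbox` types may look different. A fake session env stands in for the Durable Object so the effect on the options bag is visible.

```typescript
// Hypothetical shapes for illustration; the real @flue and
// @cloudflare/sandbox types were not inspected and may differ.
interface ExecOptions {
	signal?: AbortSignal | undefined;
	cwd?: string | undefined;
}

interface ExecResult {
	exitCode: number;
}

interface SessionEnv {
	exec(command: string, options?: ExecOptions): Promise<ExecResult>;
}

// Return a session env whose exec never forwards an AbortSignal.
// All other options pass through unchanged.
function stripExecSignal(sessionEnv: SessionEnv): SessionEnv {
	const exec = sessionEnv.exec.bind(sessionEnv);
	return {
		...sessionEnv,
		exec: (command, execOptions) => exec(command, { ...execOptions, signal: undefined }),
	};
}

// Fake env that records the options it receives instead of calling a Durable Object.
const received: ExecOptions[] = [];
const fakeEnv: SessionEnv = {
	exec: (_command, options) => {
		received.push(options ?? {});
		return Promise.resolve({ exitCode: 0 });
	},
};

const wrapped = stripExecSignal(fakeEnv);
void wrapped.exec("git status", { signal: new AbortController().signal, cwd: "/workspace" });
```

After the call, the fake has recorded one options bag whose `signal` is `undefined` and whose `cwd` is untouched, which is the property the fix relies on.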
What I checked
- `review.ts`: traced the `reviewSandbox` wrapper, the raw `containerStub.exec(setup)` pre-init path, the `createAgent(({ id, env }) => …)` wiring, `session.skill("review", …)`, and the `finally` reaction cleanup. Security is intact: `assertSafe` still validates `owner`/`repo` (NAME) and `baseRef`/`headRef` (REF, rejects `..`) before any interpolation into the git script, and `prNumber` is a positive int, so the `git fetch pull/N/head` and `cloneUrl` are injection-safe. The `signal: undefined` override reaches the SDK via the options bag, consistent with the documented hang mechanism.
- `github.ts`: the `summary.trim() || FALLBACK_SUMMARY` guard correctly prevents the GitHub 422-on-blank-COMMENT-body failure on both the with-comments and body-only paths, and the body-only fallback now folds findings inline (previously dropped). Logic is correct; the first request failing (422) means no review was created, so the retry can't duplicate.
- `wrangler.jsonc`: the `v3 deleted_classes` + `v4 new_sqlite_classes` reset is append-only history (already applied, cursor at `v4` per the description); `SANDBOX_TRANSPORT: "rpc"` and the `@cloudflare/sandbox` 0.12.1 image bump line up.
- `Dockerfile`: the `rg` install is reasonable given the agent reaches for `rg`; `--no-install-recommends` + apt-list cleanup is clean. Files use tabs throughout (no `oxfmt` target for `infra/`, consistent with existing sources).
- Changeset/i18n/tests: correctly n/a. `@emdash-cms/flue-review` is private/unpublished, no admin UI, no test harness (author verified end-to-end instead).
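The validate-before-interpolate check in the `review.ts` item can be illustrated with a small sketch. The thread does not show the PR's actual patterns, so the `NAME` and `REF` regexes, `assertPrNumber`, and `buildCheckoutScript` below are all assumptions made for illustration; only the general idea (allow-list characters, reject `..`, require a positive integer PR number) comes from the review.

```typescript
// Hypothetical patterns: the PR's actual NAME and REF rules are not shown in this thread.
const NAME = /^[A-Za-z0-9][A-Za-z0-9_.-]*$/;
const REF = /^[A-Za-z0-9][A-Za-z0-9_.\/-]*$/;

// Throw unless the value matches its allow-list; refs additionally reject "..".
function assertSafe(kind: "NAME" | "REF", value: string): string {
	const pattern = kind === "NAME" ? NAME : REF;
	if (!pattern.test(value) || (kind === "REF" && value.includes(".."))) {
		throw new Error(`unsafe ${kind}: ${JSON.stringify(value)}`);
	}
	return value;
}

// PR numbers must be positive integers before they reach the shell.
function assertPrNumber(value: number): number {
	if (!Number.isInteger(value) || value <= 0) {
		throw new Error(`invalid PR number: ${value}`);
	}
	return value;
}

interface CheckoutInput {
	owner: string;
	repo: string;
	baseRef: string;
	prNumber: number;
}

// Hypothetical script builder: every interpolated value is validated first.
function buildCheckoutScript(input: CheckoutInput): string {
	const owner = assertSafe("NAME", input.owner);
	const repo = assertSafe("NAME", input.repo);
	const baseRef = assertSafe("REF", input.baseRef);
	const prNumber = assertPrNumber(input.prNumber);
	const cloneUrl = `https://github.com/${owner}/${repo}.git`;
	return [
		`git clone ${cloneUrl} /workspace`,
		`git -C /workspace fetch origin ${baseRef}`,
		`git -C /workspace fetch origin pull/${prNumber}/head`,
		`git -C /workspace checkout FETCH_HEAD`,
	].join(" && ");
}
```

Requiring an alphanumeric first character also keeps a value from being parsed as a command-line option, which a plain character allow-list would not prevent.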
Verification gaps (couldn't statically confirm, no node_modules)
I could not inspect the @flue/runtime or @cloudflare/sandbox type definitions. I reasoned around this: the author's stated tsc --noEmit pass implies createAgent's factory context exposes id, session.skill accepts a string, Sandbox.exec's return type has exitCode, and cloudflareSandbox(...) returns a SandboxFactory with createSessionEnv — any of those being wrong would be a type error. The one runtime assumption tsc can't cover is that the agent-factory id equals the workflow ctx.id (so setup and the agent share one container); the description's end-to-end claim ("traces the diff against the checkout") only holds if they match, so I'm taking that on the author's verification.
Conclusion
Implementation is solid and the fix is well-targeted. One low-confidence robustness suggestion below on the session-env wrapping pattern; nothing blocking.
    return {
        ...sessionEnv,
        exec: (command, execOptions) => exec(command, { ...execOptions, signal: undefined }),
    };
[suggestion] The wrapper builds a fresh plain object via { ...sessionEnv, exec } and returns that. Object spread copies only own enumerable properties, so if Flue's SessionEnv is a class instance this drops every prototype method and every private (#) field from the object Flue's runtime (or the agent's tools) subsequently calls against. The exercised path (the agent's shell tools → exec) survives because exec is the one member you re-define, but any other session-env method reached now or by a future Flue version would resolve to undefined or throw on private-field access.
I couldn't confirm this breaks today (no node_modules to inspect the SessionEnv shape, and the e2e verification only exercises exec), so treat this as low-confidence. If you want to remove the latent footgun, shadow exec on the original object so its prototype, private fields, and identity are preserved (or, if the framework returns a frozen instance, wrap it in a Proxy that intercepts only exec):
    - return {
    -     ...sessionEnv,
    -     exec: (command, execOptions) => exec(command, { ...execOptions, signal: undefined }),
    - };
    + sessionEnv.exec = (command, execOptions) => exec(command, { ...execOptions, signal: undefined });
    + return sessionEnv;
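To make the difference between the three wrapping patterns concrete, here is a small self-contained comparison. `FakeSessionEnv` is a hypothetical class-based stand-in; whether Flue's real `SessionEnv` is a class instance is exactly what could not be confirmed, so this only shows what would happen if it were.

```typescript
type ExecOptions = { signal?: AbortSignal | undefined };

// Hypothetical stand-in for a class-based session env; Flue's real shape is unconfirmed.
class FakeSessionEnv {
	root = "/workspace"; // own enumerable property: survives object spread

	exec(command: string, options: ExecOptions = {}): string {
		return `${command} (signal: ${options.signal ? "live" : "none"})`;
	}

	// Prototype method: not copied by object spread.
	resolvePath(relative: string): string {
		return `${this.root}/${relative}`;
	}
}

// Pattern from the PR: build a fresh plain object.
function wrapBySpread(env: FakeSessionEnv) {
	const exec = env.exec.bind(env);
	return {
		...env,
		exec: (command: string, options?: ExecOptions) => exec(command, { ...options, signal: undefined }),
	};
}

// Suggested pattern: shadow exec on the original instance.
function wrapByShadow(env: FakeSessionEnv): FakeSessionEnv {
	const exec = env.exec.bind(env);
	env.exec = (command, options) => exec(command, { ...options, signal: undefined });
	return env;
}

// Alternative for frozen instances: intercept only exec through a Proxy.
function wrapByProxy(env: FakeSessionEnv): FakeSessionEnv {
	return new Proxy(env, {
		get(target, prop, receiver) {
			if (prop === "exec") {
				return (command: string, options?: ExecOptions) =>
					target.exec(command, { ...options, signal: undefined });
			}
			return Reflect.get(target, prop, receiver);
		},
	});
}

const liveSignal = new AbortController().signal;
const spreadCopy = wrapBySpread(new FakeSessionEnv());
const shadowed = wrapByShadow(new FakeSessionEnv());
const proxied = wrapByProxy(Object.freeze(new FakeSessionEnv()));
```

In this sketch all three strip the signal from `exec`, but only the shadowed and proxied versions still expose `resolvePath`; the spread copy keeps `root` and loses the prototype method.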
What does this PR do?
Fixes the automated PR reviewer (`infra/flue-review`), which hung on every run after the `@flue` 1.0 upgrade.

Root cause: `@flue`'s `cloudflareSandbox` adapter forwards an `AbortSignal` to `getSandbox(...).exec()`, which is a Worker → Durable Object RPC call. An `AbortSignal` created outside the target DO can't cross that boundary, so the call never dispatches the command and hangs forever. `@flue` attaches a signal to every shell call (via `createCallHandle`) and to the agent's own bash/grep tools, so the workflow died right after the workspace scan, before the git checkout could run. Verified: an identical `exec` with no signal returns in ~50ms; with a live signal it never returns. This holds across `@cloudflare/sandbox` 0.10.3/0.12.1 and both the `http` and `rpc` transports, so it is not a version/transport issue. Standalone repro: https://github.com/ascorbic/flue-sandbox-repro

This PR:

- Wraps `cloudflareSandbox(...)` in a `SandboxFactory` that drops the `AbortSignal` before `exec` crosses the DO RPC boundary. We don't need cooperative exec cancellation; `withCapacityRetry` still bounds the model call and the container has its own lifecycle. This also covers the agent's own tool calls (same path).
- Checks out the PR before `init()` (via the raw sandbox stub, no signal) so `@flue`'s init-time workspace scan discovers the repo's `AGENTS.md` and `.agents/skills/`. Previously the checkout ran after `init()`, so the scan saw an empty `/workspace` and the review skill's "check against AGENTS.md conventions" had no AGENTS.md.
- Bumps `agents` 0.13 → 0.14 (required by `@flue` 1.0) and `@cloudflare/sandbox` 0.10.3 → 0.12.1 (+ matching image), and sets `SANDBOX_TRANSPORT: "rpc"` (the `http`/`websocket` transports are deprecated, removed after 2026-07-09).
- Resets the FlueRegistry run-index DO. Upgrading `@flue` in place left the run-index DO's `flue_registry_runs` table with a pre-1.0 `owner_kind NOT NULL` column that 1.0 no longer writes, so every run-index write failed the constraint (non-fatal, but log spam + broken `flue logs`). Cloudflare won't `delete-class` a bound DO, so this is applied as a two-step deploy: deploy once with the `FLUE_REGISTRY` binding removed + `deleted_classes`, then a normal deploy recreates it fresh + rebinds.

Closes: n/a (internal infra, no tracking issue).
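The two-step registry reset described above can be modeled as data to show why it needs two deploys. This is a simplified sketch, not the contents of `wrangler.jsonc`: the `tag`, `deleted_classes`, and `new_sqlite_classes` fields follow wrangler's migration entries, the `v3`/`v4` tags and the `FLUE_REGISTRY` binding name come from this PR, while the class name `FlueRegistry`, the `v1-hypothetical` entry, and the `danglingBindings` check are assumptions made for illustration.

```typescript
// One entry of wrangler's `migrations` array (only the fields used here).
interface Migration {
	tag: string;
	deleted_classes?: string[];
	new_sqlite_classes?: string[];
}

// Simplified view of a deploy: DO bindings plus the append-only migration history.
interface DeployConfig {
	bindings: { name: string; class_name: string }[];
	migrations: Migration[];
}

// Replay the migration history to find which classes exist after the deploy.
function liveClasses(migrations: Migration[]): Set<string> {
	const live = new Set<string>();
	for (const migration of migrations) {
		for (const created of migration.new_sqlite_classes ?? []) live.add(created);
		for (const deleted of migration.deleted_classes ?? []) live.delete(deleted);
	}
	return live;
}

// A binding that points at a deleted class models the deploy Cloudflare refuses.
function danglingBindings(config: DeployConfig): string[] {
	const live = liveClasses(config.migrations);
	return config.bindings.filter((b) => !live.has(b.class_name)).map((b) => b.name);
}

// Assumed earlier history; the real pre-v3 entries are not shown in this PR.
const history: Migration[] = [{ tag: "v1-hypothetical", new_sqlite_classes: ["FlueRegistry"] }];
const drop: Migration = { tag: "v3-reset-registry-drop", deleted_classes: ["FlueRegistry"] };
const recreate: Migration = { tag: "v4-reset-registry-new", new_sqlite_classes: ["FlueRegistry"] };
const registryBinding = { name: "FLUE_REGISTRY", class_name: "FlueRegistry" };

// Deleting while still bound: the single-step deploy that cannot work.
const singleStep: DeployConfig = { bindings: [registryBinding], migrations: [...history, drop] };
// Step 1: binding removed, class deleted. Step 2: class recreated, binding restored.
const stepOne: DeployConfig = { bindings: [], migrations: [...history, drop] };
const stepTwo: DeployConfig = { bindings: [registryBinding], migrations: [...history, drop, recreate] };
```

Under this model, `danglingBindings(singleStep)` reports `FLUE_REGISTRY`, while both `stepOne` and `stepTwo` come back clean, which matches the order the description applies.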
Type of change
Checklist
- `pnpm typecheck` passes (`tsc --noEmit` in `infra/flue-review`)
- `pnpm lint` passes (oxlint, 0 warnings/errors on the changed files)
- `pnpm test` passes: n/a, `flue-review` has no test suite. Verified end-to-end instead: triggered reviews against a local `flue dev` and the deployed Worker; the agent now checks out the PR, discovers AGENTS.md, and traces the diff with `git`/`rg` without hanging.
- `pnpm format`: n/a, oxfmt targets `packages/`/`demos/`, not `infra/`. Files use tabs, consistent with the existing sources.
- `@emdash-cms/flue-review` is private/unpublished.

AI-generated code disclosure
Screenshots / test output
Deployed and verified on the live Worker (`emdash-flue-review`): the registry-reset two-step deploy applied cleanly (`v3-reset-registry-drop` → `v4-reset-registry-new`, cursor at `v4`), the Worker is healthy (`401` on unsigned webhook), and a triggered review checks out the repo and runs `git diff`/`rg` against the checkout with no hang.

Try this PR
Open a fresh playground →
A full working EmDash site, deployed from this branch. Each visit gets its own session-scoped sandbox: no login needed and no shared state. Try the admin, edit content, hit the public site.
Tracks `fix/flue-review-sandbox-hang`. Updated automatically when the playground redeploys.