fix(core): reclaim abandoned init locks to prevent isolate-poisoning deadlock by ascorbic · Pull Request #1405 · emdash-cms/emdash

ascorbic · 2026-06-11T13:35:04Z

What does this PR do?

Fixes an isolate-poisoning deadlock in runtime and database initialization on workerd.

Mechanism: getRuntime() (middleware) and getDatabase() (runtime) guard cold-isolate initialization with single-owner state — a runtimeInitializing boolean and a shared dbInitPromise respectively. If the request that owns the init is cancelled mid-await (client disconnects during cold migrations on contended D1), workerd tears down its continuation and the finally that releases the guard never runs. The guard is stuck forever: every subsequent request in that isolate spins in the wait path (or awaits a never-settling promise) until the platform kills it at the wall limit — observed in production as 524s at exactly ~100s, with cf-cache-status: BYPASS, zero bytes, and no Server-Timing. The isolate stays poisoned until eviction, and the CDN cache's request coalescing amplifies each poisoned fill to every concurrent visitor of that URL.

External reproduction during diagnosis: probes that aborted in-flight cold requests drove the hang rate on a test deployment from ~17% to 44% of uncached requests — each abort during a cold init manufactured another poisoned isolate. This is self-reinforcing in production: slow cold init → user abandons → poisoned isolate → more 524s.

Fix: both sites now use a reclaimable init lock (packages/core/src/utils/init-lock.ts):

Waiters poll instead of awaiting a cross-request promise (which workerd flags and which can never settle if the owner dies).
The owner records a claim timestamp; a waiter that finds the lock held past a 15s deadline assumes the owner is dead, reclaims the lock, and runs init itself — the isolate self-heals.
Waiters give up with an error after 30s, so the worst case degrades to an error response instead of a 100-second hang.
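
A minimal sketch of the three behaviours listed above (poll, reclaim after a deadline, bounded wait), for readers who want the shape without opening the diff. `SketchLock`, `initWithReclaim`, `getCached` and the option names are illustrative assumptions, not the signatures in init-lock.ts; only the 15s and 30s values come from the description above.

```typescript
// Hypothetical sketch: names are illustrative, not the shipped API.
interface SketchLock {
	/** Epoch ms when the current owner claimed the lock, or null when free. */
	ownerStartedAt: number | null;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function initWithReclaim<T>(
	lock: SketchLock,
	getCached: () => T | undefined,
	init: () => Promise<T>,
	opts = { deadlineMs: 15_000, maxWaitMs: 30_000, pollMs: 50 },
): Promise<T> {
	const startedWaiting = Date.now();
	for (;;) {
		// A finished owner publishes to the cache; waiters pick it up here.
		const cached = getCached();
		if (cached !== undefined) return cached;

		const claimedAt = lock.ownerStartedAt;
		if (claimedAt === null || Date.now() - claimedAt > opts.deadlineMs) {
			// Free, or held past the deadline: claim it. There is no await
			// between the check and the write, so two waiters cannot both claim.
			lock.ownerStartedAt = Date.now();
			try {
				return await init();
			} finally {
				// A cancelled owner never reaches this line; the deadline check
				// above is what lets a later waiter recover. A live-but-slow
				// owner also reaches it, a case this sketch does not handle.
				lock.ownerStartedAt = null;
			}
		}
		if (Date.now() - startedWaiting > opts.maxWaitMs) {
			throw new Error("init lock: gave up waiting for owner");
		}
		// Poll rather than await a promise created by another request.
		await sleep(opts.pollMs);
	}
}
```

Polling costs a little latency per waiter, but it means no request depends on a continuation that the platform may already have torn down.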

TDD: the test simulating the abandoned owner (init never settles, cleanup never runs) was written first and fails against the old boolean/shared-promise shape.

Closes #1274

Type of change

Checklist

I have read CONTRIBUTING.md
pnpm typecheck passes (packages/core; the repo-wide run fails on a pre-existing unbuilt-workspace error in registry-client, unrelated to this change)
pnpm lint passes (zero new diagnostics vs. the pre-existing baseline)
pnpm test passes — core unit (174 files / 2,639 tests) and integration (70 files / 1,039 tests) suites
pnpm format has been run
I have added/updated tests for my changes (if applicable)
User-visible strings in the admin UI are wrapped for translation (if applicable). n/a — no admin UI strings.
I have added a changeset (if this PR changes a published package)
New features link to an approved Discussion: https://github.com/emdash-cms/emdash/discussions/... n/a — bug fix.

AI-generated code disclosure

This PR includes AI-generated code — model/tool: Claude Fable 5 (Claude Code)

Screenshots / test output

 Test Files  174 passed (174)        # tests/unit
      Tests  2639 passed (2639)

 Test Files  70 passed (70)          # tests/integration
      Tests  1039 passed (1039)

New suite: packages/core/tests/unit/utils/init-lock.test.ts (6 tests) — cached short-circuit, single init shared by concurrent waiters, stale-owner reclaim after deadline (the production deadlock), lock release on init failure, bounded max wait, waiter picks up owner's cached value.

Production fingerprint this fixes (test deployment, cache-busted requests): 524 at ~100,0xx ms, cf-cache-status: BYPASS, 0 bytes, no Server-Timing; bimodal isolates (healthy ones serve ~300–500ms misses throughout).

🤖 Generated with Claude Code

Try this PR

Open a fresh playground →

A full working EmDash site, deployed from this branch. Each visit gets its own session-scoped sandbox: no login needed and no shared state. Try the admin, edit content, hit the public site.

Tracks fix/init-lock-poisoned-isolate. Updated automatically when the playground redeploys.

…deadlock

If the request owning a cold-isolate init was cancelled mid-await (client disconnect during migrations), workerd tore down its continuation and the finally that releases the init guard never ran. The stuck guard (runtimeInitializing boolean / shared dbInitPromise) then deadlocked every subsequent request in the isolate until the platform wall limit (524 at 100s), poisoning the isolate until eviction.

Replace both guards with a reclaimable lock: waiters poll (never awaiting a cross-request promise), a stale owner is reclaimed after a 15s deadline, and waiters give up with an error after 30s instead of hanging.

Closes #1274

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

changeset-bot · 2026-06-11T13:35:09Z

🦋 Changeset detected

Latest commit: 90711fd

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 14 packages

Name	Type
emdash	Patch
@emdash-cms/cloudflare	Patch
@emdash-cms/sandbox-workerd	Patch
@emdash-cms/fixture-perf-site	Patch
@emdash-cms/perf-demo-site	Patch
@emdash-cms/cache-demo-site	Patch
@emdash-cms/admin	Patch
@emdash-cms/auth	Patch
@emdash-cms/blocks	Patch
@emdash-cms/gutenberg-to-portable-text	Patch
@emdash-cms/x402	Patch
create-emdash	Patch
@emdash-cms/auth-atproto	Patch
@emdash-cms/plugin-embeds	Patch

cloudflare-workers-and-pages · 2026-06-11T13:37:09Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Updated (UTC)
✅ Deployment successful! View logs	docs	`6f8ab6d`	Jun 11 2026, 03:18 PM

pkg-pr-new · 2026-06-11T13:37:52Z

@emdash-cms/admin

npm i https://pkg.pr.new/@emdash-cms/admin@1405

@emdash-cms/auth

npm i https://pkg.pr.new/@emdash-cms/auth@1405

@emdash-cms/auth-atproto

npm i https://pkg.pr.new/@emdash-cms/auth-atproto@1405

@emdash-cms/blocks

npm i https://pkg.pr.new/@emdash-cms/blocks@1405

@emdash-cms/cloudflare

npm i https://pkg.pr.new/@emdash-cms/cloudflare@1405

@emdash-cms/contentful-to-portable-text

npm i https://pkg.pr.new/@emdash-cms/contentful-to-portable-text@1405

emdash

npm i https://pkg.pr.new/emdash@1405

create-emdash

npm i https://pkg.pr.new/create-emdash@1405

@emdash-cms/gutenberg-to-portable-text

npm i https://pkg.pr.new/@emdash-cms/gutenberg-to-portable-text@1405

@emdash-cms/plugin-cli

npm i https://pkg.pr.new/@emdash-cms/plugin-cli@1405

@emdash-cms/plugin-types

npm i https://pkg.pr.new/@emdash-cms/plugin-types@1405

@emdash-cms/registry-client

npm i https://pkg.pr.new/@emdash-cms/registry-client@1405

@emdash-cms/registry-lexicons

npm i https://pkg.pr.new/@emdash-cms/registry-lexicons@1405

@emdash-cms/sandbox-workerd

npm i https://pkg.pr.new/@emdash-cms/sandbox-workerd@1405

@emdash-cms/x402

npm i https://pkg.pr.new/@emdash-cms/x402@1405

@emdash-cms/plugin-ai-moderation

npm i https://pkg.pr.new/@emdash-cms/plugin-ai-moderation@1405

@emdash-cms/plugin-atproto

npm i https://pkg.pr.new/@emdash-cms/plugin-atproto@1405

@emdash-cms/plugin-audit-log

npm i https://pkg.pr.new/@emdash-cms/plugin-audit-log@1405

@emdash-cms/plugin-color

npm i https://pkg.pr.new/@emdash-cms/plugin-color@1405

@emdash-cms/plugin-embeds

npm i https://pkg.pr.new/@emdash-cms/plugin-embeds@1405

@emdash-cms/plugin-field-kit

npm i https://pkg.pr.new/@emdash-cms/plugin-field-kit@1405

@emdash-cms/plugin-forms

npm i https://pkg.pr.new/@emdash-cms/plugin-forms@1405

@emdash-cms/plugin-webhook-notifier

npm i https://pkg.pr.new/@emdash-cms/plugin-webhook-notifier@1405

commit: 6f8ab6d

cloudflare-workers-and-pages · 2026-06-11T13:38:16Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Updated (UTC)
✅ Deployment successful! View logs	emdash-playground	`6f8ab6d`	Jun 11 2026, 03:20 PM

cloudflare-workers-and-pages · 2026-06-11T13:38:19Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Updated (UTC)
✅ Deployment successful! View logs	emdash-demo-cache	`6f8ab6d`	Jun 11 2026, 03:19 PM

… anchoring, globalThis singletons

- Derive the db init deadline from MIGRATION_RACE_WAIT_MS (now exported) so a healthy owner blocking in waitForConcurrentMigrator is never presumed dead: DB_INIT_DEADLINE_MS = race wait + 20s. The outer runtime lock uses DB_INIT_DEADLINE_MS + 15s so nested deadlines compose.
- Default maxWaitMs is now deadlineMs + 15s headroom so a waiter always survives long enough to reclaim a dead owner before giving up.
- Anchor the in-flight init promise via after()/waitUntil: a request cancelled mid-init no longer abandons the work. Init completes, populates the cache, and releases the lock, preventing the poisoning instead of recovering from it. Reclaim remains as the net.
- Move the runtime singleton + lock and db cache + lock onto globalThis behind Symbol.for keys (same reasoning as the setup-verified flag): bundler-duplicated SSR chunks would otherwise get independent locks and caches, meaning multiple runtimes and cron schedulers per isolate.
- New tests: live-but-slow owner reclaimed at deadline (init runs at most twice, cache converges) and anchor receives a rejection-swallowed copy.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

ascorbic · 2026-06-11T13:55:50Z

Review addressed in 85bc8b4:

(a) Deadline composition — MIGRATION_RACE_WAIT_MS is now exported from the runner and the db lock deadline derives from it (DB_INIT_DEADLINE_MS = MIGRATION_RACE_WAIT_MS + 20_000 = 30s), so they can't drift. The outer runtime lock uses DB_INIT_DEADLINE_MS + 15_000 (45s) — inner strictly less than outer. Waiter maxWaitMs now defaults to deadlineMs + 15s so a waiter always survives long enough to reclaim a dead owner.

(b) waitUntil anchoring — initWithLock gained an anchor option; both call sites pass (p) => after(() => p) (the existing after() helper resolves the host's waitUntil via the virtual module). A cancelled owner's init now completes under waitUntil, populates the cache, and releases the lock — prevention first, reclaim as the net. The anchored copy is rejection-swallowed so a failed init can't surface as an unhandled rejection in the host.

(c) globalThis singletons — runtime instance + lock and db cache + lock now live behind Symbol.for keys (emdash:runtime-holder, emdash:db-cache), same reasoning as the adjacent setup-verified flag. Duplicated SSR chunks share one lock/cache; no more double-runtime/double-cron hazard.

(d) Slow-live-owner test — added: owner slower than the deadline, waiter reclaims; asserts init runs at most twice, both callers resolve, cache converges, and a third caller is served from cache. Plus an anchor test.

(e) Date.now() guard comment added (clock only advances across I/O in workerd — the sleep is load-bearing). Confirmed the doInit path degrades gracefully on a maxWait throw: the existing catch at the runtime-init site logs and continues, leaving locals.emdash unset — admin/API routes return the structured NOT_CONFIGURED error rather than a raw 500.

Also worth stating for the record (per review): reclaim-induced concurrent migration lands in the already-handled waitForConcurrentMigrator path — the runner detects the UNIQUE-constraint race and waits it out, which is what makes polling-reclaim safe here.

Suites: 2,641 unit + 1,039 integration pass; core typecheck and lint clean.
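
The deadline arithmetic in (a) can be written out as a short sketch. The value of MIGRATION_RACE_WAIT_MS is not quoted in this thread; 10_000 is inferred from "MIGRATION_RACE_WAIT_MS + 20_000 = 30s" and should be treated as an assumption. `WAITER_HEADROOM_MS` and `defaultMaxWaitMs` are hypothetical names used only for illustration.

```typescript
// Inferred from "MIGRATION_RACE_WAIT_MS + 20_000 = 30s"; an assumption, not quoted source.
const MIGRATION_RACE_WAIT_MS = 10_000;

// Inner (db) deadline derives from the migration race wait, so the two cannot drift.
const DB_INIT_DEADLINE_MS = MIGRATION_RACE_WAIT_MS + 20_000; // 30s

// Outer (runtime) deadline is strictly larger than the inner one it wraps.
const RUNTIME_INIT_DEADLINE_MS = DB_INIT_DEADLINE_MS + 15_000; // 45s

// Hypothetical name for the 15s headroom a waiter gets beyond the deadline.
const WAITER_HEADROOM_MS = 15_000;

/** A waiter must outlive the deadline, or it could never reclaim a dead owner. */
function defaultMaxWaitMs(deadlineMs: number): number {
	return deadlineMs + WAITER_HEADROOM_MS;
}
```

With these numbers a db waiter gives up after 45s and a runtime waiter after 60s, both well under the ~100s wall limit described in the PR body.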

Copilot

Pull request overview

Fixes a workerd isolate “poisoning” deadlock during cold initialization by replacing non-reclaimable init guards (boolean/shared promise) with a reclaimable, polling-based init lock that can be anchored via after()/waitUntil and reclaimed after a deadline.

Changes:

Add initWithLock / createInitLock utility (with unit tests) to support reclaimable, polling-based init.
Update runtime DB initialization and Astro middleware runtime initialization to use the new init lock and store holders on globalThis via Symbol.for.
Export MIGRATION_RACE_WAIT_MS for deadline derivation and add a changeset entry.
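
As a rough illustration of the globalThis/Symbol.for pattern mentioned in the second item: a module-level variable is private to each bundled copy of a module, while a `Symbol.for` key resolves to the same symbol in every copy. The key string `emdash:db-cache` is taken from this thread; the `DbHolder` shape and `getDbHolder` name are assumptions for the sketch.

```typescript
// Hypothetical holder shape; the real one in emdash-runtime.ts may differ.
interface DbHolder {
	cache: Map<string, unknown>;
	lock: { ownerStartedAt: number | null };
}

// Symbol.for looks the key up in the global symbol registry, so every
// duplicated chunk that evaluates this line gets the same symbol.
const DB_HOLDER_KEY = Symbol.for("emdash:db-cache");

function getDbHolder(): DbHolder {
	const store = globalThis as unknown as Record<symbol, DbHolder | undefined>;
	let holder = store[DB_HOLDER_KEY];
	if (!holder) {
		// First caller in this isolate creates the holder; later callers reuse it.
		holder = { cache: new Map(), lock: { ownerStartedAt: null } };
		store[DB_HOLDER_KEY] = holder;
	}
	return holder;
}
```

Two copies of this module would therefore share one lock and one cache, which is the property the change relies on.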

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

File	Description
packages/core/src/utils/init-lock.ts	Introduces reclaimable init lock helper used to avoid isolate-poisoning deadlocks.
packages/core/src/emdash-runtime.ts	Switches DB singleton initialization to `initWithLock`; adds globalThis-backed db holder + derived deadline constant.
packages/core/src/astro/middleware.ts	Switches runtime singleton initialization to `initWithLock`; adds globalThis-backed runtime holder + derived deadline.
packages/core/src/database/migrations/runner.ts	Exports `MIGRATION_RACE_WAIT_MS` for use in init-lock deadline derivation.
packages/core/tests/unit/utils/init-lock.test.ts	Adds unit coverage for reclaim, timeout, anchoring, and waiter behaviors.
packages/core/tests/unit/astro/middleware-prerender.test.ts	Updates mocks/reset logic to account for new exported deadline + global singleton key.
.changeset/init-lock-reclaim.md	Publishes patch note describing the deadlock fix.

+export interface InitLock {
+	/** Epoch ms when the current owner claimed the lock, or null when free. */
+	ownerStartedAt: number | null;
+}
+
+export function createInitLock(): InitLock {
+	return { ownerStartedAt: null };
+}


+		const ownerStartedAt = lock.ownerStartedAt;
+		if (ownerStartedAt === null || Date.now() - ownerStartedAt > deadlineMs) {
+			// Free, or the owner has been gone past the deadline — claim it.
+			// Synchronous between awaits, so two waiters can't both claim.
+			lock.ownerStartedAt = Date.now();
+			try {
+				const initPromise = init();
+				options?.anchor?.(
+					initPromise.then(
+						() => undefined,
+						() => undefined,
+					),
+				);
+				return await initPromise;
+			} finally {
+				// If this request dies mid-init unanchored this never runs;
+				// the next waiter reclaims after deadlineMs instead.
+				lock.ownerStartedAt = null;
+			}
+		}


Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

emdashbot

This is a well-scoped bug fix that correctly diagnoses and addresses a real production issue: workerd cancels can strand an init lock, poisoning the isolate. The reclaimable-polling-lock approach is the right one for this runtime, and moving the singletons to globalThis behind Symbol.for follows the repo's established pattern. Tests are thorough and cover the dead-owner reclaim path.

I found one correctness bug in the lock implementation and a test that happens to pass without proving the property it advertises:

1. initWithLock finally clobbers reclaimed locks. The finally block unconditionally sets lock.ownerStartedAt = null. If the original owner was live-but-slow and a waiter reclaimed the lock, the original owner's finally wipes out the new owner's claim. A third waiter that arrives while the reclaimer is still running can then claim the lock and start a third init. For the runtime lock that means duplicate cron schedulers / sandbox runners; for the DB lock it means duplicate Kysely instances and competing migrators. The fix is a claim-generation counter: only the current owner may clear the lock.
2. The "at most twice" test doesn't actually prove the property. It passes because the reclaiming waiter's init is instantaneous, so the original owner's harmless clobber happens after the lock was already released. A third waiter that arrives while both the original owner and the reclaimer are still in flight would trigger a third init, but the test never exercises that overlap.
3. Minor: setupFlagStore is used lexically before its declaration in middleware.ts. Not a runtime bug (the function is called after module evaluation), but it's confusing ordering; moving the variable above getRuntimeHolder would make it consistent with the rest of the file.

The changeset is correct, the PR type is appropriate, and the approach is sound once the clobber race is closed.

emdashbot · 2026-06-11T14:41:57Z

+		if (ownerStartedAt === null || Date.now() - ownerStartedAt > deadlineMs) {
+			// Free, or the owner has been gone past the deadline — claim it.
+			// Synchronous between awaits, so two waiters can't both claim.
+			lock.ownerStartedAt = Date.now();


[needs fixing] The finally block unconditionally sets lock.ownerStartedAt = null. If the original owner was live-but-slow and a waiter reclaimed the lock in the meantime, the original owner's finally clobbers the new owner's claim. A third waiter that arrives while the reclaimer is still running can then claim the lock and start a third concurrent init — for the runtime lock that means duplicate cron schedulers and sandbox runners.

The fix is to track claim ownership so only the current owner may clear the lock:

Suggested change

-	lock.ownerStartedAt = Date.now();
+export interface InitLock {
+	ownerStartedAt: number | null;
+	generation: number;
+}
+
+export function createInitLock(): InitLock {
+	return { ownerStartedAt: null, generation: 0 };
+}

And in initWithLock:

Suggested change

-			lock.ownerStartedAt = Date.now();
+		if (ownerStartedAt === null || Date.now() - ownerStartedAt > deadlineMs) {
+			const myGeneration = lock.generation + 1;
+			lock.generation = myGeneration;
+			lock.ownerStartedAt = Date.now();
+			try {
+				const initPromise = init();
+				options?.anchor?.(
+					initPromise.then(
+						() => undefined,
+						() => undefined,
+					),
+				);
+				return await initPromise;
+			} finally {
+				if (lock.generation === myGeneration) {
+					lock.ownerStartedAt = null;
+				}
+			}
+		}

emdashbot · 2026-06-11T14:41:57Z

+
+		const owner = initWithLock(
+			lock,
+			() => cache,


[suggestion] This test happens to pass because the reclaiming waiter's init is instantaneous, so the original owner's finally runs after the lock was already released. It does not prove the "at most twice" property it advertises. To actually validate the claim, add a third waiter that arrives while both the original owner and the reclaimer are still in flight, and assert that it waits (or reclaims only after the reclaimer's deadline) rather than triggering initCalls === 3.

emdashbot · 2026-06-11T14:41:57Z

+	lock: InitLock;
+}
+
+function getRuntimeHolder(): RuntimeHolder {


[suggestion] setupFlagStore is declared on line 112, below this function that uses it. In JavaScript the function is hoisted and isn't called during module evaluation, so there's no runtime TDZ bug, but it's confusing ordering. Move setupFlagStore (and the SETUP_VERIFIED_KEY declaration next to it) above getRuntimeHolder so the variable is lexically declared before its first use.

A slow owner that finished (or failed) after a waiter had reclaimed the lock would unconditionally clear ownerStartedAt in its finally, releasing the reclaimer's claim and admitting a third concurrent init. Each claim now bumps a generation counter and release is gated on still being the current owner.

Also:
- normalize init via Promise.resolve().then(init) so a synchronous throw still attaches the anchor before rejecting
- move setupFlagStore above its first lexical use
- add a regression test where the stale owner fails while the reclaimer is mid-init and a third caller arrives in the clobber window (fails with 3 inits before the fix, 2 after)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

ascorbic · 2026-06-11T14:48:12Z

Both review findings addressed in 90711fd:

Clobber race (emdashbot #1, Copilot both comments) — confirmed real, and the suggested fix is what shipped: InitLock gains a monotonic generation claim counter, each claim bumps it, and the finally only clears ownerStartedAt while still the current owner. A slow stale owner finishing (or failing) after a reclaim can no longer release the reclaimer's claim.

Test gap (emdashbot #2) — you were right that the "at most twice" test passed by timing accident: the genuinely dangerous interleaving needs the stale owner to fail slowly (success populates the cache, which already shields later callers). Added the regression test with the full overlap: owner A rejects at t≈400 after waiter B reclaimed at t≈300, third caller C arrives at t≈440 in the clobber window while B is mid-init. Before the fix it fails with 3 inits; after, C waits for B and initCalls === 2.

Also picked up Copilot's secondary suggestion — init is now normalized via Promise.resolve().then(init) so a synchronous throw still attaches the waitUntil anchor before rejecting, and the ordering nit — setupFlagStore/SETUP_VERIFIED_KEY moved above getRuntimeHolder.

9/9 lock tests, 2,642 unit + 1,039 integration pass; typecheck/lint/format clean.
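
The generation-gated release can be replayed synchronously. This sketch assumes hypothetical `claim` and `release` helpers; in the PR the same logic sits inline in initWithLock, and only the `ownerStartedAt` and `generation` field names come from the thread.

```typescript
interface GenLock {
	ownerStartedAt: number | null;
	/** Monotonic claim counter; bumped on every claim, never reset. */
	generation: number;
}

/** Claim the lock and return the generation that identifies this claim. */
function claim(lock: GenLock): number {
	lock.generation += 1;
	lock.ownerStartedAt = Date.now();
	return lock.generation;
}

/** Release only if this claim is still the current one. */
function release(lock: GenLock, myGeneration: number): boolean {
	if (lock.generation !== myGeneration) {
		// Superseded by a reclaimer: leave the reclaimer's claim intact.
		return false;
	}
	lock.ownerStartedAt = null;
	return true;
}

// Replay of the clobber window: A claims, B reclaims, A finishes late.
const replayLock: GenLock = { ownerStartedAt: null, generation: 0 };
const claimA = claim(replayLock); // original owner
const claimB = claim(replayLock); // waiter reclaims after the deadline
const staleReleased = release(replayLock, claimA); // A's finally: no-op
const heldAfterStale = replayLock.ownerStartedAt !== null; // C still sees a held lock
const currentReleased = release(replayLock, claimB); // B's finally: frees the lock
```

Because A's release is a no-op, a third caller arriving between A's finish and B's finish still finds the lock held and waits instead of starting a third init.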

ascorbic · 2026-06-11T14:58:11Z

/review

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

+		async () => {
+			const deps = buildDependencies(config);
+			const runtime = await EmDashRuntime.create(deps, initTimings);
+			holder.instance = runtime;
+			return runtime;
+		},


+				holder.cache.set(cacheKey, db);
+				return db;


ask-bonk

Reviewed against 90711fd. The core fix is sound — the reclaimable polling lock is the right shape for workerd's cancellation semantics, the generation counter closes the clobber race that emdashbot and Copilot flagged, and the deadline composition (MIGRATION_RACE_WAIT_MS < DB_INIT_DEADLINE_MS < RUNTIME_INIT_DEADLINE_MS, with maxWaitMs = deadlineMs + 15s so a waiter always outlives the deadline) is internally consistent. Anchoring init via after() is the right belt to the reclaim's suspenders. Tests cover the production deadlock, the clobber race (with an actual interleaved C-arrives-during-B-init regression test), and the sync-throw path.

A few residual concerns worth flagging — none block the fix:

1. holder.instance / holder.cache clobber from a live-but-slow owner finishing after a reclaimer. The generation guard protects the lock release, but the cache write inside init (e.g. holder.instance = runtime in middleware.ts:223, holder.cache.set(cacheKey, db) in emdash-runtime.ts:1369) is not guarded. A slow owner that legitimately finishes after a reclaimer succeeds will overwrite the reclaimer's cached value with its own runtime/db. holder.instance then points to an instance whose cronScheduler (NodeCronScheduler with real timers, on the Node host) was never the one in active use, and the orphan reclaimer's scheduler continues to fire. On the Workers target this is benign (PiggybackScheduler ticks via tickCron() on the active runtime only, orphans are quiet), but on Node it's a real timer leak per reclaim event. Existing runMigrations race recovery via waitForConcurrentMigrator means the db side won't double-apply schema, but it does end up with two distinct Kysely instances racing to be "the cached one".
2. The new clobber-race test asserts cache convergence but not which value wins. In tests/unit/utils/init-lock.test.ts, the live-slow-owner test asserts third === cache, but cache was last written by the slow owner ("slow-owner"), not the reclaimer. The test is correctly proving "init runs at most twice and the cache is non-null", but a future reader might mistake it for "the reclaimer's value wins". Worth a one-line comment that the cache-clobber behavior is intentional (and the next-best alternative is the generation-gated cache write in (1)).
3. after() registration is a microtask, not synchronous. waitUntilReady is a module-scope dynamic import; on workerd the resolution is effectively instantaneous, but the waitUntilReady.then(waitUntil => waitUntil(promise)) handoff in after() is still one microtask away. If the owning request is cancelled in the narrow window between after(() => initPromise) being called and that microtask running, the waitUntil registration may not actually land. Workers' microtask scheduling normally drains pending microtasks before teardown, so this is likely fine, but it's hard to verify without a real workerd integration test, and the reclaim path covers it as a backstop, which is the point. No action needed; just naming the assumption.

Approach and risk assessment are correct, and the fix degrades gracefully (waiter throws after maxWaitMs → existing catch in doInit logs and leaves locals.emdash unset → routes return NOT_CONFIGURED). Ship it; consider (1) and (2) as follow-ups.

ask-bonk · 2026-06-11T15:05:17Z

+		async () => {
+			const deps = buildDependencies(config);
+			const runtime = await EmDashRuntime.create(deps, initTimings);
+			holder.instance = runtime;


Cache write is not generation-gated. If a live-but-slow owner reaches this assignment after a waiter reclaimed and ran its own init, the slow owner's runtime overwrites the reclaimer's. Future requests then read an instance whose cronScheduler was never adopted by the active code path, while the reclaimer's scheduler keeps firing — a NodeCronScheduler timer leak per reclaim event on the Node host (benign on Workers, where PiggybackScheduler is dormant without tickCron()). Consider gating this write on the lock generation the same way the release is, e.g. expose the claim from initWithLock so init can check if (lock.generation === myClaim) holder.instance = runtime. Not a deadlock — the reclaimer's caller still gets the correct runtime via the return value — but it admits an orphan-scheduler leak that the rest of the fix is otherwise careful to avoid.

ask-bonk · 2026-06-11T15:05:27Z

-		} finally {
-			dbInitPromise = null;
-		}
+				holder.cache.set(cacheKey, db);


Same concern as the runtime holder write in middleware.ts — this holder.cache.set runs inside init, ungated by the lock generation. A slow-but-live owner finishing after a reclaimer replaces the reclaimer's cached Kysely with its own. Both Kyselys are functionally valid (D1 sessions / dialect-level state is per-instance, not shared), so this isn't correctness-breaking, but it does mean two runMigrations will run concurrently on the same physical DB (UNIQUE-constraint race is handled, so that part is safe) and the "cached" Kysely identity will flip mid-isolate. Worth either gating on the generation or documenting that the second migrator path is the intended degradation.

ask-bonk · 2026-06-11T15:05:35Z

+			},
+			{ deadlineMs: 100, pollMs: 10 },
+		);
+		expect(third).toBe(cache);


Subtle: this asserts third === cache, but cache was last written by the slow owner ("slow-owner") clobbering the reclaimer's "reclaimer" value. The test correctly proves "init runs at most twice and the cache is populated" — but a future reader might read it as "the reclaimer's value wins". It does not. Worth a comment that the slow-owner cache write is intentionally last-writer-wins (and that holder-side cache writes in middleware/runtime inherit that behavior). Without the comment this test will mislead anyone trying to understand which runtime/db survives a reclaim race.

ask-bonk · 2026-06-11T15:05:40Z

Review posted. Summary:

1 review with overall body (COMMENT event)
3 line-anchored comments:
- middleware.ts:223 — holder.instance cache write not generation-gated
- emdash-runtime.ts:1369 — holder.cache.set same concern
- tests/unit/utils/init-lock.test.ts:182 — test assertion subtly tests last-writer-wins cache, worth a comment

The fix itself is correct and well-tested. The remaining concerns are non-blocking edge cases on the recovery path (live-but-slow owner finishing after a reclaim), most of which are benign on the Workers deployment target and only meaningful on Node hosts.

github run

A reclaimed-but-live init that finished late could overwrite the reclaimer's published runtime/db, flipping the singleton back to a stale instance and orphaning the active cron scheduler (a timer leak on Node). initWithLock now passes an isCurrentClaim predicate to init; both call sites publish only while still the current owner, and a superseded runtime stops its own cron scheduler instead of leaking it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

ascorbic · 2026-06-11T15:11:59Z

Follow-up round addressed in 6f8ab6d — all three reviewers converged on the publication side of the same race, and you're right that the cache write needed the same gating as the release:

Generation-gated publication — initWithLock now passes an isCurrentClaim predicate into init. Both call sites publish only while still the current owner, so a reclaimed-but-live init finishing late can no longer flip the singleton back to a stale instance.

The orphan-scheduler leak (ask-bonk) — went one better than suppressing the write: a superseded runtime now calls its own runtime.stopCron() (public API, documented for exactly this shutdown case), so the loser's scheduler is stopped rather than leaked — on either side of the race. The superseded instance is still returned to the request that built it, since it's fully functional for that request.

Db cache identity flip (ask-bonk/Copilot) — same gating on holder.cache.set; the concurrent-second-migrator degradation is unchanged and remains covered by the runner's UNIQUE-race handling.

Test readability (ask-bonk) — added the requested comment: the existing convergence test's inits write the cache ungated, so it demonstrates last-writer-wins explicitly (expect(cache).toBe("slow-owner") is now asserted, not implied), with a pointer to the new gated-publication test that shows real-caller behavior (reclaimer's value survives, each caller still gets its own return value, claim checks observed as [true, false]).

10/10 lock tests, 2,643 unit + 1,039 integration, typecheck/lint/format clean.
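
A small sketch of what generation-gated publication looks like from a call site, under stated assumptions: `SketchRuntime`, `RuntimeHolder`, `makeRuntime` and `publishIfCurrent` are hypothetical stand-ins, and only `isCurrentClaim`, `holder.instance` and `stopCron()` are names that appear in this thread.

```typescript
// Hypothetical stand-in for the real runtime; only stopCron() is named in the PR.
interface SketchRuntime {
	readonly id: string;
	cronRunning: boolean;
	stopCron(): void;
}

interface RuntimeHolder {
	instance: SketchRuntime | null;
}

function makeRuntime(id: string): SketchRuntime {
	return {
		id,
		cronRunning: true,
		stopCron() {
			this.cronRunning = false;
		},
	};
}

/**
 * Publish the freshly built runtime only while the caller still owns the
 * lock. A superseded runtime stops its own scheduler instead of leaking it.
 * Returns whether the runtime was published.
 */
function publishIfCurrent(
	holder: RuntimeHolder,
	runtime: SketchRuntime,
	isCurrentClaim: () => boolean,
): boolean {
	if (isCurrentClaim()) {
		holder.instance = runtime;
		return true;
	}
	runtime.stopCron();
	return false;
}
```

The superseded runtime is still usable by the request that built it; it simply never becomes the shared singleton.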

github-actions Bot added area/core size/L cla: signed labels Jun 11, 2026

github-actions Bot added size/XL and removed size/L labels Jun 11, 2026

ascorbic marked this pull request as ready for review June 11, 2026 14:24

Copilot AI review requested due to automatic review settings June 11, 2026 14:24

Copilot started reviewing on behalf of ascorbic June 11, 2026 14:24 View session

ascorbic added the bot:review Trigger an emdashbot code review on this PR label Jun 11, 2026

github-actions Bot added the review/needs-review No maintainer or bot review yet label Jun 11, 2026

Copilot AI reviewed Jun 11, 2026

docs: user-facing changeset wording

7b3f499

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions Bot added review/needs-rereview Author pushed changes since the last review and removed review/needs-review No maintainer or bot review yet labels Jun 11, 2026

emdashbot Bot reviewed Jun 11, 2026

github-actions Bot added review/awaiting-author Reviewed; waiting on the author to respond and removed review/needs-rereview Author pushed changes since the last review labels Jun 11, 2026

github-actions Bot added review/needs-rereview Author pushed changes since the last review and removed review/awaiting-author Reviewed; waiting on the author to respond labels Jun 11, 2026

ascorbic requested a review from Copilot June 11, 2026 14:56

Copilot started reviewing on behalf of ascorbic June 11, 2026 14:56 View session

Copilot AI reviewed Jun 11, 2026

ask-bonk Bot reviewed Jun 11, 2026

github-actions Bot added review/awaiting-author Reviewed; waiting on the author to respond and removed review/needs-rereview Author pushed changes since the last review labels Jun 11, 2026

ask-bonk Bot reviewed Jun 11, 2026

github-actions Bot added review/needs-rereview Author pushed changes since the last review and removed review/awaiting-author Reviewed; waiting on the author to respond labels Jun 11, 2026

ascorbic merged commit bdabff7 into main Jun 11, 2026
45 checks passed

ascorbic deleted the fix/init-lock-poisoned-isolate branch June 11, 2026 15:51

emdashbot Bot mentioned this pull request Jun 11, 2026

ci: release #1360

Merged
