
Session KV lookup hangs indefinitely, blocking entire page response for logged-in users #1274

@marcusbellamyshaw-cell


Product: EmDash CMS
Version: 0.15.0 (first observed in 0.14.0, confirmed still present in 0.15.0)
Adapter: @astrojs/cloudflare (Cloudflare Workers + KV)
Date: 2026-05-29


Summary

When a user has an active EmDash admin session cookie, page loads occasionally hang indefinitely: the browser tab spinner spins forever and no HTML is delivered. The issue is intermittent, and once a hang starts it persists for that browser until the session cookie is cleared. It never occurs in private/incognito windows (no cookie).

The issue is not tied to one browser: it has been observed in Chrome on desktop and Chrome on iOS.


Steps to reproduce

  1. Log into the EmDash admin panel in a browser (establishes a session cookie)
  2. Navigate to any public-facing page on the site (e.g. the homepage)
  3. Observe (intermittently): the browser tab spinner runs indefinitely; no HTML is received and the page never renders
  4. Open the same URL in a private/incognito window (no session cookie) → page loads immediately and correctly
  5. Once the hang starts, all pages on the site hang for that browser — retrying does not help. The issue persists until the session cookie is cleared (e.g. via browser cache wipe). It can occur after hours of normal use with no obvious trigger.
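When reproducing, it helps to tell a true hang apart from a slow response. One way is to request the same URL with and without the session cookie and abort after a fixed deadline. The following is a minimal sketch. It assumes a runtime with global `fetch` and `AbortController` (Node 18+, Deno, or a browser console). `probeOnce` and `FetchLike` are hypothetical helpers, and the cookie value would be copied from the logged-in browser's developer tools.

```typescript
// Minimal slice of the fetch API the probe needs, so it can be tested with a stub.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string>; signal: AbortSignal },
) => Promise<{ status: number; text(): Promise<string> }>;

type ProbeResult =
  | { kind: "complete"; status: number; ms: number }
  | { kind: "timeout"; ms: number }
  | { kind: "error"; message: string; ms: number };

// Fetch `url` once (optionally with a Cookie header) and read the full body,
// aborting if the whole response has not arrived within `timeoutMs`.
async function probeOnce(
  fetchImpl: FetchLike,
  url: string,
  cookie: string | null,
  timeoutMs: number,
): Promise<ProbeResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const started = Date.now();
  const headers: Record<string, string> = cookie ? { cookie } : {};
  try {
    const res = await fetchImpl(url, { headers, signal: controller.signal });
    await res.text(); // with real fetch, a body that stalls mid-stream is also aborted
    return { kind: "complete", status: res.status, ms: Date.now() - started };
  } catch (err) {
    const ms = Date.now() - started;
    // An abort triggered by our timer means the response never finished in time.
    if (controller.signal.aborted) return { kind: "timeout", ms };
    return { kind: "error", message: String(err), ms };
  } finally {
    clearTimeout(timer);
  }
}
```

During a hang window, run `probeOnce(fetch, url, cookie, 10_000)` and `probeOnce(fetch, url, null, 10_000)` side by side. If the behaviour matches this report, only the cookie-bearing request should come back as `timeout`.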

Expected behaviour

If the session KV lookup is slow or unavailable, EmDash's auth middleware should either:

  • Time out after a reasonable threshold (e.g. 3–5 seconds) and continue the request with locals.user = null (degrade gracefully — treat the user as unauthenticated for this request), or
  • Return a partial/fallback response rather than holding the response stream open indefinitely

The public page should always load. Admin toolbar features can safely be absent when the session cannot be confirmed quickly.


Actual behaviour

The response stream is never completed. The Worker holds the connection open until the client gives up. In the traces below, hung requests run for up to 170 seconds of wall time and end with outcome canceled, rather than being terminated at a fixed limit. The user sees an empty browser tab with a spinning loading indicator.


Secondary Error: sandboxed plugin fires on every request

A separate error log reveals a second issue firing on every page load across the entire site:

EmDash: Sandboxed plugin atproto page:metadata error: Error: Storage collection not declared: records

This occurs on every GET request to any post or page, continuously throughout the day. Because these log entries carry outcome: ok, the error is non-fatal and pages load successfully. However:

  1. The atproto marketplace plugin is invoking a Worker-to-Worker call (via the LOADER sandboxed plugin binding) on every single page request
  2. The plugin is misconfigured — it requires a storage collection named records which has not been declared
  3. Normally this fails fast and is swallowed, but if the sandboxed LOADER Worker is under load, this call could stall instead of failing — which may be a contributing factor to the intermittent hangs

Immediate fix: either configure the atproto plugin with the required records storage collection, or uninstall it from the EmDash marketplace if it is not in use. Eliminating this unnecessary Worker-to-Worker call on every request reduces overhead and removes a potential hang vector.
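Point 3 describes a plugin call that stalls instead of failing. That failure mode can be contained by giving each plugin hook its own deadline and letting the page render with whatever hooks finished. The following is a minimal sketch. It assumes plugin hooks can be modelled as async functions returning metadata fragments. `runHooksWithDeadline`, `Hook` and `HookResult` are hypothetical and are not EmDash's plugin API.

```typescript
type Hook<T> = { plugin: string; run: () => Promise<T> };

type HookResult<T> =
  | { plugin: string; status: "ok"; value: T }
  | { plugin: string; status: "failed"; reason: string };

// Reject if `p` has not settled within `ms`; the timer is always cleared.
function withDeadline<T>(p: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([p, deadline]).finally(() => clearTimeout(timer));
}

// Run every hook concurrently. A slow or throwing hook becomes a "failed" entry
// instead of blocking or aborting the page render.
async function runHooksWithDeadline<T>(hooks: Hook<T>[], ms: number): Promise<HookResult<T>[]> {
  const settled = await Promise.allSettled(
    // Promise.resolve().then(...) turns a synchronous throw into a rejection.
    hooks.map((h) => withDeadline(Promise.resolve().then(h.run), ms, h.plugin)),
  );
  return settled.map((s, i): HookResult<T> =>
    s.status === "fulfilled"
      ? { plugin: hooks[i].plugin, status: "ok", value: s.value }
      : { plugin: hooks[i].plugin, status: "failed", reason: String(s.reason) },
  );
}
```

`Promise.allSettled` keeps one misbehaving hook, such as one failing on an undeclared `records` collection, from affecting the others. The per-hook deadline bounds how long a stalled Worker-to-Worker call can delay the page.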


Log Evidence

Cloudflare Workers observability traces confirm two hang windows, both exclusively affecting a single device: iPhone running Chrome (CriOS/149, Houston TX). All other devices — iPhone Safari, Windows Chrome, bots, uptime monitors — load normally throughout both windows.

Window 1: ~10:08–10:10pm CDT (03:08–03:10 UTC)

Time (UTC) | Request | Wall time | CPU time | Outcome | User agent
03:08:49 | GET / | 60,550 ms | 88 ms | canceled | iPhone Chrome
03:09:16 | GET / | 4,100 ms | 6 ms | canceled | iPhone Chrome
03:09:21 | GET /_emdash/admin | 60,149 ms | 67 ms | canceled | iPhone Chrome
03:09:54 | GET / | 36,350 ms | 43 ms | canceled | iPhone Chrome
03:10:28 | GET / | 1,508 ms | 88 ms | ✓ ok | iPhone Safari
03:10:40 | GET / | 1,535 ms | 116 ms | ✓ ok | iPhone Chrome

The iPhone Chrome load at 03:10:40 succeeded immediately after the user cleared their browser cache (removing the session cookie).

Window 2: ~11:05–11:13pm CDT (04:05–04:13 UTC)

Time (UTC) | Request | Wall time | CPU time | Outcome | User agent
04:05:01 | GET / | 30,000 ms | 36 ms | canceled | iPhone Chrome
04:05:34 | GET / | 60,050 ms | 48 ms | canceled | iPhone Chrome
04:08:32 | GET /_emdash/admin/logout | 5,800 ms | 5 ms | canceled | iPhone Chrome
04:08:38 | GET /_emdash/admin/logout | 60,850 ms | 56 ms | canceled | iPhone Chrome
04:10:11 | GET /_emdash/admin/logout | 50,600 ms | 48 ms | canceled | iPhone Chrome
04:11:02 | GET /_emdash/admin/logout | 170,400 ms | 158 ms | canceled | iPhone Chrome
04:11:08 | GET / | 60,100 ms | 54 ms | canceled | iPhone Chrome
04:13:34 | GET / | 17,849 ms | 16 ms | canceled | iPhone Chrome

Critical pattern across all hung requests: CPU time is 5–158 ms while wall time is 4,100–170,400 ms. The Worker executes a small amount of code and then waits indefinitely on something that never completes, most likely an I/O call or other awaited promise with no timeout. This is not CPU exhaustion.
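The CPU-versus-wall-time pattern can be checked directly against the two tables above. The sketch below copies the figures for every canceled row and computes the share of wall time spent executing code. `cpuShare` is only an illustrative helper.

```typescript
// [wall ms, cpu ms] for every canceled request in Window 1 and Window 2.
const hung: Array<[number, number]> = [
  [60_550, 88], [4_100, 6], [60_149, 67], [36_350, 43],
  [30_000, 36], [60_050, 48], [5_800, 5], [60_850, 56],
  [50_600, 48], [170_400, 158], [60_100, 54], [17_849, 16],
];
// The two successful loads at 03:10:28 and 03:10:40, for comparison.
const ok: Array<[number, number]> = [[1_508, 88], [1_535, 116]];

// Fraction of wall-clock time the Worker actually spent on CPU.
const cpuShare = ([wall, cpu]: [number, number]): number => cpu / wall;

const maxHungShare = Math.max(...hung.map(cpuShare));
const minOkShare = Math.min(...ok.map(cpuShare));

console.log(`hung: CPU is at most ${(maxHungShare * 100).toFixed(2)}% of wall time`);
console.log(`ok:   CPU is at least ${(minOkShare * 100).toFixed(1)}% of wall time`);
```

Every hung request spent under 0.15% of its wall time on CPU. The two successful loads spent about 5.8% and 7.6%, consistent with the hung requests idling on a wait rather than computing.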

All hung requests have no child spans in the trace data — no D1 queries, no KV reads, no R2 operations were recorded. This is consistent with either (a) the Worker stalling before reaching any binding, or (b) a binding call (e.g. KV session read) hanging mid-execution — in which case its span would never be emitted. A normal successful GET / completes in ~1.2 seconds with ~28 child spans.
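One way to distinguish cases (a) and (b) is to record when each initialization phase starts and finishes. A stalled request can then report which phase it is stuck in, even though no binding span was emitted. The following is a minimal sketch. The `PhaseTracker` class and the phase names `session` and `plugins` are hypothetical stand-ins for EmDash's real initialization steps.

```typescript
// Records which named phases of a request have started and which have finished.
class PhaseTracker {
  private readonly started = new Map<string, number>();
  private readonly finished = new Set<string>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Wrap one phase; its start time is recorded synchronously, before `fn` runs.
  async track<T>(name: string, fn: () => Promise<T>): Promise<T> {
    this.started.set(name, this.now());
    try {
      return await fn();
    } finally {
      this.finished.add(name);
    }
  }

  // Phases that began but never completed, with how long each has been pending.
  pending(): Array<{ name: string; ms: number }> {
    const t = this.now();
    return [...this.started]
      .filter(([name]) => !this.finished.has(name))
      .map(([name, start]) => ({ name, ms: t - start }));
  }
}
```

In a Worker, a timer could log `tracker.pending()` after a few seconds. If that log line is captured for the still-running request, it would name the phase that never returned, for example `session` for a hung KV read or `plugins` for a stalled sandboxed call.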

Diagnosis

The symptom (the browser's tab spinner, not an in-page spinner) indicates that the HTTP response itself never starts arriving; this is not a client-side JavaScript failure. The issue:

  • Occurs only when a session cookie is present
  • Never occurs without a session cookie (private window, other browsers without a session, external uptime monitors)
  • Is not tied to one browser (observed in Chrome desktop and Chrome iOS)
  • Is intermittent (sometimes fast, sometimes hangs)

The trace evidence does not distinguish between the two cases above. A Worker that stalls before touching any binding and a KV session read (or D1 query) that never returns would both leave a request with no child spans. Either way, the stall happens early in the request, before any binding call completes, and consumes almost no CPU. This points to something in EmDash's request initialization path (module-level code, middleware setup, session validation, or sandboxed plugin bootstrap) that blocks indefinitely for certain requests, likely triggered by the presence of a session cookie.


Environment

  • Runtime: Cloudflare Workers (zone-routed, not workers.dev)
  • Session storage: Cloudflare KV (SESSION binding)
  • Database: Cloudflare D1
  • Astro output: server (full SSR, no static generation)

Impact

  • Affects only users with active admin sessions — public visitors are unaffected
  • During a hang window, the affected browser cannot load any page on the site while logged in. Switching to a private window works, but logging out does not, because requests to /_emdash/admin/logout hang as well (see Window 2).
  • Workaround: clear browser cookies/cache for the site, which removes the session cookie and restores normal loading
  • No data loss; the site is fully functional for visitors without a session cookie

Suggested fix

Because the traces cannot show where the stall occurs, the fix likely belongs in the request initialization path rather than in one specific binding call. A global request timeout guard would prevent indefinite hangs regardless of where in initialization the stall occurs:

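// Pseudocode: exact implementation depends on EmDash internals.
// Env and ExecutionContext are the Worker types from @cloudflare/workers-types;
// processRequest stands for EmDash's existing request handler.
const INIT_TIMEOUT_MS = 5_000;

async function handleRequest(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Response>((resolve) => {
    timer = setTimeout(() => resolve(new Response("Service unavailable", { status: 503 })), INIT_TIMEOUT_MS);
  });
  try {
    // Bounds the time until a Response object exists; a stall while streaming the body is not covered.
    return await Promise.race([processRequest(request, env, ctx), timeout]);
  } finally {
    clearTimeout(timer); // don't leave the timer pending once a response has been chosen
  }
}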

Alternatively, if the stall is in session validation specifically, a timeout with graceful fallback to unauthenticated state would prevent the hang while keeping the page functional for the visitor.
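The session-validation variant can be sketched minimally. Race the session lookup against a deadline, and treat the request as unauthenticated if the deadline wins or the lookup throws. `resolveUserWithTimeout`, `SessionUser` and the `lookup` callback are hypothetical. In an Astro middleware, the lookup would be whatever EmDash currently awaits to read the session from the SESSION KV binding, and the result would be assigned to `locals.user`.

```typescript
type SessionUser = { id: string; role: string };

const SESSION_TIMEOUT_MS = 3_000; // within the 3–5 s range suggested under Expected behaviour

// Resolve the session user, but never wait longer than `timeoutMs`.
// A timeout or a lookup error both degrade to `null` (treated as an anonymous visitor).
async function resolveUserWithTimeout(
  lookup: () => Promise<SessionUser | null>,
  timeoutMs: number = SESSION_TIMEOUT_MS,
  onFallback: (reason: string) => void = (r) => console.warn(`session fallback: ${r}`),
): Promise<SessionUser | null> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  try {
    const result = await Promise.race([lookup(), deadline]);
    if (result === "timeout") {
      onFallback(`lookup exceeded ${timeoutMs} ms`);
      return null;
    }
    return result;
  } catch (err) {
    onFallback(`lookup failed: ${String(err)}`);
    return null;
  } finally {
    clearTimeout(timer);
  }
}
```

The abandoned lookup is not cancelled; its eventual result is simply ignored. This matches the expected behaviour above: the public page renders without the admin toolbar instead of holding the response open.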
