QueryResource: don't resurrect disposed cache entries in next/error by jwatzman · Pull Request #5317 · facebook/relay

jwatzman · 2026-06-11T15:18:23Z

When a QueryResource cache entry is disposed while its fetch is still in flight, the next and error callbacks in _fetchAndSaveQuery call _getOrCreateCacheEntry, which silently recreates the entry in the LRU under the same cacheIdentifier. The resurrected entry has refcount 0 — no permanent retain was ever attached to it by QueryResource.retain, and no useEffect cleanup will ever fire. It outlives navigation, sits in the LRU until evicted by capacity, and gets reused on a later mount that happens to compute the same cacheIdentifier.
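
The resurrection can be reproduced in a few lines with a toy cache. This is a minimal sketch, not Relay's code: `TinyQueryCache`, `getOrCreate`, and the `inFlight` callback list are hypothetical stand-ins for QueryResource, `_getOrCreateCacheEntry`, and the pending network subscription.

```typescript
// Hypothetical, simplified model; not Relay's actual QueryResource.
type CacheEntry = { id: string; refCount: number; value: unknown };

class TinyQueryCache {
  private entries = new Map<string, CacheEntry>();

  // A missing entry is silently (re)created, even if it was disposed earlier.
  getOrCreate(id: string, value: unknown): CacheEntry {
    let entry = this.entries.get(id);
    if (entry == null) {
      entry = { id, refCount: 0, value };
      this.entries.set(id, entry);
    } else {
      entry.value = value;
    }
    return entry;
  }

  // Returns a disposer; the entry is removed when the last retain is released.
  retain(id: string): () => void {
    const entry = this.entries.get(id);
    if (entry == null) {
      throw new Error('no entry for ' + id);
    }
    entry.refCount += 1;
    return () => {
      entry.refCount -= 1;
      if (entry.refCount === 0) {
        this.entries.delete(id);
      }
    };
  }

  get(id: string): CacheEntry | undefined {
    return this.entries.get(id);
  }
}

const queryCache = new TinyQueryCache();
const inFlight: Array<(payload: unknown) => void> = [];

// The fetch starts: an entry is created and the `next` callback is pending.
queryCache.getOrCreate('query-1', 'pending');
inFlight.push(payload => {
  queryCache.getOrCreate('query-1', payload);
});

// The component mounts and unmounts before the payload arrives.
const release = queryCache.retain('query-1');
release();
const goneAfterDispose = queryCache.get('query-1') === undefined;

// The late payload lands and recreates the entry with nobody retaining it.
inFlight.forEach(deliver => deliver({ user: 'alice' }));
const zombie = queryCache.get('query-1');
const zombieRefCount = zombie === undefined ? -1 : zombie.refCount;
```

In this model `goneAfterDispose` is `true` and `zombieRefCount` is `0`: the entry is back in the cache, and no disposer exists that could remove it again.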

Reusing the zombie skips prepareWithIdentifier's _fetchAndSaveQuery path entirely: no environment.check, no fetch. By the time the zombie is reused, the store records uniquely retained by this query have likely been freed — released to the GC buffer when the original entry was disposed, then reclaimed by a later GC sweep triggered by an unrelated operation. The fragment reader walks into the data, hits a missing pageInfo or @connection-key tracker record (or any record selected only by this query), and returns undefined for the affected field. The application sees useLazyLoadQuery return a data ref whose fields are unexpectedly undefined, with no fetch, no error, and no suspension to recover.
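
To illustrate why a reused zombie yields undefined fields, here is a rough sketch with a toy store. `TinyStore`, `prepare`, and the `connection:feed` record ID are all hypothetical; Relay's real retain, GC buffer, and reader logic are considerably more involved.

```typescript
// Hypothetical toy store; not Relay's store, retain, or GC implementation.
type StoreRecord = { [field: string]: unknown };

class TinyStore {
  private records = new Map<string, StoreRecord>();
  private retainCounts = new Map<string, number>();

  publish(id: string, record: StoreRecord): void {
    this.records.set(id, record);
  }

  retain(ids: string[]): () => void {
    ids.forEach(id => this.retainCounts.set(id, (this.retainCounts.get(id) || 0) + 1));
    return () => {
      ids.forEach(id => this.retainCounts.set(id, (this.retainCounts.get(id) || 0) - 1));
    };
  }

  // Drops every record that no operation currently retains.
  gc(): void {
    const unretained: string[] = [];
    this.records.forEach((_record, id) => {
      if ((this.retainCounts.get(id) || 0) <= 0) unretained.push(id);
    });
    unretained.forEach(id => this.records.delete(id));
  }

  read(id: string, field: string): unknown {
    const record = this.records.get(id);
    return record === undefined ? undefined : record[field];
  }
}

const store = new TinyStore();
const entries = new Map<string, { rootId: string }>();
let fetchCount = 0;

function prepare(cacheId: string): { fetched: boolean; pageInfo: unknown } {
  const cached = entries.get(cacheId);
  if (cached !== undefined) {
    // Cache hit: no availability check and no fetch, just read the store.
    return { fetched: false, pageInfo: store.read(cached.rootId, 'pageInfo') };
  }
  fetchCount += 1;
  store.publish('connection:feed', { pageInfo: { hasNextPage: true } });
  entries.set(cacheId, { rootId: 'connection:feed' });
  return { fetched: true, pageInfo: store.read('connection:feed', 'pageInfo') };
}

const firstMount = prepare('query-1');
const releaseRecords = store.retain(['connection:feed']);

// Dispose: the entry goes away and its records become collectable.
entries.delete('query-1');
releaseRecords();

// A late payload resurrects the entry without retaining anything.
entries.set('query-1', { rootId: 'connection:feed' });

// An unrelated operation later triggers a GC sweep.
store.gc();

const secondMount = prepare('query-1');
```

Here `secondMount.fetched` is `false`, `secondMount.pageInfo` is `undefined`, and `fetchCount` stays at 1: the zombie short-circuits the fetch while the data it points at is gone.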

Several scenarios can dispose an entry while its fetch is in flight:

- React StrictMode, where the mount/cleanup/mount cycle of useLazyLoadQueryNode synchronously disposes the cache entry created during the initial render, then the maybeHiddenOrFastRefresh path triggers a forceUpdate that creates a new entry under a different cacheBreaker. The original fetch is still alive on the deduped observable; when it lands, the disposed entry is resurrected alongside the new one being updated. This is the most reliable trigger and is what made the bug observable to us.
- Navigating away mid-fetch. A user clicks into a route that cache-misses, the fetch starts, the user navigates away before it completes. The route unmounts, the cache entry is disposed. When the fetch lands, the disposed entry is resurrected.
- Concurrent rendering discarding a suspended subtree. A component throws a Promise from useLazyLoadQuery; React suspends. A parent transitions away before the fetch completes; the suspended render never commits. The temporary retain expires after TEMPORARY_RETAIN_DURATION_MS (5 minutes) and the entry is disposed. The fetch, started synchronously inside _fetchAndSaveQuery before the component suspended, continues to completion and resurrects the entry.
- Offscreen / fast refresh, which produce similar dispose-while-fetching cycles. The maybeHiddenOrFastRefresh comment in useLazyLoadQueryNode already calls these out as the rationale for its forceUpdate path.
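
The suspended-and-discarded scenario above can be sketched with a fake clock. Everything except the 5-minute `TEMPORARY_RETAIN_DURATION_MS` value is an assumption made for illustration: `FakeClock`, `temporaryRetain`, and the 6-minute fetch latency are hypothetical and only model the ordering of events.

```typescript
// Value quoted in the description; everything else here is a hypothetical model.
const TEMPORARY_RETAIN_DURATION_MS = 5 * 60 * 1000;

// Deterministic stand-in for real timers so the ordering is easy to follow.
class FakeClock {
  private now = 0;
  private timers: Array<{ at: number; fn: () => void }> = [];

  setTimeout(fn: () => void, ms: number): void {
    this.timers.push({ at: this.now + ms, fn });
  }

  advance(ms: number): void {
    this.now += ms;
    const due = this.timers.filter(timer => timer.at <= this.now);
    this.timers = this.timers.filter(timer => timer.at > this.now);
    due.forEach(timer => timer.fn());
  }
}

const clock = new FakeClock();
const lruEntries = new Map<string, { payload: unknown }>();
const eventLog: string[] = [];

// Disposes the entry once the temporary retain expires without a commit.
function temporaryRetain(id: string): void {
  clock.setTimeout(() => {
    if (lruEntries.delete(id)) eventLog.push('disposed ' + id);
  }, TEMPORARY_RETAIN_DURATION_MS);
}

// Render: the fetch starts, the entry is created, the component suspends.
lruEntries.set('query-1', { payload: null });
temporaryRetain('query-1');

// Illustrative slow fetch whose payload lands after the retain has expired.
clock.setTimeout(() => {
  lruEntries.set('query-1', { payload: { ok: true } });
  eventLog.push('payload landed');
}, 6 * 60 * 1000);

clock.advance(TEMPORARY_RETAIN_DURATION_MS); // render never committed
const disposedAfterExpiry = !lruEntries.has('query-1');

clock.advance(60 * 1000); // the fetch completes
const resurrectedAfterFetch = lruEntries.has('query-1');
```

Both flags end up `true`: the entry is disposed when the retain expires, then recreated when the payload lands a minute later.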

In all of these, the dispose path in createCacheEntry's SuspenseResource only unsubscribes the network observable for live queries; for regular queries the subscription stays alive precisely so late payloads can land. That is what allows the next/error callbacks to fire on a disposed entry. Cancelling the subscription on dispose is not the right fix — the fetch is deduped via fetchQueryDeduped and may also be attached to a live cache entry sharing the same fetch identifier (most obviously the post-forceUpdate entry in the StrictMode case), which still needs to receive its next.
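
A toy comparison may help show why cancelling on dispose starves the live entry. This sketch assumes a single shared fetch with two attached entries; `SharedFetch`, `run`, and the `query-1:breaker-N` identifiers are hypothetical and do not reflect how `fetchQueryDeduped` is actually implemented.

```typescript
// Hypothetical model of one in-flight fetch feeding two cache entries.
type Listener = (payload: string) => void;
type Strategy = 'none' | 'cancel-on-dispose' | 'guard-in-callback';

class SharedFetch {
  private listeners: Listener[] = [];
  private cancelled = false;

  attach(listener: Listener): void {
    this.listeners.push(listener);
  }

  cancel(): void {
    this.cancelled = true;
  }

  land(payload: string): void {
    if (this.cancelled) return;
    this.listeners.forEach(listener => listener(payload));
  }
}

function run(strategy: Strategy): Map<string, string> {
  const cacheEntries = new Map<string, string>();
  const disposed = new Set<string>();
  const sharedFetch = new SharedFetch();

  const attachEntry = (id: string): void => {
    cacheEntries.set(id, 'pending');
    sharedFetch.attach(payload => {
      // The guard skips only the write for an entry that was disposed.
      if (strategy === 'guard-in-callback' && disposed.has(id)) return;
      cacheEntries.set(id, payload);
    });
  };

  // StrictMode-like sequence: the first entry is disposed, the second is live.
  attachEntry('query-1:breaker-0');
  attachEntry('query-1:breaker-1');
  cacheEntries.delete('query-1:breaker-0');
  disposed.add('query-1:breaker-0');
  if (strategy === 'cancel-on-dispose') sharedFetch.cancel();

  sharedFetch.land('data');
  return cacheEntries;
}

const unguarded = run('none');
const cancelled = run('cancel-on-dispose');
const guarded = run('guard-in-callback');
```

With `'none'` the disposed `breaker-0` entry reappears; with `'cancel-on-dispose'` the live `breaker-1` entry stays `'pending'` forever; with `'guard-in-callback'` only the live entry receives `'data'`.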

The fix is to make next and error bail when the cache entry has been disposed, while still allowing them to create the entry on a first-time synchronous fire. The store still receives the data via the normalizer (execute.normalize.*), so any live entry for the same fetch is updated correctly; only the zombie write is skipped.

A naive this._cache.get(cacheIdentifier) == null → bail check conflates two different states: "the entry has been disposed" (the zombie case we want to suppress) and "the entry has not yet been created" (a synchronous observable that fires next/error during the .subscribe() call, before the post-subscribe creation block at the bottom of _fetchAndSaveQuery runs). The latter happens with cached or mocked transports — network-only with a synchronous fetch observable is the canonical case, and the test suite exercises it directly. Bailing there would drop the only payload the entry will ever see, leaving the post-subscribe block to cache a networkPromise that never resolves; the component then suspends indefinitely on success, or sees a Promise instead of the Error on failure.
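
The failure mode of the naive check is easy to see in isolation. The sketch below is a hypothetical reduction, not the real `_fetchAndSaveQuery`: `fetchAndSaveNaive` and the `{ kind: 'promise' }` placeholder stand in for the post-subscribe block that caches a `networkPromise`.

```typescript
// Hypothetical reduction of the naive "missing entry means disposed" check.
type NaiveEntry = { kind: 'value'; value: string } | { kind: 'promise' };
type SubscribeNext = (next: (value: string) => void) => void;

function fetchAndSaveNaive(
  cache: Map<string, NaiveEntry>,
  id: string,
  subscribe: SubscribeNext,
): void {
  subscribe(value => {
    // Naive bail: cannot tell "disposed" apart from "not created yet".
    if (!cache.has(id)) return;
    cache.set(id, { kind: 'value', value });
  });

  // Post-subscribe creation block: caches a placeholder for the pending fetch.
  if (!cache.has(id)) {
    cache.set(id, { kind: 'promise' });
  }
}

const naiveCache = new Map<string, NaiveEntry>();

// A synchronous source fires `next` during the subscribe call itself.
const synchronousSource: SubscribeNext = next => next('cached payload');

fetchAndSaveNaive(naiveCache, 'q', synchronousSource);
const stuck = naiveCache.get('q');
const stuckKind = stuck === undefined ? 'missing' : stuck.kind;
```

`stuckKind` is `'promise'`: the only payload was dropped, and the placeholder left in the cache has nothing that will ever settle it.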

To distinguish the two states, _fetchAndSaveQuery tracks a local entryHasBeenCreated flag, flipped to true at each site that populates the LRU for this cacheIdentifier (the shouldAllowRender block, the first-fire branch inside next/error, and the post-subscribe networkPromise creation). The callbacks bail only when the entry is missing and was previously created. First-time synchronous fires fall through to _getOrCreateCacheEntry exactly as before.

For an error arriving after the entry has been disposed, the error is dropped instead of being cached for later re-throw. This is correct: the entry has no useEffect retainers, so there is no consumer to read a cached error. A fresh mount with the same cacheIdentifier will trigger a new fetch and observe the error (or success) on its own.
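
Putting the flag and the dropped-error behaviour together, a minimal sketch might look like the following. It is an assumption-laden model rather than the patch itself: `fetchAndSave`, `Sink`, and `entryWasDisposed` are hypothetical names, and the `shouldAllowRender` creation site is omitted for brevity.

```typescript
// Hypothetical model of the guarded callbacks; not the actual Relay patch.
type Entry =
  | { kind: 'value'; value: string }
  | { kind: 'error'; error: Error }
  | { kind: 'promise' };
type Sink = { next: (value: string) => void; error: (error: Error) => void };
type Subscribe = (sink: Sink) => void;

function fetchAndSave(cache: Map<string, Entry>, id: string, subscribe: Subscribe): void {
  // Flipped to true at every site that populates the cache for this id.
  let entryHasBeenCreated = false;
  const entryWasDisposed = (): boolean => entryHasBeenCreated && !cache.has(id);

  subscribe({
    next: value => {
      if (entryWasDisposed()) return; // skip only the zombie write
      cache.set(id, { kind: 'value', value });
      entryHasBeenCreated = true;
    },
    error: error => {
      if (entryWasDisposed()) return; // late error is dropped, not cached
      cache.set(id, { kind: 'error', error });
      entryHasBeenCreated = true;
    },
  });

  // Post-subscribe creation for fetches that did not fire synchronously.
  if (!cache.has(id)) {
    cache.set(id, { kind: 'promise' });
    entryHasBeenCreated = true;
  }
}

// 1. First-time synchronous fire still creates the entry.
const syncCache = new Map<string, Entry>();
fetchAndSave(syncCache, 'q', sink => sink.next('sync payload'));
const syncEntry = syncCache.get('q');

// 2. A late `next` after dispose does not resurrect the entry.
const lateCache = new Map<string, Entry>();
const pendingSinks: Sink[] = [];
fetchAndSave(lateCache, 'q', sink => {
  pendingSinks.push(sink);
});
lateCache.delete('q');
pendingSinks.forEach(sink => sink.next('late payload'));
const resurrectedByNext = lateCache.has('q');

// 3. A late `error` after dispose is dropped as well.
const errorCache = new Map<string, Entry>();
const pendingErrorSinks: Sink[] = [];
fetchAndSave(errorCache, 'q', sink => {
  pendingErrorSinks.push(sink);
});
errorCache.delete('q');
pendingErrorSinks.forEach(sink => sink.error(new Error('late failure')));
const resurrectedByError = errorCache.has('q');
```

In this model the synchronous case ends with a `'value'` entry, while both late callbacks leave the cache empty, so a later mount with the same identifier would start a fresh fetch.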

The StrictMode case is easy to reproduce and likely accounts for most of the dev-time pain. The prod-side triggers (mid-fetch navigation, suspended-and-discarded renders) are observed in the wild but rare: they leave a zombie behind only if the user later revisits a mount that computes the same cacheIdentifier, and the records the query uniquely retains have been GC'd in the meantime. Most apps don't notice because most records are co-retained by multiple operations and survive GC regardless. Apps with at least one query that uniquely retains some records (e.g. a @connection(key: ...) selection whose records aren't selected by any other live query — common in chat-style features) will eventually hit it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jwatzman · 2026-06-11T15:19:50Z

This is the result of a debugging session with Claude, guided by me, trying to find the source of a bug in a real production app. Unfortunately due to the nature, it's kind of hard to distil down the real example beyond the test case here.

This PR was written by Claude in its entirety. I have read it and think it makes sense, but am not really in a position to carefully review it. I don't know what your policy on AI contributions is -- I will take no offence if you close this without reading it due to the AI nature.

meta-cla Bot added the CLA Signed label Jun 11, 2026

jwatzman wants to merge 1 commit into facebook:main from jwatzman:cache-slot
