QueryResource: don't resurrect disposed cache entries in next/error#5317
Open
jwatzman wants to merge 1 commit into
Open
QueryResource: don't resurrect disposed cache entries in next/error#5317jwatzman wants to merge 1 commit into
jwatzman wants to merge 1 commit into
Conversation
When a QueryResource cache entry is disposed while its fetch is still in flight, the `next` and `error` callbacks in `_fetchAndSaveQuery` call `_getOrCreateCacheEntry`, which silently recreates the entry in the LRU under the same cacheIdentifier. The resurrected entry has refcount 0 — no permanent retain was ever attached to it by `QueryResource.retain`, and no useEffect cleanup will ever fire. It outlives navigation, sits in the LRU until evicted by capacity, and gets reused on a later mount that happens to compute the same cacheIdentifier. Reusing the zombie skips `prepareWithIdentifier`'s `_fetchAndSaveQuery` path entirely: no `environment.check`, no fetch. By the time the zombie is reused, the store records uniquely retained by this query have likely been freed — released to the GC buffer when the original entry was disposed, then reclaimed by a later GC sweep triggered by an unrelated operation. The fragment reader walks into the data, hits a missing pageInfo or `@connection`-key tracker record (or any record selected only by this query), and returns `undefined` for the affected field. The application sees `useLazyLoadQuery` return a data ref whose fields are unexpectedly undefined, with no fetch, no error, and no suspension to recover. Several scenarios can dispose an entry while its fetch is in flight: - **React StrictMode**, where the mount/cleanup/mount cycle of `useLazyLoadQueryNode` synchronously disposes the cache entry created during the initial render, then the `maybeHiddenOrFastRefresh` path triggers a `forceUpdate` that creates a new entry under a different cacheBreaker. The original fetch is still alive on the deduped observable; when it lands, the disposed entry is resurrected alongside the new one being updated. This is the most reliable trigger and is what made the bug observable to us. - **Navigating away mid-fetch**. A user clicks into a route that cache-misses, the fetch starts, the user navigates away before it completes. The route unmounts, the cache entry is disposed. When the fetch lands, the disposed entry is resurrected. - **Concurrent rendering discarding a suspended subtree**. A component throws a Promise from `useLazyLoadQuery`; React suspends. A parent transitions away before the fetch completes; the suspended render never commits. The temporary retain expires after `TEMPORARY_RETAIN_DURATION_MS` (5 minutes) and the entry is disposed. The fetch — started synchronously inside `_fetchAndSaveQuery` *before* the component suspended — continues to completion and resurrects the entry. - **Offscreen / fast refresh**, which produce similar dispose-while- fetching cycles. The `maybeHiddenOrFastRefresh` comment in `useLazyLoadQueryNode` already calls these out as the rationale for its forceUpdate path. In all of these, the dispose path in `createCacheEntry`'s `SuspenseResource` only unsubscribes the network observable for live queries; for regular queries the subscription stays alive precisely so late payloads can land. That is what allows the `next`/`error` callbacks to fire on a disposed entry. Cancelling the subscription on dispose is not the right fix — the fetch is deduped via `fetchQueryDeduped` and may also be attached to a *live* cache entry sharing the same fetch identifier (most obviously the post-`forceUpdate` entry in the StrictMode case), which still needs to receive its `next`. The fix is to make `next` and `error` bail when the cache entry has been disposed, while still allowing them to *create* the entry on a first-time synchronous fire. The store still receives the data via the normalizer (`execute.normalize.*`), so any live entry for the same fetch is updated correctly; only the zombie write is skipped. A naive `this._cache.get(cacheIdentifier) == null → bail` check conflates two different states: "the entry has been disposed" (the zombie case we want to suppress) and "the entry has not yet been created" (a synchronous observable that fires `next`/`error` *during* the `.subscribe()` call, before the post-subscribe creation block at the bottom of `_fetchAndSaveQuery` runs). The latter happens with cached or mocked transports — `network-only` with a synchronous fetch observable is the canonical case, and the test suite exercises it directly. Bailing there would drop the only payload the entry will ever see, leaving the post-subscribe block to cache a `networkPromise` that never resolves; the component then suspends indefinitely on success, or sees a Promise instead of the Error on failure. To distinguish the two states, `_fetchAndSaveQuery` tracks a local `entryHasBeenCreated` flag, flipped to true at each site that populates the LRU for this `cacheIdentifier` (the `shouldAllowRender` block, the first-fire branch inside `next`/`error`, and the post-subscribe `networkPromise` creation). The callbacks bail only when the entry is missing *and* was previously created. First-time synchronous fires fall through to `_getOrCreateCacheEntry` exactly as before. For an `error` arriving after the entry has been disposed, the error is dropped instead of being cached for later re-throw. This is correct: the entry has no useEffect retainers, so there is no consumer to read a cached error. A fresh mount with the same cacheIdentifier will trigger a new fetch and observe the error (or success) on its own. The StrictMode case is easy to reproduce and likely accounts for most of the dev-time pain. The prod-side triggers (mid-fetch navigation, suspended-and-discarded renders) are observed in the wild but rare: they leave a zombie behind only if the user later revisits a mount that computes the same cacheIdentifier, *and* the records the query uniquely retains have been GC'd in the meantime. Most apps don't notice because most records are co-retained by multiple operations and survive GC regardless. Apps with at least one query that uniquely retains some records (e.g. a `@connection(key: ...)` selection whose records aren't selected by any other live query — common in chat-style features) will eventually hit it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author
|
This is the result of a debugging session with Claude, guided by me, trying to find the source of a bug in a real production app. Unfortunately due to the nature, it's kind of hard to distil down the real example beyond the test case here. This PR was written by Claude in its entirety. I have read it and think it makes sense, but am not really in a position to carefully review it. I don't know what your policy on AI contributions is -- I will take no offence if you close this without reading it due to the AI nature. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When a QueryResource cache entry is disposed while its fetch is still in flight, the
nextanderrorcallbacks in_fetchAndSaveQuerycall_getOrCreateCacheEntry, which silently recreates the entry in the LRU under the same cacheIdentifier. The resurrected entry has refcount 0 — no permanent retain was ever attached to it byQueryResource.retain, and no useEffect cleanup will ever fire. It outlives navigation, sits in the LRU until evicted by capacity, and gets reused on a later mount that happens to compute the same cacheIdentifier.Reusing the zombie skips
prepareWithIdentifier's_fetchAndSaveQuerypath entirely: noenvironment.check, no fetch. By the time the zombie is reused, the store records uniquely retained by this query have likely been freed — released to the GC buffer when the original entry was disposed, then reclaimed by a later GC sweep triggered by an unrelated operation. The fragment reader walks into the data, hits a missing pageInfo or@connection-key tracker record (or any record selected only by this query), and returnsundefinedfor the affected field. The application seesuseLazyLoadQueryreturn a data ref whose fields are unexpectedly undefined, with no fetch, no error, and no suspension to recover.Several scenarios can dispose an entry while its fetch is in flight:
React StrictMode, where the mount/cleanup/mount cycle of
useLazyLoadQueryNodesynchronously disposes the cache entry created during the initial render, then themaybeHiddenOrFastRefreshpath triggers aforceUpdatethat creates a new entry under a different cacheBreaker. The original fetch is still alive on the deduped observable; when it lands, the disposed entry is resurrected alongside the new one being updated. This is the most reliable trigger and is what made the bug observable to us.Navigating away mid-fetch. A user clicks into a route that cache-misses, the fetch starts, the user navigates away before it completes. The route unmounts, the cache entry is disposed. When the fetch lands, the disposed entry is resurrected.
Concurrent rendering discarding a suspended subtree. A component throws a Promise from
useLazyLoadQuery; React suspends. A parent transitions away before the fetch completes; the suspended render never commits. The temporary retain expires afterTEMPORARY_RETAIN_DURATION_MS(5 minutes) and the entry is disposed. The fetch — started synchronously inside_fetchAndSaveQuerybefore the component suspended — continues to completion and resurrects the entry.Offscreen / fast refresh, which produce similar dispose-while- fetching cycles. The
maybeHiddenOrFastRefreshcomment inuseLazyLoadQueryNodealready calls these out as the rationale for its forceUpdate path.In all of these, the dispose path in
createCacheEntry'sSuspenseResourceonly unsubscribes the network observable for live queries; for regular queries the subscription stays alive precisely so late payloads can land. That is what allows thenext/errorcallbacks to fire on a disposed entry. Cancelling the subscription on dispose is not the right fix — the fetch is deduped viafetchQueryDedupedand may also be attached to a live cache entry sharing the same fetch identifier (most obviously the post-forceUpdateentry in the StrictMode case), which still needs to receive itsnext.The fix is to make
nextanderrorbail when the cache entry has been disposed, while still allowing them to create the entry on a first-time synchronous fire. The store still receives the data via the normalizer (execute.normalize.*), so any live entry for the same fetch is updated correctly; only the zombie write is skipped.A naive
this._cache.get(cacheIdentifier) == null → bailcheck conflates two different states: "the entry has been disposed" (the zombie case we want to suppress) and "the entry has not yet been created" (a synchronous observable that firesnext/errorduring the.subscribe()call, before the post-subscribe creation block at the bottom of_fetchAndSaveQueryruns). The latter happens with cached or mocked transports —network-onlywith a synchronous fetch observable is the canonical case, and the test suite exercises it directly. Bailing there would drop the only payload the entry will ever see, leaving the post-subscribe block to cache anetworkPromisethat never resolves; the component then suspends indefinitely on success, or sees a Promise instead of the Error on failure.To distinguish the two states,
_fetchAndSaveQuerytracks a localentryHasBeenCreatedflag, flipped to true at each site that populates the LRU for thiscacheIdentifier(theshouldAllowRenderblock, the first-fire branch insidenext/error, and the post-subscribenetworkPromisecreation). The callbacks bail only when the entry is missing and was previously created. First-time synchronous fires fall through to_getOrCreateCacheEntryexactly as before.For an
errorarriving after the entry has been disposed, the error is dropped instead of being cached for later re-throw. This is correct: the entry has no useEffect retainers, so there is no consumer to read a cached error. A fresh mount with the same cacheIdentifier will trigger a new fetch and observe the error (or success) on its own.The StrictMode case is easy to reproduce and likely accounts for most of the dev-time pain. The prod-side triggers (mid-fetch navigation, suspended-and-discarded renders) are observed in the wild but rare: they leave a zombie behind only if the user later revisits a mount that computes the same cacheIdentifier, and the records the query uniquely retains have been GC'd in the meantime. Most apps don't notice because most records are co-retained by multiple operations and survive GC regardless. Apps with at least one query that uniquely retains some records (e.g. a
@connection(key: ...)selection whose records aren't selected by any other live query — common in chat-style features) will eventually hit it.