fix(cluster): recover sharded pubsub topology after node reconnects by PavelPashov · Pull Request #3223 · redis/node-redis

PavelPashov · 2026-04-09T12:29:00Z

Description

Describe your pull request here

Checklist

Does npm test pass with this change (including linting)?
Is the new or changed code fully tested?
Is a documentation update included (if this change modifies existing APIs, or introduces new ones)?

Note

Medium Risk
Changes core cluster reconnection and rediscovery behavior, which can affect topology refresh frequency and node selection during instability. Guardrails exist (strategy validation, refresh de-duplication), but the behavior is new and exercised mainly via tests.

Overview
Improves cluster resilience by triggering background topology rediscovery after post-ready node reconnection attempts. This uses a new ClusterReconnectionTracker and a configurable topologyRefreshOnReconnectionAttemptStrategy option (default 5s; false or 0 disables it; a function is also supported), with refresh de-duplication and exclusion of currently reconnecting node addresses.
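
To make the option's accepted values concrete, here is a minimal sketch of how a number, false/0, or function strategy might be resolved into a refresh threshold. The helper name `resolveRefreshThreshold`, the `RefreshStrategy` type, the function's argument, and the millisecond unit are all assumptions for illustration; none of them is taken from the node-redis source.

```typescript
// Hypothetical types and helper; not the node-redis implementation.
type RefreshStrategy =
  | number
  | false
  | ((reconnectingForMs: number) => number | false);

// Mirrors the "default 5s" mentioned in the overview (unit assumed to be ms).
const DEFAULT_THRESHOLD_MS = 5_000;

// Returns the threshold in ms after which a topology refresh would be
// triggered, or undefined when refresh-on-reconnect is disabled.
function resolveRefreshThreshold(
  strategy: RefreshStrategy | undefined,
  reconnectingForMs: number
): number | undefined {
  const value =
    typeof strategy === "function"
      ? strategy(reconnectingForMs)
      : strategy ?? DEFAULT_THRESHOLD_MS;

  // false and 0 both disable the refresh, as the overview describes.
  if (value === false || value === 0) return undefined;

  // Strategy validation: reject negative or non-finite results.
  if (!Number.isFinite(value) || value < 0) {
    throw new TypeError(`Invalid topology refresh strategy result: ${value}`);
  }
  return value;
}
```

Under these assumptions, `resolveRefreshThreshold(undefined, 0)` falls back to the default, while `false` and `0` both yield `undefined`, so a caller can treat "disabled" as a single case.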

RedisClusterSlots now tracks reconnecting node clients via reconnecting/ready/end events, clears tracking on destroy/unsubscribe/node removal, and expands rediscovery to try known nodes first (optionally excluding addresses) before falling back to root nodes.
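
The event-based tracking and candidate ordering described above can be sketched roughly as follows. This is an illustrative stand-in, not the PR's code: the class `ReconnectingAddresses` and its methods are hypothetical, and only the event names (reconnecting, ready, end) and the "known nodes first, then root nodes" ordering come from the description.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical tracker; names are illustrative, not from node-redis.
class ReconnectingAddresses {
  private readonly addresses = new Set<string>();
  private readonly detachers = new Map<string, () => void>();

  // Start tracking a node client by listening to its lifecycle events.
  track(address: string, client: EventEmitter): void {
    this.untrack(address);
    const onReconnecting = () => { this.addresses.add(address); };
    const onSettled = () => { this.addresses.delete(address); };
    client.on("reconnecting", onReconnecting);
    client.on("ready", onSettled);
    client.on("end", onSettled);
    this.detachers.set(address, () => {
      client.off("reconnecting", onReconnecting);
      client.off("ready", onSettled);
      client.off("end", onSettled);
      this.addresses.delete(address);
    });
  }

  // Clear tracking, e.g. on destroy, unsubscribe, or node removal.
  untrack(address: string): void {
    this.detachers.get(address)?.();
    this.detachers.delete(address);
  }

  // Known nodes first (skipping those mid-reconnect), then root nodes.
  discoveryOrder(knownNodes: string[], rootNodes: string[]): string[] {
    const usable = knownNodes.filter(a => !this.addresses.has(a));
    return [...usable, ...rootNodes];
  }
}
```

Removing the listeners in `untrack` matters in this sketch: without it, a client that has been dropped from the topology could re-add its address to the excluded set on a later reconnecting event.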

Tests are strengthened with a full ClusterReconnectionTracker unit suite and a major refactor/expansion of sharded Pub/Sub E2E scenarios to cover multiple failure modes (failover, node/proxy/shard failures) and recovery delivery expectations; small typing/lint tweaks accompany this.

Reviewed by Cursor Bugbot for commit 6aae3fd. Bugbot is set up for automated code reviews on this repo.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 6008ca9.

cursor · 2026-04-30T07:36:34Z

+
+      if (!this.#isOpen) {
+        continue;
+      }


Loop uses continue instead of break when closed

Low Severity

When !this.#isOpen in #discoverWithKnownNodeCandidates, the code uses continue instead of break. This causes the loop to spin through all remaining candidates doing nothing, rather than exiting immediately. More importantly, this is inconsistent with #discoverWithRootNodes which throws 'Cluster closed' on the same check. The continue semantically implies "skip this candidate and try the next," but the actual intent is "stop trying because the cluster is closed."
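
The difference the comment points at can be shown with a reduced loop. This is a sketch under assumptions: the function `discoverWithCandidates`, its parameters, and the position of the closed check at the top of the loop body are hypothetical and only mimic the shape of the reviewed code.

```typescript
// Hypothetical reduction of the loop under review; names are illustrative.
function discoverWithCandidates(
  candidates: string[],
  isOpen: () => boolean,
  probe: (address: string) => boolean,
  onClosed: "continue" | "break"
): { discovered: boolean; iterations: number } {
  let iterations = 0;
  for (const address of candidates) {
    iterations++;
    if (!isOpen()) {
      // "continue" keeps visiting every remaining candidate without doing
      // any work; "break" leaves the loop as soon as the cluster is closed.
      if (onClosed === "break") break;
      continue;
    }
    if (probe(address)) return { discovered: true, iterations };
  }
  return { discovered: false, iterations };
}
```

With four candidates and a cluster that closes during the first probe, the continue variant still passes through all four iterations, while the break variant stops on the second. Neither variant probes a node after the close, so the observable difference is the wasted iterations and the inconsistency with the throwing behavior of #discoverWithRootNodes noted above.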


nkaradzhov mentioned this pull request Apr 27, 2026

Redis Cluster Doesn't Recover from Node Changes #3252

Open

PavelPashov marked this pull request as ready for review April 29, 2026 08:22

PavelPashov added 5 commits April 29, 2026 11:27

fix(cluster): recover sharded pubsub topology after node reconnects

80c2b40

fix: prefer ready known nodes during topology recovery

88e8ff6

test: expand sharded pubsub E2E recovery coverage

04a4f25

refactor: simplify topology recovery candidate iteration

e424324

feat: make cluster reconnect topology refresh configurable

4f1b8f2

cursor Bot reviewed Apr 29, 2026


Comment thread packages/client/lib/cluster/cluster-slots.ts Outdated

PavelPashov added 2 commits April 29, 2026 13:29

chore: fix lint

f7da797

docs(cluster): clarify reconnect topology refresh threshold

8281293

PavelPashov force-pushed the fix/cluster-sharded-pubsub-topology-recovery branch from 3cb7843 to 8281293 Compare April 29, 2026 10:33

refactor: extract reconnection topology refresh tracking

e5bda76

cursor Bot reviewed Apr 29, 2026


Comment thread packages/client/lib/cluster/cluster-slots.spec.ts Outdated

fix: remove incorrect cluster slots tests

6008ca9

cursor Bot reviewed Apr 30, 2026


fix: refine reconnection refresh strategy

6aae3fd


PavelPashov wants to merge 10 commits into redis:master from PavelPashov:fix/cluster-sharded-pubsub-topology-recovery
