
fix(cluster): recover sharded pubsub topology after node reconnects#3223

Open
PavelPashov wants to merge 10 commits into redis:master from PavelPashov:fix/cluster-sharded-pubsub-topology-recovery

@PavelPashov PavelPashov commented Apr 9, 2026

Description

Describe your pull request here


Checklist

  • Does npm test pass with this change (including linting)?
  • Is the new or changed code fully tested?
  • Is a documentation update included (if this change modifies existing APIs, or introduces new ones)?

Note

Medium Risk
Changes core cluster reconnection and rediscovery behavior, which can affect topology refresh frequency and node selection during instability. Guardrails exist (strategy validation, refresh de-duplication), but the behavior is new and exercised mainly via tests.

Overview
Improves cluster resilience by triggering background topology rediscovery after post-ready node reconnection attempts. This is driven by a new ClusterReconnectionTracker and a configurable topologyRefreshOnReconnectionAttemptStrategy option (default 5s; false or 0 disables it; a function is also supported). Refreshes are de-duplicated, and the addresses of currently reconnecting nodes are excluded from rediscovery.
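
To make the option semantics above concrete, here is a minimal sketch of how a strategy value could be resolved into a delay, and how concurrent refresh requests could be collapsed into one. Only the option name, the 5s default, and the false/0 behavior come from the overview; the `RefreshStrategy` type, the `(attempt) => number | false` function signature, `resolveRefreshDelay`, and `RefreshDeduplicator` are hypothetical names chosen for illustration and are not taken from the PR.

```typescript
// Hypothetical shape: the PR names the option, but this summary does not show its signature.
type RefreshStrategy = number | false | ((attempt: number) => number | false);

// "default 5s" per the overview above.
const DEFAULT_REFRESH_DELAY_MS = 5000;

// Resolves the delay (ms) before a background refresh; undefined means "disabled".
function resolveRefreshDelay(
  strategy: RefreshStrategy | undefined,
  attempt: number
): number | undefined {
  let value: number | false;
  if (strategy === undefined) value = DEFAULT_REFRESH_DELAY_MS;
  else if (typeof strategy === 'function') value = strategy(attempt);
  else value = strategy;

  // false and 0 both disable the refresh.
  if (value === false || value === 0) return undefined;
  // Strategy validation (assumed rule): reject negative or non-finite delays.
  if (!Number.isFinite(value) || value < 0) {
    throw new TypeError(`Invalid refresh delay: ${String(value)}`);
  }
  return value;
}

// Refresh de-duplication: concurrent requests share one in-flight refresh.
class RefreshDeduplicator {
  private inFlight: Promise<void> | undefined;
  private readonly refresh: () => Promise<void>;

  constructor(refresh: () => Promise<void>) {
    this.refresh = refresh;
  }

  request(): Promise<void> {
    if (this.inFlight === undefined) {
      // Clear the slot once the refresh settles so a later request starts a new one.
      this.inFlight = this.refresh().finally(() => {
        this.inFlight = undefined;
      });
    }
    return this.inFlight;
  }
}
```

Under these assumptions, several nodes reconnecting at once would all call `request()` and receive the same promise, so at most one rediscovery runs at a time.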

RedisClusterSlots now tracks reconnecting node clients via reconnecting/ready/end events, clears tracking on destroy/unsubscribe/node removal, and expands rediscovery to try known nodes first (optionally excluding addresses) before falling back to root nodes.
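
The candidate ordering described above (known nodes first, minus excluded addresses, then root nodes) can be sketched roughly as follows. `NodeAddress`, `addressKey`, and `orderDiscoveryCandidates` are hypothetical names, and the choice to leave root nodes unfiltered by the exclusion set is an assumption; the summary does not say how the PR treats excluded addresses during the root-node fallback.

```typescript
// Hypothetical address type; the PR's own node representation is not shown here.
interface NodeAddress {
  host: string;
  port: number;
}

const addressKey = (node: NodeAddress): string => `${node.host}:${node.port}`;

// Builds the probe order: known nodes (minus excluded addresses) first,
// then root nodes that have not already been listed.
function orderDiscoveryCandidates(
  knownNodes: readonly NodeAddress[],
  rootNodes: readonly NodeAddress[],
  excluded: ReadonlySet<string> = new Set<string>()
): NodeAddress[] {
  const seen = new Set<string>();
  const ordered: NodeAddress[] = [];

  for (const node of knownNodes) {
    const key = addressKey(node);
    // Skip nodes that are currently reconnecting, and duplicates.
    if (excluded.has(key) || seen.has(key)) continue;
    seen.add(key);
    ordered.push(node);
  }

  // Assumption: root nodes are a last resort, so the exclusion set is not applied to them.
  for (const node of rootNodes) {
    const key = addressKey(node);
    if (seen.has(key)) continue;
    seen.add(key);
    ordered.push(node);
  }

  return ordered;
}
```

Skipping reconnecting addresses avoids spending a discovery attempt on a node that is already known to be unreachable, while the root-node fallback keeps a way back in if every known node is excluded or fails.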

Tests are strengthened with a full ClusterReconnectionTracker unit suite and a major refactor and expansion of the sharded Pub/Sub E2E scenarios, which now cover multiple failure modes (failover, node/proxy/shard failures) and the expected message delivery after recovery. Small typing and lint tweaks accompany these changes.

Reviewed by Cursor Bugbot for commit 6aae3fd.

@PavelPashov PavelPashov marked this pull request as ready for review April 29, 2026 08:22
Comment thread packages/client/lib/cluster/cluster-slots.ts Outdated
@PavelPashov PavelPashov force-pushed the fix/cluster-sharded-pubsub-topology-recovery branch from 3cb7843 to 8281293 Compare April 29, 2026 10:33
Comment thread packages/client/lib/cluster/cluster-slots.spec.ts Outdated

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 6008ca9.


if (!this.#isOpen) {
  continue;
}

Loop uses continue instead of break when closed

Low Severity

When !this.#isOpen in #discoverWithKnownNodeCandidates, the code uses continue instead of break. This causes the loop to spin through all remaining candidates doing nothing, rather than exiting immediately. More importantly, this is inconsistent with #discoverWithRootNodes which throws 'Cluster closed' on the same check. The continue semantically implies "skip this candidate and try the next," but the actual intent is "stop trying because the cluster is closed."
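
A minimal sketch of the suggested control flow, assuming a simplified loop: `discoverWithCandidates`, `isOpen`, and `tryDiscover` are hypothetical stand-ins rather than the PR's actual private methods, and the probe is synchronous here for brevity where the real one would be awaited.

```typescript
// Illustrative stand-in for the known-node discovery loop.
// Returns true as soon as one candidate yields a topology, false otherwise.
function discoverWithCandidates(
  candidates: readonly string[],
  isOpen: () => boolean,
  tryDiscover: (address: string) => boolean
): boolean {
  for (const address of candidates) {
    // Once the cluster is closed, no later candidate can help, so stop
    // iterating instead of skipping each remaining candidate with `continue`.
    if (!isOpen()) break;
    if (tryDiscover(address)) return true;
  }
  return false;
}

// Usage: the cluster closes while the first candidate is being probed.
let open = true;
let openChecks = 0;
const probed: string[] = [];
const found = discoverWithCandidates(
  ['node-a', 'node-b', 'node-c'],
  () => {
    openChecks++;
    return open;
  },
  (address) => {
    probed.push(address);
    open = false; // simulate the cluster being closed mid-discovery
    return false;
  }
);
```

With `break`, the loop exits at the second open check; with `continue`, it would perform one check per remaining candidate. Throwing an error instead of breaking would match the 'Cluster closed' behavior the comment attributes to #discoverWithRootNodes.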


