@jakobht jakobht commented Nov 24, 2025

This PR depends on #7476 (spectator return metadata) being merged first.

What changed?
Added SpectatorPeerChooser that implements YARPC's peer.Chooser interface to route requests to the correct executor based on shard ownership.

Why?
Enable executor-to-executor communication in the shard distributor canary system. The peer chooser queries the Spectator to find which executor owns a shard, then routes requests to that executor's gRPC address.

How did you test it?
Unit tests

Potential risks
None

Release notes

Documentation Changes

@jakobht jakobht marked this pull request as draft November 24, 2025 13:19
@jakobht jakobht force-pushed the add-spectator-peer-chooser branch 4 times, most recently from 642ad52 to 1597904 Compare November 25, 2025 07:53
@jakobht jakobht force-pushed the add-spectator-peer-chooser branch 2 times, most recently from f4bb653 to 85017c4 Compare November 28, 2025 11:37
@jakobht jakobht marked this pull request as ready for review November 28, 2025 11:43
}

// Extract GRPC address from owner metadata
grpcAddress, ok := owner.Metadata[grpcAddressMetadataKey]
Member

nit: a bit unexpected to see such specifics as grpcAddressMetadataKey here.

Member Author

Yeah, I would like better separation. We need the address here, so I decided to do it like this, but it does feel like mixing the application and network levels.

@jakobht jakobht force-pushed the add-spectator-peer-chooser branch from 89bb17f to 1e6cc8a Compare November 30, 2025 10:37
Implement a YARPC peer chooser that routes requests to the correct
executor based on shard ownership. This is the shard distributor
equivalent of Cadence's RingpopPeerChooser.

Flow:
1. Client calls RPC with yarpc.WithShardKey("shard-key")
2. Chooser queries Spectator for shard owner
3. Extracts grpc_address from owner metadata
4. Creates/reuses peer for that address
5. Returns peer to YARPC for connection

The peer chooser maintains a cache of peers and handles concurrent
access safely. It uses the x-shard-distributor-namespace header to
determine which namespace's spectator to query.
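
The flow above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code from this PR: the `request`, `shardOwner`, `spectator`, `peer`, and `chooser` types and the `chooseAddress` helper are hypothetical stand-ins for the YARPC and Spectator types, and the metadata key value `grpc_address` is assumed from step 3.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

const (
	namespaceHeader        = "x-shard-distributor-namespace"
	grpcAddressMetadataKey = "grpc_address" // assumed value of the key
)

// Hypothetical stand-ins for the YARPC request, Spectator, and peer types.
type request struct {
	ShardKey string
	Headers  map[string]string
}

type shardOwner struct {
	ExecutorID string
	Metadata   map[string]string
}

type spectator interface {
	GetShardOwner(ctx context.Context, shardKey string) (*shardOwner, error)
}

type peer struct{ Address string }

type chooser struct {
	mu         sync.RWMutex
	spectators map[string]spectator // keyed by namespace
	peers      map[string]*peer     // keyed by gRPC address
}

func newChooser(spectators map[string]spectator) *chooser {
	return &chooser{spectators: spectators, peers: map[string]*peer{}}
}

// Choose resolves the shard owner and returns a cached or new peer for it.
func (c *chooser) Choose(ctx context.Context, req *request) (*peer, error) {
	if req.ShardKey == "" {
		return nil, errors.New("shard key is required")
	}
	ns := req.Headers[namespaceHeader]
	if ns == "" {
		return nil, fmt.Errorf("missing %s header", namespaceHeader)
	}
	c.mu.RLock()
	sp, ok := c.spectators[ns]
	c.mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("no spectator for namespace %q", ns)
	}
	owner, err := sp.GetShardOwner(ctx, req.ShardKey)
	if err != nil {
		return nil, fmt.Errorf("get shard owner: %w", err)
	}
	addr := owner.Metadata[grpcAddressMetadataKey]
	if addr == "" {
		return nil, fmt.Errorf("owner %s has no %s metadata", owner.ExecutorID, grpcAddressMetadataKey)
	}
	return c.getOrCreatePeer(addr), nil
}

// getOrCreatePeer re-checks the cache under the write lock so that
// concurrent callers end up sharing one peer per address.
func (c *chooser) getOrCreatePeer(addr string) *peer {
	c.mu.RLock()
	p, ok := c.peers[addr]
	c.mu.RUnlock()
	if ok {
		return p
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if p, ok := c.peers[addr]; ok {
		return p
	}
	p = &peer{Address: addr}
	c.peers[addr] = p
	return p
}

// staticSpectator is a fixed shard-to-owner table used for illustration.
type staticSpectator map[string]*shardOwner

func (s staticSpectator) GetShardOwner(_ context.Context, key string) (*shardOwner, error) {
	if o, ok := s[key]; ok {
		return o, nil
	}
	return nil, fmt.Errorf("shard %q has no owner", key)
}

// chooseAddress builds a request and returns the chosen peer's address.
func chooseAddress(c *chooser, namespace, shardKey string) (string, error) {
	req := &request{ShardKey: shardKey, Headers: map[string]string{namespaceHeader: namespace}}
	p, err := c.Choose(context.Background(), req)
	if err != nil {
		return "", err
	}
	return p.Address, nil
}

func main() {
	sp := staticSpectator{"shard-1": {ExecutorID: "executor-a",
		Metadata: map[string]string{grpcAddressMetadataKey: "10.0.0.1:7953"}}}
	c := newChooser(map[string]spectator{"test-namespace": sp})
	fmt.Println(chooseAddress(c, "test-namespace", "shard-1"))
}
```

In the real chooser the peer would come from a YARPC transport rather than a plain struct; the double-checked lookup in `getOrCreatePeer` is just one way to keep the cache safe under concurrent calls.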

Dependencies:
- Requires spectator GetShardOwner to return metadata (see previous commit)

Signed-off-by: Jakob Haahr Taankvist <[email protected]>
The tests have dependency issues with mock generation that need
to be resolved separately. The peer chooser implementation is
complete and functional.

Signed-off-by: Jakob Haahr Taankvist <[email protected]>
Tests cover:
- Success path with peer creation
- Peer reuse on subsequent calls
- Error cases (missing shard key, namespace header, spectator not found)
- Lifecycle methods (Start, Stop, IsRunning)
- SetSpectators method

Signed-off-by: Jakob Haahr Taankvist <[email protected]>
Signed-off-by: Jakob Haahr Taankvist <[email protected]>
…peer creation and mutex lock and unlock

Signed-off-by: Jakob Haahr Taankvist <[email protected]>
@jakobht jakobht force-pushed the add-spectator-peer-chooser branch from 1e6cc8a to 83f485a Compare November 30, 2025 10:42
@jakobht jakobht merged commit 9f31613 into cadence-workflow:master Dec 1, 2025
43 of 44 checks passed
jakobht added a commit that referenced this pull request Dec 2, 2025
…ip verification (#7487)

Depends on #7475 and
#7478 being merged

**What changed?**
Added canary pinger component that periodically sends ping requests to
shard owners to verify executor-to-executor communication and shard
ownership.

**Why?**
The canary pinger provides active monitoring of the shard distributor's
routing and ownership mechanisms by:
- Periodically selecting random shards and pinging their owners
- Verifying that the pinged executor owns the shard
- Detecting communication failures between executors

This is part of the canary ping/pong implementation that validates
end-to-end executor-to-executor gRPC communication.
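
A minimal sketch of such a ping loop, under stated assumptions, might look like the following. The `pingClient` interface, the `pingResponse` fields, the integer-string shard keys, and the `runPinger` signature are all hypothetical and are not taken from the canary code; in the real system the client call would presumably carry the shard key so that the peer chooser can route it.

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"strconv"
	"time"
)

// pingResponse and pingClient are hypothetical stand-ins for the canary RPC.
type pingResponse struct {
	ExecutorID string
	OwnsShard  bool
}

type pingClient interface {
	Ping(ctx context.Context, shardKey string) (*pingResponse, error)
}

// randomShardKey picks one of numShards shards; keys are assumed to be
// "0".."numShards-1". numShards must be positive.
func randomShardKey(numShards int) string {
	return strconv.Itoa(rand.Intn(numShards))
}

// verifyPing turns a ping result into an error if the call failed or the
// responding executor does not claim ownership of the shard.
func verifyPing(shardKey string, resp *pingResponse, err error) error {
	if err != nil {
		return fmt.Errorf("ping shard %s: %w", shardKey, err)
	}
	if resp == nil {
		return fmt.Errorf("ping shard %s: empty response", shardKey)
	}
	if !resp.OwnsShard {
		return fmt.Errorf("executor %s answered for shard %s but does not own it",
			resp.ExecutorID, shardKey)
	}
	return nil
}

// runPinger pings a random shard on every tick until ctx is cancelled.
func runPinger(ctx context.Context, client pingClient, numShards int,
	interval time.Duration, report func(error)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			key := randomShardKey(numShards)
			resp, err := client.Ping(ctx, key)
			report(verifyPing(key, resp, err))
		}
	}
}

// fakeClient answers every ping locally, for illustration only.
type fakeClient struct{ owned map[string]bool }

func (f fakeClient) Ping(_ context.Context, shardKey string) (*pingResponse, error) {
	return &pingResponse{ExecutorID: "executor-a", OwnsShard: f.owned[shardKey]}, nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	client := fakeClient{owned: map[string]bool{"0": true, "1": true}}
	runPinger(ctx, client, 3, 10*time.Millisecond, func(err error) {
		if err != nil {
			fmt.Println("canary failure:", err)
		}
	})
}
```

Keeping the ownership check in a pure function such as `verifyPing` makes it easy to unit test separately from the ticker loop.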

**How did you test it?**
Unit tests

**Potential risks**

**Release notes**

**Documentation Changes**

---------

Signed-off-by: Jakob Haahr Taankvist <[email protected]>