@jakobht
Member

@jakobht jakobht commented Nov 28, 2025

What changed?
The ephemeral shard creator now pings shard owners immediately after creation to verify end-to-end functionality.

Why?
This adds canary verification that validates:

  • Shards are created successfully
  • Owners can be reached via gRPC
  • Owners actually own the assigned shards

Previously, the shard creator only verified that GetShardOwner returned successfully, but didn't verify that the executor was actually reachable or owned the shard.

How did you test it?
Unit tests

Potential risks
Low risk - this only affects the canary test environment and adds verification without changing core shard creation logic.

Documentation

@jakobht jakobht marked this pull request as draft November 28, 2025 11:02
@jakobht jakobht force-pushed the add-ephemeral-shard-ping branch 3 times, most recently from a043f8b to f9d6569 November 28, 2025 12:03
@jakobht jakobht changed the title from "Add ping verification to ephemeral shard creator" to "feat(shard-distributor): Add ping verification to ephemeral shard creator" Nov 28, 2025
@jakobht jakobht force-pushed the add-ephemeral-shard-ping branch 2 times, most recently from 7a49361 to 4b3f347 November 28, 2025 13:49
Implement a YARPC peer chooser that routes requests to the correct
executor based on shard ownership. This is the shard distributor
equivalent of Cadence's RingpopPeerChooser.

Flow:
1. Client calls RPC with yarpc.WithShardKey("shard-key")
2. Chooser queries Spectator for shard owner
3. Extracts grpc_address from owner metadata
4. Creates/reuses peer for that address
5. Returns peer to YARPC for connection

The peer chooser maintains a cache of peers and handles concurrent
access safely. It uses the x-shard-distributor-namespace header to
determine which namespace's spectator to query.
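
The flow above can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: `ShardOwner`, `OwnerLookup`, `StaticOwners`, `Peer`, and `Chooser` are hypothetical stand-ins for the Spectator and YARPC types, and the namespace and shard key are passed as plain arguments instead of being read from the request. Only the `grpc_address` metadata key comes from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// ShardOwner is a hypothetical stand-in for what a spectator returns.
type ShardOwner struct {
	ExecutorID string
	Metadata   map[string]string
}

// OwnerLookup is a hypothetical interface for a per-namespace spectator.
type OwnerLookup interface {
	GetShardOwner(shardKey string) (ShardOwner, error)
}

// StaticOwners is a hypothetical in-memory OwnerLookup, handy for trying the sketch.
type StaticOwners map[string]ShardOwner

func (s StaticOwners) GetShardOwner(shardKey string) (ShardOwner, error) {
	owner, ok := s[shardKey]
	if !ok {
		return ShardOwner{}, fmt.Errorf("no owner for shard %q", shardKey)
	}
	return owner, nil
}

// Peer is a placeholder for a transport-level peer.
type Peer struct{ Address string }

// Chooser caches one Peer per address and guards the cache with a mutex.
// The spectators map is treated as read-only after construction in this sketch.
type Chooser struct {
	mu         sync.Mutex
	spectators map[string]OwnerLookup
	peers      map[string]*Peer
}

func NewChooser(spectators map[string]OwnerLookup) *Chooser {
	return &Chooser{spectators: spectators, peers: make(map[string]*Peer)}
}

// Choose resolves the shard owner and returns a cached or newly created peer.
func (c *Chooser) Choose(namespace, shardKey string) (*Peer, error) {
	if shardKey == "" {
		return nil, fmt.Errorf("missing shard key")
	}
	if namespace == "" {
		return nil, fmt.Errorf("missing namespace")
	}
	spectator, ok := c.spectators[namespace]
	if !ok {
		return nil, fmt.Errorf("no spectator for namespace %q", namespace)
	}
	owner, err := spectator.GetShardOwner(shardKey)
	if err != nil {
		return nil, err
	}
	addr := owner.Metadata["grpc_address"]
	if addr == "" {
		return nil, fmt.Errorf("owner of shard %q has no grpc_address", shardKey)
	}

	// Create the peer at most once per address, even under concurrent calls.
	c.mu.Lock()
	defer c.mu.Unlock()
	if p, ok := c.peers[addr]; ok {
		return p, nil
	}
	p := &Peer{Address: addr}
	c.peers[addr] = p
	return p, nil
}
```

Calling `Choose` twice for the same shard returns the same `*Peer` pointer, which is the reuse behavior the cache is meant to provide. A single mutex keeps the sketch short; a read/write lock would be a reasonable alternative if lookups dominate.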

Dependencies:
- Requires spectator GetShardOwner to return metadata (see previous commit)

Signed-off-by: Jakob Haahr Taankvist <[email protected]>
The tests have dependency issues with mock generation that need
to be resolved separately. The peer chooser implementation is
complete and functional.

Signed-off-by: Jakob Haahr Taankvist <[email protected]>
Tests cover:
- Success path with peer creation
- Peer reuse on subsequent calls
- Error cases (missing shard key, missing namespace header, spectator not found)
- Lifecycle methods (Start, Stop, IsRunning)
- SetSpectators method

Signed-off-by: Jakob Haahr Taankvist <[email protected]>
…peer creation and mutex lock and unlock

Signed-off-by: Jakob Haahr Taankvist <[email protected]>
Implement the client-side pinger that periodically pings random shard
owners to verify:
1. Executors can route to each other based on shard ownership
2. Shard ownership information is accurate
3. The shard distributor is functioning correctly

The pinger:
- Selects random shards at regular intervals (1s with 10% jitter)
- Sends ping requests to the executor owning each shard
- Validates that the receiving executor actually owns the shard
- Logs warnings when ownership is incorrect

Dependencies:
- Requires ShardDistributorExecutorCanaryAPI proto and client
- Will use SpectatorPeerChooser for routing (wired in later commit)

Signed-off-by: Jakob Haahr Taankvist <[email protected]>
@jakobht jakobht force-pushed the add-ephemeral-shard-ping branch 2 times, most recently from 80a9f46 to c5e5cc9 November 30, 2025 11:12
After creating a shard, the ephemeral shard creator now pings the owner
to verify that:
1. The shard was created successfully
2. The owner can be reached via gRPC
3. The owner actually owns the shard

This provides end-to-end validation of the shard creation and routing
mechanisms in the canary test environment.

Changes:
- Switch from the ShardDistributor client to Spectators
- Add pingShardOwner() method that sends canary ping after creation
- Verify executor ID and ownership in ping response
- Log warnings if executor ID mismatches or ownership is incorrect
- Add test coverage for ping verification flow
- Use refactored *spectatorclient.Spectators type
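
The two checks on the ping response could be expressed as in the sketch below. `PingResponse` and `verifyPing` are hypothetical names used for illustration; the actual response type is generated from the ShardDistributorExecutorCanaryAPI proto and its field names may differ. The sketch returns warning messages instead of logging them so that the checks are easy to test.

```go
package main

import "fmt"

// PingResponse is a hypothetical mirror of the canary ping response.
type PingResponse struct {
	ExecutorID string // ID of the executor that answered the ping
	OwnsShard  bool   // whether that executor reports owning the shard
}

// verifyPing compares the response against the owner the spectator reported
// and returns one warning per failed check (empty when everything matches).
func verifyPing(shardKey, expectedExecutorID string, resp PingResponse) []string {
	var warnings []string
	if resp.ExecutorID != expectedExecutorID {
		warnings = append(warnings, fmt.Sprintf(
			"shard %q: expected executor %q, got %q",
			shardKey, expectedExecutorID, resp.ExecutorID))
	}
	if !resp.OwnsShard {
		warnings = append(warnings, fmt.Sprintf(
			"shard %q: executor %q does not own the shard",
			shardKey, resp.ExecutorID))
	}
	return warnings
}
```

Keeping the two checks independent means a response can produce both warnings at once, for example when a request is routed to the wrong executor, which then also reports that it does not own the shard.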

Signed-off-by: Jakob Haahr Taankvist <[email protected]>
Integrate all canary components into the module:
- Create SpectatorPeerChooser and manage its lifecycle
- Provide canary client using the dispatcher
- Create and start the pinger component
- Register ping handler as a YARPC server
- Wire spectators to peer chooser on startup

This connects all the pieces needed for executor-to-executor
canary testing via ping/pong requests.

Dependencies:
- Requires SpectatorPeerChooser
- Requires PingHandler
- Requires Pinger component

Signed-off-by: Jakob Haahr Taankvist <[email protected]>
@jakobht jakobht force-pushed the add-ephemeral-shard-ping branch from c5e5cc9 to ae1cf2b November 30, 2025 11:40
Signed-off-by: Jakob Haahr Taankvist <[email protected]>
@jakobht jakobht force-pushed the add-ephemeral-shard-ping branch from 5dc1df9 to 0046451 November 30, 2025 12:53