feat(grpc): add gracefulswitch LB policy and some other LB policy changes #2442

dfawley · 2025-11-06T18:28:29Z

Sorry in advance for the large and unfocused change. Some smaller things could be split out if it helps, but since many parts of the design are still in flight and can/will be changing going forward, I thought this would be okay for now.

This mainly supersedes #2399 (and keeps it as a commit), but most of the implementation is different except the tests, which are largely copied verbatim. And it's a net 200 LoC smaller in graceful switch (100 LoC in total) which hopefully is an indication that the reuse of ChildManager is preferable.

In addition, I'd like to revisit the test implementation later, as I'm not sure the current way they are written is ideal.

--

General:

- Add Debug to many traits and derive/impl in structs.
- Pass LB config to LB policies via Option<LbConfig> instead of Option<&LbConfig>. It should be rare that policies want to store a config except for the leaf policy.

Child manager:

The original assumption was that all children would be the same type/configuration, but several policies (including gracefulswitch) will not have that property. So, several changes are made:

- Children are considered unique by both their identifier and their LbPolicyBuilder's name().
- Make it so the sharder also can shard LbConfig and provide it via the ChildUpdate.child_update field in addition to the ResolverUpdate.
- Make ResolverUpdateSharder a generic instead of Box<dyn>.
- Add booleans so users of child manager can easily tell whether any child policies updated themselves, and which ones did.
- Pass &mut self for sharder so that it can maintain and update its state if needed.
- Change the sharder's output ChildUpdate.child_update field to an Option; if None then the child will not be called during the resolver update, but will remain in the child manager.
- Change child_states into children and provide the whole Child struct, exposing the fields it contains.
- Provide mutable access to the sharder.
- Minor test cleanups
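
To make the sharder changes above concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the stand-in `ResolverUpdate` and `LbConfig` types, the tuple shape of `child_update`, the `Vec` return type (used here for brevity), and the `EndpointSharder` and `ManagerSketch` names are hypothetical and not taken from the PR's code. It only shows the three ideas together: `&mut self` so the sharder can keep state, an `Option` update so a child can be retained without being called, and a generic sharder parameter with mutable access.

```rust
use std::collections::HashSet;
use std::error::Error;

// Hypothetical stand-ins for the real types; only the shapes matter here.
#[derive(Clone, Debug, PartialEq)]
pub struct ResolverUpdate {
    pub endpoints: Vec<String>,
}

#[derive(Clone, Debug, PartialEq)]
pub struct LbConfig(pub String);

/// One unit of sharder output. `child_update == None` means "keep this child,
/// but do not call it during this resolver update".
#[derive(Debug)]
pub struct ChildUpdate<T> {
    pub child_identifier: T,
    pub child_policy_name: &'static str,
    pub child_update: Option<(ResolverUpdate, Option<LbConfig>)>,
}

pub trait ResolverUpdateSharder<T> {
    // `&mut self` lets the sharder maintain state between updates.
    fn shard_update(
        &mut self,
        update: ResolverUpdate,
        config: Option<&LbConfig>,
    ) -> Result<Vec<ChildUpdate<T>>, Box<dyn Error + Send + Sync>>;
}

/// Example sharder: one child per endpoint. A child only receives an update
/// the first time its endpoint is seen; afterwards it is emitted with `None`.
#[derive(Default)]
pub struct EndpointSharder {
    pub updates_seen: usize,
    known: HashSet<String>,
}

impl ResolverUpdateSharder<String> for EndpointSharder {
    fn shard_update(
        &mut self,
        update: ResolverUpdate,
        config: Option<&LbConfig>,
    ) -> Result<Vec<ChildUpdate<String>>, Box<dyn Error + Send + Sync>> {
        self.updates_seen += 1;
        let mut out = Vec::new();
        for ep in &update.endpoints {
            // `insert` returns true only when the endpoint was not known yet.
            let is_new = self.known.insert(ep.clone());
            let child_update = if is_new {
                Some((ResolverUpdate { endpoints: vec![ep.clone()] }, config.cloned()))
            } else {
                None
            };
            out.push(ChildUpdate {
                child_identifier: ep.clone(),
                child_policy_name: "pick_first",
                child_update,
            });
        }
        Ok(out)
    }
}

/// The manager is generic over the sharder (no `Box<dyn ...>`) and hands out
/// mutable access to it.
pub struct ManagerSketch<S> {
    sharder: S,
}

impl<S> ManagerSketch<S> {
    pub fn new(sharder: S) -> Self {
        Self { sharder }
    }
    pub fn sharder_mut(&mut self) -> &mut S {
        &mut self.sharder
    }
}
```

Feeding the same two endpoints through `shard_update` twice yields two children with `Some` updates the first time and the same two children with `None` the second time, which is the "remain in the child manager but are not called" behavior described in the list.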

Graceful switch:

The previous implementation in #2399 contained a lot of logic to manage child policy delegation. It was intended that only ChildManager should need to have this kind of logic.

- Create a new implementation of this policy that delegates to ChildManager.
- Uses a Sharder that simply emits the active policy with no update alongside any new policy in the new LbConfig.
- maybe_swap is called after every call into the ChildManager to determine if child updates necessitate a swap.
  - This logic is simple: if the active policy is not Ready, or if there is a new policy and it is not Connecting, then set the new policy as the active policy and call resolver_update on the ChildManager. The sharder will see that no LbConfig is provided and just emit the active policy with no config, causing the ChildManager to drop the previously active policy. If no swap is needed, update the picker of the active policy if it had an update.
- Minor test cleanups/fixes vs. feat(grpc): add gracefulswitch load balancing policy #2399.
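
The swap condition described above can be sketched roughly as follows. This is not the PR's implementation: `ConnectivityState`, `PolicySketch`, and `GracefulSwitchSketch` are hypothetical names, the active policy is assumed to always exist, and the sketch assumes a swap is only possible when a new (pending) policy is present. The follow-up `resolver_update` call on the ChildManager and the picker update are left out.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ConnectivityState {
    Idle,
    Connecting,
    Ready,
    TransientFailure,
}

#[derive(Debug, PartialEq, Eq)]
pub struct PolicySketch {
    pub name: &'static str,
    pub state: ConnectivityState,
}

#[derive(Debug)]
pub struct GracefulSwitchSketch {
    pub active: PolicySketch,
    pub pending: Option<PolicySketch>,
}

impl GracefulSwitchSketch {
    /// Promotes the pending policy to active when the condition holds.
    /// Returns true if a swap happened.
    pub fn maybe_swap(&mut self) -> bool {
        let should_swap = match &self.pending {
            // Assumption: with no new policy there is nothing to swap to.
            None => false,
            // Swap if the active policy is not Ready, or the new policy has
            // left Connecting.
            Some(new_policy) => {
                self.active.state != ConnectivityState::Ready
                    || new_policy.state != ConnectivityState::Connecting
            }
        };
        if should_swap {
            if let Some(new_policy) = self.pending.take() {
                // The previously active policy is dropped here.
                self.active = new_policy;
            }
        }
        should_swap
    }
}
```

With a Ready active policy and a Connecting pending policy, `maybe_swap` leaves things alone; as soon as the pending policy reports any other state, or the active policy stops being Ready, the pending policy becomes active.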

rename mock picker and remove spaces make test picker private

General:

- Add Debug to many traits and derive/impl in structs.
- Pass LB config to LB policies via `Option<LbConfig>` instead of `Option<&LbConfig>`. It should be rare that policies want to store a config except for the leaf policy.

Child manager:

The original assumption was that all children would be the same type/configuration, but several policies (including gracefulswitch) will not have that property. So, several changes are made:

- Children are considered unique by both their identifier and their LbPolicyBuilder's name().
- Make it so the sharder also can shard LbConfig and provide it via the ChildUpdate.child_update field in addition to the ResolverUpdate.
- Make ResolverUpdateSharder a generic instead of Box<dyn>.
- Add booleans so users of child manager can easily tell whether any child policies updated themselves, and which ones did.
- Pass &mut self for sharder so that it can maintain and update its state if needed.
- Change the sharder's output ChildUpdate.child_update field to an Option; if None then the child will not be called during the resolver update, but will remain in the child manager.
- Change child_states into children and provide the whole Child struct, exposing the fields it contains.
- Provide mutable access to the sharder.
- Change the LB config to be a flat JSON array to facilitate use within another LB policy that should not need a struct to contain only the children.
- Minor test cleanups

Graceful switch:

The previous implementation in hyperium#2399 contained a lot of logic to manage child policy delegation. It was intended that only ChildManager should need to have this kind of logic.

- Create a new implementation of this policy that delegates to ChildManager.
- Uses a Sharder that simply emits the active policy with no update alongside any new policy in the new LbConfig.
- maybe_swap is called after every call into the ChildManager to determine if child updates necessitate a swap.
  - This logic is simple: if the active policy is not Ready, or if there is a new policy and it is not Connecting, then set the new policy as the active policy and call resolver_update on the ChildManager. The sharder will see that no LbConfig is provided and just emit the active policy with no config, causing the ChildManager to drop the previously active policy. If no swap is needed, update the picker of the active policy if it had an update.
- Minor test cleanups/fixes vs. hyperium#2399.

arjan-bal · 2025-11-07T09:40:26Z

grpc/src/client/load_balancing/mod.rs

        &mut self,
        update: ResolverUpdate,
-        config: Option<&LbConfig>,
+        config: Option<LbConfig>,


We can avoid clones by sending references. For example, pickfirst doesn't seem to be storing the LB config presently. The ChildManager is still cloning the config to pass an owned object to all its children. With a reference, LB policies can choose to clone if they need an owned object.

Is supporting references complicating the implementation?

Mainly, it felt like the more natural API to pass by value instead of by reference. Otherwise... why would we pass any parameters by value, since passing by reference is theoretically more flexible for the caller? I.e. why not &ResolverUpdate here, too?

I don't think this actually saves us any clones in practice if we require it to be passed by value. Passing by reference probably results in more clones since the parent is less likely to want to keep it than the child. The child might keep it since that is supposed to configure its behavior. The parent is done with it.

I made the change back as a separate commit if you want to compare the two.

> For example, pickfirst doesn't seem to be storing the LB config presently

Regarding this, note that PF only uses the lb config for its behavior during resolver_update which I believe is much less common than configuring the behavior of its ongoing operation.

Thinking more about this, it seems most instances of LbConfig will actually be references -- they will come out of parts of the service config. So perhaps this way is best...
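
The trade-off in this thread can be illustrated with a small sketch. The types below are hypothetical stand-ins (`LbConfig` with a single `shuffle` field, and two toy policies), not the real API. It assumes the reference form, `Option<&LbConfig>`: a policy that only consults the config during `resolver_update` never clones, while a policy that wants to keep the config clones explicitly.

```rust
// Hypothetical config type for illustration only.
#[derive(Clone, Debug, PartialEq)]
pub struct LbConfig {
    pub shuffle: bool,
}

/// Reads the config during the update and does not retain it, so no clone
/// is needed.
#[derive(Default)]
pub struct UsesConfigOnce {
    pub last_shuffled: bool,
}

impl UsesConfigOnce {
    pub fn resolver_update(&mut self, config: Option<&LbConfig>) {
        self.last_shuffled = config.map_or(false, |c| c.shuffle);
    }
}

/// Keeps the config to steer ongoing behavior, so it opts in to one clone.
#[derive(Default)]
pub struct StoresConfig {
    pub config: Option<LbConfig>,
}

impl StoresConfig {
    pub fn resolver_update(&mut self, config: Option<&LbConfig>) {
        self.config = config.cloned();
    }
}
```

If the parent already owns a config it no longer needs, passing by value would save the clone in `StoresConfig`; if the config lives inside a larger structure such as a service config, the parent can only lend it, and the reference form avoids a clone for `UsesConfigOnce`.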

arjan-bal

I haven't reviewed the entire PR, leaving some initial comments.

arjan-bal · 2025-11-07T10:13:02Z

grpc/src/client/load_balancing/child_manager.rs

+        &mut self,
        resolver_update: ResolverUpdate,
-    ) -> Result<Box<dyn Iterator<Item = ChildUpdate<T>>>, Box<dyn Error + Send + Sync>>;
+        update: Option<LbConfig>,


nit: update seems like a very general name. Maybe we should call this lb_config or config to be more specific?

These should match resolver_update IMO. For now that's apparently update: ResolverUpdate, config: Option<LbConfig>. I'll go with that and if we want to rename the lb policy API for any reason, then we should come back and rename these too.

arjan-bal · 2025-11-07T10:30:33Z

grpc/src/client/load_balancing/child_manager.rs

-        self.children
-            .iter()
-            .map(|child| (&child.identifier, &child.state))
+    pub fn children(&mut self) -> impl Iterator<Item = &Child<T>> {


Can this method accept an immutable reference instead? Same question regarding the aggregate_states method below.

Yes, that should be fine. I am not sure why I made these &mut self since they are read-only operations. I'm expecting all real-world callers will hold the ChildManager mutably when calling into it, but there's no reason to make these &mut.
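
A minimal sketch of the read-only accessor with a shared receiver, assuming simplified stand-in types (`Child` with a string state, and `ChildManagerSketch` backed by a `Vec`); the names and fields are hypothetical:

```rust
#[derive(Debug)]
pub struct Child<T> {
    pub identifier: T,
    pub state: &'static str,
}

pub struct ChildManagerSketch<T> {
    children: Vec<Child<T>>,
}

impl<T> ChildManagerSketch<T> {
    pub fn new(children: Vec<Child<T>>) -> Self {
        Self { children }
    }

    /// Read-only iteration only needs `&self`, so callers holding a shared
    /// reference (or several iterators at once) can use it too.
    pub fn children(&self) -> impl Iterator<Item = &Child<T>> {
        self.children.iter()
    }
}
```

With `&self`, two iterators over the children can be alive at the same time, which a `&mut self` receiver would reject at compile time.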

arjan-bal · 2025-11-07T11:04:01Z

grpc/src/client/load_balancing/child_manager.rs

+    /// not exist.  The child_policy_builder's name is effectively a part of the
+    /// child_identifier.  If two identifiers are identical but have different
+    /// builder names, they are treated as different children.


Do you think a simpler design would be to have the type argument for T include the builder name if necessary?

That's what I started out trying to do, actually. Somewhere I decided that it was a correctness issue and that the child manager should take care of it for that reason. Otherwise you could end up sending the wrong config to the wrong type of child. And there should be no reason to want to switch the builder type for a child while keeping it around - it could only result in bugs.

In terms of code complexity, it makes the child manager itself only a little more complicated, because there is really only one place where it matters (in resolver_update when determining which children to create vs. keep). Anyone using child manager needs to be aware it's happening, e.g. when they're iterating through the children in the child manager, but doesn't have to do anything special to take advantage of it.
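
The "create vs. keep" decision described here can be sketched by keying children on the pair (identifier, builder name). This is an illustration under assumptions, not the PR's code: `KeyedChild`, `KeyedManagerSketch`, the `serial` counter, and the use of a `HashMap` are all hypothetical.

```rust
use std::collections::HashMap;

#[derive(Debug)]
pub struct KeyedChild {
    /// Creation counter, used only to tell kept children from new ones.
    pub serial: u32,
}

#[derive(Default)]
pub struct KeyedManagerSketch {
    // The builder name is effectively part of the child's identity.
    children: HashMap<(String, &'static str), KeyedChild>,
    created: u32,
}

impl KeyedManagerSketch {
    /// `wanted` lists (identifier, builder name) pairs for this update.
    pub fn resolver_update(&mut self, wanted: &[(&str, &'static str)]) {
        let mut old = std::mem::take(&mut self.children);
        for (id, builder) in wanted {
            let key = (id.to_string(), *builder);
            let child = match old.remove(&key) {
                // Same identifier and same builder name: keep the child.
                Some(existing) => existing,
                // New identifier, or same identifier with a different
                // builder name: create a fresh child.
                None => {
                    self.created += 1;
                    KeyedChild { serial: self.created }
                }
            };
            self.children.insert(key, child);
        }
        // Children left in `old` were not requested and are dropped here.
    }

    pub fn serial_of(&self, id: &str, builder: &'static str) -> Option<u32> {
        self.children.get(&(id.to_string(), builder)).map(|c| c.serial)
    }
}
```

Requesting the same identifier with a different builder name produces a new child and drops the old one, so a config meant for one policy type cannot reach a child of another type.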

cjqzhao and others added 2 commits November 4, 2025 14:21

add gracefulswitch and tests

6b2c2c7

rename mock picker and remove spaces make test picker private

dfawley added this to the grpc-next milestone Nov 6, 2025

dfawley requested a review from arjan-bal November 6, 2025 18:28

dfawley assigned arjan-bal Nov 6, 2025

dfawley added C-enhancement Category: New feature or request A-grpc-next labels Nov 6, 2025

dfawley mentioned this pull request Nov 6, 2025

feat(grpc): make ChildManager call the parent work_scheduler #2443

Open

arjan-bal reviewed Nov 7, 2025

dfawley added 2 commits November 7, 2025 08:06

no mut receiver; rename sharder params

60419a5

switch LbConfig back to a reference

f933d87
