-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat(grpc): add gracefulswitch LB policy and some other LB policy changes #2442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
rename mock picker and remove spaces make test picker private
General: - Add Debug to many traits and derive/impl in structs. - Pass LB config to LB policies via `Option<LbConfig>` instead of `Option<&LbConfig>`. It should be rare that policies want to store a config except for the leaf policy. Child manager: The original assumption was that all children would be the same type/configuration, but several policies (including gracefulswitch) will not have that property. So, several changes are made: - Children are considered unique by both their identifier and their LbPolicyBuilder's name(). - Make it so the sharder also can shard LbConfig and provide it via the ChildUpdate.child_update field in addition to the ResolverUpdate. - Make ResolverUpdateSharder a generic instead of Box<dyn>. - Add booleans so users of child manager can easily easily tell whether any child policies updated themselves, and which ones did. - Pass &mut self for sharder so that it can maintain and update its state if needed. - Change the sharder's output ChildUpdate.child_update field to an Option; if None then the child will not be called during the resolver update, but will remain in the child manager. - Change child_states into children and provide the whole Child struct, exposing the fields it contains. - Provide mutable access to the sharder. - Change the LB config to be a flat JSON array to facilitate use within another LB policy that should not need a struct to contain on the children. - Minor test cleanups Graceful switch: The previous implementation in hyperium#2399 contained a lot of logic to manage child policy delegation. It was intended that only ChildManager should need to have this kind of logic. - Create a new implementation of this policy that delegates to ChildManager. - Uses a Sharder that simply emits the active policy with no update alongside any new policy in the new LbConfig. - maybe_swap is called after every call into the ChildManager to determine if child updates necessitate a swap. - This logic is simple: if the active policy is not Ready, or if there is a new policy and it is not Connecting, then set the new policy as the active policy and call resolver_update on the ChildManager. The sharder will see that no LbConfig is provided and just emit the active policy with no config, causing the ChildManager to drop the previously active policy. If no swap is needed, update the picker of the active policy if it had an update. - Minor test cleanups/fixes vs. hyperium#2399.
| &mut self, | ||
| update: ResolverUpdate, | ||
| config: Option<&LbConfig>, | ||
| config: Option<LbConfig>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can avoid clones by sending references. For example, pickfirst doesn't seem to be storing the LB config presently. The ChildManager is still cloning the config to pass an owned object to all its children. With a reference, LB policies can choose to clone if they need an owned object.
Is supporting references complicating the implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly, it felt like the more natural API to pass by value instead of reference. Otherwise...why would pass any parameters by value, since passing by reference is theoretically more flexible for the caller? I.e. why not &ResolverUpdate here, too?
I don't think this actually saves us any clones in practice if we require it to be passed by value. Passing by reference probably results in more clones since the parent is less likely to want to keep it than the child. The child might keep it since that is supposed to configure its behavior. The parent is done with it.
I made the change back as a separate commit if you want to compare the two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, pickfirst doesn't seem to be storing the LB config presently
Regarding this, note that PF only uses the lb config for its behavior during resolver_update which I believe is much less common than configuring the behavior of its ongoing operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about this, it seems most instances of LbConfig will actually be references -- they will come out of parts of the service config. So perhaps this way is best...
arjan-bal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't reviewed the entire PR, leaving some initial comments.
| &mut self, | ||
| resolver_update: ResolverUpdate, | ||
| ) -> Result<Box<dyn Iterator<Item = ChildUpdate<T>>>, Box<dyn Error + Send + Sync>>; | ||
| update: Option<LbConfig>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: update seems like a very general name. Maybe we should call this lb_config or config to be more specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should match resolver_update IMO. For now that's apparently update: ResolverUpdate, config: Option<LbConfig>. I'll go with that and if we want to rename the lb policy API for any reason, then we should come back and rename these too.
| self.children | ||
| .iter() | ||
| .map(|child| (&child.identifier, &child.state)) | ||
| pub fn children(&mut self) -> impl Iterator<Item = &Child<T>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this method accept an immutable reference instead? Same question regarding the aggregate_states method below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that should be fine. I am not sure why I made these &mut self since they are read-only operations. I'm expecting all real-world uses will have the ChildManager mutably when calling into it, but there's no reason to make these &mut.
| /// not exist. The child_policy_builder's name is effectively a part of the | ||
| /// child_identifier. If two identifiers are identical but have different | ||
| /// builder names, they are treated as different children. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think a simpler design would be to have the type argument for T include the builder name if necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I started out trying to do, actually. Somewhere I decided that it was a correctness issue and that the child manager should take care of it for that reason. Otherwise you could end up sending the wrong config to the wrong type of child. And there should be no reason to want to switch the builder type for a child while keeping it around - it could only result in bugs.
In terms of code complexity, it makes the child manager itself only a little more complicated, because is really only one place where it matters (in resolver_update when determining which children to create vs. keep). Anyone using child manager needs to be aware it's happening, e.g. when they're iterating through the children in the child manager, but doesn't have to do anything special to take advantage of it.
Sorry in advance for the large and unfocused change. Some smaller things could be split out if it helps, but since many parts of the design are still in flight and can/will be changing going forward, I thought this would be okay for now.
This mainly supersedes #2399 (and keeps it as a commit), but most of the implementation is different except the tests, which are largely copied verbatim. And it's a net 200 LoC smaller in graceful switch (100 LoC in total) which hopefully is an indication that the reuse of ChildManager is preferable.
In addition, I'd like to revisit the test implementation later, as I'm not sure the current way they are written is ideal.
--
General:
Debugto many traits and derive/impl in structs.Option<LbConfig>instead ofOption<&LbConfig>. It should be rare that policies want to store a config except for the leaf policy.Child manager:
The original assumption was that all children would be the same type/configuration, but several policies (including gracefulswitch) will not have that property. So, several changes are made:
LbPolicyBuilder's name().LbConfigand provide it via theChildUpdate.child_updatefield in addition to theResolverUpdate.ResolverUpdateShardera generic instead ofBox<dyn>.&mut selffor sharder so that it can maintain and update its state if needed.ChildUpdate.child_updatefield to anOption; ifNonethen the child will not be called during the resolver update, but will remain in the child manager.child_statesintochildrenand provide the wholeChildstruct, exposing the fields it contains.Graceful switch:
The previous implementation in #2399 contained a lot of logic to manage child policy delegation. It was intended that only
ChildManagershould need to have this kind of logic.ChildManager.LbConfig.maybe_swapis called after every call into theChildManagerto determine if child updates necessitate a swap.Ready, or if there is a new policy and it is notConnecting, then set the new policy as the active policy and call resolver_update on theChildManager. The sharder will see that noLbConfigis provided and just emit the active policy with no config, causing theChildManagerto drop the previously active policy. If no swap is needed, update the picker of the active policy if it had an update.