📖 Add designs/multi-cluster.md #2746
On a general note, for the purpose of this proposal we should focus on general controller-runtime users, while we can keep `kcp` as a reference alongside other implementations. I'd rephrase the motivation at a high level: "set up controllers and watches across multiple Kubernetes clusters in a transparent way".
I think we tried to cover the high-level part with the first bullet point. The kcp controller-runtime fork is just mentioned to give personal motivation, but I don't think we have to mention it here if that is preferred.
Could you elaborate on this point?
We explicitly don't want one controller (with its own workqueue) per cluster.
Example: forget about workspaces. Imagine controller-runtime only supported controllers per one (!) namespace, i.e. another controller with another namespace for every namespace you want to serve. Same argument here, just a level higher.
And independently you could imagine cases where the same is true e.g. for cluster-api cases where the workqueue should be shared. That's what this enhancement is enabling.
This is a decision that shouldn't be forced onto customers. I can see the case where a workqueue per cluster is desired, as it provides some built-in fairness.
Others might disagree. This question needs a proper evaluation with pro/cons of both approaches rather than jumping to conclusions.
I agree that both topologies can have their place. I'm not even sure pro/con is helpful. We shouldn't be opinionated, but give the primitives for the developer to decide.
Copying my comment from here since this is the bigger thread:
A stated goal of this doc is to avoid "divergence in the ecosystem". Writing that down while at the same time handwaving away comments about this not actually being a good approach, and saying we shouldn't be opinionated, is not particularly convincing. Our goal is to make the majority use case simple and other use-cases possible. This is not possible if we refuse to even look into the question of what the majority use-case is, and default to assuming that the use-case of the author of a design must be the majority use-case.
Fair enough. I didn't mean we shouldn't think about which workqueue topology is useful when. I meant that there are good reasons for a joint workqueue in some situations (like when I want to throttle all reconciles in a process because throughput is limited), and independent ones in other situations (like when e.g. writes to a cluster are the limiting factor).
I played with a `TypedFair` queue that could be plugged in here, wrapped by throttling and delays.
So "Start/Stop a controller for each cluster" is out of scope, this is purely about "Add/Remove sources to/from controller on cluster arrival/departure"?
Related to "one workqueue" I guess. Start/Stop means another workqueue, which we don't want.
Would it be possible to focus the first iteration of this proposal on how Kubernetes works today? Adding uncommon use cases at this point in time increases the overall complexity of the implementation. Other use cases should be pluggable imo.
We agree things should be highly pluggable, that's why this is just an (incomplete) list of things that you could eventually plug in. Agreed that kcp is an uncommon use case, but so far we've made (recent) design decisions with Kubernetes clusters in general in mind. The "kcp" provider we'd like to build is itself far from ready yet.
That wouldn't be helpful for us though. I don't think the design now is influenced much by the kcp requirements, maybe with exception of the shared workqueue. Other than that the fleet-namespace example (which kind of reflects the kcp requirements) shows that the kcp use-case can be covered by a pretty generic design.
What does "binary pluggable" mean in this context?
Something like https://pkg.go.dev/plugin to dynamically load providers.
This should be a goal
The second argument should probably be a typed reference, like we have for `ObjectReference`. Even if it contains a single `Name` field, it would help with expanding it later, wdyt?
Are you thinking of `logical.Name` here, or more of a struct?
Same as above, this `Name()` should be converted to something like `SelfRef` or similar.
how would a context be selected from the kubeconfig passed to the controller?
Just to avoid any confusion, this description is talking about a `context.Context`, not a kubeconfig context.

But in general: this would be something to implement in a `Provider`. A kubeconfig provider could be a very simple implementation (although the focus is on dynamic providers, and a kubeconfig would probably be fairly static). How that provider translates the `clusterName` parameter in the `Get` method (see the `Provider` interface above) to a kubeconfig context would be up to the implementation, but I could see the context name being the identifier here (since `Get` returns a `cluster.Cluster`, we need credentials for the embedded client, so a context makes a lot of sense here).

Does that make sense? 🤔
If it's non-blocking, how is the controller supposed to surface errors here?
The idea is that anything that needs to be done for starting operations on a cluster is "blocking" (and thus would return an error), but the operations on the engaged cluster themselves are not blocking.
Can you give an example?
I have some trouble understanding the difference between "anything that needs to be done for starting operations on a cluster" vs "operations on the engaged cluster", as both seem to be done or asynchronously triggered (maybe?) in the Engage func.
One of the things that happens in the prototype implementation (#3019) for the `typedMultiClusterController` is starting the new watches on the newly engaged cluster. So if that fails, `Engage` returns an error, but it's not blocking for processing items.
Doesn't that mean the current interface specification for `cluster.Provider` is insufficient, as this entails that the `cluster.Provider` needs a reference to the manager?
That's out of scope of the goal of the interface. Wiring the manager in will happen when starting the provider:
So you are saying the `ctx` passed by `cluster.Provider` when calling `Engage` on the manager needs to be stored by the manager and re-used when calling `Engage` on any multi-cluster runnable, which in turn needs to use it to control the lifecycle of the source? What is the point of having `Disengage` then?
I think you are right, and we can do without `Disengage`. I will try to change the prototype implementation to eliminate it.
This seems to imply that the context used to call Start() on a Source will be used to stop the Source?
I think stopping the Source is not possible by just canceling this context for our current source implementations (or at least some of them)
IIRC the only similar functionality that we have today is that cancelling the context passed into Cache.Start() will stop all informers.
I also wonder how this works when multiple controllers are sharing the same underlying informer. I think if a controller doesn't exclusively own an informer it also shouldn't just shut it down. Or is my assumption wrong that they usually would share informers? (like it works today when multiple controllers are sharing the same cache & underlying informers).
For additional context. If I apply what we currently do in Cluster API to this proposal, it would be the cluster provider that shuts down the cache and all underlying informers.
I'm curious which ones you have in mind.
At least source.Kind. The ctx passed into Start can be used to cancel the start process (e.g. WaitForCacheSync) but not the informer
In line with the other comments above
Let's go with being explicit here; one of the main issues is that an empty string is also a default value. We can set a `cluster.Reference` to a very specific value, which in turn is used across the entire codebase.
Nit: we're saying `must` here but `SHOULD` in the text right underneath the "Cluster-Aware Request" title.
Why not? The cluster provider doesn't know anything about what request type controllers use?
This is not about the provider, but about controllers. We want to allow "uniform cluster support" with minimal changes to existing controllers. We don't control the controllers; these are 3rd-party controllers. Hence, we aim for local changes rather than changing many lines of source code. Generics usually are the latter.

Hence: the controllers cannot be generic as outlined above. Imagine they were: this does not compile because the Go compiler resolves `.Namespace` against an arbitrary `T any`. Hence, we have two ways out to make this compile; one is `Name`+`Namespace() string` as methods. This is diverging from 99.9% of controllers today.
I think the confusion comes from the fact that this sentence makes it sound like using another request type will make a controller only compatible with a subset of cluster.Provider implementations. I have some trouble seeing the connection.
I got your point though that a standard upstream ClusterAwareRequest is needed.
Why teach handlers to copy themselves rather than just layering this:
?
(name likely needs improvement but you get the idea)
The reason for this was the attempt to keep existing function signatures stable while enabling them to be multi-cluster aware. A `HandlerConstructor` would probably need a new separate builder function to be passed as an argument, so e.g. something like `Watches` vs `ClusterAwareWatches` (or whatever). I'm totally open to changing this if you prefer it.
This is a bit too easy to mess up. What's stopping a `reconcile.ClusterAwareRequest` from being used in the wrong place, or vice-versa?
In general: `reconcile.ClusterAwareRequest` could totally be used in a non-multi-cluster setup and it wouldn't change a thing, since it embeds a `reconcile.Request`. If the `ClusterName` field is empty, `req.ClusterName` would imply the "default cluster", which is the single-cluster use-case today (e.g. we changed the `fleet` example in #3019 to have a flag that lets you toggle multi-cluster vs single-cluster usage).

If you end up using `reconcile.Request`, you would quickly notice that you don't have the cluster name to pass to `mgr.GetCluster`.
It is worth pointing out that this can also be achieved by instantiating a controller per target cluster rather than adding/removing sources to/from an existing controller.
IMHO if you ever actually want to operate the resulting component, you likely want "create/remove controller" rather than "create/remove source", because otherwise a single problematic cluster can completely mess up the workqueue metrics, and on-calls can't tell if one cluster has an issue or all of them, which is going to be a big difference in terms of severity.
Would you be comfortable with a way to tell this design to either use a shared queue (e.g. start sources) or to start controllers with a config switch or similar?
Does this mean that we won't get events for the "default" cluster?
It can be configured by controller. We default to writing a uniform multi-cluster controller, i.e. one that only reacts to provider clusters. It is not common afaik to have the same semantics for both a local cluster (the hub usually) and provider clusters.
This double binary distinction is pretty much guaranteed to be too little. It would be better if we somehow tell the builder whether this is a multi-cluster controller or not, and then the `cluster.Provider` calls `Engage` for all clusters that should be engaged; it's up to the implementor of the provider whether they want to include the cluster the `manager` has a kubeconfig for or not.

If this is insufficient, we need a way for the `cluster.Provider` to decide if a given `Aware` runnable should be engaged or not.
Hm not sure if it should be up to the provider. Let's say I implement a cluster provider for Cluster API. I think it shouldn't be my call if all or none of the controllers that are used with this provider also watch the hub cluster or not.
I could make this a configuration option of the cluster provider, but this doesn't work because we only have one cluster provider per manager, and I think it's valid that only some controllers of a manager watch the hub cluster while others do not.
So I think it should be a per-controller decision.
Wondering if it would make sense to just always call Engage and then the Engage function always can just do nothing if it doesn't want to engage a Cluster. This seems the most flexible option.
(If necessary Engage could have a bool return parameter signalling if the controller actually engaged a cluster or not)
This doc is completely missing info on `Engage`/`Disengage` in the `Controller` - we are not expecting users to do that, right?
See implementation proposal #3019.
Is the idea that something like cert-manager would decide on startup which cluster provider should be used and can then only work with one cluster provider at a time?
Phrased differently. Do we also want to support using multiple cluster providers at the same time?
`mgr.GetCluster(name)` instead of accessing cluster methods directly).
No - this is precisely why `pkg/cluster` exists.
I'm curious about details why it is not possible to implement this outside of CR. (I got the points around adoption across the ecosystem, just wondering about the technical reasons)
We have implemented a component (called ClusterCache) in Cluster API that seems to come very close to what this design is trying to achieve (apart from that it is Cluster API specific of course). Especially since the generic support was added to CR.
Basically ClusterCache in CAPI:
xref: https://github.com/kubernetes-sigs/cluster-api/tree/main/controllers/clustercache
P.S. we are not creating multiple Cluster objects, instead we have our own simplified version that only contains what we need (https://github.com/kubernetes-sigs/cluster-api/blob/main/controllers/clustercache/cluster_accessor.go#L85-L89)
P.S.2. I don't understand the last two points in this list
The main "blocker" for this is that the builder is pretty common in 3rd-party controller code. If we do all of this outside of CR, this will very likely mean a fork of `pkg/builder`. I think everything else in the implementation PR could be done outside of CR.