📖 Add designs/multi-cluster.md #2746

# Multi-Cluster Support

Authors: @sttts, @embik

Initial implementation: @vincepri

Last Updated on: 2025-01-07

## Table of Contents

<!--ts-->
- [Multi-Cluster Support](#multi-cluster-support)
- [Table of Contents](#table-of-contents)
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Examples](#examples)
- [Non-Goals/Future Work](#non-goalsfuture-work)
- [Proposal](#proposal)
- [Multi-Cluster-Compatible Reconcilers](#multi-cluster-compatible-reconcilers)
- [User Stories](#user-stories)
- [Controller Author with no interest in multi-cluster wanting the old behaviour](#controller-author-with-no-interest-in-multi-cluster-wanting-the-old-behaviour)
- [Multi-Cluster Integrator wanting to support cluster managers like Cluster-API or kind](#multi-cluster-integrator-wanting-to-support-cluster-managers-like-cluster-api-or-kind)
- [Multi-Cluster Integrator wanting to support apiservers with logical cluster (like kcp)](#multi-cluster-integrator-wanting-to-support-apiservers-with-logical-cluster-like-kcp)
- [Controller Author without self-interest in multi-cluster, but open for adoption in multi-cluster setups](#controller-author-without-self-interest-in-multi-cluster-but-open-for-adoption-in-multi-cluster-setups)
- [Controller Author who wants to support certain multi-cluster setups](#controller-author-who-wants-to-support-certain-multi-cluster-setups)
- [Risks and Mitigations](#risks-and-mitigations)
- [Alternatives](#alternatives)
- [Implementation History](#implementation-history)

<!--te-->

## Summary

Controller-runtime today only allows writing controllers against a single cluster.
Multi-cluster use-cases require the creation of multiple managers and/or cluster
objects. This proposal is about adding native support for multi-cluster use-cases
to controller-runtime.

With this change, it will be possible to implement pluggable cluster providers
that automatically start and stop sources (and thus, cluster-aware reconcilers) when
the cluster provider adds ("engages") or removes ("disengages") a cluster.

## Motivation

This change is important because:
- multi-cluster use-cases are becoming more and more common; compare projects
  like Karmada, Crossplane or kcp. They all need to write (controller-runtime)
  controllers that operate on multiple clusters.
- writing controllers for such systems in a **portable** way is hard today.
  Consequently, there is no multi-cluster controller ecosystem, but there could
  and should be one.
- kcp maintains a [controller-runtime fork with multi-cluster support](https://github.com/kcp-dev/controller-runtime)
  because adding support on top leads to an inefficient controller design and,
  even more importantly, to divergence in the ecosystem.

> **Review discussion (motivation):**
>
> **Member:** On a general note, for the purpose of this proposal we should focus on general controller-runtime users, while we can keep kcp as a reference alongside other implementations. I'd rephrase the motivation at a high level: "set up controllers and watches across multiple Kubernetes clusters in a transparent way".
>
> **Member:** I think we tried to cover the high-level part with the first bullet point. The kcp controller-runtime fork is just mentioned to give personal motivation, but I don't think we have to mention it here if that is preferred.

> **Review discussion (workqueue topology):**
>
> **Member:** "leads to an inefficient controller design" - could you elaborate on this point?
>
> **Author (@sttts):** We explicitly don't want one controller (with its own workqueue) per cluster. Example: forget about workspaces. Imagine controller-runtime only supported controllers per one (!) namespace, i.e. another controller with another namespace for every namespace you want to serve. Same argument here, just a level higher. And independently, you could imagine cases where the same is true, e.g. Cluster API cases where the workqueue should be shared. That's what this enhancement is enabling.
>
> **Member:** This is a decision that shouldn't be forced onto customers. I can see the case where a workqueue per cluster is desired, as it provides some built-in fairness.
>
> **Member:** Others might disagree. This question needs a proper evaluation with pros/cons of both approaches rather than jumping to conclusions.
>
> **Author:** I agree that both topologies can have their place. I am not even sure pro/con is helpful. We shouldn't be opinionated, but give the primitives for the developer to decide.
>
> **Member:** Would you be comfortable with a way to tell this design to either use a shared queue (e.g. start sources) or to start controllers, with a config switch [in the manager] or similar?
>
> **Member:** A stated goal of this doc is to avoid divergence in the ecosystem; writing that down while at the same time handwaving away comments about this not actually being a good approach and saying we shouldn't be opinionated is not particularly convincing. Our goal is to make the majority use case simple and other use-cases possible. This is not possible if we refuse to even look into the question of what the majority use-case is and default to assuming that the use-case of the author of a design must be the majority use-case.
>
> **Author:** Fair enough. I didn't mean we shouldn't think about which workqueue topology is useful when. I meant that there are good reasons for a joint workqueue in some situations (like when I want to throttle all reconciles in a process because that throughput is limited), and independent ones in other situations (like when e.g. writes to a cluster are the limiting factor). I played with a `TypedFair` queue that could be plugged in here, wrapped by throttling and delays:
>
> ```golang
> // Fair is a queue that ensures items are dequeued fairly across different
> // fairness keys while maintaining FIFO order within each key.
> type Fair TypedFair[any]
>
> // FairnessKeyFunc is a function that returns a string key for a given item.
> // Items with different keys are dequeued fairly.
> type FairnessKeyFunc[T comparable] func(T) string
>
> // NewFair creates a new Fair instance.
> func NewFair(keyFunc FairnessKeyFunc[any]) *Fair {
> 	return (*Fair)(NewTypedFair[any](keyFunc))
> }
> ```

### Goals

- Allow 3rd-parties to implement an (optional) multi-cluster provider Go interface that controller-runtime will use (if configured on the manager) to dynamically attach and detach registered controllers to clusters that come and go.
- With that, provide a way to natively write controllers for these patterns:
  1. (UNIFORM MULTI-CLUSTER CONTROLLERS) operate on multiple clusters in a uniform way,
     i.e. reconciling the same resources on multiple clusters, **optionally**
     - sourcing information from one central hub cluster
     - sourcing information cross-cluster.

     Example: distributed `ReplicaSet` controller, reconciling `ReplicaSets` on multiple clusters.
  2. (AGGREGATING MULTI-CLUSTER CONTROLLERS) operate on one central hub cluster aggregating information from multiple clusters.

     Example: distributed `Deployment` controller, aggregating `ReplicaSets` across multiple clusters back into a central `Deployment` object.

#### Low-Level Requirements

- Allow event sources to be cross-cluster such that:
  1. Multi-cluster events can trigger reconciliation in the one central hub cluster.
  2. Central hub cluster events can trigger reconciliation on multiple clusters.
- Allow reconcilers to look up objects through (informer) indexes from specific other clusters.
- Minimize the amount of changes needed to make a controller-runtime controller
  multi-cluster-compatible, in a way that 3rd-party projects have no reason to
  object to these kinds of changes.

Here we call a controller multi-cluster-compatible if the reconcilers get
reconcile requests in cluster `X` and do all reconciliation in cluster `X`. This
is less than being multi-cluster-aware, where reconcilers implement cross-cluster
logic.

> **Review discussion:**
>
> **Member:** So "Start/Stop a controller for each cluster" is out of scope, and this is purely about "Add/Remove sources to/from a controller on cluster arrival/departure"?
>
> **Author:** Related to "one workqueue", I guess. Start/Stop means another workqueue, which we don't want.

### Examples

- Run a controller-runtime controller against a kubeconfig with arbitrarily many contexts, all being reconciled.
- Run a controller-runtime controller against cluster managers like kind, Cluster API, Open-Cluster-Manager or Hypershift.
- Run a controller-runtime controller against a kcp shard with a wildcard watch.
> **Review discussion:**
>
> **Member:** Would it be possible to focus the first iteration of this proposal on how Kubernetes works today? Adding uncommon use cases at this point in time increases the overall complexity of the implementation. Other use cases should be pluggable imo.
>
> **Member:** We agree things should be highly pluggable, that's why this is just an (incomplete) list of things that you could eventually plug in. Agreed that kcp is an uncommon use case, but so far we've made (recent) design decisions with Kubernetes clusters in general in mind. The "kcp" provider we'd like to build is itself far from ready yet.
>
> **Author:** That wouldn't be helpful for us though. I don't think the design now is influenced much by the kcp requirements, maybe with the exception of the shared workqueue. Other than that, the fleet-namespace example (which kind of reflects the kcp requirements) shows that the kcp use-case can be covered by a pretty generic design.


### Non-Goals/Future Work

- Ship integration for different multi-cluster setups. These should become
  out-of-tree subprojects that can individually evolve and be vendored by controller authors.
- Make controller-runtime controllers "binary pluggable".
- Manage one manager per cluster.
- Manage one controller per cluster with dedicated workqueues.

> **Review discussion:**
>
> **Member:** What does "binary pluggable" mean in this context?
>
> **Author:** Something like https://pkg.go.dev/plugin to dynamically load providers.
>
> **Member** (on "Manage one controller per cluster with dedicated workqueues"): Suggested removing this line; this should be a goal.

## Proposal

The `ctrl.Manager` _SHOULD_ be extended to get an optional `cluster.Provider` via
`ctrl.Options`, implementing:

```golang
// pkg/cluster

// Provider defines methods to retrieve clusters by name. The provider is
// responsible for discovering and managing the lifecycle of each cluster,
// and to engage or disengage clusters with the manager the provider is
// run against.
//
// Example: A Cluster API provider would be responsible for discovering and
// managing clusters that are backed by Cluster API resources, which can live
// in multiple namespaces in a single management cluster.
type Provider interface {
	// Get returns a cluster for the given identifying cluster name. Get
	// returns an existing cluster if it has been created before.
	Get(ctx context.Context, clusterName string) (Cluster, error)
}
```

> **Review discussion:**
>
> **Member:** The second argument should probably be a typed reference, like we have for `ObjectReference`; even if it contains a single `Name` field, it would help with expanding it later, wdyt?
>
> **Member:** Are you thinking of `logical.Name` here or more of a struct?

A cluster provider is responsible for constructing `cluster.Cluster` instances and returning
them upon calls to `Get(ctx, clusterName)`. Providers should keep track of created clusters and
return them again if the same name is requested. Since providers are responsible for constructing
the `cluster.Cluster` instance, they can make decisions about e.g. reusing existing informers.
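
For illustration only (not part of the proposal), a minimal provider could lazily construct and cache `cluster.Cluster` instances from a static set of `rest.Config`s. The `simpleProvider` type and its fields are hypothetical; a real provider would discover clusters dynamically:

```golang
import (
	"context"
	"fmt"
	"sync"

	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/cluster"
)

// simpleProvider is a hypothetical provider that lazily creates and caches
// cluster.Cluster instances from a static map of rest.Configs.
type simpleProvider struct {
	mu       sync.Mutex
	configs  map[string]*rest.Config // clusterName -> rest.Config, assumed known up front
	clusters map[string]cluster.Cluster
}

func (p *simpleProvider) Get(ctx context.Context, clusterName string) (cluster.Cluster, error) {
	p.mu.Lock()
	defer p.mu.Unlock()

	// Return the existing cluster if it has been created before.
	if cl, ok := p.clusters[clusterName]; ok {
		return cl, nil
	}

	cfg, ok := p.configs[clusterName]
	if !ok {
		return nil, fmt.Errorf("unknown cluster %q", clusterName)
	}

	// Construct a new cluster.Cluster; the provider decides how informers,
	// clients, etc. are shared or scoped, and when the cluster is started.
	cl, err := cluster.New(cfg)
	if err != nil {
		return nil, err
	}
	p.clusters[clusterName] = cl
	return cl, nil
}
```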

The `cluster.Cluster` _SHOULD_ be extended with a unique name identifier:

```golang
// pkg/cluster:
type Cluster interface {
	Name() string
	...
}
```

> **Review discussion:**
>
> **Member:** Same as above, this `Name()` should be converted to something like `SelfRef` or similar.

A new interface for cluster-aware runnables will be provided:

```golang
// pkg/cluster
type Aware interface {
	// Engage gets called when the component should start operations for the given Cluster.
	// The given context is tied to the Cluster's lifecycle and will be cancelled when the
	// Cluster is removed or an error occurs.
	//
	// Implementers should return an error if they cannot start operations for the given Cluster,
	// and should ensure this operation is re-entrant and non-blocking.
	//
	//	\_________________|)____.---'--`---.____
	//	            ||    \----.________.----/
	//	            ||     /  /     `--'
	//	        __||____/  /_
	//	       |___          \
	//	        `--------'
	Engage(context.Context, Cluster) error

	// Disengage gets called when the component should stop operations for the given Cluster.
	Disengage(context.Context, Cluster) error
}
```

> **Review discussion (context semantics):**
>
> **Member:** How would a context be selected from the kubeconfig passed to the controller?
>
> **Member (@embik):** Just to avoid any confusion, this description is talking about a `context.Context`, not a kubeconfig context. But in general: this would be something to implement in a `Provider`; a kubeconfig provider could be a very simple implementation (although the focus is on dynamic providers, and a kubeconfig would probably be fairly static). How that provider translates the `clusterName` parameter in the `Get` method (see the `Provider` interface above) to a kubeconfig context would be up to the implementation, but I could see the context name being the identifier here (since `Get` returns a `cluster.Cluster`, we need credentials for the embedded client, so a context makes a lot of sense here). Does that make sense? 🤔

> **Review discussion (error handling in Engage):**
>
> **Member:** If it's non-blocking, how is the controller supposed to surface errors here?
>
> **Member:** The idea is that anything that needs to be done for starting operations on a cluster is "blocking" (and thus would return an error), but the operations on the engaged cluster themselves are not blocking.
>
> **Member:** Can you give an example? I have some trouble understanding the difference between "anything that needs to be done for starting operations on a cluster" vs "operations on the engaged cluster", as both seem to be done or asynchronously triggered (maybe?) in the `Engage` func.
>
> **Member:** One of the things that happens in the prototype implementation (#3019) for the `typedMultiClusterController` is starting the new watches on the newly engaged cluster. So if that fails, `Engage` returns an error, but it's not blocking for processing items.
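
As an illustration (not part of the proposed API), a simple cluster-aware runnable could satisfy this interface by tracking engaged clusters in a map. The `clusterSet` type below is made up for the example and relies on the proposed `Cluster.Name()` extension:

```golang
import (
	"context"
	"sync"

	"sigs.k8s.io/controller-runtime/pkg/cluster"
)

// clusterSet is a hypothetical cluster-aware runnable that records the
// clusters it has been engaged with. Engage is re-entrant and non-blocking:
// it only registers the cluster and returns; any long-running work would be
// started in goroutines tied to the passed lifecycle context.
type clusterSet struct {
	mu       sync.Mutex
	clusters map[string]cluster.Cluster
}

func (s *clusterSet) Engage(ctx context.Context, cl cluster.Cluster) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.clusters == nil {
		s.clusters = map[string]cluster.Cluster{}
	}
	s.clusters[cl.Name()] = cl // Name() is the extension proposed above

	// Drop the cluster again when its lifecycle context ends.
	go func() {
		<-ctx.Done()
		_ = s.Disengage(context.Background(), cl)
	}()
	return nil
}

func (s *clusterSet) Disengage(_ context.Context, cl cluster.Cluster) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.clusters, cl.Name())
	return nil
}
```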

`ctrl.Manager` will implement `cluster.Aware`. As specified in the `Provider` interface,
it is the cluster provider's responsibility to call `Engage` and `Disengage` on a `ctrl.Manager`
instance when clusters join or leave the set of target clusters that should be reconciled.

> **Review discussion:**
>
> **Member:** Doesn't that mean the current interface specification for `cluster.Provider` is insufficient, as this entails that the `cluster.Provider` needs a reference to the manager?
>
> **Author:** That's out of scope of the goal of the interface. Wiring the manager in will happen when starting the provider:
>
> ```golang
> prov := NewProvider()
> mgr := NewManager(..., Options: {provider: prov})
> go mgr.Start(ctx)
> go prov.Start(ctx, mgr)
> ```

The internal `ctrl.Manager` implementation in turn will call `Engage` and `Disengage` on all
its runnables that are cluster-aware (i.e. that implement the `cluster.Aware` interface).

In particular, cluster-aware controllers implement the `cluster.Aware` interface and are
responsible for starting watches on clusters when they are engaged. This is expressed through
the interface below:

```golang
// pkg/controller
type TypedMultiClusterController[request comparable] interface {
	cluster.Aware
	TypedController[request]
}
```

The multi-cluster controller implementation reacts to engaged clusters by starting
a new `TypedSyncingSource` that also wraps the context passed down from the call to `Engage`,
which _MUST_ be canceled by the cluster provider at the end of a cluster's lifecycle.
> **Review discussion:**
>
> **Member:** So you are saying the ctx passed by `cluster.Provider` when calling `Engage` on the manager needs to be stored by the manager and re-used when calling `Engage` on any multi-cluster runnable, which in turn needs to use it to control the lifecycle of the source? What is the point of having `Disengage` then?
>
> **Member:** I think you are right, and we can do without `Disengage`. I will try to change the prototype implementation to eliminate it.
>
> **Member:** This seems to imply that the context used to call `Start()` on a `Source` will be used to stop the `Source`? I think stopping the `Source` is not possible by just canceling this context for our current source implementations (or at least some of them). IIRC the only similar functionality that we have today is that cancelling the context passed into `Cache.Start()` will stop all informers. I also wonder how this works when multiple controllers are sharing the same underlying informer. I think if a controller doesn't exclusively own an informer, it also shouldn't just shut it down. Or is my assumption wrong that they usually would share informers (like it works today when multiple controllers are sharing the same cache & underlying informers)? For additional context: if I apply what we currently do in Cluster API to this proposal, it would be the cluster provider that shuts down the cache and all underlying informers.
>
> **Author:** Am curious which you have in mind.
>
> **Member:** At least `source.Kind`. The ctx passed into `Start` can be used to cancel the start process (e.g. `WaitForCacheSync`) but not the informer.

The `ctrl.Manager` _SHOULD_ be extended by a `cluster.Cluster` getter:

```golang
// pkg/manager
type Manager interface {
	// ...
	GetCluster(ctx context.Context, clusterName string) (cluster.Cluster, error)
}
```

> **Review discussion:**
>
> **Member:** Suggested change, in line with the other comments above: `GetCluster(ctx context.Context, ref cluster.Reference) (cluster.Cluster, error)`.

The embedded `cluster.Cluster` corresponds to `GetCluster(ctx, "")`. We call the
clusters with non-empty name "provider clusters" or "engaged clusters", while
the embedded cluster of the manager is called the "default cluster" or "hub
cluster".
> **Review discussion:**
>
> **Member:** Let's be explicit here; one of the main issues is that an empty string is also a default value. We could set a `cluster.Reference` to a very specific value, which in turn is used across the entire codebase.


### Cluster-Aware Request

To provide information about the source cluster of a request, a new type
`reconcile.ClusterAwareRequest` _SHOULD_ be added:

```golang
// pkg/reconcile
type ClusterAwareRequest struct {
	Request
	ClusterName string
}
```

> **Review discussion:**
>
> **Member:** Suggested change: replace `ClusterName string` with `Cluster cluster.Reference`, in line with the other comments above.

This struct embeds a `reconcile.Request` to store the "usual" information (name and namespace)
about an object, plus the name of the originating cluster.

Given that an empty cluster name represents the "default cluster", a `reconcile.ClusterAwareRequest`
can be used as `request` type even for controllers that do not have an active cluster provider.
The cluster name will simply be an empty string, which is compatible with calls to `mgr.GetCluster`.
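
For illustration, this is what such requests might look like when enqueued (names are made up; imports of `k8s.io/apimachinery/pkg/types` and `sigs.k8s.io/controller-runtime/pkg/reconcile` are assumed):

```golang
// Request for an object observed on the provider cluster "cluster-a".
req := reconcile.ClusterAwareRequest{
	Request:     reconcile.Request{NamespacedName: types.NamespacedName{Namespace: "default", Name: "web-0"}},
	ClusterName: "cluster-a",
}

// With an empty ClusterName, the same type addresses the default ("hub")
// cluster, so it also works in single-cluster setups.
hubReq := reconcile.ClusterAwareRequest{
	Request: reconcile.Request{NamespacedName: types.NamespacedName{Namespace: "default", Name: "web-0"}},
}
```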

**Note:** controller-runtime must provide this cluster-aware request type to
allow writing *uniform* multi-cluster controllers (see goals), i.e. controllers
that work both as single-cluster and multi-cluster controllers against arbitrary
cluster providers. Think of generic CNCF projects like cert-manager wanting to
support multi-cluster setups generically without forking the codebase.

> **Review discussion:**
>
> **Member:** Nit: we're saying must here but _SHOULD_ in the text right underneath the Cluster-Aware Request title.

### BYO Request Type

Instead of using the new `reconcile.ClusterAwareRequest`, implementations _CAN_ also bring their
own request type through the generics support in `Typed*` types (`request comparable`).
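
As a purely hypothetical example of such a BYO request type (any comparable struct works with the `Typed*` machinery):

```golang
// shardedRequest is a made-up, provider-specific request type. It stays
// comparable (only comparable fields), which is all the Typed* machinery
// requires.
type shardedRequest struct {
	reconcile.Request
	Shard string // provider-specific routing information instead of a plain cluster name
}
```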

**Note:** these kinds of controllers won't be uniform in the sense of compatibility
with arbitrary cluster providers, but for use-cases that are tightly integrated
with specific cluster providers, this might be useful.

> **Review discussion:**
>
> **Member:** Why not? The cluster provider doesn't know anything about what request type controllers use?
>
> **Author (@sttts):** This is not about the provider, but about controllers. We want to allow "uniform cluster support" with minimal changes to existing controllers. We don't control the controllers; these are 3rd-party controllers. Hence, we aim for local changes rather than changing many lines of source code, and generics usually are the latter. The controllers cannot be generic as outlined above. Imagine they were:
>
> ```golang
> Reconcile(ctx context.Context, req T) (reconcile.Result, error) {
>   existing code ... req.Namespace ... existing code
> }
> ```
>
> This does not compile because the Go compiler resolves `.Namespace` against an arbitrary `T any`. Hence, we have two ways out to make this compile:
>
> 1. use some interface for `T` that provides `Name`/`Namespace() string` as methods. This is diverging from 99.9% of controllers today.
> 2. have a struct that is common aka upstream, and with that has the least resistance to being used in practice.
>
> **Member (@sbueringer):** I think the confusion comes from the fact that this sentence makes it sound like using another request type will make a controller only compatible with a subset of `cluster.Provider` implementations; I have some trouble seeing the connection. I got your point though that a standard upstream `ClusterAwareRequest` is needed.

Optionally, a passed `TypedEventHandler` will be duplicated per engaged cluster if it
fulfills the following interface:

```golang
// pkg/handler
type TypedDeepCopyableEventHandler[object any, request comparable] interface {
	TypedEventHandler[object, request]
	DeepCopyFor(c cluster.Cluster) TypedDeepCopyableEventHandler[object, request]
}
```

> **Review discussion:**
>
> **Member:** Why teach handlers to copy themselves rather than just layering this (name likely needs improvement but you get the idea)?
>
> ```golang
> type HandlerConstructor[object any, request comparable] func(cluster.Cluster) TypedHandler[object, request]
> ```
>
> **Member:** The reason for this was the attempt to keep existing function signatures stable while enabling them to be multi-cluster aware. A `HandlerConstructor` would probably need a new separate builder function to be passed as an argument, so e.g. something like `Watches` vs `ClusterAwareWatches` (or whatever). I'm totally open to changing this if you prefer it.

This might be necessary if a BYO `TypedEventHandler` needs to store information about
the engaged cluster it has been started for (e.g. because the events do not supply
information about the cluster in object annotations).
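
A hedged sketch of what such a handler could look like, assuming the typed handler helpers from recent controller-runtime releases (`handler.TypedEnqueueRequestsFromMapFunc`) together with the proposed `reconcile.ClusterAwareRequest` and `Cluster.Name()` extensions; the `perClusterPodHandler` type is made up for the example and imports are elided:

```golang
// perClusterPodHandler is a hypothetical handler that remembers the name of
// the cluster it was started for and stamps it into every enqueued request.
type perClusterPodHandler struct {
	handler.TypedEventHandler[*corev1.Pod, reconcile.ClusterAwareRequest]
	clusterName string
}

func newPerClusterPodHandler(clusterName string) *perClusterPodHandler {
	return &perClusterPodHandler{
		clusterName: clusterName,
		TypedEventHandler: handler.TypedEnqueueRequestsFromMapFunc(
			func(ctx context.Context, pod *corev1.Pod) []reconcile.ClusterAwareRequest {
				return []reconcile.ClusterAwareRequest{{
					Request:     reconcile.Request{NamespacedName: client.ObjectKeyFromObject(pod)},
					ClusterName: clusterName,
				}}
			}),
	}
}

// DeepCopyFor rebuilds the handler for a newly engaged cluster, using the
// proposed Cluster.Name() extension as the identifier.
func (h *perClusterPodHandler) DeepCopyFor(c cluster.Cluster) handler.TypedDeepCopyableEventHandler[*corev1.Pod, reconcile.ClusterAwareRequest] {
	return newPerClusterPodHandler(c.Name())
}
```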

### Multi-Cluster-Compatible Reconcilers

Reconcilers can be made multi-cluster-compatible by changing client and cache
accessing code from directly accessing `mgr.GetClient()` and `mgr.GetCache()` to
going through `mgr.GetCluster(ctx, req.ClusterName).GetClient()` and
`mgr.GetCluster(ctx, req.ClusterName).GetCache()`.

A typical snippet at the beginning of a reconciler to fetch the client could look like this:

```golang
cl, err := mgr.GetCluster(ctx, req.ClusterName)
if err != nil {
return reconcile.Result{}, err
}
client := cl.GetClient()
```
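
For illustration, a complete (if trivial) uniform multi-cluster reconciler might then look roughly like this, assuming the proposed `GetCluster` and `reconcile.ClusterAwareRequest` extensions; the `PodReconciler` type is made up and imports are elided:

```golang
// PodReconciler reconciles Pods in whichever cluster a request originates from.
type PodReconciler struct {
	mgr manager.Manager
}

func (r *PodReconciler) Reconcile(ctx context.Context, req reconcile.ClusterAwareRequest) (reconcile.Result, error) {
	// An empty ClusterName resolves to the manager's default ("hub") cluster.
	cl, err := r.mgr.GetCluster(ctx, req.ClusterName)
	if err != nil {
		return reconcile.Result{}, err
	}

	pod := &corev1.Pod{}
	if err := cl.GetClient().Get(ctx, req.NamespacedName, pod); err != nil {
		return reconcile.Result{}, client.IgnoreNotFound(err)
	}

	// ... reconcile the Pod in the same cluster it was read from, e.g. update
	// its status via cl.GetClient().Status().Update(ctx, pod) ...

	return reconcile.Result{}, nil
}
```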

Due to `reconcile.ClusterAwareRequest`, changes to the controller builder process are minimal:

```golang
// previous
builder.TypedControllerManagedBy[reconcile.Request](mgr).
	Named("single-cluster-controller").
	For(&corev1.Pod{}).
	Complete(reconciler)

// new
builder.TypedControllerManagedBy[reconcile.ClusterAwareRequest](mgr).
	Named("multi-cluster-controller").
	For(&corev1.Pod{}).
	Complete(reconciler)
```

> **Review discussion:**
>
> **Member:** This is a bit too easy to mess up. What's stopping a `reconcile.ClusterAwareRequest` being used in the wrong place, or vice-versa?
>
> **Member:** In general, `reconcile.ClusterAwareRequest` could totally be used in a non-multi-cluster setup and it wouldn't change a thing, since it embeds a `reconcile.Request`. If the `ClusterName` field is empty, `req.ClusterName` would imply the "default cluster", which is the single-cluster use-case today (e.g. we changed the fleet example in #3019 to have a flag that lets you toggle multi-cluster vs single-cluster usage). If you end up using `reconcile.Request`, you would quickly notice that you don't have the cluster name to pass to `mgr.GetCluster`.

The builder will choose the correct `EventHandler` implementation for both `For` and `Owns`
depending on the `request` type used.

With the described changes (use `GetCluster(ctx, req.ClusterName)`, making `reconciler`
a `TypedFunc[reconcile.ClusterAwareRequest]`) an existing controller will automatically act as a
*uniform multi-cluster controller* if a cluster provider is configured.
It will reconcile resources from cluster `X` in cluster `X`.
> **Review discussion:**
>
> **Member:** It is worth pointing out that this can also be achieved by instantiating a controller per target cluster rather than adding/removing sources to/from an existing controller. IMHO if you ever actually want to operate the resulting component, you likely want "create/remove controller" rather than "create/remove source", because otherwise a single problematic cluster can completely mess up the workqueue metrics and on-calls can't tell if one cluster has an issue or all, which is going to be a big difference in terms of severity.
>
> **Member:** Would you be comfortable with a way to tell this design to either use a shared queue (e.g. start sources) or to start controllers, with a config switch or similar?

For a manager with `cluster.Provider`, the builder _SHOULD_ create a controller
that sources events **ONLY** from the provider clusters that got engaged with
the controller.
> **Review discussion:**
>
> **Member:** Does this mean that we won't get events for the "default" cluster?
>
> **Author:** It can be configured per controller. We default to writing a uniform multi-cluster controller, i.e. one that only reacts to provider clusters. It is not common afaik to have the same semantics for both a local cluster (usually the hub) and provider clusters.

Controllers that should be triggered by events on the hub cluster can continue
to use `For` and `Owns` and explicitly pass the intention to engage only with the
"default" cluster (this is only necessary if a cluster provider is plugged in):

```golang
builder.NewControllerManagedBy(mgr).
	WithOptions(controller.TypedOptions{
		EngageWithDefaultCluster:   ptr.To(true),
		EngageWithProviderClusters: ptr.To(false),
	}).
	For(&appsv1.Deployment{}).
	Owns(&v1.ReplicaSet{}).
	Complete(reconciler)
```

> **Review discussion:**
>
> **Member:** This double binary distinction is pretty much guaranteed to be too little. It would be better if we somehow tell the builder whether this is a multi-cluster controller or not, and then the `cluster.Provider` calls `Engage` for all clusters that should be engaged; it's up to the implementor of the provider whether they want to include the cluster the manager has a kubeconfig for or not. If this is insufficient, we need a way for the `cluster.Provider` to decide if a given `Aware` runnable should be engaged or not.
>
> **Member (@sbueringer):** Hm, not sure if it should be up to the provider. Let's say I implement a cluster provider for Cluster API; I think it shouldn't be my call whether all or none of the controllers that are used with this provider also watch the hub cluster. I could make this a configuration option of the cluster provider, but this doesn't work because we only have one cluster provider per manager, and I think it's valid that only some controllers of a manager watch the hub cluster while others do not. So I think it should be a per-controller decision.
>
> **Member:** Wondering if it would make sense to just always call `Engage`, and then the `Engage` function can just do nothing if it doesn't want to engage a cluster. This seems the most flexible option. (If necessary, `Engage` could have a bool return parameter signalling whether the controller actually engaged a cluster or not.)

> **Review discussion:**
>
> **Member:** This doc is completely missing info on:
> - the actual implementation of a multi-cluster controller (i.e. `Engage`/`Disengage` in the controller) - we are not expecting users to do that, right?
> - the same for `source.Source`, but arguably a subtask of the above.
>
> **Author:** See implementation proposal #3019.

## User Stories

### Controller Author with no interest in multi-cluster wanting the old behaviour

- Do nothing. Controller-runtime behaviour is unchanged.

### Multi-Cluster Integrator wanting to support cluster managers like Cluster API or kind

- Implement the `cluster.Provider` interface, either via polling of the cluster registry
or by watching objects in the hub cluster.
- For every new cluster create an instance of `cluster.Cluster` and call `mgr.Engage` (see the sketch below).
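
For illustration only, building on the hypothetical `simpleProvider` sketched earlier and the proposed `cluster.Aware` implementation of the manager, such a provider's event loop might look roughly like this (the inventory channel and event type are made up; a real provider would watch Cluster API objects or a cluster registry):

```golang
// clusterEvent is a hypothetical discovery event.
type clusterEvent struct {
	name    string
	removed bool
}

func (p *simpleProvider) Run(ctx context.Context, mgr manager.Manager, events <-chan clusterEvent) error {
	// Per-cluster cancel functions so that removing a cluster also cancels
	// the lifecycle context that was handed out via Engage.
	cancels := map[string]context.CancelFunc{}
	for {
		select {
		case <-ctx.Done():
			return nil
		case ev := <-events:
			if ev.removed {
				if cancel, ok := cancels[ev.name]; ok {
					cancel()
					delete(cancels, ev.name)
				}
				continue
			}
			cl, err := p.Get(ctx, ev.name)
			if err != nil {
				return err
			}
			clusterCtx, cancel := context.WithCancel(ctx)
			cancels[ev.name] = cancel
			// The manager fans this out to all cluster-aware runnables.
			if err := mgr.Engage(clusterCtx, cl); err != nil {
				cancel()
				return err
			}
		}
	}
}
```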

### Multi-Cluster Integrator wanting to support apiservers with logical cluster (like kcp)

- Implement the `cluster.Provider` interface by watching the apiserver for logical cluster objects
(`LogicalCluster` CRD in kcp).
- Return a facade `cluster.Cluster` that scopes all operations (client, cache, indexers)
to the logical cluster, but backed by one physical `cluster.Cluster` resource.
- Add cross-cluster indexers to the physical `cluster.Cluster` object.

### Controller Author without self-interest in multi-cluster, but open for adoption in multi-cluster setups

- Replace `mgr.GetClient()` and `mgr.GetCache()` with `mgr.GetCluster(ctx, req.ClusterName).GetClient()` and `mgr.GetCluster(ctx, req.ClusterName).GetCache()`.
- Make manager and controller plumbing vendorable to allow plugging in a multi-cluster provider and a BYO request type.
> **Review discussion:**
>
> **Member (@sbueringer):** Is the idea that something like cert-manager would decide on startup which cluster provider should be used and can then only work with one cluster provider at a time? Phrased differently: do we also want to support using multiple cluster providers at the same time?
>
> **Author (@sttts):**
> - One provider per manager at a time.
> - Low friction to make reconcilers uniform-multi-cluster capable (this basically means using the cluster-enabled request and calling `mgr.GetCluster(name)` instead of accessing cluster methods directly).
> - If a controller project wants to add support for a number of providers in their repository, this is fine, but not necessarily the goal.
> - Instead it should be easy to instantiate the controllers from an alternative `main.go` with a provider of your choice.

### Controller Author who wants to support certain multi-cluster setups

- Do the `GetCluster` plumbing as described above.
- Vendor 3rd-party multi-cluster providers and wire them up in `main.go`.

## Risks and Mitigations

- The standard behaviour of controller-runtime is unchanged for single-cluster controllers.
- The activation of the multi-cluster mode is through usage of a `reconcile.ClusterAwareRequest` request type and
  attaching the `cluster.Provider` to the manager. To make it clear that the semantics are experimental, we name
  the `manager.Options` field `ExperimentalClusterProvider`.
- We only extend these interfaces and structs:
  - `ctrl.Manager` with `GetCluster(ctx, clusterName string) (cluster.Cluster, error)` and `cluster.Aware`.
  - `cluster.Cluster` with `Name() string`.

  We think that the behaviour of these extensions is well understood and hence low risk.
  Everything else behind the scenes is an implementation detail that can be changed
  at any time.

## Alternatives

- Multi-cluster support could be built outside of core controller-runtime. This would
  likely lead to a design with one manager per cluster. This has a number of problems:
  - only one manager can serve webhooks or metrics
  - cluster management must be custom built
  - logical cluster support would still require a fork of controller-runtime and
    with that a divergence in the ecosystem. The reason is that logical clusters
    require a shared workqueue because they share the same apiserver. So for
    fair queueing, this needs deep integration into one manager.
  - informer facades are not supported in today's cluster/cache implementation.
- We could deepcopy the builder instead of the sources and handlers. This would
  lead to one controller and one workqueue per cluster. For the reason outlined
  in the previous alternative, this is not desirable.

> **Review discussion:**
>
> **Member:** No - this is precisely why pkg/cluster exists.
>
> **Member (@sbueringer):** I'm curious about details on why it is not possible to implement this outside of controller-runtime (I got the points around adoption across the ecosystem, just wondering about the technical reasons). We have implemented a component (called ClusterCache) in Cluster API that seems to come very close to what this design is trying to achieve (apart from being Cluster API specific, of course), especially since the generic support was added to controller-runtime. Basically, ClusterCache in CAPI:
> - discovers Clusters
> - maintains a Cache per Cluster
> - allows retrieving Clients for a Cluster
> - allows adding Watches (kind Sources) for a Cluster
>   - this also allows mapping events that we get from these sources back to the one controller with the one shared work queue
>
> xref: https://github.com/kubernetes-sigs/cluster-api/tree/main/controllers/clustercache
>
> P.S. We are not creating multiple Cluster objects; instead we have our own simplified version that only contains what we need (https://github.com/kubernetes-sigs/cluster-api/blob/main/controllers/clustercache/cluster_accessor.go#L85-L89).
>
> P.S.2. I don't understand the last two points in this list.
>
> **Author:** The main "blocker" for this is that the builder is pretty common in 3rd-party controller code. If we do all of this outside of controller-runtime, this will very likely mean a fork of pkg/builder. I think everything else in the implementation PR could be done outside of controller-runtime.

## Implementation History

- [PR #2207 by @vincepri : WIP: ✨ Cluster Provider and cluster-aware controllers](https://github.com/kubernetes-sigs/controller-runtime/pull/2207) – with extensive review
- [PR #2726 by @sttts replacing #2207: WIP: ✨ Cluster Provider and cluster-aware controllers](https://github.com/kubernetes-sigs/controller-runtime/pull/2726) –
picking up #2207, addressing lots of comments and extending the approach to what kcp needs, with a `fleet-namespace` example that demonstrates a similar setup as kcp with real logical clusters.
- [PR #3019 by @embik, replacing #2726: ✨ WIP: Cluster provider and cluster-aware controllers](https://github.com/kubernetes-sigs/controller-runtime/pull/3019) -
picking up #2726, reworking existing code to support the recent `Typed*` generic changes of the codebase.
- [github.com/kcp-dev/controller-runtime](https://github.com/kcp-dev/controller-runtime) – the kcp controller-runtime fork